Fast ext4 fsck times, revisited


Last night I managed to finish up a rather satisfying improvement to ext4’s inode and block allocators. The ext4’s original allocator was actually a bit more simple-minded than ext3’s, in that it didn’t implement the Orlov algorithm to spread out top-level directories for better filesystem aging. It also was buggy in certain ways, where it would return ENOSPC even when there were still plenty of inodes in the file system.

So I had been working on extending ext3’s original Orlov allocator so it would work well with ext4. While I was at it, it occurred to me that one of the tricks I could play with ext4’s flex groups (which are higher-order collection of block groups), was to bias the block allocation algorithms such that the first block group in a flexgroup would be preferred for use by directories, and biased against data blocks for regular files. This meant that directory blocks would get clustered together, which cut a third off the time needed for e2fsck pass2:

  <td colspan="5" align="center">
    ext4 old allocator
  </td>
  
  <td rowspan="9" align="center">
  </td>
  
  <td colspan="5" align="center">
    ext4 new allocator
  </td>
</tr>

<tr>
  <td colspan="3" align="center">
    time (s)
  </td>
  
  <td colspan="2" align="center">
    I/O
  </td>
  
  <td colspan="3" align="center">
    time (s)
  </td>
  
  <td colspan="2" align="center">
    I/O
  </td>
</tr>

<tr>
  <td align="center" valign="middle">
    real
  </td>
  
  <td align="center" valign="middle">
    user
  </td>
  
  <td align="center" valign="middle">
    system
  </td>
  
  <td align="center" valign="middle">
    MB read
  </td>
  
  <td align="center" valign="middle">
    MB/s
  </td>
  
  <td align="center" valign="middle">
    real
  </td>
  
  <td align="center" valign="middle">
    user
  </td>
  
  <td align="center" valign="middle">
    system
  </td>
  
  <td align="center" valign="middle">
    MB read
  </td>
  
  <td align="center" valign="middle">
    MB/s
  </td>
</tr>

<tr>
  <td align="center">
    1
  </td>
  
  <td align="right">
    6.69
  </td>
  
  <td align="right">
    4.06
  </td>
  
  <td align="right">
    0.90
  </td>
  
  <td align="right">
    82
  </td>
  
  <td align="right">
    12.25
  </td>
  
  <td align="right">
    6.70
  </td>
  
  <td align="right">
    3.63
  </td>
  
  <td align="right">
    1.58
  </td>
  
  <td align="right">
    82
  </td>
  
  <td align="right">
    12.23
  </td>
</tr>

<tr>
  <td align="center">
    2
  </td>
  
  <td align="right">
    <b>13.34</b>
  </td>
  
  <td align="right">
    2.30
  </td>
  
  <td align="right">
    3.78
  </td>
  
  <td align="right">
    133
  </td>
  
  <td align="right">
    9.97
  </td>
  
  <td align="right">
    <b>4.24</b>
  </td>
  
  <td align="right">
    1.27
  </td>
  
  <td align="right">
    2.46
  </td>
  
  <td align="right">
    133
  </td>
  
  <td align="right">
    31.36
  </td>
</tr>

<tr>
  <td align="center">
    3
  </td>
  
  <td align="right">
    0.02
  </td>
  
  <td align="right">
    0.01
  </td>
  
  <td align="right">
  </td>
  
  <td align="right">
    1
  </td>
  
  <td align="right">
    63.85
  </td>
  
  <td align="right">
    0.01
  </td>
  
  <td align="right">
    0.01
  </td>
  
  <td align="right">
    0.01
  </td>
  
  <td align="right">
    1
  </td>
  
  <td align="right">
    82.69
  </td>
</tr>

<tr>
  <td align="center">
    4
  </td>
  
  <td align="right">
    0.28
  </td>
  
  <td align="right">
    0.27
  </td>
  
  <td align="right">
  </td>
  
  <td align="right">
  </td>
  
  <td align="right">
  </td>
  
  <td align="right">
    0.23
  </td>
  
  <td align="right">
    0.22
  </td>
  
  <td align="right">
  </td>
  
  <td align="right">
  </td>
  
  <td align="right">
  </td>
</tr>

<tr>
  <td align="center">
    5
  </td>
  
  <td align="right">
    2.60
  </td>
  
  <td align="right">
    2.31
  </td>
  
  <td align="right">
    0.03
  </td>
  
  <td align="right">
    1
  </td>
  
  <td align="right">
    0.38
  </td>
  
  <td align="right">
    2.42
  </td>
  
  <td align="right">
    2.15
  </td>
  
  <td align="right">
    0.07
  </td>
  
  <td align="right">
    1
  </td>
  
  <td align="right">
    0.41
  </td>
</tr>

<tr>
  <td align="center">
    Total
  </td>
  
  <td align="right">
    <b>23.06</b>
  </td>
  
  <td align="right">
    9.03
  </td>
  
  <td align="right">
    4.74
  </td>
  
  <td align="right">
    216
  </td>
  
  <td align="right">
    9.37
  </td>
  
  <td align="right">
    <b>13.78</b>
  </td>
  
  <td align="right">
    7.33
  </td>
  
  <td align="right">
    4.19
  </td>
  
  <td align="right">
    216
  </td>
  
  <td align="right">
    15.68
  </td>
</tr>
  <table border="1" cellspacing="1" cellpadding="2">
    <caption> Comparison of e2fsck times on an 32GB partition<br /> </caption> <colgroup align="center"> </colgroup> <colgroup align="right" span="3" width="50"> </colgroup> <colgroup align="right" span="2"> </colgroup> <colgroup> </colgroup> <colgroup align="right" span="3" width="50"> </colgroup> <colgroup align="right" span="2"> </colgroup> <colgroup> </colgroup> <tr>
      <td rowspan="3" align="center">
        Pass
      </td>
      
      <td colspan="5" align="center">
        ext3
      </td>
      
      <td rowspan="9" align="center">
      </td>
      
      <td colspan="5" align="center">
        ext4 old allocator
      </td>
    </tr>
    
    <tr>
      <td colspan="3" align="center">
        time (s)
      </td>
      
      <td colspan="2" align="center">
        I/O
      </td>
      
      <td colspan="3" align="center">
        time (s)
      </td>
      
      <td colspan="2" align="center">
        I/O
      </td>
    </tr>
    
    <tr>
      <td align="center" valign="middle">
        real
      </td>
      
      <td align="center" valign="middle">
        user
      </td>
      
      <td align="center" valign="middle">
        system
      </td>
      
      <td align="center" valign="middle">
        MB read
      </td>
      
      <td align="center" valign="middle">
        MB/s
      </td>
      
      <td align="center" valign="middle">
        real
      </td>
      
      <td align="center" valign="middle">
        user
      </td>
      
      <td align="center" valign="middle">
        system
      </td>
      
      <td align="center" valign="middle">
        MB read
      </td>
      
      <td align="center" valign="middle">
        MB/s
      </td>
    </tr>
    
    <tr>
      <td align="center">
        1
      </td>
      
      <td align="right">
        <b>108.40</b>
      </td>
      
      <td align="right">
        13.74
      </td>
      
      <td align="right">
        11.53
      </td>
      
      <td align="right">
        583
      </td>
      
      <td align="right">
        5.38
      </td>
      
      <td align="right">
        <b>6.69</b>
      </td>
      
      <td align="right">
        4.06
      </td>
      
      <td align="right">
        0.90
      </td>
      
      <td align="right">
        82
      </td>
      
      <td align="right">
        12.25
      </td>
    </tr>
    
    <tr>
      <td align="center">
        2
      </td>
      
      <td align="right">
        5.91
      </td>
      
      <td align="right">
        1.74
      </td>
      
      <td align="right">
        2.56
      </td>
      
      <td align="right">
        133
      </td>
      
      <td align="right">
        22.51
      </td>
      
      <td align="right">
        13.34
      </td>
      
      <td align="right">
        2.30
      </td>
      
      <td align="right">
        3.78
      </td>
      
      <td align="right">
        133
      </td>
      
      <td align="right">
        9.97
      </td>
    </tr>
    
    <tr>
      <td align="center">
        3
      </td>
      
      <td align="right">
        0.03
      </td>
      
      <td align="right">
        0.01
      </td>
      
      <td align="right">
      </td>
      
      <td align="right">
        1
      </td>
      
      <td align="right">
        31.21
      </td>
      
      <td align="right">
        0.02
      </td>
      
      <td align="right">
        0.01
      </td>
      
      <td align="right">
      </td>
      
      <td align="right">
        1
      </td>
      
      <td align="right">
        63.85
      </td>
    </tr>
    
    <tr>
      <td align="center">
        4
      </td>
      
      <td align="right">
        0.28
      </td>
      
      <td align="right">
        0.27
      </td>
      
      <td align="right">
      </td>
      
      <td align="right">
      </td>
      
      <td align="right">
      </td>
      
      <td align="right">
        0.28
      </td>
      
      <td align="right">
        0.27
      </td>
      
      <td align="right">
      </td>
      
      <td align="right">
      </td>
      
      <td align="right">
      </td>
    </tr>
    
    <tr>
      <td align="center">
        5
      </td>
      
      <td align="right">
        3.17
      </td>
      
      <td align="right">
        0.92
      </td>
      
      <td align="right">
        0.13
      </td>
      
      <td align="right">
        2
      </td>
      
      <td align="right">
        0.63
      </td>
      
      <td align="right">
        2.60
      </td>
      
      <td align="right">
        2.31
      </td>
      
      <td align="right">
        0.03
      </td>
      
      <td align="right">
        1
      </td>
      
      <td align="right">
        0.38
      </td>
    </tr>
    
    <tr>
      <td align="center">
        Total
      </td>
      
      <td align="right">
        <b>118.15</b>
      </td>
      
      <td align="right">
        16.75
      </td>
      
      <td align="right">
        14.25
      </td>
      
      <td align="right">
        718
      </td>
      
      <td align="right">
        6.08
      </td>
      
      <td align="right">
        <b>23.06</b>
      </td>
      
      <td align="right">
        9.03
      </td>
      
      <td align="right">
        4.74
      </td>
      
      <td align="right">
        216
      </td>
      
      <td align="right">
        9.37
      </td>
    </tr>
  </table>
  
  <p>
    </center>
  </p>
  
  <p>
  </p>
  
  <p>
    <center>
      </p> 
      
      <table border="1" cellspacing="1" cellpadding="2">
        <caption> Vital Statistics of the 32GB partition<br /> </caption> <colgroup align="right"> </colgroup> <colgroup align="left"></colgroup> <tr>
          <td align="right">
            312214
          </td>
          
          <td align="left">
            inodes used (14.89%)<br /> </tr> 
            
            <tr>
              <td align="right">
                263
              </td>
              
              <td align="left">
                non-contiguous files (0.1%)
              </td>
            </tr>
            
            <tr>
              <td align="right">
                198
              </td>
              
              <td align="left">
                non-contiguous directories (0.1%)
              </td>
            </tr>
            
            <tr>
              <td align="right">
                &nbsp;
              </td>
              
              <td align="left">
                # of inodes with ind/dind/tind blocks: 0/0/0
              </td>
            </tr>
            
            <tr>
              <td align="right">
                &nbsp;
              </td>
              
              <td align="left">
                Extent depth histogram: 292698/40
              </td>
            </tr>
            
            <tr>
              <td align="right">
                4388697
              </td>
              
              <td align="left">
                blocks used (52.32%)
              </td>
            </tr>
            
            <tr>
              <td align="right">
              </td>
              
              <td align="left">
                bad blocks
              </td>
            </tr>
            
            <tr>
              <td align="right">
                1
              </td>
              
              <td align="left">
                large file
              </td>
            </tr>
            
            <tr>
              <td align="right">
                &nbsp;
              </td>
              
              <td align="left">
                &nbsp;
              </td>
            </tr>
            
            <tr>
              <td align="right">
                263549
              </td>
              
              <td align="left">
                regular files
              </td>
            </tr>
            
            <tr>
              <td align="right">
                28022
              </td>
              
              <td align="left">
                directories
              </td>
            </tr>
            
            <tr>
              <td align="right">
                5
              </td>
              
              <td align="left">
                character device files
              </td>
            </tr>
            
            <tr>
              <td align="right">
                1
              </td>
              
              <td align="left">
                block device file
              </td>
            </tr>
            
            <tr>
              <td align="right">
                5
              </td>
              
              <td align="left">
                fifos
              </td>
            </tr>
            
            <tr>
              <td align="right">
                615
              </td>
              
              <td align="left">
                links
              </td>
            </tr>
            
            <tr>
              <td align="right">
                20618
              </td>
              
              <td align="left">
                symbolic links (19450 fast symbolic links)
              </td>
            </tr>
            
            <tr>
              <td align="right">
                5
              </td>
              
              <td align="left">
                sockets
              </td>
            </tr>
            
            <tr>
            </tr>
            
            <tr>
              <td align="right">
                312820
              </td>
              
              <td align="left">
                files
              </td>
            </tr></tbody> </table> 
            
            <p>
              </center>
            </p>