Fast ext4 fsck times, revisited
Last night I managed to finish up a rather satisfying improvement to ext4’s inode and block allocators. The ext4’s original allocator was actually a bit more simple-minded than ext3’s, in that it didn’t implement the Orlov algorithm to spread out top-level directories for better filesystem aging. It also was buggy in certain ways, where it would return ENOSPC even when there were still plenty of inodes in the file system.
So I had been working on extending ext3’s original Orlov allocator so it would work well with ext4. While I was at it, it occurred to me that one of the tricks I could play with ext4’s flex groups (which are higher-order collection of block groups), was to bias the block allocation algorithms such that the first block group in a flexgroup would be preferred for use by directories, and biased against data blocks for regular files. This meant that directory blocks would get clustered together, which cut a third off the time needed for e2fsck pass2:
<td colspan="5" align="center">
ext4 old allocator
</td>
<td rowspan="9" align="center">
</td>
<td colspan="5" align="center">
ext4 new allocator
</td>
</tr>
<tr>
<td colspan="3" align="center">
time (s)
</td>
<td colspan="2" align="center">
I/O
</td>
<td colspan="3" align="center">
time (s)
</td>
<td colspan="2" align="center">
I/O
</td>
</tr>
<tr>
<td align="center" valign="middle">
real
</td>
<td align="center" valign="middle">
user
</td>
<td align="center" valign="middle">
system
</td>
<td align="center" valign="middle">
MB read
</td>
<td align="center" valign="middle">
MB/s
</td>
<td align="center" valign="middle">
real
</td>
<td align="center" valign="middle">
user
</td>
<td align="center" valign="middle">
system
</td>
<td align="center" valign="middle">
MB read
</td>
<td align="center" valign="middle">
MB/s
</td>
</tr>
<tr>
<td align="center">
1
</td>
<td align="right">
6.69
</td>
<td align="right">
4.06
</td>
<td align="right">
0.90
</td>
<td align="right">
82
</td>
<td align="right">
12.25
</td>
<td align="right">
6.70
</td>
<td align="right">
3.63
</td>
<td align="right">
1.58
</td>
<td align="right">
82
</td>
<td align="right">
12.23
</td>
</tr>
<tr>
<td align="center">
2
</td>
<td align="right">
<b>13.34</b>
</td>
<td align="right">
2.30
</td>
<td align="right">
3.78
</td>
<td align="right">
133
</td>
<td align="right">
9.97
</td>
<td align="right">
<b>4.24</b>
</td>
<td align="right">
1.27
</td>
<td align="right">
2.46
</td>
<td align="right">
133
</td>
<td align="right">
31.36
</td>
</tr>
<tr>
<td align="center">
3
</td>
<td align="right">
0.02
</td>
<td align="right">
0.01
</td>
<td align="right">
</td>
<td align="right">
1
</td>
<td align="right">
63.85
</td>
<td align="right">
0.01
</td>
<td align="right">
0.01
</td>
<td align="right">
0.01
</td>
<td align="right">
1
</td>
<td align="right">
82.69
</td>
</tr>
<tr>
<td align="center">
4
</td>
<td align="right">
0.28
</td>
<td align="right">
0.27
</td>
<td align="right">
</td>
<td align="right">
</td>
<td align="right">
</td>
<td align="right">
0.23
</td>
<td align="right">
0.22
</td>
<td align="right">
</td>
<td align="right">
</td>
<td align="right">
</td>
</tr>
<tr>
<td align="center">
5
</td>
<td align="right">
2.60
</td>
<td align="right">
2.31
</td>
<td align="right">
0.03
</td>
<td align="right">
1
</td>
<td align="right">
0.38
</td>
<td align="right">
2.42
</td>
<td align="right">
2.15
</td>
<td align="right">
0.07
</td>
<td align="right">
1
</td>
<td align="right">
0.41
</td>
</tr>
<tr>
<td align="center">
Total
</td>
<td align="right">
<b>23.06</b>
</td>
<td align="right">
9.03
</td>
<td align="right">
4.74
</td>
<td align="right">
216
</td>
<td align="right">
9.37
</td>
<td align="right">
<b>13.78</b>
</td>
<td align="right">
7.33
</td>
<td align="right">
4.19
</td>
<td align="right">
216
</td>
<td align="right">
15.68
</td>
</tr>
<table border="1" cellspacing="1" cellpadding="2">
<caption> Comparison of e2fsck times on an 32GB partition<br /> </caption> <colgroup align="center"> </colgroup> <colgroup align="right" span="3" width="50"> </colgroup> <colgroup align="right" span="2"> </colgroup> <colgroup> </colgroup> <colgroup align="right" span="3" width="50"> </colgroup> <colgroup align="right" span="2"> </colgroup> <colgroup> </colgroup> <tr>
<td rowspan="3" align="center">
Pass
</td>
<td colspan="5" align="center">
ext3
</td>
<td rowspan="9" align="center">
</td>
<td colspan="5" align="center">
ext4 old allocator
</td>
</tr>
<tr>
<td colspan="3" align="center">
time (s)
</td>
<td colspan="2" align="center">
I/O
</td>
<td colspan="3" align="center">
time (s)
</td>
<td colspan="2" align="center">
I/O
</td>
</tr>
<tr>
<td align="center" valign="middle">
real
</td>
<td align="center" valign="middle">
user
</td>
<td align="center" valign="middle">
system
</td>
<td align="center" valign="middle">
MB read
</td>
<td align="center" valign="middle">
MB/s
</td>
<td align="center" valign="middle">
real
</td>
<td align="center" valign="middle">
user
</td>
<td align="center" valign="middle">
system
</td>
<td align="center" valign="middle">
MB read
</td>
<td align="center" valign="middle">
MB/s
</td>
</tr>
<tr>
<td align="center">
1
</td>
<td align="right">
<b>108.40</b>
</td>
<td align="right">
13.74
</td>
<td align="right">
11.53
</td>
<td align="right">
583
</td>
<td align="right">
5.38
</td>
<td align="right">
<b>6.69</b>
</td>
<td align="right">
4.06
</td>
<td align="right">
0.90
</td>
<td align="right">
82
</td>
<td align="right">
12.25
</td>
</tr>
<tr>
<td align="center">
2
</td>
<td align="right">
5.91
</td>
<td align="right">
1.74
</td>
<td align="right">
2.56
</td>
<td align="right">
133
</td>
<td align="right">
22.51
</td>
<td align="right">
13.34
</td>
<td align="right">
2.30
</td>
<td align="right">
3.78
</td>
<td align="right">
133
</td>
<td align="right">
9.97
</td>
</tr>
<tr>
<td align="center">
3
</td>
<td align="right">
0.03
</td>
<td align="right">
0.01
</td>
<td align="right">
</td>
<td align="right">
1
</td>
<td align="right">
31.21
</td>
<td align="right">
0.02
</td>
<td align="right">
0.01
</td>
<td align="right">
</td>
<td align="right">
1
</td>
<td align="right">
63.85
</td>
</tr>
<tr>
<td align="center">
4
</td>
<td align="right">
0.28
</td>
<td align="right">
0.27
</td>
<td align="right">
</td>
<td align="right">
</td>
<td align="right">
</td>
<td align="right">
0.28
</td>
<td align="right">
0.27
</td>
<td align="right">
</td>
<td align="right">
</td>
<td align="right">
</td>
</tr>
<tr>
<td align="center">
5
</td>
<td align="right">
3.17
</td>
<td align="right">
0.92
</td>
<td align="right">
0.13
</td>
<td align="right">
2
</td>
<td align="right">
0.63
</td>
<td align="right">
2.60
</td>
<td align="right">
2.31
</td>
<td align="right">
0.03
</td>
<td align="right">
1
</td>
<td align="right">
0.38
</td>
</tr>
<tr>
<td align="center">
Total
</td>
<td align="right">
<b>118.15</b>
</td>
<td align="right">
16.75
</td>
<td align="right">
14.25
</td>
<td align="right">
718
</td>
<td align="right">
6.08
</td>
<td align="right">
<b>23.06</b>
</td>
<td align="right">
9.03
</td>
<td align="right">
4.74
</td>
<td align="right">
216
</td>
<td align="right">
9.37
</td>
</tr>
</table>
<p>
</center>
</p>
<p>
</p>
<p>
<center>
</p>
<table border="1" cellspacing="1" cellpadding="2">
<caption> Vital Statistics of the 32GB partition<br /> </caption> <colgroup align="right"> </colgroup> <colgroup align="left"></colgroup> <tr>
<td align="right">
312214
</td>
<td align="left">
inodes used (14.89%)<br /> </tr>
<tr>
<td align="right">
263
</td>
<td align="left">
non-contiguous files (0.1%)
</td>
</tr>
<tr>
<td align="right">
198
</td>
<td align="left">
non-contiguous directories (0.1%)
</td>
</tr>
<tr>
<td align="right">
</td>
<td align="left">
# of inodes with ind/dind/tind blocks: 0/0/0
</td>
</tr>
<tr>
<td align="right">
</td>
<td align="left">
Extent depth histogram: 292698/40
</td>
</tr>
<tr>
<td align="right">
4388697
</td>
<td align="left">
blocks used (52.32%)
</td>
</tr>
<tr>
<td align="right">
</td>
<td align="left">
bad blocks
</td>
</tr>
<tr>
<td align="right">
1
</td>
<td align="left">
large file
</td>
</tr>
<tr>
<td align="right">
</td>
<td align="left">
</td>
</tr>
<tr>
<td align="right">
263549
</td>
<td align="left">
regular files
</td>
</tr>
<tr>
<td align="right">
28022
</td>
<td align="left">
directories
</td>
</tr>
<tr>
<td align="right">
5
</td>
<td align="left">
character device files
</td>
</tr>
<tr>
<td align="right">
1
</td>
<td align="left">
block device file
</td>
</tr>
<tr>
<td align="right">
5
</td>
<td align="left">
fifos
</td>
</tr>
<tr>
<td align="right">
615
</td>
<td align="left">
links
</td>
</tr>
<tr>
<td align="right">
20618
</td>
<td align="left">
symbolic links (19450 fast symbolic links)
</td>
</tr>
<tr>
<td align="right">
5
</td>
<td align="left">
sockets
</td>
</tr>
<tr>
</tr>
<tr>
<td align="right">
312820
</td>
<td align="left">
files
</td>
</tr></tbody> </table>
<p>
</center>
</p>