<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Fast ext4 fsck times</title>
	<atom:link href="http://thunk.org/tytso/blog/2008/08/08/fast-ext4-fsck-times/feed/" rel="self" type="application/rss+xml" />
	<link>http://thunk.org/tytso/blog/2008/08/08/fast-ext4-fsck-times/</link>
	<description>Musings about Open Source, Linux, and Life by Theodore Tso</description>
	<lastBuildDate>Mon, 22 Feb 2010 22:39:59 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Fast ext4 fsck times, revisited &#124; Thoughts by Ted</title>
		<link>http://thunk.org/tytso/blog/2008/08/08/fast-ext4-fsck-times/comment-page-1/#comment-1898</link>
		<dc:creator>Fast ext4 fsck times, revisited &#124; Thoughts by Ted</dc:creator>
		<pubDate>Thu, 26 Feb 2009 15:04:36 +0000</pubDate>
		<guid isPermaLink="false">http://thunk.org/tytso/blog/?p=152#comment-1898</guid>
		<description>[...] posts (automatically generated): Fast ext4 fsck times This wasn&#8217;t one of the things we were explicitly engineering for when we were designing the [...]</description>
		<content:encoded><![CDATA[<p>[...] posts (automatically generated): Fast ext4 fsck times This wasn&#8217;t one of the things we were explicitly engineering for when we were designing the [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tshepang Lekhonkhobe</title>
		<link>http://thunk.org/tytso/blog/2008/08/08/fast-ext4-fsck-times/comment-page-1/#comment-1634</link>
		<dc:creator>Tshepang Lekhonkhobe</dc:creator>
		<pubDate>Tue, 13 Jan 2009 12:22:00 +0000</pubDate>
		<guid isPermaLink="false">http://thunk.org/tytso/blog/?p=152#comment-1634</guid>
		<description>For some reason, I missed the announcement that ext4 has been declared stable since Christmas, nor was I aware that 2.6.28 was available on that date. That&#039;s a bit earlier than I thought. Thanks for the work...</description>
		<content:encoded><![CDATA[<p>For some reason, I missed the announcement that ext4 has been declared stable since Christmas, nor was I aware that 2.6.28 was available on that date. That&#8217;s a bit earlier than I thought. Thanks for the work&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tshepang Lekhonkhobe</title>
		<link>http://thunk.org/tytso/blog/2008/08/08/fast-ext4-fsck-times/comment-page-1/#comment-1589</link>
		<dc:creator>Tshepang Lekhonkhobe</dc:creator>
		<pubDate>Thu, 08 Jan 2009 06:56:46 +0000</pubDate>
		<guid isPermaLink="false">http://thunk.org/tytso/blog/?p=152#comment-1589</guid>
		<description>@17,

Thanks for the info, for that was quite a mouthful, and thanks for your mighty contributions to FLOSS. When do you estimate (what kernel release) that ext4 will be considered ready for production systems?</description>
		<content:encoded><![CDATA[<p>@17,</p>
<p>Thanks for the info, for that was quite a mouthful, and thanks for your mighty contributions to FLOSS. When do you estimate (what kernel release) that ext4 will be considered ready for production systems?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: tytso</title>
		<link>http://thunk.org/tytso/blog/2008/08/08/fast-ext4-fsck-times/comment-page-1/#comment-1588</link>
		<dc:creator>tytso</dc:creator>
		<pubDate>Wed, 07 Jan 2009 23:54:17 +0000</pubDate>
		<guid isPermaLink="false">http://thunk.org/tytso/blog/?p=152#comment-1588</guid>
		<description>Avery,

Note that tst_extents was intended as a debugging tool, not as an officially supported interface, so it may very well change without warning. It&#039;s basically an interface to the extent functions in libext2fs which I used when I was debugging it. I would not recommend trying to create a program based on interacting with tst_extents and trying to parse its output. You&#039;re much better off writing a C program that links with libext2fs directly, or using SWIG to link it into Python or Perl.

The extents information is stored in a tree structure, where the root node of the tree is located in the inode and stores at most 4 entries. Each entry is either an index entry, which points to a single block that can hold 340 entries (assuming a 4k blocksize), or a leaf entry, which points to one of the file&#039;s extents. In practice, ext4&#039;s anti-fragmentation allocation algorithms are good enough that most of the time the 4 entries in the inode are more than enough to cover normally sized files; big files, or files that are fragmented, need a single external block. Very rarely (as in, at the moment I have a single such file on my filesystem) more than 4 external blocks are needed, at which point a depth two extent tree might be needed. When that happens, it&#039;s usually not because the filesystem&#039;s free space is fragmented, but because the file is sparse, such that the discontinuities are in the logical block numbers, not the physical block numbers.

Finally, note that the &quot;all&quot; command to tst_extents will walk the entire extent tree, which is far more convenient than just entering the &#039;n&#039; command over and over.</description>
		<content:encoded><![CDATA[<p>Avery,</p>
<p>Note that tst_extents was intended as a debugging tool, not as an officially supported interface, so it may very well change without warning. It&#8217;s basically an interface to the extent functions in libext2fs which I used when I was debugging it. I would not recommend trying to create a program based on interacting with tst_extents and trying to parse its output. You&#8217;re much better off writing a C program that links with libext2fs directly, or using SWIG to link it into Python or Perl.</p>
<p>The extents information is stored in a tree structure, where the root node of the tree is located in the inode and stores at most 4 entries. Each entry is either an index entry, which points to a single block that can hold 340 entries (assuming a 4k blocksize), or a leaf entry, which points to one of the file&#8217;s extents. In practice, ext4&#8217;s anti-fragmentation allocation algorithms are good enough that most of the time the 4 entries in the inode are more than enough to cover normally sized files; big files, or files that are fragmented, need a single external block. Very rarely (as in, at the moment I have a single such file on my filesystem) more than 4 external blocks are needed, at which point a depth two extent tree might be needed. When that happens, it&#8217;s usually not because the filesystem&#8217;s free space is fragmented, but because the file is sparse, such that the discontinuities are in the logical block numbers, not the physical block numbers.</p>
<p>Finally, note that the &#8220;all&#8221; command to tst_extents will walk the entire extent tree, which is far more convenient than just entering the &#8216;n&#8217; command over and over.</p>
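<p>As a rough check on the 4 and 340 figures above, here is a minimal Python sketch. It assumes the sizes used by the ext4 on-disk extent format (a 12-byte node header, 12-byte entries, and 60 bytes of room in the inode), none of which are spelled out in this comment; the constant and function names are made up for illustration and are not part of libext2fs.</p>

```python
# Assumed ext4 on-disk sizes: every extent-tree node starts with a
# 12-byte header, and each index or leaf entry takes 12 bytes.
EXTENT_HEADER_BYTES = 12
EXTENT_ENTRY_BYTES = 12
INODE_I_BLOCK_BYTES = 60  # space inside the inode that holds the root node


def entries_per_node(node_bytes):
    """Return how many extent entries fit in a node of the given size."""
    return (node_bytes - EXTENT_HEADER_BYTES) // EXTENT_ENTRY_BYTES


def max_leaf_extents(depth, block_bytes=4096):
    """Upper bound on leaf extents for a tree with `depth` levels below the root."""
    root_entries = entries_per_node(INODE_I_BLOCK_BYTES)
    return root_entries * entries_per_node(block_bytes) ** depth


print(entries_per_node(INODE_I_BLOCK_BYTES))  # 4 entries in the inode
print(entries_per_node(4096))                 # 340 entries per 4k block
print(max_leaf_extents(1))                    # 1360 extents at depth one
```

<p>Under these assumptions a depth one tree already covers up to 1360 extents, which fits the observation that a depth two tree is rarely needed.</p>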
]]></content:encoded>
	</item>
	<item>
		<title>By: btmorex</title>
		<link>http://thunk.org/tytso/blog/2008/08/08/fast-ext4-fsck-times/comment-page-1/#comment-1587</link>
		<dc:creator>btmorex</dc:creator>
		<pubDate>Wed, 07 Jan 2009 20:36:23 +0000</pubDate>
		<guid isPermaLink="false">http://thunk.org/tytso/blog/?p=152#comment-1587</guid>
		<description>Thanks, that explains it.

I think I&#039;m using tst_extents correctly:
1.) open /path/to/fs
2.) inode /path/to/vid.avi
3.) root... gives

(Left 0)
extent: lblk 0--179014, len 179015, pblk 66879488, flags: (none)

4.) n... gives

[deleted because wordpress snips it, important part is (Left 6)]

So, if I keep hitting &#039;n&#039; for this particular file I find 7 extents and indeed there is a break in there which would explain filefrag reporting 2. I have a couple questions though:

1.) Is the root an extent too (which would make 8 total)? Or some kind of directory to the other ones? Why does the root report &quot;Left 0&quot;? If I use the procedure above (open-&gt;inode-&gt;root-&gt;n) and just parse &quot;(Left X)&quot;, is the number of extents used always X+1 or are there special cases?

2.) Is there a way to use tst_extents not interactively other than feeding commands through stdin?

Thanks,
Avery</description>
		<content:encoded><![CDATA[<p>Thanks, that explains it.</p>
<p>I think I&#8217;m using tst_extents correctly:<br />
1.) open /path/to/fs<br />
2.) inode /path/to/vid.avi<br />
3.) root&#8230; gives</p>
<p>(Left 0)<br />
extent: lblk 0&#8211;179014, len 179015, pblk 66879488, flags: (none)</p>
<p>4.) n&#8230; gives</p>
<p>[deleted because wordpress snips it, important part is (Left 6)]</p>
<p>So, if I keep hitting &#8216;n&#8217; for this particular file I find 7 extents and indeed there is a break in there which would explain filefrag reporting 2. I have a couple questions though:</p>
<p>1.) Is the root an extent too (which would make 8 total)? Or some kind of directory to the other ones? Why does the root report &#8220;Left 0&#8221;? If I use the procedure above (open-&gt;inode-&gt;root-&gt;n) and just parse &#8220;(Left X)&#8221;, is the number of extents used always X+1 or are there special cases?</p>
<p>2.) Is there a way to use tst_extents not interactively other than feeding commands through stdin?</p>
<p>Thanks,<br />
Avery</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: tytso</title>
		<link>http://thunk.org/tytso/blog/2008/08/08/fast-ext4-fsck-times/comment-page-1/#comment-1585</link>
		<dc:creator>tytso</dc:creator>
		<pubDate>Wed, 07 Jan 2009 19:03:33 +0000</pubDate>
		<guid isPermaLink="false">http://thunk.org/tytso/blog/?p=152#comment-1585</guid>
		<description>The filefrag program is a bit misleading. When it uses the word &quot;extents&quot;, what it means is &quot;contiguous ranges of blocks&quot;. Keep in mind that filefrag predates ext4 by something like 5-6 years or more, so its use of &quot;extents&quot; dates from well before ext4 came on the scene. Similarly, if you look inside the source code for resize2fs, it uses &quot;extents&quot; to mean its own internal way of tracking a contiguous range of blocks used by a particular inode. But again, this predates ext4, so it has nothing to do with how ext4 happens to encode extents.

So if you were to look at vid.avi using a tool such as tst_extents, you would no doubt see a series of 128 meg extents, since the maximum number of blocks that can be encoded in the extent structure amounts to 128 megabytes (assuming a 4k blocksize). But ext4 will try very hard to keep block allocations contiguous, even between ext4 extent encodings, such that if extent #2 ended with block N, extent #3 will begin with N+1 if at all possible. Apparently, in your vid.avi file, this was true for all but one case, so filefrag found two contiguous block ranges, which it reported as &quot;2 extents&quot;.

I hope that clarifies things!</description>
		<content:encoded><![CDATA[<p>The filefrag program is a bit misleading. When it uses the word &#8220;extents&#8221;, what it means is &#8220;contiguous ranges of blocks&#8221;. Keep in mind that filefrag predates ext4 by something like 5-6 years or more, so its use of &#8220;extents&#8221; dates from well before ext4 came on the scene. Similarly, if you look inside the source code for resize2fs, it uses &#8220;extents&#8221; to mean its own internal way of tracking a contiguous range of blocks used by a particular inode. But again, this predates ext4, so it has nothing to do with how ext4 happens to encode extents.</p>
<p>So if you were to look at vid.avi using a tool such as tst_extents, you would no doubt see a series of 128 meg extents, since the maximum number of blocks that can be encoded in the extent structure amounts to 128 megabytes (assuming a 4k blocksize). But ext4 will try very hard to keep block allocations contiguous, even between ext4 extent encodings, such that if extent #2 ended with block N, extent #3 will begin with N+1 if at all possible. Apparently, in your vid.avi file, this was true for all but one case, so filefrag found two contiguous block ranges, which it reported as &#8220;2 extents&#8221;.</p>
<p>I hope that clarifies things!</p>
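<p>To make the distinction concrete, here is a minimal Python sketch of the two ways of counting. It assumes a 4k blocksize and a limit of 32768 blocks (128 meg) per extent; the function names and the block numbers in the sample layout are hypothetical, chosen only so that seven ext4 extents contain a single physical break.</p>

```python
import math

BLOCK_BYTES = 4096
MAX_EXTENT_BLOCKS = 32768  # 128 MiB / 4 KiB, the per-extent limit assumed here


def min_extents(file_bytes):
    """Fewest ext4 extents that can map a file of the given size."""
    blocks = math.ceil(file_bytes / BLOCK_BYTES)
    return math.ceil(blocks / MAX_EXTENT_BLOCKS)


def contiguous_runs(extents):
    """Count physically contiguous block ranges (the older filefrag sense).

    extents: (logical_start, length, physical_start) tuples in logical order.
    """
    runs = 0
    expected_next = None
    for _lblk, length, pblk in extents:
        if pblk != expected_next:
            runs += 1  # this extent does not start where the previous one ended
        expected_next = pblk + length
    return runs


# Hypothetical layout: seven extents, with one discontinuity before the fourth.
example = []
lblk, pblk = 0, 1_000_000
for i, length in enumerate([32768, 32768, 20000, 32768, 32768, 12768, 15175]):
    if i == 3:
        pblk = 2_000_000  # the single physical break
    example.append((lblk, length, pblk))
    lblk += length
    pblk += length

print(min_extents(733243392))    # 6
print(len(example))              # 7
print(contiguous_runs(example))  # 2
```

<p>With this made-up layout the file is mapped by seven ext4 extents, yet a tool that counts contiguous ranges sees only two.</p>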
]]></content:encoded>
	</item>
	<item>
		<title>By: btmorex</title>
		<link>http://thunk.org/tytso/blog/2008/08/08/fast-ext4-fsck-times/comment-page-1/#comment-1584</link>
		<dc:creator>btmorex</dc:creator>
		<pubDate>Wed, 07 Jan 2009 18:37:59 +0000</pubDate>
		<guid isPermaLink="false">http://thunk.org/tytso/blog/?p=152#comment-1584</guid>
		<description>Hi Ted,

I&#039;m finally trying out ext4 and especially seeing how fragmentation is vs ext3. I&#039;m running into some output that I don&#039;t understand from filefrag:

# filefrag vid.avi 
vid.avi: 2 extents found
# ls -l vid.avi
-rw-r--r-- 1 avery avery 733243392 2008-08-12 02:01 vid.avi

Shouldn&#039;t the minimum number of extents be 6 based on 128MB extents?

I found this bug which is similar and supposed to be fixed:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=458306

I thought I would post here before filing another one to make sure I&#039;m not missing something obvious.</description>
		<content:encoded><![CDATA[<p>Hi Ted,</p>
<p>I&#8217;m finally trying out ext4 and especially seeing how fragmentation is vs ext3. I&#8217;m running into some output that I don&#8217;t understand from filefrag:</p>
<p># filefrag vid.avi<br />
vid.avi: 2 extents found<br />
# ls -l vid.avi<br />
-rw-r&#8211;r&#8211; 1 avery avery 733243392 2008-08-12 02:01 vid.avi</p>
<p>Shouldn&#8217;t the minimum number of extents be 6 based on 128MB extents?</p>
<p>I found this bug which is similar and supposed to be fixed:<br />
<a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=458306" rel="nofollow">http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=458306</a></p>
<p>I thought I would post here before filing another one to make sure I&#8217;m not missing something obvious.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: tytso</title>
		<link>http://thunk.org/tytso/blog/2008/08/08/fast-ext4-fsck-times/comment-page-1/#comment-1583</link>
		<dc:creator>tytso</dc:creator>
		<pubDate>Wed, 07 Jan 2009 14:02:01 +0000</pubDate>
		<guid isPermaLink="false">http://thunk.org/tytso/blog/?p=152#comment-1583</guid>
		<description>@16: Tshepang,

Unfortunately, the fast fsck times won&#039;t show up if you convert the filesystem from ext3.   Sorry, I should have made this clear.

New files will benefit from extents and delayed allocation, so you&#039;ll see run-time performance improvements by moving to ext4. However, the improvements in fsck time stem from two changes. The first is how the bitmap and inode tables are laid out, which requires a fresh mkfs -t ext4; a conversion from ext3 won&#039;t help with that. The second is the use of extents instead of indirect blocks, which takes a huge amount of time out of pass 1 that would otherwise be spent seeking to read the indirect blocks; this requires that all of your files larger than 48k (assuming a 4k blocksize) are rewritten using extents. We do have code that will be available in the future to &quot;migrate&quot; files using indirect blocks to extents, but the result still won&#039;t be as good as a filesystem which is freshly made with ext4.

Finally, not all of the instructions on converting ext3 filesystems tell you to run tune2fs -O uninit_bg /dev/XXX; e2fsck /dev/XXX. This will also reduce the fsck time, but the first fsck run is really annoying since you have to individually answer &quot;Y&quot; to all of the questions about setting the group checksums. After that, e2fsck will be able to skip inode table blocks that are completely empty during pass 1. In the long run I need to improve e2fsprogs to make this step not quite so annoying, but even then, you&#039;ll get the best performance (and the best fragmentation resistance, et al.) by doing a backup, recreating the filesystem using mke2fs -t ext4, and then a restore.

I&#039;ll also note, just out of a sense of fair warning, that there&#039;s an additional block and allocation layout change which I am planning, which should improve ext4&#039;s fsck times and fragmentation resistance. Essentially it&#039;s a change in where ext4 decides to allocate blocks for directories, segregating them from blocks for regular inodes. As before, it&#039;s fully backwards compatible, but to see the full results you will need to do a backup, re-mkfs, and restore. The changes I&#039;m contemplating will probably be only a tiny incremental improvement (if it&#039;s 5% better, I&#039;ll be ecstatic), so it&#039;s probably not a reason to justify a full rebuild, but I thought I would mention it. Since it won&#039;t get done in time for the 2.6.29 merge window, it won&#039;t see fruition in a stable release for six months (i.e., when 2.6.30 is released). So it&#039;s probably not worth waiting for, since the improvements between the ext3 layout and the current ext4 layout and block allocation algorithms are a factor of 7 or so, plus or minus.</description>
		<content:encoded><![CDATA[<p>@16: Tshepang,</p>
<p>Unfortunately, the fast fsck times won&#8217;t show up if you convert the filesystem from ext3.   Sorry, I should have made this clear.</p>
<p>New files will benefit from extents and delayed allocation, so you&#8217;ll see run-time performance improvements by moving to ext4. However, the improvements in fsck time stem from two changes. The first is how the bitmap and inode tables are laid out, which requires a fresh mkfs -t ext4; a conversion from ext3 won&#8217;t help with that. The second is the use of extents instead of indirect blocks, which takes a huge amount of time out of pass 1 that would otherwise be spent seeking to read the indirect blocks; this requires that all of your files larger than 48k (assuming a 4k blocksize) are rewritten using extents. We do have code that will be available in the future to &#8220;migrate&#8221; files using indirect blocks to extents, but the result still won&#8217;t be as good as a filesystem which is freshly made with ext4.</p>
<p>Finally, not all of the instructions on converting ext3 filesystems tell you to run tune2fs -O uninit_bg /dev/XXX; e2fsck /dev/XXX. This will also reduce the fsck time, but the first fsck run is really annoying since you have to individually answer &#8220;Y&#8221; to all of the questions about setting the group checksums. After that, e2fsck will be able to skip inode table blocks that are completely empty during pass 1. In the long run I need to improve e2fsprogs to make this step not quite so annoying, but even then, you&#8217;ll get the best performance (and the best fragmentation resistance, et al.) by doing a backup, recreating the filesystem using mke2fs -t ext4, and then a restore.</p>
<p>I&#8217;ll also note, just out of a sense of fair warning, that there&#8217;s an additional block and allocation layout change which I am planning, which should improve ext4&#8217;s fsck times and fragmentation resistance. Essentially it&#8217;s a change in where ext4 decides to allocate blocks for directories, segregating them from blocks for regular inodes. As before, it&#8217;s fully backwards compatible, but to see the full results you will need to do a backup, re-mkfs, and restore. The changes I&#8217;m contemplating will probably be only a tiny incremental improvement (if it&#8217;s 5% better, I&#8217;ll be ecstatic), so it&#8217;s probably not a reason to justify a full rebuild, but I thought I would mention it. Since it won&#8217;t get done in time for the 2.6.29 merge window, it won&#8217;t see fruition in a stable release for six months (i.e., when 2.6.30 is released). So it&#8217;s probably not worth waiting for, since the improvements between the ext3 layout and the current ext4 layout and block allocation algorithms are a factor of 7 or so, plus or minus.</p>
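<p>The 48k threshold mentioned above can be reproduced with a short Python sketch. It assumes the 12 direct block pointers of an ext2/ext3 inode, a detail that is not stated in this comment; the names below are illustrative only.</p>

```python
DIRECT_POINTERS = 12  # assumed number of direct block pointers in an ext2/ext3 inode


def direct_limit_bytes(block_bytes=4096):
    """Largest file size that can be mapped without any indirect block."""
    return DIRECT_POINTERS * block_bytes


def needs_indirect_blocks(file_bytes, block_bytes=4096):
    """True if a file of this size spills past the direct pointers."""
    blocks = -(-file_bytes // block_bytes)  # ceiling division
    return blocks > DIRECT_POINTERS


print(direct_limit_bytes())              # 49152, i.e. 48k
print(needs_indirect_blocks(49152))      # False
print(needs_indirect_blocks(49153))      # True
print(needs_indirect_blocks(733243392))  # True
```

<p>Files at or below that size never had indirect blocks to begin with, which is why only the larger files need to be rewritten to gain the pass 1 savings.</p>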
]]></content:encoded>
	</item>
	<item>
		<title>By: Tshepang Lekhonkhobe</title>
		<link>http://thunk.org/tytso/blog/2008/08/08/fast-ext4-fsck-times/comment-page-1/#comment-1582</link>
		<dc:creator>Tshepang Lekhonkhobe</dc:creator>
		<pubDate>Wed, 07 Jan 2009 09:09:09 +0000</pubDate>
		<guid isPermaLink="false">http://thunk.org/tytso/blog/?p=152#comment-1582</guid>
		<description>Hi,

I&#039;m having trouble getting those promised fast fsck times. I&#039;m running Debian&#039;s 2.6.27 on Sid (early January). The filesystem was converted from ext3 some weeks ago.</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>I&#8217;m having trouble getting those promised fast fsck times. I&#8217;m running Debian&#8217;s 2.6.27 on Sid (early January). The filesystem was converted from ext3 some weeks ago.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
