I received a trackback from Tim Bray’s Saving Data Safely post on the Android Developer’s blog to my Don’t fear the fsync! blog entry, so I guess the cat’s out of the bag. Starting with Gingerbread, newer Android phones (starting with the Nexus S) will be using the ext4 file system. Very cool! So just as IBM used to promote Linux by saying that it was scalable enough to run on everything between watches and mainframes, I can now talk about ext4 as running in production on cell phones to Google data centers.
I have the money shot for my LCA presentation
Thanks to Eric Whitney’s benchmarking results, I have my money shot for my upcoming 2011 LCA talk in Brisbane, which will be about how to improve scalability in the Linux kernel, using the case study of the work that I did to improve scalability via a series of scalability patches that were developed during 2.6.34, 2.6.35, and 2.6.36 (and went into the kernel during subsequent merge window). These benchmarks were done on a 48-core AMD system (8 sockets, 6 cores/socket) using a 24 SAS-disk hardware RAID array.
Don’t fear the fsync!
After reading the comments on my earlier post, Delayed allocation and the zero-length file problem as well as some of the comments on the Slashdot story as well as the Ubuntu bug, it’s become very clear to me that there are a lot of myths and misplaced concerns about fsync() and how best to use it. I thought it would be appropriate to correct as many of these misunderstandings about fsync() in one comprehensive blog posting.
Delayed allocation and the zero-length file problem
A recent Ubuntu bug has gotten slashdotted, and has started raising a lot of questions about the safety of using ext4. I’ve actually been meaning to blog about this for a week or so, but between a bout of the stomach flu and a huge todo list at work, I simply haven’t had the time. The essential “problem” is that ext4 implements something called delayed allocation. Delayed allocation isn’t new to Linux; xfs has had delayed allocation for years.
SSD’s, Journaling, and noatime/relatime
On occasion, you will see the advice that the ext3 file system is not suitable for Solid State Disks (SSD’s) due to the extra writes caused by journaling — and so Linux users using SSD’s should use ext2 instead. However, is this folk wisdom actually true? This weekend, I decided to measure exactly what the write overhead of journaling actually is in actual practice. For this experiment I used ext4, since I recently added a feature to track the amount of writes to the file system over its lifetime (to better gauge the wear and tear on an SSD).
Fast ext4 fsck times, revisited
Last night I managed to finish up a rather satisfying improvement to ext4’s inode and block allocators. The ext4’s original allocator was actually a bit more simple-minded than ext3’s, in that it didn’t implement the Orlov algorithm to spread out top-level directories for better filesystem aging. It also was buggy in certain ways, where it would return ENOSPC even when there were still plenty of inodes in the file system.
Should Filesystems Be Optimized for SSD’s?
In one of the comments to my last blog entry, an anonymous commenter writes: You seem to be taking a different perspective to linus on the “adapting to the the disk technology” front (Linus seems to against having to have the OS know about disk boundaries and having to do levelling itself) That’s an interesting question, and I figure it’s worth its own top-level entry, as opposed to a reply in the comment stream.
Aligning filesystems to an SSD’s erase block size
I recently purchased a new toy, an Intel X25-M SSD, and when I was setting it up initially, I decided I wanted to make sure the file system was aligned on an erase block boundary. This is a generally considered to be a Very Good Thing to do for most SSD’s available today, although there’s some question about how important this really is for Intel SSD’s — more on that in a moment.
Wanted: Incremental Backup Solutions that Use a Database
Dear Lazyweb, I’m looking for recommendations for Open Source backup solutions which track incremental backups using a database, and which do not use hard link directories. Someone gave me a suggested OSS backup program at UDS, but it’s slipped my memory; so I’m fairly sure that at least one or more such OSS backup solutions exist, but don’t know their names. Can some folks give me some suggestions? Thanks! There are a number of very popular Open Source backup solutions that use a very clever hack of using hard link trees to maintain incremental backups.
Fast ext4 fsck times
This wasn’t one of the things we were explicitly engineering for when were designing the features that would go into ext4, but one of the things which we’ve found as a pleasant surprise is how much more quickly ext4 filesystems can be checked. Ric Wheeler reported some really good fsck times that were over ten times better than ext3 using filesystems generated using what was admittedly a very artificial/synthetic benchmark. During the past six weeks, though, I’ve been using ext4 on my laptop, and I’ve seen very similar results.