October 7, 2006

SUSE 10.2 Ditching ReiserFS as its default FS?

A letter by Jeff Mahoney from SUSE Labs

Hi all –

We’ve been using ReiserFS as our default installation file system for
the last 6-7 years now, and it’s served us well in that time.
Unfortunately, there are a number of problems with it, some purely
technical, some more related to maintenance. I’ll outline a few of the
larger issues and offer my solution as a conclusion.

ReiserFS has serious scalability problems. David Chinner’s talk at OLS
really underscored the problem well for a single, large, high bandwidth
file system. While I realize that XFS-style scalability isn’t a real
goal for most users, and isn’t a target workload for reiserfs, the
scalability problems are real. ReiserFS uses the BKL for synchronization
everywhere, and since it’s a system-global lock, the problem doesn’t go
away when you split the file system into smaller ones. Lock contention
alone is one problem, but it’s made worse by cache bouncing between
processors on larger systems.

ReiserFS has serious performance problems with extended attributes and
ACLs. (Yes, this one is my own fault, see numerous flamewars on lkml and
reiserfs-list for my opinion on this.) xattrs are backed by normal files
rooted in a hidden directory structure. This is bad for performance and
in rare cases is deadlock prone due to lock inversions between pdflush
and the xattr code. The quota code gets around this, but the fix would
result in huge amounts of wasted space with ReiserFS. With increasing
deployment of SLES as Samba servers, and perhaps NFSv4 servers, the use
of extended attributes will only increase.

ReiserFS has a small and shrinking development community. Right now, the
only developers really working with ReiserFS are Chris Mason, Jan Kara
(internally), a rotating member of Hans Reiser’s team, and myself. All
of us have projects we’re very much more interested in than working with
ReiserFS. While Jan and I will be continuing to support ReiserFS for
SUSE, Hans is increasingly (hard to believe) pushing people to use
reiser4. Chris has moved on to Oracle and has expressed his opinions on
leaving ReiserFS behind.

ReiserFS v3 is a dead end. Hans has been pushing reiser4 for years now
and declared Reiser3 in maintenance mode. Any changes that aren’t bug
fixes are met with violent resistance. Reiser4 is not an incremental
update and requires a reformat, which is unreasonable for most people.
Reiser3 lacks a number of features that other file systems either have
or are adding soon, such as extents and growth beyond current limits.
Since it’s in maintenance mode, that’s unlikely to change. I view
reiser4 as an interesting research file system, but that’s about as far
as it goes. I’ve been unimpressed with its stability so far. I don’t
know how advanced the recovery tools are yet, but I suspect that the
complexity of the format and the ability to essentially define the
format on-the-fly with plugins will make a useful fsck extremely
difficult.

The solution for replacing an aging file system isn’t to switch to a
brand new unproven file system, but rather a proven one with a clear
upgrade path. That file system is ext3.

Ext3’s performance in some situations may not be on par with Reiser3,
but it scales better. Andi mentioned the other day that quite a bit of
research is going into improving the locking and general performance of
ext3 right now, and since reiser3 is stagnant, I don’t doubt ext3 will
pass it soon.

Ext3 has a much larger development community out there. Most other
distributions use ext3 as their default file system, so we get fixes
for free for bugs that are never reported to us but are fixed by other
developers. By contrast, most Reiser3 fixes originate from Chris or me.

Ext3 has a clear upgrade path. There is quite a bit of interest in the
community in improving ext3, and ext4 is already under development. Like
the upgrade path from ext2 to ext3, the path to ext4 is clearly defined.
Existing file systems can be updated easily, and new files will be able
to take advantage of the new features. Features already written and
queued up include extents, a 64-bit journal, and 64-bit file sizes.

Most of the institutional knowledge of reiserfs is bouncing around in my
head. Jan has been getting his hands dirty a little bit, but beyond
that, finding additional developers with reiserfs experience will be
extremely difficult and I’d call training additional developers a wasted
effort. Since reiserfs is in maintenance mode, the effort needed to
continue to support it in future releases should be shrinking.

To be clear, my long term goal is to use OCFS2 (or another CFS if one
shows a clear adoption advantage) for the root file system. This would
enable single-instance clustering at both the physical and the virtual
distribution level and get us ease of management and flexibility in HA
deployments. Realistically, though, desktop users are likely to continue
to use ext[34] for the foreseeable future. Until we have OCFS2 (and the
rest of the distribution) ready for the kind of deployment described
above on larger servers, ext3 would be a suitable choice across the board.


