Posted to dev@subversion.apache.org by Greg Hudson <gh...@MIT.EDU> on 2004/04/30 16:44:50 UTC

FSFS propaganda

I've written a little propaganda document about FSFS.  It lives in
http://web.mit.edu/ghudson/info/fsfs if you need to link to it; here's
a copy for easy access and for the list archives.

In the longer term (closer to the 1.1 release), this could be used as
source material for FAQs or for the svn book.

---

"FSFS" is the name of a Subversion filesystem implementation, an
alternative to the original Berkeley DB-based implementation.  See
http://subversion.tigris.org/ for information about Subversion.  This
is a propaganda document for FSFS, to help people determine if they
should be interested in using it instead of the BDB filesystem.

How FSFS is Better
------------------

* Write access not required for read operations

Performing a checkout, update, or similar operation on an FSFS
repository requires no write access to any part of the repository.

* Little or no need for recovery

An svn process which terminates improperly will not generally cause
the repository to wedge.  (See "Note: Recovery" below for a more
in-depth discussion of what could conceivably go wrong.)

* Smaller repositories

An FSFS repository is smaller than a BDB repository.  Generally, the
space savings are on the order of 10-20%, but if you do a lot of work
on branches, the savings could be much higher, due to the way FSFS
stores deltas.  Also, if you have many small repositories, the
overhead of FSFS is much smaller than the overhead of the BDB
implementation.

* Platform-independent

The format of an FSFS repository is platform-independent, whereas a
BDB repository will generally require recovery (or a dump and load)
before it can be accessed with a different operating system, hardware
platform, or BDB version.

* Can host on network filesystem

FSFS repositories can be hosted on network filesystems, just as CVS
repositories can.  (See "Note: Locking" for caveats about
write-locking.)

* No umask issues

FSFS is careful to match the permissions of new revision files to the
permissions of the previous most-recent revision, so there is no need
to worry about a committer's umask rendering part of the repository
inaccessible to other users.  (You must still set the g+s bit on the
db directories on most Unix platforms other than the *BSDs.)

* Standard backup software

An FSFS repository can be backed up with standard backup software.
Since old revision files don't change, incremental backups with
standard backup software are efficient.

(BDB repositories can be backed up using "svnadmin hotcopy" and can be
backed up incrementally using "svnadmin dump".  FSFS just makes it
easier.)

* Can split up repository across multiple spools

If an FSFS repository is outgrowing the filesystem it lives on, you
can symlink old revisions off to another filesystem.

* More easily understood repository layout

If something goes wrong and you need to examine your repository, it
may be easier to do so with the FSFS format than with the BDB format.
(To be fair, it is difficult to extract file contents by hand from
either format, because both use delta storage, and "db_dump" makes it
possible to analyze a BDB repository.)

* (Fine point) Fast "svn log -v" over big revisions

In the BDB filesystem, if you do a large import and then do "svn log
-v", the server has to crawl the database for each changed path to
find the copyfrom information, which can take a minute or two of high
server load.  FSFS stores the copyfrom information along with the
changed-path information, so the same operation takes just a few
seconds.

* (Marginal) Can give insert-only access to revs subdir for commits

In some filesystems such as AFS, it is possible to give insert-only
write access to a directory.  If you can do this, you can give people
commit access to an FSFS repository without allowing them to modify
old revisions, without using a server.

(The Unix sticky bit comes close, but people would still have
permission to modify their own old revisions, which, because of delta
storage, might allow them to influence the contents of other people's
more recent revisions.)
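
To illustrate the "No umask issues" point above, here is a minimal
Python sketch of creating a new file whose permission bits match those
of an existing file, whatever the creating process's umask happens to
be.  The function name and paths are hypothetical; this is not the
FSFS code, only the general idea.

```python
import os
import stat


def create_with_matching_mode(prev_path, new_path, data):
    """Write new_path, then give it the same permission bits as prev_path."""
    # Permission bits of the previous file (e.g. 0o664), without the
    # file-type bits that st_mode also carries.
    mode = stat.S_IMODE(os.stat(prev_path).st_mode)
    with open(new_path, "wb") as f:
        f.write(data)
    # Unlike the mode passed at creation time, chmod() is not filtered
    # through the umask, so the committer's umask no longer matters.
    os.chmod(new_path, mode)
    return mode
```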

How FSFS is Worse
-----------------

* More server work for head checkout

Because of the way FSFS stores deltas, it takes more work to derive
the contents of the head revision than it does in a BDB filesystem.
Measurements suggest that in a typical workload, the server has to do
about twice as much work (computation and file access) to check out
the head.  From the client's perspective, with network and working
copy overhead added in, the extra time required for a checkout
operation is minimal, but if server resources are scarce, FSFS might
not be the best choice for a repository with many readers.

* Finalization delay

Although FSFS commits are generally faster than BDB commits, more of
the work of an FSFS commit is deferred until the final step.  For a
very large commit (tens of thousands of files), the final step may
involve a delay of over a minute.  There is no user feedback during
the final phase of a commit, which can lead to impatience and, in
really bad cases, HTTP client timeouts.

* Lower commit throughput

Because of the greater amount of work done during the final phase of a
commit, if there are many commits to an FSFS repository, they may
stack up behind each other waiting for the write lock, whereas in a
BDB repository they would be able to do more of their work in
parallel.

* Immature code

FSFS was only recently implemented.  At the time of this writing, it
is not part of any Subversion release, and it has received only
minimal testing.

* (Developers) More difficult to index

Every so often, people propose new Subversion features which require
adding new indexing to the repository in order to implement
efficiently.  Here's a little picture showing where FSFS lies on the
indexing difficulty axis:

               Ease of adding new indexing
   harder <----------------------------------> easier
           FSFS            BDB            SQL

With a hypothetical SQL database implementation, new indexes could be
added easily.  In the BDB implementation, it is necessary to write
code to maintain the index, but transactions and tables make that code
relatively straightforward to write.  In a dedicated format like FSFS,
particularly with its "old revisions never change" constraint, adding
new indexing features would generally require a careful design
process.


How To Use
----------

At the time of this writing, FSFS support only exists in the
unreleased trunk, in r9573 or later.  If you aren't comfortable with
building Subversion from source, you should probably wait until the
Subversion 1.1 release.

If you've gotten that out of the way, using FSFS is simple: just
create your repositories with "svnadmin create --fs-type=fsfs PATH".

Note: Recovery
--------------

If a process terminates abnormally during a read operation, it should
leave behind no traces in the repository, since read operations do not
modify the repository in any way.

If a process terminates abnormally during a commit operation, it will
leave behind a stale transaction, which will not interfere with
operation and which can be removed with a normal recursive delete
operation.

If a process terminates abnormally during the final phase of a commit
operation, it may be holding the write lock.  The way locking is
currently implemented, a dead process should not be able to hold a
lock, but over a remote filesystem that guarantee may not apply.
Also, in the future, FSFS may have optional support for
NFSv2-compatible locking which would allow for the possibility of
stale locks.  In either case, the write-lock file can simply be
removed to unblock commits, and read operations will remain
unaffected.

Note: Locking
-------------

Locking is currently implemented using the apr_file_lock() function,
which on Unix uses fcntl() locking, and on Windows uses LockFile().
Modern remote filesystem implementations should support these
operations, but may not do so perfectly, and NFSv2 servers may not
support them at all.
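
As a rough illustration of the fcntl() behavior described above, the
following Python sketch (Unix only) takes an exclusive lock in a child
process, kills the child, and checks that the lock is free again.  The
"write-lock" file name and the function names are made up for the
example, and this says nothing about how a remote filesystem would
behave.

```python
import fcntl
import os
import signal
import tempfile


def try_write_lock(lock_path):
    """Try to take an exclusive fcntl() lock without blocking.

    Returns the open file (which holds the lock) or None if another
    process already holds it.
    """
    f = open(lock_path, "a+")
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f


def demo_lock_released_on_death():
    lock_path = os.path.join(tempfile.mkdtemp(), "write-lock")
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: take the lock, tell the parent, then wait to be killed.
        try:
            os.close(read_end)
            held = try_write_lock(lock_path)
            os.write(write_end, b"1" if held is not None else b"0")
            signal.pause()
        finally:
            os._exit(1)
    os.close(write_end)
    child_got_lock = os.read(read_end, 1) == b"1"
    os.close(read_end)
    # While the child is alive, the parent cannot take the lock.
    blocked_while_alive = try_write_lock(lock_path) is None
    os.kill(pid, signal.SIGKILL)  # simulate abnormal termination
    os.waitpid(pid, 0)
    # The kernel drops the dead process's lock, so this should succeed.
    lock_file = try_write_lock(lock_path)
    free_after_death = lock_file is not None
    if lock_file is not None:
        lock_file.close()
    return child_got_lock, blocked_while_alive, free_after_death
```

On a local filesystem, all three returned values would be expected to
be true.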

It is possible to do exclusive locking under basic NFSv2 using a
complicated dance involving link().  It's possible that FSFS will
evolve to allow NFSv2-compatible locking, or perhaps just basic O_EXCL
locking, as a repository configuration option.
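
For the curious, the link() dance usually looks something like the
Python sketch below: create a uniquely named file, hard-link it to the
lock name, and trust the link count rather than link()'s return value.
This is the general technique as commonly described, not anything FSFS
implements today; the function names and the naming scheme for the
unique file are assumptions.  Note that a lock taken this way is just a
file, so a process that dies while holding it leaves a stale lock
behind.

```python
import os
import socket


def acquire_link_lock(lock_path):
    """Try to take the lock; return True on success, False if it is held."""
    # A name unique to this host and process, so creating it cannot
    # race with anyone else and O_EXCL is not needed.
    unique = "%s.%s.%d" % (lock_path, socket.gethostname(), os.getpid())
    fd = os.open(unique, os.O_WRONLY | os.O_CREAT, 0o644)
    os.close(fd)
    try:
        try:
            os.link(unique, lock_path)
        except OSError:
            # Over NFS the reply can be lost even when the link was
            # made, so the link count below is the authority.
            pass
        # Two names for the same file means our link() took effect.
        return os.stat(unique).st_nlink == 2
    finally:
        os.unlink(unique)


def release_link_lock(lock_path):
    """Release the lock by removing the lock file."""
    os.unlink(lock_path)
```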

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: FSFS propaganda

Posted by "D.J. Heap" <dj...@shadyvale.net>.
Branko Čibej wrote:

> Greg Hudson wrote:
>
>> On Fri, 2004-04-30 at 13:08, Josh Pieper wrote:
>>
>>> I think the proper solution here is to only write the 'current' and
>>> revprop files by using filesystem renames, which are atomic.  Then it
>>> would be impossible to read a current file that is in an inconsistent
>>> state.  Does that sound good ghudson?
>>
>> That's certainly the proper answer on Unix.  I don't know if Windows is
>> any different.
>
> Yes, I've been wondering about that... I can't find any documentation
> stating that it's /not/ atomic as long as the source and destination are
> in the same volume. That's unfortunately not good enough...
>
> -- Brane

Worse than that, renames won't succeed on Windows if the destination
file is in use, which it may be here, correct?

The svn_io_* functions sort of deal with this already through the 
infamous 'access denied' retry code.  I'm not sure if that will work for 
what is going on here, though...

DJ



Re: FSFS propaganda

Posted by Branko Čibej <br...@xbc.nu>.
Greg Hudson wrote:

>On Fri, 2004-04-30 at 13:08, Josh Pieper wrote:
>
>>I think the proper solution here is to only write the 'current' and
>>revprop files by using filesystem renames, which are atomic.  Then it
>>would be impossible to read a current file that is in an inconsistent
>>state.  Does that sound good ghudson?
>
>That's certainly the proper answer on Unix.  I don't know if Windows is
>any different.

Yes, I've been wondering about that... I can't find any documentation
stating that it's /not/ atomic as long as the source and destination are
in the same volume. That's unfortunately not good enough...

-- Brane




Re: FSFS propaganda

Posted by Greg Hudson <gh...@MIT.EDU>.
On Fri, 2004-04-30 at 13:08, Josh Pieper wrote:
> I think the proper solution here is to only write the 'current' and
> revprop files by using filesystem renames, which are atomic.  Then it
> would be impossible to read a current file that is in an inconsistent
> state.  Does that sound good ghudson?

That's certainly the proper answer on Unix.  I don't know if Windows is
any different.



Re: FSFS propaganda

Posted by Josh Pieper <jp...@andrew.cmu.edu>.
Garrett Rooney wrote:
> What if a process dies while rewriting the 'current' file?  It seems 
> like we might want to make 'svnadmin recover' able to reconstruct that file.
> 
> In a somewhat related issue, the last time I looked, the reading of 
> various files in the fsfs filesystem (current, revprops, etc) isn't 
> locked, despite the fact that those files can be changed.  Should we be 
> using apr_file_lock to obtain read locks on them while they're being 
> read, to avoid the possibility of reading while they're being rewritten?

I think the proper solution here is to only write the 'current' and
revprop files by using filesystem renames, which are atomic.  Then it
would be impossible to read a current file that is in an inconsistent
state.  Does that sound good ghudson?

-Josh
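
The rename approach proposed above can be sketched in Python roughly as
follows: write the new contents to a temporary file in the same
directory, then rename it over the target, so that a reader sees either
the old file or the new one.  The function name is hypothetical, and
the atomicity claim is only the usual Unix one; on Windows the rename
may fail if the destination is open, as discussed elsewhere in this
thread.

```python
import os
import tempfile


def write_atomically(path, data):
    """Replace the contents of path with data via write-then-rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the
    # target, so it is created in the same directory.  mkstemp()
    # creates it with mode 0600; a real repository would also need to
    # set suitable permissions before renaming.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Readers opening `path` see the old or the new contents,
        # never a partially written file.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```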


Re: FSFS propaganda

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
Greg Hudson wrote:

> Note: Recovery
> --------------
> 
> If a process terminates abnormally during a read operation, it should
> leave behind no traces in the repository, since read operations do not
> modify the repository in any way.
> 
> If a process terminates abnormally during a commit operation, it will
> leave behind a stale transaction, which will not interfere with
> operation and which can be removed with a normal recursive delete
> operation.
> 
> If a process terminates abnormally during the final phase of a commit
> operation, it may be holding the write lock.  The way locking is
> currently implemented, a dead process should not be able to hold a
> lock, but over a remote filesystem that guarantee may not apply.
> Also, in the future, FSFS may have optional support for
> NFSv2-compatible locking which would allow for the possibility of
> stale locks.  In either case, the write-lock file can simply be
> removed to unblock commits, and read operations will remain
> unaffected.

What if a process dies while rewriting the 'current' file?  It seems 
like we might want to make 'svnadmin recover' able to reconstruct that file.

In a somewhat related issue, the last time I looked, the reading of 
various files in the fsfs filesystem (current, revprops, etc) isn't 
locked, despite the fact that those files can be changed.  Should we be 
using apr_file_lock to obtain read locks on them while they're being 
read, to avoid the possibility of reading while they're being rewritten?

-garrett


Re: FSFS propaganda

Posted by Greg Hudson <gh...@MIT.EDU>.
On Fri, 2004-04-30 at 15:20, Philip Martin wrote:
> You may want to consider using fsync on the directory as well,
> particularly if you start to rename files, as I believe this is the
> usual way to flush inode data to disk.

It's only "usual" on Linux; other Unix operating systems don't require
that and I think don't allow it.  But sure, I can do that too.



Re: FSFS propaganda

Posted by Philip Martin <ph...@codematters.co.uk>.
Greg Hudson <gh...@MIT.EDU> writes:

> This is a known loose end.  I would like to add code to fsync the rev
> and prop files before returning from the commit, but APR does not have
> an interface to fsync(), nor could I find a Unix-specific interface to
> get the fd so that I could put a call to fsync() inside #ifndef WIN32. 
> I haven't thought of a way to circumvent this problem yet, but hopefully
> there's an option.

You may want to consider using fsync on the directory as well,
particularly if you start to rename files, as I believe this is the
usual way to flush inode data to disk.

-- 
Philip Martin
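
A small Python sketch of the directory fsync suggested above, for
illustration only.  The function names are hypothetical, and since not
every platform permits fsync() on a directory, the sketch reports
failure instead of raising.

```python
import os


def fsync_directory(dir_path):
    """Ask the OS to flush a directory's entries to disk.

    Returns True if the fsync() succeeded, False if the platform
    refused to open or sync the directory.
    """
    try:
        fd = os.open(dir_path, os.O_RDONLY)
    except OSError:
        return False
    try:
        os.fsync(fd)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)


def rename_durably(src, dst):
    """Rename src to dst, then try to flush the containing directory."""
    os.rename(src, dst)
    return fsync_directory(os.path.dirname(os.path.abspath(dst)))
```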


Re: FSFS propaganda

Posted by Branko Čibej <br...@xbc.nu>.
Greg Hudson wrote:

>On Fri, 2004-04-30 at 14:55, Philip Martin wrote:
>
>>Another point to discuss would be a machine crash.  When a process
>>writes a file it generally writes it to the OS cache, and the OS
>>itself is responsible for flushing the cache to disk.  A process can
>>use fsync(2) to force a flush, but if it doesn't explicitly flush and
>>the machine crashes shortly after the commit completes then data may
>>be lost.
>
>This is a known loose end.  I would like to add code to fsync the rev
>and prop files before returning from the commit, but APR does not have
>an interface to fsync(), nor could I find a Unix-specific interface to
>get the fd

apr_os_file_get in apr_portable.h

> so that I could put a call to fsync() inside #ifndef WIN32.
>I haven't thought of a way to circumvent this problem yet, but hopefully
>there's an option.





Re: FSFS propaganda

Posted by Greg Hudson <gh...@MIT.EDU>.
On Fri, 2004-04-30 at 14:55, Philip Martin wrote:
> Another point to discuss would be a machine crash.  When a process
> writes a file it generally writes it to the OS cache, and the OS
> itself is responsible for flushing the cache to disk.  A process can
> use fsync(2) to force a flush, but if it doesn't explicitly flush and
> the machine crashes shortly after the commit completes then data may
> be lost.

This is a known loose end.  I would like to add code to fsync the rev
and prop files before returning from the commit, but APR does not have
an interface to fsync(), nor could I find a Unix-specific interface to
get the fd so that I could put a call to fsync() inside #ifndef WIN32. 
I haven't thought of a way to circumvent this problem yet, but hopefully
there's an option.



Re: FSFS propaganda

Posted by Philip Martin <ph...@codematters.co.uk>.
Greg Hudson <gh...@MIT.EDU> writes:

> If a process terminates abnormally during a commit operation, it will
> leave behind a stale transaction, which will not interfere with
> operation and which can be removed with a normal recursive delete
> operation.

Another point to discuss would be a machine crash.  When a process
writes a file it generally writes it to the OS cache, and the OS
itself is responsible for flushing the cache to disk.  A process can
use fsync(2) to force a flush, but if it doesn't explicitly flush and
the machine crashes shortly after the commit completes then data may
be lost.

The BDB backend has DB_TXN_NOSYNC to control whether the process
explicitly flushes; I'm guessing FSFS behaves more like BDB with
DB_TXN_NOSYNC set.
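
To make the distinction concrete, here is a minimal Python sketch of a
write that is explicitly flushed before the function returns.  The
function name is hypothetical and this is not taken from either
backend; it only shows the write, flush, fsync sequence.

```python
import os


def write_and_sync(path, data):
    """Write data to path and force it to disk before returning."""
    with open(path, "wb") as f:
        f.write(data)
        # flush() moves the data from the process's own buffer into
        # the OS cache; it does not by itself reach the disk.
        f.flush()
        # fsync() asks the OS to push its cached copy to the disk, so
        # a crash right after this call should not lose the data.
        os.fsync(f.fileno())
```

Without the fsync() call, the function would return as soon as the data
reached the OS cache, which is the window a machine crash can hit.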

-- 
Philip Martin
