Posted to dev@subversion.apache.org by Malcolm Rowe <ma...@farside.org.uk> on 2007/09/19 13:18:40 UTC

Re: FSFS optimization

On Mon, Aug 13, 2007 at 03:18:59PM -0700, Dan Christian wrote:
> The first thought is to work on caching some of these:  "node.0.0" is
> the node revision, and "props" contains the revision properties.  The
> "changes" file is always(?) appended to, so we might be able to keep
> it open and keep appending to it.  The tricky bit is that the httpd
> will close at some point and a fresh one will end up advancing the
> transaction.  This implies that the disk version must be kept up to
> date.
> 

This is the key, actually.  NFS only guarantees close-to-open cache
coherency, so we need to ensure a file written by one process/host is
closed before it's opened by another one... and there unfortunately
isn't any explicit FS call that says "I'm finished with this transaction
on this process for the minute, feel free to flush state to disk now".

And so we end up flushing to disk after _every_ operation, which sucks.
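
Concretely, every change to the transaction pays a full
open/write/flush/close cycle.  As a sketch (not the actual fs_fs.c
code; write_txn_file is a made-up name):

  /* Sketch only: under close-to-open coherency, another host is only
     guaranteed to see our data once we close the file, so we cannot
     keep it open between operations. */
  #include <stdio.h>

  static int
  write_txn_file(const char *path, const char *buf, size_t len)
  {
    FILE *fp = fopen(path, "wb");
    if (!fp)
      return -1;
    if (fwrite(buf, 1, len, fp) != len || fflush(fp) != 0)
      {
        fclose(fp);
        return -1;
      }
    return fclose(fp);  /* the close is what publishes the data over NFS */
  }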

> An alternative/additional thought would be to move the transactions
> directory to local disk so that the OS can cache more aggressively.
> This would require network load balancers to always direct the same
> client to the same server.

I think this is the way to go.  I had a patch to do this, I think,
though I don't think I had thought through everything, so there might be
lurking issues.  (Also, from a UI point-of-view, how do you switch it
on?)

> A completely separate idea is to work on caching revision files on
> local disk (so the full text version doesn't have to be regenerated
> repeatedly).

+1.  A server-side rep cache sounds like an excellent idea.
(Again, I wonder about the configuration, though.  Maybe FSFS needs some
equivalent to the DB_CONFIG file... or maybe svnadmin should have an
fsfstune command?)

Regards,
Malcolm

Re: FSFS optimization

Posted by Eric Gillespie <ep...@pretzelnet.org>.
"Dan Christian" <dc...@google.com> writes:

> UUID stands for Universally Unique Identifier.  You've failed as soon
> as you have two different filesystems with the same UUID.

> The test suite would need to be modified to generate properly unique
> UUIDs.  The current behavior is broken (IMHO).

+1

-- 
Eric Gillespie <*> epg@pretzelnet.org


Re: FSFS optimization

Posted by Dan Christian <dc...@google.com>.
On 9/21/07, Malcolm Rowe <ma...@farside.org.uk> wrote:
> On Wed, Sep 19, 2007 at 02:18:40PM +0100, Malcolm Rowe wrote:
> > > A completely separate idea is to work on caching revision files on
> > > local disk (so the full text version doesn't have to be regenerated
> > > repeatedly).
> >
> > +1.  A server-side rep cache sounds like an excellent idea.
>
> Oh, something that makes any kind of caching hard to do in FSFS is that
> the following should work (or, rather, we haven't said it shouldn't, and
> haven't prohibited it in any meaningful way):
>
> $ svnserve -d -r .
> $ svnadmin create repo
> $ svnadmin load repo < dumpfile
>
> Option 1:
> $ svn co svn://localhost/repo wc
> $ touch wc/foo; svn add wc/foo; svn ci wc -m "log message"; rm -rf wc
>   # now undo that commit:
> $ rm -rf repo; svnadmin create repo; svnadmin load repo < dumpfile
>   # (how can svnserve [or any process] know to invalidate the cache?)

It doesn't have to.  The files are still valid at certain revisions.
There is already a layer that knows whether that file "exists" at the
requested revision.
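
To illustrate the layering I mean (all names here are made up; this is
not the real fs_fs.c API), the existence check runs before any cache
lookup, so keys belonging to a deleted-and-rebuilt repository are
simply never requested again:

  /* Illustration only; resolve_path_rev and cache_get are stand-ins
     for the real FS layers. */
  #include <stdio.h>

  /* Maps path@rev to a representation key, failing if the path
     doesn't exist at that revision. */
  static int
  resolve_path_rev(const char *path, long rev, long *rep_key)
  {
    (void)path;
    if (rev < 1)
      return -1;              /* path doesn't exist at rev */
    *rep_key = rev;           /* stand-in for the real rep id */
    return 0;
  }

  /* Cache lookup keyed by representation, not by path. */
  static const char *
  cache_get(long rep_key)
  {
    (void)rep_key;
    return NULL;              /* miss: caller regenerates the fulltext */
  }

  int
  main(void)
  {
    long rep_key;
    if (resolve_path_rev("/trunk/foo", 42, &rep_key) != 0)
      return 1;               /* no existence, so no cache access at all */
    if (cache_get(rep_key) == NULL)
      printf("cache miss for rep %ld; regenerate fulltext\n", rep_key);
    return 0;
  }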

>
> Option 2:
> $ svnadmin hotcopy repo repocopy
> $ svn co svn://localhost/repo wc
> $ touch wc/foo; svn add wc/foo; svn ci wc -m "log message"; rm -rf wc
> $ svn ls svn://localhost/repocopy
>   # oops! Two filesystems open with the same UUID.  Hope we're not just
>   # using UUID/path/rev as the cache key.

UUID stands for Universally Unique Identifier.  You've failed as soon
as you have two different filesystems with the same UUID.

Yes, the svn test suite does this, and tests fail when caching is enabled.

I have an updated version of DannyB's memcached caching patch.  I had
to add a hook to the test suite to restart memcached after every
repository creation step.  This enables the test suite to pass with
caching enabled.

> I've been wondering whether to check for 'same UUID, different
> filesystem' at open time and disallow it (since we already have data
> structures that are keyed off the UUID).

This seems like the right thing to do.  I don't know if there would be
special cases where you would have to skip the check.

> I'm concerned that that might
> cause more problems than it solves (like: I wouldn't be surprised if it
> breaks our test suite).

The test suite would need to be modified to generate properly unique
UUIDs.  The current behavior is broken (IMHO).

-Dan C


Re: FSFS optimization

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Wed, Sep 19, 2007 at 02:18:40PM +0100, Malcolm Rowe wrote:
> > A completely separate idea is to work on caching revision files on
> > local disk (so the full text version doesn't have to be regenerated
> > repeatedly).
> 
> +1.  A server-side rep cache sounds like an excellent idea.

Oh, something that makes any kind of caching hard to do in FSFS is that
the following should work (or, rather, we haven't said it shouldn't, and
haven't prohibited it in any meaningful way):

$ svnserve -d -r .
$ svnadmin create repo
$ svnadmin load repo < dumpfile

Option 1:
$ svn co svn://localhost/repo wc
$ touch wc/foo; svn add wc/foo; svn ci wc -m "log message"; rm -rf wc
  # now undo that commit:
$ rm -rf repo; svnadmin create repo; svnadmin load repo < dumpfile
  # (how can svnserve [or any process] know to invalidate the cache?)

Option 2:
$ svnadmin hotcopy repo repocopy
$ svn co svn://localhost/repo wc
$ touch wc/foo; svn add wc/foo; svn ci wc -m "log message"; rm -rf wc
$ svn ls svn://localhost/repocopy
  # oops! Two filesystems open with the same UUID.  Hope we're not just
  # using UUID/path/rev as the cache key.


I've been wondering whether to check for 'same UUID, different
filesystem' at open time and disallow it (since we already have data
structures that are keyed off the UUID).  I'm concerned that that might
cause more problems than it solves (like: I wouldn't be surprised if it
breaks our test suite).
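
For concreteness, I imagine the check looking something like this at
open time (hypothetical names; nothing like this exists in fs_fs.c
today):

  /* Hypothetical: a process-global table mapping each UUID to the
     canonical path it was first opened from.  Opening the same UUID
     from a different path is rejected. */
  #include <stdio.h>
  #include <string.h>

  #define MAX_OPEN_FS 64

  static struct { char uuid[37]; char path[256]; } open_fs[MAX_OPEN_FS];
  static int n_open;

  /* Returns 0 if the open is allowed, -1 on a UUID collision. */
  static int
  check_uuid_at_open(const char *uuid, const char *path)
  {
    int i;
    for (i = 0; i < n_open; i++)
      if (strcmp(open_fs[i].uuid, uuid) == 0)
        return strcmp(open_fs[i].path, path) == 0 ? 0 : -1;
    if (n_open < MAX_OPEN_FS)
      {
        strncpy(open_fs[n_open].uuid, uuid, sizeof open_fs[0].uuid - 1);
        strncpy(open_fs[n_open].path, path, sizeof open_fs[0].path - 1);
        n_open++;
      }
    return 0;
  }

  int
  main(void)
  {
    const char *uuid = "same-uuid-after-hotcopy";
    check_uuid_at_open(uuid, "/repos/repo");
    if (check_uuid_at_open(uuid, "/repos/repocopy") != 0)
      printf("refusing to open: same UUID, different filesystem\n");
    return 0;
  }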

Regards,
Malcolm

Re: FSFS optimization

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Wed, Sep 19, 2007 at 07:09:09AM -0700, Blair Zajac wrote:
>  I need long-lived transactions that are visible to multiple systems, so 
>  I definitely don't want to make this the only place where transactions are 
>  built.  I would like to see a configuration for fsfs that would specify 
>  where the transaction directory is.
> 

This would absolutely need to be configurable, yes.  If you want
multiple front-end servers (against, say, an NFS backend), you need to
either use session-affinity or leave the transaction on NFS.

Regards,
Malcolm

Re: FSFS optimization

Posted by Blair Zajac <bl...@orcaware.com>.
On Sep 19, 2007, at 6:18 AM, Malcolm Rowe wrote:

> On Mon, Aug 13, 2007 at 03:18:59PM -0700, Dan Christian wrote:
>> The first thought is to work on caching some of these:  "node.0.0" is
>> the node revision, and "props" contains the revision properties.  The
>> "changes" file is always(?) appended to, so we might be able to keep
>> it open and keep appending to it.  The tricky bit is that the httpd
>> will close at some point and an a fresh one will end up advancing the
>> transaction.  This implies that the disk version must be kept up to
>> date.
>>
>
> This is the key, actually.  NFS only guarantees close-to-open cache
> coherency, so we need to ensure a file written by one process/host is
> closed before it's opened by another one... and there unfortunately
> isn't any explicit FS call that says "I'm finished with this transaction
> on this process for the minute, feel free to flush state to disk now".
>
> And so we end up flushing to disk after _every_ operation, which sucks.
>
>> An alternative/additional thought would be to move the transactions
>> directory to local disk so that the OS can cache more aggressively.
>> This would require network load balancers to always direct the same
>> client to the same server.
>
> I think this is the way to go.  I had a patch to do this, I think,
> though I don't think I had thought through everything, so there might be
> lurking issues.  (Also, from a UI point-of-view, how do you switch it
> on?)

I need long-lived transactions that are visible to multiple systems, so I
definitely don't want to make this the only place where transactions are
built.  I would like to see a configuration for fsfs that would specify
where the transaction directory is.

>
>> A completely separate idea is to work on caching revision files on
>> local disk (so the full text version doesn't have to be regenerated
>> repeatedly).
>
> +1.  A server-side rep cache sounds like an excellent idea.
> (Again, I wonder about the configuration, though.  Maybe FSFS needs some
> equivalent to the DB_CONFIG file... or maybe svnadmin should have an
> fsfstune command?)

A DB_CONFIG-like file sounds easier to code: you just edit a text file.
But maybe you want to be able to edit that file in a live repository, in
which case you would need an svnadmin fsfstune command that could change
values and move in-progress transactions from one directory to another.
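
For example, the file might look something like this (completely
hypothetical; neither the file nor any of these keys exist today):

  # <repos>/db/fsfs.conf (hypothetical)
  [transactions]
  # build transactions on local disk instead of <repos>/db/transactions
  dir = /var/local/svn/txns

  [rep-cache]
  enabled = yes
  dir = /var/cache/svn/reps
  max-size = 512M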

Regards,
Blair

-- 
Blair Zajac, Ph.D.
<bl...@orcaware.com>
Subversion training, consulting and support
http://www.orcaware.com/training/


