Posted to dev@subversion.apache.org by Stefan Fuhrmann <eq...@web.de> on 2010/12/29 19:37:58 UTC

Re: FSFS format 6

On 29.12.2010 01:58, Johan Corveleyn wrote:
> On Sun, Dec 12, 2010 at 4:23 PM, Stefan Fuhrmann
> <st...@alice-dsl.de>  wrote:
>> On 19.10.2010 15:10, Daniel Shahaf wrote:
>>> Greg Stein wrote on Tue, Oct 19, 2010 at 04:31:42 -0400:
>>>> Personally, I see [FSv2] as a broad swath of API changes to align our
>>>> needs with the underlying storage. Trowbridge noted that our current
>>>> API makes it *really* difficult to implement an effective backend. I'd
>>>> also like to see a backend that allows for parallel PUTs during the
>>>> commit process. Hyrum sees FSv2 as some kind of super-key-value
>>>> storage with layers on top, allowing for various types of high-scaling
>>>> mechanisms.
>>> At the retreat, stefan2 also had some thoughts about this...
>>>
>> [This is just a brain-dump for 1.8+]
>>
>> While working on the performance branch I made some
>> observations concerning the way FSFS organizes data
>> and how that could be changed for reduced I/O overhead.
>>
>> notes/fsfs-improvements.txt contains a summary of that
>> could be done to improve FSFS before FS-NG. A later
>> FS-NG implementation should then still benefit from the
>> improvements.
> +(number of fopen calls during a log operation)
>
> I like this proposal a lot. As I already told before, we are running
> our FSFS back-end on a SAN over NFS (and I suspect we're not the only
> company doing this). In this environment, the server-side I/O of SVN
> (especially the amount of random reads and fopen calls during e.g.
> log) is often the major bottleneck.
>
> There is one question going around in my head though: won't you have
> to change/rearrange a lot of the FS layer code (and maybe repos
> layer?) to benefit from this new format?
Maybe. But as far as I understand the current
FSFS structure, data access is mainly chasing
pointers, i.e. reading relative or absolute byte
offsets and moving there for the next piece of
information. If everything goes well, none of that
code needs to change; the revision packing
algorithm will simply produce different offset
values.
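
A minimal sketch of the "chasing pointers" idea, assuming an invented
record layout (a 4-byte next-offset plus a 2-byte length in front of
each payload). This is not the actual FSFS on-disk format; it only
shows that a reader which follows stored offsets is indifferent to the
physical order a packer chooses:

```python
import io
import struct

# Hypothetical record header: offset of the next record, payload length.
HEADER = struct.Struct("<IH")
END = 0xFFFFFFFF  # marker for "no further record"


def build(records, order):
    """Lay out `records` in physical `order`; return (data, start offset)."""
    offsets, pos = {}, 0
    for idx in order:
        offsets[idx] = pos
        pos += HEADER.size + len(records[idx])
    buf = bytearray(pos)
    for idx in order:
        nxt = offsets[idx + 1] if idx + 1 < len(records) else END
        HEADER.pack_into(buf, offsets[idx], nxt, len(records[idx]))
        start = offsets[idx] + HEADER.size
        buf[start:start + len(records[idx])] = records[idx]
    return bytes(buf), offsets[0]


def read_chain(data, start):
    """Follow the stored offsets and collect the payloads in logical order."""
    f = io.BytesIO(data)
    out, offset = [], start
    while offset != END:
        f.seek(offset)
        nxt, length = HEADER.unpack(f.read(HEADER.size))
        out.append(f.read(length))
        offset = nxt
    return out


records = [b"noderev", b"representation", b"changed paths"]
scattered, s1 = build(records, [2, 0, 1])   # arbitrary physical order
packed, s2 = build(records, [0, 1, 2])      # order matching the read order

print(read_chain(scattered, s1) == read_chain(packed, s2))
```

The reader code is identical for both layouts; only the offsets (and
therefore the seek distances) differ.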
> The current code is written in a certain way, not particularly
> optimized for this new format (I seem to remember "log" does around 10
> fopen calls for every interesting rev file, each time reading a
> different part of it). Also, if an operation currently needs to access
> many revisions (like log or blame), it doesn't take advantage at all
> of the fact that they might be in a single packed rev file. The pack
> file is opened and seeked in just as much as the sum of the individual
> rev files.
The fopen() calls should be eliminated by the
file handle cache. IOW, they should already be
addressed on the performance branch. Please
let me know if that is not the case.

FSFS format 6 would primarily reduce the number
of seek() and read() calls. Once the seek()s are
"in check", the size of the read buffer might become
configurable: remote file access might benefit from
larger buffers, e.g. equal to the amount of data the
network can transfer in 1 .. 10 ms.
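
As a rough worked example of that sizing rule (the 1 Gbit/s link speed
is an assumption chosen for illustration, not a measurement from this
thread):

```python
def read_buffer_size(throughput_bytes_per_s: float, window_ms: float) -> int:
    """Bytes the link can deliver within the given time window."""
    return int(throughput_bytes_per_s * window_ms / 1000.0)


# Assumed example link: 1 Gbit/s, i.e. about 125,000,000 bytes/s.
GIGABIT = 125_000_000

for window_ms in (1, 10):
    size = read_buffer_size(GIGABIT, window_ms)
    print(f"{window_ms:>2} ms -> {size} bytes (~{size // 1024} KiB)")
```

Under that assumption the buffer would land somewhere between roughly
122 KiB and 1.2 MiB.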
> So: how will the current code be able to take advantage of this new
> format? Won't this require a major effort to restructure that code?
Old servers won't be able to read format 6 repos
(maybe they will but there is no guarantee). If a
large scale restructuring of the code would be
necessary, I may not be able to do and validate it.

The packing code, however, will probably be
completely rewritten.
> (This reminds me of the current difficulty (as I can see it, as an
> innocent bystander) with the WC-NG rewrite: theoretically it should be
> very fast, but the "higher level" code is still largely based upon the
> old principles. So to take advantage of it, certain things have to be
> changed at the higher level, making operations work "dir-based" or
> "tree-based", instead of file-based etc).
Well, the official goal is still to make 1.7 clients
faster than 1.6 for every operation. But there will
certainly be room for improvement in 1.8.

-- Stefan^2.

Re: FSFS format 6

Posted by Stefan Fuhrmann <eq...@web.de>.

On 20.02.2011 09:50, Ivan Zhakov wrote:
> On Wed, Dec 29, 2010 at 22:37, Stefan Fuhrmann<eq...@web.de>  wrote:
>> The fopen() calls should be eliminated by the
>> file handle cache. IOW, they should already be
>> addressed on the performance branch. Please
>> let me know if that is not the case.
>>
> Just my 20 cents.
High roller.
> My belief that file handles cache should be implemented at OS level
> and I pretty sure that it's implemented.
You can certainly provide data to demonstrate your claim?

In fact, fopen() is extremely expensive (1..5ms) on FS with
ACLs. Even for a local, low overhead (EXT3) FS, the effect
of handle caching is significant:

time ./svnadmin verify $TSVN_MIRROR -q -F 256 -M 0
real   1m46.603s
user   1m43.474s
sys    0m3.132s

time ./svnadmin verify $TSVN_MIRROR -q -F 0 -M 0
real   2m26.664s
user   2m0.856s
sys    0m25.818s

Note that the gains are split about 50:50 between the OS
and the application. Things become even more interesting
albeit less easily demonstrable with concurrent queries
being run by a threaded server. One would expect an even
higher level of reuse.
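
The split can be checked from the two timings above. The sketch assumes,
as the surrounding text suggests, that the first run has handle caching
enabled and the second has it disabled:

```python
def seconds(minutes: int, secs: float) -> float:
    """Convert the m/s notation of `time` output to seconds."""
    return minutes * 60 + secs


# Figures copied from the two `time ./svnadmin verify` runs above.
with_cache = {
    "real": seconds(1, 46.603),
    "user": seconds(1, 43.474),
    "sys": seconds(0, 3.132),
}
without_cache = {
    "real": seconds(2, 26.664),
    "user": seconds(2, 0.856),
    "sys": seconds(0, 25.818),
}

# Time saved by handle caching, per category.
saved = {key: without_cache[key] - with_cache[key] for key in with_cache}
cpu_saved = saved["user"] + saved["sys"]

for key in ("real", "user", "sys"):
    print(f"{key:>4}: {saved[key]:7.3f} s saved")
print(f"sys share of CPU time saved:  {saved['sys'] / cpu_saved:.0%}")
print(f"user share of CPU time saved: {saved['user'] / cpu_saved:.0%}")
```

That is about 40.1 s of wall-clock time, of which roughly 22.7 s is
system time (the OS) and 17.4 s is user time (the application).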
> And right way to eliminate
> number of duplicate fopen()/reads() is improving our FS API.
Why would that be necessary if the OS already takes care
of all the optimizations?

FSFS6 is about optimizing the interface between OS and
the FSFS code: Fewer seek()s and drastically reduced
number of read()s.

Once that is in place and its behavior well understood, we
may start designing I/O aggregation and scheduling. In
particular, holding off requests while another request is
already fetching the desired data will be a very interesting task.

From what I understood of the FS API, there is very little
that needs to be added to allow for effective I/O optimization.
Basically, a simple "advise" or "prefetch" option on the
read functions could possibly do the trick.

If we get to that stage, I'm sure to receive "the OS should
take care of I/O scheduling and stuff" posts.
> I didn't reviewed how file handles cache is implemented in
> fs-performance branch, but I'm nearly to -1 against implementing cache
> of open file handles in Subversion.
File handle caching definitely has its drawbacks, risks
in particular. The number of file handles within an OS
instance is quite limited (typ. 1000) and open files may
prevent file deletion (e.g. during packing). The code is
supposed to take care of the latter but may be faulty.

Alternative designs are welcome.

-- Stefan^2.

Re: FSFS format 6

Posted by Stefan Fuhrmann <eq...@web.de>.

On 20.02.2011 21:02, Johan Corveleyn wrote:
> On Sun, Feb 20, 2011 at 6:35 PM, Mark Mielke<ma...@mark.mielke.cc>  wrote:
>
>> That said, I'm also (in principle) against implementing cache of open file
>> handles. I prefer architectures that cache intermediate data in a processed
>> form that the application has made a determined choice to make use of such
>> that the cache is the most useful to the application, rather than a
>> transparent caching layer that guesses at what is safe. The OS file system
>> layer is exactly this - any caching it does is transparent to the
>> application and a guess. Guesses are dangerous, which is exactly why the OS
>> file system layer cannot do as much caching unless it has 100% control of
>> the file system (= local file system).
Agreed. For that very reason, I added extensive
caching to the FSFS code and got even more of that
in the pipeline for 1.8.

That being said, there are still typical situations in
which the data cache may not be effective:

* access to relatively rarely read data
   (log, older tags;
    you still want to perform decently in that case)
* first access to the latest revision
   (due to the way transactions are implemented,
    it is difficult to fill all the caches upon write)
* amount of active data > available RAM
   (throws you back to the first issue more often)

> I agree that it would be best if the architecture was so that svn
> could organize its work for most use cases in a way that's efficient
> for the lower levels of the system. For instance, for "svn log", svn
> should in theory be able to do its work with exactly 1 open/close per
> rev file (or in a packed repository, maybe even only 1 open/close per
> packed file).
Yes, it may be very hard to anticipate what data may
be needed further down the road, even if we had a
marvelous "1 query gets it all" interface where feasible:
svn log, for instance, is often run with a limit on the number
of results. However, there is no way to tell how much of
a packed file needs to be read to process that query.
There is only a lower bound.

So, it can be very beneficial to keep a small number of
file handles around to "bridge" various stages / iterations
within a single request.
> But right now, this isn't the case, and I think it would be a huge
> amount of work, change in architecture, layering, ... Until that
> happens, I think such a generic file-handle caching layer could prove
> very helpful :-). Note though that, if I understood correctly, the
> file-handle caching of the performance branch will not be reintegrated
> into 1.7, but maybe 1.8 ...
>
> But maybe stefan2 can comment more on that :-).
Because keeping files open for a potentially much
longer period of time may have an impact on other,
rarely run operations like pack, I don't think we should
risk merging this into 1.7.

-- Stefan^2.

Re: FSFS format 6

Posted by Johan Corveleyn <jc...@gmail.com>.

On Sun, Feb 20, 2011 at 6:35 PM, Mark Mielke <ma...@mark.mielke.cc> wrote:
> On 02/20/2011 03:50 AM, Ivan Zhakov wrote:
>>
>> On Wed, Dec 29, 2010 at 22:37, Stefan Fuhrmann<eq...@web.de>  wrote:
>>>
>>> The fopen() calls should be eliminated by the
>>> file handle cache. IOW, they should already be
>>> addressed on the performance branch. Please
>>> let me know if that is not the case.
>>
>> My belief that file handles cache should be implemented at OS level
>> and I pretty sure that it's implemented. And right way to eliminate
>> number of duplicate fopen()/reads() is improving our FS API.
>>
>> I didn't reviewed how file handles cache is implemented in
>> fs-performance branch, but I'm nearly to -1 against implementing cache
>> of open file handles in Subversion.
>
> What OS implements file handle caching? The OS file system layer for most
> operating systems does implement caching - but open()/close() can easily
> invalidate some or all of this cache due to required POSIX behaviour,
> especially if the backend storage is remote and shared between multiple
> clients such as would be the case over NFS. This is required to implement
> consistency across clients. The local operating system cannot arbitrarily
> cache everything, and every bit of data it does decide to cache could be
> wrong at any point in time without other aspects in use such as file
> locking.
>
> Of particular concern to me is how slow Subversion gets over NFS, and this
> thread grabbed my attention as a result. When using NFS Subversion
> operations can take many times longer (20 seconds -> 20 minutes). I think
> people may be testing and making assumptions that a "local file system" will
> be in use. Do people working on the fs-performance branch check with NFS?
>
> I don't know... just dropping in... feel free to set me straight. :-)

Hi Mark,

You're absolutely right, some Subversion operations perform horribly
with FSFS over NFS (we have such a setup @work). In fact, the poor
performance of e.g. "svn log somefile" on NFS was one of the problems
I was first interested in when looking at svn (and one of the reasons
I got involved with svn development, a positive side-effect :-)).

On our setup at work, "svn log" is about 10 times slower when done
over NFS than on local disk. As I described in this thread (but also
some threads before), "svn log somefile" opens and closes each rev
file about 20 times (and the situation is not better with a packed
repository, because the packed file is opened/closed just as many
times), and it seems that is very expensive when working over NFS.

I haven't been able to test the performance branch (with the file
handle caching) on our NFS setup at work. I have only measured the
number of fopen() calls for an "svn log" operation, compared to trunk,
assuming that is *the* most critical performance differentiator for
NFS setups.

If someone could do some real measurements/benchmarks of "svn log"
(and other operations of course) of the performance branch on an NFS
setup, compared with trunk (and maybe also compare them with a similar
setup with FSFS on local disk), that could be very interesting...

> That said, I'm also (in principle) against implementing cache of open file
> handles. I prefer architectures that cache intermediate data in a processed
> form that the application has made a determined choice to make use of such
> that the cache is the most useful to the application, rather than a
> transparent caching layer that guesses at what is safe. The OS file system
> layer is exactly this - any caching it does is transparent to the
> application and a guess. Guesses are dangerous, which is exactly why the OS
> file system layer cannot do as much caching unless it has 100% control of
> the file system (= local file system).

I agree that it would be best if the architecture was so that svn
could organize its work for most use cases in a way that's efficient
for the lower levels of the system. For instance, for "svn log", svn
should in theory be able to do its work with exactly 1 open/close per
rev file (or in a packed repository, maybe even only 1 open/close per
packed file).

But right now, this isn't the case, and I think it would be a huge
amount of work, change in architecture, layering, ... Until that
happens, I think such a generic file-handle caching layer could prove
very helpful :-). Note though that, if I understood correctly, the
file-handle caching of the performance branch will not be reintegrated
into 1.7, but maybe 1.8 ...

But maybe stefan2 can comment more on that :-).

Cheers,
-- 
Johan

Re: FSFS format 6

Posted by Mark Mielke <ma...@mark.mielke.cc>.

On 02/20/2011 03:50 AM, Ivan Zhakov wrote:
> On Wed, Dec 29, 2010 at 22:37, Stefan Fuhrmann<eq...@web.de>  wrote:
>> The fopen() calls should be eliminated by the
>> file handle cache. IOW, they should already be
>> addressed on the performance branch. Please
>> let me know if that is not the case.
> My belief that file handles cache should be implemented at OS level
> and I pretty sure that it's implemented. And right way to eliminate
> number of duplicate fopen()/reads() is improving our FS API.
>
> I didn't reviewed how file handles cache is implemented in
> fs-performance branch, but I'm nearly to -1 against implementing cache
> of open file handles in Subversion.

What OS implements file handle caching? The OS file system layer for 
most operating systems does implement caching - but open()/close() can 
easily invalidate some or all of this cache due to required POSIX 
behaviour, especially if the backend storage is remote and shared 
between multiple clients such as would be the case over NFS. This is 
required to implement consistency across clients. The local operating 
system cannot arbitrarily cache everything, and every bit of data it 
does decide to cache could be wrong at any point in time without other 
aspects in use such as file locking.

Of particular concern to me is how slow Subversion gets over NFS, and 
this thread grabbed my attention as a result. When using NFS Subversion 
operations can take many times longer (20 seconds -> 20 minutes). I 
think people may be testing and making assumptions that a "local file 
system" will be in use. Do people working on the fs-performance branch 
check with NFS?

I don't know... just dropping in... feel free to set me straight. :-)

That said, I'm also (in principle) against implementing cache of open 
file handles. I prefer architectures that cache intermediate data in a 
processed form that the application has made a determined choice to make 
use of such that the cache is the most useful to the application, rather 
than a transparent caching layer that guesses at what is safe. The OS 
file system layer is exactly this - any caching it does is transparent 
to the application and a guess. Guesses are dangerous, which is exactly 
why the OS file system layer cannot do as much caching unless it has 
100% control of the file system (= local file system).

Cheers,
mark

-- 
Mark Mielke<ma...@mielke.cc>

Re: FSFS format 6

Posted by Ivan Zhakov <iv...@visualsvn.com>.

On Wed, Dec 29, 2010 at 22:37, Stefan Fuhrmann <eq...@web.de> wrote:
> The fopen() calls should be eliminated by the
> file handle cache. IOW, they should already be
> addressed on the performance branch. Please
> let me know if that is not the case.
>
Just my 20 cents.

My belief is that a file handle cache should be implemented at the OS
level, and I'm pretty sure that it already is. And the right way to
eliminate the duplicate fopen()/read() calls is improving our FS API.

I haven't reviewed how the file handle cache is implemented in the
fs-performance branch, but I'm close to -1 on implementing a cache
of open file handles in Subversion.

-- 
Ivan Zhakov

Re: FSFS format 6

Posted by Stefan Fuhrmann <eq...@web.de>.

On 24.01.2011 03:12, Johan Corveleyn wrote:
> On Wed, Dec 29, 2010 at 8:37 PM, Stefan Fuhrmann<eq...@web.de>  wrote:
>> On 29.12.2010 01:58, Johan Corveleyn wrote:
>>> The current code is written in a certain way, not particularly
>>> optimized for this new format (I seem to remember "log" does around 10
>>> fopen calls for every interesting rev file, each time reading a
>>> different part of it). Also, if an operation currently needs to access
>>> many revisions (like log or blame), it doesn't take advantage at all
>>> of the fact that they might be in a single packed rev file. The pack
>>> file is opened and seeked in just as much as the sum of the individual
>>> rev files.
>> The fopen() calls should be eliminated by the
>> file handle cache. IOW, they should already be
>> addressed on the performance branch. Please
>> let me know if that is not the case.
> Ok, finally got around to verifying this.
Thanks for taking the time.
> You are completely correct: the performance branch avoids the vast
> amount of repeated fopen() calls. With a simple test (testfile with 3
> revisions, executing "svn log" of it) (note: this is an unpacked 1.6
> repository):
>
> - trunk: opens each rev file between 19 and 21 times.
>
> - performance branch: opens each rev file 2 times.
>
> (I don't know why it's not simply 1 time, but ok, 2 times is already a
> factor 10 better than trunk :-)).
The file cache won't hand out the same handle
twice at the same time. If one part of FSFS opens
a revision file and keeps it open for some reason
while a sub-routine also needs to read the same
file without having access to the parent's handle,
it will open the same file a second time.
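
A toy model of that behavior, not the performance branch's actual
code: the class and method names are hypothetical, and a real cache
would also cap the number of idle handles and close them before files
get deleted (e.g. during packing). It shows why overlapping users of
one file cost two opens, while later users cost none:

```python
import os
import tempfile
from collections import defaultdict


class HandleCache:
    """Toy cache of open file objects, keyed by path.

    A handle is either checked out (used by exactly one caller) or idle.
    Only idle handles are reused, so two overlapping users of the same
    path cause two real opens.
    """

    def __init__(self):
        self._idle = defaultdict(list)       # path -> idle file objects
        self.open_calls = defaultdict(int)   # path -> number of real opens

    def acquire(self, path):
        if self._idle[path]:
            return self._idle[path].pop()    # reuse an idle handle
        self.open_calls[path] += 1
        return open(path, "rb")              # no idle handle: real open

    def release(self, path, handle):
        handle.seek(0)                       # hand it back in a known state
        self._idle[path].append(handle)

    def close_all(self):
        for handles in self._idle.values():
            for handle in handles:
                handle.close()
        self._idle.clear()


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"revision data")
path = tmp.name

cache = HandleCache()
outer = cache.acquire(path)    # caller keeps the file open ...
inner = cache.acquire(path)    # ... while a sub-routine needs it too
cache.release(path, inner)
cache.release(path, outer)

again = cache.acquire(path)    # served from the idle list, no new open
cache.release(path, again)

print(cache.open_calls[path])
cache.close_all()
os.remove(path)
```

With this usage the file is really opened twice, no matter how many
further non-overlapping accesses follow.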
> I tested this simply by adding one line of printf instrumentation
> inside libsvn_subr/io.c#svn_io_file_open (see patch in attachment, as
> well as the output for trunk and for perf-branch).
When developing the file handle cache, I used
a similar method (also counting some other
low-level file operation statistics).
> Now, if only that file-handle cache could be merged to trunk :-) ...
As opposed to the full text cache, the file handle
cache may have unknown side effects as it keeps
files open longer than may be expected.

-- Stefan^2.

Re: FSFS format 6

Posted by Johan Corveleyn <jc...@gmail.com>.

On Wed, Dec 29, 2010 at 8:37 PM, Stefan Fuhrmann <eq...@web.de> wrote:
> On 29.12.2010 01:58, Johan Corveleyn wrote:
>>
>> On Sun, Dec 12, 2010 at 4:23 PM, Stefan Fuhrmann
>> <st...@alice-dsl.de> wrote:
>>>
>>> On 19.10.2010 15:10, Daniel Shahaf wrote:
>>>>
>>>> Greg Stein wrote on Tue, Oct 19, 2010 at 04:31:42 -0400:
>>>>>
>>>>> Personally, I see [FSv2] as a broad swath of API changes to align our
>>>>> needs with the underlying storage. Trowbridge noted that our current
>>>>> API makes it *really* difficult to implement an effective backend. I'd
>>>>> also like to see a backend that allows for parallel PUTs during the
>>>>> commit process. Hyrum sees FSv2 as some kind of super-key-value
>>>>> storage with layers on top, allowing for various types of high-scaling
>>>>> mechanisms.
>>>>
>>>> At the retreat, stefan2 also had some thoughts about this...
>>>>
>>> [This is just a brain-dump for 1.8+]
>>>
>>> While working on the performance branch I made some
>>> observations concerning the way FSFS organizes data
>>> and how that could be changed for reduced I/O overhead.
>>>
>>> notes/fsfs-improvements.txt contains a summary of that
>>> could be done to improve FSFS before FS-NG. A later
>>> FS-NG implementation should then still benefit from the
>>> improvements.
>>
>> +(number of fopen calls during a log operation)
>>
>> I like this proposal a lot. As I already told before, we are running
>> our FSFS back-end on a SAN over NFS (and I suspect we're not the only
>> company doing this). In this environment, the server-side I/O of SVN
>> (especially the amount of random reads and fopen calls during e.g.
>> log) is often the major bottleneck.
>>
>> There is one question going around in my head though: won't you have
>> to change/rearrange a lot of the FS layer code (and maybe repos
>> layer?) to benefit from this new format?
>
> Maybe. But as far as I understand the current
> FSFS structure, data access is mainly chasing
> pointers, i.e. reading relative or absolute byte
> offsets and moving there for the next piece of
> information. If everything goes well, none of that
> code needs to change; the revision packing
> algorithm will simply produce different offset
> values.
>>
>> The current code is written in a certain way, not particularly
>> optimized for this new format (I seem to remember "log" does around 10
>> fopen calls for every interesting rev file, each time reading a
>> different part of it). Also, if an operation currently needs to access
>> many revisions (like log or blame), it doesn't take advantage at all
>> of the fact that they might be in a single packed rev file. The pack
>> file is opened and seeked in just as much as the sum of the individual
>> rev files.
>
> The fopen() calls should be eliminated by the
> file handle cache. IOW, they should already be
> addressed on the performance branch. Please
> let me know if that is not the case.

Ok, finally got around to verifying this.

You are completely correct: the performance branch avoids the vast
amount of repeated fopen() calls. With a simple test (testfile with 3
revisions, executing "svn log" of it) (note: this is an unpacked 1.6
repository):

- trunk: opens each rev file between 19 and 21 times.

- performance branch: opens each rev file 2 times.

(I don't know why it's not simply 1 time, but ok, 2 times is already a
factor 10 better than trunk :-)).

I tested this simply by adding one line of printf instrumentation
inside libsvn_subr/io.c#svn_io_file_open (see patch in attachment, as
well as the output for trunk and for perf-branch).

Now, if only that file-handle cache could be merged to trunk :-) ...

Cheers,
-- 
Johan