You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Ivan Zhakov <iv...@visualsvn.com> on 2012/01/03 10:01:23 UTC

Re: file_handle_cache branch ready for review

On Sun, Dec 18, 2011 at 22:46, Stefan Fuhrmann
<st...@alice-dsl.de> wrote:
> For interested parties,
>
> Work on said branch is done so far and I think
> it is generally ready for being merged into /trunk.
> But it is certainly a good idea to give it some
> detailed review, in particular with regard to side-
> effects on e.g. FSFS recovery code.
>
> What it does in a nutshell:
>
> * to FSFS svn_fs_t, attaches a cache for open file
>  handles to reduce the number of open, lseek,
>  read and close operations
> * svnserve CL and apache config option to set
>  the per-process maximum of cached handles
>  (default: 16)
>
> * use cached file handles for r/o access to revision
>  files in FSFS
> * notify the cache after revision files have been
>  written ("flush" cache entries for these files)
>
> With the repository on a linux RAM disk, it speeds
> up a "cold" export of TSVN /trunk by approx. 10%.
> Slower disks may show larger gains.
>
> If there are no major objections, I would like to merge
> the code into /trunk around mid-January.
>
Hi, Stefan!

As I stated before I'm -1 for merging file-handles cache stuff to trunk:
1. The performance gain is not significant and may vary from
environment to environment.
2. We introduce really dirty hacks. After merging your branch we have
to worry about file handles to keep them in reusable state or do not
forget to notify file handles cache to flush this handle.
3. By design handles are not intended to be cached, underlying objects
should be cached. In your branch your introduce handles to handles.
4. It seems current implementation reuses file handles even error is
occurred when working with the handle. It's potentially dangerous from
my point view.

-- 
Ivan Zhakov

Re: file_handle_cache branch ready for review

Posted by Julian Foad <ju...@btopenworld.com>.
Hyrum K Wright wrote:

> Branko Čibej wrote:
>>  On 09.01.2012 14:56, Hyrum K Wright wrote:
>>>  Until we can change the minimum required version of APR, it just 
>>> isn't  worth the hassle.
>> [...]
>>  For something like the filehandle cache, which is not a functional
>>  requirement, we can then use it if APR has it, or just not use it if it
>>  doesn't.
> 
> Sure, and that jives with what we've done in the past.  We can also
> eventually bump the minimum APR requirements so that we can guarantee
> the APR implementation exists.

+1 to all this.  I take your point about maintaining two versions; in the past we've mainly (only?) done this with very small additions to APR.  And I overlooked the fact that this (file handle caching) can be an optional feature.

- Julian

Re: file_handle_cache branch ready for review

Posted by Hyrum K Wright <hy...@wandisco.com>.
On Mon, Jan 9, 2012 at 8:37 AM, Branko Čibej <br...@apache.org> wrote:
> On 09.01.2012 14:56, Hyrum K Wright wrote:
>> Until we can change the minimum required version of APR, it just isn't
>> worth the hassle. -Hyrum
>
> We can change the minimum required version of APR any time we want,
> really. Our API versioning guidelines aren't /that/ set in stone. Sure,
> we'd have to announce that we plan to stop supporting apr-0.9.x long
> enough in advance, but since we already "support" two different ABIs,
> that's the same as saying we picked one ABI over another.
>
> Once we're down that road, we can pick, e.g., 1.3 instead of the latest
> release, and go with that.

I would *love* to do this, and have been arguing for it for years.  I
always get rebuffed by the "0.9 isn't ABI-compatible with 1.x, so we'd
have to go 2.0" crowd.  If we can reach consensus to finally do this,
I'm happy to help work out the details.

> For something like the filehandle cache, which is not a functional
> requirement, we can then use it if APR has it, or just not use it if it
> doesn't.

Sure, and that jives with what we've done in the past.  We can also
eventually bump the minimum APR requirements so that we can guarantee
the APR implementation exists.

-Hyrum


-- 

uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/

Re: file_handle_cache branch ready for review

Posted by Branko Čibej <br...@apache.org>.
On 09.01.2012 14:56, Hyrum K Wright wrote:
> Until we can change the minimum required version of APR, it just isn't
> worth the hassle. -Hyrum 

We can change the minimum required version of APR any time we want,
really. Our API versioning guidelines aren't /that/ set in stone. Sure,
we'd have to announce that we plan to stop supporting apr-0.9.x long
enough in advance, but since we already "support" two different ABIs,
that's the same as saying we picked one ABI over another.

Once we're down that road, we can pick, e.g., 1.3 instead of the latest
release, and go with that.

For something like the filehandle cache, which is not a functional
requirement, we can then use it if APR has it, or just not use it if it
doesn't.

-- Brane

Re: file_handle_cache branch ready for review

Posted by Hyrum K Wright <hy...@wandisco.com>.
On Mon, Jan 9, 2012 at 2:55 AM, Julian Foad <ju...@btopenworld.com> wrote:
> Stefan Fuhrmann wrote:
>
>> Ivan Zhakov wrote:
>>>    In your branch your introduce handles to handles.
>>> APR does the same. Is that unreasonable?
>>>  4. It seems current implementation reuses file handles even error is
>>>  occurred when working with the handle. It's potentially dangerous from
>>>  my point view.
>>>
>> Interesting point. But I think it will not cause issues in
>> our case because we use it only within a FSFS "session".
>> Any I/O error should result in a failure of the respective FS
>> operation, in turn freeing the whole cache instance.
>>
>> The whole problem with I/O improvement is that there
>> is probably no other way to make progress in the next
>> 3 years: Some of it could be addressed within APR,
>> but that won't be available for Apache < 2.4 (2.6?).
>> A better solution is FSv2 but that is at least 3 years
>> away from now.
>
> It is worth considering putting this kind of work into the APR source code.  We have close links with APR: some of the APR developers are Subversion developers.  Historically, what we have done before when developing an improvement that is best suited to being in APR, is:
>
>   * write the new code in Subversion first;
>   * port the new code to APR;
>   * make Subversion use the APR version if it is available, otherwise the Subversion version of the code;
>   * test Subversion using both versions of the code;
>   * submit a patch to the APR project;
>   * wait a few years;
>   * strip out Subversion's own copy of the code.

While I'm all for moving things upstream to APR--I've done it before
myself!--the end result is usually less than optimal.

Realistically, until we allow ourselves to move off of the APR 0.9
series, we still have to haul around our own private implementation of
whatever it is we ported upstream.  This leads to all the standard
issues with maintaining two implementations, and usually isn't worth
the effort.  Until we can change the minimum required version of APR,
it just isn't worth the hassle.

-Hyrum

-- 

uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/

Re: file_handle_cache branch ready for review

Posted by Julian Foad <ju...@btopenworld.com>.
Stefan Fuhrmann wrote:

> Ivan Zhakov wrote:
>>    In your branch your introduce handles to handles.
>> APR does the same. Is that unreasonable?
>>  4. It seems current implementation reuses file handles even error is
>>  occurred when working with the handle. It's potentially dangerous from
>>  my point view.
>> 
> Interesting point. But I think it will not cause issues in
> our case because we use it only within a FSFS "session".
> Any I/O error should result in a failure of the respective FS
> operation, in turn freeing the whole cache instance.
> 
> The whole problem with I/O improvement is that there
> is probably no other way to make progress in the next
> 3 years: Some of it could be addressed within APR,
> but that won't be available for Apache < 2.4 (2.6?).
> A better solution is FSv2 but that is at least 3 years
> away from now.

It is worth considering putting this kind of work into the APR source code.  We have close links with APR: some of the APR developers are Subversion developers.  Historically, what we have done before when developing an improvement that is best suited to being in APR, is:

  * write the new code in Subversion first;
  * port the new code to APR;
  * make Subversion use the APR version if it is available, otherwise the Subversion version of the code;
  * test Subversion using both versions of the code;
  * submit a patch to the APR project;
  * wait a few years;
  * strip out Subversion's own copy of the code.

- Julian

Re: file_handle_cache branch ready for review

Posted by Stefan Fuhrmann <st...@alice-dsl.de>.
On 03.01.2012 10:01, Ivan Zhakov wrote:
> On Sun, Dec 18, 2011 at 22:46, Stefan Fuhrmann
> <st...@alice-dsl.de>  wrote:
>> For interested parties,
>>
>> Work on said branch is done so far and I think
>> it is generally ready for being merged into /trunk.
>> But it is certainly a good idea to give it some
>> detailed review, in particular with regard to side-
>> effects on e.g. FSFS recovery code.
>>
>> What it does in a nutshell:
>>
>> * to FSFS svn_fs_t, attaches a cache for open file
>>   handles to reduce the number of open, lseek,
>>   read and close operations
>> * svnserve CL and apache config option to set
>>   the per-process maximum of cached handles
>>   (default: 16)
>>
>> * use cached file handles for r/o access to revision
>>   files in FSFS
>> * notify the cache after revision files have been
>>   written ("flush" cache entries for these files)
>>
>> With the repository on a linux RAM disk, it speeds
>> up a "cold" export of TSVN /trunk by approx. 10%.
>> Slower disks may show larger gains.
>>
>> If there are no major objections, I would like to merge
>> the code into /trunk around mid-January.
>>
> Hi, Stefan!
>
> As I stated before I'm -1 for merging file-handles cache stuff to trunk:
> 1. The performance gain is not significant and may vary from
> environment to environment.

What would constitute a significant improvement?
Keep in mind that these measurements did not
involve any physical disk access or even things
like a SAN or even repositories on NFS shares.

 From what I have seen, the patch eliminates up to
~75% of I/O activity.
> 2. We introduce really dirty hacks. After merging your branch we have
> to worry about file handles to keep them in reusable state or do not
> forget to notify file handles cache to flush this handle.

That is indeed an issue. I think I can rework the code
to disable the cache as soon as a transaction starts.
The cache is mainly effective in read requests.
> 3. By design handles are not intended to be cached, underlying objects
> should be cached.
What would those underlying objects be? File buffers,
position of the file pointer? Well, that is *exactly* what
the cache does.

In fact, most of the performance improvement stems from
intelligently aligning and re-using existing data buffers
instead of re-reading the same data over and over.

>   In your branch your introduce handles to handles.
APR does the same. Is that unreasonable?
> 4. It seems current implementation reuses file handles even error is
> occurred when working with the handle. It's potentially dangerous from
> my point view.
>
Interesting point. But I think it will not cause issues in
our case because we use it only within a FSFS "session".
Any I/O error should result in a failure of the respective FS
operation, in turn freeing the whole cache instance.

The whole problem with I/O improvement is that there
is probably no other way to make progress in the next
3 years: Some of it could be addressed within APR,
but that won't be available for Apache < 2.4 (2.6?).
A better solution is FSv2 but that is at least 3 years
away from now.

-- Stefan^2.