You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Ivan Zhakov <iv...@visualsvn.com> on 2016/02/17 15:33:21 UTC

Re: svn commit: r1730617 - /subversion/trunk/subversion/libsvn_repos/log.c

On 16 February 2016 at 00:47,  <st...@apache.org> wrote:
> Author: stefan2
> Date: Mon Feb 15 21:47:00 2016
> New Revision: 1730617
>
> URL: http://svn.apache.org/viewvc?rev=1730617&view=rev
> Log:
> Continue work on the svn_repos_get_logs4 to svn_repos_get_logs5 migration:
> Switch the last svn_fs_paths_changed2 call to svn_fs_paths_changed3.
>
> * subversion/libsvn_repos/log.c
>   (fs_mergeinfo_changed): No longer fetch the whole changes list.  However,
>                           we need to iterate twice for best total performance
>                           and we need to minimize FS iterator lifetimes.
>

It seems that I would be -1 against this particular change. In the
current implementation the svn_fs_paths_changed3() is called twice
that in the worst case will lead to *double read from disk*.

As far as I understand you're relying to the fact that the second call
will hit the FSFS/FSX cache. But there will be a significant
performance degradation comparing to the 1.9 implementation in the
case of cache miss.

As we are adding more and more of such code, more and more users
become faced with the significant performance regression (see [1] and
other cases).

Do you intend to resolve this problem in the future commits? I have
some obvious solutions in mind, but maybe you also know something
about this.

[1] http://svn.haxx.se/users/archive-2015-12/0060.shtml

-- 
Ivan Zhakov

Re: svn commit: r1730617 - /subversion/trunk/subversion/libsvn_repos/log.c

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Feb 17, 2016 at 05:33:21PM +0300, Ivan Zhakov wrote:
> As we are adding more and more of such code, more and more users
> become faced with the significant performance regression (see [1] and
> other cases).
> 
> Do you intend to resolve this problem in the future commits? I have
> some obvious solutions in mind, but maybe you also know something
> about this.
> 
> [1] http://svn.haxx.se/users/archive-2015-12/0060.shtml

As I understand it, that's a misconfigured server, not a bug, at least
in terms of where our configuration guidelines stand right now.

And as I mentioned in that thread, I'd prefer if we could fail more
gracefully in such situations. But I also advocated for a small default
cache size to keep svn working in low-memory environments by default.
Just growing the cache by default is not really an option I'd like.
But perhaps we should reconsider raising the default size to something like
1GB in the next patch release, and have the server fail to start up in
low-memory environments rather than failing during normal operation in
standard servers.

In the long term, could we try to make the cache tune itself at runtime
instead of forcing users to specify a size in config files or command line?
Is this really a setting we want users to have to worry about?
We've always been trying to avoid configuration knobs, and cache size is a
big warty knob people have to tune on virtually every SVN server instance
out there. If we could automate this somehow I believe we would see a lot
of benefits.

Re: svn commit: r1730617 - /subversion/trunk/subversion/libsvn_repos/log.c

Posted by Stefan Fuhrmann <st...@alice-dsl.de>.
On 17.02.2016 15:33, Ivan Zhakov wrote:
> On 16 February 2016 at 00:47,  <st...@apache.org> wrote:
>> Author: stefan2
>> Date: Mon Feb 15 21:47:00 2016
>> New Revision: 1730617
>>
>> URL: http://svn.apache.org/viewvc?rev=1730617&view=rev
>> Log:
>> Continue work on the svn_repos_get_logs4 to svn_repos_get_logs5 migration:
>> Switch the last svn_fs_paths_changed2 call to svn_fs_paths_changed3.
>>
>> * subversion/libsvn_repos/log.c
>>    (fs_mergeinfo_changed): No longer fetch the whole changes list.  However,
>>                            we need to iterate twice for best total performance
>>                            and we need to minimize FS iterator lifetimes.
>>
>
> It seems that I would be -1 against this particular change. In the
> current implementation the svn_fs_paths_changed3() is called twice
> that in the worst case will lead to *double read from disk*.

Pick your worst-case behaviour:

(1) Crash the server with OOM.
(2) For each change in a revision, perform a 1-step history lookup
     (random I/O, about 2x as much data to read as with (3)).
(3) Read a linear file section twice.

I went with option (3).  What is your preference?

> As far as I understand you're relying to the fact that the second call
> will hit the FSFS/FSX cache. But there will be a significant
> performance degradation comparing to the 1.9 implementation in the
> case of cache miss.

On a system without OS-side file cache, 'log -g'
on the repository root would take at most twice
as long as in 1.9.  Other operations are not
affected.

Running it on any other directory will actually
be faster in many cases with 1.10 because the new
history traversal code no longer reconstructs all
directories up the tree for each revision.

So, hardly anything people will complain about.

> As we are adding more and more of such code, more and more users
> become faced with the significant performance regression (see [1] and
> other cases).

If you consider [1] a significant performance regression,
please follow up on it and review the two relevant
backports for 1.9.

> Do you intend to resolve this problem in the future commits? I have
> some obvious solutions in mind, but maybe you also know something
> about this.

The only reasonable alternative is to pick option
(2) and hope that the performance regression in
real-life is acceptable.

> [1] http://svn.haxx.se/users/archive-2015-12/0060.shtml

-- Stefan^2.