Posted to dev@subversion.apache.org by Bert Huijben <be...@qqmail.nl> on 2016/01/18 10:47:12 UTC

RE: svn commit: r1725184 - /subversion/branches/1.9.x/STATUS


> -----Original Message-----
> From: stefan2@apache.org [mailto:stefan2@apache.org]
> Sent: maandag 18 januari 2016 09:22
> To: commits@subversion.apache.org
> Subject: svn commit: r1725184 - /subversion/branches/1.9.x/STATUS
> 
> Author: stefan2
> Date: Mon Jan 18 08:22:27 2016
> New Revision: 1725184
> 
> URL: http://svn.apache.org/viewvc?rev=1725184&view=rev
> Log:
> * STATUS:
>   Add r1721285 (raise size limit on cachable directories).
> 

<snip>

> + * r1721285
> +   Quadruple the maximum cacheable directory size in FSFS.
> +   Justification:
> +     Not caching large directories has a massive impact on runtime with
> +     the cutoff point being about 9000 entries for the default cache size.
> +     Also, memory usage with mod_dav_svn may go through the roof for
> +     uncached directories.
> +   Votes:
> +     +1: stefan2

When I look at the patch (and the log message), it looks like a very simple change... but I have absolutely no idea what the side effects are.

Given your descriptions, I would like to see it backported, but can you help us review this change?

Thanks,
	Bert


Re: svn commit: r1725184 - /subversion/branches/1.9.x/STATUS

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Jan 21, 2016 at 01:59:41PM +0100, Stefan Fuhrmann wrote:
> In real world applications this means people just
> hitting the old limit aren't likely to ever hit
> the new limit while those already far beyond it
> have fixed their config by now.

One would hope they'd fix their config. I would bet most people who find
themselves stuck with a performance issue like this just give up and live
with it or decide to migrate away from SVN if they have an alternative.
Few people will turn to the 1.7.x release notes for answers, and many
will not even ask for help on users@.

Making this work out of the box for more people is good!

Re: svn commit: r1725184 - /subversion/branches/1.9.x/STATUS

Posted by Stefan Fuhrmann <st...@apache.org>.
On 21.01.2016 13:32, Stefan Sperling wrote:
> On Thu, Jan 21, 2016 at 01:20:56PM +0100, Stefan Fuhrmann wrote:
>> TL;DR: Change is effective as per attached test case.
>> No crashes should ever result from it.  Some 20% portion
>> of cache memory is guaranteed to work as before which
>> limits potential side effects.
>
> Sounds like this is related to the yellow box here?
> http://subversion.apache.org/docs/release-notes/1.7.html#server-performance-tuning

Yes, it is related.  The recent trigger was that
mod_dav_svn also causes OOM crashes for complicated
reasons once a directory is no longer cachable.

> We recently ran into a case where the default cache settings resulted
> in performance issues during checkout of a tree with large directories,
> so bad that clients just gave up and disconnected while the checkout
> was still running. Configuring the cache made the system usable.
> If this change avoids this problem by default, I like it.

Well, it pushes the limits of the default (to ~4x
the old value).

In real world applications this means people just
hitting the old limit aren't likely to ever hit
the new limit while those already far beyond it
have fixed their config by now.

-- Stefan^2.

Re: svn commit: r1725184 - /subversion/branches/1.9.x/STATUS

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Jan 21, 2016 at 01:20:56PM +0100, Stefan Fuhrmann wrote:
> TL;DR: Change is effective as per attached test case.
> No crashes should ever result from it.  Some 20% portion
> of cache memory is guaranteed to work as before which
> limits potential side effects.

Sounds like this is related to the yellow box here?
http://subversion.apache.org/docs/release-notes/1.7.html#server-performance-tuning

We recently ran into a case where the default cache settings resulted
in performance issues during checkout of a tree with large directories,
so bad that clients just gave up and disconnected while the checkout
was still running. Configuring the cache made the system usable.
If this change avoids this problem by default, I like it.

And thanks for the detailed explanation!

Re: svn commit: r1725184 - /subversion/branches/1.9.x/STATUS

Posted by Stefan Fuhrmann <st...@apache.org>.
On 18.01.2016 10:47, Bert Huijben wrote:
>
>
>> -----Original Message-----
>> From: stefan2@apache.org [mailto:stefan2@apache.org]
>> Sent: maandag 18 januari 2016 09:22
>> To: commits@subversion.apache.org
>> Subject: svn commit: r1725184 - /subversion/branches/1.9.x/STATUS
>>
>> Author: stefan2
>> Date: Mon Jan 18 08:22:27 2016
>> New Revision: 1725184
>>
>> URL: http://svn.apache.org/viewvc?rev=1725184&view=rev
>> Log:
>> * STATUS:
>>    Add r1721285 (raise size limit on cachable directories).
>>
>
> <snip>
>
>> + * r1721285
>> +   Quadruple the maximum cacheable directory size in FSFS.
>> +   Justification:
>> +     Not caching large directories has a massive impact on runtime with
>> +     the cutoff point being about 9000 entries for the default cache size.
>> +     Also, memory usage with mod_dav_svn may go through the roof for
>> +     uncached directories.
>> +   Votes:
>> +     +1: stefan2
>
> When I look at the patch (and the log message), it looks like a very simple change... but I have absolutely no idea what the side effects are.
>
> Given your descriptions, I would like to see it backported, but can you help us review this change?

TL;DR: Change is effective as per attached test case.
No crashes should ever result from it.  Some 20% portion
of cache memory is guaranteed to work as before which
limits potential side effects.

If you have specific questions, I'll be here to answer them.

-- Stefan^2.


What the patch does and doesn't do
----------------------------------
The caching priority for directory objects is being raised
above 'normal'.  That will affect cacheability and eviction.
It does not change cache sizes or serialization formats.


Why does that help?
-------------------
The membuffer cache tries to be fair and more effective
than a simple round-robin buffer.  Since large objects
would evict lots of other data and their hit opportunities,
the cache rejects very large objects.  Its MAX_ENTRY_SIZE
member stores the limit, and select_level() as well as
svn_membuffer_cache_is_cachable() check it.
MAX_ENTRY_SIZE is roughly 1/10th of the cache segment.

If an item has high priority (a property that is stored
at the object-type-specific front-end cache instance),
it can be as large as the whole buffer and still get
accepted.  See the same code locations.  This is the
point behind the change.
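
To make those two checks concrete, here is a minimal
illustrative sketch of the admission decision.  This is not
the actual cache-membuffer.c code; apart from the
MAX_ENTRY_SIZE idea, all names and types below are
simplified, hypothetical stand-ins.

[[[
#include <apr.h>

/* Hypothetical, simplified view of one membuffer segment. */
typedef struct membuffer_segment_t
{
  apr_uint64_t data_size;       /* size of the L1 + L2 data buffer */
  apr_uint64_t max_entry_size;  /* roughly 1/10th of the segment */
} membuffer_segment_t;

/* Hypothetical priority levels; directories move to 'high'. */
typedef enum item_priority_t
{
  priority_low,
  priority_normal,
  priority_high
} item_priority_t;

/* Would an item of 'item_size' bytes be accepted at all? */
static int
is_cachable(const membuffer_segment_t *segment,
            apr_uint64_t item_size,
            item_priority_t priority)
{
  if (priority >= priority_high)
    /* High-priority items may be as large as the whole buffer. */
    return item_size <= segment->data_size;

  /* Everything else is capped at MAX_ENTRY_SIZE. */
  return item_size <= segment->max_entry_size;
}
]]]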


Is there more to it?
--------------------
Sure.  Priorities tell the cache how likely future hits
on that data are - regardless of past hits.  Index data,
for instance, gets above-normal priority while temporary
data has below-normal priority.  Rarely used high prio
data may "clog" the cache.

Therefore, the data buffer gets split into an "L1" and
a large "L2" section.  Access times are identical, only
the management scheme differs similarly to generational
garbage collectors.  L1 is a strict round-robin buffer
and will accept any data and duly record any cache hits.
This gives perfect caching on short time scales.

Upon eviction from L1, an item may get promoted to L2,
depending on size, hits and priority relative to the L2
contents it would replace.  The heuristic is in
ensure_data_insertable_l2().
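
As a rough sketch of the kind of trade-off made at that
point (names and weighting are hypothetical; the real
ensure_data_insertable_l2() heuristic also considers entry
ages and other details):

[[[
#include <apr.h>

/* Hypothetical summary of an entry or group of entries. */
typedef struct cache_entry_stats_t
{
  apr_uint64_t size;        /* bytes occupied in the data buffer */
  apr_uint32_t hit_count;   /* hits recorded while cached */
  apr_uint32_t priority;    /* higher = more valuable to keep */
} cache_entry_stats_t;

/* Should 'candidate' (being evicted from L1) replace the L2
   entries summarized in 'victims'?  Sizes are assumed > 0. */
static int
should_promote_to_l2(const cache_entry_stats_t *candidate,
                     const cache_entry_stats_t *victims)
{
  /* Weight each side by priority and observed hits, per byte of
     buffer space; keep whichever promises more future hits. */
  double candidate_value = (double)candidate->priority
                           * (1 + candidate->hit_count)
                           / (double)candidate->size;
  double victims_value = (double)victims->priority
                         * (1 + victims->hit_count)
                         / (double)victims->size;

  return candidate_value >= victims_value;
}
]]]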

All data is first put into L1 - unless it is deemed too
big.  High-prio data that is too big can be written
directly into L2 following the same rules as a promotion
from L1 to L2.

Also, not all cache memory is used for the data buffer.
20% is allocated to the contents directory.  The remainder
is split 1:3 between L1 and L2.  Hence, L1 is 20% and
L2 is 60% of the memory given to the membuffer.  Larger
caches get divided into multiple segments, so the max.
object sizes don't grow as fast as the total membuffer
size.
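
As a worked example, assume a hypothetical single-segment
64 MB membuffer (segmentation of larger caches is ignored
here); the split described above then comes out as follows:

[[[
#include <stdio.h>

int main(void)
{
  const unsigned long long total = 64ULL * 1024 * 1024;

  unsigned long long directory = total / 5;          /* 20% contents directory */
  unsigned long long data      = total - directory;  /* 80% data buffer */
  unsigned long long l1        = data / 4;           /* 1:3 split -> 20% of total */
  unsigned long long l2        = data - l1;          /* -> 60% of total */
  unsigned long long max_entry = total / 10;         /* ~1/10th of the segment */

  printf("directory: %llu MB\n", directory >> 20);   /* 12 MB */
  printf("L1:        %llu MB\n", l1 >> 20);          /* 12 MB */
  printf("L2:        %llu MB\n", l2 >> 20);          /* 38 MB */
  printf("max entry: %llu MB (normal prio)\n", max_entry >> 20);  /* 6 MB */
  return 0;
}
]]]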


What are possible side effects?
-------------------------------
Because none of the "doing" code is changed and we already
use the "high prio" code path for other data, this patch
will not introduce bugs in the functionality of the tool.

The side effect for directories of up to a few thousand
entries is only that they will be kept in cache longer, meaning
that other data (e.g. node revisions, txdelta, full texts)
will be evicted earlier.  But those have a lower hit rate
anyway (read about once or twice per request vs. 10s to
1000s of times for a directory) and are therefore more
likely to be dropped early even without the prio change.

Moreover, the L1 section of the cache is a "prio-less"
round-robin buffer.  That guarantees that short-term
locality is not affected at all.

Very large directories that are not written into L2 directly
will quickly purge all other data out of cache until the
dir content fits.  Again, that is fine due to the expected
high hit rate of directory info vs. anything with normal
prio or below.  Not rereading a large directory from the
repo should easily offset any loss due to some other data
now being re-read once or twice.  Finally, the L1 still
gives some minimal (25% of the combined L1+L2 data buffer)
cache support
for any of that lower-prio data.

IOW, it might be possible to construct an artificial
request pattern that is now slower than it used to be.  But
all of the standard requests (reporter, editor, ls etc.)
should see a huge net benefit for large dirs and virtually
no change for smaller ones.


How to verify that it is effective?
-----------------------------------
Running the attached test with the COUNT constant set to
8000, 16000 and 30000, respectively, I get debug execution
times for 'eatmydata ./fs-test 65' of 2.3s, 4.7s and 9.3s.
Without the fix, the times are 2.3s, 135s and 599s.  You
may also run 'svn ls -v http://localhost/$repo/' on that
repository; you should then see the memory usage issue in
addition to the performance problem.
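
For reference, a test of roughly this shape reproduces the
scenario.  This is only a sketch against the public svn_fs
API: COUNT, the paths and the read-loop count are made up,
cache configuration and error details are glossed over, and
it is not the attached test case itself.

[[[
#include <stdio.h>
#include <apr_general.h>
#include <apr_strings.h>
#include "svn_error.h"
#include "svn_pools.h"
#include "svn_fs.h"

#define COUNT 16000  /* try 8000, 16000 and 30000 */

static svn_error_t *
run(apr_pool_t *pool)
{
  svn_fs_t *fs;
  svn_fs_txn_t *txn;
  svn_fs_root_t *txn_root, *rev_root;
  svn_revnum_t new_rev;
  const char *conflict;
  apr_hash_t *entries;
  apr_pool_t *iterpool = svn_pool_create(pool);
  int i;

  /* Create a fresh filesystem in the current directory. */
  SVN_ERR(svn_fs_create(&fs, "test-large-dir-fs", NULL, pool));

  /* Commit one directory with COUNT entries. */
  SVN_ERR(svn_fs_begin_txn(&txn, fs, 0, pool));
  SVN_ERR(svn_fs_txn_root(&txn_root, txn, pool));
  SVN_ERR(svn_fs_make_dir(txn_root, "big", pool));
  for (i = 0; i < COUNT; i++)
    {
      const char *path;
      svn_pool_clear(iterpool);
      path = apr_psprintf(iterpool, "big/file-%d", i);
      SVN_ERR(svn_fs_make_file(txn_root, path, iterpool));
    }
  SVN_ERR(svn_fs_commit_txn(&conflict, &new_rev, txn, pool));

  /* Re-read the directory many times.  With the directory cached
     this stays cheap; without it, every read goes back to disk. */
  SVN_ERR(svn_fs_revision_root(&rev_root, fs, new_rev, pool));
  for (i = 0; i < 100; i++)
    {
      svn_pool_clear(iterpool);
      SVN_ERR(svn_fs_dir_entries(&entries, rev_root, "big", iterpool));
    }

  svn_pool_destroy(iterpool);
  return SVN_NO_ERROR;
}

int main(void)
{
  apr_pool_t *pool;
  svn_error_t *err;
  int exit_code = 0;

  apr_initialize();
  pool = svn_pool_create(NULL);
  err = run(pool);
  if (err)
    {
      svn_handle_error2(err, stderr, FALSE, "large-dir-bench: ");
      svn_error_clear(err);
      exit_code = 1;
    }
  svn_pool_destroy(pool);
  apr_terminate();
  return exit_code;
}
]]]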