Posted to dev@subversion.apache.org by Stefan Fuhrmann <st...@wandisco.com> on 2014/12/08 18:42:38 UTC

On pool / memory usage debugging

This post has been prompted by issue 4531 and r1643834
(http://subversion.tigris.org/issues/show_bug.cgi?id=4531
 http://svn.apache.org/viewvc?view=revision&revision=r1643834).
We had a somewhat similar issue with log -v & friends in
summer this year. I have now dug into the APR code and this
is what I found.

The _debug varieties of the APR pool functions use neither
the allocator nor chunky allocation (8k+ at a time) but plain
malloc / free instead, and trace each call individually. That means:

* On 64 bits, each alloc has a 4x8=32 bytes extra overhead
  for typical CRT implementations. Memory usage can be
  expected to double for everything that's not a delta window
  or fulltext in SVN.

* Our 4MB limit of unused memory on allocators is ineffective.
  On platforms that use MMAP in APR allocators, that can be
  a significant difference. Everywhere else, it may or may not be.

* Since we interleave allocations from scratch and result
  pools, we can expect high levels of fragmentation. This will
  most likely prevent memory from being freed after peak usage.
  It may also increase peak memory usage.
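
To put the arithmetic of the first bullet into concrete terms, here is a
minimal C sketch. It assumes the 32-byte per-allocation figure quoted above
and ignores alignment padding; the function names are made up for
illustration and are not part of APR or any CRT.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed per-allocation bookkeeping cost from the post: 4 x 8 = 32 bytes
   on a 64-bit platform. Real CRTs differ; this is only the post's figure. */
#define ASSUMED_OVERHEAD_BYTES ((size_t)(4 * 8))

/* Bytes consumed by one malloc'ed block of PAYLOAD bytes under that
   assumption (alignment padding is ignored for simplicity). */
size_t
debug_alloc_footprint(size_t payload)
{
  return payload + ASSUMED_OVERHEAD_BYTES;
}

/* Total footprint of COUNT separate allocations of PAYLOAD bytes each. */
size_t
debug_total_footprint(size_t payload, size_t count)
{
  /* Guard against overflow in the multiplication below. */
  assert(count == 0 || debug_alloc_footprint(payload) <= (size_t)-1 / count);
  return count * debug_alloc_footprint(payload);
}
```

Under these assumptions a 32-byte request occupies 64 bytes, i.e. twice its
payload, while an 8k block grows by well under one percent. That is why
the doubling mostly hits the many small allocations.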

IOW, pool debugging is nice for tracing allocations but if you
want to measure memory consumption on the OS side, turn
pool debugging off.

APR pools also replace the usually desired malloc / free
symmetry with a pool create / clear|destroy symmetry. The
individual PALLOC / PCALLOC lines in the pool usage trace
will give you context but the things one is interested in are
CREATE, CLEAR and DESTROY lines.
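
A small helper for picking those lines out of a trace could look like the
C sketch below. It only assumes that the event name shows up as a plain
word somewhere in the line; it does not rely on the exact layout of APR's
trace output, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if LINE mentions one of the pool lifecycle events.
   Assumption: the event name appears as a plain word in the trace line;
   the exact layout of the trace output is not relied upon. */
bool
is_pool_lifecycle_line(const char *line)
{
  static const char *const events[] = { "CREATE", "CLEAR", "DESTROY" };
  size_t i;

  assert(line != NULL);
  for (i = 0; i < sizeof(events) / sizeof(events[0]); ++i)
    if (strstr(line, events[i]) != NULL)
      return true;

  return false;
}
```

Feeding each trace line through this predicate drops the PALLOC / PCALLOC
noise and leaves the lines that show pool lifetimes.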

Finally, to minimize cache usage, make sure to disable fulltext
caching as well (enabled by default in 1.9) and set the cache
size to *1*, not 0. The latter would fall back to 1.6-style caches,
keeping a fixed number of objects stored in the svn_fs_t, e.g.
8192 full directories. With ra_serf, a single large dir higher up
in the tree is sufficient to blow that up to 100+ MB because it will
be requested in 100s or 1000s of different revisions.
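
For a mod_dav_svn server, the settings described above might look like the
excerpt below. This is a sketch under assumptions: it takes "cache size" to
mean the SVNInMemoryCacheSize directive (value in kB) and assumes fulltext
caching is controlled by SVNCacheFullTexts. Check the documentation of
your version before relying on it.

```apache
# Hypothetical httpd.conf excerpt for a memory measurement run.
# 1 kB keeps the membuffer cache active but as small as possible;
# 0 would fall back to the 1.6-style caches instead.
SVNInMemoryCacheSize 1

# Assumed directive for switching fulltext caching off.
SVNCacheFullTexts off
```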

Membuffer cache, OTOH, does not allocate memory after its
creation. All cache-related allocations use the scratch and
result pools provided by the caller to (de-)serialize cached
data. The cache memory itself is fixed and objects that are
too large to fit in will simply be ignored.
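
The behaviour described here can be illustrated with a toy model in C. It
is not Subversion's membuffer implementation, and every name in it is made
up. It only shows the two properties mentioned: the storage is fixed once
created, and an item that does not fit is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only; not Subversion's membuffer implementation. */
#define TOY_CACHE_CAPACITY 64

typedef struct toy_cache_t
{
  unsigned char data[TOY_CACHE_CAPACITY]; /* fixed at creation time */
  size_t used;                            /* size of the stored item */
} toy_cache_t;

/* Store ITEM if it fits, replacing the previous entry. Items larger
   than the buffer are ignored and the cache is left unchanged.
   No memory is allocated here. */
bool
toy_cache_set(toy_cache_t *cache, const void *item, size_t size)
{
  assert(cache != NULL && item != NULL);
  if (size > TOY_CACHE_CAPACITY)
    return false;

  memcpy(cache->data, item, size);
  cache->used = size;
  return true;
}
```

A caller that gets false back simply carries on without the cached copy,
which matches the "will simply be ignored" behaviour.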

-- Stefan^2.

Re: On pool / memory usage debugging

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Mon, Dec 8, 2014 at 8:42 PM, Stefan Sperling <st...@elego.de> wrote:

> On Mon, Dec 08, 2014 at 08:30:31PM +0100, Stefan Fuhrmann wrote:
> > > > Finally, to minimize cache usage, make sure to disable fulltext
> > > > caching as well (enabled by default in 1.9) and set the cache
> > > > size to *1*, not 0. The latter would fall back to 1.6-style caches,
> > >
> > > Which option are you referring to? The SVNInMemoryCacheSize option?
> > > The doc for that option says "0 deactivates the cache". Is this an error?
> > >
> > >   /* per server */
> > >   AP_INIT_TAKE1("SVNInMemoryCacheSize", SVNInMemoryCacheSize_cmd, NULL,
> > >                 RSRC_CONF,
> > >                 "specifies the maximum size in kB per process of Subversion's "
> > >                 "in-memory object cache (default value is 16384; 0 deactivates "
> > >                 "the cache)."),
> > >
> >
> > It's technically correct, because the "static" per-process cache does
> > get disabled and all that's left is memory dynamically allocated for
> > an open connection (svn_fs_t, actually).
> >
> > But the doc string is misleading. I would suggest to change it to
> > "0 switches to dynamically sized caches".
>
> Yes please, that would help a lot! I can never remember what the various
> caching knobs do. The clearer the documentation, the better :)
>

Done in r1644035.

-- Stefan^2.

Re: On pool / memory usage debugging

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Dec 08, 2014 at 08:30:31PM +0100, Stefan Fuhrmann wrote:
> Hm. 381 MB are massive, then. Maybe I can reproduce it
> and help track it down with a modified APR.

Yes, we should be able to manage with much less.
Though perhaps what's left is in mod_dav rather than mod_dav_svn.

> > > Finally, to minimize cache usage, make sure to disable fulltext
> > > caching as well (enabled by default in 1.9) and set the cache
> > > size to *1*, not 0. The latter would fall back to 1.6-style caches,
> >
> > Which option are you referring to? The SVNInMemoryCacheSize option?
> > The doc for that option says "0 deactivates the cache". Is this an error?
> >
> >   /* per server */
> >   AP_INIT_TAKE1("SVNInMemoryCacheSize", SVNInMemoryCacheSize_cmd, NULL,
> >                 RSRC_CONF,
> >                 "specifies the maximum size in kB per process of Subversion's "
> >                 "in-memory object cache (default value is 16384; 0 deactivates "
> >                 "the cache)."),
> >
> 
> It's technically correct, because the "static" per-process cache does
> get disabled and all that's left is memory dynamically allocated for
> an open connection (svn_fs_t, actually).
> 
> But the doc string is misleading. I would suggest to change it to
> "0 switches to dynamically sized caches".

Yes please, that would help a lot! I can never remember what the various
caching knobs do. The clearer the documentation, the better :)

Re: On pool / memory usage debugging

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Mon, Dec 8, 2014 at 7:46 PM, Stefan Sperling <st...@elego.de> wrote:

> On Mon, Dec 08, 2014 at 06:42:38PM +0100, Stefan Fuhrmann wrote:
> > IOW, pool debugging is nice for tracing allocations but if you
> > want to measure memory consumption on the OS side, turn
> > pool debugging off.
>
> All measurements I mentioned in the issue were done with pool debugging
> disabled. Measuring memory usage of the issue #4531 copy operation with
> pool debugging enabled was impossible because the copy operation never
> completed in a reasonable amount of time due to pool-debugging-induced
> logging overhead hogging the CPU.
>

Hm. 381 MB are massive, then. Maybe I can reproduce it
and help track it down with a modified APR.


> > Finally, to minimize cache usage, make sure to disable fulltext
> > caching as well (enabled by default in 1.9) and set the cache
> > size to *1*, not 0. The latter would fall back to 1.6-style caches,
>
> Which option are you referring to? The SVNInMemoryCacheSize option?
> The doc for that option says "0 deactivates the cache". Is this an error?
>
>   /* per server */
>   AP_INIT_TAKE1("SVNInMemoryCacheSize", SVNInMemoryCacheSize_cmd, NULL,
>                 RSRC_CONF,
>                 "specifies the maximum size in kB per process of Subversion's "
>                 "in-memory object cache (default value is 16384; 0 deactivates "
>                 "the cache)."),
>

It's technically correct, because the "static" per-process cache does
get disabled and all that's left is memory dynamically allocated for
an open connection (svn_fs_t, actually).

But the doc string is misleading. I would suggest to change it to
"0 switches to dynamically sized caches".

-- Stefan^2.

Re: On pool / memory usage debugging

Posted by Philip Martin <ph...@wandisco.com>.
Stefan Sperling <st...@elego.de> writes:

> Which option are you referring to? The SVNInMemoryCacheSize option?
> The doc for that option says "0 deactivates the cache". Is this an error?

http://subversion.apache.org/docs/release-notes/1.7.html#data-caches

"Please note that a cache size of 0 will deactivate the new caching
 facilities and cause the server to fall back to 1.6 caching
 mechanisms."

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: On pool / memory usage debugging

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Dec 08, 2014 at 06:42:38PM +0100, Stefan Fuhrmann wrote:
> IOW, pool debugging is nice for tracing allocations but if you
> want to measure memory consumption on the OS side, turn
> pool debugging off.

All measurements I mentioned in the issue were done with pool debugging
disabled. Measuring memory usage of the issue #4531 copy operation with
pool debugging enabled was impossible because the copy operation never
completed in a reasonable amount of time due to pool-debugging-induced
logging overhead hogging the CPU.
 
> Finally, to minimize cache usage, make sure to disable fulltext
> caching as well (enabled by default in 1.9) and set the cache
> size to *1*, not 0. The latter would fall back to 1.6-style caches,

Which option are you referring to? The SVNInMemoryCacheSize option?
The doc for that option says "0 deactivates the cache". Is this an error?

  /* per server */
  AP_INIT_TAKE1("SVNInMemoryCacheSize", SVNInMemoryCacheSize_cmd, NULL,
                RSRC_CONF,
                "specifies the maximum size in kB per process of Subversion's "
                "in-memory object cache (default value is 16384; 0 deactivates "
                "the cache)."),

Re: On pool / memory usage debugging

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> I'm still trying to determine why this
> extra walk is present, perhaps the new version is doing an unnecessary
> walk or perhaps the old version has an unfixed bug.

Debian's httpd doesn't have r1497441 which introduced the extra walk for
copies, apparently to fix PR 54610.  This was followed by r1515569 and
r1540728 which changed the checks on the copy source but did not remove
the walk.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: On pool / memory usage debugging

Posted by Philip Martin <ph...@wandisco.com>.
Stefan Sperling <st...@elego.de> writes:

> Should we raise an issue with mod_dav developers? (gstein?)

I suspect any mod_dav fixes are probably going to be written by us.  I
think the fix might be to have dav_validate_request() identify the
conditions that mean dav_validate_walker() will always return NULL,
i.e. a copy with no if header, and skip the walk.
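
As a rough illustration of that check, here is a C sketch. It is not a
patch against mod_dav: the flag constant is a stand-in with an arbitrary
value, the helper name is hypothetical, and the If header is modelled as an
opaque pointer. The condition itself is the one described in this thread
(no-modify flag set and no If header).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for mod_dav's DAV_VALIDATE_NO_MODIFY flag; the value here is
   arbitrary and chosen only for this sketch. */
#define SKETCH_VALIDATE_NO_MODIFY 0x01

/* Hypothetical helper: true when walking the copy source cannot find
   anything to reject, i.e. the resource is not being modified and the
   request carried no If header. */
bool
sketch_can_skip_source_walk(int flags, const void *if_header)
{
  return (flags & SKETCH_VALIDATE_NO_MODIFY) != 0 && if_header == NULL;
}
```

The caller would test this once before starting the walk, so a URL-URL
copy without tokens would not touch the tree at all.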

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: On pool / memory usage debugging

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Dec 08, 2014 at 09:23:10PM +0000, Philip Martin wrote:
> Philip Martin <ph...@wandisco.com> writes:
> 
> > I think this extra walk is pointless most of the time and is just making
> > our nominally O(1) copy operation slower.  Locks on the copy source do
> > not prevent a copy.
> 
> Oops! I meant to write "Locks on the copy source held by somebody else
> do not prevent a copy".
> 
> -- 
> Philip

Thanks for digging up the history. I was wondering too whether this walk
is really needed but ended up assuming it had always been there.

Should we raise an issue with mod_dav developers? (gstein?)

Re: On pool / memory usage debugging

Posted by Philip Martin <ph...@codematters.co.uk>.
Philip Martin <ph...@wandisco.com> writes:

> I think this extra walk is pointless most of the time and is just making
> our nominally O(1) copy operation slower.  Locks on the copy source do
> not prevent a copy.

Oops! I meant to write "Locks on the copy source held by somebody else
do not prevent a copy".

-- 
Philip

Re: On pool / memory usage debugging

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> My pool-debug build uses my own httpd build from 2.2.x@1562432 while my
> normal build used the system httpd, Debian's 2.2.22-13+deb7u3.  It looks
> like some change to mod_dav has added an extra walk over the copy source
> in the pool-debug build and the absence of this walk means there is no
> problem in my normal build.  I'm still trying to determine why this
> extra walk is present, perhaps the new version is doing an unnecessary
> walk or perhaps the old version has an unfixed bug.

I think this extra walk is pointless most of the time and is just making
our nominally O(1) copy operation slower.  Locks on the copy source do
not prevent a copy.  This walk appears to be checking that locks
(ETags?)  supplied by the COPY requests match the copy source.  On a
URL-URL copy a Subversion client will not supply any tokens, so the walk
scans the whole tree to no purpose.  The walk repeatedly calls back into
mod_dav's dav_validate_resource_state() and it always returns NULL as
DAV_VALIDATE_NO_MODIFY is set and if_header is NULL.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: On pool / memory usage debugging

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> The problem is easy to reproduce with pool debugging enabled and the
> patch does reduce memory use, but with a normal build I don't see the
> excessive memory in the first place.

My pool-debug build uses my own httpd build from 2.2.x@1562432 while my
normal build used the system httpd, Debian's 2.2.22-13+deb7u3.  It looks
like some change to mod_dav has added an extra walk over the copy source
in the pool-debug build and the absence of this walk means there is no
problem in my normal build.  I'm still trying to determine why this
extra walk is present, perhaps the new version is doing an unnecessary
walk or perhaps the old version has an unfixed bug.


-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: On pool / memory usage debugging

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Dec 08, 2014 at 07:19:57PM +0000, Philip Martin wrote:
> I have been looking at the proposed 1.8/1.7 backport for issue 4531.
> The problem is easy to reproduce with pool debugging enabled and the
> patch does reduce memory use, but with a normal build I don't see the
> excessive memory in the first place.

That's surprising.

> Was this issue raised in response to a problem observed in a normal
> build?  Can the problem be reproduced in a normal build?  Perhaps the
> large tree produced by the script in the issue is not large enough to
> cause the problem in a normal build?

It was found and reproduced first with a set of CollabNet SVN 1.8 binaries,
both stand-alone and Edge ones, where it was crashing the server which
evidently maxed out memory as displayed by top(1) on Linux.
The original repository was much larger than the sample created by the
gentree script, though, on the order of 40GB in total size. I'm not sure
which repository format exactly.

As mentioned in the issue I can reproduce the problem with a trunk build.
The copy doesn't abort but uses roughly 2GB of memory before it completes.

Re: On pool / memory usage debugging

Posted by Philip Martin <ph...@wandisco.com>.
Stefan Fuhrmann <st...@wandisco.com> writes:

> This post has been prompted by issue 4531 and r1643834
> (http://subversion.tigris.org/issues/show_bug.cgi?id=4531
>  http://svn.apache.org/viewvc?view=revision&revision=r1643834).

> IOW, pool debugging is nice for tracing allocations but if you
> want to measure memory consumption on the OS side, turn
> pool debugging off.

I have been looking at the proposed 1.8/1.7 backport for issue 4531.
The problem is easy to reproduce with pool debugging enabled and the
patch does reduce memory use, but with a normal build I don't see the
excessive memory in the first place.  The runtime of the copy is
similarly affected: it is negligible with a normal build but takes
several seconds when pool debugging is enabled.

Was this issue raised in response to a problem observed in a normal
build?  Can the problem be reproduced in a normal build?  Perhaps the
large tree produced by the script in the issue is not large enough to
cause the problem in a normal build?

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*