Posted to dev@subversion.apache.org by Garrett Rooney <ro...@electricjellyfish.net> on 2006/02/08 21:51:20 UTC

svn_repos_get_logs3 and unbounded memory use

So I'm looking at some svn-related httpd crashes we've been seeing on
svn.apache.org, and it turns out that the root cause seems to be
unbounded memory use in svn_repos_get_logs3.  We construct a history
object for each path, and walk backwards through them until we've
either run out of history or sent "limit" revisions back to the user.

Unfortunately, this means we actually keep two pools and a history
object per path open, which for large numbers of paths (around 1200 in
the crashes I'm seeing) results in a LOT of memory being used (about
512 megs in this case, at which point the process crashes because it
hits its per-process memory limit).

I'm starting to think we might need some sort of system where we only
open a fixed number of history objects (and thus only need a fixed
number of scratch pools around for iterating through history) at a
time.  We'd still need to track some info on a per-path basis (i.e.
what location in history we were at when we last had that node's
history opened), but that would be far less data.  More troubling would be the
performance hit of recreating the history objects lazily, but it's
certainly better to be slow than it is to use unbounded memory.
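
To make that concrete, here's a rough sketch of what the per-path
bookkeeping in subversion/libsvn_repos/log.c might shrink to once the
history object and its pools go away (the field names are guesses,
not actual code):

struct path_info
{
  svn_stringbuf_t *path;      /* where this node lived when we last
                                 reported it */
  svn_revnum_t history_rev;   /* revision of the last location we
                                 reported for this path */
  svn_boolean_t done;         /* TRUE once this path's history is
                                 exhausted */
  svn_boolean_t first_time;   /* TRUE until the history has been
                                 opened and advanced once */
};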

Additionally, it seems like we can get away with not opening a
revision root per path, since they are all rooted at the same revision
anyway.  This change at least reduces our per-path overhead a bit, and
basic tests seem to indicate that it's safe.  Is there any reason we
can't do that (see attached patch)?
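
For illustration, the change amounts to hoisting the root out of the
per-path setup, roughly like this (variable names invented here; the
real function gets the fs and the start revision from its arguments):

/* Open one revision root up front and share it across all paths,
   instead of calling svn_fs_revision_root() once per path. */
svn_fs_root_t *root;
int i;

SVN_ERR(svn_fs_revision_root(&root, fs, hist_start, pool));

for (i = 0; i < paths->nelts; i++)
  {
    const char *path = APR_ARRAY_IDX(paths, i, const char *);
    svn_fs_history_t *hist;

    SVN_ERR(svn_fs_node_history(&hist, root, path, pool));
    /* ... walk this path's history as before ... */
  }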

Thoughts?

-garrett

Re: svn_repos_get_logs3 and unbounded memory use

Posted by Julian Foad <ju...@btopenworld.com>.
Garrett Rooney wrote:
> On 2/9/06, Julian Foad <ju...@btopenworld.com> wrote:
> 
>>Sorry to jump in with what may be just an ignorant question, but could you
>>briefly explain whether such a large number of paths is passed in common
>>everyday scenarios (and, if so, why) or just in some extreme use cases like
>>"svn log */*/*.c"?
> 
> I was able to reproduce it via 'svn log URL --targets FILE', where
> FILE contains a list of paths within URL.  I have no idea what's
> actually causing this in practice, but I have seen numerous crashes
> that were caused by this on svn.apache.org, so something's doing it. 
> So if I had to guess, I'd say it's an extreme edge case, but it does
> actually occur.

OK, thanks.

> Regardless of how easy it is to reproduce this problem with the
> command line tools, the fact that it can happen at all via the
> protocol/API requires that we fix it.

Right.  Fully agreed that we should fix this here, but I also wonder if there's 
an inefficient caller that we should be looking for, where we might make even 
bigger gains by avoiding this scenario.  Maybe there isn't, but ...

- Julian

Re: svn_repos_get_logs3 and unbounded memory use

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 2/9/06, Julian Foad <ju...@btopenworld.com> wrote:
> Garrett Rooney wrote:
> > On 2/8/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> >>On 2/8/06, C. Michael Pilato <cm...@collab.net> wrote:
> >>
> >>>Hrm.  So, the additional calls to svn_fs_node_history() simply populate
> >>>a structure.  But you'd have to call svn_fs_history_prev() twice, once
> >>>to get to the history location you reported in your last iteration, and
> >>>then again to reach the next new and interesting location.  That could
> >>>really get costly in terms of performance, I imagine.  Especially over
>>>the likes of 1200 paths.  Still, memory is a fixed resource; time not so
> >>>much.
>
> [...]
> > I'm not sure if this is the kind of approach we want to take (it would
> > be really nice if a call with a more reasonable number of paths could
> > go faster by keeping the histories open, for example), but I imagine
> > we're going to need to go down this path at some point unless we want
> > to put a hard cap on the total number of paths you're allowed to pass
> > to log.
>
> Sorry to jump in with what may be just an ignorant question, but could you
> briefly explain whether such a large number of paths is passed in common
> everyday scenarios (and, if so, why) or just in some extreme use cases like
> "svn log */*/*.c"?

I was able to reproduce it via 'svn log URL --targets FILE', where
FILE contains a list of paths within URL.  I have no idea what's
actually causing this in practice, but I have seen numerous crashes
that were caused by this on svn.apache.org, so something's doing it. 
So if I had to guess, I'd say it's an extreme edge case, but it does
actually occur.

Regardless of how easy it is to reproduce this problem with the
command line tools, the fact that it can happen at all via the
protocol/API requires that we fix it.

-garrett

Re: svn_repos_get_logs3 and unbounded memory use

Posted by Julian Foad <ju...@btopenworld.com>.
Garrett Rooney wrote:
> On 2/8/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
>>On 2/8/06, C. Michael Pilato <cm...@collab.net> wrote:
>>
>>>Hrm.  So, the additional calls to svn_fs_node_history() simply populate
>>>a structure.  But you'd have to call svn_fs_history_prev() twice, once
>>>to get to the history location you reported in your last iteration, and
>>>then again to reach the next new and interesting location.  That could
>>>really get costly in terms of performance, I imagine.  Especially over
>>>the likes of 1200 paths.  Still, memory is a fixed resource; time not so
>>>much.

[...]
> I'm not sure if this is the kind of approach we want to take (it would
> be really nice if a call with a more reasonable number of paths could
> go faster by keeping the histories open, for example), but I imagine
> we're going to need to go down this path at some point unless we want
> to put a hard cap on the total number of paths you're allowed to pass
> to log.

Sorry to jump in with what may be just an ignorant question, but could you 
briefly explain whether such a large number of paths is passed in common 
everyday scenarios (and, if so, why) or just in some extreme use cases like 
"svn log */*/*.c"?

Thanks.

- Julian

Re: svn_repos_get_logs3 and unbounded memory use

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 2/8/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:

> I'm not sure if this is the kind of approach we want to take (it would
> be really nice if a call with a more reasonable number of paths could
> go faster by keeping the histories open, for example), but I imagine
> we're going to need to go down this path at some point unless we want
> to put a hard cap on the total number of paths you're allowed to pass
> to log.

I just committed (in r18404) a hybrid implementation that uses a
combination of the two approaches.  The first N paths (where N is
currently set at 32) get a history object that remains open for the
whole request; any paths after that get their history opened as
needed, which is slower but keeps memory usage down.
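
In case it saves anyone a trip through the diff, the gist is
something like this (MAX_OPEN_HISTORIES and the field names here are
illustrative only; see r18404 for the real thing):

/* The first MAX_OPEN_HISTORIES paths keep a persistent history
   object; the rest get reopened on demand by get_history(). */
#define MAX_OPEN_HISTORIES 32

if (i < MAX_OPEN_HISTORIES)
  {
    /* Open once; this history (and its scratch pools) stays around
       for the whole request. */
    SVN_ERR(svn_fs_node_history(&info->hist, root, info->path->data,
                                pool));
  }
else
  {
    /* Keep nothing open; get_history() will reopen and re-advance
       this path's history each time it's needed. */
    info->hist = NULL;
  }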

-garrett

Re: svn_repos_get_logs3 and unbounded memory use

Posted by Stefan Küng <to...@gmail.com>.
C. Michael Pilato wrote:
> Garrett Rooney wrote:
>> So, just to see what the impact would be like, I implemented this the
>> "easy" way, not keeping any history objects open at all, just doing
>> the open/prev/prev thing each time we need to move back.  On my test
>> case (which runs log on about 1200 paths in a repository that's got
>> 867 revisions in it) this results in about a 20% speed hit, with max
>> memory usage (via the oh-so-scientific "look at the output of top"
>> method of measuring) capping out at around 3 megs, as opposed to about
>> 250 megs the old way of doing things.
> 
> As I noted on the phone a few minutes ago, my opinion runs thusly:  if
> typical real-world use-cases (say, < 10 paths passed to 'svn log') don't
> suffer noticeably from this approach, then the fix is fine with me -- it
> solves memory problems for all use-cases, even if the obnoxious cases
> suffer a 20% hit in speed.  At least the operations (eventually)
> complete successfully.

I remember that one Subversion UI client (I don't remember which one
exactly) passes multiple paths/URLs to svn_client_log3(), even if the
user only wants the log for one file/folder.  It does this to get the
additional information on what happened to the same file/folder on all
tags/branches.

So I guess that if there are many branches and tags, the number of
passed paths will grow too (probably way beyond 10).

Not that I'm objecting to your approach here; I just wanted to tell
you why/where/how this can easily happen.

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.tigris.org

Re: svn_repos_get_logs3 and unbounded memory use

Posted by "C. Michael Pilato" <cm...@collab.net>.
Garrett Rooney wrote:
> So, just to see what the impact would be like, I implemented this the
> "easy" way, not keeping any history objects open at all, just doing
> the open/prev/prev thing each time we need to move back.  On my test
> case (which runs log on about 1200 paths in a repository that's got
> 867 revisions in it) this results in about a 20% speed hit, with max
> memory usage (via the oh-so-scientific "look at the output of top"
> method of measuring) capping out at around 3 megs, as opposed to about
> 250 megs the old way of doing things.

As I noted on the phone a few minutes ago, my opinion runs thusly:  if
typical real-world use-cases (say, < 10 paths passed to 'svn log') don't
suffer noticeably from this approach, then the fix is fine with me -- it
solves memory problems for all use-cases, even if the obnoxious cases
suffer a 20% hit in speed.  At least the operations (eventually)
complete successfully.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Re: svn_repos_get_logs3 and unbounded memory use

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 2/8/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> On 2/8/06, C. Michael Pilato <cm...@collab.net> wrote:
> > Hrm.  So, the additional calls to svn_fs_node_history() simply populate
> > a structure.  But you'd have to call svn_fs_history_prev() twice, once
> > to get to the history location you reported in your last iteration, and
> > then again to reach the next new and interesting location.  That could
> > really get costly in terms of performance, I imagine.  Especially over
> > the likes of 1200 paths.  Still, memory is a fixed resource; time not so
> > much.
>
> The worst part is that because we need to check ALL the histories
> each time through the loop, we are stuck doing that open/prev/prev
> dance N revs times M paths, for the histories we can't keep open
> all the time.  So as soon as you go
> over this magical limit you'll hit a wall and performance goes to
> hell.  Unless I'm missing something here...
>

So, just to see what the impact would be like, I implemented this the
"easy" way, not keeping any history objects open at all, just doing
the open/prev/prev thing each time we need to move back.  On my test
case (which runs log on about 1200 paths in a repository that's got
867 revisions in it) this results in about a 20% speed hit, with max
memory usage (via the oh-so-scientific "look at the output of top"
method of measuring) capping out at around 3 megs, as opposed to about
250 megs the old way of doing things.
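
For the curious, the reopen-and-advance step works roughly like this
(simplified; the real code also threads through strict_node_history
rather than hard-coding cross_copies, and 'info' is the per-path
bookkeeping sketched earlier in the thread):

static svn_error_t *
get_history(struct path_info *info,
            svn_fs_t *fs,
            apr_pool_t *pool)
{
  svn_fs_root_t *history_root;
  svn_fs_history_t *hist;
  const char *path;
  svn_revnum_t rev;

  /* Root a fresh history object at the location we last reported. */
  SVN_ERR(svn_fs_revision_root(&history_root, fs, info->history_rev,
                               pool));
  SVN_ERR(svn_fs_node_history(&hist, history_root, info->path->data,
                              pool));

  /* First prev: step back (at most) to the location we reported in
     the previous iteration. */
  SVN_ERR(svn_fs_history_prev(&hist, hist, TRUE, pool));

  /* Second prev: step on to the next new and interesting location,
     except the first time through for this path, when the first
     prev already landed where we want to be. */
  if (hist && ! info->first_time)
    SVN_ERR(svn_fs_history_prev(&hist, hist, TRUE, pool));

  info->first_time = FALSE;

  if (! hist)
    {
      info->done = TRUE;
      return SVN_NO_ERROR;
    }

  /* Remember where we ended up, so we can catch up again later. */
  SVN_ERR(svn_fs_history_location(&path, &rev, hist, pool));
  svn_stringbuf_set(info->path, path);
  info->history_rev = rev;

  return SVN_NO_ERROR;
}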

I'm not sure if this is the kind of approach we want to take (it would
be really nice if a call with a more reasonable number of paths could
go faster by keeping the histories open, for example), but I imagine
we're going to need to go down this path at some point unless we want
to put a hard cap on the total number of paths you're allowed to pass
to log.

Thoughts?

-garrett

[[[
Switch svn_repos_get_logs3 to a far less memory-intensive implementation.
Instead of keeping each history object open all the time we now open them
as needed, which burns more CPU, but keeps us from using all available
memory.

* subversion/libsvn_repos/log.c
  (path_info): Remove the hist, newpool, and oldpool members, make path
   into a stringbuf and add two booleans, done and first_time.
  (get_history): Take a pool.  Open the history object each time through
   and advance it to the proper location.
  (check_history): Take a pool, update for changes to get_history.
  (next_history_rev): Account for new way to tell if we're done.
  (svn_repos_get_logs3): Handle changes to path_info and get_history.
]]]

Re: svn_repos_get_logs3 and unbounded memory use

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 2/8/06, C. Michael Pilato <cm...@collab.net> wrote:
> Garrett Rooney wrote:
> > So I'm looking at some svn-related httpd crashes we've been seeing on
> > svn.apache.org, and it turns out that the root cause seems to be
> > unbounded memory use in svn_repos_get_logs3.  We construct a history
> > object for each path, and walk backwards through them until we've
> > either run out of history or sent "limit" revisions back to the user.
> >
> > Unfortunately, this means we actually keep two pools and a history
> > object per path open, which for large numbers of paths (around 1200 in
> > the crashes I'm seeing) results in a LOT of memory being used (about
> > 512 megs in this case, at which point the process crashes because it
> > hits its per-process memory limit).
>
> Ouch.

Yeah, tell me about it.

> > I'm starting to think we might need some sort of system where we only
> > open a fixed number of history objects (and thus only need a fixed
> > number of scratch pools around for iterating through history) at a
> > time.  We'd still need to track some info on a per-path basis (i.e.
> > what location in history we were at when we last had that node's
> > history opened), but that would be far less data.  More troubling would be the
> > performance hit of recreating the history objects lazily, but it's
> > certainly better to be slow than it is to use unbounded memory.
>
> Hrm.  So, the additional calls to svn_fs_node_history() simply populate
> a structure.  But you'd have to call svn_fs_history_prev() twice, once
> to get to the history location you reported in your last iteration, and
> then again to reach the next new and interesting location.  That could
> really get costly in terms of performance, I imagine.  Especially over
> the likes of 1200 paths.  Still, memory is a fixed resource; time not so
> much.

The worst part is that because we need to check ALL the histories each
time through the loop, we are stuck doing that open/prev/prev dance N
revs times M paths, for the histories we can't keep open all the
time.  So as soon as you go
over this magical limit you'll hit a wall and performance goes to
hell.  Unless I'm missing something here...
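
(Back-of-the-envelope: in the crashing case that's on the order of
1200 paths times 867 revisions, i.e. around a million open/prev/prev
rounds in the worst case.)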

> > Additionally, it seems like we can get away with not opening a
> > revision root per path, since they are all rooted at the same revision
> > anyway.
>
> Yep.  +1 on that part for sure.

I'll clean that up and commit it.

-garrett

Re: svn_repos_get_logs3 and unbounded memory use

Posted by "C. Michael Pilato" <cm...@collab.net>.
Garrett Rooney wrote:
> So I'm looking at some svn-related httpd crashes we've been seeing on
> svn.apache.org, and it turns out that the root cause seems to be
> unbounded memory use in svn_repos_get_logs3.  We construct a history
> object for each path, and walk backwards through them until we've
> either run out of history or sent "limit" revisions back to the user.
> 
> Unfortunately, this means we actually keep two pools and a history
> object per path open, which for large numbers of paths (around 1200 in
> the crashes I'm seeing) results in a LOT of memory being used (about
> 512 megs in this case, at which point the process crashes because it
> hits its per-process memory limit).

Ouch.

> I'm starting to think we might need some sort of system where we only
> open a fixed number of history objects (and thus only need a fixed
> number of scratch pools around for iterating through history) at a
> time.  We'd still need to track some info on a per-path basis (i.e.
> what location in history we were at when we last had that node's
> history opened), but that would be far less data.  More troubling would be the
> performance hit of recreating the history objects lazily, but it's
> certainly better to be slow than it is to use unbounded memory.

Hrm.  So, the additional calls to svn_fs_node_history() simply populate
a structure.  But you'd have to call svn_fs_history_prev() twice, once
to get to the history location you reported in your last iteration, and
then again to reach the next new and interesting location.  That could
really get costly in terms of performance, I imagine.  Especially over
the likes of 1200 paths.  Still, memory is a fixed resource; time not so
much.

> Additionally, it seems like we can get away with not opening a
> revision root per path, since they are all rooted at the same revision
> anyway.  

Yep.  +1 on that part for sure.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand