Posted to dev@subversion.apache.org by "Peter N. Lundblad" <pe...@famlundblad.se> on 2005/02/14 13:34:51 UTC

[PATCH] FSFS speedup (was: Re: Making blame even faster)

On Sun, 13 Feb 2005, Daniel Berlin wrote:

>
> > It depends on your repository, but probably yes.
> >
> Just to followup:
>
> Brane's timings on windows showed that for the gcc combine.c file (which
> had ~1500 revisions, spread out across branches, etc), we had
>
> blame on an bdb repo with vdelta: 40 seconds
> blame on an bdb repo with xdelta: 10 seconds
>
> blame on an fsfs repo with vdelta: 108 seconds
> blame on an fsfs repo with xdelta: 105 seconds
>
>
> To fully take advantage of the xdelta speedup on fsfs, you need to
> implement greg hudson's suggestion of making sure the vdelta rep against
> empty is *not* combined with other windows, or be willing to give up the
> first rep being compressed (IE it will store one fulltext in the repo).
>
OK. Here we go. On my tests it is a big speedup. I tested on FSFS with and
without xdelta when the dump was loaded. It seems like we win a lot even
if vdelta was used to create the repo. Anyway, feel free to fill in the
missing row above:-)

(BTW, while there, I replaced apr_pcalloc with apr_palloc. That in fact
gave us 0.5 seconds or so. That's nothing, but there's no reason to change
it back.)

Regards,
//Peter

Re: [PATCH] FSFS speedup (was: Re: Making blame even faster)

Posted by Daniel Berlin <db...@dberlin.org>.
On Sat, 2005-02-26 at 13:09 -0500, Daniel Berlin wrote:
> On Sat, 2005-02-26 at 12:43 -0500, Greg Hudson wrote:
> > On Tue, 2005-02-22 at 20:24, Daniel Berlin wrote:
> > > On Mon, 2005-02-21 at 23:03 -0500, Greg Hudson wrote:
> > > > On Mon, 2005-02-21 at 20:37, Daniel Berlin wrote:
> > > > > ChangeLog is > SVN_DELTA_WINDOW_SIZE when deltified, and you only
> > > > > auto-expand the first window
> > > > 
> > > > That doesn't agree with my reading of Peter's change.  It looks like he
> > > > auto-expands all windows of the last rep.
> > > > 
> > > > (Something is obviously wrong, but I'm not convinced this is it.)
> > > 
> > > Fair enough.
> > > I've put the repo in question (it's about 1/8th the total revisions of
> > > our changelog) at http://www.toolchain.org/~dberlin/repo.dump.bz2
> > 
> > As far as I can tell, what's wrong is the basic premise of this whole
> > optimization.  A chain of deltas with no "copy from target" operations
> > does not necessarily avoid copy_source_ops().
> 
> This is true from the current composition standpoint, however, with no
> overlaps, composition is trivial. Much more trivial than what we do now
> (we shouldn't need the search_offset_index portion, because we can just
> sort the offsets once and walk both windows at the same time).
> 
> Also, the idea was not to avoid copy_source_ops entirely in the current
> scheme, it was to avoid quadratic behavior we got from copy_source_ops,
> which it should.
> 
Just to clarify further, I believe my original message was incorrect,
because we should still see copy_source_ops activity (just not quadratic
behavior).

IOW, nothing to see here at the moment, move along :)

--Dan

PS that'll teach me to write emails while in the midst of stressing over
a bar exam.



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] FSFS speedup (was: Re: Making blame even faster)

Posted by Daniel Berlin <db...@dberlin.org>.
On Sat, 2005-02-26 at 12:43 -0500, Greg Hudson wrote:
> On Tue, 2005-02-22 at 20:24, Daniel Berlin wrote:
> > On Mon, 2005-02-21 at 23:03 -0500, Greg Hudson wrote:
> > > On Mon, 2005-02-21 at 20:37, Daniel Berlin wrote:
> > > > ChangeLog is > SVN_DELTA_WINDOW_SIZE when deltified, and you only
> > > > auto-expand the first window
> > > 
> > > That doesn't agree with my reading of Peter's change.  It looks like he
> > > auto-expands all windows of the last rep.
> > > 
> > > (Something is obviously wrong, but I'm not convinced this is it.)
> > 
> > Fair enough.
> > I've put the repo in question (it's about 1/8th the total revisions of
> > our changelog) at http://www.toolchain.org/~dberlin/repo.dump.bz2
> 
> As far as I can tell, what's wrong is the basic premise of this whole
> optimization.  A chain of deltas with no "copy from target" operations
> does not necessarily avoid copy_source_ops().

This is true from the current composition standpoint, however, with no
overlaps, composition is trivial. Much more trivial than what we do now
(we shouldn't need the search_offset_index portion, because we can just
sort the offsets once and walk both windows at the same time).

Also, the idea was not to avoid copy_source_ops entirely in the current
scheme, it was to avoid quadratic behavior we got from copy_source_ops,
which it should.

:)







Re: [PATCH] FSFS speedup (was: Re: Making blame even faster)

Posted by Greg Hudson <gh...@MIT.EDU>.
On Tue, 2005-02-22 at 20:24, Daniel Berlin wrote:
> On Mon, 2005-02-21 at 23:03 -0500, Greg Hudson wrote:
> > On Mon, 2005-02-21 at 20:37, Daniel Berlin wrote:
> > > ChangeLog is > SVN_DELTA_WINDOW_SIZE when deltified, and you only
> > > auto-expand the first window
> > 
> > That doesn't agree with my reading of Peter's change.  It looks like he
> > auto-expands all windows of the last rep.
> > 
> > (Something is obviously wrong, but I'm not convinced this is it.)
> 
> Fair enough.
> I've put the repo in question (it's about 1/8th the total revisions of
> our changelog) at http://www.toolchain.org/~dberlin/repo.dump.bz2

As far as I can tell, what's wrong is the basic premise of this whole
optimization.  A chain of deltas with no "copy from target" operations
does not necessarily avoid copy_source_ops().

(This is from loading your sample repository and poking around in gdb
with a checkout of the head of ChangeLog.  I see copy_source_ops()
activity from the very first window composition.  window_B of this
composition comes from rev 4550 and has only four instructions, none of
which are "copy from target" instructions.)
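A toy model may help show why source copies alone still need work when
windows are combined. This is only a sketch, not Subversion's code: the
op kinds mirror the three delta instruction types (copy from source,
copy from target, new data), but apply_window() and the sample data are
hypothetical. Window B below has no "copy from target" instructions, yet
each of its source copies refers to text that only exists as the output
of window A, so a combiner has to translate those copies through A.

```python
def apply_window(ops, source, new_data):
    """Apply one toy delta window and return the target text.

    Each op is (kind, offset, length); kind is "source" (copy from the
    source text), "target" (copy from output produced so far) or "new"
    (copy from the window's new data).
    """
    target = bytearray()
    for kind, offset, length in ops:
        if kind == "source":
            target += source[offset:offset + length]
        elif kind == "target":
            # Byte by byte, so an overlapping copy repeats a pattern.
            for i in range(length):
                target.append(target[offset + i])
        elif kind == "new":
            target += new_data[offset:offset + length]
        else:
            raise ValueError(f"unknown op kind: {kind!r}")
    return bytes(target)


base = b"hello world"

# Window A: base -> mid
window_a = [("source", 0, 6), ("new", 0, 5)]
mid = apply_window(window_a, base, b"there")

# Window B: mid -> final, using only source copies and new data.
window_b = [("source", 6, 5), ("new", 0, 1), ("source", 0, 5)]
final = apply_window(window_b, mid, b" ")

print(mid, final)
```

Here the first source copy in window B reads bytes that window A took
from its new data, and the last one reads bytes that A copied from the
base. Resolving that mapping is the work a combiner cannot skip.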



Re: [PATCH] FSFS speedup (was: Re: Making blame even faster)

Posted by Daniel Berlin <db...@dberlin.org>.
On Mon, 2005-02-21 at 23:03 -0500, Greg Hudson wrote:
> On Mon, 2005-02-21 at 20:37, Daniel Berlin wrote:
> > ChangeLog is > SVN_DELTA_WINDOW_SIZE when deltified, and you only
> > auto-expand the first window
> 
> That doesn't agree with my reading of Peter's change.  It looks like he
> auto-expands all windows of the last rep.
> 
> (Something is obviously wrong, but I'm not convinced this is it.)

Fair enough.
I've put the repo in question (it's about 1/8th the total revisions of
our changelog) at http://www.toolchain.org/~dberlin/repo.dump.bz2

The dump file is 1.9 meg compressed and 8 meg uncompressed, and the repo
is 55 meg in both bdb and fsfs form. (In the case of ChangeLog, about 50%
of the FSFS repo is revprops space, of course.)

svn blame on the bdb repo doesn't show any copy_source_ops activity.
svn blame on the fsfs repo does.
So something is up, not sure what.






Re: [PATCH] FSFS speedup (was: Re: Making blame even faster)

Posted by Greg Hudson <gh...@MIT.EDU>.
On Mon, 2005-02-21 at 20:37, Daniel Berlin wrote:
> ChangeLog is > SVN_DELTA_WINDOW_SIZE when deltified, and you only
> auto-expand the first window

That doesn't agree with my reading of Peter's change.  It looks like he
auto-expands all windows of the last rep.

(Something is obviously wrong, but I'm not convinced this is it.)



Re: [PATCH] FSFS speedup (was: Re: Making blame even faster)

Posted by Daniel Berlin <db...@dberlin.org>.
On Mon, 2005-02-14 at 14:34 +0100, Peter N. Lundblad wrote:
> On Sun, 13 Feb 2005, Daniel Berlin wrote:
> 
> >
> > > It depends on your repository, but probably yes.
> > >
> > Just to followup:
> >
> > Brane's timings on windows showed that for the gcc combine.c file (which
> > had ~1500 revisions, spread out across branches, etc), we had
> >
> > blame on an bdb repo with vdelta: 40 seconds
> > blame on an bdb repo with xdelta: 10 seconds
> >
> > blame on an fsfs repo with vdelta: 108 seconds
> > blame on an fsfs repo with xdelta: 105 seconds
> >
> >
> > To fully take advantage of the xdelta speedup on fsfs, you need to
> > implement greg hudson's suggestion of making sure the vdelta rep against
> > empty is *not* combined with other windows, or be willing to give up the
> > first rep being compressed (IE it will store one fulltext in the repo).
> >
> OK. Here we go. On my tests it is a big speedup. I tested on FSFS with and
> without xdelta when the dump was loaded. It seems like we win a lot even
> if vdelta was used to create the repo. Anyway, feel free to fill in the
> missing row above:-)


Okay, I noticed a problem with this patch after a lot of testing, while
trying to figure out why, on ChangeLog, copy_source_ops was still at the
top of the list for fsfs.
It then hit me in the shower.

ChangeLog is > SVN_DELTA_WINDOW_SIZE when deltified, and you only
auto-expand the first window.  However, the entire file is vdelta'd,
which means all of the windows.  So the other windows with target ops
get combined, causing quadratic behavior again on this file.
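As a rough illustration of the point about window counts, here is a
small sketch. The window size of 102400 bytes is an assumption made for
the example (check svn_delta.h for the real SVN_DELTA_WINDOW_SIZE), and
split_into_windows() is a hypothetical helper, not a Subversion function.

```python
WINDOW_SIZE = 102400  # assumed value, for illustration only


def split_into_windows(size, window_size=WINDOW_SIZE):
    """Return (start, length) pairs covering a text of `size` bytes."""
    return [(start, min(window_size, size - start))
            for start in range(0, size, window_size)]


windows = split_into_windows(250000)
print(len(windows), "windows:", windows)
print("windows after the first:", len(windows) - 1)
```

Under these assumptions, any text larger than one window has trailing
windows that would not be covered by expanding only the first one.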

I'm completely in law mode right now, so nothing pops into my head as a
good solution for this problem.

--Dan



[PATCH] BDB speedup (was: FSFS speedup)

Posted by Branko Čibej <br...@xbc.nu>.
Peter N. Lundblad wrote:

>OK. Here we go. On my tests it is a big speedup. I tested on FSFS with and
>without xdelta when the dump was loaded. It seems like we win a lot even
>if vdelta was used to create the repo. Anyway, feel free to fill in the
>missing row above:-)
>  
>
Well, I've done something similar to the BDB combiner, except that it 
only pre-expands windows that are truly self-compressed. At the moment, 
this makes hardly any difference. However, it paves the way for storing 
fulltexts as self-compressed vdeltas in BDB, too, just as FSFS does it. 
This could significantly reduce the size of BDB repositories, especially 
where there are many active branches.

[[[
Treat self-compressed delta windows as virtual fulltexts in the
delta combiner; expand them first instead of combining them with
the rest of the delta chain.

* subversion/libsvn_fs_base/reps-strings.c (compose_handler_baton):
   New member source_buf; holds expanded window data.
  (compose_handler): When handling a self-compressed window, expand
   it instead of combining it with the existing (combined) window.
  (rep_undeltify_range): If available, use the expanded window from
   the baton instead of the fulltext. Remove empty_buf.
]]]
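
The idea in the log message can be sketched with a toy model. This is
not the patch itself: is_self_compressed() and expand() are hypothetical
names, and "self-compressed" is taken here to mean a window with no
copy-from-source instructions, so that it can be expanded without any
source text. The expanded buffer then serves as the source for the next
window in the chain.

```python
def is_self_compressed(ops):
    """True if no op reads from a source text (toy definition)."""
    return all(kind != "source" for kind, _offset, _length in ops)


def expand(ops, new_data, source=b""):
    """Expand a toy delta window of (kind, offset, length) ops."""
    out = bytearray()
    for kind, offset, length in ops:
        if kind == "new":
            out += new_data[offset:offset + length]
        elif kind == "target":
            # Byte by byte, so an overlapping copy repeats a pattern.
            for i in range(length):
                out.append(out[offset + i])
        elif kind == "source":
            out += source[offset:offset + length]
        else:
            raise ValueError(f"unknown op kind: {kind!r}")
    return bytes(out)


first = ([("new", 0, 2), ("target", 0, 6)], b"ab")   # self-compressed
second = ([("source", 0, 4), ("new", 0, 1)], b"!")   # ordinary delta

assert is_self_compressed(first[0])
source_buf = expand(*first)                 # the "virtual fulltext"
result = expand(*second, source=source_buf)
print(source_buf, result)
```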


-- Brane


Re: [PATCH] FSFS speedup (was: Re: Making blame even faster)

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Mon, 14 Feb 2005, Greg Hudson wrote:

> This patch looks great.  Thanks for implementing it.
>
Thx for reviewing as usual.

> I noticed just one style issue: in many places you created lines of
> exactly 80 characters.  If you could tweak your editing environment such

I actually use Emacs in an 80x25 terminal, so I should have been annoyed
myself:-) Thanks for clarifying in HACKING.

Regards,
//Peter


Re: [PATCH] FSFS speedup (was: Re: Making blame even faster)

Posted by Greg Hudson <gh...@MIT.EDU>.
This patch looks great.  Thanks for implementing it.

I noticed just one style issue: in many places you created lines of
exactly 80 characters.  If you could tweak your editing environment such
that you restrict lines to 79 characters instead of 80, that would be
good.  (HACKING just says "stay within 80 columns", but if the goal is
to display well on an 80-column terminal, we need to stop at 79
columns.)
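
A check along these lines could look like the following sketch. The
helper name overlong_lines() and the sample text are made up for the
example; only the 79-column limit comes from the suggestion above.

```python
def overlong_lines(text, limit=79):
    """Return (line_number, length) pairs for lines wider than `limit`."""
    return [(number, len(line))
            for number, line in enumerate(text.splitlines(), start=1)
            if len(line) > limit]


sample = "short line\n" + "x" * 80 + "\n" + "y" * 79 + "\n"
print(overlong_lines(sample))
```

With the sample above, only the second line (exactly 80 characters) is
reported; the 79-character line passes.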



Re: [PATCH] FSFS speedup (was: Re: Making blame even faster)

Posted by Daniel Berlin <db...@dberlin.org>.
On Mon, 2005-02-14 at 14:34 +0100, Peter N. Lundblad wrote:
> On Sun, 13 Feb 2005, Daniel Berlin wrote:
> 
> >
> > > It depends on your repository, but probably yes.
> > >
> > Just to followup:
> >
> > Brane's timings on windows showed that for the gcc combine.c file (which
> > had ~1500 revisions, spread out across branches, etc), we had
> >
> > blame on an bdb repo with vdelta: 40 seconds
> > blame on an bdb repo with xdelta: 10 seconds
> >
> > blame on an fsfs repo with vdelta: 108 seconds
> > blame on an fsfs repo with xdelta: 105 seconds
> >
> >
> > To fully take advantage of the xdelta speedup on fsfs, you need to
> > implement greg hudson's suggestion of making sure the vdelta rep against
> > empty is *not* combined with other windows, or be willing to give up the
> > first rep being compressed (IE it will store one fulltext in the repo).
> >
> OK. Here we go. On my tests it is a big speedup. I tested on FSFS with and
> without xdelta when the dump was loaded. It seems like we win a lot even
> if vdelta was used to create the repo. Anyway, feel free to fill in the
> missing row above:-)
> 


Just FYI, I get a large speedup from this as well, and no longer require
the "don't vdelta against empty source" patch I have locally in order to
get good speed.

With current trunk, without the patch, blame on a large file in a local
repo using fsfs:

real    1m24.982s
user    1m7.414s
sys     0m6.480s

With patch:
real    0m47.240s
user    0m28.449s
sys     0m6.306s


(I have a progress bar printing out how far along it is in getting the
revisions for blame, and you can see it's being given fulltexts *MUCH*
quicker)
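
For reference, the ratio between the two runs can be worked out with a
short script. This is just arithmetic on the "real" and "user" figures
quoted above; parse_time() is a helper written for this example.

```python
import re


def parse_time(value):
    """Convert a shell `time` value such as '1m24.982s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s?", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


before = {"real": "1m24.982s", "user": "1m7.414s"}
after = {"real": "0m47.240s", "user": "0m28.449s"}

ratios = {key: parse_time(before[key]) / parse_time(after[key])
          for key in before}
for key, ratio in ratios.items():
    print(f"{key}: {ratio:.2f}x faster with the patch")
```

That works out to roughly 1.8x in wall-clock time and about 2.4x in
user time for this one file.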


