Posted to dev@subversion.apache.org by Neels J Hofmeyr <ne...@elego.de> on 2010/09/27 01:39:54 UTC

Re: place of svnrdump

On 2010-09-25 14:43, Daniel Shahaf wrote:
> Ramkumar Ramachandra wrote on Sat, Sep 25, 2010 at 11:59:49 +0530:
>> Agreed, these modules should not be part of the core. However, in the
>> case of Subversion, there is absolutely NO way to get/back up the
>> revision history data* [5].
> 
> svnsync.

On a side note, svnsync happens to be relatively slow. I tried to svnsync
the ASF repos once (for huge test data). The slowness of svnsync made it
practically unfeasible to pull off. I ended up downloading a zipped dump and
'svnadmin load'ing that dump. Even with a zipped dump already downloaded,
'unzip | svnadmin load' took a few *days* to load the 950.000+ revisions.
(And someone rebooted that box after two days, halfway through, grr. Took
some serious hacking to finish up without starting over.)

So, that experience tells me that svnsync and svnadmin dump/load are nowhere
near optimal, for example compared to a straight download of the 34 gigs
that the ASF repos amounts to... Anything that could speed up a remote
dump/load process would probably be good, though I don't know any details
about svnrdump.

My two cents: Rephrasing everything into the dump format and back blows up
both data size and ETA. Maybe a remote backup mechanism could even break
loose from discrete revision boundaries during transfer/load...

In contrast, the speed of a remote 'svn log' just amazes me. It's pretty
darn fast to get all the commit logs of a repos. So somewhere between that
and fetching the rev content as well, a lot of speed gets lost.

Heh, that's my reply to a single-word statement ;)

~Neels


P.S.: If the whole ASF repos were a single Git WC, how long would that take
to pull? (Given that Git tends to take up much more space than a Subversion
repos, I wonder.)


Re: place of svnrdump

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Ramkumar Ramachandra wrote on Tue, Sep 28, 2010 at 00:27:38 +0530:
> Hi Neels,
> 
> Neels J Hofmeyr writes:
> > > I just benchmarked it recently and found that it dumps 10000 revisions
> > > of the ASF repository in 106 seconds: that's about 94 revisions per
> > > second.
> > 
> > Wow! That's an order of magnitude better than the 5 to 10 revisions per
> > second I'm used to (I think that's using svnsync).
> 
> Yep :)
> 
> > While we're at it... svnsync's slowness is particularly painful when doing
> > 'svnsync copy-revprops'. With revprop changes enabled, any revprops may be
> > changed at any time. So to maintain an up-to-date mirror, one would like to
> > copy *all* revprops at the very least once per day. With a repos of average
> > corporate size, though, that can take the whole night and soon longer than
> > the developers need to go home and come back to work next morning (to find
> > the mirror lagging). So one could copy only the youngest 1000 revprops each
> > night and do a complete run every weekend. Or script a revprop-change hook
> > that propagates revprop change signals to mirrors. :(
> 
> Wow. This is quite a serious problem. I'm a very new developer, and I
> don't really use Subversion. You should probably let the other
> Subversion developers know about this on a new thread?
> 
> @Daniel, @Stefan: Thoughts on this?
> 

Use the commits@ list and run copy-revprops only on revisions that
have actually been revprop-edited?

> > svnrdump wouldn't help in that department, would it?
> 
> That would be a feature request (although I'm not sure svnrdump will
> ever be extended to handle that),

How could svnrdump help here?  What we might need is an RA call that has
the server provide the N last revisions to have undergone revprop edits...

> because svnrdump is still very
> young: it just dumps/loads dumpfiles from remote repositories quickly
> at the moment. I've decided to feature freeze until I fix the perf
> issues for the upcoming release; I'll keep this in mind though.
> 
> -- Ram

RE: place of svnrdump

Posted by "Bolstridge, Andrew" <an...@intergraph.com>.

-----Original Message-----
From: Ramkumar Ramachandra [mailto:artagnon@gmail.com] 
Sent: 27 September 2010 19:58
To: Neels J Hofmeyr
Cc: dev@subversion.apache.org; Daniel Shahaf; Stefan Sperling
Subject: Re: place of svnrdump

Neels J Hofmeyr writes:

> While we're at it... svnsync's slowness is particularly painful when 
> doing 'svnsync copy-revprops'. With revprop changes enabled, any 
> revprops may be changed at any time. So to maintain an up-to-date 
> mirror, one would like to copy *all* revprops at the very least once 
> per day. With a repos of average corporate size, though, that can take
> the whole night and soon longer than the developers need to go home 
> and come back to work next morning (to find the mirror lagging). So 
> one could copy only the youngest 1000 revprops each night and do a 
> complete run every weekend. Or script a revprop-change hook that 
> propagates revprop change signals to mirrors. :(

Of course, you could put a post-revprop-change hook in place to note
which revprop was changed, and then run a script that only syncs those
revprops.

I wouldn't recommend putting the 'svnsync copy-revprops' command in the
post-revprop-change hook: if someone commits a revision and then
immediately updates the revprop, the sync will fail (as the rev may not
have been synced yet).

If anything, changing svnsync to ignore a failed copy-revprops command
when the revision does not yet exist on the mirror would fix this problem,
and copy-revprops could then be put in the hook without worry.
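
A minimal sketch of the "note which revprop was changed, then sync only
those" idea, under stated assumptions: the queue file, the function names
and the mirror_youngest parameter are hypothetical and not part of svnsync.
The only real command used is 'svnsync copy-revprops DEST_URL REV'. The
post-revprop-change hook would call record_change() with the REV argument it
receives, and a periodic job would call drain_queue(). Revisions the mirror
has not received yet stay queued instead of making the sync fail.

```python
import subprocess
from pathlib import Path


def record_change(queue_file, rev):
    """Append a revision number to the queue (hypothetical hook helper)."""
    with open(queue_file, "a") as f:
        f.write(f"{int(rev)}\n")


def drain_queue(queue_file, mirror_url, mirror_youngest, run=subprocess.run):
    """Copy revprops for queued revisions that already exist on the mirror.

    mirror_youngest is the last revision present on the mirror; how it is
    obtained is left to the caller. Returns the revisions still queued.
    Simplification: no locking, so a hook writing during the drain could
    lose an entry; a production script would need to guard against that.
    """
    path = Path(queue_file)
    if not path.exists():
        return []
    # De-duplicate: several edits to one revision need only one copy.
    pending = sorted({int(line) for line in path.read_text().split()})
    deferred = []
    for rev in pending:
        if rev > mirror_youngest:
            # Not synced yet; retry on the next run.
            deferred.append(rev)
        else:
            run(["svnsync", "copy-revprops", mirror_url, str(rev)], check=True)
    path.write_text("".join(f"{rev}\n" for rev in deferred))
    return deferred
```

The run parameter exists only so the command invocation can be replaced
when trying the logic out without a real mirror.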


Re: place of svnrdump

Posted by Ramkumar Ramachandra <ar...@gmail.com>.
Hi Neels,

Neels J Hofmeyr writes:
> > I just benchmarked it recently and found that it dumps 10000 revisions
> > of the ASF repository in 106 seconds: that's about 94 revisions per
> > second.
> 
> Wow! That's an order of magnitude better than the 5 to 10 revisions per
> second I'm used to (I think that's using svnsync).

Yep :)

> While we're at it... svnsync's slowness is particularly painful when doing
> 'svnsync copy-revprops'. With revprop changes enabled, any revprops may be
> changed at any time. So to maintain an up-to-date mirror, one would like to
> copy *all* revprops at the very least once per day. With a repos of average
> corporate size, though, that can take the whole night and soon longer than
> the developers need to go home and come back to work next morning (to find
> the mirror lagging). So one could copy only the youngest 1000 revprops each
> night and do a complete run every weekend. Or script a revprop-change hook
> that propagates revprop change signals to mirrors. :(

Wow. This is quite a serious problem. I'm a very new developer, and I
don't really use Subversion. You should probably let the other
Subversion developers know about this on a new thread?

@Daniel, @Stefan: Thoughts on this?

> svnrdump wouldn't help in that department, would it?

That would be a feature request (although I'm not sure svnrdump will
ever be extended to handle that), because svnrdump is still very
young: it just dumps/loads dumpfiles from remote repositories quickly
at the moment. I've decided to feature freeze until I fix the perf
issues for the upcoming release; I'll keep this in mind though.

-- Ram

Re: place of svnrdump

Posted by Neels J Hofmeyr <ne...@elego.de>.
On 2010-09-27 09:45, Ramkumar Ramachandra wrote:
...

> I just benchmarked it recently and found that it dumps 10000 revisions
> of the ASF repository in 106 seconds: that's about 94 revisions per
> second.

Wow! That's an order of magnitude better than the 5 to 10 revisions per
second I'm used to (I think that's using svnsync).

While we're at it... svnsync's slowness is particularly painful when doing
'svnsync copy-revprops'. With revprop changes enabled, any revprops may be
changed at any time. So to maintain an up-to-date mirror, one would like to
copy *all* revprops at the very least once per day. With a repos of average
corporate size, though, that can take the whole night and soon longer than
the developers need to go home and come back to work next morning (to find
the mirror lagging). So one could copy only the youngest 1000 revprops each
night and do a complete run every weekend. Or script a revprop-change hook
that propagates revprop change signals to mirrors. :(
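
The "youngest 1000 each night, complete run every weekend" workaround could
be sketched like this. The helper names are hypothetical; the sketch only
builds the argument list for the real 'svnsync copy-revprops DEST_URL
[REV[:REV2]]' command, and it assumes the caller already knows the mirror's
youngest revision.

```python
def nightly_revprop_range(youngest, window=1000):
    """Return 'START:END' covering the youngest `window` revisions."""
    start = max(0, youngest - window + 1)
    return f"{start}:{youngest}"


def copy_revprops_command(mirror_url, youngest, full_run=False, window=1000):
    """Build the svnsync command line for a nightly or a full run.

    Without a revision range, copy-revprops processes all revisions
    already transferred to the mirror, which is the weekend case here.
    """
    cmd = ["svnsync", "copy-revprops", mirror_url]
    if not full_run:
        cmd.append(nightly_revprop_range(youngest, window))
    return cmd
```

A scheduler would pass full_run=True on weekends and leave it False
otherwise; the resulting list can be handed to subprocess.run.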

svnrdump wouldn't help in that department, would it?

Thanks,
~Neels


Re: place of svnrdump

Posted by Ramkumar Ramachandra <ar...@gmail.com>.
Hi Neels,

Neels J Hofmeyr writes:
> On a side note, svnsync happens to be relatively slow. I tried to svnsync
> the ASF repos once (for huge test data). The slowness of svnsync made it
> practically unfeasible to pull off. I ended up downloading a zipped dump and
> 'svnadmin load'ing that dump. Even with a zipped dump already downloaded,
> 'unzip | svnadmin load' took a few *days* to load the 950.000+ revisions.
> (And someone rebooted that box after two days, halfway through, grr. Took
> some serious hacking to finish up without starting over.)

Yeah, we had a tough time obtaining the complete undeltified ASF dump
for testing purposes as well.

> So, that experience tells me that svnsync and svnadmin dump/load are nowhere
> near optimal, for example compared to a straight download of the 34 gigs
> that the ASF repos amounts to... Anything that could speed up a remote
> dump/load process would probably be good, though I don't know any details
> about svnrdump.

I just benchmarked it recently and found that it dumps 10000 revisions
of the ASF repository in 106 seconds: that's about 94 revisions per
second. It used to be faster than `svnadmin` in an older benchmark:
I'll work on perf issues this week. I estimate that it should be
possible to get it to dump at ~140 revisions/second.
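
For scale, the arithmetic behind these figures as a rough sketch. It assumes
the dump rate stays constant over the whole repository, which real
repositories will not guarantee since revisions differ in size. The 950,000
revision count is the figure quoted above; the function names are just for
illustration.

```python
def revs_per_second(revisions, seconds):
    """Average dump rate over a benchmark run."""
    return revisions / seconds


def hours_to_dump(total_revisions, rate):
    """Hours needed at a constant rate (an assumption, see above)."""
    return total_revisions / rate / 3600


measured = revs_per_second(10_000, 106)
for label, rate in [("measured", measured), ("estimated target", 140.0)]:
    hours = hours_to_dump(950_000, rate)
    print(f"{label}: {rate:.1f} rev/s, {hours:.1f} h for 950,000 revisions")
```

Under that assumption the whole repository would take a few hours to dump
at either rate, rather than days.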

@Daniel and others: I'd recommend a feature freeze. I'm currently
profiling svnrdump and working on improving the I/O profile in
particular.

> My two cents: Rephrasing everything into the dump format and back blows up
> both data size and ETA. Maybe a remote backup mechanism could even break
> loose from discrete revision boundaries during transfer/load...

I've been thinking about this too: we'll have to start attacking the
RA layer itself to make svnrdump even faster. The replay API isn't
optimized for this kind of operation.

> P.S.: If the whole ASF repos were a single Git WC, how long would that take
> to pull? (Given that Git tends to take up much more space than a Subversion
> repos, I wonder.)

The gzipped undeltified dump of the complete ASF repository comes to
about 25 GiB and it takes ~70 minutes to import it into the Git object
store using a tool which is currently under development in Git. Thanks
to David for these statistics.

Cloning takes as long as it takes to transmit this data. After a
repack, it'll probably shrink in size, but that's beside the
point. Git was never designed to handle this; each project being a
separate repository would be a fairer comparison. Even linux-2.6.git
contains just 210887 revisions, and it tests Git's limits.

-- Ram