Posted to dev@subversion.apache.org by Kuno Meyer <ku...@gmx.ch> on 2014/01/23 14:11:36 UTC

possible optimization on update at externals with fixed-revision number?

Hi all,

One of the repositories I am working with has a long list of "svn:externals"
links. This has a major impact on working copy update performance, since
each external link needs an additional roundtrip to the (painfully slow) SVN
server. So it can easily be the case that the update of the main working
copy (several tens of thousands of files) is done in <10 seconds, but
querying the 40-45 external repositories takes another 60-80 seconds.

However, since the external links are all fixed to a specific revision
number, and since the working copy knows its current reference revision
number, why do you have to roundtrip to the SVN server anyway? Wouldn't it
be safe to omit the server roundtrip, under the assumption that a published
revision is immutable?

At least in my case, this would speed up "svn update" by a factor of 5-15
when there are no or only a few changesets to download.
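A minimal sketch of the proposed shortcut, in Python purely for illustration (Subversion itself is written in C, and none of these names come from its code). It parses one svn:externals definition in the 1.5+ form "[-r REV] URL[@PEG] LOCALPATH", treats the external as pinned if it carries an operative or peg revision, and skips the roundtrip only when the external working copy already records that revision. ExternalDef, parse_external and can_skip_roundtrip are hypothetical; quoted paths and the pre-1.5 "LOCALPATH [-r REV] URL" ordering are not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExternalDef:
    """Hypothetical parsed form of one svn:externals line."""
    url: str
    local_path: str
    revision: Optional[int]  # None means unpinned (follows HEAD)


def parse_external(line: str) -> ExternalDef:
    """Parse a 1.5-style definition: [-r REV] URL[@PEG] LOCALPATH.

    Simplified: no support for quoted paths or the pre-1.5 ordering.
    """
    tokens = line.split()
    rev: Optional[int] = None
    if tokens[0] == "-r":              # "-r 1234 URL PATH"
        rev, tokens = int(tokens[1]), tokens[2:]
    elif tokens[0].startswith("-r"):   # "-r1234 URL PATH"
        rev, tokens = int(tokens[0][2:]), tokens[1:]
    url, local_path = tokens[0], tokens[1]
    peg = re.match(r"^(.*)@(\d+)$", url)
    if peg:
        url = peg.group(1)
        if rev is None:
            # Without -r, the operative revision defaults to the peg revision.
            rev = int(peg.group(2))
    return ExternalDef(url, local_path, rev)


def can_skip_roundtrip(ext: ExternalDef, wc_revision: Optional[int]) -> bool:
    """Skip contacting the server only if the external is pinned and the
    external working copy already records exactly that revision.

    Relies on the assumption that a published revision is immutable.
    """
    return ext.revision is not None and wc_revision == ext.revision
```

For example, parse_external("-r 1234 http://svn.example.com/repos/lib lib") (a made-up URL) yields a pinned definition, and can_skip_roundtrip then returns True only if the working copy of "lib" is at r1234. As the replies below point out, this check alone is not sufficient.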

Thanks,
Kuno


Re: possible optimization on update at externals with fixed-revision number?

Posted by Philip Martin <ph...@wandisco.com>.
Stefan Sperling <st...@elego.de> writes:

> There may be more restrictions I'm not aware of right now.
> Many restrictions can be determined cheaply by querying the external
> working copy's database, but some may need a disk crawl.

A disk crawl to identify missing files/directories is necessary, since
update would restore them.
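The disk crawl described above can be sketched as a walk over the paths the working copy says should exist, reporting any that are gone from disk. This is a hypothetical Python sketch: find_missing and its recorded_paths argument are assumptions standing in for what Subversion would read from the external's working copy database, and it only checks existence, not file contents.

```python
import os
from typing import Iterable, List


def find_missing(wc_root: str, recorded_paths: Iterable[str]) -> List[str]:
    """Return the recorded paths (relative to wc_root) that are absent on disk.

    recorded_paths is a hypothetical stand-in for the node list that would
    come from the external working copy's database. Any hit means
    "svn update" would restore something, so the server roundtrip cannot
    be skipped for this external.
    """
    missing = []
    for rel in recorded_paths:
        # lexists() also accepts a dangling symlink, since the link itself
        # is what the working copy versions.
        if not os.path.lexists(os.path.join(wc_root, rel)):
            missing.append(rel)
    return missing
```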

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: possible optimization on update at externals with fixed-revision number?

Posted by Johan Corveleyn <jc...@gmail.com>.
On Fri, Jan 24, 2014 at 2:22 PM, Branko Čibej <br...@wandisco.com> wrote:
> I think we'd get much better performance if we parallelized externals
> handling. I'm not aware of any reason (apart from serializing notifications)
> why this wouldn't work, as long as the externals have separate working
> copies.

That would be a more general improvement, making all externals updates
faster (not only the pinned ones). However, "doing no work" should
still be faster than "doing N things in parallel" :-), so skipping the
pinned ones would still be an additional saving, I guess.

-- 
Johan

Re: possible optimization on update at externals with fixed-revision number?

Posted by Branko Čibej <br...@wandisco.com>.
On 24.01.2014 14:09, Stefan Sperling wrote:
> Please file an ENHANCEMENT issue for this. I like the idea.

I think we'd get much better performance if we parallelized externals
handling. I'm not aware of any reason (apart from serializing
notifications) why this wouldn't work, as long as the externals have
separate working copies.
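The parallelization idea can be sketched with a thread pool, one task per external working copy, plus a lock so that only one notification is emitted at a time. This is a hypothetical Python sketch: update_one and notify stand in for the per-external update and the client's notification callback, and, as noted above, it assumes each external has its own separate working copy.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Notify = Callable[[str], None]


def update_externals_parallel(
    externals: List[str],
    update_one: Callable[[str, Notify], None],
    notify: Notify,
    max_workers: int = 8,
) -> None:
    """Run update_one for every external concurrently.

    Safe only if the externals live in separate working copies; the single
    shared resource here is the notification callback, which is serialized.
    """
    lock = threading.Lock()

    def serialized_notify(message: str) -> None:
        with lock:  # one notification at a time, never interleaved
            notify(message)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(update_one, ext, serialized_notify)
                   for ext in externals]
        for future in futures:
            future.result()  # re-raise a worker's exception, in submission order
```

The lock keeps each message whole, but the order of messages from different externals is no longer deterministic; a real client might instead buffer notifications per external and flush them in definition order.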

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com

Re: possible optimization on update at externals with fixed-revision number?

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Jan 23, 2014 at 01:11:36PM +0000, Kuno Meyer wrote:
> Hi all,
> 
> One of the repositories I am working with has a long list of "svn:externals"
> links. This has a major impact on working copy update performance, since
> each external link needs an additional roundtrip to the (painfully slow) SVN
> server. So it can easily be the case that the update of the main working
> copy (several tens of thousands of files) is done in <10 seconds, but
> querying the 40-45 external repositories takes another 60-80 seconds.
> 
> However, since the external links are all fixed to a specific revision
> number, and since the working copy knows its current reference revision
> number, why do you have to roundtrip to the SVN server anyway? Wouldn't it
> be safe to omit the server roundtrip, under the assumption that a published
> revision is immutable?

Yes in principle, but the external working copy must not have been tampered
with. E.g. it must not contain mixed revisions.

And non-pinned externals inside of pinned externals need to be handled
properly.

There may be more restrictions I'm not aware of right now.
Many restrictions can be determined cheaply by querying the external
working copy's database, but some may need a disk crawl.
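The restrictions listed above could be combined into one conservative check. A hypothetical Python sketch: ExternalWcState is an assumed summary of what might be read from the external working copy's database plus a disk crawl, and the checks cover only the restrictions named in this thread, not necessarily all of them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExternalWcState:
    """Hypothetical summary of one external working copy."""
    node_revisions: List[int]  # base revision of every node
    nested_pinned: List[bool] = field(default_factory=list)  # one flag per nested external
    missing_paths: List[str] = field(default_factory=list)   # result of a disk crawl


def safe_to_skip(pinned_rev: int, state: ExternalWcState) -> bool:
    """Return True only if skipping the server roundtrip looks safe."""
    # Not tampered with: every node sits at exactly the pinned revision,
    # i.e. no mixed revisions (an empty node list is treated as unsafe).
    if not state.node_revisions or any(r != pinned_rev for r in state.node_revisions):
        return False
    # Unpinned nested externals may have moved on the server; this sketch
    # conservatively falls back to a normal update rather than recursing.
    if not all(state.nested_pinned):
        return False
    # Update would restore missing files or directories.
    if state.missing_paths:
        return False
    return True
```

Pinned nested externals would, in turn, need the same check applied to their own working copies.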

> At least in my case, this would speed up "svn update" by a factor of 5-15
> when there are no or only a few changesets to download.

Please file an ENHANCEMENT issue for this. I like the idea.

Would you be willing to try working on a patch that implements
your idea, even if it's only a rough start?

Thanks!