
Posted to dev@subversion.apache.org by Thomas Zander <za...@kde.org> on 2005/05/07 15:56:12 UTC

ideas to make svn update faster.

Hi;
I'm a KDE developer and as such lived through the CVS-to-SVN conversion of the 
KDE codebase.  I even blogged a bit about that. [1]

One of the things I notice is that svn update is not faster than cvs update, 
which is contrary to expectations: with a global tree revision, it should 
be faster than CVS, which has a revision per file.
I was told on #svn that this is due to the mixed-revisions stuff [2], which 
I fully understand.  Looking at strace output, I notice that svn could be a 
lot faster (do fewer writes) if svn were more optimistic about version 
numbers.

kdelibs has ~8800 files and 378 dirs. At any time maybe 10 files have a 
different version than the rest (hell, let it be 10%).  That means that 
around 370 .svn/entries files have been written where the only change is 
a new version number in the name="" entry, equal to that of just about all 
the other dirs in the project.
A simple optimisation would be to remove the directory-version number (the 
one in the XML entry tag with 'name=""') when it is the same as the one in the 
parent dir.
It's probably not going to be as simple as that (since you update subdirs 
separately), but I'm pretty sure that far fewer XML files have to be written 
if you follow the route that the normal state is a dir having the same 
version as its parent.  Only when that fails do you need to do extra work. 
I'd call that being optimistic about version changes.
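A minimal sketch of this "optimistic" scheme, under loud assumptions: the toy model below (Entries, parent_of, effective_revision and update_all are all hypothetical names) stands in for the entries files and is not Subversion's real format or code. A directory stores a revision only when it differs from its parent's, so bringing the whole tree to one revision touches only the root plus the directories that used to differ.

```python
from typing import Dict, Optional

# Hypothetical toy model, NOT Subversion's entries format: each working-copy
# directory ("/"-separated path, "" is the update root) maps to an explicit
# revision, or to None when it is assumed to share its parent's revision.
Entries = Dict[str, Optional[int]]


def parent_of(path: str) -> Optional[str]:
    """Return the parent directory's path, or None for the root ("")."""
    if path == "":
        return None
    return path.rsplit("/", 1)[0] if "/" in path else ""


def effective_revision(entries: Entries, path: str) -> int:
    """Walk upwards until a directory with an explicit revision is found."""
    current: Optional[str] = path
    while current is not None:
        rev = entries[current]
        if rev is not None:
            return rev
        current = parent_of(current)
    raise ValueError("the root must record an explicit revision")


def update_all(entries: Entries, new_rev: int) -> int:
    """Bring the whole tree to new_rev; return how many entries had to change."""
    changed = 0
    for path, rev in entries.items():
        if path == "":
            if rev != new_rev:
                entries[path] = new_rev
                changed += 1
        elif rev is not None:
            # This directory used to differ from its parent; now it matches
            # again, so its explicit revision is dropped.
            entries[path] = None
            changed += 1
    return changed


wc = {"": 100, "kdecore": None, "kdeui": 99, "kdeui/widgets": None}
print(effective_revision(wc, "kdeui/widgets"))  # 99: inherited from kdeui
print(update_all(wc, 101))                      # 2: the root and kdeui
print(effective_revision(wc, "kdeui/widgets"))  # 101
```

In this model a subdirectory whose revision is None cannot be interpreted once it is moved away from its parent, which is the objection Gustav Munkby raises in this thread.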

Now, there are probably going to be a lot of opinions on the above subject, 
and I'd like to point out that svn really needs speed optimisations; I 
have seen a LOT of complaints about this issue in the KDE switchover.  
Remember that if you find the above suggestion technically less-than-ideal.

The strace also showed me things like:
* the .svn/format file is opened 5 times for each directory.  I would think 
that with auto-upgrades only one (the root dir) should be enough, saving 
5*378 - 1 file opens for me. :)

* Creating .svn/lock files in every subdir is not needed if you check 
parent dirs that also have a .svn (and maybe the same root).
So you create one in the dir you typed 'svn up' in, and if someone types 'svn 
up' in a subdir, svn will change to the parent dir and check for a lock file, 
until it either finds one (in this case it will, and abort) or leaves the 
checkout.
This will save a _lot_ of file creation, and removal afterwards.
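The upward search described above could look roughly like this. It is a sketch assuming only that a versioned directory has a .svn subdirectory and that a held lock is a .svn/lock file; find_locked_ancestor and demo are hypothetical helpers, not Subversion functions.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def find_locked_ancestor(start: Path) -> Optional[Path]:
    """Walk from start's parent upwards while still inside the checkout
    (i.e. while the directory has a .svn admin area) and return the first
    directory holding a .svn/lock file, or None once the walk leaves the
    checkout without finding one."""
    current = start.resolve().parent
    while (current / ".svn").is_dir():
        if (current / ".svn" / "lock").exists():
            return current
        if current.parent == current:  # reached the filesystem root
            break
        current = current.parent
    return None


def demo() -> Tuple[Optional[Path], bool]:
    """Build a throwaway a/b/c checkout, lock 'a', and probe from 'a/b/c'."""
    with tempfile.TemporaryDirectory() as tmp:
        a = Path(tmp) / "a"
        for d in (a, a / "b", a / "b" / "c"):
            (d / ".svn").mkdir(parents=True)  # fake admin areas, no real svn data
        before = find_locked_ancestor(a / "b" / "c")
        (a / ".svn" / "lock").touch()          # pretend an 'svn up' is running in a
        after = find_locked_ancestor(a / "b" / "c")
        return before, after == a.resolve()
```

demo() returns (None, True). Checking first and creating the lock afterwards is not atomic, and this sketch only looks upwards, so it does not notice locks held further down the subtree; both gaps come up in the replies.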

[1] http://www.kdedevelopers.org/node/view/1028
[2] http://colabti.de/irclogger/irclogger_log/svn?date=2005-05-07,Sat#l1300
-- 
Thomas Zander

Re: ideas to make svn update faster.

Posted by Stephan Kulow <co...@kde.org>.

On Sunday 08 May 2005 16:49, Thomas Zander wrote:
> > How would you feel if I went on the KDE dev list with a proposal that
> > KDE use .Net (O.K., Mono) for RPC, on the grounds that it's "obviously
> > better" than whatever KDE uses now?
>
> Ahh;  the personal attack, how did we get here so fast?  No technical
> arguments left?
Since when is proposing Mono an insult? :)

Greetings, Stephan

-- 
Pace Peace Paix Paz Frieden Pax Pokój Friður Fred Béke 和平
Hasiti Lapé Hetep Malu Мир Wolakota Santiphap Irini Peoch
Shanti Vrede Baris Rój Mír Taika Rongo Sulh Py'guapy 평화

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: ideas to make svn update faster.

Posted by kf...@collab.net.

Thomas Zander <za...@kde.org> writes:
> > Ah. So you /do/ understand that your optimisation ideas would invalidate
> > our core WC design decisions, 
> 
> Again the defensive tone; svn has many design decisions that I have no idea 
> if those are relevant for all users (do you?).  After talking on IRC I 
> gather that the whole handling of the points I proposed a couple of small 
> optional optimisations to, is in desperate need of a rewrite. And has been 
> for a long time; with mail-threads like these; where people attack ideas 
> without trying to grok the idea first, time is wasted where code could be 
> written.

You know, I have to agree with Thomas here.

I mean, not with his proposals -- which have been analyzed to death
and need no further comment -- but with his point that we're being
overly defensive and not focusing on the real problem.

Basically, his complaint is legitimate: svn update is slower than it
has to be.  Maybe Thomas hasn't done enough careful measurement to get
at the root causes, and maybe his proposals are a bit naive, but we
don't need to hammer him down for that.

This thread *should* have started with us forthrightly acknowledging
the existence of the problem.  Then we should point out that it's very
complex to determine the precise causes, say why we haven't gotten
around to it, and point out what sorts of measurements need to be made
before we talk about a solution.  His proposals we should either rebut
amicably, or not respond to at all.  The key word is "amicably".
Instead, Thomas got a lot of put-downs dressed as technical replies.

I'm not naming names because I've been as guilty of this as anyone
lately, just not in this thread (this being my first post here).

As Subversion matures, the difference in background knowledge between
an experienced developer and a newcomer will only increase.  This
means that the longtime devs will start to appear short-fused, because
even if our fuses stay exactly as long as they've always been, we'll
still be contending with questions that are (relatively speaking) more
naive than questions in the past.

The solution to that is not to start slamming newcomers, but to point
out -- nicely :-) -- that SVN is complicated, and to help them
understand those complications, so they can contribute.  Let us please
lengthen fuses accordingly!

End of soapbox,
-Karl


Re: ideas to make svn update faster.

Posted by Thomas Zander <za...@kde.org>.

On Sunday 08 May 2005 15:49, Branko Čibej wrote:
> Thomas Zander wrote:
> >On Sunday 08 May 2005 12:26, Gustav Munkby wrote:
> >>if you don't have the revision in the subdirectories, then this will no
> >>longer be possible?
> >
> >Thats right.
> >Notice that its an expert function anyway.  Expert because moving
> >directories around in your IDE (or even filesystem) will horribly fail
> >revision management (as I have seen some former colleagues do).
>
> Ah. So you /do/ understand that your optimisation ideas would invalidate
> our core WC design decisions, 

Again the defensive tone; svn has many design decisions that I have no idea 
if those are relevant for all users (do you?).  After talking on IRC I 
gather that the whole handling of the points I proposed a couple of small 
optional optimisations to, is in desperate need of a rewrite. And has been 
for a long time; with mail-threads like these; where people attack ideas 
without trying to grok the idea first, time is wasted where code could be 
written.

> and you're waving that fact away as 
> irrelevant.
Where did I do that?  Be serious here; I did not propose that you drop this; I 
just confirmed that in one specific use case old functionality will not 
work anymore.  This is normal in software development and is generally 
called refactoring.  You _could_, for example, try to find out which people 
this affects and make the functional change an option.

> How would you feel if I went on the KDE dev list with a proposal that
> KDE use .Net (O.K., Mono) for RPC, on the grounds that it's "obviously
> better" than whatever KDE uses now?

Ahh;  the personal attack, how did we get here so fast?  No technical 
arguments left?

Please keep your eyes on the ball and actually try to work together here.
Or be honest and tell me to piss off, so I won't waste my time trying to help 
svn become a better product for less-than-small projects.

-- 
Thomas Zander

Re: ideas to make svn update faster.

Posted by Branko Čibej <br...@xbc.nu>.

Thomas Zander wrote:

>On Sunday 08 May 2005 12:26, Gustav Munkby wrote:
>  
>
>>Thomas Zander wrote:
>>    
>>
>>>>A simple optimalisation would be to remove the directory-version number
>>>>(the one in the xml entry-tag with 'name=""') when it has the same one
>>>>as the parent dir.
>>>>        
>>>>
>>excuse me for barging in like this, but as I understand it every
>>subdirectory of a subversion working copy is by design supposed to work
>>as a complete working copy, so that i can check out a whole tree and
>>then just simply move/copy a subdirectory to a different place and it
>>will still work.
>>
>>if you don't have the revision in the subdirectories, then this will no
>>longer be possible?
>>    
>>
>
>Thats right.
>Notice that its an expert function anyway.  Expert because moving 
>directories around in your IDE (or even filesystem) will horribly fail 
>revision management (as I have seen some former colleagues do).
>  
>
Ah. So you /do/ understand that your optimisation ideas would invalidate 
our core WC design decisions, and you're waving that fact away as 
irrelevant.

How would you feel if I went on the KDE dev list with a proposal that 
KDE use .Net (O.K., Mono) for RPC, on the grounds that it's "obviously 
better" than whatever KDE uses now?

-- Brane



Re: ideas to make svn update faster.

Posted by Thomas Zander <za...@kde.org>.

On Sunday 08 May 2005 12:26, Gustav Munkby wrote:
> Thomas Zander wrote:
> >>A simple optimalisation would be to remove the directory-version number
> >>(the one in the xml entry-tag with 'name=""') when it has the same one
> >> as the parent dir.
>
> excuse me for barging in like this, but as I understand it every
> subdirectory of a subversion working copy is by design supposed to work
> as a complete working copy, so that i can check out a whole tree and
> then just simply move/copy a subdirectory to a different place and it
> will still work.
>
> if you don't have the revision in the subdirectories, then this will no
> longer be possible?

That's right.
Notice that it's an expert function anyway.  Expert because moving 
directories around in your IDE (or even filesystem) will horribly break 
revision management (as I have seen some former colleagues do).

-- 
Thomas Zander

Re: ideas to make svn update faster.

Posted by Gustav Munkby <gr...@gmx.net>.

Thomas Zander wrote:
>>A simple optimalisation would be to remove the directory-version number
>>(the one in the xml entry-tag with 'name=""') when it has the same one as
>>the parent dir.
> 

Excuse me for barging in like this, but as I understand it every 
subdirectory of a Subversion working copy is by design supposed to work 
as a complete working copy, so that I can check out a whole tree and 
then simply move/copy a subdirectory to a different place and it 
will still work.

If you don't have the revision in the subdirectories, then this will no 
longer be possible, will it?

!g


Re: ideas to make svn update faster.

Posted by Thomas Zander <za...@kde.org>.

On Saturday 07 May 2005 23:28, Branko Čibej wrote:
> Thomas Zander wrote:
> Well, you /did/ say you didn't want to write the changed entries files
> if the "shape" of the directory didn't change. So the directory revision
> numbers in the entries file wouldn't be updated.
>
> If that's not what you meant, then I am now thorougly confused.

Let me quote myself:
> A simple optimalisation would be to remove the directory-version number
> (the one in the xml entry-tag with 'name=""') when it has the same one as
> the parent dir.
The effect would indeed be that you don't need to write the changed entries 
files if the shape of the dir didn't change.
Well, naturally you'd have to do it once, to remove the version number, but 
all subsequent updates won't need to anymore.

Hope that makes the idea clearer.
Do you still think this will create inconsistencies?

> >>>But, if you did the profiling part; I'd be happy to compare notes! :)
> >>
> >>Oh no -- that's your job, part of the task of convincing us you're
> >> right. :)
> >
> >Maybe you'd believe a co-developer better;
>
> This is not about whom I believe, but about you proving that your
> assertions are correct.

By quoting previous work (by a co-developer, no less) saying that 
IO is indeed the bottleneck, didn't I just do that?
Your reasoning makes me afraid that the only proof you will believe is 
tests that you ran yourself anyway.

> >Ahh; yeah; I think my solution will be full of race conditions, why
> > didn't I think of that when I wrote it.
> >Wait; its not written yet; so you can still make sure it doesn't have
> > any! Micro-bugfixing is not quite usefull at this stage; is it? :-)
>
> If you're impying that resolving race conditions is micro-bugfixing,
> then I strongly suggest you stop right there an do another think. I hope
> you're not implying that.

I'm not exactly new to programming/multithreading etc.; with over 15 years of 
experience and the last 6 years doing multithreaded IO and UI work (in 
Java) for a living, I think I know how to avoid race conditions.
The global design phase isn't the place for that (unless maybe you have 
zero experience).

> >Ofcourse you need to first create an optimistic lock file and then you
> > start checking for subdirs / superdirs that have one.
>
> The result is that you actually have to traverse the whole subtree,
> doing "stat"s instead of "creat"s, *and* you'd have to walk upwards, too
> (which we currently don't do).

1) Stats in the same dir as the entries file you are going to be 
reading anyway are optimised by the filesystem and have near-zero 
overhead.
Note that svn walks the tree now as well, but it actually opens the locks 
for writing instead of statting them.  If you followed the filesystem classes 
you'd know that the round trips go from 1 to 3 (or even 5, depending on 
whether the filesystem is journalling or not).
2) Ignoring a subdir which is already locked in an update session brings 
this back to 0.
3) Traversing directories up instead of recursively traversing directories 
down is by definition less work; tests will have to show whether the cost 
is outweighed by the gain of not writing, but I'm pretty confident it will 
be for all but the most extreme cases.
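To make the trade-off in point 1 concrete, the sketch below only counts filesystem operations for the two schemes; it says nothing about how long each kind takes, which is exactly what would have to be measured. The cost model (one stat per subdirectory below the update root, one per ancestor above it) is an assumption for illustration, and OpCounts and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OpCounts:
    creates: int
    unlinks: int
    stats: int


def per_directory_locks(num_dirs: int) -> OpCounts:
    """Current behaviour as described in the thread: a lock file is created
    in every directory of the update and removed again afterwards."""
    return OpCounts(creates=num_dirs, unlinks=num_dirs, stats=0)


def root_lock_with_checks(num_dirs: int, ancestors_in_wc: int) -> OpCounts:
    """Proposed behaviour (assumed cost model): one lock at the update root,
    one stat per subdirectory for nested locks, one stat per ancestor."""
    return OpCounts(creates=1, unlinks=1, stats=(num_dirs - 1) + ancestors_in_wc)


# kdelibs figures from the original post: 378 directories, updated from its top.
print(per_directory_locks(378))       # OpCounts(creates=378, unlinks=378, stats=0)
print(root_lock_with_checks(378, 0))  # OpCounts(creates=1, unlinks=1, stats=377)
```

For kdelibs that is 756 creates and removals versus 2 plus 377 stats; whether that is actually faster is the open question in this thread.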

> That's an interesting approach to 
> optimisation. :)
If you can't take new ideas seriously, just tell me to shove it and I'll go 
away, no problem.
I did expect a more mature approach from you, though.

-- 
Thomas Zander

Re: ideas to make svn update faster.

Posted by Branko Čibej <br...@xbc.nu>.

Thomas Zander wrote:

>Hi Branko,
>
>On Saturday 07 May 2005 22:45, Branko Čibej wrote:
>  
>
>>Thomas Zander wrote:
>>    
>>
>>>On Saturday 07 May 2005 20:47, Branko Čibej wrote:
>>>As I said; you read the entries files as normal, but you don't have to
>>>overwrite them for each dir if only the global version changed. Since
>>>the resulting xml would be exactly the same.
>>>      
>>>
>>Whatever you'd gain during update by not recording the new directory
>>revision (on the assumption that it's the same as the old one), you'd
>>lose because your working copy would have a greater mix of revision
>>numbers, which means that the tree report sent to the server before the
>>commit would be larger. Exactly what this gain/loss ratio would be, I
>>wouldn't venture to guess, but I'm pretty sure it doesn't scale linearly
>>with WC size.
>>    
>>
>Hmm?  How do you gather that?
>If I do an update afterwards I'm sure that all dirs in that tree have the 
>exact same version. (well; except if their sticky, but thats not what you 
>seem to be talking about).
>So; how on earth would registering the revisions differently have any effect 
>on the actual version numbers that those dirs have?
>Are we talking in completely different directions here?
>  
>
Well, you /did/ say you didn't want to write the changed entries files 
if the "shape" of the directory didn't change. So the directory revision 
numbers in the entries file wouldn't be updated.

If that's not what you meant, then I am now thoroughly confused.

>>Most of the time, yes, but disk access isn't the only slow part of an
>>update.
>>
>>    
>>
>>>But, if you did the profiling part; I'd be happy to compare notes! :)
>>>      
>>>
>>Oh no -- that's your job, part of the task of convincing us you're right.
>>:)
>>    
>>
>
>Maybe you'd believe a co-developer better;
>
This is not about whom I believe, but about you proving that your 
assertions are correct.


>>>If I type update in foo/bar  then the root is bar.  If I type update in
>>>foo/bar/baz; then the root is baz.  Simple because thats already what
>>>you do now.
>>>      
>>>
>>But what if you type "svn update foo/bar & svn update foo/bar/baz/qux"?
>>    
>>
>
>Thats _exactly_ what I explained below! (now snipped)
>
>  
>
>>Except of course for the potential race conditions, which can zap your
>>working copy.
>>    
>>
>
>Ahh; yeah; I think my solution will be full of race conditions, why didn't I 
>think of that when I wrote it.
>Wait; its not written yet; so you can still make sure it doesn't have any!
>Micro-bugfixing is not quite usefull at this stage; is it? :-)
>  
>
If you're implying that resolving race conditions is micro-bugfixing, 
then I strongly suggest you stop right there and have another think. I hope 
you're not implying that.

>Ofcourse you need to first create an optimistic lock file and then you start 
>checking for subdirs / superdirs that have one.
>  
>
The result is that you actually have to traverse the whole subtree, 
doing "stat"s instead of "creat"s, *and* you'd have to walk upwards, too 
(which we currently don't do). That's an interesting approach to 
optimisation. :)

>Which is nothing new since thats already how you do it right now, or does 
>the current way of working suffer from race conditions?
>  
>
I hope not. It should be free of race conditions.

-- Brane



Re: ideas to make svn update faster.

Posted by Thomas Zander <za...@kde.org>.

Hi Branko,

On Saturday 07 May 2005 22:45, Branko Čibej wrote:
> Thomas Zander wrote:
> >On Saturday 07 May 2005 20:47, Branko Čibej wrote:
> >As I said; you read the entries files as normal, but you don't have to
> >overwrite them for each dir if only the global version changed. Since
> > the resulting xml would be exactly the same.
>
> Whatever you'd gain during update by not recording the new directory
> revision (on the assumption that it's the same as the old one), you'd
> lose because your working copy would have a greater mix of revision
> numbers, which means that the tree report sent to the server before the
> commit would be larger. Exactly what this gain/loss ratio would be, I
> wouldn't venture to guess, but I'm pretty sure it doesn't scale linearly
> with WC size.
Hmm?  How do you gather that?
If I do an update afterwards I'm sure that all dirs in that tree have the 
exact same version (well, except if they're sticky, but that's not what you 
seem to be talking about).
So how on earth would registering the revisions differently have any effect 
on the actual version numbers that those dirs have?
Are we talking in completely different directions here?

> Most of the time, yes, but disk access isn't the only slow part of an
> update.
>
> >But, if you did the profiling part; I'd be happy to compare notes! :)
>
> Oh no -- that's your job, part of the task of convincing us you're right.
> :)

Maybe you'd believe a co-developer more; note that he mentions IO as 
the biggest slowdown here:
http://svn.collab.net/repos/svn/trunk/notes/wc-improvements

> >You can only make assumtions if you wrote the things; you make
> > assumtions on the format of the entries file (and other things) for the
> > plain and simple reason that svn wrote the file.
> >So if the upgrading routine of the format of the .svn dir makes sure he
> >actually _knows_ about the format file afterwards; then yes you can make
> >assumtions.
>
> The catch is that you can't make assumptions about the order of working
> copy directory accesses. It's entirely possible to have a path A/B/C,
> where A anc C are at version N, and B is at N+1.

Ehm, please take a little longer to reply so you have time to read my 
response...
I'll repeat: write the upgrade-format-version code so you _do_ know. At 
that point you can make assumptions. The how is explained below:

> >There are lots of ways to do this; if you find an old version in a
> > parent dir you upgrade it and upgrade all child dirs (which are listed
> > in each entries file) at the same time; and only when everything went
> > fine you note that in the format file.  With this approuch only one
> > .svn/format needs to be read.
>
> How do you find an old version, except by reading he format (or
> equivalent) file?

Like I said: by reading the format file. The trick is to make sure the 
entire checkout has the same format version, so there is only one format 
file in it to read.
It's not like the format will change between each update and commit, right?
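A more conservative variant than a single root-level format file (swapped in here, not what either poster proposed) is to read each directory's format file at most once per run, which removes the repeated opens without assuming that subdirectories match the root. The sketch assumes, as in the working copies under discussion, that .svn/format holds a single integer; wc_format, read_count and demo are hypothetical names.

```python
import functools
import tempfile
from pathlib import Path
from typing import List, Tuple

read_count = 0  # instrumentation for the demo only


@functools.lru_cache(maxsize=None)
def wc_format(wc_dir: str) -> int:
    """Return the format number stored in <wc_dir>/.svn/format.

    Assumes the file holds a single integer. Thanks to the cache, each
    directory's file is read at most once per process, however many times
    the update code asks for it.
    """
    global read_count
    read_count += 1
    return int((Path(wc_dir) / ".svn" / "format").read_text().strip())


def demo() -> Tuple[List[int], int]:
    """Look up one directory's format five times; report how many reads happened."""
    reads_before = read_count
    with tempfile.TemporaryDirectory() as tmp:
        admin = Path(tmp) / ".svn"
        admin.mkdir()
        (admin / "format").write_text("4\n")  # example value only
        key = str(Path(tmp).resolve())
        results = [wc_format(key) for _ in range(5)]  # five opens, as in the strace
    return results, read_count - reads_before
```

demo() returns ([4, 4, 4, 4, 4], 1): five lookups, one read. Any code that upgrades a directory would have to call wc_format.cache_clear() so that the cache does not hand out a stale format number.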

> >If I type update in foo/bar  then the root is bar.  If I type update in
> >foo/bar/baz; then the root is baz.  Simple because thats already what
> > you do now.
>
> But what if you type "svn update foo/bar & svn update foo/bar/baz/qux"?

That's _exactly_ what I explained below! (now snipped)

> Except of course for the potential race conditions, which can zap your
> working copy.

Ahh, yeah; I think my solution will be full of race conditions, why didn't I 
think of that when I wrote it?
Wait; it's not written yet, so you can still make sure it doesn't have any!
Micro-bugfixing is not quite useful at this stage, is it? :-)
Of course you need to first create an optimistic lock file and then you start 
checking for subdirs / superdirs that have one.
Which is nothing new, since that's already how you do it right now; or does 
the current way of working suffer from race conditions?
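The "create first, then check" protocol from the paragraph above might be sketched as follows. This assumes that a lock is a .svn/lock file created atomically with O_CREAT | O_EXCL. acquire, demo and the helper functions are hypothetical, not Subversion's locking code. Every process creates its own lock before scanning, so two overlapping updates cannot both miss each other. At worst, both back off.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterator, List, Union


def lock_file(wc_dir: Path) -> Path:
    return wc_dir / ".svn" / "lock"


def ancestors_in_wc(wc_dir: Path) -> Iterator[Path]:
    """Parent directories, walking up while they still have a .svn area."""
    current = wc_dir.parent
    while (current / ".svn").is_dir():
        yield current
        if current.parent == current:
            break
        current = current.parent


def descendants_in_wc(wc_dir: Path) -> Iterator[Path]:
    """Versioned subdirectories below wc_dir (never the .svn areas themselves)."""
    for dirpath, dirnames, _ in os.walk(wc_dir):
        here = Path(dirpath)
        # Prune in place so os.walk only descends into versioned subdirectories.
        dirnames[:] = [n for n in dirnames
                       if n != ".svn" and (here / n / ".svn").is_dir()]
        if here != wc_dir:
            yield here


def acquire(wc_dir: Path) -> Path:
    """Create our own lock first, then look for overlapping locks."""
    wc_dir = wc_dir.resolve()
    own = lock_file(wc_dir)
    # Atomic: raises FileExistsError if this very directory is already locked.
    os.close(os.open(own, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
    for other in [*ancestors_in_wc(wc_dir), *descendants_in_wc(wc_dir)]:
        if lock_file(other).exists():
            own.unlink()  # back off so the other operation can finish
            raise RuntimeError(f"working copy {other} is locked")
    return own


def demo() -> List[Union[str, bool]]:
    """Replay the a/b/c, a/b/d scenario from this thread in a temp directory."""
    outcomes: List[Union[str, bool]] = []
    with tempfile.TemporaryDirectory() as tmp:
        a = Path(tmp) / "a"
        for d in (a, a / "b", a / "b" / "c", a / "b" / "d"):
            (d / ".svn").mkdir(parents=True)
        acquire(a / "b" / "c")  # first svn updates c
        acquire(a / "b" / "d")  # second svn updates d: siblings don't overlap
        try:
            acquire(a)          # third svn in a overlaps both and must back off
            outcomes.append("a locked")
        except RuntimeError:
            outcomes.append("a refused")
        outcomes.append((a / ".svn" / "lock").exists())  # no stray lock left behind
    return outcomes
```

demo() returns ['a refused', False]. The downward scan is the whole-subtree traversal Branko objects to. It trades creates for stats rather than removing the walk.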

-- 
Thomas Zander

Re: ideas to make svn update faster.

Posted by Branko Čibej <br...@xbc.nu>.

Thomas Zander wrote:

>On Saturday 07 May 2005 20:47, Branko Čibej wrote:
>  
>
>>Thomas Zander wrote:
>>    
>>
>>>[2] Which
>>>I fully understand.  Looking at strace output I notice that svn could be
>>>a lot faster (do less writes) if svn was to be more optimistic about
>>>version numbers.
>>>      
>>>
>>With "more optimistic" == "wrong", unfortunately...
>>    
>>
>
>No its not; don't dig your heels in the sand just yet; please.  But, please 
>tell me exactly what usecase I missed where things go wrong.  Thanks.
>  
>
I think I covered some of that later in the reply, but in general, I 
think you're making assumptions about WC usage patterns that aren't 
valid. A simplistic approach to speeding up "svn update" could easily 
slow down other operations.

>>>kdelibs has ~8800 files and 378 dirs. At any time maybe 10 files have a
>>>different version then the rest (hell; let it be 10%).  That means that
>>>around 370 .svn/entries files have been written with the only change
>>>being a new version number in the name="" entry that is equal to just
>>>about all the other dirs in the project.
>>>A simple optimalisation would be to remove the directory-version number
>>>(the one in the xml entry-tag with 'name=""') when it has the same one
>>>as the parent dir.
>>>      
>>>
>>Have you actually measured what percentage of update time it takes to
>>write those 378 entries files, or are you simply guessing that this is
>>the bottleneck?
>>    
>>
>
>What? Don't you think the amount of writes is a problem, then?
>
I've learned long ago not to "think" or "guess" about performance 
bottlenecks, but to measure them.
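This is a rough micro-benchmark in the spirit of measuring rather than guessing. It assumes that rewriting an entries file means writing a temporary file and renaming it over the old one, which is an assumption and not a description of svn's code. It times a synthetic stand-in in a temporary directory, not svn itself. So it can only hint at whether writes dominate. Profiling a real svn update is still the measurement that counts. best_of and compare are hypothetical names.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Callable, Dict


def best_of(fn: Callable[[], None], repeat: int = 5) -> float:
    """Return the fastest of several wall-clock runs, in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def compare(num_dirs: int = 378) -> Dict[str, float]:
    """Time rewriting one small file per directory vs. only stat()ing it."""
    with tempfile.TemporaryDirectory() as tmp:
        admins = []
        for i in range(num_dirs):
            admin = Path(tmp) / f"dir{i}" / ".svn"
            admin.mkdir(parents=True)
            (admin / "entries").write_text("<entries/>\n")
            admins.append(admin)

        def rewrite_all() -> None:
            for admin in admins:
                scratch = admin / "entries.tmp"
                scratch.write_text("<entries/>\n")
                os.replace(scratch, admin / "entries")  # write, then rename over

        def stat_all() -> None:
            for admin in admins:
                os.stat(admin / "entries")

        return {"rewrite": best_of(rewrite_all), "stat": best_of(stat_all)}
```

The absolute numbers depend heavily on the filesystem and on whether the page cache is warm. Nothing here calls fsync, so the write path is likely flattered.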

> The work done 
>on each update _is_ huge for a project like KDE (where kdelibs is just a 
>subdir; a normal update will easilly go to 200000 files).
>If you dd the profiling; thats fine.  Lets work on that; if you didn't then 
>what about working on this part, now, eh?
>Statting less files etc comes later.
>  
>
Look, I didn't say your analysis was wrong, I asked if you'd actually 
measured the performance. If you haven't then anything you say is pure 
guesswork.

>>>Its probably not goint to be as simple as that (since you update subdirs
>>>seperately), but I'm pretty sure that a lot less xml's have to be
>>>written if you follow the route that the normal state is a dir having
>>>the same version as its parent.  Only when that fails do you need to do
>>>extra work. Being optimistic about version changes; I'd call that.
>>>      
>>>
>>Well, the first question that pops to mind is, how do you tell that the
>>equal-version assumption is wrong, unless you record the dir's version
>>number?
>>    
>>
>Sure you record it; but only for the dirs/files that actually have a 
>different version number. (and svn already does that partly)
>Don't think so black and white, here.
>As I said; you read the entries files as normal, but you don't have to 
>overwrite them for each dir if only the global version changed. Since the 
>resulting xml would be exactly the same.
>  
>
Whatever you'd gain during update by not recording the new directory 
revision (on the assumption that it's the same as the old one), you'd 
lose because your working copy would have a greater mix of revision 
numbers, which means that the tree report sent to the server before the 
commit would be larger. Exactly what this gain/loss ratio would be, I 
wouldn't venture to guess, but I'm pretty sure it doesn't scale linearly 
with WC size.

>>>Now; there is probably going to be a lot of opinions on the above
>>>subject; and I'd like to point out that svn really needs speed
>>>optimalisations; I have seen a LOT of complaints about this issue in
>>>the KDE switchover. Remember that if you find the above suggestion
>>>technically less-then-ideal.
>>>      
>>>
>>Certainly SVN needs speed optimizations. But I think you're approaching
>>them exactly the wrong way around. The thing to do is to measure where
>>the bottlenecks are, and strace is far from enough for that.
>>    
>>
>
>Hmm. I'm afraid its not really a secret recipy that if your process is not 
>taking a lot of cpu and memory, but is reading and writing a lot of files; 
>then the first thing to look into is to get it to write less files since 
>writing files is _always_ the slow part of disk access.
>  
>
Most of the time, yes, but disk access isn't the only slow part of an 
update.

>But, if you did the profiling part; I'd be happy to compare notes! :)
>  
>
Oh no -- that's your job, part of the task of convincing us you're right. :)

>>>The strace also showed me things like;
>>>* the .svn/format file is opened 5 times for each directory.
>>>      
>>>
>>We know about that, and we already have a (tentative) plan to remove the
>>format file and put the format information into the entries file.
>>    
>>
>
>Sounds great; good to hear I'm not smoking crack then :)
>
>  
>
>>>I would think
>>>that with auto-upgrades only one (the root dir) should be enough.
>>>      
>>>
>>That, of course, is again an oversimplification. You can't make
>>assumptions about the state of subdirectories in the working copy.
>>    
>>
>
>You can only make assumtions if you wrote the things; you make assumtions on 
>the format of the entries file (and other things) for the plain and simple 
>reason that svn wrote the file.
>So if the upgrading routine of the format of the .svn dir makes sure he 
>actually _knows_ about the format file afterwards; then yes you can make 
>assumtions.
>  
>
The catch is that you can't make assumptions about the order of working 
copy directory accesses. It's entirely possible to have a path A/B/C, 
where A and C are at version N, and B is at N+1.
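Branko's A/B/C case can be run through a parent-relative scheme in a small toy model. This model is hypothetical and not Subversion's entries format. In it, a directory records its revision only when that revision differs from its immediate parent's. compress and expand are illustrative names.

```python
from typing import Dict, Optional


def parent_of(path: str) -> Optional[str]:
    """Parent of a "/"-separated path; the top-level directory has none."""
    return path.rsplit("/", 1)[0] if "/" in path else None


def compress(actual: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Record a revision only where it differs from the immediate parent's."""
    stored: Dict[str, Optional[int]] = {}
    for path, rev in actual.items():
        parent = parent_of(path)
        stored[path] = rev if parent is None or actual[parent] != rev else None
    return stored


def expand(stored: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Recover every directory's revision by inheriting from the nearest
    ancestor that recorded one."""
    def resolve(path: str) -> int:
        rev = stored[path]
        if rev is not None:
            return rev
        parent = parent_of(path)
        if parent is None:
            raise ValueError(f"top-level directory {path!r} has no revision")
        return resolve(parent)

    return {path: resolve(path) for path in stored}


N = 100
mixed = {"A": N, "A/B": N + 1, "A/B/C": N}  # Branko's example
print(compress(mixed))    # {'A': 100, 'A/B': 101, 'A/B/C': 100}
uniform = {"A": N, "A/B": N, "A/B/C": N}
print(compress(uniform))  # {'A': 100, 'A/B': None, 'A/B/C': None}
```

C has to keep N explicitly, because inheriting from B would give N+1. The scheme stays correct. However, the savings shrink in a mixed-revision tree. Every directory that differs from its parent still needs an explicit record. That includes every child that differs from such a directory.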

>There are lots of ways to do this; if you find an old version in a parent 
>dir you upgrade it and upgrade all child dirs (which are listed in each 
>entries file) at the same time; and only when everything went fine you note 
>that in the format file.  With this approuch only one .svn/format needs to 
>be read.
>  
>
How do you find an old version, except by reading the format (or 
equivalent) file?

>>>* .svn/lock files being created in every subdir is not needed if you
>>>check parent dirs that also have a .svn (and maybe the same root).
>>>      
>>>
>>What you think of as the "root" of the working copy is a figment of the
>>imagination. It's quite valid to have two SVN processes fiddle in
>>parallel with two subtrees in the WC. A third SVN working from a common
>>root of those two subtrees could zap the WC if it didn't try to lock it
>>recursively first.
>>    
>>
>
>If I type update in foo/bar  then the root is bar.  If I type update in 
>foo/bar/baz; then the root is baz.  Simple because thats already what you 
>do now.
>  
>
But what if you type "svn update foo/bar & svn update foo/bar/baz/qux"? 
If you only create the lock file in the roots, the two updates will 
likely interfere with each other somewhere along the line. (And please 
note that, while this looks like nonsense from the command-line client's 
point of view, a file-manager-like GUI could have other ideas.)

>The only difference being that you create a whole lot less lock files.
>Your example;
>consider
>a/b/c
>a/b/d
>One svn is updating c, another is updating d.  Effect; one lock file in 
>c/.svn and another in d/.svn
>Then the user types svn update in 'a'.
>Effect now; svn: Working copy 'a/b/c' locked
>Effect with my proposed change; well, none actually, it again gives the same 
>problem.
>  
>
Except of course for the potential race conditions, which can zap your 
working copy.

>The fact that svn could just skip that dir in the update and only print a 
>warning is another point. But I won't go there just now.
>
>  
>
>>>So you create one in the dir you typed 'svn up' in and if someone types
>>>svn up in a subdir it will change dir to parent and check for a lock
>>>file until it either finds it (in this case it will, and abort) or it
>>>will leave the checkout.
>>>This will save a _lot_ of file-creation and removal afterwards.
>>>      
>>>
>>So, you're saying that we should check locks upwards in the working
>>copy, not downwards. Interesting idea. I'd not want to guess what
>>happens if you have symlinked working copies.
>>    
>>
>This is the opposite effect of the situation we described above. Same dirs
>a/b/c.   Only this time the first svn is in the dir 'a'.  And while thats 
>running I start one in thesubdir 'c'.
>You expect it to bail out, as it does right now.
>So; read my explenation and see how that will do exactly that.
>Symlinks are a non issue since svn doesn't follow them anyway in an update.
>  
>
-- Brane



Re: ideas to make svn update faster.

Posted by Thomas Zander <za...@kde.org>.

On Saturday 07 May 2005 20:47, Branko Čibej wrote:
> Thomas Zander wrote:
> >One of the things I notice is that svn update is not faster than cvs
> > update, which is contrary to expectations since there should be a
> > global tree revision, so it should be faster than the cvs which has a
> > revision per file.
>
> I can't imagine how you reached that conclusion. How the revisions are
> numbered has no bearing on update speed.

The initial idea is not so strange; in fact, everyone I talked to who knows 
about svn's global numbering came to the same (apparently incorrect) 
conclusion.
If you already know the global version number, you don't need to read the 
version for each file and dir. Systems like darcs work changeset-based: only 
one lock file is needed, and no matter how big the repo is, the time to 
update is always the same. I expected svn to have that same advantage.

> >I was told on #svn that this is due to the mixed revisions stuff.
>
> Yup, and the fact that SVN has to do a lot more work than CVS, because
> of directory versioning.

The slowness is not the processor or the network; it's the amazing amount of 
disk access, which for just about all kde modules means I'm going right 
through the disk caches on each update/commit or whatever.
Fewer writes seems to be a good start. Let's get to fewer reads later, 
shall we?

> > [2] Which
> >I fully understand.  Looking at strace output I notice that svn could be
> > a lot faster (do less writes) if svn was to be more optimistic about
> > version numbers.
>
> With "more optimistic" == "wrong", unfortunately...

No, it's not; please don't dig your heels in just yet. But please 
tell me exactly which use case I missed where things go wrong. Thanks.

> >kdelibs has ~8800 files and 378 dirs. At any time maybe 10 files have a
> >different version than the rest (hell; let it be 10%).  That means that
> >around 370 .svn/entries files have been written with the only change
> > being a new version number in the name="" entry that is equal to just
> > about all the other dirs in the project.
> >A simple optimisation would be to remove the directory-version number
> > (the one in the xml entry-tag with 'name=""') when it has the same one
> > as the parent dir.
>
> Have you actually measured what percentage of update time it takes to
> write those 378 entries files, or are you simply guessing that this is
> the bottleneck?

What? Don't you think the amount of writes is a problem, then? The work done 
on each update _is_ huge for a project like KDE (where kdelibs is just a 
subdir; a normal update will easily go to 200000 files).
If you did the profiling, that's fine; let's work on that. If you didn't, 
then what about working on this part now, eh?
Statting fewer files etc. comes later.

> >It's probably not going to be as simple as that (since you update subdirs
> >separately), but I'm pretty sure that a lot fewer XML files have to be
> > written if you follow the route that the normal state is a dir having
> > the same version as its parent.  Only when that fails do you need to do
> > extra work. Being optimistic about version changes; I'd call that.
>
> Well, the first question that pops to mind is, how do you tell that the
> equal-version assumption is wrong, unless you record the dir's version
> number?

Sure, you record it, but only for the dirs/files that actually have a 
different version number (and svn already partly does that).
Don't think so black and white here.
As I said, you read the entries files as normal, but you don't have to 
overwrite them for each dir if only the global version changed, since the 
resulting XML would be exactly the same.
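A minimal sketch of this "optimistic" scheme, under stated assumptions: the working copy is modelled as a hypothetical in-memory map from directory path to an explicit revision, or None when the directory simply inherits its parent's revision. It shows which entries files would need rewriting after a whole-tree update, and also the upward walk that finding a directory's effective revision now requires. None of these names are Subversion APIs.

```python
# Hypothetical model of the "store a directory's revision only when it
# differs from its parent" idea. Keys are POSIX-style paths relative to the
# working-copy root ("" is the root); this is not Subversion's entries format.
from typing import Dict, List, Optional

WcRevs = Dict[str, Optional[int]]  # None means "same revision as my parent"


def parent_of(path: str) -> Optional[str]:
    """Parent directory of `path`, or None for the working-copy root."""
    if path == "":
        return None
    return path.rsplit("/", 1)[0] if "/" in path else ""


def effective_revision(revs: WcRevs, path: str) -> int:
    """Walk upwards until a directory with an explicit revision is found."""
    node: Optional[str] = path
    while node is not None:
        rev = revs.get(node)
        if rev is not None:
            return rev
        node = parent_of(node)
    raise ValueError("the root must always record an explicit revision")


def update_whole_tree(revs: WcRevs, new_rev: int) -> List[str]:
    """Bring every directory to `new_rev`; return the directories whose
    entries file would have to be rewritten under the optimistic scheme."""
    written: List[str] = []
    for path in sorted(revs):
        if path == "":
            if revs[path] != new_rev:
                revs[path] = new_rev      # the one "global" number changes
                written.append(path)
        elif revs[path] is not None:
            revs[path] = None             # override is now redundant
            written.append(path)
    return written


if __name__ == "__main__":
    wc = {"": 100, "kdecore": None, "kdeui": None, "kdeui/widgets": 98}
    print(update_whole_tree(wc, 101))               # ['', 'kdeui/widgets']
    print(effective_revision(wc, "kdeui/widgets"))  # 101
```

In this toy tree only two of the four entries files are written, where the current scheme rewrites every one (378 in kdelibs). The cost moves to reads: effective_revision may have to visit every ancestor, which is the trade-off raised in the thread for non-recursive status.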

> >Now; there is probably going to be a lot of opinions on the above
> > subject; and I'd like to point out that svn really needs speed
> > optimisations; I have seen a LOT of complaints about this issue in
> > the KDE switchover. Remember that if you find the above suggestion
> > technically less-than-ideal.
>
> Certainly SVN needs speed optimizations. But I think you're approaching
> them exactly the wrong way around. The thing to do is to measure where
> the bottlenecks are, and strace is far from enough for that.

Hmm. I'm afraid it's not really a secret recipe: if your process is not 
taking a lot of CPU and memory, but is reading and writing a lot of files, 
then the first thing to look into is getting it to write fewer files, since 
writing files is _always_ the slow part of disk access.
But if you did the profiling, I'd be happy to compare notes! :)

> >The strace also showed me things like;
> >* the .svn/format file is opened 5 times for each directory.
>
> We know about that, and we already have a (tentative) plan to remove the
> format file and put the format information into the entries file.

Sounds great; good to hear I'm not smoking crack then :)

> >I would think
> >that with auto-upgrades only one (the root dir) should be enough.
>
> That, of course, is again an oversimplification. You can't make
> assumptions about the state of subdirectories in the working copy.

You can only make assumptions if you wrote the things; you make assumptions 
about the format of the entries file (and other things) for the plain and 
simple reason that svn wrote the file.
So if the routine that upgrades the format of the .svn dir makes sure it 
actually _knows_ about the format file afterwards, then yes, you can make 
assumptions.
There are lots of ways to do this: if you find an old version in a parent 
dir, you upgrade it and upgrade all child dirs (which are listed in each 
entries file) at the same time, and only when everything went fine do you 
note that in the format file. With this approach only one .svn/format needs 
to be read.
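The upgrade-once idea could look roughly like the sketch below. It is hypothetical: directories are an in-memory tree, upgrade_one stands in for whatever converts a single .svn directory, and a dict stands in for the root's .svn/format file. The key property is that the format is recorded only after every directory has been upgraded, so a reader can check just the root. It relies on nothing modifying subdirectories behind the root's back, which is exactly the assumption Branko questions.

```python
# Hypothetical sketch of "upgrade everything, then record the format once at
# the root". The tree is in memory; no real .svn files are read or written.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WcDir:
    name: str
    fmt: int                                               # this dir's format
    children: List["WcDir"] = field(default_factory=list)  # from its entries


def upgrade_tree(d: WcDir, target: int,
                 upgrade_one: Callable[[WcDir], None]) -> None:
    """Upgrade `d` and all its children, depth first. An exception raised
    by `upgrade_one` propagates and aborts the whole run."""
    if d.fmt < target:
        upgrade_one(d)
        d.fmt = target
    for child in d.children:
        upgrade_tree(child, target, upgrade_one)


def ensure_format(root: WcDir, root_format_file: Dict[str, int], target: int,
                  upgrade_one: Callable[[WcDir], None]) -> bool:
    """Return True if an upgrade was performed. `root_format_file` stands in
    for the root's .svn/format; it is the only format record consulted."""
    if root_format_file.get("format") == target:
        return False                      # fast path: a single read
    upgrade_tree(root, target, upgrade_one)
    root_format_file["format"] = target   # reached only if nothing failed
    return True
```

If any directory fails to upgrade, the exception escapes before the root record is written, so the next run simply tries again from the top.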

> >* .svn/lock files being created in every subdir is not needed if you
> > check parent dirs that also have a .svn (and maybe the same root).
>
> What you think of as the "root" of the working copy is a figment of the
> imagination. It's quite valid to have two SVN processes fiddle in
> parallel with two subtrees in the WC. A third SVN working from a common
> root of those two subtrees could zap the WC if it didn't try to lock it
> recursively first.

If I type update in foo/bar, then the root is bar. If I type update in 
foo/bar/baz, then the root is baz, simply because that's already what you 
do now.
The only difference is that you create far fewer lock files.
Your example:
consider
a/b/c
a/b/d
One svn is updating c, another is updating d. Effect: one lock file in 
c/.svn and another in d/.svn.
Then the user types svn update in 'a'.
Effect now: svn: Working copy 'a/b/c' locked
Effect with my proposed change: well, no difference actually; it again gives 
the same problem.

The fact that svn could just skip that dir in the update and only print a 
warning is another point. But I won't go there just now.

> >So you create one in the dir you typed 'svn up' in and if someone types
> > svn up in a subdir it will change dir to parent and check for a lock
> > file until it either finds it (in this case it will, and abort) or it
> > will leave the checkout.
> >This will save a _lot_ of file-creation and removal afterwards.
>
> So, you're saying that we should check locks upwards in the working
> copy, not downwards. Interesting idea. I'd not want to guess what
> happens if you have symlinked working copies.

This is the opposite of the situation we described above. Same dirs, 
a/b/c. Only this time the first svn is in the dir 'a', and while that's 
running I start one in the subdir 'c'.
You expect it to bail out, as it does right now.
So read my explanation and see how that will do exactly that.
Symlinks are a non-issue since svn doesn't follow them anyway in an update.
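A sketch of the upward lock check, assuming a lock is the file .svn/lock and that the working copy ends at the first parent without a .svn directory. This is not how Subversion currently locks (it locks recursively, per directory). Note that it only finds locks above the starting directory; a lock held further down (the a/b/c case) is only discovered when the update reaches that subdirectory, which is where Branko's race-condition concern applies. The function name is made up for the example.

```python
# Hypothetical sketch of checking for locks *upwards* instead of locking
# every directory. Assumes a lock is the file .svn/lock and that the working
# copy ends at the first parent without a .svn directory.
from pathlib import Path
from typing import Optional


def find_ancestor_lock(start: Path) -> Optional[Path]:
    """Return the first .svn/lock in `start` or any versioned ancestor."""
    d = start.resolve()                  # follows symlinks to the real path
    while (d / ".svn").is_dir():
        lock = d / ".svn" / "lock"
        if lock.exists():
            return lock                  # someone is working above us: abort
        if d.parent == d:                # filesystem root reached
            break
        d = d.parent
    return None                          # left the checkout: safe to lock here
```

Because Path.resolve() follows symlinks, a symlinked working copy is walked via its real location; that is one possible answer to the symlink question, not the only one.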

-- 
Thomas Zander

Re: ideas to make svn update faster.

Posted by Branko Čibej <br...@xbc.nu>.

Thomas Zander wrote:

>Hi;
>I'm a kde developer and as such lived through the cvs-svn conversion of the 
>kde codebase.  I even blogged a bit about that.. [1]
>
>One of the things I notice is that svn update is not faster than cvs update, 
>which is contrary to expectations since there should be a global tree 
>revision, so it should be faster than the cvs which has a revision per 
>file.
>  
>
I can't imagine how you reached that conclusion. How the revisions are 
numbered has no bearing on update speed.

>I was told on #svn that this is due to the mixed revisions stuff.
>
Yup, and the fact that SVN has to do a lot more work than CVS, because 
of directory versioning.

> [2] Which 
>I fully understand.  Looking at strace output I notice that svn could be a 
>lot faster (do less writes) if svn was to be more optimistic about version 
>numbers.
>  
>
With "more optimistic" == "wrong", unfortunately...

>kdelibs has ~8800 files and 378 dirs. At any time maybe 10 files have a
>different version than the rest (hell; let it be 10%).  That means that 
>around 370 .svn/entries files have been written with the only change being 
>a new version number in the name="" entry that is equal to just about all 
>the other dirs in the project.
>A simple optimisation would be to remove the directory-version number (the 
>one in the xml entry-tag with 'name=""') when it has the same one as the 
>parent dir.
>  
>
Have you actually measured what percentage of update time it takes to 
write those 378 entries files, or are you simply guessing that this is 
the bottleneck?

>It's probably not going to be as simple as that (since you update subdirs 
>separately), but I'm pretty sure that a lot fewer XML files have to be written 
>if you follow the route that the normal state is a dir having the same 
>version as its parent.  Only when that fails do you need to do extra work. 
>Being optimistic about version changes; I'd call that.
>  
>
Well, the first question that pops to mind is, how do you tell that the 
equal-version assumption is wrong, unless you record the dir's version 
number?

>Now; there is probably going to be a lot of opinions on the above subject; 
>and I'd like to point out that svn really needs speed optimisations; I 
>have seen a LOT of complaints about this issue in the KDE switchover.  
>Remember that if you find the above suggestion technically less-than-ideal.
>  
>
Certainly SVN needs speed optimizations. But I think you're approaching 
them exactly the wrong way around. The thing to do is to measure where 
the bottlenecks are, and strace is far from enough for that.
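As a starting point for measuring rather than guessing, here is a small, generic timing harness. It only measures the wall-clock time of whole commands; it does not say where inside svn the time goes, and results depend heavily on whether the disk cache is warm. The function name and parameters are illustrative.

```python
# Generic wall-clock timing harness; nothing here is specific to Subversion.
import statistics
import subprocess
import time
from typing import List, Optional, Sequence


def time_command(cmd: Sequence[str], runs: int = 5,
                 cwd: Optional[str] = None) -> float:
    """Run `cmd` `runs` times and return the median duration in seconds.
    The first run usually warms the disk cache, so compare medians, and
    drop caches between runs if you want cold-cache numbers."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

For example, time_command(["svn", "update"], cwd="kdelibs") run against an http(s) checkout and an svn:// checkout of the same tree gives a first comparison; strace -c adds a per-syscall summary to go with it.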


>The strace also showed me things like;
>* the .svn/format file is opened 5 times for each directory.
>
We know about that, and we already have a (tentative) plan to remove the 
format file and put the format information into the entries file.

>I would think 
>that with auto-upgrades only one (the root dir) should be enough.
>
That, of course, is again an oversimplification. You can't make 
assumptions about the state of subdirectories in the working copy.

> Saving 
>5*378 - 1 file opens for me. :)
>  
>
>* .svn/lock files being created in every subdir is not needed if you check 
>parent dirs that also have a .svn (and maybe the same root).
>  
>
What you think of as the "root" of the working copy is a figment of the 
imagination. It's quite valid to have two SVN processes fiddle in 
parallel with two subtrees in the WC. A third SVN working from a common 
root of those two subtrees could zap the WC if it didn't try to lock it 
recursively first.

>So you create one in the dir you typed 'svn up' in and if someone types svn 
>up in a subdir it will change dir to parent and check for a lock file until 
>it either finds it (in this case it will, and abort) or it will leave the 
>checkout.
>This will save a _lot_ of file-creation and removal afterwards.
>  
>
So, you're saying that we should check locks upwards in the working 
copy, not downwards. Interesting idea. I'd not want to guess what 
happens if you have symlinked working copies.

-- Brane



Re: DeltaV vs ra_dav [was: Re: ideas to make svn update faster.]

Posted by Branko Čibej <br...@xbc.nu>.

Daniel Patterson wrote:

>  It's probably a lot of extra work to maintain two Apache modules
>  with very different behaviours though.
>
Yes, so I really don't think it makes sense. One module is quite capable 
of doing it all, but of course would have to be developed a bit 
differently than it is now. For example, we'd have to test with generic 
clients.

>  Plus, is there any intention
>  of making Subversion a valid DeltaV *client* as well?
>  
>
I don't think it makes sense to do that.

-- Brane



DeltaV vs ra_dav [was: Re: ideas to make svn update faster.]

Posted by Daniel Patterson <da...@danpat.net>.

Julian Reschke wrote:
> Greg Hudson wrote:
>> It has long been my position that we should give up on DeltaV (which is
>> a failed standard, in my opinion, and one that doesn't fit our
>> architecture) and consider ourselves to be talking a private versioning
>> protocol over HTTP/DAV.  As part of that, we could feel free to
>
> I do agree that the current situation is frustrating. For instance, none
> of the currently available DeltaV clients works with a Subversion server
> because it fails to implement lots of mandatory stuff.
>
> On the other hand, I do disagree that dropping the goal of becoming
> RFC3253 compliant would be good :-).

  Would it make any sense to divide the Subversion-over-HTTP RA layer
  into two parts?  A server-side-only "mod_dav_deltav_svn", and
  rename the current "mod_dav_svn" to "mod_svn" (or somesuch).

  In this setup, mod_dav_deltav_svn would allow DeltaV clients
  to talk to a Subversion server.  mod_svn is used by ra_dav
  to allow Subversion clients to talk to Subversion servers
  over HTTP in an optimised way.

  It's probably a lot of extra work to maintain two Apache modules
  with very different behaviours though.  Plus, is there any intention
  of making Subversion a valid DeltaV *client* as well?

daniel


Re: ideas to make svn update faster.

Posted by Julian Reschke <ju...@gmx.de>.

Greg Hudson wrote:
> On Sun, 2005-05-08 at 18:00 +0200, Thomas Zander wrote:
> 
>>Do I understand correctly that the local filesystem access is different 
>>based on which network layer I choose?
> 
> 
> Yes.  The DeltaV protocol (which we sort of follow, but not very well)
> requires us to use a server-provided "version-resource URL" for each
> file and directory in the working copy.  In practice, when talking to a

But only if you actually want to access that version, right?

> Subversion server, these URLs follow a very specific form, but in theory
> they could be arbitrary.  We have a mechanism called "wcprops" which we
> use to cache these version-resource URLs in the working copy so that we
> don't have to re-fetch them for each operation.

This shouldn't be an issue as long as the SVN client talks to an SVN 
server, because they could in theory share the knowledge of how to compute 
these URIs.

> It has long been my position that we should give up on DeltaV (which is
> a failed standard, in my opinion, and one that doesn't fit our
> architecture) and consider ourselves to be talking a private versioning
> protocol over HTTP/DAV.  As part of that, we could feel free to

I do agree that the current situation is frustrating. For instance, none 
of the currently available DeltaV clients works with a Subversion server 
because it fails to implement lots of mandatory stuff.

On the other hand, I do disagree that dropping the goal of becoming 
RFC3253 compliant would be good :-).

> synthesize version-resource URLs, which would let us ditch the wcprops
> mechanism, which would eliminate the discrepancy between ra_dav and
> ra_svn in working copy performance.

As far as I can tell, you can do that easily as long as your client only 
needs to work with SVN servers.

> Some developers agree with me and some don't; regardless, we haven't
> gone in that direction as yet.

Best regards,

Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


Re: ideas to make svn update faster.

Posted by Greg Hudson <gh...@MIT.EDU>.

On Sun, 2005-05-08 at 18:00 +0200, Thomas Zander wrote:
> Do I understand correctly that the local filesystem access is different 
> based on which network layer I choose?

Yes.  The DeltaV protocol (which we sort of follow, but not very well)
requires us to use a server-provided "version-resource URL" for each
file and directory in the working copy.  In practice, when talking to a
Subversion server, these URLs follow a very specific form, but in theory
they could be arbitrary.  We have a mechanism called "wcprops" which we
use to cache these version-resource URLs in the working copy so that we
don't have to re-fetch them for each operation.

It has long been my position that we should give up on DeltaV (which is
a failed standard, in my opinion, and one that doesn't fit our
architecture) and consider ourselves to be talking a private versioning
protocol over HTTP/DAV.  As part of that, we could feel free to
synthesize version-resource URLs, which would let us ditch the wcprops
mechanism, which would eliminate the discrepancy between ra_dav and
ra_svn in working copy performance.
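To illustrate what "synthesizing version-resource URLs" could mean, here is a hypothetical helper that computes the URL from the repository root, a revision and a path instead of reading it from a cached wcprop. The !svn/ver/<rev>/<path> layout is an assumption modelled on the URLs mod_dav_svn typically hands out, and the revision argument is assumed to be whichever revision identifies that version of the node. As Julian points out in his reply, this only works when the client knows it is talking to a Subversion server.

```python
# Hypothetical sketch: compute a version-resource URL instead of caching it
# in a wcprop. The "!svn/ver/<rev>/<path>" layout is an assumption; a generic
# DeltaV server is free to hand out URLs of any form.
from urllib.parse import quote


def version_resource_url(repos_root: str, revision: int, repos_path: str) -> str:
    """Build <root>/!svn/ver/<revision>/<path> with the path URI-escaped."""
    root = repos_root.rstrip("/")
    path = quote(repos_path.lstrip("/"), safe="/")  # keep "/" separators
    return f"{root}/!svn/ver/{revision}/{path}"
```

Nothing needs to be stored per file or directory, which is what would let the wcprops files (and their extra reads and writes) go away.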

Some developers agree with me and some don't; regardless, we haven't
gone in that direction as yet.


Re: ideas to make svn update faster.

Posted by Thomas Zander <za...@kde.org>.

On Sunday 08 May 2005 17:35, Greg Hudson wrote:
> On Sun, 2005-05-08 at 16:35 +0200, Thomas Zander wrote:
> > I actually used https, since the update is reading I'm pointing this
> > out as I'm not sure that's handled by dav.
>
> It is.  A lot of our working copy inefficiency is specific to the ra_dav
> layer; you might try experimenting with the ra_svn layer (i.e. using
> svnserve) and seeing how much that helps.

Do I understand correctly that the local filesystem access is different 
based on which network layer I choose?
I assumed similar slowness, since it's slow with or without any updates coming 
in over the network. But I gather that I should avoid https so that detecting 
local changes becomes faster. Weird :)

Thanks.
-- 
Thomas Zander

Re: ideas to make svn update faster.

Posted by Greg Hudson <gh...@MIT.EDU>.

On Sun, 2005-05-08 at 16:35 +0200, Thomas Zander wrote:
> I actually used https, since the update is reading I'm pointing this out as 
> I'm not sure that's handled by dav.

It is.  A lot of our working copy inefficiency is specific to the ra_dav
layer; you might try experimenting with the ra_svn layer (i.e. using
svnserve) and seeing how much that helps.

Of course, if you have other reasons to want to use http, it won't help
you very much to know that the network layer you don't want to use would
be much faster.  But it would give us more evidence that we should be
fixing up the ra_dav layer as it relates to the working copy, either
using access batons, or (as I've argued we should in the past) ditching
the wcprops mechanism and synthesizing version-resource URLs instead of
remembering them.  That gets into a complex argument about protocol
compatibility, but it's at least a relatively simple change to make.

If using ra_svn doesn't help enough, then more complicated changes might
be required, such as ones you suggested.  (Although your particular
changes would break working copy severability, as has been noted
elsewhere in the thread.  We might some day decide to sacrifice
severability for better performance, probably as part of a massive WC
library redesign, but right now it would be considered a bug.)


Re: ideas to make svn update faster.

Posted by Thomas Zander <za...@kde.org>.

On Sunday 08 May 2005 14:08, Philip Martin wrote:
> Philip Martin <ph...@codematters.co.uk> writes:
> > What version of Subversion are you using?  When I try "svn update" it
> > only opens each format file once:
> >
> > $ svn up -r5 . > /dev/null
> > $ strace -etrace=open svn up -r6 . 2>&1 | grep .svn/format
> > open(".svn/format", O_RDONLY)           = 3
> > open("foo/.svn/format", O_RDONLY)       = 3
> > open("foo/bar/.svn/format", O_RDONLY)   = 3
>
> That result was for ra_svn, but I realise now that it's ra_dav that
> opens the format file multiple times.  The problem is that the wcprops
> code is "old fashioned" and doesn't use access batons.

I actually used https, since the update is reading; I'm pointing this out as 
I'm not sure that's handled by dav.
I would appreciate it if you could make sure that this issue was addressed.
Thanks.
-- 
Thomas Zander

Re: ideas to make svn update faster.

Posted by Philip Martin <ph...@codematters.co.uk>.

Philip Martin <ph...@codematters.co.uk> writes:

> What version of Subversion are you using?  When I try "svn update" it
> only opens each format file once:
>
> $ svn up -r5 . > /dev/null
> $ strace -etrace=open svn up -r6 . 2>&1 | grep .svn/format
> open(".svn/format", O_RDONLY)           = 3
> open("foo/.svn/format", O_RDONLY)       = 3
> open("foo/bar/.svn/format", O_RDONLY)   = 3

That result was for ra_svn, but I realise now that it's ra_dav that
opens the format file multiple times.  The problem is that the wcprops
code is "old fashioned" and doesn't use access batons.
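To check claims like "opened 5 times for each directory" on your own checkouts (ra_svn versus ra_dav), the trace can be tallied per path. A small sketch, assuming output produced with something like strace -f -o trace.txt -e trace=open,openat svn up; newer systems report openat rather than open, so both forms are matched. The function name is made up for the example.

```python
# Count how often each .svn/format file is opened in strace output.
# Handles both open("path", ...) and openat(AT_FDCWD, "path", ...) lines.
import re
from collections import Counter
from typing import Iterable

OPEN_RE = re.compile(r'\bopen(?:at)?\((?:[^,"]*,\s*)?"([^"]+)"')


def count_format_opens(lines: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    for line in lines:
        m = OPEN_RE.search(line)
        if m and m.group(1).endswith(".svn/format"):
            counts[m.group(1)] += 1
    return counts
```

For example, with open("trace.txt") as f: print(count_format_opens(f).most_common(10)) lists the most frequently reopened format files first.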

-- 
Philip Martin


Re: ideas to make svn update faster.

Posted by Thomas Zander <za...@kde.org>.

On Sunday 08 May 2005 18:07, Greg Hudson wrote:
> If you update a tree, and then visit one of its subdirs in TortoiseSVN,
> tsvn will do a non-recursive status operation on the subdir.  Currently,
> this involves reading only one entries file; under your scheme, it might
> have to read several.  Thus, several times slower.  (At least, in
> theory.  There might be more important sources of slowness in the
> non-recursive status operation.)

I'm sure there are other things making it slow. I just wrote a class that 
reads all the entries XML files and fetches the version fields from them.  
I ran this in KOffice (which has 483 dirs under version control).
Result: 7.1 milliseconds per parsed XML file.

Oh, this was done in JIT-compiled Java; I hope you will agree that this test 
will not read XML files significantly faster than your C library.
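For anyone wanting to repeat this measurement without Java, a rough Python equivalent is sketched below. It assumes the XML entries format of this era, where the root element is <wc-entries xmlns="svn:"> and the directory's own entry has name=""; the function names are made up for the example, and the timing covers reading plus parsing only, not writing.

```python
# Read each .svn/entries file (XML in working copies of this era) and pull
# out the revision of the directory's own entry (name="").
import time
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

SVN_NS = "{svn:}"  # assumed: root element is <wc-entries xmlns="svn:">


def dir_revision(entries_xml: bytes) -> Optional[int]:
    root = ET.fromstring(entries_xml)
    for entry in root.iter(SVN_NS + "entry"):
        if entry.get("name") == "":
            rev = entry.get("revision")
            return int(rev) if rev is not None else None
    return None


def time_entries_parsing(wc_root: Path) -> float:
    """Mean milliseconds spent reading and parsing one entries file."""
    files = list(wc_root.rglob(".svn/entries"))
    start = time.perf_counter()
    for f in files:
        dir_revision(f.read_bytes())
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(files) if files else 0.0
```

Run it twice in a row to see the difference a warm disk cache makes.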
-- 
Thomas Zander

Re: ideas to make svn update faster.

Posted by Greg Hudson <gh...@MIT.EDU>.

On Sun, 2005-05-08 at 17:56 +0200, Thomas Zander wrote:
> Making things multiple times slower surely is unacceptable; I fully agree. 
> But please tell me how you come to the conclusion that this would be the 
> case?
> Just reading an xml file is something that already happens for each and 
> every directory you have managed; if you do an update from the root level 
> it's going to be faster; if you do an update lower, it could cause a couple
> of extra local filesystem reads.  That does not translate to multiple times 
> slower at all.

If you update a tree, and then visit one of its subdirs in TortoiseSVN,
tsvn will do a non-recursive status operation on the subdir.  Currently,
this involves reading only one entries file; under your scheme, it might
have to read several.  Thus, several times slower.  (At least, in
theory.  There might be more important sources of slowness in the
non-recursive status operation.)


Re: ideas to make svn update faster.

Posted by Philip Martin <ph...@codematters.co.uk>.

Branko Čibej <br...@xbc.nu> writes:

> Of course, it doesn't help that we have to do more work on Windows,
> too (like removing the read-only flag before deleting a file).

That's platform specific code, you could change it so that it never
sets the read-only flag instead (I suppose svn:needs-lock has made
this more complicated than it would have been in 1.1.x).

-- 
Philip Martin


Re: ideas to make svn update faster.

Posted by Branko Čibej <br...@xbc.nu>.

Folker Schamel wrote:

>
> Some additional notes:
> Performance under Windows really is a serious problem;
> it is much worse than under Linux due to the Windows file system.
> My impression is that Unix-only users are often not aware of this.
> My personal experience is: what costs only seconds under Linux
> often costs a minute under Windows (e.g. an svn status, though this
> strongly depends on the current state of the file system cache,
> which is worse under Windows).

Yes, sad but true... Windows filesystem performance sucks tremendously. 
Of course, it doesn't help that we have to do more work on Windows, too 
(like removing the read-only flag before deleting a file).

> And there's another fundamental difference between Windows
> and Linux: Tortoise performance is really critical,
> because it is integrated into Windows Explorer itself.
> I am not aware of anything similar on Linux.
> A command-line command hanging for some time is one thing,
> but Windows Explorer itself hanging for several seconds is another.
> The performance of the Tortoise shell extension is probably the most
> critical performance issue of all the svn clients.
>
> This is why I really don't like suggestions for performance
> "optimizations" made without being aware of Tortoise, or without using
> Windows at all. ;-) Linux is really cool, but it is a fact that by far the
> biggest market share is Windows. In the end, one reason for the success
> of svn is that svn has really good Windows support.

Ach, if only there were more Windows developers on the SVN team, too...

-- Brane



Re: ideas to make svn update faster.

Posted by Folker Schamel <sc...@spinor.com>.

Some additional notes:
Performance under Windows really is a serious problem;
it is much worse than under Linux due to the Windows file system.
My impression is that Unix-only users are often not aware of this.
My personal experience is: what costs only seconds under Linux
often costs a minute under Windows (e.g. an svn status, though this
strongly depends on the current state of the file system cache,
which is worse under Windows).

And there's another fundamental difference between Windows
and Linux: Tortoise performance is really critical,
because it is integrated into Windows Explorer itself.
I am not aware of anything similar on Linux.
A command-line command hanging for some time is one thing,
but Windows Explorer itself hanging for several seconds is another.
The performance of the Tortoise shell extension is probably the most
critical performance issue of all the svn clients.

This is why I really don't like suggestions for performance
"optimizations" made without being aware of Tortoise, or without using
Windows at all. ;-) Linux is really cool, but it is a fact that by far the
biggest market share is Windows. In the end, one reason for the success
of svn is that svn has really good Windows support.

> Philip Martin wrote:
> 
>> Folker Schamel <sc...@spinor.com> writes:
>>
>>
>>> And Tortoise queries the wc status non-recursively, which then would
>>> have to go up to root all the time instead of a single file read,
>>> as already explained by Philip.
>>> Which translates into multiple times slower.
>>
>>
>>
>> It will make it slower, but not multiple times slower.  Non-recursive
>> status already reads entries files for immediate subdirs and the
>> immediate parent. 
> 
> 
> Immediate subdirs also in case of the status of a file?
> 
> > It also does a lot more than just read entries
> 
>> files.  However changing it to read entries files all the way back to
>> the root must, in general, make it a bit slower.
> 
> 
> Of course the actual cost depends on your particular situation.
> It is pure speculation, but for example in our particular case
> we have a quite deep directory structure (and not too many
> immediate sub-dirs), and so I suppose Tortoise would be really
> multiple times slower.
> 
> But what I really wanted to point out:
> As far as I know, currently the costs depend only _locally_ on
> the project directory structure. But then it would depend on the
> tree depth of the _global_ structure of the project.
> I consider that bad scaling behaviour for large projects,
> because it is roughly a change from O(1) to O(log project_size).
> Maybe the wording was chosen badly, but this is what I wanted
> to express with "multiple times slower".


Re: ideas to make svn update faster.

Posted by Folker Schamel <sc...@spinor.com>.

Philip Martin wrote:
> Folker Schamel <sc...@spinor.com> writes:
> 
> 
>>And Tortoise queries the wc status non-recursively, which then would
>>have to go up to root all the time instead of a single file read,
>>as already explained by Philip.
>>Which translates into multiple times slower.
> 
> 
> It will make it slower, but not multiple times slower.  Non-recursive
> status already reads entries files for immediate subdirs and the
> immediate parent. 

Immediate subdirs also in case of the status of a file?

> It also does a lot more than just read entries
> files.  However changing it to read entries files all the way back to
> the root must, in general, make it a bit slower.

Of course the actual cost depends on your particular situation.
It is pure speculation, but for example in our particular case
we have a quite deep directory structure (and not too many
immediate sub-dirs), and so I suppose Tortoise would be really
multiple times slower.

But what I really wanted to point out:
As far as I know, currently the costs depend only _locally_ on
the project directory structure. But then it would depend on the
tree depth of the _global_ structure of the project.
I consider that bad scaling behaviour for large projects,
because it is roughly a change from O(1) to O(log project_size).
Maybe the wording was chosen badly, but this is what I wanted
to express with "multiple times slower".
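The scaling argument can be made concrete by counting entries-file reads for a non-recursive status, using Philip's description of what it reads today: the directory's own entries file, those of its s immediate subdirectories, and its parent's. Under the proposed scheme the single parent read becomes a walk to the root. For a directory at depth d below the working-copy root (d >= 1), a rough count, ignoring everything status does besides reading entries files, is:

```latex
R_{\text{now}} = 1 + s + 1 = s + 2, \qquad
R_{\text{prop}} = 1 + s + d, \qquad
R_{\text{prop}} - R_{\text{now}} = d - 1 .
```

For a roughly balanced tree of N versioned directories with branching factor b, d is about log_b N, which is the O(1) to O(log project_size) change described above. Whether that amounts to "multiple times slower" depends on how d compares with s: a deep tree with few immediate subdirectories is the worst case.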


Re: ideas to make svn update faster.

Posted by Philip Martin <ph...@codematters.co.uk>.

Folker Schamel <sc...@spinor.com> writes:

> And Tortoise queries the wc status non-recursively, which then would
> have to go up to root all the time instead of a single file read,
> as already explained by Philip.
> Which translates into multiple times slower.

It will make it slower, but not multiple times slower.  Non-recursive
status already reads entries files for immediate subdirs and the
immediate parent.  It also does a lot more than just read entries
files.  However changing it to read entries files all the way back to
the root must, in general, make it a bit slower.

-- 
Philip Martin

Re: ideas to make svn update faster.

Posted by Folker Schamel <sc...@spinor.com>.

Thomas Zander wrote:
> On Sunday 08 May 2005 17:18, Folker Schamel wrote:
> 
>>Thomas Zander wrote:
>>
>>>I'll paste it here:
>>>you are almost sure to have different
>>>versions in the individual dirs as is; this won't change if only
>>>changes to the parent are recorded, like I proposed.  So in practice
>>>only one dir up will be read the first time; and a second time only the
>>>current one.
> 
> ..
> 
>>I do global updates quite regularly (-> rev is the same for all dirs and
>>files). But I also use Tortoise all the time.
>>And Tortoise performance already is a serious problem,
>>because it hooks into Windows itself (the Explorer).
>>So it seems that your proposal would make subversion
>>multiple times slower for a typical use-case.
>>Acceptable? Definitely not!
> 
> 
> Making things multiple times slower surely is unacceptable; I fully agree. 
> But please tell me how you come to the conclusion that this would be the 
> case?
> Just reading an xml file is something that already happens for each and 
> every directory you have managed; if you do an update from the root level 
> it's going to be faster; if you do an update lower it could cause a couple
> of extra local filesystem reads.  That does not translate to multiple times 
> slower at all.

I'm not talking about the performance of updates. I'm talking about Tortoise,
as already explained in my previous email.
And Tortoise queries the wc status non-recursively, which then would
have to go up to root all the time instead of a single file read,
as already explained by Philip.
Which translates into multiple times slower.

You also may look into the archives of this list and the tsvn
list about the dozen discussions about tsvn performance.

> 
> So stop worrying :-)
> 
> 
>>Performance of wc management definitely is a problem in subversion.
>>But I also understand that the problem is not easy to solve.
> 
> 
> The problem I am running into here is that nobody seems to try to understand 
> the other and just repeats old points while ignoring explanations.

My impression is that the svn guys understand you very well.
But you seem to have only a limited view of svn (and its tool family)
and how it is used, and as a consequence you do not seem to really
understand the issues.

For example, do you use Tortoise?
Do you know how it works?

> 
> 
>>My impression is that the svn team knows very well the
>>performance problems.
>>And it also knows very well possible solutions because
>>it understands both subversion and its implementation
>>(see the wc performance paper already quoted some mails ago.)
>>Providing "good ideas" without the latter is not really helpful.
> 
> 
> I'm not going to ask them to explain the whole thing in detail to me, that 
> would be impractical indeed.  What I do expect is that if someone tries to 
> pose a 'good idea', people try to understand it and not simply say
> it's going to fail, followed by an argument that's incredibly easy to debunk.
> In short; I expect people to talk about technical implementations; since it 
> is my understanding that that is what this list is for.
> 
> But I'm seriously starting to doubt that is what this list is being used 
> for..


Re: ideas to make svn update faster.

Posted by Thomas Zander <za...@kde.org>.

On Sunday 08 May 2005 17:18, Folker Schamel wrote:
> Thomas Zander wrote:
> > I'll paste it here:
> > you are almost sure to have different
> > versions in the individual dirs as is; this won't change if only
> > changes to the parent are recorded, like I proposed.  So in practice
> > only one dir up will be read the first time; and a second time only the
> > current one.
..
> I do global updates quite regularly (-> rev is the same for all dirs and
> files). But I also use Tortoise all the time.
> And Tortoise performance already is a serious problem,
> because it hooks into Windows itself (the Explorer).
> So it seems that your proposal would make subversion
> multiple times slower for a typical use-case.
> Acceptable? Definitely not!

Making things multiple times slower surely is unacceptable; I fully agree. 
But please tell me how you come to the conclusion that this would be the 
case?
Just reading an xml file is something that already happens for each and 
every directory you have managed; if you do an update from the root level 
it's going to be faster; if you do an update lower it could cause a couple
of extra local filesystem reads.  That does not translate to multiple times 
slower at all.

So stop worrying :-)

> Performance of wc management definitely is a problem in subversion.
> But I also understand that the problem is not easy to solve.

The problem I am running into here is that nobody seems to try to understand 
the other and just repeats old points while ignoring explanations.

> My impression is that the svn team knows very well the
> performance problems.
> And it also knows very well possible solutions because
> it understands both subversion and its implementation
> (see the wc performance paper already quoted some mails ago.)
> Providing "good ideas" without the latter is not really helpful.

I'm not going to ask them to explain the whole thing in detail to me, that 
would be impractical indeed.  What I do expect is that if someone tries to 
pose a 'good idea', people try to understand it and not simply say
it's going to fail, followed by an argument that's incredibly easy to debunk.
In short; I expect people to talk about technical implementations; since it 
is my understanding that that is what this list is for.

But I'm seriously starting to doubt that is what this list is being used 
for..
-- 
Thomas Zander

Re: ideas to make svn update faster.

Posted by Philip Martin <ph...@codematters.co.uk>.

Thomas Zander <za...@kde.org> writes:

> On Sunday 08 May 2005 14:21, Philip Martin wrote:
>> Thomas Zander <za...@kde.org> writes:
>> > On Sunday 08 May 2005 13:27, Philip Martin wrote:
>> >> Thomas Zander <za...@kde.org> writes:
>> >> > On Sunday 08 May 2005 01:12, Philip Martin wrote:
>> >>
>> >> The paragraph above...
>> >
>> > ..
>> >
>> >> ...is my answer to your question.
>> >
>> > I posted an answer to prove you wrong twice already; please respond why
>> > you don't think that fixes it. With an example if that will make it
>> > clearer.
>>
>> Either I didn't see it, or I didn't understand it.
>
>
> I'll paste it here:
> you are almost sure to have different 
> versions in the individual dirs as is; this won't change if only changes to 
> the parent are recorded, like I proposed.

That makes no sense to me.  Your proposal concerns updates that affect
the whole working copy (your "global update") and cause the revision
number to be stored in the root entries file only.

>  So in practice only one dir up 
> will be read the first time;

That makes no sense to me.  The root might be several levels up,
status will have to read all the intervening entries files.

> and a second time only the current one.

That makes no sense to me, what changes between the first and the
second time I run status?

>>  [...] Do you agree? [...]
>> Do you agree? [...]
>> Do you agree?
> 3 x Yes

It looks like you agree that status will be slower after a "global
update".

>> While your idea may make update faster, it will make other operations
>> slower.
> No; read my answer on how in practice this will not happen in the usecase 
> you proposed.

My usecase is a user running status on bits of a single revision
working copy.

>  In fact; the only usecase where this will have an effect is 
> if you usually do a global update (your whole project)

That's my usecase.

> and then only update one nested subdir.

That makes no sense to me.  I'm concerned about single revision
working copies.

>  Then _one time_ will that do some extra reads.

That makes no sense to me.  What does "one time" mean?  Every time
status runs it will be slower.

> Which part of my answer don't you understand?  I pasted it 3 times and you 
> still have not responded to it.

I responded as well as I could, but your argument makes no sense to
me.  You want to make "global update" faster.  You agree that your
proposal will make status slower after such an update.  Then you argue
that the slowdown doesn't matter for reasons I don't understand.  As
far as I can tell, you give two reasons why the slowdown doesn't
matter: a) it doesn't occur "in practice", and b) it only happens "the
first time".

The "in practice" bit doesn't make sense.  If you want to optimise
"global update" because it's important to you, you cannot then go on
and argue that it doesn't occur "in practice".  That's just silly.
Also I think it does occur "in practice".

The "first time" bit doesn't make sense either.  Your proposal relies
on the revision number only being stored in the root, so that most of
the entries files don't have to be changed.  Running status isn't
going to change those entries files, and if it did it would defeat
your optimisation, so every time status runs it will be affected.

I'm trying to take your arguments seriously, but it's getting harder.

-- 
Philip Martin

Re: ideas to make svn update faster.

Posted by Folker Schamel <sc...@spinor.com>.

Thomas Zander wrote:
> On Sunday 08 May 2005 14:21, Philip Martin wrote:
> 
>>Thomas Zander <za...@kde.org> writes:
>>
>>>On Sunday 08 May 2005 13:27, Philip Martin wrote:
>>>
>>>>Thomas Zander <za...@kde.org> writes:
>>>>
>>>>>On Sunday 08 May 2005 01:12, Philip Martin wrote:
>>>>
>>>>The paragraph above...
>>>
>>>..
>>>
>>>
>>>>...is my answer to your question.
>>>
>>>I posted an answer to prove you wrong twice already; please respond why
>>>you don't think that fixes it. With an example if that will make it
>>>clearer.
>>
>>Either I didn't see it, or I didn't understand it.
> 
> 
> 
> I'll paste it here:
> you are almost sure to have different 
> versions in the individual dirs as is; this won't change if only changes to 
> the parent are recorded, like I proposed.  So in practice only one dir up 
> will be read the first time; and a second time only the current one.
> 
> 
>> [...] Do you agree? [...]
>>Do you agree? [...]
>>Do you agree?
> 
> 3 x Yes
> 
> 
>>While your idea may make update faster, it will make other operations
>>slower.
> 
> No; read my answer on how in practice this will not happen in the usecase 
> you proposed.  In fact; the only usecase where this will have an effect is 
> if you usually do a global update (your whole project) and then only update 
> one nested subdir.  Then _one time_ will that do some extra reads.
> 
> Which part of my answer don't you understand?  I pasted it 3 times and you 
> still have not responded to it.

I must admit, I don't understand it either.

I do global updates quite regularly (-> rev is the same for all dirs and
files). But I also use Tortoise all the time.
And Tortoise performance already is a serious problem,
because it hooks into Windows itself (the Explorer).
So it seems that your proposal would make subversion
multiple times slower for a typical use-case.
Acceptable? Definitely not!

Performance of wc management definitely is a problem in subversion.
But I also understand that the problem is not easy to solve.
The problem is easy to solve for other version control systems
lacking important features due to the patch-based design
(e.g. lacking GUIs like Tortoise, or lacking mixed revisions,
or lacking efficient access to per-file infos, or lacking
automatic detection of local modifications etc. etc. etc.).
But the problem is not easy to solve if you want to continue
supporting such features.
And svn is so popular because it offers these features.

My impression is that the svn team knows very well the
performance problems.
And it also knows very well possible solutions because
it understands both subversion and its implementation
(see the wc performance paper already quoted some mails ago.)
Providing "good ideas" without the latter is not really helpful.

Re: ideas to make svn update faster.

Posted by Thomas Zander <za...@kde.org>.

On Sunday 08 May 2005 14:21, Philip Martin wrote:
> Thomas Zander <za...@kde.org> writes:
> > On Sunday 08 May 2005 13:27, Philip Martin wrote:
> >> Thomas Zander <za...@kde.org> writes:
> >> > On Sunday 08 May 2005 01:12, Philip Martin wrote:
> >>
> >> The paragraph above...
> >
> > ..
> >
> >> ...is my answer to your question.
> >
> > I posted an answer to prove you wrong twice already; please respond why
> > you don't think that fixes it. With an example if that will make it
> > clearer.
>
> Either I didn't see it, or I didn't understand it.


I'll paste it here:
you are almost sure to have different 
versions in the individual dirs as is; this won't change if only changes to 
the parent are recorded, like I proposed.  So in practice only one dir up 
will be read the first time; and a second time only the current one.

>  [...] Do you agree? [...]
> Do you agree? [...]
> Do you agree?
3 x Yes

> While your idea may make update faster, it will make other operations
> slower.
No; read my answer on how in practice this will not happen in the usecase 
you proposed.  In fact; the only usecase where this will have an effect is 
if you usually do a global update (your whole project) and then only update 
one nested subdir.  Then _one time_ will that do some extra reads.

Which part of my answer don't you understand?  I pasted it 3 times and you 
still have not responded to it.
-- 
Thomas Zander

Re: ideas to make svn update faster.

Posted by Philip Martin <ph...@codematters.co.uk>.

Thomas Zander <za...@kde.org> writes:

> On Sunday 08 May 2005 13:27, Philip Martin wrote:
>> Thomas Zander <za...@kde.org> writes:
>> > On Sunday 08 May 2005 01:12, Philip Martin wrote:
>> The paragraph above...
> ..
>> ...is my answer to your question.
>
> I posted an answer to prove you wrong twice already; please respond why you 
> don't think that fixes it. With an example if that will make it clearer.

Either I didn't see it, or I didn't understand it.

As I understand your proposal you want to avoid putting the revision
number in an entry file when the revision number matches the one in
the parent entry file.

Do you agree?

At present a status operation will read an entries file and get the
revision number.  If the revision number is not present then to get it
the status operation must read the parent's entry file, and then maybe
the parent's parent's entry file, and so on up to the root.

Do you agree?

If the operation which used to read a single entry file now has to
read several entry files it will be slower.

Do you agree?

While your idea may make update faster, it will make other operations
slower.

Do you agree?
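Philip's three steps can be modelled in a few lines. This is a toy, not libsvn_wc code: a dictionary stands in for the per-directory entries files, `None` marks a directory whose revision was omitted under the proposal, and the hypothetical `resolve_revision` counts how many entries files it reads. Because the lookup writes nothing back, a second status run pays exactly the same cost as the first.

```python
from typing import Dict, Optional, Tuple

# Hypothetical working-copy model: directory path ("" is the root) -> the
# revision stored in its entries file, or None when the proposal omits it
# because it matches the parent's revision.
WorkingCopy = Dict[str, Optional[int]]


def parent_of(path: str) -> Optional[str]:
    """Return the parent directory path, or None for the root."""
    if path == "":
        return None
    return path.rsplit("/", 1)[0] if "/" in path else ""


def resolve_revision(wc: WorkingCopy, path: str) -> Tuple[int, int]:
    """Return (revision, entries files read) for a directory."""
    reads = 0
    current: Optional[str] = path
    while current is not None:
        reads += 1  # one entries file read
        rev = wc[current]
        if rev is not None:
            return rev, reads
        current = parent_of(current)  # revision omitted: try the parent
    raise ValueError("root entries file must store a revision")


# Single-revision working copy after a "global update" to r100:
# only the root records the revision.
wc = {"": 100, "a": None, "a/b": None, "a/b/c": None}
print(resolve_revision(wc, "a/b/c"))  # (100, 4)
print(resolve_revision(wc, "a/b/c"))  # (100, 4) again: nothing is cached
```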

-- 
Philip Martin

Re: ideas to make svn update faster.

Posted by Thomas Zander <za...@kde.org>.

On Sunday 08 May 2005 13:27, Philip Martin wrote:
> Thomas Zander <za...@kde.org> writes:
> > On Sunday 08 May 2005 01:12, Philip Martin wrote:
> The paragraph above...
..
> ...is my answer to your question.

I posted an answer to prove you wrong twice already; please respond why you 
don't think that fixes it. With an example if that will make it clearer.

-- 
Thomas Zander

Re: ideas to make svn update faster.

Posted by Philip Martin <ph...@codematters.co.uk>.

Thomas Zander <za...@kde.org> writes:

> On Sunday 08 May 2005 01:12, Philip Martin wrote:
>> Thomas Zander <za...@kde.org> writes:
>> > On Saturday 07 May 2005 22:36, Philip Martin wrote:
>> >> Thomas Zander <za...@kde.org> writes:
>> >> > Or maybe I'm not following your 'non-recursive status' point above;
>> >> > in that case please explain what you mean by that.
>> >>
>> >> As I understand it you propose to avoid storing the revision in a
>> >> directory's entry file if the revision matches that of the parent.  To
>> >> get the revision TSVN is going to have to read all the entries files
>> >> up to the root rather than just the one for the directory in
>> >> question.

The paragraph above...

>> > If it doesn't update recursively; then you are almost sure to have
>> > different versions in the individual dirs as is; this won't change if
>> > only changes to the parent are recorded, like I proposed.  So in
>> > practice only one dir up will be read the first time; and a second time
>> > only the current one.
>> >
>> > Right?
>>
>> You seem to be referring to update but I'm worried about status.  If
>> your idea to make update faster causes status to be slower then it's
>> probably a non-starter.
>
> I agree.  Now; why would they be slower?  I don't see _how_ they could be 
> slower, actually.

...is my answer to your question.

> I have no problem with you questioning a new approach; but please don't just 
> hit me with arguments that have absolutely no backing in theory.  Makes 
> conversing a lot easier here...

Huh?

-- 
Philip Martin

Re: ideas to make svn update faster.

Posted by Thomas Zander <za...@kde.org>.

On Sunday 08 May 2005 01:12, Philip Martin wrote:
> Thomas Zander <za...@kde.org> writes:
> > On Saturday 07 May 2005 22:36, Philip Martin wrote:
> >> Thomas Zander <za...@kde.org> writes:
> >> > Or maybe I'm not following your 'non-recursive status' point above;
> >> > in that case please explain what you mean by that.
> >>
> >> As I understand it you propose to avoid storing the revision in a
> >> directory's entry file if the revision matches that of the parent.  To
> >> get the revision TSVN is going to have to read all the entries files
> >> up to the root rather than just the one for the directory in
> >> question.
> >
> > If it doesn't update recursively; then you are almost sure to have
> > different versions in the individual dirs as is; this won't change if
> > only changes to the parent are recorded, like I proposed.  So in
> > practice only one dir up will be read the first time; and a second time
> > only the current one.
> >
> > Right?
>
> You seem to be referring to update but I'm worried about status.  If
> your idea to make update faster causes status to be slower then it's
> probably a non-starter.

I agree.  Now; why would they be slower?  I don't see _how_ they could be 
slower, actually.
I have no problem with you questioning a new approach; but please don't just 
hit me with arguments that have absolutely no backing in theory.  Makes 
conversing a lot easier here...

-- 
Thomas Zander

Re: ideas to make svn update faster.

Posted by Philip Martin <ph...@codematters.co.uk>.

Thomas Zander <za...@kde.org> writes:

> On Saturday 07 May 2005 22:36, Philip Martin wrote:
>> Thomas Zander <za...@kde.org> writes:
>> > Or maybe I'm not following your 'non-recursive status' point above; in
>> > that case please explain what you mean by that.
>>
>> As I understand it you propose to avoid storing the revision in a
>> directory's entry file if the revision matches that of the parent.  To
>> get the revision TSVN is going to have to read all the entries files
>> up to the root rather than just the one for the directory in
>> question.
>
> If it doesn't update recursively; then you are almost sure to have different 
> versions in the individual dirs as is; this won't change if only changes to 
> the parent are recorded, like I proposed.  So in practice only one dir up 
> will be read the first time; and a second time only the current one.
>
> Right?

You seem to be referring to update but I'm worried about status.  If
your idea to make update faster causes status to be slower then it's
probably a non-starter.

Also, I find your argument a bit odd: you want to optimise single
revision working copies because you think they are important, but you
seem to be implying that TSVN users won't be using them.  If you think
they are important why won't TSVN users think the same?

-- 
Philip Martin

Re: ideas to make svn update faster.

Posted by Thomas Zander <za...@kde.org>.

On Saturday 07 May 2005 22:36, Philip Martin wrote:
> Thomas Zander <za...@kde.org> writes:
> > Or maybe I'm not following your 'non-recursive status' point above; in
> > that case please explain what you mean by that.
>
> As I understand it you propose to avoid storing the revision in a
> directory's entry file if the revision matches that of the parent.  To
> get the revision TSVN is going to have to read all the entries files
> up to the root rather than just the one for the directory in
> question.

If it doesn't update recursively; then you are almost sure to have different 
versions in the individual dirs as is; this won't change if only changes to 
the parent are recorded, like I proposed.  So in practice only one dir up 
will be read the first time; and a second time only the current one.

Right?
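Thomas's counter-argument is that in a mixed-revision working copy many directories would still store their own revision, so the upward walk stops early. A minimal sketch under the same kind of hypothetical model (a path maps to a stored revision or `None`) reports where the walk stops; the function name is made up for illustration.

```python
from typing import Dict, Optional, Tuple


def nearest_stored_revision(
    wc: Dict[str, Optional[int]], path: str
) -> Tuple[int, str, int]:
    """Walk towards the root until an entries file stores a revision.

    Returns (revision, directory that stored it, entries files read).
    """
    reads = 0
    current = path
    while True:
        reads += 1
        rev = wc[current]
        if rev is not None:
            return rev, current, reads
        if current == "":
            raise ValueError("root entries file must store a revision")
        current = current.rsplit("/", 1)[0] if "/" in current else ""


# Mixed-revision working copy: "a/b" was updated on its own to r120,
# so it stores that revision explicitly; "a/b/c" inherits it.
mixed = {"": 100, "a": None, "a/b": 120, "a/b/c": None}
print(nearest_stored_revision(mixed, "a/b/c"))  # (120, 'a/b', 2)

# Single-revision working copy: the walk goes all the way to the root.
single = {"": 120, "a": None, "a/b": None, "a/b/c": None}
print(nearest_stored_revision(single, "a/b/c"))  # (120, '', 4)
```

The early stop only helps when some nearby ancestor differs from its parent; in a working copy brought to one revision by a global update, the walk reaches the root on every call.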
-- 
Thomas Zander

Re: ideas to make svn update faster.

Posted by Philip Martin <ph...@codematters.co.uk>.

Thomas Zander <za...@kde.org> writes:

> Or maybe I'm not following your 'non-recursive status' point above; in that 
> case please explain what you mean by that.

As I understand it you propose to avoid storing the revision in a
directory's entry file if the revision matches that of the parent.  To
get the revision TSVN is going to have to read all the entries files
up to the root rather than just the one for the directory in
question.

-- 
Philip Martin

Re: ideas to make svn update faster.

Posted by Thomas Zander <za...@kde.org>.

On Saturday 07 May 2005 20:51, Philip Martin wrote:
> TortoiseSVN uses non-recursive status on directories, with your scheme
> it might have to read multiple entries files, all the way back to the
> root, rather than just the entries file for the directory in question.
> That would be very unpopular with the TSVN crowd :)

If it doesn't update recursively; then you are almost sure to have different 
versions in the individual dirs as is; this won't change if only changes to 
the parent are recorded, like I proposed.  So in practice only one dir up 
will be read the first time; and a second time only the current one.
Or maybe I'm not following your 'non-recursive status' point above; in that 
case please explain what you mean by that.

> What version of Subversion are you using?  When I try "svn update" it
> only opens each format file once:
>
> $ svn up -r5 . > /dev/null
> $ strace -etrace=open svn up -r6 . 2>&1 | grep .svn/format
> open(".svn/format", O_RDONLY)           = 3
> open("foo/.svn/format", O_RDONLY)       = 3
> open("foo/bar/.svn/format", O_RDONLY)   = 3

Cool; the most glaring problem has been fixed then :-)  I'm using 1.1.4 (the 
latest version on debian testing).

Thanks!
-- 
Thomas Zander

Re: ideas to make svn update faster.

Posted by Philip Martin <ph...@codematters.co.uk>.

Thomas Zander <za...@kde.org> writes:

> A simple optimisation would be to remove the directory-version number (the
> one in the xml entry-tag with 'name=""') when it has the same one as the
> parent dir.
> It's probably not going to be as simple as that (since you update subdirs
> separately), but I'm pretty sure that a lot fewer xml files have to be written
> if you follow the route that the normal state is a dir having the same 
> version as its parent.  Only when that fails do you need to do extra work. 
> Being optimistic about version changes; I'd call that.
>
> Now; there is probably going to be a lot of opinions on the above subject; 
> and I'd like to point out that svn really needs speed optimisations; I
> have seen a LOT of complaints about this issue in the KDE switchover.
> Remember that if you find the above suggestion technically less-than-ideal.

TortoiseSVN uses non-recursive status on directories, with your scheme
it might have to read multiple entries files, all the way back to the
root, rather than just the entries file for the directory in question.
That would be very unpopular with the TSVN crowd :)
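The write side of the trade-off in Thomas's proposal quoted above can be estimated the same way. This rough count assumes, hypothetically, that a global update changes nothing in an entries file except the directory's own revision: today every directory's entries file is rewritten, while under the proposal only the root and the directories that previously stored a differing revision (and must now drop it) are touched. The figure of 378 directories is from Thomas's kdelibs example; the 10 differing directories are an illustrative guess based on his estimate. A real update also rewrites the entries files of directories whose files changed.

```python
def entries_writes(n_dirs: int, n_explicit: int, inherit_revision: bool) -> int:
    """Entries files rewritten by a global update (hypothetical model).

    n_dirs: directories in the working copy, including the root.
    n_explicit: non-root directories that stored a revision different
        from their parent's before the update (used only by the proposal).
    """
    if not inherit_revision:
        # Current scheme: every directory records the new revision itself.
        return n_dirs
    # Proposal: the root records the new revision, directories that stored
    # a differing revision are rewritten to drop it, the rest are untouched.
    return 1 + n_explicit


print(entries_writes(378, 10, inherit_revision=False))  # 378
print(entries_writes(378, 10, inherit_revision=True))   # 11
```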

> The strace also showed me things like;
> * the .svn/format file is opened 5 times for each directory.  I would think 
> that with auto-upgrades only one (the root dir) should be enough. Saving 
> 5*378 -1 open-files for me. :)

What version of Subversion are you using?  When I try "svn update" it
only opens each format file once:

$ svn up -r5 . > /dev/null
$ strace -etrace=open svn up -r6 . 2>&1 | grep .svn/format
open(".svn/format", O_RDONLY)           = 3
open("foo/.svn/format", O_RDONLY)       = 3
open("foo/bar/.svn/format", O_RDONLY)   = 3

I'm using Subversion trunk@HEAD but as far as I know Subversion 1.2
should be the same, I haven't tried earlier versions.
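To check the "opened 5 times per directory" observation against a particular Subversion version, the strace output from the command above can be tallied per path. A small sketch that counts opens of each `.svn/format` file in captured strace text; the regular expression assumes the `open("path", flags) = fd` line shape shown above, and also accepts the `openat(AT_FDCWD, "path", ...)` form that newer systems log instead.

```python
import re
from collections import Counter

# Matches lines such as:  open("foo/.svn/format", O_RDONLY)       = 3
# and also:               openat(AT_FDCWD, "foo/.svn/format", O_RDONLY) = 3
OPEN_RE = re.compile(r'^open(?:at)?\((?:AT_FDCWD, )?"([^"]*\.svn/format)"')


def count_format_opens(strace_text: str) -> Counter:
    """Return how many times each .svn/format file was opened."""
    counts: Counter = Counter()
    for line in strace_text.splitlines():
        match = OPEN_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


sample = """\
open(".svn/format", O_RDONLY)           = 3
open("foo/.svn/format", O_RDONLY)       = 3
open("foo/bar/.svn/format", O_RDONLY)   = 3
"""
counts = count_format_opens(sample)
print(max(counts.values()))  # 1: each format file opened once
```

Fed the stderr of the strace command saved to a file, any count above 1 would show the repeated opens Thomas saw on 1.1.4.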

I've done work in the past to make Subversion working copy handling
faster, I haven't done so much recently because it's "fast enough" for
my relatively small working copies.  There are a number of ideas for
changes in:

http://svn.collab.net/repos/svn/trunk/notes/wc-improvements

One suggestion is that we get rid of the format file and just use the
entries file.  Another idea is to cache the incomplete="true" write to
reduce the number of times the entries file is written.  I think those
would be relatively simple ways to make update faster.
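The second idea, caching the incomplete="true" write, amounts to not writing an intermediate state of each entries file to disk. The toy below uses made-up class and function names and only counts writes, under the assumption that an update currently writes each directory's entries file once to mark it incomplete and once more when that directory is finished; it deliberately ignores interrupted updates, which a real design would have to handle, and is not how libsvn_wc is structured.

```python
from typing import List, Set


class EntriesWriter:
    """Toy count of entries-file writes during an update (hypothetical).

    Without deferral, each directory's entries file is written once to set
    incomplete="true" and again when that directory's update finishes.
    With deferral, the flag lives only in memory and the file is written
    once at the end.
    """

    def __init__(self, defer_incomplete: bool) -> None:
        self.defer_incomplete = defer_incomplete
        self.incomplete: Set[str] = set()  # directories currently marked incomplete
        self.writes = 0

    def start_dir(self, path: str) -> None:
        self.incomplete.add(path)
        if not self.defer_incomplete:
            self.writes += 1  # write the incomplete marker to disk now

    def finish_dir(self, path: str) -> None:
        self.incomplete.discard(path)
        self.writes += 1  # final write with the new revision, flag cleared


def count_update_writes(dirs: List[str], defer_incomplete: bool) -> int:
    """Return the number of entries-file writes for updating every dir."""
    writer = EntriesWriter(defer_incomplete)
    for path in dirs:
        writer.start_dir(path)
        writer.finish_dir(path)
    return writer.writes


dirs = ["", "foo", "foo/bar"]
print(count_update_writes(dirs, defer_incomplete=False))  # 6
print(count_update_writes(dirs, defer_incomplete=True))   # 3
```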

-- 
Philip Martin
