Posted to dev@subversion.apache.org by Carsten Koch <Ca...@icem.com> on 2005/08/18 16:18:10 UTC

Re: svn list -R of medium-size repository takes 10 hours.

Barry Scott wrote:
 > I think you need to take this up on the svn dev list.

OK.

I have now looked at a much smaller test case with ethereal.
This test case does a 'svn list -R -v' on a directory
that contains only 16 files and one subdirectory with
another 8 files. So 'svn list -R -v' returns a total
of 1465 bytes in 25 lines and 'svn list -R' returns a total
of 440 bytes in 25 lines.

ethereal tells me that 60 k bytes in 113 packets were exchanged
in 7.32 seconds over a 64 k bit ISDN line.

If I am doing my math correctly, the ISDN line is running at
almost full speed. Meaning that my test case is slow due to
bandwidth, not due to latency.
So the only problem - at least in this test case - is
that 1465 bytes (or 440 bytes) of end result are packed
in 60 k bytes of protocol, resulting in over 4000%
(or over 13000%) protocol overhead.
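
The arithmetic behind these figures can be sketched in a few lines of
Python. This is only a rough check: it assumes "60 k bytes" means
60,000 bytes and that 1 kbit is 1000 bits, and the function names are
made up for illustration.

```python
# Rough check of the ethereal numbers quoted above.
# Assumption: "60 k bytes" is taken as 60,000 bytes (the capture
# summary does not say whether k means 1000 or 1024).
CAPTURED_BYTES = 60_000
ELAPSED_SECONDS = 7.32


def throughput_kbit_per_s(num_bytes, seconds):
    """Average line rate in kbit/s (1 kbit = 1000 bits)."""
    return num_bytes * 8 / seconds / 1000


def overhead_factor(wire_bytes, payload_bytes):
    """Bytes sent over the wire per byte of useful output."""
    return wire_bytes / payload_bytes


rate = throughput_kbit_per_s(CAPTURED_BYTES, ELAPSED_SECONDS)
verbose_factor = overhead_factor(CAPTURED_BYTES, 1465)  # svn list -R -v
plain_factor = overhead_factor(CAPTURED_BYTES, 440)     # svn list -R

print(f"line rate: {rate:.1f} kbit/s")
print(f"wire bytes per output byte: {verbose_factor:.0f} (-v), "
      f"{plain_factor:.0f} (plain)")
```

A rate close to the nominal 64 kbit/s supports the reading that the
line was saturated, so the test is limited by bandwidth rather than
latency.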

I looked at the data transmitted and found all kinds of XML
data that even with -v is never displayed: Apparently the data
contains names and values of properties, md5-checksums,
repository-uuids, etc. The dump even contains the string
http://subversion.tigris.org 55 times. I have no idea why
that could be useful.

Of course I fully understand that the protocol is very general
and satisfies more needs than just the ones of 'svn list'.
I also understand that transmitting uncompressed XML is
both very flexible and easy. But 7 seconds to transmit 25 lines
of listing? One must be very patient to like that. ;-)

My question is: Am I the only one suffering from terribly
slow "svn list -R" performance?
If this is not of general interest, I could create a
quick-and-dirty local solution, maybe based on "svnlook tree"
in the post-commit hook.
If this is of general interest, would somebody be willing
to fix it, so that "svn list -R" becomes up to 130 times
faster?

Btw: "svnlook tree" takes about 14 seconds to list the
repository of my test case below (the one that takes 10
hours over ISDN and 45 minutes locally with "svn list -R").


Thanks for any insight and Cheers,

Carsten.


P.S.: Sorry for the repost. I did not find this mail in the
       archives and nobody replied, so maybe it never got there.


 >
 > Barry
 >
 > On Jul 21, 2005, at 10:02, Carsten Koch wrote:
 >
 >> I have written a python script (using pysvn) that needs to know
 >> all file/directory names under a certain URL.
 >> The script works fine, but it takes ages to complete.
 >> The underlying problem being that "svn list -R" is extremely slow.
 >>
 >> Does anybody know a workaround against the performance problem
 >> of "svn list -R"?
 >>
 >> Our repository has been created just a few months ago, so it is
 >> not really big yet, but an "svn list -R" already takes 9 hours,
 >> 53 minutes and 56 seconds if run over my 128 kbit/s ISDN line:
 >>
 >> /usr/bin/time -v svn list -R http://svn/svn | wc -l
 >>         Command being timed: "svn list -R http://svn/svn"
 >>         User time (seconds): 757.42
 >>         System time (seconds): 6.58
 >>         Percent of CPU this job got: 2%
 >>         Elapsed (wall clock) time (h:mm:ss or m:ss): 9:53:56
 >>         Average shared text size (kbytes): 0
 >>         Average unshared data size (kbytes): 0
 >>         Average stack size (kbytes): 0
 >>         Average total size (kbytes): 0
 >>         Maximum resident set size (kbytes): 0
 >>         Average resident set size (kbytes): 0
 >>         Major (requiring I/O) page faults: 0
 >>         Minor (reclaiming a frame) page faults: 99632
 >>         Voluntary context switches: 206892
 >>         Involuntary context switches: 613
 >>         Swaps: 0
 >>         File system inputs: 0
 >>         File system outputs: 0
 >>         Socket messages sent: 0
 >>         Socket messages received: 0
 >>         Signals delivered: 0
 >>         Page size (bytes): 4096
 >>         Exit status: 0
 >> 139529
 >>
 >> (This test was run at night with no other load on the ISDN line
 >>  and almost no other load on either machine)
 >>
 >> Needless to say that this makes "svn list -R" completely useless for
 >> me. ;-)
 >>
 >> Even when run directly on the svn server, the same "svn list -R"
 >> command takes about 45 minutes:
 >>
 >> /usr/bin/time -v svn list -R http://svn/svn | wc -l
 >>         Command being timed: "svn list -R http://svn/svn"
 >>         User time (seconds): 960.05
 >>         System time (seconds): 32.64
 >>         Percent of CPU this job got: 36%
 >>         Elapsed (wall clock) time (h:mm:ss or m:ss): 44:58.62
 >>         Average shared text size (kbytes): 0
 >>         Average unshared data size (kbytes): 0
 >>         Average stack size (kbytes): 0
 >>         Average total size (kbytes): 0
 >>         Maximum resident set size (kbytes): 0
 >>         Average resident set size (kbytes): 0
 >>         Major (requiring I/O) page faults: 6096
 >>         Minor (reclaiming a frame) page faults: 115157
 >>         Voluntary context switches: 0
 >>         Involuntary context switches: 0
 >>         Swaps: 0
 >>         File system inputs: 0
 >>         File system outputs: 0
 >>         Socket messages sent: 0
 >>         Socket messages received: 0
 >>         Signals delivered: 0
 >>         Page size (bytes): 4096
 >>         Exit status: 0
 >> 133250
 >>
 >>
 >> I know that this is going to be fixed by issue 1809, see
 >> http://subversion.tigris.org/issues/show_bug.cgi?id=1809
 >>
 >> Is there anything that I can do in the meantime?
 >> Would a series of non-recursive lists be faster than the
 >> recursive list?
 >> Do I have to implement some kind of cache in the post-commit
 >> hook?
 >>
 >> Any hints will be appreciated.
 >>
 >> Thanks and Cheers,
 >>
 >> Carsten.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svn list -R of medium-size repository takes 10 hours.

Posted by Carsten Koch <Ca...@icem.com>.
kfogel@collab.net wrote:
> Carsten Koch <Ca...@icem.com> writes:
...
>>So the only problem - at least in this test case - is
>>that 1465 bytes (or 440 bytes) of end result are packed
>>in 60 k bytes of protocol, resulting in over 4000%
>>(or over 13000%) protocol overhead.
> 
> 
> Well, an important question is whether that overhead is a constant, or
> is always similarly proportional to the data size.
> 
> What happens if you run the same experiments with much bigger
> directories?

The same thing.
I looked at the data and much of the overhead is really
*per directory entry*, not per transaction.

The original test case I ran (the one that takes 10 hours)
has not been analyzed with ethereal (would have taken a
lot of effort and disk space ;-) .
But at least I ran it through wc -l.
So you can see that "svn list" eventually displayed 139529
result lines in 9 hours and 53 minutes over a 128kbps link.
I also checked that the 128kbps link was fully loaded, so
we transmitted roughly
   35636 seconds * 128kbps = 4561408000 bits = 570MB
to get  139529 result lines. Assuming the result lines were
an average of 20 characters long, they would comprise a
net result of 4 MB, again resulting in 13000% (or a
factor of 130), just as in the small example.
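
The traffic estimate above can be reproduced with a short Python
sketch. It assumes, as the text does, that the link was fully loaded
for the whole run and that 128 kbps means 128,000 bits per second; the
helper names are illustrative only.

```python
def wall_clock_seconds(hours, minutes, seconds):
    """Convert an h:mm:ss wall-clock time to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def bytes_on_wire(elapsed_seconds, link_bits_per_second):
    """Upper bound on bytes moved over a fully loaded link."""
    return elapsed_seconds * link_bits_per_second / 8


elapsed = wall_clock_seconds(9, 53, 56)          # from /usr/bin/time
total_bytes = bytes_on_wire(elapsed, 128_000)    # assumed 128,000 bit/s
bytes_per_result_line = total_bytes / 139_529    # lines counted by wc -l

print(f"{elapsed} s, {total_bytes / 1e6:.0f} MB on the wire, "
      f"about {bytes_per_result_line:.0f} bytes per result line")
```

Dividing by the number of result lines gives a per-entry cost, which
is the quantity that matters if the overhead is per directory entry
rather than per transaction.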


Carsten.


Re: svn list -R of medium-size repository takes 10 hours.

Posted by Carsten Koch <Ca...@icem.com>.
Carsten Koch wrote:
> Ben Collins-Sussman wrote:
> ...
> 
>> Another thing you may want to investigate:  enable mod_deflate in
>> your apache server, and make sure your svn client has
>> "http-compression = yes" set in your ~/.subversion/servers file.
>> Then the whole HTTP response will be gzipped.

Good idea! Thanks.
That sounds like an easy thing to do and I believe it will
improve matters a bit.

Carsten.


Re: svn list -R of medium-size repository takes 10 hours.

Posted by Carsten Koch <Ca...@icem.com>.
Ben Collins-Sussman wrote:
...
>> It would certainly be a tremendous gain for people who are trying to
>> do an svn ls -R over a low-bandwidth connection.
> 
> 
> No argument.  On the other hand, not many people run 'svn ls -R'.

Let me tell you the problem I am trying to solve.
Maybe there is a better approach without using "svn ls -R",
which we now agree is broken. ;-)

We work on plugins for a large vendor's CAD product.
The code structure given by the vendor consists of a tree
that we need entirely for a full build, but to work on
a small module it is sufficient to check out just a few
branches of the source tree, which must, however, start
at the root to allow builds of the module.
So I have created python scripts that use "svn checkout -N"
to get the root directory and "svn update -N" to get
directories under it as required.
As a convenience, the latter script only requires a partial
file name as a parameter. It does a "svn list -R" on the
root directory, a substring match on all results and
"svn update -N" on all directories between the root and
the matching entries.
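
The matching step described above might look roughly like the
following Python sketch. It is not the actual script: it assumes the
listing is the output of "svn list -R" (directories end with a slash),
and the function name dirs_to_update is hypothetical.

```python
import posixpath


def dirs_to_update(listing, needle):
    """Return, top-down and without duplicates, every directory between
    the root and each listed entry whose path contains `needle`."""
    wanted = []
    seen = set()
    for entry in listing:
        if needle not in entry:
            continue
        # A matching directory is wanted itself; a matching file only
        # needs its parent directory to exist in the working copy.
        if entry.endswith("/"):
            path = entry.rstrip("/")
        else:
            path = posixpath.dirname(entry)
        parts = path.split("/") if path else []
        for depth in range(1, len(parts) + 1):
            directory = "/".join(parts[:depth])
            if directory not in seen:
                seen.add(directory)
                wanted.append(directory)
    return wanted


listing = ["src/", "src/core/", "src/core/mesh.c", "src/ui/",
           "src/ui/dialog.c", "README"]
for directory in dirs_to_update(listing, "mesh"):
    # Each of these would be passed to "svn update -N" in order.
    print(directory)
```

The top-down order matters because a non-recursive update of a
subdirectory needs its parent to be present first.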

If there were something like "svn find", I would not have
to transfer the entire list from the server to find the
matching entries...

Carsten.


Re: Neon and /dev/random

Posted by kf...@collab.net.
Dale, you've already received on-topic replies.  I'm writing to make a
meta-comment:

Please don't follow up to an existing thread to start a new subject.
See http://subversion.tigris.org/mailing-list-guidelines.html#fresh-post
for a detailed explanation of why.

Thanks,
-Karl

"Dale Worley" <dw...@pingtel.com> writes:
> My apologies if this has been discussed before, but it seems to me that for
> Subversion's use, Neon should be built to use /dev/urandom by default.
> /dev/random is necessary if one wants cryptographic-quality random bits, but
> as far as I know, Subversion's security does not depend on the
> unpredictability of transaction IDs.
> 
> On the other hand, a peculiarity of /dev/random is that it extracts its
> random information from hardware input events on the computer, but it does
> not include disk accesses and network packets, because they are not due to
> external physical systems, and might be manipulatable by other
> processes/systems on the network.  On a workstation, /dev/random gets all
> the information it needs from the keyboard and mouse, but Neon runs on a
> server, which does not get keyboard and mouse events.
> 
> The result is that it's hardly surprising that accessing /dev/random blocks
> on some people's servers, and there's no reason not to use /dev/urandom.
> 
> Dale

Re: Neon and /dev/random

Posted by Max Bowsher <ma...@ukf.net>.
Dale Worley wrote:
> My apologies if this has been discussed before, but it seems to me that 
> for
> Subversion's use, Neon should be built to use /dev/urandom by default.

We can't really place any reliance on that, though, because more and more, 
neon is being built as an independent library, for use by more than just 
subversion. If there is a pressing need to stop subversion's use of neon 
depleting /dev/random, then neon's API needs to have a way for subversion to 
tell it that crypto-strength random numbers are not required.

Max.



Re: Neon and /dev/random

Posted by Branko Čibej <br...@xbc.nu>.
Dale Worley wrote:

>My apologies if this has been discussed before, but it seems to me that for
>Subversion's use, Neon should be built to use /dev/urandom by default.
>  
>
APR, not Neon.

Anyway, whether or not /dev/urandom should be the default, and on which 
systems (not all systems have the blocking /dev/random problem), is a 
question for the APR project, not for Subversion.

We use APR as our portability layer, therefore we shouldn't be in the 
business of deciding how APR should behave on a particular platform.

-- Brane



RE: Neon and /dev/random

Posted by James FitzGibbon <jf...@primustel.ca>.
Wouldn't using /dev/urandom mean that you would no longer get
cryptographic-quality random bits for accessing https:// repositories?  Or
does that setting come from the compiled openssl that neon was built
against?

-----Original Message-----
From: Dale Worley [mailto:dworley@pingtel.com] 
Sent: Friday, August 19, 2005 11:30 AM
To: dev@subversion.tigris.org
Subject: Neon and /dev/random

My apologies if this has been discussed before, but it seems to me that for
Subversion's use, Neon should be built to use /dev/urandom by default.
/dev/random is necessary if one wants cryptographic-quality random bits, but
as far as I know, Subversion's security does not depend on the
unpredictability of transaction IDs.

On the other hand, a peculiarity of /dev/random is that it extracts its
random information from hardware input events on the computer, but it does
not include disk accesses and network packets, because they are not due to
external physical systems, and might be manipulatable by other
processes/systems on the network.  On a workstation, /dev/random gets all
the information it needs from the keyboard and mouse, but Neon runs on a
server, which does not get keyboard and mouse events.

The result is that it's hardly surprising that accessing /dev/random blocks
on some people's servers, and there's no reason not to use /dev/urandom.

Dale



Neon and /dev/random

Posted by Dale Worley <dw...@pingtel.com>.
My apologies if this has been discussed before, but it seems to me that for
Subversion's use, Neon should be built to use /dev/urandom by default.
/dev/random is necessary if one wants cryptographic-quality random bits, but
as far as I know, Subversion's security does not depend on the
unpredictability of transaction IDs.

On the other hand, a peculiarity of /dev/random is that it extracts its
random information from hardware input events on the computer, but it does
not include disk accesses and network packets, because they are not due to
external physical systems, and might be manipulatable by other
processes/systems on the network.  On a workstation, /dev/random gets all
the information it needs from the keyboard and mouse, but Neon runs on a
server, which does not get keyboard and mouse events.

The result is that it's hardly surprising that accessing /dev/random blocks
on some people's servers, and there's no reason not to use /dev/urandom.

Dale



Re: svn list -R of medium-size repository takes 10 hours.

Posted by Ben Collins-Sussman <su...@collab.net>.
On Aug 19, 2005, at 8:33 AM, Carsten Koch wrote:

> Ben Collins-Sussman wrote:
> ...
>
>> we could make up yet another custom protocol response/request
>> that only svn and mod_dav_svn would understand.  We're already
>> doing that in many places where subversion concepts don't line up
>> with WebDAV/DeltaV.  Doing it for the sake of performance gains
>> would be a new policy.  I don't think everyone is in agreement
>> about this yet.
>>
>
> It would certainly be a tremendous gain for people who are trying to
> do an svn ls -R over a low-bandwidth connection.

No argument.  On the other hand, not many people run 'svn ls -R'.    
'svn ls', sure.  'svn ls -R', not so much.   I think you'll have a  
hard time convincing us that your particular pain is common to  
everyone.  :-)

>
> I know next to nothing about WebDAV/DeltaV and the XML data exchanged,
> but would it not be possible to generate the 604kB of compressed data,
> wrap it in XML and transmit that instead of over a gigabyte?
>

Of course it's possible, that's what I mean when I say "invent a  
custom protocol".

Another thing you may want to investigate:  enable mod_deflate in
your apache server, and make sure your svn client has
"http-compression = yes" set in your ~/.subversion/servers file.
Then the whole HTTP response will be gzipped.
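
For anyone who wants to check the client side of this from a script,
here is a minimal Python sketch. It assumes the servers file is
INI-style and that the option lives in the [global] section (it could
also be set in a per-server group, which this sketch ignores); the
function name is made up.

```python
import configparser

SAMPLE_SERVERS_FILE = """
[global]
http-compression = yes
"""


def explicit_http_compression(servers_text):
    """Return True/False if [global] sets http-compression, or None if
    the option is not set there at all."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(servers_text)
    return parser.getboolean("global", "http-compression", fallback=None)


print(explicit_http_compression(SAMPLE_SERVERS_FILE))
```

In real use the text would come from reading ~/.subversion/servers;
the sketch takes a string so that it can be tried without touching any
configuration.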




Re: svn list -R of medium-size repository takes 10 hours.

Posted by Carsten Koch <Ca...@icem.com>.
Ben Collins-Sussman wrote:
...
> we could make up yet another custom
> protocol response/request that only svn and mod_dav_svn would
> understand.  We're already doing that in many places where subversion
> concepts don't line up with WebDAV/DeltaV.  Doing it for the sake of
> performance gains would be a new policy.  I don't think everyone is
> in agreement about this yet.

It would certainly be a tremendous gain for people who are trying to
do an svn ls -R over a low-bandwidth connection.

I repeated my benchmark with our current repository, which by now has
grown to 263182 entries, which is about twice the size it had during
my previous test. So it's now over 1GB of network traffic.

Even if I run svn list directly on our svn server machine
(AMD Athlon XP 3200+ running Linux), it now takes
         Command being timed: "svn list -R http://svn/svn"
         User time (seconds): 4317.76
         System time (seconds): 99.78
         Percent of CPU this job got: 60%
         Elapsed (wall clock) time (h:mm:ss or m:ss): 2:02:21


I compressed the resulting list using bzip2, which works very well:
   svn_list: 40.121:1,  0.199 bits/byte, 97.51% saved, 24246468 in, 604335 out.

So without redundancy, the net information I asked "svn list" for
is 604kB. It took "svn list" over two hours to get it to me and
it caused over a gigabyte of http traffic.
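
The figures bzip2 prints follow directly from the two byte counts. A
small Python sketch, with illustrative function names, that recomputes
them and can also measure any other listing with the standard bz2
module:

```python
import bz2


def compression_stats(bytes_in, bytes_out):
    """Return (ratio, bits per input byte, percent saved)."""
    ratio = bytes_in / bytes_out
    bits_per_byte = 8 * bytes_out / bytes_in
    saved_percent = 100 * (1 - bytes_out / bytes_in)
    return ratio, bits_per_byte, saved_percent


def measure(data):
    """Compress `data` (bytes) with bz2 and report the same stats."""
    return compression_stats(len(data), len(bz2.compress(data)))


# Byte counts taken from the bzip2 output quoted above.
ratio, bits_per_byte, saved_percent = compression_stats(24246468, 604335)
print(f"{ratio:.3f}:1, {bits_per_byte:.3f} bits/byte, "
      f"{saved_percent:.2f}% saved")
```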

I know next to nothing about WebDAV/DeltaV and the XML data exchanged,
but would it not be possible to generate the 604kB of compressed data,
wrap it in XML and transmit that instead of over a gigabyte?

Carsten.


Re: svn list -R of medium-size repository takes 10 hours.

Posted by Tobias Ringström <to...@ringstrom.mine.nu>.
Ben Collins-Sussman wrote:

> This is one of many situations where "being a DAV client" is hurting  
> the svn client's performance.   In DAV-land, the proper way to get a  
> list of children in a collection (directory) is to do a depth-1  
> PROPFIND.  This has the unfortunate side effect of not just listing  
> all children, but also all of their properties -- both user-generated  
> and server-generated ones.  All in XML.
>
> If we were to just say "forget it, the svn client isn't a DAV client"
> (which has been discussed before), we could make up yet another
> custom protocol response/request that only svn and mod_dav_svn would
> understand.  We're already doing that in many places where subversion
> concepts don't line up with WebDAV/DeltaV.  Doing it for the sake of
> performance gains would be a new policy.  I don't think everyone is
> in agreement about this yet.
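
To make the difference concrete, here is a hedged Python sketch of the
two kinds of PROPFIND request body: one asking for all properties and
one naming only a few. The five property names are the ones that
appear in the patch posted later in this thread; the helper names are
hypothetical and this is not how the svn client builds its requests.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
ET.register_namespace("D", DAV_NS)

# A depth-1 PROPFIND covers the collection and its immediate children.
PROPFIND_HEADERS = {"Depth": "1", "Content-Type": "text/xml"}

# Property names as listed in the patch later in the thread.
DIRENT_PROPS = ("resourcetype", "getcontentlength", "version-name",
                "creationdate", "creator-displayname")


def propfind_body(prop_names=None):
    """Build a PROPFIND body; None means 'allprop'."""
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    if prop_names is None:
        ET.SubElement(root, f"{{{DAV_NS}}}allprop")
    else:
        prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
        for name in prop_names:
            ET.SubElement(prop, f"{{{DAV_NS}}}{name}")
    return ET.tostring(root, encoding="unicode")


print(propfind_body())              # every property, for every child
print(propfind_body(DIRENT_PROPS))  # only the named properties
```

With the second form the server only has to generate and send the
named properties for each child, which is where the savings discussed
in this thread would come from.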

Anyone who's interested in making 'svn ls' faster over http should take 
a look at issue 2151.

    http://subversion.tigris.org/issues/show_bug.cgi?id=2151

In this case, I'd say it's more a problem of how we use DAV, than a 
problem with DAV itself.

/Tobias



Re: svn list -R of medium-size repository takes 10 hours.

Posted by Carsten Koch <Ca...@icem.com>.
Jean-Marc Godbout wrote:
> I am currently looking into this issue.

Thanks! That is good news.



> If you
> could send me an ethereal log (assuming your stuff isn't too private)
> I could confirm this for you.

No problem.
I guess sending you the whole gigabyte will not be useful,
since it will basically be the same data over and over again.
Would a small "svn list -R http://..." ethereal log be sufficient?

Carsten.


Re: svn list -R of medium-size repository takes 10 hours.

Posted by Jean-Marc Godbout <jm...@gmail.com>.
Yes, a simple ethereal log zipped and sent to my mailbox would help me
see if there are any other issues that might indicate why you are
sending so much data.

The bigger problem here is that "svn list" is slow. It will send large
amounts of data and take some time no matter what. Other dav clients
such as cadaver take a very comparable amount of time. In fact, apart
from the initial http requests, cadaver behaves very much like the
subversion client - and takes about the same amount of time.

So there is not much hope for a massive improvement of "svn list"; you
probably won't see improvements that would cut your request in half.
HOWEVER, I have a sneaking suspicion that "svn list -R" might be
optimizable (if that's a word). I will look into that today and get
back to you if I find something interesting.

Thanks,
Jean-Marc

On 8/21/05, Carsten Koch <Ca...@icem.com> wrote:
> Jean-Marc Godbout wrote:
> > I am currently looking into this issue.
> 
> Thanks! That is good news.
> 
> 
> 
> > If you
> > could send me an ethereal log (assuming your stuff isn't too private)
> > I could confirm this for you.
> 
> No problem.
> I guess sending you the whole gigabyte will not be useful,
> since it will basically be the same data over and over again.
> Would a small "svn list -R http://..." ethereal log be sufficient?
> 
> Carsten.


Re: svn list -R of medium-size repository takes 10 hours.

Posted by Philip Martin <ph...@codematters.co.uk>.
Jean-Marc Godbout <jm...@gmail.com> writes:

> Actually, I might be wrong about "svn list" after all. After a couple
> more attempts, here's what I've got:
>
> svn - mod_dav_svn == slow
> cadaver - mod_dav_svn == slow
> cadaver - mod_dav == fast!
>
> The slow part of this equation seems to be mod_dav_svn. 

That's issue 2151.

-- 
Philip Martin


Re: svn list -R of medium-size repository takes 10 hours.

Posted by Jean-Marc Godbout <jm...@gmail.com>.
Oh, I'm sorry if I was unclear. Issue 2151 has a reference to RFC
3253, Section 3.11 which states that DeltaV properties SHOULD NOT be
returned in an allprop request (for efficiency reasons actually).
"Checked-in" is a DeltaV property and shouldn't be returned in an
allprop request.

Your suggestion to query only for the props we want also works,
as it eliminates the properties that are most costly to generate. What
about adding a new prop on the server called has_props that would return
the correct value, so that we could query for that? That way we wouldn't
need to deprecate has_props. I'll try that today.

The problem I see with your solution though is that other dav clients
wouldn't get any speed improvements. If we removed DeltaV properties
from allprop requests, then we'd speed up every client, not just
subversion. It would be somewhat easier to implement I believe, so
I'll try it and see where I can get with it.

On 8/22/05, Ben Collins-Sussman <su...@collab.net> wrote:
> As an experiment, I tweaked the ra_dav code to request only the 5
> properties it needs to fill an svn_dirent_t, rather than requesting
> *all* properties.
> 
> The time it took to do an 'svn ls -R' against Subversion's own trunk
> URL went from about 30 seconds to 15 seconds.
> 
> We're still not out of the woods entirely, though.  svn_dirent_t has
> an annoying "has_props" boolean field that needs to be filled in.
> The current code loops through all the returned properties, and looks
> to see if any user-defined properties exist in the list.  In order to
> complete the patch, we'll have to invent some new server-generated
> prop that indicates "this resource contains user-defined properties",
> and add it to our list of requested props.  It also means that this
> speedup will only work against 1.3 mod_dav_svn servers;  when talking
> to older servers, we'll have to still fetch *all* properties just to
> find out if user-defined props exist.  :-(
> 
> I guess another option is to play with deprecating the 'has_props'
> field.  Maybe create an svn_ra_get_dir2() which doesn't fill in the
> has_props field by default, unless you explicitly ask it to.  (Does
> any client actually use the 'has_props' field?)
> 
> Here's the patch I was playing with.  (You can play with it, but note
> that it's not complete or totally correct!  svn_dirent_t->has_props
> is always 0.)
> 
> 
> 
> Index: subversion/libsvn_ra_dav/fetch.c
> ===================================================================
> --- subversion/libsvn_ra_dav/fetch.c    (revision 15883)
> +++ subversion/libsvn_ra_dav/fetch.c    (working copy)
> @@ -915,6 +915,20 @@
> }
> +
> +/* the properties we need to fill in an svn_dirent_t, used by
> +   svn_ra_get_dir(). */
> +static const ne_propname dirent_props[] =
> +{
> +  { "DAV:", "resourcetype" },         /* kind */
> +  { "DAV:", "getcontentlength" },     /* size */
> +  { "DAV:", "version-name" },         /* created_rev */
> +  { "DAV:", "creationdate" },         /* time */
> +  { "DAV:", "creator-displayname" },  /* last_author */
> +  { NULL }
> +};
> +
> +
> svn_error_t *svn_ra_dav__get_dir(svn_ra_session_t *session,
>                                    const char *path,
>                                    svn_revnum_t revision,
> @@ -961,7 +975,7 @@
>            PROPFIND on the directory of depth 1. */
>         SVN_ERR( svn_ra_dav__get_props(&resources, ras->sess,
>                                        final_url, NE_DEPTH_ONE,
> -                                     NULL, NULL /* all props */, pool) );
> +                                     NULL, dirent_props, pool) );
> 
>         /* Count the number of path components in final_url. */
>         final_url_n_components = svn_path_component_count(final_url);
> 
> 
> 
> 
> 
> 
> 
> 
> --
> www.collab.net  <>  CollabNet  |  Distributed Development On Demand
> 
> 
> 
>
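
The has_props check described in the quoted message (loop over the
returned properties and see whether any is user-defined) could be
sketched in Python as follows. The two namespace URIs are an
assumption about how user-visible properties are exposed over DAV, not
something stated in this thread, and the function name is made up.

```python
# Assumed namespaces for properties set by users (plain and svn:*).
# Treat both URIs as assumptions made for this sketch.
CUSTOM_NS = "http://subversion.tigris.org/xmlns/custom/"
SVN_NS = "http://subversion.tigris.org/xmlns/svn/"
USER_NAMESPACES = (CUSTOM_NS, SVN_NS)


def has_user_props(returned_props):
    """`returned_props` is an iterable of (namespace, name) pairs from
    one entry of a PROPFIND response."""
    return any(namespace in USER_NAMESPACES
               for namespace, _name in returned_props)


entry = [("DAV:", "resourcetype"),
         ("DAV:", "creationdate"),
         (SVN_NS, "eol-style")]
print(has_user_props(entry))
```

The point of the sketch is that this answer can only be derived on the
client if every property comes back, which is why a targeted PROPFIND
needs some server-side replacement for it.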



Re: svn list -R of medium-size repository takes 10 hours.

Posted by Ben Collins-Sussman <su...@collab.net>.
On Aug 22, 2005, at 2:18 PM, Jean-Marc Godbout wrote:

>>
>> Actually, I'm not so sure about that.  ra_dav usually asks for
>> specific properties.  I don't think it asks for <allprops> very often
>> at all.  You should check the code;  it may in fact be perfectly safe
>> to remove DeltaV props from an <allprop> response.
>>
>
> When I removed "Checked-in" svn ls failed saying it expected to find
> the Checked-in prop. But now that you are casting doubt in my mind,
> I'll have to try it again. But if in fact we don't need it, then I do
> agree that this would be a better solution.
>

Well, sure, 'svn ls' currently asks for <allprop>.  We all know  
that... that's what my sample patch changes, right?

I'm saying that that might be the *only* time we ask for <allprop>,  
or at least where it matters.  See if anything else fails.  Try  
running the regression suite over dav, for example.

>
> Agreed - Two propfinds on huge collections would take a while indeed.
> I like your solution.
>

Great, I look forward to your patches!  :-D



-- 
www.collab.net  <>  CollabNet  |  Distributed Development On Demand





Re: svn list -R of medium-size repository takes 10 hours.

Posted by Jean-Marc Godbout <jm...@gmail.com>.
> 
> Actually, I'm not so sure about that.  ra_dav usually asks for
> specific properties.  I don't think it asks for <allprops> very often
> at all.  You should check the code;  it may in fact be perfectly safe
> to remove DeltaV props from an <allprop> response.

When I removed "Checked-in" svn ls failed saying it expected to find
the Checked-in prop. But now that you are casting doubt in my mind,
I'll have to try it again. But if in fact we don't need it, then I do
agree that this would be a better solution.

> 
> > 2. If we make the clients query for the has_props, then we would need
> > to handle the case of the older server. This is fixable. If we get a
> > 404 for that property, then we do an allprop and request everything -
> > using the old way to determine has_props. This would be much slower,
> > but not break any existing functionality.
> 
> That's one solution.  But then it means doing two PROPFIND requests
> instead of one, when talking to older servers.  It reeks of evil:
> "Subversion 1.3 now runs 'svn ls' twice as fast, but it's even
> *slower* when you point it to a pre-1.3 server!"
> 
> There may be more clever solutions.  Like, for example, in the
> PROPFINDs for vcc, baseline, etc. which lead up to the "final"
> PROPFIND, we could add the 'has_props' to our list of props to
> fetch.  Then we'd know -- early on -- if 'has_props' is supported or
> not.  Then the final PROPFIND can either be a target list, or an
> <allprop>.

Agreed - Two propfinds on huge collections would take a while indeed.
I like your solution.



Re: svn list -R of medium-size repository takes 10 hours.

Posted by Ben Collins-Sussman <su...@collab.net>.
On Aug 22, 2005, at 2:43 PM, Tobias Ringström wrote:
>
> There's another clever solution to avoid the extra PROPFIND in the  
> issue (2151). The idea is to use OPTIONS, but I'll not repeat the  
> details here. It's all in the issue. The technique is generic and
> can be used in other similar situations too to avoid extra requests  
> when talking to older servers.
>

Ah, good.  Jean-Marc, that's definitely worth looking into.  It may  
be more elegant to just do feature-exchanges in an OPTIONS request.





-- 
www.collab.net  <>  CollabNet  |  Distributed Development On Demand






Re: svn list -R of medium-size repository takes 10 hours.

Posted by Tobias Ringström <to...@ringstrom.mine.nu>.
Ben Collins-Sussman wrote:

> That's one solution.  But then it means doing two PROPFIND requests  
> instead of one, when talking to older servers.  It reeks of evil:   
> "Subversion 1.3 now runs 'svn ls' twice as fast, but it's even  
> *slower* when you point it to a pre-1.3 server!"
>
> There may be more clever solutions.  Like, for example, in the  
> PROPFINDs for vcc, baseline, etc. which lead up to the "final"  
> PROPFIND, we could add the 'has_props' to our list of props to  
> fetch.  Then we'd know -- early on -- if 'has_props' is supported or  
> not.  Then the final PROPFIND can either be a target list, or an  
> <allprop>.

There's another clever solution to avoid the extra PROPFIND in the issue 
(2151). The idea is to use OPTIONS, but I'll not repeat the details 
here. It's all in the issue. The technique is generic and can be used in 
other similar situations too to avoid extra requests when talking to 
older servers.

/Tobias



Re: svn list -R of medium-size repository takes 10 hours.

Posted by Ben Collins-Sussman <su...@collab.net>.
On Aug 22, 2005, at 1:54 PM, Jean-Marc Godbout wrote:

> Yes, I was calmly eating my cheerios when I realised that I wasn't
> thinking about backwards compatibility.
>
> 1. If we simply remove all the DeltaV props, then we are preventing
> any old clients from connecting as they expect Checked-in to be there.
>

Actually, I'm not so sure about that.  ra_dav usually asks for  
specific properties.  I don't think it asks for <allprops> very often  
at all.  You should check the code;  it may in fact be perfectly safe  
to remove DeltaV props from an <allprop> response.


> 2. If we make the clients query for the has_props, then we would need
> to handle the case of the older server. This is fixable. If we get a
> 404 for that property, then we do an allprop and request everything -
> using the old way to determine has_props. This would be much slower,
> but not break any existing functionality.

That's one solution.  But then it means doing two PROPFIND requests  
instead of one, when talking to older servers.  It reeks of evil:   
"Subversion 1.3 now runs 'svn ls' twice as fast, but it's even  
*slower* when you point it to a pre-1.3 server!"

There may be more clever solutions.  Like, for example, in the  
PROPFINDs for vcc, baseline, etc. which lead up to the "final"  
PROPFIND, we could add the 'has_props' to our list of props to  
fetch.  Then we'd know -- early on -- if 'has_props' is supported or  
not.  Then the final PROPFIND can either be a target list, or an  
<allprop>.
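
A minimal sketch of that idea in C. Every name here (session_state_t, record_probe_result, choose_final_propfind) is hypothetical and not taken from ra_dav; the sketch only assumes that the early PROPFINDs also ask for the new property and that the per-property status (200 or 404) is remembered in the session. The final PROPFIND then picks a targeted list or <allprop> without an extra round trip.

```c
/* Whether the server understands the new 'has_props'-style property.
   All names in this sketch are hypothetical, not ra_dav's. */
typedef enum {
  SUPPORT_UNKNOWN,   /* no probe has been answered yet */
  SUPPORT_YES,       /* the property came back with status 200 */
  SUPPORT_NO         /* the property came back with status 404 */
} feature_support_t;

typedef enum {
  PROPFIND_TARGETED, /* ask only for the props svn_dirent_t needs */
  PROPFIND_ALLPROP   /* old behaviour: ask for everything */
} propfind_kind_t;

typedef struct {
  feature_support_t has_props_support;
} session_state_t;

/* Called while handling one of the early PROPFINDs (vcc, baseline, ...)
   with the per-property status the server reported for the probe. */
void record_probe_result(session_state_t *state, int prop_status)
{
  if (prop_status == 200)
    state->has_props_support = SUPPORT_YES;
  else if (prop_status == 404)
    state->has_props_support = SUPPORT_NO;
  /* Any other status leaves the recorded state unchanged. */
}

/* Decide the shape of the final, expensive depth-1 PROPFIND. */
propfind_kind_t choose_final_propfind(const session_state_t *state)
{
  /* Fall back to <allprop> unless support was positively confirmed. */
  return state->has_props_support == SUPPORT_YES
           ? PROPFIND_TARGETED
           : PROPFIND_ALLPROP;
}
```

Defaulting to <allprop> when nothing is known keeps the behaviour against older servers identical to today's, at today's cost.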





Re: svn list -R of medium-size repository takes 10 hours.

Posted by "C. Michael Pilato" <cm...@collab.net>.
Jean-Marc Godbout <jm...@gmail.com> writes:

> Yes, I was calmly eating my cheerios when I realised that I wasn't
> thinking about backwards compatibility.

Hm.  I too find that Cheerios destroys my ability to think straight
(though less so with the Honey Nut and Apple Cinnamon varieties).


Re: svn list -R of medium-size repository takes 10 hours.

Posted by Jean-Marc Godbout <jm...@gmail.com>.
Yes, I was calmly eating my cheerios when I realised that I wasn't
thinking about backwards compatibility.

1. If we simply remove all the DeltaV props, then we are preventing
any old clients from connecting as they expect Checked-in to be there.

2. If we make the clients query for the has_props, then we would need
to handle the case of the older server. This is fixable. If we get a
404 for that property, then we do an allprop and request everything -
using the old way to determine has_props. This would be much slower,
but not break any existing functionality.

Solution 2 would mean that any combination involving a client or server
older than *now* would still be slow. However, they would all work, and
newer clients and servers would be faster. It seems like solution 1
would be better suited to a major upgrade.
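
To make the cost of solution 2 concrete, here is a small sketch in C against a fake server. Every name in it is made up for illustration and nothing is taken from the real ra_dav code. It only counts how many depth-1 PROPFINDs are needed to list one directory when the client tries the targeted request first and falls back to allprop on a 404.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a server; only counts expensive requests. */
typedef struct {
  bool knows_has_props;  /* true for a server that has the new property */
  int depth1_propfinds;  /* number of depth-1 PROPFINDs received so far */
} fake_server_t;

/* Targeted PROPFIND that includes the new property.
   Returns false when the server answers 404 for that property. */
static bool targeted_propfind(fake_server_t *server)
{
  server->depth1_propfinds++;
  return server->knows_has_props;
}

/* The old way: fetch every property of every child. */
static void allprop_propfind(fake_server_t *server)
{
  server->depth1_propfinds++;
}

/* Solution 2: try the cheap request, fall back to allprop on a 404. */
void list_dir_with_fallback(fake_server_t *server)
{
  if (!targeted_propfind(server))
    allprop_propfind(server);
}
```

With this scheme a newer server sees one request per directory, while an older server sees two, which is the price paid for keeping existing functionality intact.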



On 8/22/05, Ben Collins-Sussman <su...@collab.net> wrote:
> (Please keep the dev@ list cc'd on all discussion.)
> 
> 
> On Aug 22, 2005, at 1:32 PM, Jean-Marc Godbout wrote:
> 
> > Oh, I'm sorry if I was unclear. Issue 2151 has a reference to RFC
> > 3253, Section 3.11 which states that DeltaV properties SHOULD NOT be
> > returned in an allprop request (for efficiency reasons actually).
> > "Checked-in" is a DeltaV property and shouldn't be returned in an
> > allprop request.
> >
> 
> Wow, I had no idea!  We're really out of date.  When we started
> subversion 5 years ago, the DeltaV spec was much different (and only
> an early draft).  Thanks for pointing that out!
> 
> As you suggest, it's probably a good idea to follow this part of the
> spec.  It would make generic DAV clients faster... such as when
> mounting a Subversion repository in Windows or OSX.  Unfortunately, I
> suspect it will require a change to mod_dav, not mod_dav_svn.  I
> don't actually think that logic is under subversion's control.
> 
> 
> > Your suggestion to only query for props that we would want also works
> > as it eliminates the most costful properties to generate. What about
> > adding a new prop on the server called has_props that would return the
> > correct value and we could query for that.
> 
> That's exactly what I suggested in my previous mail.  :-)  The
> problem is, what happens when you run 'svn ls' against an older
> server, and you get a 404 on that particular new property?  How are
> you going to find out if user-defined props exist?  That's the
> problem which needs discussion.
> 
> 
>



Re: svn list -R of medium-size repository takes 10 hours.

Posted by Ben Collins-Sussman <su...@collab.net>.
(Please keep the dev@ list cc'd on all discussion.)


On Aug 22, 2005, at 1:32 PM, Jean-Marc Godbout wrote:

> Oh, I'm sorry if I was unclear. Issue 2151 has a reference to RFC
> 3253, Section 3.11 which states that DeltaV properties SHOULD NOT be
> returned in an allprop request (for efficiency reasons actually).
> "Checked-in" is a DeltaV property and shouldn't be returned in an
> allprop request.
>

Wow, I had no idea!  We're really out of date.  When we started  
subversion 5 years ago, the DeltaV spec was much different (and only  
an early draft).  Thanks for pointing that out!

As you suggest, it's probably a good idea to follow this part of the  
spec.  It would make generic DAV clients faster... such as when  
mounting a Subversion repository in Windows or OSX.  Unfortunately, I  
suspect it will require a change to mod_dav, not mod_dav_svn.  I  
don't actually think that logic is under subversion's control.


> Your suggestion to only query for props that we would want also works
> as it eliminates the most costful properties to generate. What about
> adding a new prop on the server called has_props that would return the
> correct value and we could query for that.

That's exactly what I suggested in my previous mail.  :-)  The  
problem is, what happens when you run 'svn ls' against an older  
server, and you get a 404 on that particular new property?  How are  
you going to find out if user-defined props exist?  That's the  
problem which needs discussion.




Re: svn list -R of medium-size repository takes 10 hours.

Posted by Ben Collins-Sussman <su...@collab.net>.
As an experiment, I tweaked the ra_dav code to request only the 5  
properties it needs to fill an svn_dirent_t, rather than requesting  
*all* properties.

The time it took to do an 'svn ls -R' against Subversion's own trunk  
URL went from about 30 seconds to 15 seconds.

We're still not out of the woods entirely, though.  svn_dirent_t has  
an annoying "has_props" boolean field that needs to be filled in.   
The current code loops through all the returned properties, and looks  
to see if any user-defined properties exist in the list.  In order to  
complete the patch, we'll have to invent some new server-generated  
prop that indicates "this resource contains user-defined properties",  
and add it to our list of requested props.  It also means that this  
speedup will only work against 1.3 mod_dav_svn servers;  when talking  
to older servers, we'll have to still fetch *all* properties just to  
find out if user-defined props exist.  :-(
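
As a rough illustration of that loop, here is a self-contained sketch in C. It is not the code in fetch.c: prop_name_t and has_user_props are hypothetical names, and the two namespace URIs are assumptions about where user-visible properties live. The point is only that the answer can be computed from an <allprop> response, but not from a response that contains just the five targeted props.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one property name in a PROPFIND response. */
typedef struct {
  const char *nspace;  /* namespace URI, e.g. "DAV:" */
  const char *name;    /* local name, e.g. "resourcetype" */
} prop_name_t;

/* Assumed namespaces for user-defined and svn: properties. */
static const char *const user_prop_namespaces[] = {
  "http://subversion.tigris.org/xmlns/custom/",
  "http://subversion.tigris.org/xmlns/svn/",
};

/* Return true if any returned property lives in a user-prop namespace. */
bool has_user_props(const prop_name_t *props, size_t count)
{
  size_t n_namespaces =
    sizeof(user_prop_namespaces) / sizeof(user_prop_namespaces[0]);

  for (size_t i = 0; i < count; i++)
    for (size_t j = 0; j < n_namespaces; j++)
      if (strcmp(props[i].nspace, user_prop_namespaces[j]) == 0)
        return true;

  return false;
}
```

A response holding only DAV: live props always yields false here, which is why a targeted request needs some server-generated hint to fill in has_props.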

I guess another option is to play with deprecating the 'has_props'  
field.  Maybe create an svn_ra_get_dir2() which doesn't fill in the  
has_props field by default, unless you explicitly ask it to.  (Does  
any client actually use the 'has_props' field?)

Here's the patch I was playing with.  (You can play with it, but note  
that it's not complete or totally correct!  svn_dirent_t->has_props  
is always 0.)



Index: subversion/libsvn_ra_dav/fetch.c
===================================================================
--- subversion/libsvn_ra_dav/fetch.c    (revision 15883)
+++ subversion/libsvn_ra_dav/fetch.c    (working copy)
@@ -915,6 +915,20 @@
}
+
+/* the properties we need to fill in an svn_dirent_t, used by
+   svn_ra_get_dir(). */
+static const ne_propname dirent_props[] =
+{
+  { "DAV:", "resourcetype" },         /* kind */
+  { "DAV:", "getcontentlength" },     /* size */
+  { "DAV:", "version-name" },         /* created_rev */
+  { "DAV:", "creationdate" },         /* time */
+  { "DAV:", "creator-displayname" },  /* last_author */
+  { NULL }
+};
+
+
svn_error_t *svn_ra_dav__get_dir(svn_ra_session_t *session,
                                   const char *path,
                                   svn_revnum_t revision,
@@ -961,7 +975,7 @@
           PROPFIND on the directory of depth 1. */
        SVN_ERR( svn_ra_dav__get_props(&resources, ras->sess,
                                       final_url, NE_DEPTH_ONE,
-                                     NULL, NULL /* all props */, pool) );
+                                     NULL, dirent_props, pool) );

        /* Count the number of path components in final_url. */
        final_url_n_components = svn_path_component_count(final_url);








-- 
www.collab.net  <>  CollabNet  |  Distributed Development On Demand





Re: svn list -R of medium-size repository takes 10 hours.

Posted by Michael Sinz <Mi...@sinz.org>.
Ben Collins-Sussman wrote:
> 
> On Aug 22, 2005, at 9:44 AM, Michael Sinz wrote:
> 
>>
>> Now if I could only remember when I last did "svn list" - other than
>> to test this problem...
> 
> 
> It's used quite extensively in most svn GUI clients (like 
> TortoiseSVN).  Such clients almost always include a "repository 
> browser" for selecting URLs.

Ahh, that explains the TSVN performance difference in the repository
browser when local vs remote.

I believe that making the client only ask for what it needs would be
the best bet - especially since that could drastically reduce the
network overhead.

-- 
Michael Sinz                     Technology and Engineering Director/Consultant
"Starting Startups"                                mailto:michael.sinz@sinz.org
My place on the web                            http://www.sinz.org/Michael.Sinz


Re: svn list -R of medium-size repository takes 10 hours.

Posted by Ben Collins-Sussman <su...@collab.net>.
On Aug 22, 2005, at 9:44 AM, Michael Sinz wrote:

>
> Now if I could only remember when I last did "svn list" - other than
> to test this problem...

It's used quite extensively in most svn GUI clients (like  
TortoiseSVN).  Such clients almost always include a "repository  
browser" for selecting URLs.




Re: svn list -R of medium-size repository takes 10 hours.

Posted by Michael Sinz <Mi...@sinz.org>.
Ben Collins-Sussman wrote:
> Jean-Marc Godbout wrote:
> 
>> I do realise that we are getting a bit far from the original issue
>> here. Removing some of the props have 3 benefits:
>> 1. Making us more "DeltaV" compliant... we don't care so much for this
> 
> 
> I don't understand.  It sounds like you're recommending that mod_dav_svn
> stop generating the DAV:checked-in property, and that doing so would
> make us more DeltaV compliant.  As I understand it, it would make us
> *less* DeltaV compliant!  It's a standard DeltaV "live property" that
> any DeltaV client would expect to see when fetching all properties
> (<allprop>).

That would be bad - I hope that is not what was suggested.

> Perhaps what you mean is:  we should make svn_ra_get_dir() generate a
> very specific, targeted list of properties to fetch.  It should ask for
> only those live/dead properties that it actually needs to fill out the
> svn_dirent_t structure.  (DAV:checked-in is not needed.)  svn_ra_stat()
> would need to do the same.

If he did not mean this, I think you are onto something here.  It would
not help old clients but new clients could be significantly lower network
overhead (which for non-local networks can become an issue)

Now if I could only remember when I last did "svn list" - other than
to test this problem...

-- 
Michael Sinz                     Technology and Engineering Director/Consultant
"Starting Startups"                                mailto:michael.sinz@sinz.org
My place on the web                            http://www.sinz.org/Michael.Sinz


Re: svn list -R of medium-size repository takes 10 hours.

Posted by Ben Collins-Sussman <su...@collab.net>.
Jean-Marc Godbout wrote:

> I do realise that we are getting a bit far from the original issue
> here. Removing some of the props have 3 benefits:
> 1. Making us more "DeltaV" compliant... we don't care so much for this

I don't understand.  It sounds like you're recommending that mod_dav_svn 
stop generating the DAV:checked-in property, and that doing so would 
make us more DeltaV compliant.  As I understand it, it would make us 
*less* DeltaV compliant!  It's a standard DeltaV "live property" that 
any DeltaV client would expect to see when fetching all properties 
(<allprop>).

Perhaps what you mean is:  we should make svn_ra_get_dir() generate a 
very specific, targeted list of properties to fetch.  It should ask for 
only those live/dead properties that it actually needs to fill out the 
svn_dirent_t structure.  (DAV:checked-in is not needed.)  svn_ra_stat() 
would need to do the same.




Re: svn list -R of medium-size repository takes 10 hours.

Posted by Jean-Marc Godbout <jm...@gmail.com>.
One thing that issue 2151 does mention is that we shouldn't have any
DeltaV properties in allprop queries, which we currently do. One of
the more expensive ones is "checked-in". Simply removing it causes
"svn list" to fail because it expects it. However, returning a bogus
value, such as simply "/", allows "svn list" to complete successfully,
and it does improve the times a whole lot! (note that I don't know how
the rest of subversion reacts to this, I think commit makes use of
"checked-in").

Test          With checked-in   Without (bogus "/" value)
200 files          1.09               0.73
800 files          6.38               3.20
1600 files        20.43               8.84

The difference in time is pretty important. Also, note that this is in
a VM, times are somewhat different in real world - Still, we are
talking about twice the speed.
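
For reference, the ratios behind "twice the speed" can be worked out from the numbers above; they come to roughly 1.5x, 2.0x and 2.3x for the 200, 800 and 1600 file tests. A small C sketch of the arithmetic follows (timing_t, speedup and print_speedups are names invented here, and the times are used exactly as posted, with no unit assumed):

```c
#include <stddef.h>
#include <stdio.h>

/* One row of the measurements posted above. */
typedef struct {
  int files;               /* number of files in the test */
  double with_checked_in;  /* time with the real checked-in value */
  double with_bogus;       /* time with the bogus "/" value */
} timing_t;

static const timing_t timings[] = {
  {  200,  1.09, 0.73 },
  {  800,  6.38, 3.20 },
  { 1600, 20.43, 8.84 },
};

/* How many times faster the listing is without the real checked-in. */
double speedup(const timing_t *t)
{
  return t->with_checked_in / t->with_bogus;
}

/* Print one line per test with its speedup factor. */
void print_speedups(void)
{
  for (size_t i = 0; i < sizeof(timings) / sizeof(timings[0]); i++)
    printf("%4d files: %.2fx\n", timings[i].files, speedup(&timings[i]));
}
```

The gain grows with the size of the listing, which fits the idea that the per-entry cost of generating checked-in dominates on larger directories.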

An important note is that when removing the "checked-in" property,
traditional dav clients like cadaver also gain a pretty substantial
speed improvement (the server is much speedier).

I do realise that we are getting a bit far from the original issue
here. Removing some of the props has 3 benefits:
1. Making us more "DeltaV" compliant... we don't care so much for this
2. The server responds faster to queries
3. The server needs to send less data

I think part 3 has been the bulk of the issue in Carsten's case. I
doubt that there will be THAT MUCH of a difference by simply removing
the unneeded properties in terms of the amount of data... but then
again I could be wrong. As well, with mod_deflate enabled the amount
of data should become less of an issue and #2 would become more
important.

There are a couple "todo"s in issue 2151 which I'll attempt to get to
in the next few days. But as a summary, it *seems* possible to
partially fix Carsten's problem.

Cheers
Jean-Marc Godbout

On 8/21/05, Philip Martin <ph...@codematters.co.uk> wrote:
> Jean-Marc Godbout <jm...@gmail.com> writes:
> 
> > Actually, I might be wrong about "svn list" after all. After a couple
> > more attempts, here's what I've got:
> >
> > svn - mod_dav_svn == slow
> > cadaver - mod_dav_svn == slow
> > cadaver - mod_dav == fast!
> >
> > The slow part of this equation seems to be mod_dav_svn.
> 
> That's issue 2151.
> 
> --
> Philip Martin
> 



Re: svn list -R of medium-size repository takes 10 hours.

Posted by Jean-Marc Godbout <jm...@gmail.com>.
Actually, I might be wrong about "svn list" after all. After a couple
more attempts, here's what I've got:

svn - mod_dav_svn == slow
cadaver - mod_dav_svn == slow
cadaver - mod_dav == fast!

The slow part of this equation seems to be mod_dav_svn. 

On 8/21/05, Jean-Marc Godbout <jm...@gmail.com> wrote:
> Yes, a simple ethereal log zipped and sent to my mailbox would help me
> see if there are any other issues that might indicate why you are
> sending so much data.
> 
> The bigger problem here is that "svn list" is slow. It will send large
> amounts of data and take some time no matter what. Other dav clients
> such as cadaver take a very comparable amount of time. In fact, apart
> from the initial http requests, cadaver acts in a very similar fashion
> than the subversion client - and takes about the same amount of time.
> 
> So there is not much hope for a massive improvement of "svn list", you
> probably won't see improvements that would cut your request in half.
> HOWEVER, I have a sneaking suspicion that "svn list -R" might be
> optimizable (if that's a word). I will look into that today and get
> back to you if I find something interesting.
> 
> Thanks,
> Jean-Marc
> 
> On 8/21/05, Carsten Koch <Ca...@icem.com> wrote:
> > Jean-Marc Godbout wrote:
> > > I am currently looking into this issue.
> >
> > Thanks! That is good news.
> >
> >
> >
> > > If you
> > > could send me an ethereal log (assuming your stuff isn't too private)
> > > I could confirm this for you.
> >
> > No problem.
> > I guess sending you the whole gigabyte will not be useful,
> > since it will basically be the same data over and over again.
> > Would a small "svn list -R http://..." ethereal log be sufficient?
> >
> > Carsten.
> >



Re: svn list -R of medium-size repository takes 10 hours.

Posted by Jean-Marc Godbout <jm...@gmail.com>.
I am currently looking into this issue. It seems like there is
somewhat of an inherent slowness with WebDAV/DeltaV because of the
amount of requests and the format in which they return data. The
gigabytes of data do seem odd though.

I think that it will always be slow to a certain extent, but there
should be ways to improve it. One current issue is that mod_dav_svn
(the Apache module part of the Subversion server) is taking its
sweet time to answer certain requests that involve many different
elements.

Here is a simple example of a dav request - note that this is running
in a debugger, in a VM so the times are simply examples:

1. There are a couple requests going to the server to resolve the
proper folder (5 seconds of chatter).
2. Then the "proper" dav request is made that will list a folder.
(mod_dav_svn takes 10 seconds to reply).
3. The folder listing is sent back to the client. (23 seconds of
active network traffic).

The parts that could be reasonably cut down would be 1 and 2. There is
unfortunately no simple solution for #3. There is a lot of data being
sent back. In your specific case given that you are effectively doing
a ton of lists (-R), the amount of data likely comes from #3. If you
could send me an ethereal log (assuming your stuff isn't too private)
I could confirm this for you.

Regards,

Jean-Marc

On 8/20/05, Tobias Ringström <to...@ringstrom.mine.nu> wrote:
> Ben Collins-Sussman wrote:
> 
> > This is one of many situations where "being a DAV client" is hurting
> > the svn client's performance.   In DAV-land, the proper way to get a
> > list of children in a collection (directory) is to do a depth-1
> > PROPFIND.  This has the unfortunate side effect of not just listing
> > all children, but also all of their properties -- both user-generated
> > and server-generated ones.  All in XML.
> >
> > If we were to just say "forget it, the svn client isn't a DAV  client"
> > (which has been discussed before), we could make up yet  another
> > custom protocol response/request that only svn and  mod_dav_svn would
> > understand.  We're already doing that in many  places where subversion
> > concepts don't line up with WebDAV/DeltaV.   Doing it for the sake of
> > performance gains would be a new policy.  I  don't think everyone is
> > (yet) in agreement yet about this.
> 
> Anyone who's interested in making 'svn ls' faster over http should take
> a look at issue 2151.
> 
>     http://subversion.tigris.org/issues/show_bug.cgi?id=2151
> 
> In this case, I'd say it's more a problem of how we use DAV, than a
> problem with DAV itself.
> 
> /Tobias
> 
> 



Re: svn list -R of medium-size repository takes 10 hours.

Posted by Carsten Koch <Ca...@icem.com>.
Jean-Marc Godbout wrote:
> For Ben's suggestion of using mod_deflate, I think it could be useful
> in your case.

I agree.


> To enable it, get mod_deflate compiled and installed, add the proper
> LoadModule to your apache conf file. Then add "SetOutputFilter
> DEFLATE" to your <Location ...> for the repository.

I will go to Hannover tomorrow and talk to my coworkers who have
set up our svn apache server.
We'll get that changed and I'll rerun the benchmark over my slow
ISDN connection when I am back here on Thursday.


Carsten.


Re: svn list -R of medium-size repository takes 10 hours.

Posted by Carsten Koch <Ca...@icem.com>.
Jean-Marc Godbout wrote:
> For Ben's suggestion of using mod_deflate, I think it could be useful
> in your case. I just ran a test on a real system and a sample svn list
> went from 60k to 16k.

We have mod_deflate active now on our svn server and the small test case
(svn list -R on a directory tree with only 25 entries) goes down from
8 seconds to 3 seconds.
Not the factor of 100 improvement I am looking for, but nevertheless more
than twice as fast.

Thanks to Ben and you for the suggestion!

Carsten.


Re: svn list -R of medium-size repository takes 10 hours.

Posted by Jean-Marc Godbout <jm...@gmail.com>.
Oh, I'm sorry, I just realised I had not run all the tests. Please
don't waste your time looking at this patch until I make sure it
doesn't break tests.

On 8/26/05, Jean-Marc Godbout <jm...@gmail.com> wrote:
> I have a simple patch that trims down time by about half in my tests.
> 
> It's NOT complete, here's what works:
> 
> 1. Patched client + Patched server == fast
> 2. Patched client + Unpatched server == broken
> 3. Unpatched client + Patched server == same speed as before
> 4. Dav client + Patched server == same speed as before
> 
> I need to fix #2 before considering it a real patch.
> 
> Essentially this implements some of the suggestions in 2151 - starting
> with Ben's patch. Note that I don't think it's wise to remove DeltaV
> props from an allprop PROPFIND as this would break unpatched clients
> trying to access patched servers. We can discuss this though.
> 
> On 8/25/05, Carsten Koch <Ca...@icem.com> wrote:
> > Jean-Marc Godbout wrote:
> > > For Ben's suggestion of using mod_deflate, I think it could be useful
> > > in your case. I just ran a test on a real system and a sample svn list
> > > went from 60k to 16k.
> >
> > We have mod_deflate active now on our svn server and the small test case
> > (svn list -R on a directory tree with only 25 entries) goes down from
> > 8 seconds to 3 seconds.
> > Not the factor of 100 improvement I am looking for, but nevertheless more
> > than twice as fast.
> >
> > Thanks to Ben and you for the suggestion!
> >
> > Carsten.
> >



Re: svn list -R of medium-size repository takes 10 hours.

Posted by Jean-Marc Godbout <jm...@gmail.com>.
I am actually on my way out the door for a road trip across america
(going back home). I will work on suggestion 1 during the evenings. It
should be pretty easy to add supports_deadprop_count to the session
(with the added benefit of making it easier to move the code to detect
deadprop-count support).

I will also look at #2 and see if it's possible to do it without
breaking everything. It might require a separate patch but I'll see
what I can do.

Thanks for the suggestions
Jean-Marc

On 8/29/05, Carsten Koch <Ca...@icem.com> wrote:
> Jean-Marc Godbout wrote:
> ...
> > The patch does an extra propfind before doing the "big" propfind to
> > probe and see if the server supports the deadprop-count prop.
> 
> 
> Excellent!
> 
> I have three questions:
> 
> 1) is "supports_deadprop_count" called only once per "svn list" run
>     or is it called once per directory processed?
> 
> 
> 2)
> 
> +      if (supports_deadprop_count == TRUE)
> +        {
> +          SVN_ERR( svn_ra_dav__get_props(&resources, ras->sess,
> +                                         final_url, NE_DEPTH_ONE,
> +                                         NULL, dirent_props, pool) );
> +        }
> 
> Would it be possible to distinguish between "svn list" and "svn list -v"
> in the above statement, like this:
> 
> static const ne_propname verbose_dirent_props[] =
> {
>    { "DAV:", "resourcetype" },         /* kind */
>    { "DAV:", "getcontentlength" },     /* size */
>    { "DAV:", "version-name" },         /* created_rev */
>    { "DAV:", "creationdate" },         /* time */
>    { "DAV:", "creator-displayname" },  /* last_author */
>    { SVN_DAV_PROP_NS_DAV, "deadprop-count" },       /* has_props */
>    { NULL }
> };
> 
> static const ne_propname short_dirent_props[] =
> {
>    { "DAV:", "resourcetype" },         /* kind */
>    { NULL }
> };
> 
> ...
>        if (supports_deadprop_count)
>          {
>            if (verbose_listing)
>               SVN_ERR( svn_ra_dav__get_props(&resources, ras->sess,
>                                           final_url, NE_DEPTH_ONE,
>                                           NULL, verbose_dirent_props, pool) );
>            else
>               SVN_ERR( svn_ra_dav__get_props(&resources, ras->sess,
>                                           final_url, NE_DEPTH_ONE,
>                                           NULL, short_dirent_props, pool) );
> 
>          }
> 
> ?
> 
> 
> 3) Would it be possible to use NE_DEPTH_INFINITE in the case of "svn list -R"?
> 
> 
> 
> 
> Carsten.
>



Re: svn list -R of medium-size repository takes 10 hours.

Posted by Jean-Marc Godbout <jm...@gmail.com>.
There was in fact a test break. I fixed it and will include it in my next patch.

In issue 2151 (http://subversion.tigris.org/issues/show_bug.cgi?id=2151):
"The URIs listed in gather_propsets() are returned in
the OPTIONS response. Since we always(?) issue an OPTIONS request, then we
extract them at that point, recording the presence in the session state."

Unfortunately, from what I can see in the code we only issue an
OPTIONS request on commit. Should we always do it? I will try to add
it and see if there are any issues that pop up.


On 8/26/05, Jean-Marc Godbout <jm...@gmail.com> wrote:
> Oh, I'm sorry, I just realised I had not run all the tests. Please
> don't waste your time looking at this patch until I make sure it
> doesn't break tests.
> 
> On 8/26/05, Jean-Marc Godbout <jm...@gmail.com> wrote:
> > I have a simple patch that trims down time by about half in my tests.
> >
> > It's NOT complete, here's what works:
> >
> > 1. Patched client + Patched server == fast
> > 2. Patched client + Unpatched server == broken
> > 3. Unpatched client + Patched server == same speed as before
> > 4. Dav client + Patched server == same speed as before
> >
> > I need to fix #2 before considering it a real patch.
> >
> > Essentially this implements some of the suggestions in 2151 - starting
> > with Ben's patch. Note that I don't think it's wise to remove DeltaV
> > props from an allprop PROPFIND as this would break unpatched clients
> > trying to access patched servers. We can discuss this though.
> >
> > On 8/25/05, Carsten Koch <Ca...@icem.com> wrote:
> > > Jean-Marc Godbout wrote:
> > > > For Ben's suggestion of using mod_deflate, I think it could be useful
> > > > in your case. I just ran a test on a real system and a sample svn list
> > > > went from 60k to 16k.
> > >
> > > We have mod_deflate active now on our svn server and the small test case
> > > (svn list -R on a directory tree with only 25 entries) goes down from
> > > 8 seconds to 3 seconds.
> > > > Not the factor of 100 improvement I am looking for, but nevertheless more
> > > than twice as fast.
> > >
> > > Thanks to Ben and you for the suggestion!
> > >
> > > Carsten.
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> > > For additional commands, e-mail: dev-help@subversion.tigris.org
> > >
> > >
> > >
> >
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org
> 
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: svn list -R of medium-size repository takes 10 hours.

Posted by Ben Collins-Sussman <su...@collab.net>.
On Aug 29, 2005, at 4:17 AM, Carsten Koch wrote:
>
> 3) Would it be possible to use NE_DEPTH_INFINITE in the case of  
> "svn list -R"?
>

Such a request would fail 99% of the time, because depth-infinity is  
deactivated in mod_dav by default... it's considered a DoS attack by  
most.



Re: svn list -R of medium-size repository takes 10 hours.

Posted by Carsten Koch <Ca...@icem.com>.
Jean-Marc Godbout wrote:
...
> The patch does an extra propfind before doing the "big" propfind to
> probe and see if the server supports the deadprop-count prop.


Excellent!

I have three questions:

1) is "supports_deadprop_count" called only once per "svn list" run
    or is it called once per directory processed?


2)

+      if (supports_deadprop_count == TRUE)
+        {
+          SVN_ERR( svn_ra_dav__get_props(&resources, ras->sess,
+                                         final_url, NE_DEPTH_ONE,
+                                         NULL, dirent_props, pool) );
+        }

Would it be possible to distinguish between "svn list" and "svn list -v"
in the above statement, like this:

static const ne_propname verbose_dirent_props[] =
{
   { "DAV:", "resourcetype" },         /* kind */
   { "DAV:", "getcontentlength" },     /* size */
   { "DAV:", "version-name" },         /* created_rev */
   { "DAV:", "creationdate" },         /* time */
   { "DAV:", "creator-displayname" },  /* last_author */
   { SVN_DAV_PROP_NS_DAV, "deadprop-count" },       /* has_props */
   { NULL }
};

static const ne_propname short_dirent_props[] =
{
   { "DAV:", "resourcetype" },         /* kind */
   { NULL }
};

...
       if (supports_deadprop_count)
         {
           if (verbose_listing)
              SVN_ERR( svn_ra_dav__get_props(&resources, ras->sess,
                                          final_url, NE_DEPTH_ONE,
                                          NULL, verbose_dirent_props, pool) );
           else
              SVN_ERR( svn_ra_dav__get_props(&resources, ras->sess,
                                          final_url, NE_DEPTH_ONE,
                                          NULL, short_dirent_props, pool) );

         }

?


3) Would it be possible to use NE_DEPTH_INFINITE in the case of "svn list -R"?




Carsten.


Re: svn list -R of medium-size repository takes 10 hours.

Posted by Jean-Marc Godbout <jm...@gmail.com>.
Here is a patch I will submit to the list later on today, after a
friend of mine runs all the unit tests on it as well.

The patch does an extra propfind before doing the "big" propfind to
probe and see if the server supports the deadprop-count prop. I
realise that this is an extra round trip. Here's why.

We do not do an OPTIONS request at the beginning. This means that to
implement the OPTIONS solution I would have needed to make a new
OPTIONS request. This would have been an extra round trip as well. I
think that it's still a good solution, just better suited to an
independent patch.

I then wanted to simply add the deadprop-count prop to a propfind in
the initialization phase. Unfortunately I didn't find any of those
propfinds that had both a svn_ra_session_t (to store the result) and
the properties at the same time.

So my compromise was to do an extra propfind, but it's very small as
it only requests one prop.

One thing is that I didn't remove all the DeltaV props from an allprop
request on the server as suggested in the bug report. If I did that,
then any unpatched clients would fail when trying to access a patched
server. That should still be considered though as it would also speed
up generic dav clients.

With this patch we are talking about time reductions in the order of
50-75% (sometimes even better) and bandwidth reductions of the same
magnitude. This should make a big difference for heavy users of ls.
Given that we are doing an extra PROPFIND, there is still room for
improvement, but this falls more into the domain of issue 1161.

Thank you everyone for your suggestions and tips. They were very useful.

On 8/26/05, Jean-Marc Godbout <jm...@gmail.com> wrote:
> There was in fact a test break. I fixed it and will include it in my next patch.
> 
> In issue 2151 (http://subversion.tigris.org/issues/show_bug.cgi?id=2151):
> "The URIs listed in gather_propsets() are returned in
> the OPTIONS response. Since we always(?) issue an OPTIONS request, then we
> extract them at that point, recording the presence in the session state."
> 
> Unfortunately, from what I can see in the code we only issue an
> OPTIONS request on commit. Should we always do it? I will try to add
> it and see if there are any issues that pop up.
> 
> 
> On 8/26/05, Jean-Marc Godbout <jm...@gmail.com> wrote:
> > Oh, I'm sorry, I just realised I had not run all the tests. Please
> > don't waste your time looking at this patch until I make sure it
> > doesn't break tests.
> >
> > On 8/26/05, Jean-Marc Godbout <jm...@gmail.com> wrote:
> > > I have a simple patch that trims down time by about half in my tests.
> > >
> > > It's NOT complete, here's what works:
> > >
> > > 1. Patched client + Patched server == fast
> > > 2. Patched client + Unpatched server == broken
> > > 3. Unpatched client + Patched server == same speed as before
> > > 4. Dav client + Patched server == same speed as before
> > >
> > > I need to fix #2 before considering it a real patch.
> > >
> > > Essentially this implements some of the suggestions in 2151 - starting
> > > with Ben's patch. Note that I don't think it's wise to remove DeltaV
> > > props from an allprop PROPFIND as this would break unpatched clients
> > > trying to access patched servers. We can discuss this though.
> > >
> > > On 8/25/05, Carsten Koch <Ca...@icem.com> wrote:
> > > > Jean-Marc Godbout wrote:
> > > > > For Ben's suggestion of using mod_deflate, I think it could be useful
> > > > > in your case. I just ran a test on a real system and a sample svn list
> > > > > went from 60k to 16k.
> > > >
> > > > We have mod_deflate active now on our svn server and the small test case
> > > > (svn list -R on a directory tree with only 25 entries) goes down from
> > > > 8 seconds to 3 seconds.
> > > > > Not the factor of 100 improvement I am looking for, but nevertheless more
> > > > than twice as fast.
> > > >
> > > > Thanks to Ben and you for the suggestion!
> > > >
> > > > Carsten.
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> > > > For additional commands, e-mail: dev-help@subversion.tigris.org
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> > For additional commands, e-mail: dev-help@subversion.tigris.org
> >
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org
> 
> 
>

Re: svn list -R of medium-size repository takes 10 hours.

Posted by Jean-Marc Godbout <jm...@gmail.com>.
I have a simple patch that trims down time by about half in my tests.

It's NOT complete, here's what works:

1. Patched client + Patched server == fast
2. Patched client + Unpatched server == broken
3. Unpatched client + Patched server == same speed as before
4. Dav client + Patched server == same speed as before

I need to fix #2 before considering it a real patch.

Essentially this implements some of the suggestions in 2151 - starting
with Ben's patch. Note that I don't think it's wise to remove DeltaV
props from an allprop PROPFIND as this would break unpatched clients
trying to access patched servers. We can discuss this though.

On 8/25/05, Carsten Koch <Ca...@icem.com> wrote:
> Jean-Marc Godbout wrote:
> > For Ben's suggestion of using mod_deflate, I think it could be useful
> > in your case. I just ran a test on a real system and a sample svn list
> > went from 60k to 16k.
> 
> We have mod_deflate active now on our svn server and the small test case
> (svn list -R on a directory tree with only 25 entries) goes down from
> 8 seconds to 3 seconds.
> Not the factor of 100 improvement I am looking for, but nevertheless more
> than twice as fast.
> 
> Thanks to Ben and you for the suggestion!
> 
> Carsten.

Re: svn list -R of medium-size repository takes 10 hours.

Posted by Jean-Marc Godbout <jm...@gmail.com>.
For Ben's suggestion of using mod_deflate, I think it could be useful
in your case. I just ran a test on a real system and a sample svn list
went from 60k to 16k.

To enable it, compile and install mod_deflate, add the corresponding
LoadModule line to your Apache configuration file, and then add
"SetOutputFilter DEFLATE" to the <Location ...> block for the repository.

I think it would help you because of your relatively slow connection.
A word of caution, though: in my case it was actually slower because of
the extra processing required by mod_deflate. On local networks I
suspect that mod_deflate would not improve performance at
all.

Do try it though and tell me if it makes a difference.

Jean-Marc

On 8/21/05, Jean-Marc Godbout <jm...@gmail.com> wrote:
> I am currently looking into this issue. It seems like there is
> somewhat of an inherent slowness with WebDAV/DeltaV because of the
> amount of requests and the format in which they return data. The
> gigabytes of data do seem odd though.
> 
> I think that it will always be slow to a certain extent, but there
> should be ways to improve it. One current issue is that mod_dav_svn
> (the Apache module part of the Subversion server) is taking its
> sweet time to answer certain requests that involve many different
> elements.
> 
> Here is a simple example of a dav request - note that this is running
> in a debugger, in a VM so the times are simply examples:
> 
> 1. There are a couple requests going to the server to resolve the
> proper folder (5 seconds of chatter).
> 2. Then the "proper" dav request is made that will list a folder.
> (mod_dav_svn takes 10 seconds to reply).
> 3. The folder listing is sent back to the client. (23 seconds of
> active network traffic).
> 
> The parts that could be reasonably cut down would be 1 and 2. There is
> unfortunately no simple solution for #3. There is a lot of data being
> sent back. In your specific case given that you are effectively doing
> a ton of lists (-R), the amount of data likely comes from #3. If you
> could send me an ethereal log (assuming your stuff isn't too private)
> I could confirm this for you.
> 
> Regards,
> 
> Jean-Marc
> 
> On 8/20/05, Tobias Ringström <to...@ringstrom.mine.nu> wrote:
> > Ben Collins-Sussman wrote:
> >
> > > This is one of many situations where "being a DAV client" is hurting
> > > the svn client's performance.   In DAV-land, the proper way to get a
> > > list of children in a collection (directory) is to do a depth-1
> > > PROPFIND.  This has the unfortunate side effect of not just listing
> > > all children, but also all of their properties -- both user-generated
> > > and server-generated ones.  All in XML.
> > >
> > > If we were to just say "forget it, the svn client isn't a DAV  client"
> > > (which has been discussed before), we could make up yet  another
> > > custom protocol response/request that only svn and  mod_dav_svn would
> > > understand.  We're already doing that in many  places where subversion
> > > concepts don't line up with WebDAV/DeltaV.   Doing it for the sake of
> > > performance gains would be a new policy.  I  don't think everyone is
> > > in agreement about this yet.
> >
> > Anyone who's interested in making 'svn ls' faster over http should take
> > a look at issue 2151.
> >
> >     http://subversion.tigris.org/issues/show_bug.cgi?id=2151
> >
> > In this case, I'd say it's more a problem of how we use DAV, than a
> > problem with DAV itself.
> >
> > /Tobias
> >


Re: svn list -R of medium-size repository takes 10 hours.

Posted by Ben Collins-Sussman <su...@collab.net>.
On Aug 18, 2005, at 11:15 PM, Michael Price wrote:

>> Well, an important question is whether that overhead is a  
>> constant, or
>> is always similarly proportional to the data size.
>>
>
> Using 60KB of network traffic to show the user 440 bytes is bad
> regardless. All knowing if it's constant or not tells us is if we have
> a bad problem or a really bad problem.


This is one of many situations where "being a DAV client" is hurting  
the svn client's performance.   In DAV-land, the proper way to get a  
list of children in a collection (directory) is to do a depth-1  
PROPFIND.  This has the unfortunate side effect of not just listing  
all children, but also all of their properties -- both user-generated  
and server-generated ones.  All in XML.

If we were to just say "forget it, the svn client isn't a DAV  
client" (which has been discussed before), we could make up yet  
another custom protocol response/request that only svn and  
mod_dav_svn would understand.  We're already doing that in many  
places where subversion concepts don't line up with WebDAV/DeltaV.   
Doing it for the sake of performance gains would be a new policy.  I  
don't think everyone is in agreement about this yet.



Re: svn list -R of medium-size repository takes 10 hours.

Posted by Michael Price <ec...@gmail.com>.
> Well, an important question is whether that overhead is a constant, or
> is always similarly proportional to the data size.

Using 60KB of network traffic to show the user 440 bytes is bad
regardless. Knowing whether it's constant only tells us whether we have
a bad problem or a really bad problem.



Re: svn list -R of medium-size repository takes 10 hours.

Posted by kf...@collab.net.
Carsten Koch <Ca...@icem.com> writes:
> I have now looked at a much smaller test case with ethereal.
> This test case does a 'svn list -R -v' on a directory
> that contains only 16 files and one subdirectory with
> another 8 files. So 'svn list -R -v' returns a total
> of 1465 bytes in 25 lines and 'svn list -R' returns a total
> of 440 bytes in 25 lines.
> 
> ethereal tells me that 60 k bytes in 113 packets were exchanged
> in 7.32 seconds over a 64 k bit ISDN line.
> 
> If I am doing my math correctly, the ISDN line is running at
> almost full speed. Meaning that my test case is slow due to
> bandwidth, not due to latency.
> So the only problem - at least in this test case - is,
> that 1465 bytes (or 440 bytes) of end result are packed
> in 60 k bytes of protocol, resulting in over 4000%
> (or over 13000%) protocol overhead.

Well, an important question is whether that overhead is a constant, or
is always similarly proportional to the data size.

What happens if you run the same experiments with much bigger
directories?

-Karl

-- 
www.collab.net  <>  CollabNet  |  Distributed Development On Demand
