Posted to dev@subversion.apache.org by Stefan Küng <to...@gmail.com> on 2012/01/04 22:16:55 UTC

svn ls performance

Hi,

Following a report on the TSVN mailing list, I found that the
command-line client has the same problem:
'svn list' takes forever in some situations.
I don't know exactly what the problem is, but it's easily reproducible:

svn ls http://plugins.svn.wordpress.org/ -v --depth=immediates
prints one entry, then never returns (OK, maybe not never, but waiting
10 minutes is not enough).

However,
svn ls http://plugins.svn.wordpress.org/ --depth=immediates
(the same command as above, but without the verbose flag) returns the
entries almost immediately.
I wouldn't expect simply fetching the verbose info to take so much
longer.

This is with svn 1.7.2.
The same works fine with svn 1.6.6 - haven't tested with later 1.6 svn 
clients since I don't have that version ready here on my machine.

Using serf instead of neon doesn't help.

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net

Re: svn ls performance

Posted by Mark Phippard <ma...@gmail.com>.
On Wed, Jan 4, 2012 at 5:42 PM, Konstantin Kolinko
<kn...@gmail.com> wrote:
> http://plugins.svn.wordpress.org/
> The page has a lot of subdirectories (nearly 26000)
> The server is 1.6.12.

And it is the top-level folder too.  The revision files must be
enormous.  I wonder how much smaller the repository would be when
using the folder deltification patch that Stefan sent to the list?
I am guessing it would be significant, 90% smaller or more.  Do they
post a dump file anywhere?

It would be interesting to see if the smaller files would improve
performance or if the increased work added by the deltification would
outweigh that improvement.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: svn ls performance

Posted by Philip Martin <ph...@wandisco.com>.
Ivan Zhakov <iv...@visualsvn.com> writes:

>> I created a repository with 10,000 subdirs:
>>
>> #!/bin/bash
>> for i in `seq 0 999`;do
>>  svn mkdir -mm file://`pwd`/repo/A${i}{0,1,2,3,4,5,6,7,8,9}
>> done
>>
> As far as I remember, this issue occurs only over the HTTP protocol and
> is related to the fact that mod_dav/mod_dav_svn stores *ALL* of the XML
> report in memory before sending it to the client.

I was testing over HTTP; I'm only using file: to create the repo.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: svn ls performance

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On Thu, Jan 5, 2012 at 04:25, Philip Martin <ph...@wandisco.com> wrote:
> I created a repository with 10,000 subdirs:
>
> #!/bin/bash
> for i in `seq 0 999`;do
>  svn mkdir -mm file://`pwd`/repo/A${i}{0,1,2,3,4,5,6,7,8,9}
> done
>
As far as I remember, this issue occurs only over the HTTP protocol and
is related to the fact that mod_dav/mod_dav_svn stores *ALL* of the XML
report in memory before sending it to the client.


-- 
Ivan Zhakov

Re: svn ls performance

Posted by Stefan Fuhrmann <eq...@web.de>.
On 05.01.2012 01:25, Philip Martin wrote:
> I created a repository with 10,000 subdirs:
>
> #!/bin/bash
> for i in `seq 0 999`;do
>    svn mkdir -mm file://`pwd`/repo/A${i}{0,1,2,3,4,5,6,7,8,9}
> done
>
> I measured the following times:
>
> client   server     ls -v   ls
> 1.6.6    1.6.12     43s     0.3s
> 1.6.12   1.6.12     43s     0.3s
> 1.7.3    1.6.12     43s     0.3s
> 1.6.6    1.7.3      1.8s    0.5s
> 1.6.12   1.7.3      1.8s    0.4s
> 1.7.3    1.7.3      1.8s    0.4s
>
> I don't see any significant difference between clients (I'm using neon)
> only between the servers.  1.7 is a major improvement in the verbose
> case, probably due to the better FSFS in-memory caching. There is
> perhaps a slight regression in the non-verbose case.
>
> As a side issue having 26,000 branches in the same directory is really
> bad for repository size due to the absence of directory deltification.
> My repository has 10,000 subdirs in 1,000 revisions and nothing else and
> yet it takes 175MB of disk.  The last commit, which adds 10 empty
> subdirs, produces a rev file that is 347KB.  Each commit to the
> wordpress repository probably adds about 1MB to the repository just
> rewriting that 26,000 branch directory.

The problem with 1.6.x is that its cache implementation
does not support partial cache access. Thus, ls -v results
in more than 600,000,000 directory entry copies (> 200GB)
and at least 5 minutes of CPU time in the above setup.

In 1.7, the difference between ls and ls -v is dominated
by revprop access (one file per directory as each rev
touches only one wordpress plug-in). Packed revprops
might help here.
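
As a rough check of the 1.6.x figures, the sketch below assumes that
the copy count refers to the roughly 26,000-entry wordpress directory
and that, without partial cache access, every per-child lookup copies
the whole cached directory. Both points are assumptions made for
illustration, as are the variable names; none of this is taken from
the 1.6.x source.

```shell
#!/bin/bash
# Back-of-the-envelope model (assumption): every per-child lookup during
# 'ls -v' copies the complete cached directory listing.
entries=26000                       # approximate size of the wordpress directory
copies=$(( entries * entries ))     # one full directory copy per child
echo "entry copies: ${copies}"

# Average bytes per copied entry implied by the quoted totals
# (200GB spread over 600,000,000 copies).
total_bytes=$(( 200 * 1000 * 1000 * 1000 ))
bytes_per_entry=$(( total_bytes / 600000000 ))
echo "bytes per entry: ${bytes_per_entry}"
```

Under these assumptions the copy count grows with the square of the
directory size, which would explain why the verbose listing becomes
so slow for very wide directories.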

-- Stefan^2.

Re: svn ls performance

Posted by Philip Martin <ph...@wandisco.com>.
Stefan Küng <to...@gmail.com> writes:

> Hmm - strange. I've had significantly different timings for 1.6.6 and 1.7.2.
> But I've tried 1.6.6 right after 1.7.2, so maybe there was some
> caching involved?
>
> I'll have to do some more testing.

It can be hard to test these things when you don't control the server.
The repository has about half a million revisions and is probably
hundreds of GB in size; it gets frequent commits, and each commit
probably writes at least 1MB due to the 26,000-entry directory.

> But could the performance be somewhat improved? After all, without the
> verbose flag the result is available much, much faster.

Perhaps.  'svn ls' just has to look at the representation of one
directory; adding '-v' means that the representation of each child has
to be examined as well.  It's the FSFS backend code that needs to be
investigated; I see similar timings over all protocols.
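
To compare the two cases against a server you control, a small helper
along these lines may be useful. It is only a sketch: the function
names time_ls and compare_ls are made up for illustration, and it uses
nothing beyond the 'svn ls', '-v' and '--depth=immediates' options
already shown in this thread.

```shell
#!/bin/bash
# time_ls URL [extra svn options...]
# Prints the wall-clock seconds taken by 'svn ls --depth=immediates' on URL.
# The listing itself is discarded; only the elapsed time is reported.
time_ls() {
  local url=$1
  shift
  local start end
  start=$(date +%s)
  svn ls "$url" --depth=immediates "$@" > /dev/null
  end=$(date +%s)
  echo $(( end - start ))
}

# compare_ls URL: time the plain and the verbose listing of the same URL.
compare_ls() {
  local url=$1
  echo "ls:    $(time_ls "$url")s"
  echo "ls -v: $(time_ls "$url" -v)s"
}
```

It would be called as, for example, 'compare_ls http://localhost/repo'
(a placeholder URL). The resolution is whole seconds, so sub-second
runs show up as 0s; it is only meant to expose differences of the
order discussed here.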

Re: svn ls performance

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On Thu, Jan 5, 2012 at 22:02, Stefan Küng <to...@gmail.com> wrote:
> Hmm - strange. I've had significantly different timings for 1.6.6 and 1.7.2.
> But I've tried 1.6.6 right after 1.7.2, so maybe there was some caching
> involved?
>
> I'll have to do some more testing.
>
> But could the performance be somewhat improved? After all, without the
> verbose flag the result is available much, much faster.
>
>
Yes, the performance of svn ls and svn ls -v was improved in svn 1.7.0.
See r1125391 and r1125326. Btw, I'd like '-v' to be switched off by
default in the TortoiseSVN Repo-Browser to make it faster.


-- 
Ivan Zhakov

Re: svn ls performance

Posted by Stefan Küng <to...@gmail.com>.
On 05.01.2012 01:25, Philip Martin wrote:
> I created a repository with 10,000 subdirs:
>
> #!/bin/bash
> for i in `seq 0 999`;do
>    svn mkdir -mm file://`pwd`/repo/A${i}{0,1,2,3,4,5,6,7,8,9}
> done
>
> I measured the following times:
>
> client   server     ls -v   ls
> 1.6.6    1.6.12     43s     0.3s
> 1.6.12   1.6.12     43s     0.3s
> 1.7.3    1.6.12     43s     0.3s
> 1.6.6    1.7.3      1.8s    0.5s
> 1.6.12   1.7.3      1.8s    0.4s
> 1.7.3    1.7.3      1.8s    0.4s
>
> I don't see any significant difference between clients (I'm using neon)
> only between the servers.  1.7 is a major improvement in the verbose
> case, probably due to the better FSFS in-memory caching. There is
> perhaps a slight regression in the non-verbose case.
>
> As a side issue having 26,000 branches in the same directory is really
> bad for repository size due to the absence of directory deltification.
> My repository has 10,000 subdirs in 1,000 revisions and nothing else and
> yet it takes 175MB of disk.  The last commit, which adds 10 empty
> subdirs, produces a rev file that is 347KB.  Each commit to the
> wordpress repository probably adds about 1MB to the repository just
> rewriting that 26,000 branch directory.
>

Hmm - strange. I've had significantly different timings for 1.6.6 and 1.7.2.
But I tried 1.6.6 right after 1.7.2, so maybe there was some caching
involved?

I'll have to do some more testing.

But could the performance be somewhat improved? After all, without the 
verbose flag the result is available much, much faster.

Stefan

Re: svn ls performance

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> As a side issue having 26,000 branches in the same directory is really
> bad for repository size due to the absence of directory deltification.
> My repository has 10,000 subdirs in 1,000 revisions and nothing else and
> yet it takes 175MB of disk.  The last commit, which adds 10 empty
> subdirs, produces a rev file that is 347KB.  Each commit to the
> wordpress repository probably adds about 1MB to the repository just
> rewriting that 26,000 branch directory.

The experimental directory deltification code is particularly efficient
here: the rev files are all small and the repo is only 8.3M instead of
179M.

Re: svn ls performance

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> 1.7 is a major improvement in the verbose
> case, probably due to the better FSFS in-memory caching. There is
> perhaps a slight regression in the non-verbose case.

A small difference in server configurations is the reason for the
apparent regression in the non-verbose case; when I remove the
difference the regression goes away and 1.7 is slightly faster.

Re: svn ls performance

Posted by Philip Martin <ph...@wandisco.com>.
Konstantin Kolinko <kn...@gmail.com> writes:

> http://plugins.svn.wordpress.org/
> The page has a lot of subdirectories (nearly 26000)
> The server is 1.6.12.
>
> Using client 1.6.17 (built by CollabNet, 32-bit, on Windows) it
> printed the first line in 10 seconds, and never printed the rest. I
> waited for ~1 minute. (Ctrl+C did not help, I had to unplug the
> network cable)
>
> Using client 1.7.2 (from TortoiseSVN, 32-bit, on Windows) the result
> is the same:
> the first line in ~10 seconds, the rest of data - ~never.
>
> 1.6.17: Removing "--depth" option does not change anything.
> 1.6.17: Removing "-v" the result appears in ~15 seconds and then takes
> ~10 seconds to print the listing to the console.

I created a repository with 10,000 subdirs:

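#!/bin/bash
# Assumes an empty repository already exists at ./repo (svnadmin create repo).
# Each of the 1,000 commits adds ten directories, A<i>0 to A<i>9.
for i in $(seq 0 999); do
  svn mkdir -mm "file://$(pwd)/repo/A${i}"{0,1,2,3,4,5,6,7,8,9}
done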

I measured the following times:

client   server     ls -v   ls
1.6.6    1.6.12     43s     0.3s
1.6.12   1.6.12     43s     0.3s
1.7.3    1.6.12     43s     0.3s
1.6.6    1.7.3      1.8s    0.5s
1.6.12   1.7.3      1.8s    0.4s
1.7.3    1.7.3      1.8s    0.4s

I don't see any significant difference between clients (I'm using neon),
only between the servers.  1.7 is a major improvement in the verbose
case, probably due to the better FSFS in-memory caching. There is
perhaps a slight regression in the non-verbose case.

As a side issue, having 26,000 branches in the same directory is really
bad for repository size due to the absence of directory deltification.
My repository has 10,000 subdirs in 1,000 revisions and nothing else and
yet it takes 175MB of disk.  The last commit, which adds 10 empty
subdirs, produces a rev file that is 347KB.  Each commit to the
wordpress repository probably adds about 1MB to the repository just
rewriting that 26,000 branch directory.
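
The size figures can be cross-checked with a little arithmetic. The
sketch below assumes that, without directory deltification, every
commit rewrites the complete top-level directory, so that revision k
stores 10*k entries; the per-entry size is derived from the 347KB rev
file. The variable names are illustrative only.

```shell
#!/bin/bash
# Assumption: without directory deltification, every commit rewrites the
# complete top-level directory, so revision k stores 10*k entries.
revs=1000
per_rev=10
last_rev_bytes=$(( 347 * 1024 ))          # 347KB rev file holding 10,000 entries

# Integer division, so all results are rounded down.
bytes_per_entry=$(( last_rev_bytes / (revs * per_rev) ))
# Entries written over all revisions: 10 + 20 + ... + 10,000
total_entries=$(( per_rev * revs * (revs + 1) / 2 ))
echo "bytes per entry: ${bytes_per_entry}"
echo "estimated repo size: $(( total_entries * bytes_per_entry / 1000000 ))MB"
echo "26,000-entry rewrite: $(( 26000 * bytes_per_entry / 1000 ))KB"
```

With these inputs the estimate comes out at about 35 bytes per entry,
roughly 175MB for the whole test repository and about 0.9MB per commit
for a 26,000-entry directory, which is in line with the numbers above.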

Re: svn ls performance

Posted by Konstantin Kolinko <kn...@gmail.com>.
2012/1/5 Stefan Küng <to...@gmail.com>:
> Hi,
>
> Due to a report on the TSVN mailing list I found that the CL client has the
> same problem:
> 'svn list' takes forever in some situations.
> I don't know what the problem exactly is, but it's easily reproducible:
>
> svn ls http://plugins.svn.wordpress.org/ -v --depth=immediates
> prints one entry, then never returns (ok, maybe not never. But waiting 10
> minutes is not enough).
>
> however, an
> svn ls http://plugins.svn.wordpress.org/ --depth=immediates
> (same command as above, but without the verbose flag) returns the entries
> almost immediately.
> I don't think that simply fetching the verbose info could take so much
> longer?
>
> This is with svn 1.7.2.
> The same works fine with svn 1.6.6 - haven't tested with later 1.6 svn
> clients since I don't have that version ready here on my machine.
>
> using serf instead of neon doesn't help.
>

http://plugins.svn.wordpress.org/
The page has a lot of subdirectories (nearly 26000)
The server is 1.6.12.

Using client 1.6.17 (built by CollabNet, 32-bit, on Windows) it
printed the first line in 10 seconds, and never printed the rest. I
waited for ~1 minute. (Ctrl+C did not help, I had to unplug the
network cable)

Using client 1.7.2 (from TortoiseSVN, 32-bit, on Windows) the result
is the same:
the first line in ~10 seconds, the rest of the data - ~never.

1.6.17: Removing the "--depth" option does not change anything.
1.6.17: After removing "-v", the result appears in ~15 seconds and then
takes ~10 seconds to print the listing to the console.

I am using neon in both cases.

Best regards,
Konstantin Kolinko