Posted to users@subversion.apache.org by mi...@agilent.com on 2011/10/31 16:09:21 UTC

Apparent "svn rm" scaling problem in 1.7.x

I am starting to see some very bad performance with "svn rm" compared to the 1.6.x line of subversion.  I have a directory that is full of files.  If I go into the directory and run "svn rm *", it is significantly slower than running svn rm on the whole directory.  While the difference in time taken is significant, the speed is still relatively acceptable for small working copies.

"svn rm dir/*"   3.8s     1.7.1
"svn rm dir"     0.173s   1.7.1
"svn rm dir/*"   1.008s   1.6.17

In this case 1.6.17 is nearly 4 times faster.  My working copy is nfs mounted and has 200 nodes.

As the working copy gets larger, deleting the same directory as above really starts to slow down.

"svn rm dir/*"   6m15s    1.7.1
"svn rm dir"     8.5s     1.7.1
"svn rm dir/*"   1.14s    1.6.17

This working copy is also nfs mounted but I'm up to 23948 nodes (doing sqlite3 .svn/wc.db "select count (*) from nodes").  The directory that I am deleting has 49 files.  The largest file is 2.5k.  I am comparing the same directory of the same repository.  In the first case I am only doing a partial checkout to keep the working copy size down, whereas in the second test I am doing a complete checkout.
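The node count quoted above can also be read from a script. The following is a minimal sketch, assuming a 1.7-format working copy whose wc.db has a table named nodes (as in the sqlite3 command above); the helper name count_nodes is made up for illustration and is not part of Subversion. It opens the database read-only so it cannot modify the working copy.

```python
import pathlib
import sqlite3


def count_nodes(wc_db_path):
    """Return the number of rows in the 'nodes' table of a wc.db file.

    count_nodes is a hypothetical helper, not a Subversion API.
    """
    # Open read-only through a file: URI so nothing is created or changed.
    uri = pathlib.Path(wc_db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        (count,) = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()
        return count
    finally:
        conn.close()
```

For example, count_nodes(".svn/wc.db") run from the working copy root should report the same figure as the sqlite3 one-liner.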

We can also see that subversion 1.6.17 scales very well for working copy size.




Michael Rytting
Agilent Technologies
michael_rytting@agilent.com
719-590-3708


RE: Apparent "svn rm" scaling problem in 1.7.x

Posted by mi...@agilent.com.
Oh and to measure the time, I'm using a stopwatch and synchronizing the start of the command with pressing go on my stop watch.  I then push stop when the command completes.



P.S. Actually I'm using the linux time command :)  i.e. "time svn st <file>"
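Since the timings in this thread vary from run to run, a scripted equivalent of the time command can help. This is only a sketch: the function name time_command and the choice of the median are assumptions for illustration, not something used in the thread.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return the median wall-clock time in seconds.

    time_command is a hypothetical helper; output of the command is discarded.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

A read-only command such as time_command(["svn", "st", "somefile"]) can be repeated safely. A command that changes the working copy, such as svn rm, would need runs=1 or a revert between runs.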

-----Original Message-----
From: Stefan Sperling [mailto:stsp@elego.de] 
Sent: Tuesday, November 01, 2011 11:00 AM
To: RYTTING,MICHAEL (A-ColSprings,ex1)
Cc: users@subversion.apache.org
Subject: Re: Apparent "svn rm" scaling problem in 1.7.x

On Tue, Nov 01, 2011 at 10:38:07AM -0600, michael_rytting@agilent.com wrote:
> Not much of an improvement.  "svn rm dir/*" now takes 2m6s vs 7s for "svn rm dir".  

Before the patch, we had:

"svn rm dir/*"   6m15s      1.7.1
"svn rm dir"        8.5s          1.7.1
"svn rm dir/*"    1.14s       1.6.17

So this patch cut about 4 minutes of runtime, which is somewhat significant but definitely not enough. But it's a step in the right direction.

> As a side note, I really think there is fundamentally something wrong with the performance of "svn rm" with large working copies.  Here are some example times.
> 
> svn rm <file>       7s
> svn add <file>      0.126s
> svn st <file>          2s
> svn blame <file> 0.2s
> svn lock <file>      0.12s
> svn unlock <file> 0.103s
> svn log <file>        0.089s
> svn revert <file>  0.133s
> svn info <file>      0.074s
> 
> I'm assuming that all these commands are doing some form of sqlite 
> database transactions, but the rm transaction, in particular, is very 
> slow.  Even when using a local working copy, I am seeing large 
> discrepancies in the time it takes to run "svn rm" vs most other svn 
> commands.  It's just since the local working copy is faster overall, 
> you are less likely to notice the large discrepancy in performance.

Yes, that looks bad. There might be a linear DB table scan involved in 'svn rm' that becomes noticeable on large WCs.
Do you see this difference only on NFS or also on local disk?

How are you measuring the time?
Note that most commands take at least one second because Subversion waits for one second for filesystem timestamps to update after some operations. To cut this deliberate delay out of the equation, do this:
export SVN_I_LOVE_CORRUPTED_WORKING_COPIES_SO_DISABLE_SLEEP_FOR_TIMESTAMPS=yes
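When the timings are driven from a script rather than an interactive shell, the same variable can be set for the child process only. A minimal sketch, assuming nothing beyond the variable name given above; the two function names are hypothetical.

```python
import os
import subprocess

# Variable name as quoted in the message above.
SLEEP_VAR = "SVN_I_LOVE_CORRUPTED_WORKING_COPIES_SO_DISABLE_SLEEP_FOR_TIMESTAMPS"


def env_without_timestamp_sleep(base=None):
    """Return a copy of the environment with the sleep-disabling variable set."""
    env = dict(os.environ if base is None else base)
    env[SLEEP_VAR] = "yes"
    return env


def run_without_timestamp_sleep(argv):
    """Run a command with the variable set, leaving the caller's environment alone."""
    return subprocess.run(argv, env=env_without_timestamp_sleep(), check=False)
```

Setting the variable per process keeps it out of the login environment, which seems prudent given what the variable's name warns about.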

Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Perhaps xpost this to dev@ at some point :)

Philip Martin wrote on Tue, Nov 01, 2011 at 18:44:29 +0000:
> Stefan Sperling <st...@elego.de> writes:
> 
> > On Tue, Nov 01, 2011 at 06:29:59PM +0000, Philip Martin wrote:
> >> I put in the ORDER BY to preserve the parents before children
> >> notification used by 1.6.  I wonder if that notification order is
> >> important?
> >
> > See r1196191.
> > It should preserve the 1.6.x order (via svn_path_compare_paths()).
> >
> >> A patch that we could commit without affecting the order is:
> >> 
> >> Index: subversion/libsvn_wc/wc-queries.sql
> >> ===================================================================
> >> --- subversion/libsvn_wc/wc-queries.sql	(revision 1196106)
> >> +++ subversion/libsvn_wc/wc-queries.sql	(working copy)
> >> @@ -1193,7 +1193,7 @@
> >>  CREATE TEMPORARY TABLE delete_list (
> >>  /* ### we should put the wc_id in here in case a delete spans multiple
> >>     ### working copies. queries, etc will need to be adjusted.  */
> >> -   local_relpath TEXT PRIMARY KEY NOT NULL
> >> +   local_relpath TEXT PRIMARY KEY NOT NULL UNIQUE
> >>     )
> >
> > Interesting. Can you explain why this doesn't affect order?
> 
> Because I retained ORDER BY in the select statement.
> 
> > I guess this works because there is only one column in the table?
> > Do UNIQUE columns happen to be inserted, or selected, in sorted order?
> 
> UNIQUE simply means that an index is created so the ORDER BY is fast.
> 
> -- 
> Philip

Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Mark Phippard <ma...@gmail.com>.
Ahh, I thought the timings were based on the envvar that Stefan suggested
to try.  Thanks for clarifying.



On Tue, Nov 1, 2011 at 1:18 PM, <mi...@agilent.com> wrote:

> Perhaps I wasn’t clear, the second set of runs were with a local working
> copy instead of an nfs mounted working copy.
>
> From: Mark Phippard [mailto:markphip@gmail.com]
> Sent: Tuesday, November 01, 2011 11:18 AM
> To: RYTTING,MICHAEL (A-ColSprings,ex1)
> Cc: stsp@elego.de; users@subversion.apache.org
> Subject: Re: Apparent "svn rm" scaling problem in 1.7.x
>
> On Tue, Nov 1, 2011 at 1:10 PM, <mi...@agilent.com> wrote:
>
> LOL!  I love the env variable.
>
> Here is some similar data for a local working copy.  These are all run
> with the env variable set.  Again, svn rm is significantly slower than all
> other operations.
>
> svn rm <file>  0.35s
> svn st <file>    0.105s
> svn blame  0.041s
> svn unlock 0.056s
> svn lock      0.053s
> svn log   0.036s
> svn info 0.014s
>
> But look how much it improved compared to how much the others improved?
>
> svn rm <file>       7s
> svn add <file>      0.126s
> svn st <file>          2s
> svn blame <file> 0.2s
> svn lock <file>      0.12s
> svn unlock <file> 0.103s
> svn log <file>        0.089s
> svn revert <file>  0.133s
> svn info <file>      0.074s
>
> Many of these commands are not even impacted by that variable.  That said,
> I do not get how this envvar can shave 7 seconds off the operation when it
> usually only sleeps for a second.
>
> --
> Thanks
>
> Mark Phippard
> http://markphip.blogspot.com/
>



-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

RE: Apparent "svn rm" scaling problem in 1.7.x

Posted by mi...@agilent.com.
I'd have to do some research to get the options.  It's a proprietary filesystem.

That being said, I understand that nfs mounted working copies will degrade my performance.  I really think this is a more fundamental performance issue with svn rm that gets exacerbated with slow performance over nfs.

From: Tony Sweeney [mailto:tsweeney@omnifone.com]
Sent: Tuesday, November 01, 2011 11:32 AM
To: RYTTING,MICHAEL (A-ColSprings,ex1); markphip@gmail.com
Cc: stsp@elego.de; users@subversion.apache.org
Subject: RE: Apparent "svn rm" scaling problem in 1.7.x


________________________________
From: michael_rytting@agilent.com [mailto:michael_rytting@agilent.com]
Sent: 01 November 2011 17:19
To: markphip@gmail.com
Cc: stsp@elego.de; users@subversion.apache.org
Subject: RE: Apparent "svn rm" scaling problem in 1.7.x
Perhaps I wasn't clear, the second set of runs were with a local working copy instead of an nfs mounted working copy.


What are your NFS mount options?


From: Mark Phippard [mailto:markphip@gmail.com]
Sent: Tuesday, November 01, 2011 11:18 AM
To: RYTTING,MICHAEL (A-ColSprings,ex1)
Cc: stsp@elego.de; users@subversion.apache.org
Subject: Re: Apparent "svn rm" scaling problem in 1.7.x

On Tue, Nov 1, 2011 at 1:10 PM, <mi...@agilent.com> wrote:
LOL!  I love the env variable.

Here is some similar data for a local working copy.  These are all run with the env variable set.  Again, svn rm is significantly slower than all other operations.

svn rm <file>  0.35s
svn st <file>    0.105s
svn blame  0.041s
svn unlock 0.056s
svn lock      0.053s
svn log   0.036s
svn info 0.014s

But look how much it improved compared to how much the others improved?

svn rm <file>       7s
svn add <file>      0.126s
svn st <file>          2s
svn blame <file> 0.2s
svn lock <file>      0.12s
svn unlock <file> 0.103s
svn log <file>        0.089s
svn revert <file>  0.133s
svn info <file>      0.074s

Many of these commands are not even impacted by that variable.  That said, I do not get how this envvar can shave 7 seconds off the operation when it usually only sleeps for a second.

--
Thanks

Mark Phippard
http://markphip.blogspot.com/


RE: Apparent "svn rm" scaling problem in 1.7.x

Posted by Tony Sweeney <ts...@omnifone.com>.

________________________________

From: michael_rytting@agilent.com [mailto:michael_rytting@agilent.com] 
Sent: 01 November 2011 17:19
To: markphip@gmail.com
Cc: stsp@elego.de; users@subversion.apache.org
Subject: RE: Apparent "svn rm" scaling problem in 1.7.x



Perhaps I wasn't clear, the second set of runs were with a local
working copy instead of an nfs mounted working copy.

 

 

What are your NFS mount options?

 

From: Mark Phippard [mailto:markphip@gmail.com] 
Sent: Tuesday, November 01, 2011 11:18 AM
To: RYTTING,MICHAEL (A-ColSprings,ex1)
Cc: stsp@elego.de; users@subversion.apache.org
Subject: Re: Apparent "svn rm" scaling problem in 1.7.x

 

On Tue, Nov 1, 2011 at 1:10 PM, <mi...@agilent.com> wrote:

	LOL!  I love the env variable.
	
	Here is some similar data for a local working copy.  These are
	all run with the env variable set.  Again, svn rm is significantly
	slower than all other operations.
	
	svn rm <file>  0.35s
	svn st <file>    0.105s
	svn blame  0.041s
	svn unlock 0.056s
	svn lock      0.053s
	svn log   0.036s
	svn info 0.014s

 

But look how much it improved compared to how much the others improved?


 

svn rm <file>       7s
svn add <file>      0.126s
svn st <file>          2s
svn blame <file> 0.2s
svn lock <file>      0.12s
svn unlock <file> 0.103s
svn log <file>        0.089s
svn revert <file>  0.133s
svn info <file>      0.074s

 

Many of these commands are not even impacted by that variable.  That
said, I do not get how this envvar can shave 7 seconds off the operation
when it usually only sleeps for a second.

 

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/



RE: Apparent "svn rm" scaling problem in 1.7.x

Posted by mi...@agilent.com.
Perhaps I wasn't clear, the second set of runs were with a local working copy instead of an nfs mounted working copy.

From: Mark Phippard [mailto:markphip@gmail.com]
Sent: Tuesday, November 01, 2011 11:18 AM
To: RYTTING,MICHAEL (A-ColSprings,ex1)
Cc: stsp@elego.de; users@subversion.apache.org
Subject: Re: Apparent "svn rm" scaling problem in 1.7.x

On Tue, Nov 1, 2011 at 1:10 PM, <mi...@agilent.com> wrote:
LOL!  I love the env variable.

Here is some similar data for a local working copy.  These are all run with the env variable set.  Again, svn rm is significantly slower than all other operations.

svn rm <file>  0.35s
svn st <file>    0.105s
svn blame  0.041s
svn unlock 0.056s
svn lock      0.053s
svn log   0.036s
svn info 0.014s

But look how much it improved compared to how much the others improved?

svn rm <file>       7s
svn add <file>      0.126s
svn st <file>          2s
svn blame <file> 0.2s
svn lock <file>      0.12s
svn unlock <file> 0.103s
svn log <file>        0.089s
svn revert <file>  0.133s
svn info <file>      0.074s

Many of these commands are not even impacted by that variable.  That said, I do not get how this envvar can shave 7 seconds off the operation when it usually only sleeps for a second.

--
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Mark Phippard <ma...@gmail.com>.
On Tue, Nov 1, 2011 at 1:10 PM, <mi...@agilent.com> wrote:

> LOL!  I love the env variable.
>
> Here is some similar data for a local working copy.  These are all run
> with the env variable set.  Again, svn rm is significantly slower than all
> other operations.
>
> svn rm <file>  0.35s
> svn st <file>    0.105s
> svn blame  0.041s
> svn unlock 0.056s
> svn lock      0.053s
> svn log   0.036s
> svn info 0.014s
>

But look how much it improved compared to how much the others improved?

svn rm <file>       7s
svn add <file>      0.126s
svn st <file>          2s
svn blame <file> 0.2s
svn lock <file>      0.12s
svn unlock <file> 0.103s
svn log <file>        0.089s
svn revert <file>  0.133s
svn info <file>      0.074s

Many of these commands are not even impacted by that variable.  That said,
I do not get how this envvar can shave 7 seconds off the operation when it
usually only sleeps for a second.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Neels J Hofmeyr <ne...@elego.de>.
On 11/02/2011 03:46 PM, Mark Phippard wrote:
> On Wed, Nov 2, 2011 at 10:39 AM, Neels J Hofmeyr <neels@elego.de
> <ma...@elego.de>> wrote:
> 
> 
>     It seems to show an improvement on 'svn delete' for 1.7.0.
> 
> 
> Yep, I noticed that.  FWIW, the benchmarks that I wrote also show delete
> being faster with 1.7.x:
> 
> https://ctf.open.collab.net/sf/projects/csvn/
> 
> In my case, delete is deleting a folder that contains a lot of files.
>  Perhaps we need a test that is doing:
> 
> $ svn rm folder/*
> 
> That seems to be the main area where there is a problem.
> 
> What does your delete test do?

Hmm, not much. At one point it locally deletes a branch from a WC. I had
more deletes in there originally, but that started to overcomplicate the
test automation, so there isn't much deletion going on. I guess I should
insert a few delete variants right at the end, now that I think of it from
the distance.

BTW, the commands are in tools/dev/benchmarks/suite1/benchmark.py starting
at line 435.

~Neels


Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Mark Phippard <ma...@gmail.com>.
On Wed, Nov 2, 2011 at 10:39 AM, Neels J Hofmeyr <ne...@elego.de> wrote:

>
> It seems to show an improvement on 'svn delete' for 1.7.0.
>

Yep, I noticed that.  FWIW, the benchmarks that I wrote also show delete
being faster with 1.7.x:

https://ctf.open.collab.net/sf/projects/csvn/

In my case, delete is deleting a folder that contains a lot of files.
 Perhaps we need a test that is doing:

$ svn rm folder/*

That seems to be the main area where there is a problem.

What does your delete test do?


-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Neels J Hofmeyr <ne...@elego.de>.
On 11/01/2011 07:56 PM, Mark Phippard wrote:
> On Tue, Nov 1, 2011 at 2:40 PM, Stefan Sperling <stsp@elego.de
> <ma...@elego.de>> wrote:
> 
>     On Tue, Nov 01, 2011 at 06:29:59PM +0000, Philip Martin wrote:
>     > I put in the ORDER BY to preserve the parents before children
>     > notification used by 1.6.  I wonder if that notification order is
>     > important?
> 
>     See r1196191.
>     It should preserve the 1.6.x order (via svn_path_compare_paths()).
> 
> 
> Just a side note.  I have not been able to find a "final" version of the
> svnbench tool with results for 1.7.0 compared with 1.6.17.  The results I
> can find are only comparing 1.7.x with trunk.  Just wondering if those tests
> show a problem with rm so that we can track progress via those tests.
> 
> Adding Neels in case he archived any results.

Have not, but I will gladly kick off a special run for you.
(...some time passes...)
See http://svn.haxx.se/dev/archive-2011-11/0029.shtml

*Disclaimer:* this tests only file://-URL access on a GNU/Linux VM. This is
intended to measure changes in performance of the local working copy layer,
only. These results are *not* generally true for everyone.

It seems to show an improvement on 'svn delete' for 1.7.0.

~Neels


Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Mark Phippard <ma...@gmail.com>.
On Tue, Nov 1, 2011 at 2:40 PM, Stefan Sperling <st...@elego.de> wrote:

> On Tue, Nov 01, 2011 at 06:29:59PM +0000, Philip Martin wrote:
> > I put in the ORDER BY to preserve the parents before children
> > notification used by 1.6.  I wonder if that notification order is
> > important?
>
> See r1196191.
> It should preserve the 1.6.x order (via svn_path_compare_paths()).
>

Just a side note.  I have not been able to find a "final" version of the
svnbench tool with results for 1.7.0 compared with 1.6.17.  The results I
can find are only comparing 1.7.x with trunk.  Just wondering if those
tests show a problem with rm so that we can track progress via those tests.

Adding Neels in case he archived any results.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Philip Martin <ph...@wandisco.com>.
Stefan Sperling <st...@elego.de> writes:

> On Tue, Nov 01, 2011 at 06:29:59PM +0000, Philip Martin wrote:
>> I put in the ORDER BY to preserve the parents before children
>> notification used by 1.6.  I wonder if that notification order is
>> important?
>
> See r1196191.
> It should preserve the 1.6.x order (via svn_path_compare_paths()).
>
>> A patch that we could commit without affecting the order is:
>> 
>> Index: subversion/libsvn_wc/wc-queries.sql
>> ===================================================================
>> --- subversion/libsvn_wc/wc-queries.sql	(revision 1196106)
>> +++ subversion/libsvn_wc/wc-queries.sql	(working copy)
>> @@ -1193,7 +1193,7 @@
>>  CREATE TEMPORARY TABLE delete_list (
>>  /* ### we should put the wc_id in here in case a delete spans multiple
>>     ### working copies. queries, etc will need to be adjusted.  */
>> -   local_relpath TEXT PRIMARY KEY NOT NULL
>> +   local_relpath TEXT PRIMARY KEY NOT NULL UNIQUE
>>     )
>
> Interesting. Can you explain why this doesn't affect order?

Because I retained ORDER BY in the select statement.

> I guess this works because there is only one column in the table?
> Do UNIQUE columns happen to be inserted, or selected, in sorted order?

UNIQUE simply means that an index is created so the ORDER BY is fast.

-- 
Philip
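One way to see which indexes SQLite creates for either form of the table definition, and how it plans the ORDER BY, is to ask SQLite directly. The sketch below uses Python's sqlite3 module against an in-memory database. The table and column definitions are copied from the patch; the function name and the sample paths are made up for illustration. The output depends on the SQLite version in use, so none is shown here.

```python
import sqlite3

# Column definitions before and after the proposed patch.
TABLE_VARIANTS = {
    "before": "local_relpath TEXT PRIMARY KEY NOT NULL",
    "after": "local_relpath TEXT PRIMARY KEY NOT NULL UNIQUE",
}

QUERY = "SELECT local_relpath FROM delete_list ORDER BY local_relpath"


def inspect_delete_list(column_def, paths):
    """Create delete_list with column_def; return (index names, plan, sorted rows)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f"CREATE TEMPORARY TABLE delete_list ({column_def})")
        conn.executemany("INSERT INTO delete_list VALUES (?)",
                         [(p,) for p in paths])
        # Indexes on temporary tables are listed in sqlite_temp_master.
        indexes = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_temp_master "
            "WHERE type = 'index' AND tbl_name = 'delete_list'")]
        # The last column of each EXPLAIN QUERY PLAN row is the description.
        plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
        rows = [row[0] for row in conn.execute(QUERY)]
        return indexes, plan, rows
    finally:
        conn.close()


for name, column_def in TABLE_VARIANTS.items():
    indexes, plan, rows = inspect_delete_list(column_def, ["dir/b", "dir/a", "dir"])
    print(name, indexes, plan, rows)
```

Comparing the two printed lines shows whether the extra UNIQUE changes the set of indexes or the plan on a given SQLite build.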

Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Nov 01, 2011 at 06:29:59PM +0000, Philip Martin wrote:
> I put in the ORDER BY to preserve the parents before children
> notification used by 1.6.  I wonder if that notification order is
> important?

See r1196191.
It should preserve the 1.6.x order (via svn_path_compare_paths()).

> A patch that we could commit without affecting the order is:
> 
> Index: subversion/libsvn_wc/wc-queries.sql
> ===================================================================
> --- subversion/libsvn_wc/wc-queries.sql	(revision 1196106)
> +++ subversion/libsvn_wc/wc-queries.sql	(working copy)
> @@ -1193,7 +1193,7 @@
>  CREATE TEMPORARY TABLE delete_list (
>  /* ### we should put the wc_id in here in case a delete spans multiple
>     ### working copies. queries, etc will need to be adjusted.  */
> -   local_relpath TEXT PRIMARY KEY NOT NULL
> +   local_relpath TEXT PRIMARY KEY NOT NULL UNIQUE
>     )

Interesting. Can you explain why this doesn't affect order?
I guess this works because there is only one column in the table?
Do UNIQUE columns happen to be inserted, or selected, in sorted order?

Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Nov 01, 2011 at 06:21:47PM +0100, Stefan Sperling wrote:
> What effect does the env var have on 'svn rm dir/*'?

Actually, looking closer at the code made me realise that 'svn rm'
never sleeps for timestamps. So the envvar should have no effect.

> How many files and directories are you actually deleting via 'svn rm dir/*'?

The answer to this question is still interesting though
so I can try to reproduce your timings locally.

RE: Apparent "svn rm" scaling problem in 1.7.x

Posted by mi...@agilent.com.
Benchmarking with 49 files is taking too long.  Here are some benchmarks of trying to delete a directory with 5 files.  I am using approximations because I am seeing variations in each run due to network traffic.

"svn rm dir/*"  1.6.17    ~0.15s
"svn rm dir"    1.7.1     ~10s
"svn rm dir/*"  1.7.1     ~50s
"svn rm dir/*"  1.7.1.p1  ~30s (single transaction)
"svn rm dir/*"  1.7.1.p2  ~25s (sqlite change)
"svn rm dir/*"  1.7.1.p3  ~30s (c code change)

We can continue to benchmark deletes on multiple files, but I still think the culprit is that the fundamental svn rm command is too slow.


-----Original Message-----
From: Stefan Sperling [mailto:stsp@elego.de] 
Sent: Tuesday, November 01, 2011 12:30 PM
To: RYTTING,MICHAEL (A-ColSprings,ex1); users@subversion.apache.org
Subject: Re: Apparent "svn rm" scaling problem in 1.7.x

On Tue, Nov 01, 2011 at 07:00:57PM +0100, Stefan Sperling wrote:
> On Tue, Nov 01, 2011 at 11:31:35AM -0600, michael_rytting@agilent.com wrote:
> > It's just one directory that has 49 files in it.
> 
> Good. Please try this patch in addition to the other one.
> It makes 'svn rm dir/*' with 49 files go down from about 4.20 seconds 
> to about 1.50 seconds for me (local disk).
> 
> Note that I am not going to commit this as is.
> It just tests whether the overhead of sorting paths in sqlite matters 
> much on NFS.

I went ahead and implemented sorting the deleted items list in C.
The following gives you a patch that applies cleanly on top of my previous 1.7.x patch:

 svn diff -c1196191 https://svn.apache.org/repos/asf/subversion/trunk

Can you try that? Thanks.

Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Nov 01, 2011 at 07:00:57PM +0100, Stefan Sperling wrote:
> On Tue, Nov 01, 2011 at 11:31:35AM -0600, michael_rytting@agilent.com wrote:
> > It's just one directory that has 49 files in it.
> 
> Good. Please try this patch in addition to the other one.
> It makes 'svn rm dir/*' with 49 files go down from about 4.20 seconds
> to about 1.50 seconds for me (local disk).
> 
> Note that I am not going to commit this as is.
> It just tests whether the overhead of sorting paths in sqlite matters
> much on NFS.

I went ahead and implemented sorting the deleted items list in C.
The following gives you a patch that applies cleanly on top of
my previous 1.7.x patch:

 svn diff -c1196191 https://svn.apache.org/repos/asf/subversion/trunk

Can you try that? Thanks.
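To illustrate what a parents-before-children ordering means for a list of deleted paths, here is a minimal sketch. It is not the code from r1196191 and not svn_path_compare_paths() itself; it only shows the idea by comparing paths component by component, and the function names are hypothetical.

```python
def parents_first_key(relpath):
    """Sort key that compares paths one component at a time."""
    return relpath.split("/")


def sort_parents_first(relpaths):
    """Return relpaths ordered so each directory precedes its children."""
    return sorted(relpaths, key=parents_first_key)


paths = ["dir/a/b", "dir-x", "dir/a", "dir"]
print(sort_parents_first(paths))  # ['dir', 'dir/a', 'dir/a/b', 'dir-x']
print(sorted(paths))              # ['dir', 'dir-x', 'dir/a', 'dir/a/b']
```

The second line of output shows why a plain string sort is not the same thing: '-' sorts before '/', so dir-x lands between dir and its children.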

Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Philip Martin <ph...@wandisco.com>.
Stefan Sperling <st...@elego.de> writes:

> On Tue, Nov 01, 2011 at 11:31:35AM -0600, michael_rytting@agilent.com wrote:
> Note that I am not going to commit this as is.
> It just tests whether the overhead of sorting paths in sqlite matters
> much on NFS.
>
> Index: subversion/libsvn_wc/wc-queries.sql
> ===================================================================
> --- subversion/libsvn_wc/wc-queries.sql	(revision 1196149)
> +++ subversion/libsvn_wc/wc-queries.sql	(working copy)
> @@ -1208,7 +1208,6 @@ WHERE wc_id = ?1
>  
>  -- STMT_SELECT_DELETE_LIST
>  SELECT local_relpath FROM delete_list
> -ORDER BY local_relpath
>  
>  -- STMT_FINALIZE_DELETE
>  DROP TABLE IF EXISTS delete_list

I put in the ORDER BY to preserve the parents before children
notification used by 1.6.  I wonder if that notification order is
important?

A patch that we could commit without affecting the order is:

Index: subversion/libsvn_wc/wc-queries.sql
===================================================================
--- subversion/libsvn_wc/wc-queries.sql	(revision 1196106)
+++ subversion/libsvn_wc/wc-queries.sql	(working copy)
@@ -1193,7 +1193,7 @@
 CREATE TEMPORARY TABLE delete_list (
 /* ### we should put the wc_id in here in case a delete spans multiple
    ### working copies. queries, etc will need to be adjusted.  */
-   local_relpath TEXT PRIMARY KEY NOT NULL
+   local_relpath TEXT PRIMARY KEY NOT NULL UNIQUE
    )
 
 /* This matches the selection in STMT_INSERT_DELETE_FROM_NODE_RECURSIVE */

-- 
Philip

Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Nov 01, 2011 at 11:31:35AM -0600, michael_rytting@agilent.com wrote:
> It's just one directory that has 49 files in it.

Good. Please try this patch in addition to the other one.
It makes 'svn rm dir/*' with 49 files go down from about 4.20 seconds
to about 1.50 seconds for me (local disk).

Note that I am not going to commit this as is.
It just tests whether the overhead of sorting paths in sqlite matters
much on NFS.

Index: subversion/libsvn_wc/wc-queries.sql
===================================================================
--- subversion/libsvn_wc/wc-queries.sql	(revision 1196149)
+++ subversion/libsvn_wc/wc-queries.sql	(working copy)
@@ -1208,7 +1208,6 @@ WHERE wc_id = ?1
 
 -- STMT_SELECT_DELETE_LIST
 SELECT local_relpath FROM delete_list
-ORDER BY local_relpath
 
 -- STMT_FINALIZE_DELETE
 DROP TABLE IF EXISTS delete_list

RE: Apparent "svn rm" scaling problem in 1.7.x

Posted by mi...@agilent.com.
It's just one directory that has 49 files in it.

-----Original Message-----
From: Stefan Sperling [mailto:stsp@elego.de] 
Sent: Tuesday, November 01, 2011 11:22 AM
To: RYTTING,MICHAEL (A-ColSprings,ex1)
Cc: users@subversion.apache.org
Subject: Re: Apparent "svn rm" scaling problem in 1.7.x

On Tue, Nov 01, 2011 at 11:10:44AM -0600, michael_rytting@agilent.com wrote:
> LOL!  I love the env variable.
> 
> Here is some similar data for a local working copy.  These are all run with the env variable set.  Again, svn rm is significantly slower than all other operations.
> 
> svn rm <file>  0.35s
> svn st <file>    0.105s
> svn blame  0.041s
> svn unlock 0.056s
> svn lock      0.053s
> svn log   0.036s
> svn info 0.014s

What effect does the env var have on 'svn rm dir/*'?
How many files and directories are you actually deleting via 'svn rm dir/*'?

Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Nov 01, 2011 at 11:10:44AM -0600, michael_rytting@agilent.com wrote:
> LOL!  I love the env variable.
> 
> Here is some similar data for a local working copy.  These are all run with the env variable set.  Again, svn rm is significantly slower than all other operations.
> 
> svn rm <file>  0.35s
> svn st <file>    0.105s
> svn blame  0.041s
> svn unlock 0.056s
> svn lock      0.053s
> svn log   0.036s
> svn info 0.014s

What effect does the env var have on 'svn rm dir/*'?
How many files and directories are you actually deleting via 'svn rm dir/*'?

RE: Apparent "svn rm" scaling problem in 1.7.x

Posted by mi...@agilent.com.
LOL!  I love the env variable.

Here is some similar data for a local working copy.  These are all run with the env variable set.  Again, svn rm is significantly slower than all other operations.

svn rm <file>  0.35s
svn st <file>    0.105s
svn blame  0.041s
svn unlock 0.056s
svn lock      0.053s
svn log   0.036s
svn info 0.014s

-----Original Message-----
From: Stefan Sperling [mailto:stsp@elego.de] 
Sent: Tuesday, November 01, 2011 11:00 AM
To: RYTTING,MICHAEL (A-ColSprings,ex1)
Cc: users@subversion.apache.org
Subject: Re: Apparent "svn rm" scaling problem in 1.7.x

On Tue, Nov 01, 2011 at 10:38:07AM -0600, michael_rytting@agilent.com wrote:
> Not much of an improvement.  "svn rm dir/*" now takes 2m6s vs 7s for "svn rm dir".  

Before the patch, we had:

"svn rm dir/*"   6m15s      1.7.1
"svn rm dir"        8.5s          1.7.1
"svn rm dir/*"    1.14s       1.6.17

So this patch cut about 4 minutes of runtime, which is somewhat significant but definitely not enough. But it's a step in the right direction.

> As a side note, I really think there is fundamentally something wrong with the performance of "svn rm" with large working copies.  Here are some example times.
> 
> svn rm <file>       7s
> svn add <file>      0.126s
> svn st <file>          2s
> svn blame <file> 0.2s
> svn lock <file>      0.12s
> svn unlock <file> 0.103s
> svn log <file>        0.089s
> svn revert <file>  0.133s
> svn info <file>      0.074s
> 
> I'm assuming that all these commands are doing some form of sqlite 
> database transactions, but the rm transaction, in particular, is very 
> slow.  Even when using a local working copy, I am seeing large 
> discrepancies in the time it takes to run "svn rm" vs most other svn 
> commands.  It's just since the local working copy is faster overall, 
> you are less likely to notice the large discrepancy in performance.

Yes, that looks bad. There might be a linear DB table scan involved in 'svn rm' that becomes noticeable on large WCs.
Do you see this difference only on NFS or also on local disk?

How are you measuring the time?
Note that most commands take at least one second because Subversion waits for one second for filesystem timestamps to update after some operations. To cut this deliberate delay out of the equation, do this:
export SVN_I_LOVE_CORRUPTED_WORKING_COPIES_SO_DISABLE_SLEEP_FOR_TIMESTAMPS=yes

Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Nov 01, 2011 at 10:38:07AM -0600, michael_rytting@agilent.com wrote:
> Not much of an improvement.  "svn rm dir/*" now takes 2m6s vs 7s for "svn rm dir".  

Before the patch, we had:

"svn rm dir/*"   6m15s      1.7.1
"svn rm dir"        8.5s          1.7.1
"svn rm dir/*"    1.14s       1.6.17

So this patch cut about 4 minutes of runtime, which is somewhat
significant but definitely not enough. But it's a step in the
right direction.

> As a side note, I really think there is something fundamentally wrong with the performance of "svn rm" with large working copies.  Here are some example times.
> 
> svn rm <file>       7s
> svn add <file>      0.126s
> svn st <file>          2s
> svn blame <file> 0.2s
> svn lock <file>      0.12s
> svn unlock <file> 0.103s
> svn log <file>        0.089s
> svn revert <file>  0.133s
> svn info <file>      0.074s
> 
> I'm assuming that all these commands are doing some form of sqlite
> database transactions, but the rm transaction, in particular, is very
> slow.  Even when using a local working copy, I am seeing large
> discrepancies in the time it takes to run "svn rm" vs most other svn
> commands.  It's just that, since the local working copy is faster overall,
> you are less likely to notice the large discrepancy in performance.

Yes, that looks bad. There might be a linear DB table scan involved
in 'svn rm' that becomes noticeable on large WCs.
Do you see this difference only on NFS or also on local disk?

How are you measuring the time?
Note that most commands take at least one second because Subversion
waits for one second for filesystem timestamps to update after some
operations. To cut this deliberate delay out of the equation, do this:
export SVN_I_LOVE_CORRUPTED_WORKING_COPIES_SO_DISABLE_SLEEP_FOR_TIMESTAMPS=yes
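
As a rough illustration of the measurement described above, here is a minimal Python sketch that times a command with that environment variable set. The variable name comes from this message; the helper name time_command and the example file name are made up for illustration. Judging by the variable's name, it is presumably only safe for benchmarking and not for everyday use.

```python
import os
import subprocess
import time

# Variable name taken from the message above.
SLEEP_VAR = "SVN_I_LOVE_CORRUPTED_WORKING_COPIES_SO_DISABLE_SLEEP_FOR_TIMESTAMPS"


def time_command(argv, disable_sleep=True):
    """Run argv once and return the wall-clock time in seconds.

    time_command is a hypothetical helper, not part of Subversion.
    """
    env = dict(os.environ)
    if disable_sleep:
        # Ask svn to skip its deliberate wait for timestamps.
        env[SLEEP_VAR] = "yes"
    start = time.perf_counter()
    subprocess.run(argv, env=env, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start
```

For example, time_command(["svn", "info", "somefile"]) would time "svn info" on a (hypothetical) file called somefile in the current working copy.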

RE: Apparent "svn rm" scaling problem in 1.7.x

Posted by mi...@agilent.com.
Not much of an improvement.  "svn rm dir/*" now takes 2m6s vs 7s for "svn rm dir".  

As a side note, I really think there is something fundamentally wrong with the performance of "svn rm" with large working copies.  Here are some example times.

svn rm <file>       7s
svn add <file>      0.126s
svn st <file>          2s
svn blame <file> 0.2s
svn lock <file>      0.12s
svn unlock <file> 0.103s
svn log <file>        0.089s
svn revert <file>  0.133s
svn info <file>      0.074s

I'm assuming that all these commands are doing some form of sqlite database transactions, but the rm transaction, in particular, is very slow.  Even when using a local working copy, I am seeing large discrepancies in the time it takes to run "svn rm" vs most other svn commands.  It's just that, since the local working copy is faster overall, you are less likely to notice the large discrepancy in performance.


-----Original Message-----
From: Stefan Sperling [mailto:stsp@elego.de] 
Sent: Tuesday, November 01, 2011 8:35 AM
To: RYTTING,MICHAEL (A-ColSprings,ex1)
Cc: users@subversion.apache.org
Subject: Re: Apparent "svn rm" scaling problem in 1.7.x

On Tue, Nov 01, 2011 at 06:45:59AM -0600, michael_rytting@agilent.com wrote:
> I'm always willing to try patches.

Great! Please give this patch a spin:

svn diff -c 1196018 https://svn.apache.org/repos/asf/subversion/branches/1.7.x-r1195873

Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Nov 01, 2011 at 06:45:59AM -0600, michael_rytting@agilent.com wrote:
> I'm always willing to try patches.

Great! Please give this patch a spin:

svn diff -c 1196018 https://svn.apache.org/repos/asf/subversion/branches/1.7.x-r1195873

RE: Apparent "svn rm" scaling problem in 1.7.x

Posted by mi...@agilent.com.
I'm always willing to try patches.

-----Original Message-----
From: Stefan Sperling [mailto:stsp@elego.de] 
Sent: Tuesday, November 01, 2011 2:49 AM
To: RYTTING,MICHAEL (A-ColSprings,ex1)
Cc: users@subversion.apache.org
Subject: Re: Apparent "svn rm" scaling problem in 1.7.x

On Mon, Oct 31, 2011 at 10:12:04AM -0600, michael_rytting@agilent.com
wrote:
> I did an additional benchmark doing "svn rm dir/*" on a local 
> directory instead of an nfs directory.  It runs in 10.4s.  Is going 
> from 10.4s to 6m15s acceptable when using a working copy on nfs vs 
> local?  I am fine with a certain amount of slowdown when using nfs.
> But, I don't see this kind of degradation on nfs vs local for other 
> operations.  Svn rm seems particularly susceptible.
> 

I've taken a stab at fixing this, see
http://svn.apache.org/viewvc?view=revision&revision=1195873

Would you be able to test a patch against 1.7.x?
There are some conflicts when the above commit is merged into the 1.7.x branch, so I'll have to prepare a special patch for 1.7.x.

Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Oct 31, 2011 at 10:12:04AM -0600, michael_rytting@agilent.com
wrote:
> I did an additional benchmark doing "svn rm dir/*" on a local
> directory instead of an nfs directory.  It runs in 10.4s.  Is going
> from 10.4s to 6m15s acceptable when using a working copy on nfs vs
> local?  I am fine with a certain amount of slowdown when using nfs.
> But, I don't see this kind of degradation on nfs vs local for other
> operations.  Svn rm seems particularly susceptible.
> 

I've taken a stab at fixing this, see
http://svn.apache.org/viewvc?view=revision&revision=1195873

Would you be able to test a patch against 1.7.x?
There are some conflicts when the above commit is merged into the 1.7.x
branch, so I'll have to prepare a special patch for 1.7.x.

RE: Apparent "svn rm" scaling problem in 1.7.x

Posted by mi...@agilent.com.
I did an additional benchmark doing "svn rm dir/*" on a local directory instead of an nfs directory.  It runs in 10.4s.  Is going from 10.4s to 6m15s acceptable when using a working copy on nfs vs local?  I am fine with a certain amount of slowdown when using nfs.  But, I don't see this kind of degradation on nfs vs local for other operations.  Svn rm seems particularly susceptible.

-----Original Message-----
From: Stefan Sperling [mailto:stsp@elego.de] 
Sent: Monday, October 31, 2011 9:45 AM
To: RYTTING,MICHAEL (A-ColSprings,ex1)
Cc: users@subversion.apache.org
Subject: Re: Apparent "svn rm" scaling problem in 1.7.x

On Mon, Oct 31, 2011 at 09:09:21AM -0600, michael_rytting@agilent.com wrote:
> I am starting to see some very bad performance with "svn rm" compared to the 1.6.x line of subversion.  I have a directory that is full of files.  If I go into the directory and run "svn rm *", it is significantly slower than running svn rm on the whole directory.  While the difference in time taken is significant, the speed is still relatively acceptable for small working copies.
> 
> "svn rm dir/*"   3.8s      1.7.1
> "svn rm dir"        0.173s 1.7.1
> "svn rm dir/*"    1.008s 1.6.17
> 
> In this case 1.6.17 is nearly 4 times faster.  My working copy is nfs mounted and has 200 nodes.
> 

What happens here is that each deletion target uses one sqlite transaction.
AFAIK creating sqlite transactions on NFS is expensive.

In the recursive case, only one transaction is used (for 'dir').
In the non-recursive case, the shell expands the target list, and svn ends up using one transaction per target.

To fix this problem, we could change the implementation so that one sqlite transaction is used even for multiple targets.

> When the working copy size starts to get larger (deleting the same directory as above, but within a larger working copy), things really start to slow down.
> 
> "svn rm dir/*"   6m15s      1.7.1
> "svn rm dir"        8.5s          1.7.1
> "svn rm dir/*"    1.14s       1.6.17
> 
> This working copy is also nfs mounted but I'm up to 23948 nodes (doing
> sqlite3 .svn/wc.db "select count (*) from nodes").  The directory that 
> I am deleting has 49 files.  The largest file is 2.5k.  I am doing 
> the comparison of the same directory of the same repository.  In one 
> case I am only doing a partial checkout to keep the working copy size 
> down where the second test I am doing a complete checkout.
> 
> We can also see that subversion 1.6.17 scales very well for working copy size.

Yes, 1.6 performed better on NFS than 1.7 does.
This is because of how sqlite behaves on NFS filesystems.
But 1.7 performs and scales an awful lot better on local disk than 1.6 did, so this was a tradeoff we were willing to make.
I am not sure how much additional performance can be squeezed out of sqlite on NFS. It has been said that not much can be done about it.
But hints are welcome in case anyone has an idea.

Re: Apparent "svn rm" scaling problem in 1.7.x

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Oct 31, 2011 at 09:09:21AM -0600, michael_rytting@agilent.com wrote:
> I am starting to see some very bad performance with "svn rm" compared to the 1.6.x line of subversion.  I have a directory that is full of files.  If I go into the directory and run "svn rm *", it is significantly slower than running svn rm on the whole directory.  While the difference in time taken is significant, the speed is still relatively acceptable for small working copies.
> 
> "svn rm dir/*"   3.8s      1.7.1
> "svn rm dir"        0.173s 1.7.1
> "svn rm dir/*"    1.008s 1.6.17
> 
> In this case 1.6.17 is nearly 4 times faster.  My working copy is nfs mounted and has 200 nodes.
> 

What happens here is that each deletion target uses one sqlite transaction.
AFAIK creating sqlite transactions on NFS is expensive.

In the recursive case, only one transaction is used (for 'dir').
In the non-recursive case, the shell expands the target list, and
svn ends up using one transaction per target.

To fix this problem, we could change the implementation so that
one sqlite transaction is used even for multiple targets.
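
To make the difference concrete, here is a small Python sketch (not Subversion code) that contrasts one sqlite transaction per target with a single transaction for all targets. The table layout is a toy stand-in and not the real wc.db schema, and all function names are invented for illustration. On local disk the gap may be small; the assumption is that each commit becomes much more expensive on NFS.

```python
import os
import sqlite3
import tempfile
import time


def make_toy_db(db_path, n_files):
    """Create a toy 'nodes' table. This is NOT the real wc.db schema."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE nodes (path TEXT PRIMARY KEY, deleted INTEGER DEFAULT 0)")
    with conn:
        conn.executemany("INSERT INTO nodes (path) VALUES (?)",
                         [("dir/file%d" % i,) for i in range(n_files)])
    return conn


def mark_deleted_per_target(conn, paths):
    """One transaction, and therefore one commit, per target."""
    for path in paths:
        with conn:
            conn.execute("UPDATE nodes SET deleted = 1 WHERE path = ?", (path,))


def mark_deleted_single_txn(conn, paths):
    """One transaction covering all targets."""
    with conn:
        conn.executemany("UPDATE nodes SET deleted = 1 WHERE path = ?",
                         [(p,) for p in paths])


def compare(n_files=49):
    """Time both strategies on a fresh on-disk toy database each."""
    paths = ["dir/file%d" % i for i in range(n_files)]
    timings = {}
    for name, func in (("per-target", mark_deleted_per_target),
                       ("single", mark_deleted_single_txn)):
        with tempfile.TemporaryDirectory() as tmp:
            conn = make_toy_db(os.path.join(tmp, "toy.db"), n_files)
            start = time.perf_counter()
            func(conn, paths)
            timings[name] = time.perf_counter() - start
            conn.close()
    return timings
```

Calling compare() returns the elapsed seconds for each strategy. Placing the temporary directory on an NFS mount (for instance by setting TMPDIR) should show the per-target cost more clearly.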

> When the working copy size starts to get larger (deleting the same directory as above, but within a larger working copy), things really start to slow down.
> 
> "svn rm dir/*"   6m15s      1.7.1
> "svn rm dir"        8.5s          1.7.1
> "svn rm dir/*"    1.14s       1.6.17
> 
> This working copy is also nfs mounted but I'm up to 23948 nodes (doing
> sqlite3 .svn/wc.db "select count (*) from nodes").  The directory that
> I am deleting has 49 files.  The largest file is 2.5k.  I am doing
> the comparison of the same directory of the same repository.  In one
> case I am only doing a partial checkout to keep the working copy size
> down where the second test I am doing a complete checkout.
> 
> We can also see that subversion 1.6.17 scales very well for working copy size.

Yes, 1.6 performed better on NFS than 1.7 does.
This is because of how sqlite behaves on NFS filesystems.
But 1.7 performs and scales an awful lot better on local disk than 1.6 did,
so this was a tradeoff we were willing to make.
I am not sure how much additional performance can be squeezed out of
sqlite on NFS. It has been said that not much can be done about it.
But hints are welcome in case anyone has an idea.