Posted to users@subversion.apache.org by Marc Sherman <ms...@projectile.ca> on 2005/07/25 20:42:19 UTC

Slow performance in check-case-insensitive.pl?

Has anyone ever noticed very poor performance in the 
check-case-insensitive.pl hook script?  I've been seeing it take a very 
long time to complete, even on simple commits like creating a new 
directory in my tags directory, using svn mkdir with a URL.

Any pointers on how to speed this up, or where to look for the problem?

Thanks,
- Marc

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Slow performance in check-case-insensitive.pl?

Posted by Martin Tomes <li...@tomes.org>.
Marc Sherman wrote:
> Thanks for getting back to me, Martin.  Do you read users@subversion, or 
> would you like me to keep CC'ing you?

> Ok, do you possibly have a few minutes to describe the changes you were 
> planning to make in a bit more detail, so that I can take a crack at 
> them myself?  My python's quite rusty, but I'll try to get over my 
> irrational distaste for significant whitespace. :)

Python isn't as bad as I thought it would be; I even got to quite like 
it despite knowing Perl pretty well:-)  Actually, I find the Python 
humour wearing, and that's my biggest criticism of it.

Currently the script looks for all added files, then finds the highest 
common point in the file tree.  It then gets a list of files in the 
transaction from that point downwards and checks every directory for 
a clash.
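For illustration, a minimal Python sketch of that "highest common point" step. It is not the script's own code: the function names are hypothetical, and it assumes the added paths come from `svnlook changed`-style lines (status in column 0, path from column 4, directories ending in '/').

```python
import posixpath


def added_paths(changed_lines):
    """Paths with status 'A' in `svnlook changed`-style output.

    Assumed line format: status in column 0, path starting at column 4,
    directories marked with a trailing '/'.
    """
    return [
        line[4:].rstrip("/")
        for line in changed_lines
        if line.startswith("A") and len(line) > 4
    ]


def common_root(paths):
    """Deepest directory containing every added path ('' = repository root)."""
    parents = [posixpath.dirname(p) for p in paths]
    return posixpath.commonpath(parents) if parents else ""


print(repr(common_root(added_paths(["A   tags/empty-test/", "U   trunk/README"]))))
# 'tags'
print(repr(common_root(["trunk/src/Foo.c", "branches/b1/doc/Readme.txt"])))
# ''
```

For a single svn mkdir of tags/empty-test the root comes out as tags, so a recursive listing from there walks every existing tag.  Adds in unrelated subtrees push the root all the way up to the repository root.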

This is a bit silly, but being new to Python and, more importantly, the 
bindings, I took the conservative route and used the algorithm I had 
intended to use in the Perl version.

What the script should do is:

for each of the added files:
   work out the directory it is in
   add it to a list, removing duplicates

for each of the directories
   get a listing of that directory and *not* its children
   check that listing for name clashes
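The two loops above could look roughly like this in Python.  This is a sketch, not the actual hook: `list_dir` is a hypothetical callback that returns only the immediate children of a directory in the transaction (in a real hook that would come from the Subversion bindings or svnlook), and `str.casefold()` is an approximation of the client filesystem's case-folding rules.

```python
import posixpath
from collections import defaultdict


def parent_dirs(added_paths):
    """Step 1: the directory each added path lives in, duplicates removed."""
    return sorted({posixpath.dirname(p.rstrip("/")) for p in added_paths})


def find_case_clashes(entries):
    """Step 2: names in ONE directory listing that differ only by case."""
    groups = defaultdict(list)
    for name in entries:
        # casefold() approximates a case-insensitive filesystem's rules.
        groups[name.casefold()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def check_transaction(added_paths, list_dir):
    """Map each affected directory to its clashing name groups.

    `list_dir(directory)` is a hypothetical callback returning only the
    immediate children of `directory` in the transaction (no recursion).
    """
    problems = {}
    for directory in parent_dirs(added_paths):
        clashes = find_case_clashes(list_dir(directory))
        if clashes:
            problems[directory] = clashes
    return problems


# Hypothetical transaction contents, keyed by directory.
fake_listing = {
    "tags": ["empty-test", "release-1.0"],
    "trunk/src": ["Main.c", "main.c", "util.c"],
}
print(check_transaction(["tags/empty-test/", "trunk/src/main.c"],
                        lambda d: fake_listing.get(d, [])))
# {'trunk/src': [['Main.c', 'main.c']]}
```

Because only the parent directories of added paths are listed, creating tags/empty-test costs one listing of tags/ instead of everything beneath it.  As written it also reports clashes that already existed in a directory before the commit; keeping only the groups that contain a newly added name would be a simple refinement.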

Example 8.2 from the book has a useful snippet of code for doing this:

http://svnbook.red-bean.com/en/1.1/svn-book.html#svn-ch-8-sect-2.3-ex-1

I think the script will be smaller, simpler and most importantly a lot 
faster if you do this.

>> The old script can miss filename clashes, so the new one is an 
>> improvement, honest:-)
> 
> What's the failure mode, adding two files with clashing names in a 
> single revision?  Or is there a more subtle failure mode I've missed?

It is the way it calculates the root of the changed part of the tree: 
it can get that wrong, so if there are separate clashes in different 
parts of the tree, the Perl version can miss one of them.

Thanks for the offer of help - it is much appreciated.

-- 
Martin Tomes
echo 'martin at tomes x org x uk'\
  | sed -e 's/ x /\./g' -e 's/ at /@/'

Visit http://www.subversionary.org/

Re: Slow performance in check-case-insensitive.pl?

Posted by Marc Sherman <ms...@projectile.ca>.
Thanks for getting back to me, Martin.  Do you read users@subversion, or 
would you like me to keep CC'ing you?

Martin Tomes wrote:
> 
> I am surprised it takes longer.  Luckily not much longer!

Well, 26% is pretty significant (and those were repeatable measurements
to within .1s, by the way).  But considering that the perl version was
so slow already that I was asked to remove it from the repo, it doesn't
really matter.  We're a small enough team that we can get by on fear and
shame for a while, but it would be nice to be able to enforce this properly.
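For reference, the 26% figure follows from the two "real" times of the svn mkdir runs (31.552 s with the Perl hook, 39.686 s with the Python hook):

```latex
\frac{39.686\,\mathrm{s} - 31.552\,\mathrm{s}}{31.552\,\mathrm{s}}
  = \frac{8.134}{31.552} \approx 0.258 \approx 26\%
```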

> Currently it is still doing a recursive listing of the repository
> which isn't really needed.  The next step is to stop it doing that,
> but right now I don't have time to do that.  I will try to get to
> that as soon as I can but we have a product to release and then I am
> going on vacation for a week.

Ok, do you possibly have a few minutes to describe the changes you were 
planning to make in a bit more detail, so that I can take a crack at 
them myself?  My python's quite rusty, but I'll try to get over my 
irrational distaste for significant whitespace. :)

> The old script can miss filename clashes, so the new one is an 
> improvement, honest:-)

What's the failure mode, adding two files with clashing names in a 
single revision?  Or is there a more subtle failure mode I've missed?

Thanks,
- Marc

Re: Slow performance in check-case-insensitive.pl?

Posted by Martin Tomes <li...@tomes.org>.
Marc Sherman wrote:
> Nathan Kidd wrote:
> 
> Hrm.  Thanks for the pointer.  However, the new python script actually 
> seems to be _worse_ than the old perl script.

I am surprised it takes longer.  Luckily not much longer!  Currently it 
is still doing a recursive listing of the repository which isn't really 
needed.  The next step is to stop it doing that, but right now I don't 
have time to do that.  I will try to get to that as soon as I can but we 
have a product to release and then I am going on vacation for a week.

Adding files or directories at the top of a large tree in the repository 
will be the worst case.
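A toy cost model makes that worst case concrete.  It only counts directory entries in a hypothetical in-memory tree (200 tags of 500 files each, plus the new empty-test directory); the tree shape and helper names are assumptions, not measurements from any real repository.  It compares a recursive walk starting at tags/ with a listing of tags/ alone.

```python
def make_tree(num_tags, files_per_tag):
    """Hypothetical repository: dicts are directories, None marks a file."""
    tags = {
        f"release-{i}": {f"file{j}.c": None for j in range(files_per_tag)}
        for i in range(num_tags)
    }
    tags["empty-test"] = {}  # the directory created by `svn mkdir`
    return {"tags": tags, "trunk": {}}


def lookup(tree, path):
    """Return the directory dict at a slash-separated path ('' = root)."""
    node = tree
    for part in filter(None, path.split("/")):
        node = node[part]
    return node


def entries_recursive(tree, path):
    """Entries visited when listing `path` and everything below it."""
    total, stack = 0, [lookup(tree, path)]
    while stack:
        directory = stack.pop()
        total += len(directory)
        stack.extend(c for c in directory.values() if isinstance(c, dict))
    return total


def entries_single_dir(tree, path):
    """Entries visited when listing only the immediate children of `path`."""
    return len(lookup(tree, path))


tree = make_tree(num_tags=200, files_per_tag=500)
print(entries_recursive(tree, "tags"))   # 100201
print(entries_single_dir(tree, "tags"))  # 201
```

The recursive walk grows with the size of everything under the common root, while the single-directory listing grows only with the number of siblings of the new entry.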

The old script can miss filename clashes, so the new one is an 
improvement, honest:-)

Re: Slow performance in check-case-insensitive.pl?

Posted by Marc Sherman <ms...@projectile.ca>.
Nathan Kidd wrote:
> 
> The old perl version of this script is deprecated. (It was doing a 
> recursive svnlook on any changed paths.  This was fine when working deep 
> in a tree, but for actions like creating a tag as you are doing it means 
> enumerating a huge swath of the tree.)
> 
> Martin Tomes (the script author) posted this to the TortoiseSVN list a 
> few days ago: http://svn.haxx.se/tsvn/archive-2005-07/0754.shtml
> 
> Essentially check the svn trunk for a newer Python version of this script.

Hrm.  Thanks for the pointer.  However, the new python script actually 
seems to be _worse_ than the old perl script.

Here's with the old perl script enabled:

msherman@TO-SCM-01:~$ time svn mkdir http://$SERVER/repos/svntest/tags/empty-test -m "create empty tag dir"

Committed revision 146.

real    0m31.552s
user    0m0.018s
sys     0m0.005s

and top shows:

  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16255 apache 25  0  7060 1708 5404 R 99.3  0.1   0:11.37 svnlook
16251 apache 16  0  9088 5656 2920 S 12.3  0.3   0:01.53 check-case-inse

With the new python script enabled:

msherman@TO-SCM-01:~$ time svn mkdir http://$SERVER/repos/svntest/tags/empty-test-2 -m "create empty tag dir"

Committed revision 147.

real    0m39.686s
user    0m0.018s
sys     0m0.008s

and top shows (note that svnlook is no longer the big hog, 
check-case-insensitive.py is instead):

  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17059 apache 25  0 15912  11m 4672 R 99.9  0.6   0:27.28 check-case-inse

- Marc

Re: Slow performance in check-case-insensitive.pl?

Posted by Nathan Kidd <na...@spicycrypto.ca>.
Marc Sherman wrote:
> Has anyone ever noticed very poor performance in the 
> check-case-insensitive.pl hook script?  I've been seeing it take a very 
> long time to complete, even on simple commits like creating a new 
> directory in my tags directory, using svn mkdir with an url.
> 
> Any pointers on how to speed this up, or where to look for the problem?

The old perl version of this script is deprecated. (It was doing a 
recursive svnlook on any changed paths.  This was fine when working deep 
in a tree, but for actions like creating a tag, as you are doing, it 
means enumerating a huge swath of the tree.)

Martin Tomes (the script author) posted this to the TortoiseSVN list a 
few days ago: http://svn.haxx.se/tsvn/archive-2005-07/0754.shtml

Essentially, check the svn trunk for a newer Python version of this script.

HTH,

-Nathan
