Posted to users@subversion.apache.org by Neil Bird <ne...@jibbyjobby.co.uk> on 2011/01/26 15:28:52 UTC

Checkout really slow in Windows with lots of files in one directory

   We have a graphics-oriented code-base that's auto-generated and has >5000 
source files in one directory.  While I can check this out OK on Linux, 
we're seeing an unusable slow-down on Windows XP (NTFS), both using Tortoise 
directly, and as a test on Linux with the Windows drive mapped over CIFS.

   The checkout starts sensibly enough, but then gets steadily slower and 
slower and slower, to the point where we're not sure it'd actually ever end.

   I know that NTFS is slower to begin with, and that 1.7's WC-NG might 
make this better, but the slow-down here grows worse than linearly as the 
files accumulate.

   Is that to be expected, or at least known about?


   (we're going to jigger the files around into sep. directories to get the 
individual counts down;  I expect that to help in this instance).

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: Checkout really slow in Windows with lots of files in one directory

Posted by OBones <ob...@free.fr>.
Stefan Sperling wrote:
> On Wed, Feb 02, 2011 at 04:12:07PM +0100, OBones wrote:
>    
>> Neil Bird wrote:
>>      
>>>   I couldn't use the version from trunk/1.7 as it differs too
>>> much.  I will try to submit the patch for someone's perusal at
> >>> some point, but I couldn't properly test it as for some reason my
>>> build of svn out of 1.6.x svn (even before modifying it) fails
>>> 'make check':  all the tests pass, but I get a load of XFAIL lines
>>> I don't understand.
>>>
>>>   AFAICT, the XFAILs I get after my patch are the same as before,
>>> but it's not the warm PASS feeling I was hoping for.
>>>        
>> XFAIL stands for "eXpected to FAIL", meaning that if SVN is doing
>> good, then the test should fail.
>> For instance, you expect svn not to delete an existing file on
>> checkout, so you write your test expecting it to fail the checkout.
>> If it succeeds, the test has not failed as expected.
>>      
> That's not quite right. Behaviour like that would be verified
> via a PASS test. The test would FAIL if svn overwrote the file.
>
> XFails are used to flag known bugs or undesirable behaviour which
> cannot be fixed at present (e.g. there are a couple of XFAIL tests
> for tree conflict handling -- we'd like to do better, but can't at the
> moment).  Once the bug is fixed, the test will XPASS (unexpected PASS),
> and we switch it to PASS then.
>
>    
Ah that makes even more sense this way, thanks for the clarification

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Feb 02, 2011 at 04:12:07PM +0100, OBones wrote:
> Neil Bird wrote:
> >  I couldn't use the version from trunk/1.7 as it differs too
> >much.  I will try to submit the patch for someone's perusal at
> >some point, but I couldn't properly test it as for some reason my
> >build of svn out of 1.6.x svn (even before modifying it) fails
> >'make check':  all the tests pass, but I get a load of XFAIL lines
> >I don't understand.
> >
> >  AFAICT, the XFAILs I get after my patch are the same as before,
> >but it's not the warm PASS feeling I was hoping for.
> XFAIL stands for "eXpected to FAIL", meaning that if SVN is doing
> good, then the test should fail.
> For instance, you expect svn not to delete an existing file on
> checkout, so you write your test expecting it to fail the checkout.
> If it succeeds, the test has not failed as expected.

That's not quite right. Behaviour like that would be verified
via a PASS test. The test would FAIL if svn overwrote the file.

XFails are used to flag known bugs or undesirable behaviour which
cannot be fixed at present (e.g. there are a couple of XFAIL tests
for tree conflict handling -- we'd like to do better, but can't at the
moment).  Once the bug is fixed, the test will XPASS (unexpected PASS),
and we switch it to PASS then.

Usually, there is an open issue corresponding to an XFail test.

A test may also be marked XFAIL temporarily to keep the buildbots
quiet about a test that is expected to fail for a while during
on-going development. Marking a test XFAIL is easier than changing
the test expectations and changing them again later. It's a one-line
change vs. potentially larger changes.

Ideally, there wouldn't be any XFAILing tests. Most XFAIL tests indicate
that a developer had to punt on fixing a problem, or postpone resolution
of a problem because of other blocking issues.
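Stefan's distinction can be captured in a few lines. A minimal sketch (plain
Python, not Subversion's actual test harness) of how a runner would classify
one test result once tests can be marked "expected to fail":

```python
def classify(passed, expected_to_fail=False):
    """Classify one test result the way described above: a known-bad test
    is XFAIL when it fails, and XPASS when it unexpectedly starts passing
    (the signal to flip its expectation back to PASS)."""
    if expected_to_fail:
        return "XFAIL" if not passed else "XPASS"
    return "PASS" if passed else "FAIL"
```

So marking a test XFAIL really is a one-line change of expectation, not a
change to the test body itself.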

Stefan

Re: Checkout really slow in Windows with lots of files in one directory

Posted by OBones <ob...@free.fr>.
Neil Bird wrote:
>   I couldn't use the version from trunk/1.7 as it differs too much.  I 
> will try to submit the patch for someone's perusal at some point, but 
> I couldn't properly test it as for some reason my build of svn out of 
> 1.6.x svn (even before modifying it) fails 'make check':  all the 
> tests pass, but I get a load of XFAIL lines I don't understand.
>
>   AFAICT, the XFAILs I get after my patch are the same as before, but 
> it's not the warm PASS feeling I was hoping for.
XFAIL stands for "eXpected to FAIL", meaning that if SVN is doing good, 
then the test should fail.
For instance, you expect svn not to delete an existing file on checkout, 
so you write your test expecting it to fail the checkout. If it 
succeeds, the test has not failed as expected.


Re: Checkout really slow in Windows with lots of files in one directory

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 02/02/11 15:26, Stefan Sperling typed ...
> Please submit a patch against the 1.6.x branch. I will handle it.

   Will do, thanks.

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Feb 02, 2011 at 03:04:22PM +0000, Neil Bird wrote:
>   I have since compiled up the latest 1.6.x branch, and patched
> svn_io_open_unique_file3() to not call svn_io_open_uniquely_named()
> with 'tempfile.tmp', & instead I cut'n'paste
> svn_io_open_uniquely_named(), replacing the 1-99999 loop with
> 1-RAND_MAX and used the output of rand() in the filename to try.
> 
>   This gave me a few seconds improvement in Linux, and a drastically
> improved 5 minutes in Windows (possibly only now slower as it was
> writing to a network mapped drive).
> 
> 
>   I couldn't use the version from trunk/1.7 as it differs too much.
> I will try to submit the patch for someone's perusal at some point,
> but I couldn't properly test it as for some reason my build of svn
> out of 1.6.x svn (even before modifying it) fails 'make check':  all
> the tests pass, but I get a load of XFAIL lines I don't understand.

Please submit a patch against the 1.6.x branch. I will handle it.

> 
>   AFAICT, the XFAILs I get after my patch are the same as before,
> but it's not the warm PASS feeling I was hoping for.

XFAILs don't matter -- they're expected failures.

Thanks,
Stefan

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 01/02/11 18:00, Mark Phippard typed ...
> I created a folder with 5001 files in it ... maybe that is not enough?

   I think the issue is to do with properties, so you'd have to make sure 
all those files have props. (we use autoprops for 
mime-type/eol-style/header-expansion).

   IIRC, upon checkout the file is copied to .svn/tmp/text-base using a 
version of the file name, but the associated prop-base file which the 
metadata are put into is called tempfileXXX.tmp, with an ever-increasing 
XXX, the calculation of which (via sequential file probing) gives the 
progressive slow-down.
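
   That probing behaviour is easy to model. The following is an illustrative 
Python sketch (the real svn_io_open_uniquely_named() is C, and its details 
differ): candidate names are tried in counter order, so once N temp files 
exist the next call costs N+1 existence probes, and creating N files costs 
on the order of N^2/2 probes in total.

```python
import os

def open_uniquely_named_seq(dirpath, prefix="tempfile", suffix=".tmp"):
    """Try tempfile1.tmp, tempfile2.tmp, ... until a free name is found.
    Sketch of the counter-based probing described above, not the C code."""
    for i in range(1, 100000):
        path = os.path.join(dirpath, "%s%d%s" % (prefix, i, suffix))
        try:
            # O_EXCL makes creation fail if the name is already taken.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            return path, fd
        except FileExistsError:
            continue  # name taken: advance the counter and probe again
    raise RuntimeError("ran out of candidate names")
```

Every leftover tempfileXXX.tmp forces each later call to walk past it, which 
matches the steadily worsening checkout times reported in this thread.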

   I don't know why it does that (or if I'm mis-interpreting) as the final 
prop-base files are named after the source file and not a temp file name.

   Certainly, I've got 1.6.15 on both Linux and Windows and they both do 
this on an NTFS drive (it was something like 85 minutes to check out our 
~5200 files [may be 6200, can't remember]).  Linux did it to ext3 in a 
little over a minute.


   I have since compiled up the latest 1.6.x branch, and patched 
svn_io_open_unique_file3() to not call svn_io_open_uniquely_named() with 
'tempfile.tmp', & instead I cut'n'paste svn_io_open_uniquely_named(), 
replacing the 1-99999 loop with 1-RAND_MAX and used the output of rand() in 
the filename to try.

   This gave me a few seconds improvement in Linux, and a drastically 
improved 5 minutes in Windows (possibly only now slower as it was writing to 
a network mapped drive).
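
   The patch described above amounts to swapping the counter for a random 
number. A hedged Python sketch of that idea (again, an illustration, not the 
actual 1.6.x patch, which is C):

```python
import os
import random

def open_uniquely_named_rand(dirpath, prefix="tempfile", suffix=".tmp"):
    """Pick the numeric part of the name at random, retrying only on a
    collision.  With a large range, almost every call succeeds on its
    first probe no matter how many temp files already exist."""
    while True:
        n = random.randrange(1, 2**31)  # stand-in for C's rand()/RAND_MAX
        path = os.path.join(dirpath, "%s%d%s" % (prefix, n, suffix))
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            return path, fd
        except FileExistsError:
            continue  # rare collision: draw another number
```

The cost per file becomes roughly constant instead of growing with the 
number of existing temp files, consistent with the drop from ~85 minutes to 
~5 reported above.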


   I couldn't use the version from trunk/1.7 as it differs too much.  I will 
try to submit the patch for someone's perusal at some point, but I couldn't 
properly test it as for some reason my build of svn out of 1.6.x svn (even 
before modifying it) fails 'make check':  all the tests pass, but I get a 
load of XFAIL lines I don't understand.

   AFAICT, the XFAILs I get after my patch are the same as before, but it's 
not the warm PASS feeling I was hoping for.

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Nick <no...@codesniffer.com>.
On Wed, 2011-02-02 at 07:52 -0500, Mark Phippard wrote:
> On Wed, Feb 2, 2011 at 7:41 AM, Geoff Rowell <ge...@gmail.com> wrote:
> > On Wed, Feb 2, 2011 at 4:09 AM, Nick <no...@codesniffer.com> wrote:
> >> On Tue, 2011-02-01 at 13:00 -0500, Mark Phippard wrote:
> >>
> >> On Wed, Jan 26, 2011 at 9:28 AM, Neil Bird <ne...@jibbyjobby.co.uk> wrote:
> >>
> >>>  We have a graphics-oriented code-base that's auto-generated and has >5000
> >>> source files in one directory.  While I can check this out OK on Linux,
> >>> we're seeing an unusable slow-down on Windows XP (NTFS), both using
> >>> Tortoise
> >>> directly, and as a test on Linux with the Windows drive mapped over CIFS.
> >>
> >> I created a folder with 5001 files in it ... maybe that is not enough?
> >>  I just used small simple text files as I was only checking for the
> >> general problem in managing the temp files and the WC metadata.
> >>
> >> Upon checkout (using 1.6.15 command line client) I did not notice any
> >> slowdown.  Windows checked out via HTTP across internet in about 49
> >> seconds as opposed to 33 from my Mac (which is a faster system).  The
> >> main thing is checkout did not seem to slow down.
> >>
> >> I did a similar test, using 5100 files in a single directory.  Each file
> >> contained only the content "file XXXX" where XXXX was the number of the file
> >> (so tiny files).  My linux system took 17 seconds, while Windows took a bit
> >> less than 2 min (but Windows is virtualized while linux is on the
> >> hardware).  I also did not notice a slow-down as the checkout proceeded.
> >> Both systems used 1.6.15 and accessed the repo via https.
> >>
> >> I did, however, notice that the time to *add* the files (done via svn add
> >> *.txt) seemed to progressively slow down.  But this was only observed by
> >> watching the files in the console as they were being added (it was
> >> relatively easy to see the rate because each file name had a linear
> >> number at the end).  I don't have any timings to back this up, though I'll
> >> collect some if anyone's interested.
> >>
> > I don't know why, but I believe the key thing here is working with
> > *binary* files.
> >
> > I noticed the same problem with a massive (10K+) amount of audio
> > snippets in a single directory.
> 
> I was thinking that this was a case where the reading/parsing/writing
> of our large entries file was causing a slowdown and moving to SQLite
> was going to bring performance gains.  Clearly that is not the case as
> trunk is much slower.
> 
> If I get another batch of free time I will try it with a lot of small PNG's.

I repeated my test of checking out a repo w/ 5100 files, but this time
using binary files (192 byte PNGs).  1m 7sec on linux, 6 min on Windows
(again, virtualized).  On windows, it went fairly quick through all the
files, and then sat for several minutes after listing the last file and
completing the command.

Time taken listing all files on Windows during checkout: ~3 min
csrss - started at 70%-80% CPU, declined to < 20% CPU by the end of the
checkout
svn.exe - inverse of csrss (took remaining CPU to 100%)

After file listing, and before command completes: ~3 min.
During this time, Windows (virtualized) took ~10% CPU of the host OS,
and the 'svn' EXE was occasionally taking 2%-10%--the guest OS was
predominantly idle.

Nick



RE: Checkout really slow in Windows with lots of files in one directory

Posted by Bob Archer <Bo...@amsi.com>.
> On Wed, Feb 2, 2011 at 7:41 AM, Geoff Rowell <ge...@gmail.com> wrote:
> > On Wed, Feb 2, 2011 at 4:09 AM, Nick <no...@codesniffer.com> wrote:
> >> On Tue, 2011-02-01 at 13:00 -0500, Mark Phippard wrote:
> >>
> >> On Wed, Jan 26, 2011 at 9:28 AM, Neil Bird <ne...@jibbyjobby.co.uk> wrote:
> >>
> >>>  We have a graphics-oriented code-base that's auto-generated and has >5000
> >>> source files in one directory.  While I can check this out OK on Linux,
> >>> we're seeing an unusable slow-down on Windows XP (NTFS), both using
> >>> Tortoise directly, and as a test on Linux with the Windows drive mapped
> >>> over CIFS.
> >>
> >> I created a folder with 5001 files in it ... maybe that is not enough?
> >>  I just used small simple text files as I was only checking for the
> >> general problem in managing the temp files and the WC metadata.
> >>
> >> Upon checkout (using 1.6.15 command line client) I did not notice any
> >> slowdown.  Windows checked out via HTTP across internet in about 49
> >> seconds as opposed to 33 from my Mac (which is a faster system).  The
> >> main thing is checkout did not seem to slow down.
> >>
> >> I did a similar test, using 5100 files in a single directory.  Each file
> >> contained only the content "file XXXX" where XXXX was the number of the
> >> file (so tiny files).  My linux system took 17 seconds, while Windows
> >> took a bit less than 2 min (but Windows is virtualized while linux is on
> >> the hardware).  I also did not notice a slow-down as the checkout
> >> proceeded.  Both systems used 1.6.15 and accessed the repo via https.
> >>
> >> I did, however, notice that the time to *add* the files (done via svn add
> >> *.txt) seemed to progressively slow down.  But this was only observed by
> >> watching the files in the console as they were being added (it was
> >> relatively easy to see the rate because each file name had a linear
> >> number at the end).  I don't have any timings to back this up, though
> >> I'll collect some if anyone's interested.
> >>
> > I don't know why, but I believe the key thing here is working with
> > *binary* files.
> >
> > I noticed the same problem with a massive (10K+) amount of audio
> > snippets in a single directory.
> 
> I was thinking that this was a case where the reading/parsing/writing
> of our large entries file was causing a slowdown and moving to SQLite
> was going to bring performance gains.  Clearly that is not the case as
> trunk is much slower.
> 
> If I get another batch of free time I will try it with a lot of small
> PNG's.

Running working copies on RAM drives in Windows makes it fly like the devil, 
for anyone inclined to give it a try.  Of course, you need some free RAM to 
be able to do it; I set up a 2GB RAM drive to try it.  I was really trying 
to improve Visual Studio performance rather than svn, so I don't use it any 
more, but I did notice that all svn operations were much, much faster.  I 
can't recall the software I tried.

BOb


Re: Checkout really slow in Windows with lots of files in one directory

Posted by Mark Phippard <ma...@gmail.com>.
On Wed, Feb 2, 2011 at 7:41 AM, Geoff Rowell <ge...@gmail.com> wrote:
> On Wed, Feb 2, 2011 at 4:09 AM, Nick <no...@codesniffer.com> wrote:
>> On Tue, 2011-02-01 at 13:00 -0500, Mark Phippard wrote:
>>
>> On Wed, Jan 26, 2011 at 9:28 AM, Neil Bird <ne...@jibbyjobby.co.uk> wrote:
>>
>>>  We have a graphics-oriented code-base that's auto-generated and has >5000
>>> source files in one directory.  While I can check this out OK on Linux,
>>> we're seeing an unusable slow-down on Windows XP (NTFS), both using
>>> Tortoise
>>> directly, and as a test on Linux with the Windows drive mapped over CIFS.
>>
>> I created a folder with 5001 files in it ... maybe that is not enough?
>>  I just used small simple text files as I was only checking for the
>> general problem in managing the temp files and the WC metadata.
>>
>> Upon checkout (using 1.6.15 command line client) I did not notice any
>> slowdown.  Windows checked out via HTTP across internet in about 49
>> seconds as opposed to 33 from my Mac (which is a faster system).  The
>> main thing is checkout did not seem to slow down.
>>
>> I did a similar test, using 5100 files in a single directory.  Each file
>> contained only the content "file XXXX" where XXXX was the number of the file
>> (so tiny files).  My linux system took 17 seconds, while Windows took a bit
>> less than 2 min (but Windows is virtualized while linux is on the
>> hardware).  I also did not notice a slow-down as the checkout proceeded.
>> Both systems used 1.6.15 and accessed the repo via https.
>>
>> I did, however, notice that the time to *add* the files (done via svn add
>> *.txt) seemed to progressively slow down.  But this was only observed by
>> watching the files in the console as they were being added (it was
>> relatively easy to see the rate because each file name had a linear
>> number at the end).  I don't have any timings to back this up, though I'll
>> collect some if anyone's interested.
>>
> I don't know why, but I believe the key thing here is working with
> *binary* files.
>
> I noticed the same problem with a massive (10K+) amount of audio
> snippets in a single directory.

I was thinking that this was a case where the reading/parsing/writing
of our large entries file was causing a slowdown and moving to SQLite
was going to bring performance gains.  Clearly that is not the case as
trunk is much slower.

If I get another batch of free time I will try it with a lot of small PNG's.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Geoff Rowell <ge...@gmail.com>.
On Wed, Feb 2, 2011 at 4:09 AM, Nick <no...@codesniffer.com> wrote:
> On Tue, 2011-02-01 at 13:00 -0500, Mark Phippard wrote:
>
> On Wed, Jan 26, 2011 at 9:28 AM, Neil Bird <ne...@jibbyjobby.co.uk> wrote:
>
>>  We have a graphics-oriented code-base that's auto-generated and has >5000
>> source files in one directory.  While I can check this out OK on Linux,
>> we're seeing an unusable slow-down on Windows XP (NTFS), both using
>> Tortoise
>> directly, and as a test on Linux with the Windows drive mapped over CIFS.
>
> I created a folder with 5001 files in it ... maybe that is not enough?
>  I just used small simple text files as I was only checking for the
> general problem in managing the temp files and the WC metadata.
>
> Upon checkout (using 1.6.15 command line client) I did not notice any
> slowdown.  Windows checked out via HTTP across internet in about 49
> seconds as opposed to 33 from my Mac (which is a faster system).  The
> main thing is checkout did not seem to slow down.
>
> I did a similar test, using 5100 files in a single directory.  Each file
> contained only the content "file XXXX" where XXXX was the number of the file
> (so tiny files).  My linux system took 17 seconds, while Windows took a bit
> less than 2 min (but Windows is virtualized while linux is on the
> hardware).  I also did not notice a slow-down as the checkout proceeded.
> Both systems used 1.6.15 and accessed the repo via https.
>
> I did, however, notice that the time to *add* the files (done via svn add
> *.txt) seemed to progressively slow down.  But this was only observed by
> watching the files in the console as they were being added (it was
> relatively easy to see the rate because each file name had a linear
> number at the end).  I don't have any timings to back this up, though I'll
> collect some if anyone's interested.
>
I don't know why, but I believe the key thing here is working with
*binary* files.

I noticed the same problem with a massive (10K+) amount of audio
snippets in a single directory.
-- 
Geoff Rowell
geoff.rowell@gmail.com

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Nick <no...@codesniffer.com>.
On Tue, 2011-02-01 at 13:00 -0500, Mark Phippard wrote:

> On Wed, Jan 26, 2011 at 9:28 AM, Neil Bird <ne...@jibbyjobby.co.uk> wrote:
> 
> >  We have a graphics-oriented code-base that's auto-generated and has >5000
> > source files in one directory.  While I can check this out OK on Linux,
> > we're seeing an unusable slow-down on Windows XP (NTFS), both using Tortoise
> > directly, and as a test on Linux with the Windows drive mapped over CIFS.
> 
> I created a folder with 5001 files in it ... maybe that is not enough?
>  I just used small simple text files as I was only checking for the
> general problem in managing the temp files and the WC metadata.
> 
> Upon checkout (using 1.6.15 command line client) I did not notice any
> slowdown.  Windows checked out via HTTP across internet in about 49
> seconds as opposed to 33 from my Mac (which is a faster system).  The
> main thing is checkout did not seem to slow down.



I did a similar test, using 5100 files in a single directory.  Each file
contained only the content "file XXXX" where XXXX was the number of the
file (so tiny files).  My linux system took 17 seconds, while Windows
took a bit less than 2 min (but Windows is virtualized while linux is on
the hardware).  I also did not notice a slow-down as the checkout
proceeded.  Both systems used 1.6.15 and accessed the repo via https.

I did, however, notice that the time to *add* the files (done via svn
add *.txt) seemed to progressively slow down.  But this was only
observed by watching the files in the console as they were being added
(it was relatively easy to see the rate because each file name had a
linear number at the end).  I don't have any timings to back this up,
though I'll collect some if anyone's interested.

Nick


Re: Checkout really slow in Windows with lots of files in one directory

Posted by Mark Phippard <ma...@gmail.com>.
On Wed, Jan 26, 2011 at 9:28 AM, Neil Bird <ne...@jibbyjobby.co.uk> wrote:

>  We have a graphics-oriented code-base that's auto-generated and has >5000
> source files in one directory.  While I can check this out OK on Linux,
> we're seeing an unusable slow-down on Windows XP (NTFS), both using Tortoise
> directly, and as a test on Linux with the Windows drive mapped over CIFS.

I created a folder with 5001 files in it ... maybe that is not enough?
 I just used small simple text files as I was only checking for the
general problem in managing the temp files and the WC metadata.

Upon checkout (using 1.6.15 command line client) I did not notice any
slowdown.  Windows checked out via HTTP across internet in about 49
seconds as opposed to 33 from my Mac (which is a faster system).  The
main thing is checkout did not seem to slow down.

The bad news is that I tried it again with trunk.  Both Windows and
Mac were much slower and I was doing the tests because I thought it
would be much faster.  Windows time went up to 210 secs and OSX to
about 90 seconds.

In all cases, the performance seemed linear.  I did not see it start
to get slower.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 26/01/11 14:41, Campbell Allan typed ...
> If the code is auto generated would it be possible to generate it for each
> build?

   Interesting question;  I shall have to ask it.  I *think*, in actual 
fact, that it was *originally* generated, but has since been tweaked 
manually or with scripts.  Or it may have been done in an 'alien' 
environment (Cygwin or Linux) to the Windows-only build process.

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Campbell Allan <ca...@sword-ciboodle.com>.
On Wednesday 26 Jan 2011, Campbell Allan wrote:
> On Wednesday 26 Jan 2011, Neil Bird wrote:
> >    We have a graphics-oriented code-base that's auto-generated and has
> >
> > >5000 source files in one directory.  While I can check this out OK on
> >
> > Linux, we're seeing an unusable slow-down on Windows XP (NTFS), both
> > using Tortoise directly, and as a test on Linux with the Windows drive
> > mapped over CIFS.
> >
> >    The checkout starts sensibly enough, but then gets steadily slower and
> > slower and slower, to the point were we're not sure it'd actually ever
> > end.
> >
> >    I know that there's a negative speed difference on NTFS, and that
> > 1.7's WC-NG might make this better, but this is getting
> > near-logarithmically slower.
> >
> >    Is that to be expected, or at least known about?
> >
> >
> >    (we're going to jigger the files around into sep. directories to get
> > the individual counts down;  I expect that to help in this instance).
>
> That is what I recall from previous reports. I originally was going to see
> if anything could be done as it sounds like a classic problem of a linear
> search/sort over a growing list. The big unanswered question was where was
> this list.
>
> If the code is auto generated would it be possible to generate it for each
> build? That's what we typically do where I work. Anything that is generated
> is not committed. A bad example would be to say I have java source code, I
> don't need to commit the compiled byte code too or jars too.

I should have added I'm not a subversion dev, just a curious interested party 
as a lot of my colleagues were still on windows at the time I first 
encountered this.



Re: Checkout really slow in Windows with lots of files in one directory

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 01/02/11 15:36, Neil Bird typed ...
> I've had no response from the email I sent him privately, so I may try that
> myself, and maybe then raise a ticket (or whatever system they use) to ask
> for the backport if it works.

   My bad, I just found Thunderbird has mis-threaded his reply.  He never 
tried it as they were using a workaround.

   I'll give it a bash myself, as even with our workaround checkout is still 
sucking up 7+ minutes.

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 28/01/11 04:32, Daniel Shahaf typed ...
> (I don't recall why the fixes on trunk weren't backported, but I imagine
> the threads linked from this thread say why.)

   The threads (that I've seen) all peter out, the last event being someone 
saying a colleague was about to compile 1.6 with that fix backported to see 
if it helped.

   I've had no response from the email I sent him privately, so I may try 
that myself, and maybe then raise a ticket (or whatever system they use) to 
ask for the backport if it works.

   There may be a good reason it can't be done (there was a 'cleaner' way 
that couldn't have been done due to it being an API change, apparently, but 
this seems to be drop-in).

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
This will probably be backported to 1.6.(x+1) if someone tells dev@ what
revisions should be backported, or provides patches that we can apply. 

(I don't recall why the fixes on trunk weren't backported, but I imagine
the threads linked from this thread say why.)

Neil Bird wrote on Thu, Jan 27, 2011 at 08:23:50 +0000:
> Around about 27/01/11 01:40, Johan Corveleyn typed ...
>> http://svn.haxx.se/dev/archive-2010-04/0180.shtml
>>
>> Maybe someone else can comment more on this, and why it wasn't
>> backported to 1.6 (or is it?).
>
>   Excellent, yes, that would seem to be it.  I did notice the huge number 
> of counted .tmp files in .svn/tmp but didn't think anything of it.
>
>   The thread petered out with someone trying the 1.7  
> svn_io_open_unique_file impl. in a 1.6 build, but I can't see any 
> feedback from that (so I'm afraid I'm going to cheekily Cc: the guys who 
> posted that that was being tried!).
>
>
>   For now, I think I'm happy to simply be aware of the situation;  as I  
> posted earlier, we're going to split the files up across a no. of diff.  
> dirs. so hopefully that won't bite us.
>
> If we are still suffering even after that, I'll try the above patch  
> myself, and then maybe I'll raise a ticket.
>
> -- 
> [neil@fnx ~]# rm -f .signature
> [neil@fnx ~]# ls -l .signature
> ls: .signature: No such file or directory
> [neil@fnx ~]# exit

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 27/01/11 01:40, Johan Corveleyn typed ...
> http://svn.haxx.se/dev/archive-2010-04/0180.shtml
>
> Maybe someone else can comment more on this, and why it wasn't
> backported to 1.6 (or is it?).

   Excellent, yes, that would seem to be it.  I did notice the huge number 
of counted .tmp files in .svn/tmp but didn't think anything of it.

   The thread petered out with someone trying the 1.7 
svn_io_open_unique_file impl. in a 1.6 build, but I can't see any feedback 
from that (so I'm afraid I'm going to cheekily Cc: the guys who posted that 
that was being tried!).


   For now, I think I'm happy to simply be aware of the situation;  as I 
posted earlier, we're going to split the files up across a no. of diff. 
dirs. so hopefully that won't bite us.

   If we are still suffering even after that, I'll try the above patch 
myself, and then maybe I'll raise a ticket.

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Jan 26, 2011 at 7:32 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Johan Corveleyn wrote on Wed, Jan 26, 2011 at 17:15:40 +0100:
>> If I have more time, I'll try to search the archives some more.
>
> If it helps your code/mail grepping, I believe you're talking about
> svn_io_open_uniquely_named().

Thanks, Daniel! That was exactly the search term I needed. Here is the
thread (from the dev-list) that describes exactly the same problem:

http://svn.haxx.se/dev/archive-2010-04/0180.shtml

Neil, please read that thread, and you'll see that it describes
exactly the same problem. It contains some interesting discussions and
points. As far as I understand it, it has been fixed for 1.7, but not
backported to 1.6 (backport was suggested, but the discussion ended
somewhere "in the middle").

Maybe someone else can comment more on this, and why it wasn't
backported to 1.6 (or is it?).

HTH,
-- 
Johan

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Wed, Jan 26, 2011 at 17:15:40 +0100:
> If I have more time, I'll try to search the archives some more.

If it helps your code/mail grepping, I believe you're talking about 
svn_io_open_uniquely_named().

(I found it by doing `fgrep '.%' subversion/libsvn_subr/*c`.)

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Jan 26, 2011 at 3:41 PM, Campbell Allan
<ca...@sword-ciboodle.com> wrote:
>
> On Wednesday 26 Jan 2011, Neil Bird wrote:
>>    We have a graphics-oriented code-base that's auto-generated and has
>> >5000 source files in one directory.  While I can check this out OK on
>> Linux, we're seeing an unusable slow-down on Windows XP (NTFS), both using
>> Tortoise directly, and as a test on Linux with the Windows drive mapped
>> over CIFS.
>>
>>    The checkout starts sensibly enough, but then gets steadily slower and
>> slower and slower, to the point where we're not sure it'd actually ever end.
>>
>>    I know that there's a negative speed difference on NTFS, and that 1.7's
>> WC-NG might make this better, but this is getting near-logarithmically
>> slower.
>>
>>    Is that to be expected, or at least known about?
>>
>>
>>    (we're going to jigger the files around into sep. directories to get the
>> individual counts down;  I expect that to help in this instance).
>
> That is what I recall from previous reports. I originally was going to see if
> anything could be done, as it sounds like a classic problem of a linear
> search/sort over a growing list. The big unanswered question was where
> this list was.
>
> If the code is auto-generated, would it be possible to generate it for each
> build? That's what we typically do where I work: anything that is generated
> is not committed. A crude example: if I have Java source code, I don't need
> to commit the compiled byte code or the jars as well.

I seem to remember that this has something to do with the way the svn
client determines unique names for its temp files during such a
checkout. Something like: it first tries filename 'tmpfile1', if that
exists it tries 'tmpfile2', then 'tmpfile3', ... and so on. So if it's
checking out file number 5000, it first tries 4999 filenames that are
already in use, and only then comes to the conclusion that
'tmpfile5000' is the unique filename it should use. That could explain
the 'ever slowing down' behavior that you see, when more and more
files in the same dir get checked out.

I'm not entirely sure, but I vaguely remember that this came up as a
thread on the users list or on the dev list (but unfortunately, I
can't find it right now). I also seem to remember that this was fixed
on trunk, so should be much better in 1.7 (by choosing the unique
filenames randomly (and then checking if it already exists), instead
of with an incrementing number). Again, I can't find the commit or
dev-list discussion, but it's floating in the back of my head
somewhere...
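Sketching that in Python (purely illustrative -- the real code is C, in
libsvn_subr, and every name below is made up): count how many "does this
file exist?" checks each naming scheme needs to create 5000 files in one
directory.

```python
import uuid

def sequential_unique(base, taken):
    """Pre-1.7 style: try base.1.tmp, base.2.tmp, ... until a free name
    is found.  The N-th file costs N existence checks, so N files cost
    O(N^2) checks in total."""
    i, checks = 1, 0
    while True:
        checks += 1
        name = "%s.%d.tmp" % (base, i)
        if name not in taken:
            taken.add(name)
            return checks
        i += 1

def random_unique(base, taken):
    """1.7 style: pick a random suffix and retry only on a (rare)
    collision -- expected O(1) checks per file."""
    checks = 0
    while True:
        checks += 1
        name = "%s.%s.tmp" % (base, uuid.uuid4().hex[:8])
        if name not in taken:
            taken.add(name)
            return checks

N = 5000
seq_taken, rand_taken = set(), set()
seq_checks = sum(sequential_unique("tempfile", seq_taken) for _ in range(N))
rand_checks = sum(random_unique("tempfile", rand_taken) for _ in range(N))
print(seq_checks)   # 12502500 -- N*(N+1)/2 checks for N = 5000
print(rand_checks)  # close to N -- roughly one check per file
```

That N*(N+1)/2 growth would match the "steadily slower and slower"
behaviour Neil describes.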

If I have more time, I'll try to search the archives some more.

In the meantime (while we're waiting for 1.7 :-)): splitting it up
into multiple directories seems like a good workaround...
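A sketch of that workaround (hypothetical Python, made-up paths): spread a
flat directory of generated files across subdirectories keyed by a hash
prefix of the filename -- the same idea FSFS sharding uses on the server.

```python
import hashlib
import os
import shutil

def shard_directory(src_dir, dest_dir, width=2):
    """Copy every file in the flat src_dir into one of 256 buckets under
    dest_dir, named by the first `width` hex digits of the filename's
    MD5.  ~5000 files then land in buckets of ~20 files each, keeping
    any per-directory cost small."""
    for name in sorted(os.listdir(src_dir)):
        bucket = hashlib.md5(name.encode("utf-8")).hexdigest()[:width]
        bucket_dir = os.path.join(dest_dir, bucket)
        os.makedirs(bucket_dir, exist_ok=True)
        shutil.copy2(os.path.join(src_dir, name),
                     os.path.join(bucket_dir, name))
```

Hashing keeps the distribution even regardless of how the generator names
its files; a simple first-letter split would also work if the names are
already well spread.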

Cheers,
-- 
Johan

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Campbell Allan <ca...@sword-ciboodle.com>.
On Wednesday 26 Jan 2011, Neil Bird wrote:
>    We have a graphics-oriented code-base that's auto-generated and has
> >5000 source files in one directory.  While I can check this out OK on
> Linux, we're seeing an unusable slow-down on Windows XP (NTFS), both using
> Tortoise directly, and as a test on Linux with the Windows drive mapped
> over CIFS.
>
>    The checkout starts sensibly enough, but then gets steadily slower and
> slower and slower, to the point where we're not sure it'd actually ever end.
>
>    I know that there's a negative speed difference on NTFS, and that 1.7's
> WC-NG might make this better, but this is getting near-logarithmically
> slower.
>
>    Is that to be expected, or at least known about?
>
>
>    (we're going to jigger the files around into sep. directories to get the
> individual counts down;  I expect that to help in this instance).

That is what I recall from previous reports. I originally was going to see if 
anything could be done, as it sounds like a classic problem of a linear 
search/sort over a growing list. The big unanswered question was where 
this list was.

If the code is auto-generated, would it be possible to generate it for each 
build? That's what we typically do where I work: anything that is generated 
is not committed. A crude example: if I have Java source code, I don't need 
to commit the compiled byte code or the jars as well.

-- 

__________________________________________________________________________________
Sword Ciboodle is the trading name of ciboodle Limited (a company 
registered in Scotland with registered number SC143434 and whose 
registered office is at India of Inchinnan, Renfrewshire, UK, 
PA4 9LH) which is part of the Sword Group of companies.

This email (and any attachments) is intended for the named
recipient(s) and is private and confidential. If it is not for you, 
please inform us and then delete it. If you are not the intended 
recipient(s), the use, disclosure, copying or distribution of any 
information contained within this email is prohibited. Messages to 
and from us may be monitored. If the content is not about the 
business of the Sword Group then the message is neither from nor 
sanctioned by us.

Internet communications are not secure. You should scan this
message and any attachments for viruses. Under no circumstances
do we accept liability for any loss or damage which may result from
your receipt of this email or any attachment.
__________________________________________________________________________________


Re: Checkout really slow in Windows with lots of files in one directory

Posted by Andy Levy <an...@gmail.com>.
On Wed, Jan 26, 2011 at 11:59, Neil Bird <ne...@jibbyjobby.co.uk> wrote:
> Around about 26/01/11 14:43, Andy Levy typed ...
>>
>> It's known and oft-lamented. NTFS just doesn't handle this scenario
>> well - it's probably one of the reasons FSFS sharding was introduced
>> (I'm speculating a bit here).
>
>  This is stuff that's currently in SourceSafe, which doesn't exhibit any
> obvious issues (well, apart from the obvious :) ), so I can't blame NTFS
> just for not handling the sheer volume of files.

SourceSafe (last I knew) manages working copies differently.
Subversion downloads into a temp location inside .svn, then copies
that to the "real" location in .svn, then copies to the "visible"
working copy. There's a *lot* more I/O going on with Subversion.

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 26/01/11 14:43, Andy Levy typed ...
> It's known and oft-lamented. NTFS just doesn't handle this scenario
> well - it's probably one of the reasons FSFS sharding was introduced
> (I'm speculating a bit here).

   This is stuff that's currently in SourceSafe, which doesn't exhibit any 
obvious issues (well, apart from the obvious :) ), so I can't blame NTFS 
just for not handling the sheer volume of files.


> How's the checkout performance with a command-line client on that XP
> box? It could also be your on-access virus scanner, and testing w/ the
> command-line client may help diagnose that.

   I tested Linux command-line over a CIFS mount to the NTFS drive, on a box 
for which that directory was excluded for on-access scan, so the virus 
scanner isn't in the loop.  Could be some other IT spyware^H^H^H^H^H^H 
logging software, though.  But there's nothing using excess CPU in that case.


Re: Checkout really slow in Windows with lots of files in one directory

Posted by Thorsten Schöning <ts...@am-soft.de>.
Guten Tag Neil Bird,
am Mittwoch, 26. Januar 2011 um 18:02 schrieben Sie:

>    When we do the checkout via tortoise, tortoise sits at 100% CPU load, and
> I don't see anything else fighting it for time.

This could be the cache processing the events from the shell about
updated and new files. I often see this for a few seconds when I just
enter my larger working copies, without even updating or anything else.

Mit freundlichen Grüßen,

Thorsten Schöning

-- 
Thorsten Schöning
AM-SoFT IT-Systeme - Hameln | Potsdam | Leipzig
 
Telefon: Potsdam: 0331-743881-0
E-Mail:  tschoening@am-soft.de
Web:     http://www.am-soft.de

AM-SoFT GmbH IT-Systeme, Konsumhof 1-5, 14482 Potsdam
Amtsgericht Potsdam HRB 21278 P, Geschäftsführer: Andreas Muchow


Re: Checkout really slow in Windows with lots of files in one directory

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 26/01/11 15:24, Stefan Sperling typed ...
> Don't run virus scanners on working copies.
> If you want to check for viruses in the repository, use a designated
> working copy and trigger a scan of changed files from the post-commit hook.

   The virus scanner excludes this particular directory on my box, which 
also suffers the slow-down.

   When we do the checkout via tortoise, tortoise sits at 100% CPU load, and 
I don't see anything else fighting it for time.

   I've not monitored the load on the Windows svn command-line.


Re: Checkout really slow in Windows with lots of files in one directory

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Jan 26, 2011 at 09:43:15AM -0500, Andy Levy wrote:
> NTFS just doesn't handle this scenario
> well - it's probably one of the reasons FSFS sharding was introduced
> (I'm speculating a bit here).

IIRC that is correct.
Sharding was introduced to prevent long-running readdir() system calls.
It is on the server though and does not affect client behaviour.

So while the client might suffer similar issues with directories with
many files in it, a virus scanner is much more likely to cause notable
delays on Windows. Subversion creates temporary files and then moves
them into place. Virus scanners don't interact well with that.
Especially since it's impossible to rename open files on Windows.

Don't run virus scanners on working copies.
If you want to check for viruses in the repository, use a designated
working copy and trigger a scan of changed files from the post-commit hook.

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Andy Levy <an...@gmail.com>.
On Wed, Jan 26, 2011 at 09:28, Neil Bird <ne...@jibbyjobby.co.uk> wrote:
>
>  We have a graphics-oriented code-base that's auto-generated and has >5000
> source files in one directory.  While I can check this out OK on Linux,
> we're seeing an unusable slow-down on Windows XP (NTFS), both using Tortoise
> directly, and as a test on Linux with the Windows drive mapped over CIFS.
>
>  The checkout starts sensibly enough, but then gets steadily slower and
> slower and slower, to the point where we're not sure it'd actually ever end.
>
>  I know that there's a negative speed difference on NTFS, and that 1.7's
> WC-NG might make this better, but this is getting near-logarithmically
> slower.
>
>  Is that to be expected, or at least known about?

It's known and oft-lamented. NTFS just doesn't handle this scenario
well - it's probably one of the reasons FSFS sharding was introduced
(I'm speculating a bit here).

How's the checkout performance with a command-line client on that XP
box? It could also be your on-access virus scanner, and testing w/ the
command-line client may help diagnose that.

RE: Checkout really slow in Windows with lots of files in one directory

Posted by "Echlin, Jamie" <ja...@credit-suisse.com>.
>    We see exactly the same problem on two diff. Windows PCs, 
> but I'll give ImDisk a bash in the morning for the sake of 
> experimentation.

Have you tried Process Explorer? Look at the stack trace of one of the
events when it's started slowing down. Make sure the symbol server is
configured. That can be very helpful...

Nothing funny with NTFS ACLs? Is the ACL it's inheriting sensible?
Trying on a FAT virtual disk would help rule that out.

=============================================================================== 
Please access the attached hyperlink for an important electronic communications disclaimer: 
http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html 
=============================================================================== 


Re: Checkout really slow in Windows with lots of files in one directory

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 26/01/11 15:30, Echlin, Jamie typed ...
> Perhaps your Master File Table is fragmented... In which case a defrag
> might help.

   We see exactly the same problem on two diff. Windows PCs, but I'll give 
ImDisk a bash in the morning for the sake of experimentation.

   And I recently defragged mine.


RE: Checkout really slow in Windows with lots of files in one directory

Posted by "Echlin, Jamie" <ja...@credit-suisse.com>.
On second thoughts, my previous mail relates to when you have a massive
number of sub-dirs in your working copy. I didn't properly read your
mail, sorry.

Perhaps your Master File Table is fragmented... In which case a defrag
might help. 



Re: Checkout really slow in Windows with lots of files in one directory

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 27/01/11 12:51, Nico Kadel-Garcia typed ...
> Are you *normally* out to a CIFS shared location, or to local NTFS
> disk?

   Not ... as such.  No, this was particularly to test the client code/OS to 
try to pin things down for this issue.


Re: Checkout really slow in Windows with lots of files in one directory

Posted by Nico Kadel-Garcia <nk...@gmail.com>.
On Wed, Jan 26, 2011 at 9:28 AM, Neil Bird <ne...@jibbyjobby.co.uk> wrote:
>
>  We have a graphics-oriented code-base that's auto-generated and has >5000
> source files in one directory.  While I can check this out OK on Linux,
> we're seeing an unusable slow-down on Windows XP (NTFS), both using Tortoise
> directly, and as a test on Linux with the Windows drive mapped over CIFS.

Are you *normally* out to a CIFS shared location, or to local NTFS
disk? CIFS is very "chatty", and can often take 10 times as long to do
a checkout of a 1 Gig software working copy (in my personal
experience, 30 minutes instead of 3 for local disk).

Checking out to local disk, then copying it over to CIFS, is grotesquely
faster. Maintaining a nightly-updated, checked-out working copy for
copying to local working copies is also very effective at avoiding this
slow-down: the working copy can then be updated far, far more
efficiently.

>  The checkout starts sensibly enough, but then gets steadily slower and
> slower and slower, to the point where we're not sure it'd actually ever end.
>
>  I know that there's a negative speed difference on NTFS, and that 1.7's
> WC-NG might make this better, but this is getting near-logarithmically
> slower.
>
>  Is that to be expected, or at least known about?
>
>
>  (we're going to jigger the files around into sep. directories to get the
> individual counts down;  I expect that to help in this instance).
>
>

Re: Checkout really slow in Windows with lots of files in one directory

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Jan 26, 2011, at 08:28, Neil Bird wrote:

>  We have a graphics-oriented code-base that's auto-generated and has >5000 source files in one directory.  While I can check this out OK on Linux, we're seeing an unusable slow-down on Windows XP (NTFS), both using Tortoise directly, and as a test on Linux with the Windows drive mapped over CIFS.
> 
>  The checkout starts sensibly enough, but then gets steadily slower and slower and slower, to the point where we're not sure it'd actually ever end.
> 
>  I know that there's a negative speed difference on NTFS, and that 1.7's WC-NG might make this better, but this is getting near-logarithmically slower.
> 
>  Is that to be expected, or at least known about?

If, as you posted in your other thread, there is an on-access virus scanner on this PC, then that could also be significant. Try turning off the virus scanner or excluding the directory where you're doing this checkout.



RE: Checkout really slow in Windows with lots of files in one directory

Posted by "Echlin, Jamie" <ja...@credit-suisse.com>.
>    I know that there's a negative speed difference on NTFS, 
> and that 1.7's WC-NG might make this better, but this is 
> getting near-logarithmically slower.

There's good information about NTFS wrt subversion here:
http://superuser.com/questions/15192/bad-ntfs-performance

I found FAT was 10-20x faster than NTFS for the lock/unlock working copy
operation, but I never saw a case where it was logarithmically slow or
appeared to get slower as it went on.

It's easy to test this. Install ImDisk from here:
http://www.ltr-data.se/opencode.html/#ImDisk which is very
straightforward (no reboot), and create two virtual disks, format one as
ntfs and one as FAT32. 

Then compare the timings between the three (the third being the physical
disk).

The differences are magnified when the amount of actual work to do is
small compared to the time required for the lock/unlock operation, eg an
svn update when nothing has changed on the server.
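For a rough timing harness (a hypothetical Python sketch -- run it once
per volume, e.g. the physical NTFS disk, the ImDisk NTFS disk, and the
FAT32 one, and compare the numbers):

```python
import os
import tempfile
import time
import uuid

def time_file_churn(dirpath, n=500):
    """Create, rename, and delete n small files in dirpath -- roughly the
    kind of temp-file churn a working copy generates under .svn -- and
    return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    for _ in range(n):
        tmp = os.path.join(dirpath, uuid.uuid4().hex + ".tmp")
        with open(tmp, "wb") as f:
            f.write(b"x" * 512)
        final = tmp + ".done"
        os.rename(tmp, final)   # the rename step is where virus scanners bite
        os.remove(final)
    return time.monotonic() - start

# Point dirpath at a directory on each volume under test.
with tempfile.TemporaryDirectory() as d:
    print("%.2fs" % time_file_churn(d))
```

It isn't a faithful reproduction of what the client does, but if one
filesystem is 10-20x slower here, that difference will show up.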

jamie
