You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Neil Bird <ne...@jibbyjobby.co.uk> on 2011/02/03 17:18:32 UTC

[PATCH] issue #3719 fix slow large checkouts on Windows

   This is a patch for discussion which I submitted to issue 3719 
(“Extremely slow checkout on Windows”).  I shan't repeat everything that's 
there;  essentially it fixes *really* slow checkouts to NTFS of directories 
with large numbers of files with properties.


   I originally modified it slightly to address my second comment on that 
issue, namely that just using rand() will leave it with slightly less 
functionality than the previous 99999 loop.

   However, that was relying upon my Linux man page telling me RAND_MAX was 
32767;  it's actually 2147483647.  I think it's still smaller on Windows, 
though, so it would suffer that limitation.  Should we include a small, 
custom int-max rand() just for this?

   Otherwise it might be a #if on RAND_MAX.


   A supplementary question is:  should I worry about where rand() is 
included from?  It “just worked” on my Linux build, but I've not tried for 
Windows, etc.


   Apologies if I haven't formatted all this correctly.


[[[
Fix issue #3719: prevent logarithmic slow down when checking out large 
directories onto Windows NTFS

* subversion/libsvn_subr/io.c
   svn_io_open_unique_file3: don't call svn_io_open_uniquely_named(), but 
instead use a copy of that routine that uses rand() to get the next 
potential free number instead if simply incrementing, minimising clashes on 
repeated calls.

]]]

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Julian Foad <ju...@wandisco.com>.
Neil Bird wrote:
> Around about 04/02/11 10:52, Stefan Fuhrmann typed ...
> >> I'll give it another bash, though , if you think it's worth it.
> >>
> > Definitely. It contains quite a number of file access
> > optimizations that should become best visible on
> > "high overhead" FS like NTFS.
> 
>    It's turning out to be the PITA I expected.
> 
>    Are you recommending I backport libsvn_subr/io.c in it's entirety?

Please only attempt to back-port one functional change at a time, not
all the changes in the whole file.

>   I'm 
> currently only adopting the svn_io_open_unique_file3() routine and it's 
> flurry of associated helper functions.

That sounds fine if all the changes in that group of functions are part
of a single functional change.

- Julian



Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Mar 02, 2011 at 08:31:01AM +0000, Neil Bird wrote:
> Around about 01/03/11 17:13, Stefan Sperling typed ...
> >Great! Here's a new version that includes a fix for the lock_tests
> >failures. I'll propose this for backport now. Thanks for providing
> >the initial patch submission for this and for helping with testing!
> 
>   Yep, that's the kiddie!  Thanks for the effort;  I'm only sorry I
> couldn't be of any actual help bar testing in the end :)

You've been great help. The fact alone that you started writing a
patch made this issue more likely to get fixed because patch
submissions generally receive more attention than "me too!" problem
reports. This is not because such problem reports aren't being taken
seriously. It's because there's an additional incentive for existing
developers to keep an extra eye on patch submissions. Because that's
how we can grow the team :) So I hope you'll come back some day and
fix another problem, however small it might be, and that you've enjoyed
this exercise even if you don't have any reason to come back.

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 01/03/11 17:13, Stefan Sperling typed ...
> Great! Here's a new version that includes a fix for the lock_tests
> failures. I'll propose this for backport now. Thanks for providing
> the initial patch submission for this and for helping with testing!

   Yep, that's the kiddie!  Thanks for the effort;  I'm only sorry I 
couldn't be of any actual help bar testing in the end :)

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Mar 01, 2011 at 04:43:47PM +0000, Neil Bird wrote:
> Around about 01/03/11 16:34, Stefan Sperling typed ...
> >It's not perfect yet. There are regression test failures around
> >setting file permissions for svn:needs-lock and svn:executable.
> 
>   Yes, I saw those.
> 
> 
> >But the most interesting bit of information is whether it fixes
> >the performance issues, which I cannot test (no windows box around).
> 
>   Well, it seemed to go OK;  it's taking 8 mins. instead of 85 (but
> that's slower than it should be because both our network and my
> Windows PC are playing up a bit at the moment).
> 
>   It's the same sort of speed I was seeing this morning with my
> noddy patch, certainly.

Great! Here's a new version that includes a fix for the lock_tests
failures. I'll propose this for backport now. Thanks for providing
the initial patch submission for this and for helping with testing!

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 01/03/11 16:34, Stefan Sperling typed ...
> It's not perfect yet. There are regression test failures around
> setting file permissions for svn:needs-lock and svn:executable.

   Yes, I saw those.


> But the most interesting bit of information is whether it fixes
> the performance issues, which I cannot test (no windows box around).

   Well, it seemed to go OK;  it's taking 8 mins. instead of 85 (but that's 
slower than it should be because both our network and my Windows PC are 
playing up a bit at the moment).

   It's the same sort of speed I was seeing this morning with my noddy 
patch, certainly.

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Mar 01, 2011 at 04:08:30PM +0000, Neil Bird wrote:
> Around about 01/03/11 16:03, Stefan Sperling typed ...
> >Neil, can you try the attached patch, please? Thanks!
> 
>   OK, are you just trying to embarrass me now by getting an
> alternative patch out so quickly??  :-)

Heh. No, that's not the intention :)
 
>   I'll try it now ...

It's not perfect yet. There are regression test failures around
setting file permissions for svn:needs-lock and svn:executable.

But the most interesting bit of information is whether it fixes
the performance issues, which I cannot test (no windows box around).

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 01/03/11 15:19, Daniel Shahaf typed ...
> The magic is "add 840074 to the old revnum to get the new one":
> http://svn.apache.org/repos/asf/subversion/README

   Ah, all is clear[er].

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Quick reply to address just this one point:

Neil Bird wrote on Tue, Mar 01, 2011 at 14:01:34 +0000:
>   All that added to the fact that the revision numbers seem to have
> all changed (the move from Tigris to Apache?), so even when the
> comments *do* say that they're fixing blah from rev. xxx, the xxx is
> wrong.

The magic is "add 840074 to the old revnum to get the new one":
http://svn.apache.org/repos/asf/subversion/README

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 01/03/11 16:03, Stefan Sperling typed ...
> Neil, can you try the attached patch, please? Thanks!

   OK, are you just trying to embarrass me now by getting an alternative 
patch out so quickly??  :-)

   I'll try it now ...

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 01/03/11 16:03, Stefan Sperling typed ...
> Neil, can you try the attached patch, please? Thanks!

Running all tests in lock_tests.py [57/71]...FAILURE
FAIL:  lock_tests.py 25: svn:needs-lock and svn:executable, part I
FAIL:  lock_tests.py 26: svn:needs-lock and svn:executable, part II


   I'm pretty certain that one of my attempts did that.  I'd had to backport 
one of the file perms routines in io.c, from trunk in its entirety to get an 
API that the trunk temp file code wanted, but I couldn't see why it would 
now be failing.

   I think it stuffed up in the routines that get/merge default file 
permissions.


   It does appear to fix the slowdown, though.

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Mar 01, 2011 at 04:42:28PM +0100, Stefan Sperling wrote:
> I will try backporting the trunk code to 1.6.x myself.
> If that gets anywhere then we can use it.

Neil, can you try the attached patch, please? Thanks!

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Mar 01, 2011 at 02:01:34PM +0000, Neil Bird wrote:
> Around about 08/02/11 10:01, Neil Bird typed ...
>   Just to re-iterate, what I've done is change
> svn_io_open_unique_file3() so that instead of just calling
> svn_io_open_uniquely_named() it is in fact a verbatim copy of
> svn_io_open_uniquely_named(), except that it uses rand() (after a
> fashion) to generate iterative temporary file names instead of
> counting from 1.

The problem with this approach is that is also changes the numbers
for temporary files used to store log messages. These temporary
files are part of the user interface, in a way.
I think that using random numbers in those files names might confuse people.
And we're generally trying to avoid making such changes in a patch release.

When there are several svn-commit.tmp files, I find it quite convenient
to be able to tell from the filename which file is the latest.

>   I'll include the latest patch here, and put it on the bug entry
> for posterity as well.
> 
>   If you're happy with it, great, if not, well, so be it.

Thanks for trying!

I will try backporting the trunk code to 1.6.x myself.
If that gets anywhere then we can use it.

Else, we could use a slightly modified version of your patch 
that keeps the existing behaviour for log message temp files.

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 08/02/11 10:01, Neil Bird typed ...
> Around about 08/02/11 09:03, Daniel Shahaf typed ...
>> Thanks. However, to clarify, I'm not specifically interested in the
>> "give us N revisions" form; I'm just interested in seeing a coherent
>> patch at the end, and wanted to ensure you haven't forgotten that in
>> depth of coding.
>
> It's just going to take longer that I planned :)

   OK, I tried back-porting (svn merge) actual trunk revisions to get a 
working fix in place, but I stopped when I hit 7 of them (just for this one 
change) and knew (by the volume of conflicts I was having to patch) that it 
was a no-goer.

   Even if I had got it compiling and working, there would have been so many 
tertiary changes that it would have been difficult to prove that it wasn't 
breaking more than it fixed (there are an awful lot of secondary & important 
fixes in trunk relating to the initial re-working;  I'd have almost 
certainly missed one).

   All that added to the fact that the revision numbers seem to have all 
changed (the move from Tigris to Apache?), so even when the comments *do* 
say that they're fixing blah from rev. xxx, the xxx is wrong.


   So, I'm going back to my initial change.  It's not terribly elegant, but 
it is at least clean, does the job, and passes the tests.

   Just to re-iterate, what I've done is change svn_io_open_unique_file3() 
so that instead of just calling svn_io_open_uniquely_named() it is in fact a 
verbatim copy of svn_io_open_uniquely_named(), except that it uses rand() 
(after a fashion) to generate iterative temporary file names instead of 
counting from 1.

   This time, I believe I have catered for the case where RAND_MAX is only 
32767 (my Windows at least) without being OS-specific.  I have tested it on 
Linux on both ext3 and onto a shared NTFS drive (which used to exhibit the 
fault).

   I have not compiled it for Windows (I don't have a Windows subversion 
build environment), but I have compiled both sides of the ifdef I added.


   I'll include the latest patch here, and put it on the bug entry for 
posterity as well.

   If you're happy with it, great, if not, well, so be it.


[[[
Fix issue #3719: prevent logarithmic slow down when checking out large 
directories onto Windows NTFS

* subversion/libsvn_subr/io.c
   svn_io_open_unique_file3: don't call svn_io_open_uniquely_named(), but 
instead use a copy of that routine that uses rand() to get the next 
potential free number instead if simply incrementing, minimising clashes on 
repeated calls.

]]]

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 08/02/11 09:03, Daniel Shahaf typed ...
> Thanks.  However, to clarify, I'm not specifically interested in the
> "give us N revisions" form; I'm just interested in seeing a coherent
> patch at the end, and wanted to ensure you haven't forgotten that in
> depth of coding.
> Good luck / and thanks,

   That's fine;  I'm all too aware that this'd be a patch on a stable 
branch, with all the extra care that entails.  I'd already decided that if 
the final solution hid the wood with the trees I'd maybe not submit it. 
Well, I'll probably post what I've done for interest if nothing else.

   It's just going to take longer that I planned :)

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Neil Bird wrote on Tue, Feb 08, 2011 at 08:39:04 +0000:
> Around about 08/02/11 01:58, Daniel Shahaf typed ...
>> I'm concerned; that doesn't sound like a good process to develop
>> a patch.  Normally backporting a patch is a matter of finding
>> N applicable revisions and merging them... but it sounds that here
>> you're re-developing the feature from scratch.
>
>   Which is why this route wasn't my first choice.  The required changes  
> look as if they're too far down to line to apply separately;  they 
> require previous, possibly unrelated, revisions.
>
>   I'll take the time later to see if I can figure out exactly how many  
> trunk revisions I'd need.
>

Thanks.  However, to clarify, I'm not specifically interested in the
"give us N revisions" form; I'm just interested in seeing a coherent
patch at the end, and wanted to ensure you haven't forgotten that in
depth of coding.

>
>> If it won't compile on windows, for example, I'd be concerned that you
>> introduced run-time bugs on unix.
>
>   I poorly phrased it;  all I meant was, I'm likely going to be unable to 
> verify that it builds on Windows.  The code's actually not as ifdef'd as 
> it first appears, and a quick walk-though makes me think it *should* be 
> OK.  I just won't be able to check myself.

No problem here, then.

> 

Good luck / and thanks,

Daniel

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 08/02/11 01:58, Daniel Shahaf typed ...
> I'm concerned; that doesn't sound like a good process to develop
> a patch.  Normally backporting a patch is a matter of finding
> N applicable revisions and merging them... but it sounds that here
> you're re-developing the feature from scratch.

   Which is why this route wasn't my first choice.  The required changes 
look as if they're too far down to line to apply separately;  they require 
previous, possibly unrelated, revisions.

   I'll take the time later to see if I can figure out exactly how many 
trunk revisions I'd need.


> If it won't compile on windows, for example, I'd be concerned that you
> introduced run-time bugs on unix.

   I poorly phrased it;  all I meant was, I'm likely going to be unable to 
verify that it builds on Windows.  The code's actually not as ifdef'd as it 
first appears, and a quick walk-though makes me think it *should* be OK.  I 
just won't be able to check myself.

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Neil Bird wrote on Fri, Feb 04, 2011 at 12:53:31 +0000:
> Around about 04/02/11 12:06, Neil Bird typed ...
> >It's turning out to be the PITA I expected.
> 
>   OK, I've backported enough of the trunk copy to get it compiling
> for Linux, but it now fails 2 tests.  I'll investigate next week.
> It almost certainly won't compile for Windows, either, as there's
> quite a bit of ifdef'd code I can't check.

I'm concerned; that doesn't sound like a good process to develop
a patch.  Normally backporting a patch is a matter of finding
N applicable revisions and merging them... but it sounds that here
you're re-developing the feature from scratch.

If it won't compile on windows, for example, I'd be concerned that you
introduced run-time bugs on unix.

> 
> -- 
> [neil@fnx ~]# rm -f .signature
> [neil@fnx ~]# ls -l .signature
> ls: .signature: No such file or directory
> [neil@fnx ~]# exit

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 04/02/11 12:06, Neil Bird typed ...
> It's turning out to be the PITA I expected.

   OK, I've backported enough of the trunk copy to get it compiling for 
Linux, but it now fails 2 tests.  I'll investigate next week.  It almost 
certainly won't compile for Windows, either, as there's quite a bit of 
ifdef'd code I can't check.

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 04/02/11 10:52, Stefan Fuhrmann typed ...
>> I'll give it another bash, though , if you think it's worth it.
>>
> Definitely. It contains quite a number of file access
> optimizations that should become best visible on
> "high overhead" FS like NTFS.

   It's turning out to be the PITA I expected.

   Are you recommending I backport libsvn_subr/io.c in it's entirety?  I'm 
currently only adopting the svn_io_open_unique_file3() routine and it's 
flurry of associated helper functions.

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Stefan Fuhrmann <eq...@web.de>.
On 04.02.2011 09:08, Neil Bird wrote:
> Around about 03/02/11 17:01, Stefan Sperling typed ...
>>> * subversion/libsvn_subr/io.c
>>>    svn_io_open_unique_file3: don't call svn_io_open_uniquely_named(),
>>> but instead use a copy of that routine
>> Which routine, precisely?
>
>   svn_io_open_uniquely_named();  sorry, I thought that was clear enough.

I had a similar patch in place on the performance branch.
Due to changes in the calling functions, the extra randomization
has now become unnecessary.
>
>> Have you taken a look at the implementation of 
>> svn_io_open_unique_file3()
>> in Subversion's trunk code?
>

>   Yes, and in fact that was what I initially tried, but (with 
> admittedly little effort) it wouldn't compile for several reasons that 
> seemed to be down to other API changes that weren't in 1.6.x (apr ones 
> IIRC).
>
>   I'll give it another bash, though , if you think it's worth it.
>
Definitely. It contains quite a number of file access
optimizations that should become best visible on
"high overhead" FS like NTFS.

-- Stefan^2.


Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Neil Bird <ne...@jibbyjobby.co.uk>.
Around about 03/02/11 17:01, Stefan Sperling typed ...
>> * subversion/libsvn_subr/io.c
>>    svn_io_open_unique_file3: don't call svn_io_open_uniquely_named(),
>> but instead use a copy of that routine
> Which routine, precisely?

   svn_io_open_uniquely_named();  sorry, I thought that was clear enough.


> Have you taken a look at the implementation of svn_io_open_unique_file3()
> in Subversion's trunk code?

   Yes, and in fact that was what I initially tried, but (with admittedly 
little effort) it wouldn't compile for several reasons that seemed to be 
down to other API changes that weren't in 1.6.x (apr ones IIRC).

   I'll give it another bash, though , if you think it's worth it.

-- 
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit

Re: [PATCH] issue #3719 fix slow large checkouts on Windows

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Feb 03, 2011 at 04:18:32PM +0000, Neil Bird wrote:
> 
>   This is a patch for discussion which I submitted to issue 3719
> (“Extremely slow checkout on Windows”).  I shan't repeat everything
> that's there;  essentially it fixes *really* slow checkouts to NTFS
> of directories with large numbers of files with properties.
> 
> 
>   I originally modified it slightly to address my second comment on
> that issue, namely that just using rand() will leave it with
> slightly less functionality than the previous 99999 loop.
> 
>   However, that was relying upon my Linux man page telling me
> RAND_MAX was 32767;  it's actually 2147483647.  I think it's still
> smaller on Windows, though, so it would suffer that limitation.
> Should we include a small, custom int-max rand() just for this?
> 
>   Otherwise it might be a #if on RAND_MAX.
> 
> 
>   A supplementary question is:  should I worry about where rand() is
> included from?  It “just worked” on my Linux build, but I've not
> tried for Windows, etc.
> 
> 
>   Apologies if I haven't formatted all this correctly.
> 
> 
> [[[
> Fix issue #3719: prevent logarithmic slow down when checking out
> large directories onto Windows NTFS
> 
> * subversion/libsvn_subr/io.c
>   svn_io_open_unique_file3: don't call svn_io_open_uniquely_named(),
> but instead use a copy of that routine

Which routine, precisely?

> that uses rand() to get the
> next potential free number instead if simply incrementing,
> minimising clashes on repeated calls.

Have you taken a look at the implementation of svn_io_open_unique_file3()
in Subversion's trunk code? This has already been changed, and it looks
like it already solves the performance problem. It also has a special case
for windows -- see the temp_file_create() helper function.

I think it would be better to backport that code to 1.6.x, instead of
changing svn_io_open_unique_file3() for 1.6.x in some other way.

Stefan