You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Vladimir Prus <gh...@cs.msu.su> on 2002/07/24 09:14:13 UTC

"svn diff" performance

Hello, 
I've just run

    time svn diff -r2674:2675 http://svn.collab.net/repos/svn

and the output is
    real    1m37.448s
    user    0m0.230s
    sys     0m0.030s

This is somewhat long, given that the diff has only one line, and it 2675 is 
head revision.

I've previously run, using local repository

     time svn diff -r42:43 http://localhost:7813/repos/whale/

and got

    real    0m19.510s
    user    0m0.030s
    sys     0m0.010s

This is certainly a ra_dav problem, because

    time svn diff -r42:43 file://localhost/home/ghost/local/svn/whale

gives

    real    0m0.146s
    user    0m0.030s
    sys     0m0.060s

I've looked at ethereal log for the case of localhost. Unless I've missed 
some packets the situation is: client and server talk for a couple of 
seconds, then wait for a long time, and then server closes connection.
What can be the problem?

- Volodya

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: "svn diff" performance

Posted by Philip Martin <ph...@codematters.co.uk>.
Ben Collins-Sussman <su...@collab.net> writes:

> In your comparison, you're testing file:/// vs. http://localhost.  I
> would still expect the http version to be slower, although maybe
> you're right -- maybe it's too slow.  Do you have reason to believe
> that apache is 'hanging' or something?  Do you see the GET requests in
> ethereal?  Where does the 'delay' happen?

If I switch on neon debugging (line 341 in libsvn_ra_dav/session.c)
and use Vladimir's command

$ svn diff -r2674:2675 http://svn.collab.net/repos/svn

then during the what would otherwise appear to be a delay I see the
client reading and processing the <S:resource-walk>.

We can use the stress.pl script to create a repository with a couple
of hundred files in over a hundred directories, with revisions 2 and 3
changing just a single file.

$ tools/dev/stress.pl -c -N7 -x1 -s0 -n2

Now compare

$ time svn diff -r2:3 file:///home/pm/repostres
real    0m0.838s
user    0m0.150s
sys     0m0.060s

and

$ time svn diff -r2:3 http://localhost:8888/repostress
real    0m11.197s
user    0m0.120s
sys     0m0.010s

The problem, if there is one, seems to be in the server

$ ps ax | grep [h]ttpd
31941 ?        S      0:00 /usr/local/apache2/bin/httpd -k start
31942 ?        S      0:00 /usr/local/apache2/bin/httpd -k start
31943 ?        S      0:10 /usr/local/apache2/bin/httpd -k start
31944 ?        S      0:00 /usr/local/apache2/bin/httpd -k start
31945 ?        S      0:00 /usr/local/apache2/bin/httpd -k start
31946 ?        S      0:00 /usr/local/apache2/bin/httpd -k start
31950 ?        S      0:10 /usr/local/apache2/bin/httpd -k start
31952 ?        S      0:00 /usr/local/apache2/bin/httpd -k start
31953 ?        S      0:00 /usr/local/apache2/bin/httpd -k start
31954 ?        S      0:10 /usr/local/apache2/bin/httpd -k start
$ time svn diff -r2:3 http://localhost:8888/repostress
real    0m10.857s
user    0m0.130s
sys     0m0.020s
ps ax | grep [h]ttpd
31941 ?        S      0:00 /usr/local/apache2/bin/httpd -k start
31942 ?        S      0:00 /usr/local/apache2/bin/httpd -k start
31943 ?        S      0:10 /usr/local/apache2/bin/httpd -k start
31944 ?        S      0:00 /usr/local/apache2/bin/httpd -k start
31945 ?        S      0:00 /usr/local/apache2/bin/httpd -k start
31946 ?        S      0:10 /usr/local/apache2/bin/httpd -k start
31950 ?        S      0:10 /usr/local/apache2/bin/httpd -k start
31952 ?        S      0:00 /usr/local/apache2/bin/httpd -k start
31953 ?        S      0:00 /usr/local/apache2/bin/httpd -k start
31954 ?        S      0:10 /usr/local/apache2/bin/httpd -k start

The httpd process 31946 has used 10 seconds of processor time, so that
explains the delay seen by the client, it's waiting for the server.
So the question becomes why does the server take so long to generate
<S:resource-walk>?

Disclaimer: I've got APR_POOL_DEBUG set so my numbers may be misleading.

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: "svn diff" performance

Posted by Ben Collins-Sussman <su...@collab.net>.
Vladimir Prus <gh...@cs.msu.su> writes:

> And I now know what caused the problem: I had MaxServers set to 2 in
> httpd.conf. After changing this to 4, the performance became
> reasonable.  Hmm... I still think there's something wrong -- why
> would ra_dav require more than one apache instance?! Isn't that
> inefficient?

This has nothing to do with ra_dav;  ra_dav is just a client side
module that uses neon to send HTTP requests.

The question is:  why does *apache* need many instances to respond to
a couple of PROPFIND and GET requests?  Could have something to do
with apache's design (not sure), or a bug in mod_dav or mod_dav_svn.
Maybe someone else has some ideas.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: "svn diff" performance

Posted by Sander Striker <st...@apache.org>.
> From: Vladimir Prus [mailto:ghost@cs.msu.su]
> Sent: 24 July 2002 15:45
> To: Ben Collins-Sussman

> Ben Collins-Sussman wrote:
> > Vladimir Prus <gh...@cs.msu.su> writes:
> > > REPORT request is at 1.816335
> > > response comes at 3.018566
> > >
> > > Then, almost instantly client does PROPFIND for
> > > /whale/trunk/src/codegen.cpp (the only changed file).
> > >
> > > Then, I see GET at 3.062126 (for revision 42)
> > > Server starts responding at 3.516448
> > >
> > > Then, there's another GET at 3.517878 (for revision 43)
> > >
> > > After that I'm at loss -- I see some "FIN" packets and finally, at
> > > 20.074125 the server returns revision 43.
> > >
> > > So, getting the second revision takes a lot of time, and there's nothing
> > > special about that revision.
> >
> > That definitely sounds like something is broken.
> 
> And I now know what caused the problem: I had MaxServers set to 2 in 
> httpd.conf. After changing this to 4, the performance became reasonable.

[dare I ask?]
Why wasn't the default good for you?

> Hmm... I still think there's something wrong -- why would ra_dav require more 
> than one apache instance?! Isn't that inefficient?

It requires 2.  The history can be found in the mailing list archives.  Basically
we use one connection to stream the report, and to other to fetch files* during
the parsing of the report.

*) or properties, or whatever.
 
> - Volodya

Sander


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: "svn diff" performance

Posted by Vladimir Prus <gh...@cs.msu.su>.
Ben Collins-Sussman wrote:
> Vladimir Prus <gh...@cs.msu.su> writes:
> > REPORT request is at 1.816335
> > response comes at 3.018566
> >
> > Then, almost instantly client does PROPFIND for
> > /whale/trunk/src/codegen.cpp (the only changed file).
> >
> > Then, I see GET at 3.062126 (for revision 42)
> > Server starts responding at 3.516448
> >
> > Then, there's another GET at 3.517878 (for revision 43)
> >
> > After that I'm at loss -- I see some "FIN" packets and finally, at
> > 20.074125 the server returns revision 43.
> >
> > So, getting the second revision takes a lot of time, and there's nothing
> > special about that revision.
>
> That definitely sounds like something is broken.

And I now know what caused the problem: I had MaxServers set to 2 in 
httpd.conf. After changing this to 4, the performance became reasonable. 
Hmm... I still think there's something wrong -- why would ra_dav require more 
than one apache instance?! Isn't that inefficient?

- Volodya


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: "svn diff" performance

Posted by Ben Collins-Sussman <su...@collab.net>.
Vladimir Prus <gh...@cs.msu.su> writes:

> After that I'm at loss -- I see some "FIN" packets and finally, at
> 20.074125 the server returns revision 43.
> 
> So, getting the second revision takes a lot of time, and there's nothing 
> special about that revision.

OK, this bug should now be fixed; see cmpilato's last commit.

'svn diff' and 'svn merge' are no longer causing the server to
generate huge <resource-walk> trees anymore;  only 'svn switch'
requires that.

cmpilato reports that running 'svn diff -rX:Y' went from about 40
seconds, to about 2 seconds.  :-)



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: "svn diff" performance

Posted by Ben Collins-Sussman <su...@collab.net>.
Vladimir Prus <gh...@cs.msu.su> writes:

> REPORT request is at 1.816335
> response comes at 3.018566
> 
> Then, almost instantly client does PROPFIND for /whale/trunk/src/codegen.cpp 
> (the only changed file).
> 
> Then, I see GET at 3.062126 (for revision 42)
> Server starts responding at 3.516448
> 
> Then, there's another GET at 3.517878 (for revision 43)
> 
> After that I'm at loss -- I see some "FIN" packets and finally, at
> 20.074125 the server returns revision 43.
> 
> So, getting the second revision takes a lot of time, and there's nothing 
> special about that revision.

That definitely sounds like something is broken.

Does this problem happen consistently, i.e. when diffing any two revisions?

Can you gdb into httpd and see why it's sitting for 20 seconds on the
second GET request?  




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: "svn diff" performance

Posted by Vladimir Prus <gh...@cs.msu.su>.
Ben Collins-Sussman wrote:

> Vladimir Prus <gh...@cs.msu.su> writes:
> > I've looked at ethereal log for the case of localhost. Unless I've missed
> > some packets the situation is: client and server talk for a couple of
> > seconds, then wait for a long time, and then server closes connection.
> > What can be the problem?
> [snip]
> In the ra_local case, all conversation is happening in memory: the
> initial report and response, as well as file fetching.  Berkeley DB is
> being used directly by the client to suck files from the repository
> into the temporary area.
>
> In the ra_dav case, we're talking about multiple network turnarounds.
> *Two* http GETs for each file that has changed.
>
> In your comparison, you're testing file:/// vs. http://localhost.  I
> would still expect the http version to be slower, although maybe
> you're right -- maybe it's too slow.  Do you have reason to believe
> that apache is 'hanging' or something?  Do you see the GET requests in
> ethereal?  Where does the 'delay' happen?

Okay, I've looked thought the log more carefully. This is what I see.
(The log is at
 http://zigzag.cs.msu.su:7813/~ghost/svn_diff.cap 
 http://zigzag.cs.msu.su:7813/~ghost/svn_diff.txt
 in case you'd like to look at it
)

REPORT request is at 1.816335
response comes at 3.018566

Then, almost instantly client does PROPFIND for /whale/trunk/src/codegen.cpp 
(the only changed file).

Then, I see GET at 3.062126 (for revision 42)
Server starts responding at 3.516448

Then, there's another GET at 3.517878 (for revision 43)

After that I'm at loss -- I see some "FIN" packets and finally, at
20.074125 the server returns revision 43.

So, getting the second revision takes a lot of time, and there's nothing 
special about that revision.

- Volodya

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: "svn diff" performance

Posted by Ben Collins-Sussman <su...@collab.net>.
Vladimir Prus <gh...@cs.msu.su> writes:

> I've looked at ethereal log for the case of localhost. Unless I've missed 
> some packets the situation is: client and server talk for a couple of 
> seconds, then wait for a long time, and then server closes connection.
> What can be the problem?

Here's how a server-side diff works:

  * the client sends a REPORT, claiming that it has revision X, and
    that it wants to update to revision Y.

  * the server sends a response to the report:  just a single xml tree
    that describes which files need to be updated.

  * for each file mentioned in the xml response, the client does two
    GET requests.  It pulls down two fulltexts of the file into a
    temporary area, diffs them locally, then deletes them.

This process works identically over ra_local or ra_dav.  But it's not
a surprise at all that the ra_dav is much, much slower.  

In the ra_local case, all conversation is happening in memory: the
initial report and response, as well as file fetching.  Berkeley DB is
being used directly by the client to suck files from the repository
into the temporary area.

In the ra_dav case, we're talking about multiple network turnarounds.
*Two* http GETs for each file that has changed.

In your comparison, you're testing file:/// vs. http://localhost.  I
would still expect the http version to be slower, although maybe
you're right -- maybe it's too slow.  Do you have reason to believe
that apache is 'hanging' or something?  Do you see the GET requests in
ethereal?  Where does the 'delay' happen?


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: "svn diff" performance

Posted by Vladimir Prus <gh...@cs.msu.su>.
Ben Collins-Sussman wrote:
> Vladimir Prus <gh...@cs.msu.su> writes:
> >     time svn diff -r2674:2675 http://svn.collab.net/repos/svn
> >
> > and the output is
> >     real    1m37.448s
> >     user    0m0.230s
> >     sys     0m0.030s
>
> Vladmir,
>
> You're asking the server to compare two trees, namely '/' and '/'.  Do
> you have any idea how *big* those trees are?  We have the /trunk tree,
> and at least 4 copies of it in /branches, and 2 more copies of it in
> /tags.  And each source tree is itself pretty large, over 800 files
> and directories to compare.

Sorry, but this does not explain the difference between ra_dav and ra_local 
in my second example. 

- Volodya

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: "svn diff" performance

Posted by Ben Collins-Sussman <su...@collab.net>.
Branko Čibej <br...@xbc.nu> 

> Yes, and fold do_update into do_switch, if the only difference is the
> same-path assumption. Then rename that to do_update again, and use
> that everywhere, with appropriate flage.

Heh, cmpilato is fixing this bug right now.  After chatting, I think
we're going for the "3 functions" approach, rather than adding more
flags to do_update() or do_switch().  It's certainly conceptually
easier to understand the code if you see

   do_update
   do_switch
   do_diff

in the RA vtable.  The fact that they all share the same code under
the hood is just an implementation detail. 

If we had just a single do_update, it would be a little weird that all
these callers in libsvn_client just "know" which flags to pass.
That's almost admitting that libsvn_client understands the shared
implementation, which it shouldn't.




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: "svn diff" performance

Posted by Branko Čibej <br...@xbc.nu>.
Ben Collins-Sussman wrote:

>Marcus Comstedt <ma...@mc.pp.se> writes:
>
>  
>
>>>We need a way to prevent that resource walk from happening for two of
>>>the three callers to RA->do_switch().
>>>      
>>>
>>Sounds like a third variant is needed then.  do_update(), do_switch()
>>and do_diff().  What are the other differences (apart from the
>>resource walk) between do_update() and do_switch() that cause diff and
>>merge to not be able to use do_update()?
>>    
>>
>
>do_update() assumes you're comparing the same path in two different
>revisions.  do_switch() assumes you're comparing two different paths.
>
>The diff and merge command might be comparing anything;  the paths
>could be identical, or totally unrelated.  So they call do_switch(),
>since that's always guaranteed to work.
>
>Rather than create a whole new do_diff(), maybe we can add a new flag
>to do_switch() that indicates whether we want the vsn-rsc-url data
>sent back to us.
>  
>

Yes, and fold do_update into do_switch, if the only difference is the 
same-path assumption. Then rename that to do_update again, and use that 
everywhere, with appropriate flage.

/me hasn't looked at the code, but pontificates anyway.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: "svn diff" performance

Posted by Ben Collins-Sussman <su...@collab.net>.
Marcus Comstedt <ma...@mc.pp.se> writes:

> > We need a way to prevent that resource walk from happening for two of
> > the three callers to RA->do_switch().
> 
> Sounds like a third variant is needed then.  do_update(), do_switch()
> and do_diff().  What are the other differences (apart from the
> resource walk) between do_update() and do_switch() that cause diff and
> merge to not be able to use do_update()?

do_update() assumes you're comparing the same path in two different
revisions.  do_switch() assumes you're comparing two different paths.

The diff and merge command might be comparing anything;  the paths
could be identical, or totally unrelated.  So they call do_switch(),
since that's always guaranteed to work.

Rather than create a whole new do_diff(), maybe we can add a new flag
to do_switch() that indicates whether we want the vsn-rsc-url data
sent back to us.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: "svn diff" performance

Posted by Marcus Comstedt <ma...@mc.pp.se>.
Ben Collins-Sussman <su...@collab.net> writes:

> I think I have a better explanation of the problem now:  it's that
> stupid <resource-walk> tree being appended to the normal update
> response.  Generating that <resource-walk> tree means returning the
> vsn-rsc-url for *every* item in the tree!  *That's* probably why the
> server is taking anywhere from 20 seconds to a minute to chug.
> 
> The underlying problem:  'svn diff -rX:Y' and 'svn merge -rX:Y' are
> calling RA->do_switch() under the hood, which is a variant of
> RA->do_update().  Because it's a switch, the <resource-walk> is being
> sent.  The design problem here is that only 'svn switch' actually
> wants the resource walk.  'svn diff' and 'svn merge' don't care one
> whit.
> 
> We need a way to prevent that resource walk from happening for two of
> the three callers to RA->do_switch().

Sounds like a third variant is needed then.  do_update(), do_switch()
and do_diff().  What are the other differences (apart from the
resource walk) between do_update() and do_switch() that cause diff and
merge to not be able to use do_update()?


  // Marcus



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: "svn diff" performance

Posted by Ben Collins-Sussman <su...@collab.net>.
Marcus Comstedt <ma...@mc.pp.se> writes:

> Ben Collins-Sussman <su...@collab.net> writes:
> 
> > Vladmir,
> > 
> > You're asking the server to compare two trees, namely '/' and '/'.  Do
> > you have any idea how *big* those trees are?  We have the /trunk tree,
> > and at least 4 copies of it in /branches, and 2 more copies of it in
> > /tags.  And each source tree is itself pretty large, over 800 files
> > and directories to compare.
> 
> Um, that sounds wrong.  Why would it start comparing files in
> /branches and /tags?  Shouldn't they have the same node id in both
> revisions, thus eliminating the need for further comparison?  The same
> for any subdirectory in trunk which doesn't contain modified files.

Hm, yeah, you're right.  As soon as svn_repos_dir_delta() sees that
the nodes are identical, it won't recurse.  I take back my stupid
analysis.  :-)

I think I have a better explanation of the problem now:  it's that
stupid <resource-walk> tree being appended to the normal update
response.  Generating that <resource-walk> tree means returning the
vsn-rsc-url for *every* item in the tree!  *That's* probably why the
server is taking anywhere from 20 seconds to a minute to chug.

The underlying problem:  'svn diff -rX:Y' and 'svn merge -rX:Y' are
calling RA->do_switch() under the hood, which is a variant of
RA->do_update().  Because it's a switch, the <resource-walk> is being
sent.  The design problem here is that only 'svn switch' actually
wants the resource walk.  'svn diff' and 'svn merge' don't care one
whit.

We need a way to prevent that resource walk from happening for two of
the three callers to RA->do_switch().


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: "svn diff" performance

Posted by Marcus Comstedt <ma...@mc.pp.se>.
Ben Collins-Sussman <su...@collab.net> writes:

> Vladmir,
> 
> You're asking the server to compare two trees, namely '/' and '/'.  Do
> you have any idea how *big* those trees are?  We have the /trunk tree,
> and at least 4 copies of it in /branches, and 2 more copies of it in
> /tags.  And each source tree is itself pretty large, over 800 files
> and directories to compare.

Um, that sounds wrong.  Why would it start comparing files in
/branches and /tags?  Shouldn't they have the same node id in both
revisions, thus eliminating the need for further comparison?  The same
for any subdirectory in trunk which doesn't contain modified files.


  // Marcus



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: "svn diff" performance

Posted by Ben Collins-Sussman <su...@collab.net>.
Vladimir Prus <gh...@cs.msu.su> writes:

>     time svn diff -r2674:2675 http://svn.collab.net/repos/svn
> 
> and the output is
>     real    1m37.448s
>     user    0m0.230s
>     sys     0m0.030s

Vladmir,

You're asking the server to compare two trees, namely '/' and '/'.  Do
you have any idea how *big* those trees are?  We have the /trunk tree,
and at least 4 copies of it in /branches, and 2 more copies of it in
/tags.  And each source tree is itself pretty large, over 800 files
and directories to compare.

If you run 

    svn diff -r2674:2675 http://svn.collab.net/repos/svn/trunk

...it only takes 12 seconds.

If you run 'svn diff -v -r2675', you'd see exactly which files
changed, and might be able to refine the diff even further, to search
a smaller tree.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org