Posted to dev@subversion.apache.org by km...@rockwellcollins.com on 2009/10/29 16:36:42 UTC

404 error for each newly added file

In doing some performance tests, I noticed that each newly added file for
an svn commit causes a separate HTTP request to the server, which then returns
a 404 error page.  For large numbers of files over a high latency connection
this is a significant amount of time.  (Even worse if you have a large
custom 404 page defined)

Adding 1000 new files in a single commit over a connection with 500ms latency
will waste over 8 minutes in this step alone.
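
The figure above can be reproduced with simple arithmetic. This is a minimal sketch, assuming each existence check costs one full 500ms round trip and that the checks are issued one at a time (no pipelining or parallel connections). The function name is illustrative only.

```python
def wasted_seconds(files: int, latency_s: float, requests_per_file: int = 1) -> float:
    """Time spent waiting on per-file existence checks issued serially."""
    return files * requests_per_file * latency_s

# serf: one HEAD per added file
serf = wasted_seconds(1000, 0.5)
# neon: two PROPFINDs per added file
neon = wasted_seconds(1000, 0.5, requests_per_file=2)

print(f"serf: {serf / 60:.1f} min, neon: {neon / 60:.1f} min")
```

Under these assumptions the serf case comes to 500 seconds, a little over 8 minutes, and the neon case to twice that.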

Is it by design that each newly added file in a commit requires the client
to perform a separate HTTP request?  (neon does two PROPFINDs
per file, serf does one HEAD per file)

Both client and server are v1.6.5.  Client was the windows distribution
from tigris.  Server is self compiled on solaris 10 x86 using:

APR_VER       := 1.3.8
APRUTIL_VER   := 1.3.9
NEON_VER      := 0.28.6
SERF_VER      := 0.3.0

Would the new working copy stuff change this behavior, or is it something
required by webdav, or just never annoyed anyone enough to optimize it?

I'm willing to look into it more, if the behavior isn't expected to change
with the new working copy stuff...

Thanks!
Kevin R.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2412719

Re: 404 error for each newly added file

Posted by Justin Erenkrantz <je...@apache.org>.
On Thu, Oct 29, 2009 at 8:04 PM, Branko Čibej <br...@xbc.nu> wrote:
> (But I don't believe you can actually put an ETag value in
> If-Unmodified-Since? I thought those were only for If-Match & co.)

Yah, you're right - Since is probably a date only.  Custom header is
still the way to go unless Greg has another idea.  -- justin

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2412906

Re: 404 error for each newly added file

Posted by Branko Cibej <br...@xbc.nu>.
Justin Erenkrantz wrote:
> On Thu, Oct 29, 2009 at 7:15 PM, Ben Collins-Sussman
> <su...@red-bean.com> wrote:
>   
>>> It's quite possible that we had such checks in there because we wanted
>>> compatibility with non-Subversion DeltaV servers or something. Even in
>>> that case, the PUT could be conditional -- just adding an
>>> If-Unmodified-Since header should safely avoid any PUT races.
>>>       
>> Ooh, interesting.  Gstein could comment on whether this is feasible...
>>     
>
> Since we control the horizontal and the vertical here, there's no
> reason we couldn't add this to both mod_dav_svn and ra_serf.  It would
> be a nice low-hanging optimization.  (Could even just call the header
> "If-Unmodified-Since-Version" or whatnot as "Since" would generally be
> an ETag or a date; I think we'd rather key off the version.)

Oh, that's a good point -- just use a non-standard header.
(But I don't believe you can actually put an ETag value in
If-Unmodified-Since? I thought those were only for If-Match & co.)

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2412903

Re: 404 error for each newly added file

Posted by Justin Erenkrantz <je...@apache.org>.
On Thu, Oct 29, 2009 at 7:15 PM, Ben Collins-Sussman
<su...@red-bean.com> wrote:
>> It's quite possible that we had such checks in there because we wanted
>> compatibility with non-Subversion DeltaV servers or something. Even in
>> that case, the PUT could be conditional -- just adding an
>> If-Unmodified-Since header should safely avoid any PUT races.
>
> Ooh, interesting.  Gstein could comment on whether this is feasible...

Since we control the horizontal and the vertical here, there's no
reason we couldn't add this to both mod_dav_svn and ra_serf.  It would
be a nice low-hanging optimization.  (Could even just call the header
"If-Unmodified-Since-Version" or whatnot as "Since" would generally be
an ETag or a date; I think we'd rather key off the version.)  --
justin
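
The idea of a version-keyed conditional PUT can be sketched with a toy in-memory model. Everything here is an assumption made for illustration: the ToyRepo class is not mod_dav_svn, each PUT is treated as its own commit (real Subversion stages PUTs in a transaction), and the header name is only the placeholder suggested above. The 412 status is the standard HTTP "Precondition Failed" code.

```python
from dataclasses import dataclass, field

@dataclass
class ToyRepo:
    """In-memory stand-in for a repository; NOT mod_dav_svn."""
    head: int = 0
    # path -> revision in which the path was last changed
    last_changed: dict = field(default_factory=dict)

    def put(self, path: str, headers: dict) -> int:
        """Handle a PUT and return an HTTP-style status code."""
        base = headers.get("If-Unmodified-Since-Version")  # hypothetical header
        changed_in = self.last_changed.get(path, 0)  # 0 means "never existed"
        if base is not None and changed_in > int(base):
            return 412  # path changed after the client's base revision
        existed = path in self.last_changed
        self.head += 1
        self.last_changed[path] = self.head
        return 204 if existed else 201

repo = ToyRepo()
# First client, based on revision 0, adds the file.
first = repo.put("/trunk/a.txt", {"If-Unmodified-Since-Version": "0"})
# Second client, still at revision 0, tries to add the same path.
second = repo.put("/trunk/a.txt", {"If-Unmodified-Since-Version": "0"})
# Without the header the PUT silently overwrites.
third = repo.put("/trunk/a.txt", {})
print(first, second, third)
```

In this model the race is caught by the PUT itself, so no separate existence check is needed before it.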

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2412895

Re: 404 error for each newly added file

Posted by Branko Cibej <br...@xbc.nu>.
Ben Collins-Sussman wrote:
> On Thu, Oct 29, 2009 at 4:31 PM, Branko Cibej <br...@xbc.nu> wrote:
>   
>> It's quite possible that we had such checks in there because we wanted
>> compatibility with non-Subversion DeltaV servers or something. Even in
>> that case, the PUT could be conditional -- just adding an
>> If-Unmodified-Since header should safely avoid any PUT races.
>>     
>
> Ooh, interesting.  Gstein could comment on whether this is feasible...
>   

Well, at least according to RFC-2616, it should fit in with the spec. Of
course it's a bit tricky to get the correct date for the header, since
it must be generated by the server in order to avoid clock skew. But the
svn:date of the HEAD-at-start-of-commit should satisfy that constraint
(unless changed by a propset, but doing that is horrible anyway).
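
As a rough illustration of building such a header value, the sketch below converts an svn:date string into the HTTP date format using Python's standard library. The helper name and the sample timestamp are made up for the example. Note that HTTP dates have one-second resolution, so the sub-second part of svn:date is lost.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def svn_date_to_http_date(svn_date: str) -> str:
    """Convert an svn:date value (UTC, microseconds) to an HTTP-date string."""
    dt = datetime.strptime(svn_date, "%Y-%m-%dT%H:%M:%S.%fZ")
    # usegmt=True emits the "GMT" suffix that HTTP requires.
    return format_datetime(dt.replace(tzinfo=timezone.utc), usegmt=True)

# Example value only; a real client would read svn:date from the server.
header_value = svn_date_to_http_date("2009-10-29T16:36:42.000000Z")
print("If-Unmodified-Since:", header_value)
```

Because of the one-second granularity, two commits landing within the same second would be indistinguishable by date, which is one argument for keying off the revision number instead.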

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2412901

Re: 404 error for each newly added file

Posted by Ben Collins-Sussman <su...@red-bean.com>.
On Thu, Oct 29, 2009 at 4:31 PM, Branko Cibej <br...@xbc.nu> wrote:
> Ben Collins-Sussman wrote:
>> It's probably just the logic in the ra_dav or ra_serf client network
>> module, enforcing proper webdav (and svn?) semantics.  Every path
>> modified/deleted/added in a commit is a whole separate request.  For
>> 'adds', the client is likely doing an existence check:  does the path
>> already exist?  If so, fail the commit (the client is out of date).
>> Most of the time the answer is "404", i.e. "no, it doesn't exist, so
>> go ahead and PUT the file".  If this existence check didn't happen,
>> then there's a risk of the PUT simply overwriting an existing object.
>>
>
> I don't quite get that ... if what you say is true, then:
>
>    * you don't gain anything by checking for existence with a GET
>      before a PUT, since that just slightly narrows the race window for
>      someone else creating the resource.

Correct.  It catches out-of-dateness 'most' of the time, but not if
somebody slips in a commit just an instant before your own commit
finalizes.  Commit-finalization catches that rare situation.  (And by
the way, it's a PROPFIND, not a GET, because the response is tiny.  We
don't want to be existence-checking by accidentally downloading huge
files.)


>    * In the context of Subversion doing a commit, every PUT that
>      happens is the result of an editor drive callback, which implies
>      that we *do* already know which objects are on the server.

Incorrect.  mod_dav_svn always begins a txn as a copy of "HEAD at that
moment".  That HEAD may be newer than what the working copy is sending
an editor drive against.  So these checks allow us to fail out early
due to out-of-dateness.
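
The early-failure behaviour described here can be sketched as follows. This is a toy model with hypothetical names and paths, not the ra_neon or ra_serf code: the transaction root is represented as a plain set of paths in HEAD, and each added path costs one simulated request.

```python
class OutOfDate(Exception):
    """Raised when an 'added' path already exists in the transaction root."""

def check_adds(head_paths: set, added_paths: list) -> int:
    """Mimic the per-file existence check; return the number of requests made."""
    requests = 0
    for path in added_paths:
        requests += 1            # one round trip per added path
        if path in head_paths:   # server answered something other than 404
            raise OutOfDate(path)
    return requests

# The working copy is older than HEAD and does not know about new.c.
head = {"/trunk/main.c", "/trunk/new.c"}

ok_requests = check_adds(head, ["/trunk/a.c", "/trunk/b.c"])

try:
    check_adds(head, ["/trunk/a.c", "/trunk/new.c"])
    failed_on = None
except OutOfDate as exc:
    failed_on = str(exc)

print(ok_requests, failed_on)
```

The second commit fails before any file content is sent, which is the benefit of the check. A commit that lands between the check and finalization is still caught only at finalization.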

>
> It's quite possible that we had such checks in there because we wanted
> compatibility with non-Subversion DeltaV servers or something. Even in
> that case, the PUT could be conditional -- just adding an
> If-Unmodified-Since header should safely avoid any PUT races.

Ooh, interesting.  Gstein could comment on whether this is feasible...




On Thu, Oct 29, 2009 at 4:29 PM, Mark Phippard <ma...@gmail.com> wrote:

> It'd be just as bad to send all the data and have the commit rejected
> (properly) because it is out of date.  That said, couldn't we add a
> custom REPORT request or something where we send all these checks to
> the server in one batch before we start the PUT phase?

That might help, possibly.  The final result, I hope, is that we end up
doing what DVCS systems do:  after this initial REPORT sanity-check,
push the entire CL in a single PUT, not divide the CL into a zillion
tiny write requests.
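
A batched sanity check of this kind might look like the sketch below. No such REPORT exists in the thread; the function names, paths, and the one-round-trip cost model are assumptions for illustration only.

```python
def batch_check(head_paths: set, added_paths: list) -> list:
    """Answer every existence check in a single (hypothetical) request."""
    return sorted(set(added_paths) & set(head_paths))

def check_phase_seconds(n_files: int, latency_s: float, batched: bool) -> float:
    """Latency cost of the check phase, ignoring transfer and server time."""
    round_trips = 1 if batched else n_files
    return round_trips * latency_s

head = {"/trunk/main.c", "/trunk/util.c"}
adds = ["/trunk/new1.c", "/trunk/util.c", "/trunk/new2.c"]

conflicts = batch_check(head, adds)   # paths that would make the commit out of date
per_file = check_phase_seconds(1000, 0.5, batched=False)
batched = check_phase_seconds(1000, 0.5, batched=True)
print(conflicts, per_file, batched)
```

With 1000 added files at 500ms latency, the check phase drops from 500 seconds to a single round trip in this model, while still failing early on conflicting paths.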

> I know the
> HTTP v2 stuff eliminated the PROPFIND, did it do anything like this
> too?  Would it be worth looking at if Kevin wanted to?

PROPFIND wasn't eliminated;  just spurious *uses* of PROPFIND which
mindlessly followed DeltaV formalisms.  PROPFIND still has simple
legitimate uses (like... fetching props!).  :-)

> I still prefer HTTP for all of the things that Apache gives us.  Now
> that we are willing to "improve upon" WebDAV why shouldn't we consider
> other opportunities to speed up our usage of HTTP?

Preaching to the choir!

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2412886

Re: 404 error for each newly added file

Posted by Branko Cibej <br...@xbc.nu>.
Ben Collins-Sussman wrote:
> It's probably just the logic in the ra_dav or ra_serf client network
> module, enforcing proper webdav (and svn?) semantics.  Every path
> modified/deleted/added in a commit is a whole separate request.  For
> 'adds', the client is likely doing an existence check:  does the path
> already exist?  If so, fail the commit (the client is out of date).
> Most of the time the answer is "404", i.e. "no, it doesn't exist, so
> go ahead and PUT the file".  If this existence check didn't happen,
> then there's a risk of the PUT simply overwriting an existing object.
>   

I don't quite get that ... if what you say is true, then:

    * you don't gain anything by checking for existence with a GET
      before a PUT, since that just slightly narrows the race window for
      someone else creating the resource.
    * In the context of Subversion doing a commit, every PUT that
      happens is the result of an editor drive callback, which implies
      that we *do* already know which objects are on the server. Since
      the whole commit is a transaction at the repository layer, any PUT
      that would overwrite an object that's "already there" would in
      fact fail due to a conflict at txn commit time.

It's quite possible that we had such checks in there because we wanted
compatibility with non-Subversion DeltaV servers or something. Even in
that case, the PUT could be conditional -- just adding an
If-Unmodified-Since header should safely avoid any PUT races.

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2412826

Re: 404 error for each newly added file

Posted by km...@rockwellcollins.com.
sussman@gmail.com wrote on 10/29/2009 04:11:21 PM:

> It's probably just the logic in the ra_dav or ra_serf client network
> module, enforcing proper webdav (and svn?) semantics.  Every path
> modified/deleted/added in a commit is a whole separate request.  For
> 'adds', the client is likely doing an existence check:  does the path
> already exist?  If so, fail the commit (the client is out of date).
> Most of the time the answer is "404", i.e. "no, it doesn't exist, so
> go ahead and PUT the file".  If this existence check didn't happen,
> then there's a risk of the PUT simply overwriting an existing object.
> 
> Honestly, though:  if you're worried about svn's performance over
> HTTP, stop using HTTP.  It's a lost battle.  HTTP is stateless, slow,
> and WebDAV is really complicated.  We already rewrote it to use 30%
> fewer requests earlier this year, and I doubt it can get much leaner.
> 
> If you want speed, you'll get an order-of-magnitude speedup by
> switching to svn:// instead of http://.

In my testing, I've seen very little (if any) performance difference
between svn:// and http://...  (Both wan and lan links, both large
and small transactions).  Possibly I'm doing something wrong, or
possibly it is an artifact of our network topology.  I was even
giving svnserve an advantage by not using any authentication in
my tests...

Until recently, the lack of logging in svnserve was a show stopper.
I'd also need to get kerberos authentication set up with
both windows and solaris servers under svnserve.  Might be
a real challenge on the windows server.

I'd gladly move to svn:// (or at least support both) if I can
find raw data to validate the performance claims.

Kevin R.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2412853

Re: 404 error for each newly added file

Posted by Mark Phippard <ma...@gmail.com>.
On Thu, Oct 29, 2009 at 5:11 PM, Ben Collins-Sussman
<su...@red-bean.com> wrote:
> It's probably just the logic in the ra_dav or ra_serf client network
> module, enforcing proper webdav (and svn?) semantics.  Every path
> modified/deleted/added in a commit is a whole separate request.  For
> 'adds', the client is likely doing an existence check:  does the path
> already exist?  If so, fail the commit (the client is out of date).
> Most of the time the answer is "404", i.e. "no, it doesn't exist, so
> go ahead and PUT the file".  If this existence check didn't happen,
> then there's a risk of the PUT simply overwriting an existing object.

It'd be just as bad to send all the data and have the commit rejected
(properly) because it is out of date.  That said, couldn't we add a
custom REPORT request or something where we send all these checks to
the server in one batch before we start the PUT phase?  I know the
HTTP v2 stuff eliminated the PROPFIND, did it do anything like this
too?  Would it be worth looking at if Kevin wanted to?


> Honestly, though:  if you're worried about svn's performance over
> HTTP, stop using HTTP.  It's a lost battle.  HTTP is stateless, slow,
> and WebDAV is really complicated.  We already rewrote it to use 30%
> fewer requests earlier this year, and I doubt it can get much leaner.
>
> If you want speed, you'll get an order-of-magnitude speedup by
> switching to svn:// instead of http://.

It sounds like glasser is rubbing off on you.  :)

I still prefer HTTP for all of the things that Apache gives us.  Now
that we are willing to "improve upon" WebDAV why shouldn't we consider
other opportunities to speed up our usage of HTTP?

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2412824

Re: 404 error for each newly added file

Posted by Ben Collins-Sussman <su...@red-bean.com>.
It's probably just the logic in the ra_dav or ra_serf client network
module, enforcing proper webdav (and svn?) semantics.  Every path
modified/deleted/added in a commit is a whole separate request.  For
'adds', the client is likely doing an existence check:  does the path
already exist?  If so, fail the commit (the client is out of date).
Most of the time the answer is "404", i.e. "no, it doesn't exist, so
go ahead and PUT the file".  If this existence check didn't happen,
then there's a risk of the PUT simply overwriting an existing object.

Honestly, though:  if you're worried about svn's performance over
HTTP, stop using HTTP.  It's a lost battle.  HTTP is stateless, slow,
and WebDAV is really complicated.  We already rewrote it to use 30%
fewer requests earlier this year, and I doubt it can get much leaner.

If you want speed, you'll get an order-of-magnitude speedup by
switching to svn:// instead of http://.


On Thu, Oct 29, 2009 at 11:36 AM,  <km...@rockwellcollins.com> wrote:
>
> In doing some performance tests, I noticed that each newly added file for a
> svn commit causes a separate HTTP request to the server which then returns
> a 404 error page.  For large numbers of files over a high latency connection
> this is a significant amount of time.  (Even worse if you have a large
> custom 404 page defined)
>
> Adding 1000 new files in a single commit over a connection with 500ms
> latency will waste over 8 minutes in this step alone.
>
> Is it by design that each newly added file in a commit requires the client
> to perform a separate HTTP request?  (neon does two PROPFINDs
> per file, serf does one HEAD per file)
>
> Both client and server are v1.6.5.  Client was the windows distribution
> from tigris.  Server is self compiled on solaris 10 x86 using:
>
> APR_VER       := 1.3.8
> APRUTIL_VER   := 1.3.9
> NEON_VER      := 0.28.6
> SERF_VER      := 0.3.0
>
> Would the new working copy stuff change this behavior, or is it something
> required by webdav, or just never annoyed anyone enough to optimize it?
>
> I'm willing to look into it more, if the behavior isn't expected to change
> with the new working copy stuff...
>
> Thanks!
> Kevin R.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2412821