You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Shahaf <d....@daniel.shahaf.name> on 2015/04/01 12:28:18 UTC

branching over mod_dav 2.4.6 is O(tree) (was: Re: Branching slow 1.8.11 https)

[ moving to dev@, please remove users@ from replies ]

Johan Corveleyn wrote on Sun, Mar 29, 2015 at 19:57:34 +0200:
> On Sat, Mar 28, 2015 at 5:09 PM, Bert Huijben <be...@qqmail.nl> wrote:
> > Httpd's mod_dav was updated in some recent version to do a full lock
> > traversal on copies and moves. I think we already applied some
> > optimizations, but the real fix would be that mod_dav shouldn't do
> > this work (which our repos layer already does).
> >
> > I'm not sure which release we applied the first set of optimizations.
> >
> 
> Thanks for refreshing my memory.
> 
> So the problem is known as issue #4531 (server-side copy (over dav)
> uses too much memory) [1]. The memory usage issue has been fixed in
> SVN 1.8.11 and 1.7.19 (see CHANGES), but a performance problem remains
> (copy is no longer O(1), but depends on the size of the tree being
> copied). That's a direct violation of one of Subversion's "old selling
> points" vs. CVS: that branching / tagging is O(1). Branching / tagging
> taking several minutes brings back "fond memories" from CVS' days.
> 
> As Philip pointed out in his last comment on #4531 [2]: "This issue is
> related to a change in mod_dav in 2.2.25 to fix PR54610 which
> added a walk over the copy source looking for lock tokens." (also
> released in 2.4.5; so both httpd 2.2.25+ and 2.4.5+ are affected --
> older httpd's won't have this problem I guess).
> 
> Again quoting Philip: "Apache knows in advance that the walk is
> redundant in cases such as Subversion's URL-to-URL copy but Subversion
> cannot avoid the read access. We should attempt to fix mod_dav to
> avoid the walk where possible."
> 
> So my hope rests with Philip and others who might have the necessary
> knowledge to fix this in mod_dav. It's really not acceptable that
> branching / tagging (or I'm guessing also: moving a large tree with a
> server-side move) takes several minutes.

So, what will a mod_dav fix look like?  I understand the issue is that
it walks the copy source for locks.  Should it stop doing that?  Should
it allow the backend module (mod_dav_svn / mod_dav_fs) to implement the
walk in a more efficient manner — for example, adding a "Are there any
locks under path X" hook that the backend module could implement?

(Rather than the current design, which AIUI is the backend walks the
tree and mod_dav calls "Is path X locked" for each path reported by the
backend.)

Daniel

RE: branching over mod_dav 2.4.6 is O(tree)

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Ben Reser [mailto:ben@reser.org]
> Sent: zaterdag 4 april 2015 23:49
> To: Johan Corveleyn
> Cc: Subversion Development
> Subject: Re: branching over mod_dav 2.4.6 is O(tree)
> 
> On 4/4/15 1:47 PM, Johan Corveleyn wrote:
> > I only know SVN, not much of httpd and certainly not much of DAV. So
> > I'm reading all this from behind "SVN glasses". In what cases would
> > SVN send lock tokens (in the form of If headers) when executing a
> > (server-side) copy?
> 
> If by server-side copy you mean `svn cp $URL1 $URL2` I don't believe that will
> ever happen.
> 
> However a trivial operation like the following would result in a very slow copy
> even with the patch I posted (assuming that foo is a large tree that has at
> least one file locked in it that the client holds a lock token for):
> 
> svn rm foo
> svn cp bar foo
> svn ci
> 
> 
> > (I'm wondering: when executing a wc-repos copy ('svn copy $WC $URL')
> > this issue is also relevant, isn't it? Because such a copy is
> > internally executed as a copy on the server combined with
> > modifications from the WC, right? Maybe that's the case where the
> > client could be sending lock tokens together with the copy?)
> 
> Don't believe we'd send any lock tokens in that case but I could be wrong.

In this case we don't send lock tokens.

But when we perform a
$ svn rm bar --force
$ svn cp foo bar
$ svn ci -m ""

We will send the lock tokens that exist below bar pre-replacement.

Subversion does send the path (=full url) with the lock token, so it might be possible to optimize for that case.


There is another interesting difference between generic DAV and Subversion.
Subversion doesn't operate on the public url when committing, while generic DAV clients do. Our ra operations that produce a revision always work on a transaction which has a specific (hidden) url. I think mod_dav would be able to tell the difference.

(In mod_dav_svn we just detect whether a client is an is_svn_client, but we don't want such logic in mod_dav itself)

	Bert


RE: branching over mod_dav 2.4.6 is O(tree)

Posted by Bert Huijben <be...@qqmail.nl>.
Our compatibility with dav also depends on the autoversioning flag in the apache config and inside mod dav sometimes also on the is_svn_client flag.

Bert

-----Original Message-----
From: "Johan Corveleyn" <jc...@gmail.com>
Sent: ‎5-‎4-‎2015 10:10
To: "Ben Reser" <be...@reser.org>
Cc: "Subversion Development" <de...@subversion.apache.org>
Subject: Re: branching over mod_dav 2.4.6 is O(tree)

On Sat, Apr 4, 2015 at 11:48 PM, Ben Reser <be...@reser.org> wrote:
...
> This isn't about lock validation.  I already added a flag to mod_dav's request
> validation to say not to validate locks on the COPY source.  Doing the lock
> validation on the COPY source was flat out wrong in generic DAV as well as
> SVN's usage of DAV.  The problem is that mod_dav's code to validate ETag's in
> If statements is (unfortunately) intertwined with the locks.  Since mod_dav's
> interface to validate locks walks the tree to discover the locks this means
> that validating the ETag's also walks the tree (which is itself probably
> suboptimal but that's another issue for another day).

Ah I see. I was starting to worry that PR54610 [1] also fixed some
wrong behavior applicable to SVN (lock token validation), but now I
understand that is not the case. This is all a side-effect of fixing
the code for ETag's (which needed fixing for COPY, hence the patch),
the logic of which unfortunately was intertwined with lock validation
(not needed for COPY, but part of the deal with the current code) --
and both of them work the same way, by crawling the tree.

> My long term plan is for the request validation is to behave as follows:
>
> Walk the If headers and validate any ETags and then remove them from the list
> of If headers.  Ultimately leaving only lock tokens in the list of If headers
> to handle.
>
> If lock validation is required then handle that as we do today.  Which will
> ultimately be skipped on the COPY source.
>
> This means that mod_dav's protocol support is intact without any additional
> cost to SVN.  No flags or options are required.  It just works.  Because SVN
> never sends ETags.

That sounds like the perfect solution. I'm afraid I can't help much in
practice ... I can only cheer you on with this :-).

> mod_dav used to ignore ETags on the COPY source, but someone submitted a patch
> to the httpd project, which got applied to fix this.  Meaning it's important to
> someone for them to spend the time to fix it.  They were using mod_dav_fs.  So
> I'm not inclined to break their improvement.

Ack.

>> Also, as hinted by someone else: how strict do we in mod_dav_svn have
>> to follow the dav spec (to act like a law-abiding dav implementation)?
>> Didn't we already take steps away from that by implementing our http
>> v2 protocol?
>
> mod_dav_svn needs to stay a fully valid DAV server.  This is actually an
> advertised feature of Subversion.  It's probably not a very commonly used
> feature but it is one that we advertise.  There are many generic DAV clients
> out there (OS X can mount a DAV server as a file system, various Microsoft
> products can talk directly to them, etc...)
>
> What mod_dav_svn is not advertised is as a generic DeltaV server (which is an
> additional layer on top of DAV).  Our protocol changes with http v2 were to
> diverge from the official DeltaV specification.  We never promised anyone that
> we'd continue to adhere strictly to DeltaV.
>
> To the best of my knowledge there is almost no interoperable DeltaV software
> out there.  I haven't actually tried but I don't think Cadavear can really be
> used as a SVN client and I'm certain you can't use SVN against the only other
> DeltaV server (Oberon) that exists.

I understand now, thanks for clarifying that. It's the DeltaV
extensions that we dropped support for with http v2, but DAV
definitely needs to stay.

> Despite all of that, we actually do have to continue to support the original
> protocol implementation (that as far as I'm aware of is precisely DeltaV)
> because we have SVN clients in the wild that only talk that protocol.
>
> So no I don't think we can just ignore the protocol.

Ack. As long as we produce 1.x servers, we need to keep the http v1
protocol (with DeltaV) working.

[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=54610

-- 
Johan

Re: branching over mod_dav 2.4.6 is O(tree)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Sat, Apr 4, 2015 at 11:48 PM, Ben Reser <be...@reser.org> wrote:
...
> This isn't about lock validation.  I already added a flag to mod_dav's request
> validation to say not to validate locks on the COPY source.  Doing the lock
> validation on the COPY source was flat out wrong in generic DAV as well as
> SVN's usage of DAV.  The problem is that mod_dav's code to validate ETag's in
> If statements is (unfortunately) intertwined with the locks.  Since mod_dav's
> interface to validate locks walks the tree to discover the locks this means
> that validating the ETag's also walks the tree (which is itself probably
> suboptimal but that's another issue for another day).

Ah I see. I was starting to worry that PR54610 [1] also fixed some
wrong behavior applicable to SVN (lock token validation), but now I
understand that is not the case. This is all a side-effect of fixing
the code for ETag's (which needed fixing for COPY, hence the patch),
the logic of which unfortunately was intertwined with lock validation
(not needed for COPY, but part of the deal with the current code) --
and both of them work the same way, by crawling the tree.

> My long term plan is for the request validation is to behave as follows:
>
> Walk the If headers and validate any ETags and then remove them from the list
> of If headers.  Ultimately leaving only lock tokens in the list of If headers
> to handle.
>
> If lock validation is required then handle that as we do today.  Which will
> ultimately be skipped on the COPY source.
>
> This means that mod_dav's protocol support is intact without any additional
> cost to SVN.  No flags or options are required.  It just works.  Because SVN
> never sends ETags.

That sounds like the perfect solution. I'm afraid I can't help much in
practice ... I can only cheer you on with this :-).

> mod_dav used to ignore ETags on the COPY source, but someone submitted a patch
> to the httpd project, which got applied to fix this.  Meaning it's important to
> someone for them to spend the time to fix it.  They were using mod_dav_fs.  So
> I'm not inclined to break their improvement.

Ack.

>> Also, as hinted by someone else: how strict do we in mod_dav_svn have
>> to follow the dav spec (to act like a law-abiding dav implementation)?
>> Didn't we already take steps away from that by implementing our http
>> v2 protocol?
>
> mod_dav_svn needs to stay a fully valid DAV server.  This is actually an
> advertised feature of Subversion.  It's probably not a very commonly used
> feature but it is one that we advertise.  There are many generic DAV clients
> out there (OS X can mount a DAV server as a file system, various Microsoft
> products can talk directly to them, etc...)
>
> What mod_dav_svn is not advertised is as a generic DeltaV server (which is an
> additional layer on top of DAV).  Our protocol changes with http v2 were to
> diverge from the official DeltaV specification.  We never promised anyone that
> we'd continue to adhere strictly to DeltaV.
>
> To the best of my knowledge there is almost no interoperable DeltaV software
> out there.  I haven't actually tried but I don't think Cadavear can really be
> used as a SVN client and I'm certain you can't use SVN against the only other
> DeltaV server (Oberon) that exists.

I understand now, thanks for clarifying that. It's the DeltaV
extensions that we dropped support for with http v2, but DAV
definitely needs to stay.

> Despite all of that, we actually do have to continue to support the original
> protocol implementation (that as far as I'm aware of is precisely DeltaV)
> because we have SVN clients in the wild that only talk that protocol.
>
> So no I don't think we can just ignore the protocol.

Ack. As long as we produce 1.x servers, we need to keep the http v1
protocol (with DeltaV) working.

[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=54610

-- 
Johan

Re: branching over mod_dav 2.4.6 is O(tree)

Posted by Greg Stein <gs...@gmail.com>.
On Sat, Apr 4, 2015 at 4:48 PM, Ben Reser <be...@reser.org> wrote:
>...

> Lock Token: The client does not have to provide a mapping of the paths to
> the
> tokens that must match what the server has.  Rather it must simply provide
> all
> the tokens attached to the active locks.  So the server has to find all the
> active tokens under a given path and then match them against what the
> client
> has provided.  DAV supports recursive locks so the same token can be
> applied to
> multiple paths.  The client can provide that token attached to any path
> (not
> even one that it is attached to).  As long as they provide the token then
> they're ok.
>
> In the process of fixing mod_dav's issues over the last few years I've
> written
> a test suite for mod_dav_fs (one of these days I'm going to clean it up so
> I
> can commit it someplace).  When I first wrote it I thought this locking
> behavior was a bug.  Careful reading of the DAV spec says this is exactly
> how
> the protocol is supposed to operate.
>
> I believe, but I wasn't involved in the original DAV standardization
> discussions so I can't say for sure, that there are probably good reasons
> for
> this.  Locks move with paths as they move as so a lock token provided by a
> client might be based on a different location than the current lock
> applies to.
>  Also DAV supports recursive locks and the actual locked location might be
> above anything the client is aware of.
>

And if you modify A/B, then sometimes you must provide lock tokens for both
A and B.

>...

> This isn't about lock validation.  I already added a flag to mod_dav's
> request
> validation to say not to validate locks on the COPY source.  Doing the lock
> validation on the COPY source was flat out wrong in generic DAV as well as
> SVN's usage of DAV.  The problem is that mod_dav's code to validate ETag's
> in
> If statements is (unfortunately) intertwined with the locks.  Since
> mod_dav's
> interface to validate locks walks the tree to discover the locks this means
> that validating the ETag's also walks the tree (which is itself probably
> suboptimal but that's another issue for another day).
>

It's kind of hard to get around that tree walk :-/

But I'll warn you now: that if-header processing logic, that handles the
etags and locks ... the code in there is my THIRD attack at the problem. It
was incredibly difficult to get it done correctly. Other people ran into
the same difficulty of implementing that stuff; I still recall another
implementor trying to get it right, and the answer his manager provided
was, "do whatever mod_dav does".

I'd be happy to review any patches. Just let me know.

>...

> What mod_dav_svn is not advertised is as a generic DeltaV server (which is
> an
> additional layer on top of DAV).  Our protocol changes with http v2 were to
> diverge from the official DeltaV specification.  We never promised anyone
> that
> we'd continue to adhere strictly to DeltaV.
>
> To the best of my knowledge there is almost no interoperable DeltaV
> software
> out there.  I haven't actually tried but I don't think Cadavear can really
> be
> used as a SVN client and I'm certain you can't use SVN against the only
> other
> DeltaV server (Oberon) that exists.
>

None. We hoped there would be, and that we could make the necessary
client/server changes to interoperate with that software ... but none ever
developed. DeltaV was designed with ClearCase and Perforce models in mind,
and possibly a couple others. It had zero support for SVN/CVS style
client-side working copies. I got that added into the spec, back in 2000,
and that's what we used for our first protocol.

But with the lack of other DeltaV software, there was no need to stick to
extra machinations, and CMike got all that ripped out with v2.


> Despite all of that, we actually do have to continue to support the
> original
> protocol implementation (that as far as I'm aware of is precisely DeltaV)
> because we have SVN clients in the wild that only talk that protocol.
>

Right. SVN 1.0 clients should be able to work against a 1.9 server (and
vice-versa).

Cheers,
-g

Re: branching over mod_dav 2.4.6 is O(tree)

Posted by Ben Reser <be...@reser.org>.
On 4/4/15 1:47 PM, Johan Corveleyn wrote:
> I only know SVN, not much of httpd and certainly not much of DAV. So
> I'm reading all this from behind "SVN glasses". In what cases would
> SVN send lock tokens (in the form of If headers) when executing a
> (server-side) copy?

If by server-side copy you mean `svn cp $URL1 $URL2` I don't believe that will
ever happen.

However a trivial operation like the following would result in a very slow copy
even with the patch I posted (assuming that foo is a large tree that has at
least one file locked in it that the client holds a lock token for):

svn rm foo
svn cp bar foo
svn ci


> (I'm wondering: when executing a wc-repos copy ('svn copy $WC $URL')
> this issue is also relevant, isn't it? Because such a copy is
> internally executed as a copy on the server combined with
> modifications from the WC, right? Maybe that's the case where the
> client could be sending lock tokens together with the copy?)

Don't believe we'd send any lock tokens in that case but I could be wrong.

> Side-question: if the COPY request contains 1 If header with 1 token,
> can't DAV locate where that lock token applies in the tree (so as not
> to have to crawl it)? This may be a dumb question, I don't really
> understand much of all of this :-).

The If header has two different behaviors depending on if you're sending an
ETag or a Lock Token.

ETag: The specific path provided (either implicitly as the request URI by not
tagging the condition with a path or explicitly by tagging the condition with a
specific path) must match the ETag provided.  Thus you can very much do what
you're suggesting of just iterating the ETags provided and verify they match.

Lock Token: The client does not have to provide a mapping of the paths to the
tokens that must match what the server has.  Rather it must simply provide all
the tokens attached to the active locks.  So the server has to find all the
active tokens under a given path and then match them against what the client
has provided.  DAV supports recursive locks so the same token can be applied to
multiple paths.  The client can provide that token attached to any path (not
even one that it is attached to).  As long as they provide the token then
they're ok.

In the process of fixing mod_dav's issues over the last few years I've written
a test suite for mod_dav_fs (one of these days I'm going to clean it up so I
can commit it someplace).  When I first wrote it I thought this locking
behavior was a bug.  Careful reading of the DAV spec says this is exactly how
the protocol is supposed to operate.

I believe, but I wasn't involved in the original DAV standardization
discussions so I can't say for sure, that there are probably good reasons for
this.  Locks move with paths as they move as so a lock token provided by a
client might be based on a different location than the current lock applies to.
 Also DAV supports recursive locks and the actual locked location might be
above anything the client is aware of.

> This sounds a bit like: from dav we have to do this expensive stuff,
> but if you know you're dav_svn, you don't really have to do all of
> this, and can decide to take shortcuts, or implement the same
> semantics in another way. But you're suggesting modifying the dav
> logic in a hard-coded way to do this. Can't this better be done via
> some configuration, or a callback?
>
> 1) Configuration: directive
> DAVDontCheckLocksOnCopyBecauseIHaveItCovered :-) to be used in
> combination with DAV svn. Or maybe mod_dav_svn can set this
> configuration itself immediately in the right scope?
> 
> 2) Callback: boolean dav_want_lock_validation_on_copy() or something
> like that ...

This isn't about lock validation.  I already added a flag to mod_dav's request
validation to say not to validate locks on the COPY source.  Doing the lock
validation on the COPY source was flat out wrong in generic DAV as well as
SVN's usage of DAV.  The problem is that mod_dav's code to validate ETag's in
If statements is (unfortunately) intertwined with the locks.  Since mod_dav's
interface to validate locks walks the tree to discover the locks this means
that validating the ETag's also walks the tree (which is itself probably
suboptimal but that's another issue for another day).

My long term plan is for the request validation is to behave as follows:

Walk the If headers and validate any ETags and then remove them from the list
of If headers.  Ultimately leaving only lock tokens in the list of If headers
to handle.

If lock validation is required then handle that as we do today.  Which will
ultimately be skipped on the COPY source.

This means that mod_dav's protocol support is intact without any additional
cost to SVN.  No flags or options are required.  It just works.  Because SVN
never sends ETags.

mod_dav used to ignore ETags on the COPY source, but someone submitted a patch
to the httpd project, which got applied to fix this.  Meaning it's important to
someone for them to spend the time to fix it.  They were using mod_dav_fs.  So
I'm not inclined to break their improvement.

> Also, as hinted by someone else: how strict do we in mod_dav_svn have
> to follow the dav spec (to act like a law-abiding dav implementation)?
> Didn't we already take steps away from that by implementing our http
> v2 protocol?

mod_dav_svn needs to stay a fully valid DAV server.  This is actually an
advertised feature of Subversion.  It's probably not a very commonly used
feature but it is one that we advertise.  There are many generic DAV clients
out there (OS X can mount a DAV server as a file system, various Microsoft
products can talk directly to them, etc...)

What mod_dav_svn is not advertised is as a generic DeltaV server (which is an
additional layer on top of DAV).  Our protocol changes with http v2 were to
diverge from the official DeltaV specification.  We never promised anyone that
we'd continue to adhere strictly to DeltaV.

To the best of my knowledge there is almost no interoperable DeltaV software
out there.  I haven't actually tried but I don't think Cadavear can really be
used as a SVN client and I'm certain you can't use SVN against the only other
DeltaV server (Oberon) that exists.

Despite all of that, we actually do have to continue to support the original
protocol implementation (that as far as I'm aware of is precisely DeltaV)
because we have SVN clients in the wild that only talk that protocol.

So no I don't think we can just ignore the protocol.

Re: branching over mod_dav 2.4.6 is O(tree)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Fri, Apr 3, 2015 at 8:23 PM, Ben Reser <be...@reser.org> wrote:
> On 4/1/15 5:28 AM, Johan Corveleyn wrote:
>> On Wed, Apr 1, 2015 at 1:21 PM, Philip Martin
>> <ph...@wandisco.com> wrote:
>>> Yes, I believe so.  mod_dav should avoid the walk when the callback
>>> result for every path in the walk is known in advance.  I spoke to Ben
>>> yesterday and he said he has an httpd patch that is just about ready.

Hi Ben, thanks for looking into this. I have a few questions / thoughts below.

> So first of all this DAV code has two somewhat overlapping purposes.  It
> handles If headers and it handles locks.  There is some logic to this since
> lock tokens can be encoded in If headers.  However, you can also encode ETags
> as conditions in If headers.  For example a DAV client could use an If header
> to only copy something if it had not been modified since it was last aware.
> Until r1476604 (released in 2.2.25 and 2.4.5) mod_dav didn't handle these
> properly for the copy source.
>
> It is necessary to do the walk to handle locks properly since we have to look
> for all the locks and then make sure we received all the lock tokens for them.
>  It is not necessary to walk the tree to handle ETag conditions since we are
> given the paths and can just handle them one by one.
>
> So what I've been working on is trying to separate these two things out from
> each other.  So that locks and ETags can be handled separately and so that we
> only walk the tree if necessary to check locks.
>
> I'm not quite done with this work.
>
> I'm considering committing the following to httpd today given the attention
> this has gotten:
> [[[
> Index: modules/dav/main/util.c
> ===================================================================
> --- modules/dav/main/util.c     (revision 1671097)
> +++ modules/dav/main/util.c     (working copy)
> @@ -1595,8 +1595,10 @@
>          }
>      }
>
> -    /* (1) Validate the specified resource, at the specified depth */
> -    if (resource->exists && depth > 0) {
> +    /* (1) Validate the specified resource, at the specified depth.
> +     * Avoid the walk there is no if_header and we aren't planning
> +     * to modify this resource. */
> +    if (resource->exists && depth > 0 && !(!if_header && flags &
> DAV_VALIDATE_NO_MODIFY)) {
>          dav_walker_ctx ctx = { { 0 } };
>          dav_response *multi_status;
>
> ]]]
>
> This is worse than the workarounds you have already come up with from an SVN
> only perspective.  The workarounds you have come up with would prevent the use
> of ETag If headers on the copy source, but SVN clients don't use those anyway.
>  The above patch would not break ETag If headers but SVN would still have the
> penalty of the walk if the client is sending lock tokens for the destination
> (since SVN sends those as If headers).  But I think it would at least hide the
> issue from more users in the meantime.

I only know SVN, not much of httpd and certainly not much of DAV. So
I'm reading all this from behind "SVN glasses". In what cases would
SVN send lock tokens (in the form of If headers) when executing a
(server-side) copy?

(I'm wondering: when executing a wc-repos copy ('svn copy $WC $URL')
this issue is also relevant, isn't it? Because such a copy is
internally executed as a copy on the server combined with
modifications from the WC, right? Maybe that's the case where the
client could be sending lock tokens together with the copy?)

Side-question: if the COPY request contains 1 If header with 1 token,
can't DAV locate where that lock token applies in the tree (so as not
to have to crawl it)? This may be a dumb question, I don't really
understand much of all of this :-).

> For SVN only use you can remove the test for the if_header (which I'm not
> comfortable committing to httpd since it just moves the breakage to mod_dav_fs):
> if (resource->exists && depth > 0 && !(flags & DAV_VALIDATE_NO_MODIFY))

This sounds a bit like: from dav we have to do this expensive stuff,
but if you know you're dav_svn, you don't really have to do all of
this, and can decide to take shortcuts, or implement the same
semantics in another way. But you're suggesting modifying the dav
logic in a hard-coded way to do this. Can't this better be done via
some configuration, or a callback?

1) Configuration: directive
DAVDontCheckLocksOnCopyBecauseIHaveItCovered :-) to be used in
combination with DAV svn. Or maybe mod_dav_svn can set this
configuration itself immediately in the right scope?

2) Callback: boolean dav_want_lock_validation_on_copy() or something
like that ...

Also, as hinted by someone else: how strict do we in mod_dav_svn have
to follow the dav spec (to act like a law-abiding dav implementation)?
Didn't we already take steps away from that by implementing our http
v2 protocol?

>> That's great news.
>>
>> Just to be clear: the subject says "mod_dav 2.4.6", but be advised
>> that 2.2.25 and higher are also affected, so I hope that an httpd fix
>> will be backported to 2.2.x as well.
>
> That's my intent.  Whatever is done will be backported to 2.2.x and 2.4.x.
> I'll keep an eye open and make sure that the above patch gets committed and
> backported to 2.2.x and 2.4.x if I don't have the more complete refactoring
> done to at least try and limit the situations this comes up in

Great, thanks in any case.
-- 
Johan

Re: branching over mod_dav 2.4.6 is O(tree)

Posted by Ben Reser <be...@reser.org>.
On 4/1/15 5:28 AM, Johan Corveleyn wrote:
> On Wed, Apr 1, 2015 at 1:21 PM, Philip Martin
> <ph...@wandisco.com> wrote:
>> Yes, I believe so.  mod_dav should avoid the walk when the callback
>> result for every path in the walk is known in advance.  I spoke to Ben
>> yesterday and he said he has an httpd patch that is just about ready.

So first of all this DAV code has two somewhat overlapping purposes.  It
handles If headers and it handles locks.  There is some logic to this since
lock tokens can be encoded in If headers.  However, you can also encode ETags
as conditions in If headers.  For example a DAV client could use an If header
to only copy something if it had not been modified since it was last aware.
Until r1476604 (released in 2.2.25 and 2.4.5) mod_dav didn't handle these
properly for the copy source.

It is necessary to do the walk to handle locks properly since we have to look
for all the locks and then make sure we received all the lock tokens for them.
 It is not necessary to walk the tree to handle ETag conditions since we are
given the paths and can just handle them one by one.

So what I've been working on is trying to separate these two things out from
each other.  So that locks and ETags can be handled separately and so that we
only walk the tree if necessary to check locks.

I'm not quite done with this work.

I'm considering committing the following to httpd today given the attention
this has gotten:
[[[
Index: modules/dav/main/util.c
===================================================================
--- modules/dav/main/util.c	(revision 1671097)
+++ modules/dav/main/util.c	(working copy)
@@ -1595,8 +1595,10 @@
         }
     }

-    /* (1) Validate the specified resource, at the specified depth */
-    if (resource->exists && depth > 0) {
+    /* (1) Validate the specified resource, at the specified depth.
+     * Avoid the walk there is no if_header and we aren't planning
+     * to modify this resource. */
+    if (resource->exists && depth > 0 && !(!if_header && flags &
DAV_VALIDATE_NO_MODIFY)) {
         dav_walker_ctx ctx = { { 0 } };
         dav_response *multi_status;

]]]

This is worse than the workarounds you have already come up with from an SVN
only perspective.  The workarounds you have come up with would prevent the use
of ETag If headers on the copy source, but SVN clients don't use those anyway.
 The above patch would not break ETag If headers but SVN would still have the
penalty of the walk if the client is sending lock tokens for the destination
(since SVN sends those as If headers).  But I think it would at least hide the
issue from more users in the meantime.

For SVN only use you can remove the test for the if_header (which I'm not
comfortable committing to httpd since it just moves the breakage to mod_dav_fs):
if (resource->exists && depth > 0 && !(flags & DAV_VALIDATE_NO_MODIFY))

> That's great news.
> 
> Just to be clear: the subject says "mod_dav 2.4.6", but be advised
> that 2.2.25 and higher are also affected, so I hope that an httpd fix
> will be backported to 2.2.x as well.

That's my intent.  Whatever is done will be backported to 2.2.x and 2.4.x.
I'll keep an eye open and make sure that the above patch gets committed and
backported to 2.2.x and 2.4.x if I don't have the more complete refactoring
done to at least try and limit the situations this comes up in

Re: branching over mod_dav 2.4.6 is O(tree)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Apr 1, 2015 at 1:21 PM, Philip Martin
<ph...@wandisco.com> wrote:
> Daniel Shahaf <d....@daniel.shahaf.name> writes:
>
>> So, what will a mod_dav fix look like?  I understand the issue is that
>> it walks the copy source for locks.  Should it stop doing that?
>
> Yes, I believe so.  mod_dav should avoid the walk when the callback
> result for every path in the walk is known in advance.  I spoke to Ben
> yesterday and he said he has an httpd patch that is just about ready.

That's great news.

Just to be clear: the subject says "mod_dav 2.4.6", but be advised
that 2.2.25 and higher are also affected, so I hope that an httpd fix
will be backported to 2.2.x as well.

-- 
Johan

Re: branching over mod_dav 2.4.6 is O(tree)

Posted by Philip Martin <ph...@wandisco.com>.
Daniel Shahaf <d....@daniel.shahaf.name> writes:

> So, what will a mod_dav fix look like?  I understand the issue is that
> it walks the copy source for locks.  Should it stop doing that?

Yes, I believe so.  mod_dav should avoid the walk when the callback
result for every path in the walk is known in advance.  I spoke to Ben
yesterday and he said he has an httpd patch that is just about ready.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: branching over mod_dav 2.4.6 is O(tree)

Posted by Philip Martin <ph...@wandisco.com>.
Daniel Shahaf <d....@daniel.shahaf.name> writes:

> So, what will a mod_dav fix look like?  I understand the issue is that
> it walks the copy source for locks.  Should it stop doing that?

Yes, I believe so.  mod_dav should avoid the walk when the callback
result for every path in the walk is known in advance.  I spoke to Ben
yesterday and he said he has an httpd patch that is just about ready.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*