You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@wandisco.com> on 2010/06/22 12:57:01 UTC

FSFS error in DAV MERGE - Can't open file 'db/transactions/props'

I am investigating the following DAV + FSFS server error message which
we at WANdisco have seen rarely, but in at least two real installations,
and am sharing it here in case anyone can shed light on it.  It happens
during a DAV 'MERGE':

Can't open file '[...]/db/transactions/props': No such file or directory

FSFS is trying to open an invalid path; it should be
'[...]/db/transactions/<txn-id>/props'.  From the source code, I can see
how this must happen at a low level: the txn-id path component will be
omitted if and only if something passes a NULL transaction-id string to
fs_fs.c:get_txn_proplist().

But why and from where?

The error might be triggered by the WANdisco software sending commands
in the wrong sequence, or something like that, and so to that extent it
may not be interesting to the general community.  In one case, detailed
below, it happened after a DELETE was sent, more than an hour into a
very long commit.  In another case, it happened after a start-commit
hook failed.  The server might be receiving a DAV MERGE request when it
isn't expecting one.  Can anyone comment on that, given this limited
information?

Here are some Apache access log entries from one occurrence, with paths
anonymized and the client Id string (always "SVN/1.6.6 (r40053)
neon/0.28.3") omitted for brevity:
[[[
127.0.0.1 - UserK [16/Dec/2009:20:25:17 -0800]
"MKACTIVITY /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 201 312 "-"
127.0.0.1 - UserK [16/Dec/2009:23:20:45 -0800]
"PROPPATCH /svn/RepoJ/!svn/wrk/ActivityA/branches/BranchD/PathG HTTP/1.1" 207 582 "-"
127.0.0.1 - UserM [17/Dec/2009:00:12:45 -0800]
"MERGE /svn/RepoJ/branches/BranchH/PathC HTTP/1.1" 200 953 "-"
127.0.0.1 - UserM [17/Dec/2009:00:12:59 -0800]
"MERGE /svn/RepoJ/branches/BranchP/PathC HTTP/1.1" 200 965 "-"
127.0.0.1 - UserK [17/Dec/2009:00:21:19 -0800]
"DELETE /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 204 - "-"
127.0.0.1 - UserK [17/Dec/2009:00:32:45 -0800]
"MKACTIVITY /svn/RepoJ/!svn/act/ActivityB HTTP/1.1" 201 312 "-"
127.0.0.1 - UserK [17/Dec/2009:00:39:58 -0800]
"MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
"PROPPATCH /svn/RepoJ/!svn/wrk/ActivityB/branches/BranchD/PathE HTTP/1.1" 207 569 "-"
127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
"MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
]]]

and the corresponding error log:
[[[
[Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
Could not MERGE resource "/svn/RepoJ/!svn/act/ActivityA" into "/svn/RepoJ/branches/BranchD/PathJ". [500, #0]
[Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
There was a problem opening the transaction specified by this activity. [500, #2]
[Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
Can't open file '/svn/repos/RepoJ/db/transactions/props': No such file or directory [500, #2]
]]]

The error appears to match the second-last MERGE in the access log.  I
don't have a copy of the body of the MERGE request in this case.

The error refers to an activity Id that was the subject of the DELETE
that was logged 18 minutes earlier.  A comment from one of our guys
looking at the logs: "We are seeing 22,600 changes for Activity A and no
merge at the end (just a DELETE). Then we see about 931 commands for
Activity B and a MERGE that fails for Activity A."

Is this a case where the client side shouldn't be sending a MERGE at
this point?  Even if that is the primary problem, I'm still interested
in Subversion's response to this MERGE request.

I want to assert(txn-id != NULL) in the relevant FSFS functions, as in
the attached patch.  I determined the comments in the attached patch
after manually passing NULL instead of the txn-id.  However, that won't
get me far on its own.

To try to reproduce the error, do you think I should try sending a
sequence of DAV commands similar to the sequence shown above?

Any other ideas on what might be happening or how to diagnose this?

- Julian


Re: FSFS error in DAV MERGE - Can't open file 'db/transactions/props'

Posted by Julian Foad <ju...@wandisco.com>.
I have opened issue #3696,
<http://subversion.tigris.org/issues/show_bug.cgi?id=3696>, to track
this.

- Julian


On Fri, 2010-07-30 at 16:17 +0100, Julian Foad wrote:
> On Tue, 2010-06-22 at 13:57 +0100, Julian Foad wrote:
> > I am investigating the following DAV + FSFS server error message which
> > we at WANdisco have seen rarely, but in at least two real installations,
> > and am sharing it here in case anyone can shed light on it.  It happens
> > during a DAV 'MERGE':
> > 
> > Can't open file '[...]/db/transactions/props': No such file or directory
> > 
> > FSFS is trying to open an invalid path; it should be
> > '[...]/db/transactions/<txn-id>/props'.  From the source code, I can see
> > how this must happen at a low level: the txn-id path component will be
> > omitted if and only if something passes a NULL transaction-id string to
> > fs_fs.c:get_txn_proplist().
> > 
> > But why and from where?
> 
> A status update:
> 
> I've traced the calls to get_txn_proplist(), a few call levels up, and I
> believe one of these must have supplied txn_id = NULL.
> 
> Callees of get_txn_proplist(), with txn-id parameter identified by name
> and by param-list position:
> 
> get_txn_proplist(,,txn_id)
>   svn_fs_fs__change_txn_props(txn->id)  # DOESN'T CAUSE THE ERROR
>     txn_vtable.change_props(txn->id)
>     svn_fs_fs__change_txn_prop(txn->id)
>     commit_body()
>     svn_fs_fs__begin_txn(using svn_fs_fs__create_txn())
>     svn_fs_fs__begin_obliteration_txn(using ...)
>   svn_fs_fs__get_txn(txn_id)            # CAN CAUSE THE ERROR
>     svn_fs_fs__open_txn(,,name)
>       svn_fs_t.vtable.open_txn(,,name)
>         svn_fs_open_txn(,,name)
>           dav_svn__abort_txn(,txn_name)
>           prep_working()  # NO, txn_name != NULL
>           prep_private(comb->priv.root.txn_name)
>             DAV_RESOURCE_TYPE_PRIVATE
>           is_our_resource()  # NO, txn name != NULL
>           open_txn(,,txn_name)
>             dav_svn__checkout()
>             merge()
>               dav_svn__hooks_vsn.merge
>       fs_merge(,,,target_root->txn)
>         svn_fs_root_t.vtable.merge(,,,target_root->txn)
>           svn_fs_merge(,,target_root->txn)
>             # no callers?
>     svn_fs_fs__get_txn_ids(,,,txn_name)
>       svn_fs_fs__dag_txn_root(,,txn_id)
>         root_node<-open_path<-get_dag(,root->txn)
>       svn_fs_fs__dag_txn_base_root(,,txn_id)
>         merge_changes(,,txn->id)
>           svn_fs_fs__commit_txn(,,txn->id)
>             svn_fs_txn_t.vtable.commit(,,txn->id)
>               svn_fs_commit_txn(,,txn->id)
>           fs_merge()
>       svn_fs_fs__dag_clone_root(,,txn_id)
>         mutable_root_node(,root->txn)
>           make_path_mutable(root->txn)
>   commit_body(baton->txn->id)           # DOESN'T CAUSE THE ERROR
>     svn_fs_fs__commit(,,txn->id)
>       svn_fs_fs__commit_txn(,,txn->id)
>   svn_fs_fs__txn_proplist(,txn->id)     # CAN CAUSE THE ERROR
>     svn_fs_txn_t.vtable.get_proplist(,txn->id)
>       svn_fs_txn_proplist(,,txn->id)
>     commit_body()
>     svn_fs_fs__txn_prop(,txn->id)
>       svn_fs_txn_t.vtable.get_prop(,txn->id)
>         svn_fs_txn_prop(,txn->id)
>     svn_fs_fs__txn_root(,txn->id)
>       svn_fs_txn_t.vtable.root(,txn->id)
>         svn_fs_txn_root(,txn->id)
> 
> The ones annotated "CAN CAUSE THE ERROR" did result in the "Can't open
> 'db/transactions/props'" error when I forced txn_id=NULL at these places
> in a trivial test scenario, and the ones marked "DOESN'T", didn't.
> 
> >From the fact that a DAV MERGE was the operation under way, I would
> hazard a guess that dav_svn__hooks_vsn.merge() might be the call that's
> going wrong.  I haven't tried to trace that back to its caller (in
> mod_dav perhaps?).
> 
> I don't know if this line of attack has any mileage left in it.  I'm
> thinking of leaving it at that for now, unless anyone comes up with
> suggestions for how to proceed.  I don't feel capable of setting up a
> test system that would replicate the sequence of DAV messages that
> resulted in the error condition.
> 
> (The issue of the current assertion leaving a DOS attack possible is
> still in my sights - email thread "Re: svn commit: r980046".)
> 
> - Julian
> 
> 
> > The error might be triggered by the WANdisco software sending commands
> > in the wrong sequence, or something like that, and so to that extent it
> > may not be interesting to the general community.  In one case, detailed
> > below, it happened after a DELETE was sent, more than an hour into a
> > very long commit.  In another case, it happened after a start-commit
> > hook failed.  The server might be receiving a DAV MERGE request when it
> > isn't expecting one.  Can anyone comment on that, given this limited
> > information?
> > 
> > Here are some Apache access log entries from one occurrence, with paths
> > anonymized and the client Id string (always "SVN/1.6.6 (r40053)
> > neon/0.28.3") omitted for brevity:
> > [[[
> > 127.0.0.1 - UserK [16/Dec/2009:20:25:17 -0800]
> > "MKACTIVITY /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 201 312 "-"
> > 127.0.0.1 - UserK [16/Dec/2009:23:20:45 -0800]
> > "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityA/branches/BranchD/PathG HTTP/1.1" 207 582 "-"
> > 127.0.0.1 - UserM [17/Dec/2009:00:12:45 -0800]
> > "MERGE /svn/RepoJ/branches/BranchH/PathC HTTP/1.1" 200 953 "-"
> > 127.0.0.1 - UserM [17/Dec/2009:00:12:59 -0800]
> > "MERGE /svn/RepoJ/branches/BranchP/PathC HTTP/1.1" 200 965 "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:21:19 -0800]
> > "DELETE /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 204 - "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:32:45 -0800]
> > "MKACTIVITY /svn/RepoJ/!svn/act/ActivityB HTTP/1.1" 201 312 "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:39:58 -0800]
> > "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> > "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityB/branches/BranchD/PathE HTTP/1.1" 207 569 "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> > "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> > ]]]
> > 
> > and the corresponding error log:
> > [[[
> > [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> > Could not MERGE resource "/svn/RepoJ/!svn/act/ActivityA" into "/svn/RepoJ/branches/BranchD/PathJ". [500, #0]
> > [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> > There was a problem opening the transaction specified by this activity. [500, #2]
> > [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> > Can't open file '/svn/repos/RepoJ/db/transactions/props': No such file or directory [500, #2]
> > ]]]
> > 
> > The error appears to match the second-last MERGE in the access log.  I
> > don't have a copy of the body of the MERGE request in this case.
> > 
> > The error refers to an activity Id that was the subject of the DELETE
> > that was logged 18 minutes earlier.  A comment from one of our guys
> > looking at the logs: "We are seeing 22,600 changes for Activity A and no
> > merge at the end (just a DELETE). Then we see about 931 commands for
> > Activity B and a MERGE that fails for Activity A."
> > 
> > Is this a case where the client side shouldn't be sending a MERGE at
> > this point?  Even if that is the primary problem, I'm still interested
> > in Subversion's response to this MERGE request.
> > 
> > I want to assert(txn-id != NULL) in the relevant FSFS functions, as in
> > the attached patch.  I determined the comments in the attached patch
> > after manually passing NULL instead of the txn-id.  However, that won't
> > get me far on its own.
> > 
> > To try to reproduce the error, do you think I should try sending a
> > sequence of DAV commands similar to the sequence shown above?
> > 
> > Any other ideas on what might be happening or how to diagnose this?
> > 
> > - Julian
> > 
> 
> 


Re: FSFS error in DAV MERGE - Can't open file 'db/transactions/props'

Posted by Julian Foad <ju...@wandisco.com>.
On Tue, 2010-06-22 at 13:57 +0100, Julian Foad wrote:
> I am investigating the following DAV + FSFS server error message which
> we at WANdisco have seen rarely, but in at least two real installations,
> and am sharing it here in case anyone can shed light on it.  It happens
> during a DAV 'MERGE':
> 
> Can't open file '[...]/db/transactions/props': No such file or directory
> 
> FSFS is trying to open an invalid path; it should be
> '[...]/db/transactions/<txn-id>/props'.  From the source code, I can see
> how this must happen at a low level: the txn-id path component will be
> omitted if and only if something passes a NULL transaction-id string to
> fs_fs.c:get_txn_proplist().
> 
> But why and from where?

A status update:

I've traced the calls to get_txn_proplist(), a few call levels up, and I
believe one of these must have supplied txn_id = NULL.

Callees of get_txn_proplist(), with txn-id parameter identified by name
and by param-list position:

get_txn_proplist(,,txn_id)
  svn_fs_fs__change_txn_props(txn->id)  # DOESN'T CAUSE THE ERROR
    txn_vtable.change_props(txn->id)
    svn_fs_fs__change_txn_prop(txn->id)
    commit_body()
    svn_fs_fs__begin_txn(using svn_fs_fs__create_txn())
    svn_fs_fs__begin_obliteration_txn(using ...)
  svn_fs_fs__get_txn(txn_id)            # CAN CAUSE THE ERROR
    svn_fs_fs__open_txn(,,name)
      svn_fs_t.vtable.open_txn(,,name)
        svn_fs_open_txn(,,name)
          dav_svn__abort_txn(,txn_name)
          prep_working()  # NO, txn_name != NULL
          prep_private(comb->priv.root.txn_name)
            DAV_RESOURCE_TYPE_PRIVATE
          is_our_resource()  # NO, txn name != NULL
          open_txn(,,txn_name)
            dav_svn__checkout()
            merge()
              dav_svn__hooks_vsn.merge
      fs_merge(,,,target_root->txn)
        svn_fs_root_t.vtable.merge(,,,target_root->txn)
          svn_fs_merge(,,target_root->txn)
            # no callers?
    svn_fs_fs__get_txn_ids(,,,txn_name)
      svn_fs_fs__dag_txn_root(,,txn_id)
        root_node<-open_path<-get_dag(,root->txn)
      svn_fs_fs__dag_txn_base_root(,,txn_id)
        merge_changes(,,txn->id)
          svn_fs_fs__commit_txn(,,txn->id)
            svn_fs_txn_t.vtable.commit(,,txn->id)
              svn_fs_commit_txn(,,txn->id)
          fs_merge()
      svn_fs_fs__dag_clone_root(,,txn_id)
        mutable_root_node(,root->txn)
          make_path_mutable(root->txn)
  commit_body(baton->txn->id)           # DOESN'T CAUSE THE ERROR
    svn_fs_fs__commit(,,txn->id)
      svn_fs_fs__commit_txn(,,txn->id)
  svn_fs_fs__txn_proplist(,txn->id)     # CAN CAUSE THE ERROR
    svn_fs_txn_t.vtable.get_proplist(,txn->id)
      svn_fs_txn_proplist(,,txn->id)
    commit_body()
    svn_fs_fs__txn_prop(,txn->id)
      svn_fs_txn_t.vtable.get_prop(,txn->id)
        svn_fs_txn_prop(,txn->id)
    svn_fs_fs__txn_root(,txn->id)
      svn_fs_txn_t.vtable.root(,txn->id)
        svn_fs_txn_root(,txn->id)

The ones annotated "CAN CAUSE THE ERROR" did result in the "Can't open
'db/transactions/props'" error when I forced txn_id=NULL at these places
in a trivial test scenario, and the ones marked "DOESN'T", didn't.

>>From the fact that a DAV MERGE was the operation under way, I would
hazard a guess that dav_svn__hooks_vsn.merge() might be the call that's
going wrong.  I haven't tried to trace that back to its caller (in
mod_dav perhaps?).

I don't know if this line of attack has any mileage left in it.  I'm
thinking of leaving it at that for now, unless anyone comes up with
suggestions for how to proceed.  I don't feel capable of setting up a
test system that would replicate the sequence of DAV messages that
resulted in the error condition.

(The issue of the current assertion leaving a DOS attack possible is
still in my sights - email thread "Re: svn commit: r980046".)

- Julian


> The error might be triggered by the WANdisco software sending commands
> in the wrong sequence, or something like that, and so to that extent it
> may not be interesting to the general community.  In one case, detailed
> below, it happened after a DELETE was sent, more than an hour into a
> very long commit.  In another case, it happened after a start-commit
> hook failed.  The server might be receiving a DAV MERGE request when it
> isn't expecting one.  Can anyone comment on that, given this limited
> information?
> 
> Here are some Apache access log entries from one occurrence, with paths
> anonymized and the client Id string (always "SVN/1.6.6 (r40053)
> neon/0.28.3") omitted for brevity:
> [[[
> 127.0.0.1 - UserK [16/Dec/2009:20:25:17 -0800]
> "MKACTIVITY /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 201 312 "-"
> 127.0.0.1 - UserK [16/Dec/2009:23:20:45 -0800]
> "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityA/branches/BranchD/PathG HTTP/1.1" 207 582 "-"
> 127.0.0.1 - UserM [17/Dec/2009:00:12:45 -0800]
> "MERGE /svn/RepoJ/branches/BranchH/PathC HTTP/1.1" 200 953 "-"
> 127.0.0.1 - UserM [17/Dec/2009:00:12:59 -0800]
> "MERGE /svn/RepoJ/branches/BranchP/PathC HTTP/1.1" 200 965 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:21:19 -0800]
> "DELETE /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 204 - "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:32:45 -0800]
> "MKACTIVITY /svn/RepoJ/!svn/act/ActivityB HTTP/1.1" 201 312 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:39:58 -0800]
> "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityB/branches/BranchD/PathE HTTP/1.1" 207 569 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> ]]]
> 
> and the corresponding error log:
> [[[
> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> Could not MERGE resource "/svn/RepoJ/!svn/act/ActivityA" into "/svn/RepoJ/branches/BranchD/PathJ". [500, #0]
> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> There was a problem opening the transaction specified by this activity. [500, #2]
> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> Can't open file '/svn/repos/RepoJ/db/transactions/props': No such file or directory [500, #2]
> ]]]
> 
> The error appears to match the second-last MERGE in the access log.  I
> don't have a copy of the body of the MERGE request in this case.
> 
> The error refers to an activity Id that was the subject of the DELETE
> that was logged 18 minutes earlier.  A comment from one of our guys
> looking at the logs: "We are seeing 22,600 changes for Activity A and no
> merge at the end (just a DELETE). Then we see about 931 commands for
> Activity B and a MERGE that fails for Activity A."
> 
> Is this a case where the client side shouldn't be sending a MERGE at
> this point?  Even if that is the primary problem, I'm still interested
> in Subversion's response to this MERGE request.
> 
> I want to assert(txn-id != NULL) in the relevant FSFS functions, as in
> the attached patch.  I determined the comments in the attached patch
> after manually passing NULL instead of the txn-id.  However, that won't
> get me far on its own.
> 
> To try to reproduce the error, do you think I should try sending a
> sequence of DAV commands similar to the sequence shown above?
> 
> Any other ideas on what might be happening or how to diagnose this?
> 
> - Julian
> 



Re: FSFS error in DAV MERGE - Can't open file 'db/transactions/props'

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
On Thu, Jul 15, 2010 at 4:06 PM, C. Michael Pilato <cm...@collab.net> wrote:
> On 06/22/2010 08:57 AM, Julian Foad wrote:
>> Here are some Apache access log entries from one occurrence, with paths
>> anonymized and the client Id string (always "SVN/1.6.6 (r40053)
>> neon/0.28.3") omitted for brevity:
>> [[[
>> 127.0.0.1 - UserK [16/Dec/2009:20:25:17 -0800]
>> "MKACTIVITY /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 201 312 "-"
>> 127.0.0.1 - UserK [16/Dec/2009:23:20:45 -0800]
>> "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityA/branches/BranchD/PathG HTTP/1.1" 207 582 "-"
>> 127.0.0.1 - UserM [17/Dec/2009:00:12:45 -0800]
>> "MERGE /svn/RepoJ/branches/BranchH/PathC HTTP/1.1" 200 953 "-"
>> 127.0.0.1 - UserM [17/Dec/2009:00:12:59 -0800]
>> "MERGE /svn/RepoJ/branches/BranchP/PathC HTTP/1.1" 200 965 "-"
>> 127.0.0.1 - UserK [17/Dec/2009:00:21:19 -0800]
>> "DELETE /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 204 - "-"
>> 127.0.0.1 - UserK [17/Dec/2009:00:32:45 -0800]
>> "MKACTIVITY /svn/RepoJ/!svn/act/ActivityB HTTP/1.1" 201 312 "-"
>> 127.0.0.1 - UserK [17/Dec/2009:00:39:58 -0800]
>> "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
>> 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
>> "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityB/branches/BranchD/PathE HTTP/1.1" 207 569 "-"
>> 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
>> "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
>> ]]]
>
> This sequence looks weird to me.  In Subversion, there's always exactly one
> MERGE -- the next-to-last step in a commit (which is begun by a MKACTIVITY
> in HTTPv1, a POST in HTTPv2).  You've got two MKACTIVITYs, and four MERGEs.
>  Unless there are more operations that occurred but are just missing from
> your report there, that's just weird.
>
>> and the corresponding error log:
>> [[[
>> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
>> Could not MERGE resource "/svn/RepoJ/!svn/act/ActivityA" into "/svn/RepoJ/branches/BranchD/PathJ". [500, #0]
>> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
>> There was a problem opening the transaction specified by this activity. [500, #2]
>> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
>> Can't open file '/svn/repos/RepoJ/db/transactions/props': No such file or directory [500, #2]
>> ]]]
>>
>> The error appears to match the second-last MERGE in the access log.  I
>> don't have a copy of the body of the MERGE request in this case.
>>
>> The error refers to an activity Id that was the subject of the DELETE
>> that was logged 18 minutes earlier.  A comment from one of our guys
>> looking at the logs: "We are seeing 22,600 changes for Activity A and no
>> merge at the end (just a DELETE). Then we see about 931 commands for
>> Activity B and a MERGE that fails for Activity A."
>>
>> Is this a case where the client side shouldn't be sending a MERGE at
>> this point?  Even if that is the primary problem, I'm still interested
>> in Subversion's response to this MERGE request.
>
> Our codebase should definitely not allow a bogus (NULL) txn-id to be
> casually transformed in a path.  You can commit assertions around those
> parts *right now*.
>
> If no MERGE occurs at the end of a stream of activity modifications, that's
> usually means that one of those modifications failed and the client has
> bailed on the commit (it then issues the DELETE of the activity as a cleanup
> step).  Weird that something would then try to MERGE the activity that was
> already DELETEd.
>
> This is not behavior I've seen in Subversion's own codebase -- so maybe just
> a rare WANdisco software bug?

Possibly.  If we're shoving in bogus commands, that would
understandably result in Subversion weirdness.  Thanks for double
checking the arithmetic!

-Hyrum

Re: FSFS error in DAV MERGE - Can't open file 'db/transactions/props'

Posted by Julian Foad <ju...@wandisco.com>.
On Thu, 2010-07-15, C. Michael Pilato wrote:
> On 06/22/2010 08:57 AM, Julian Foad wrote:
> > Here are some Apache access log entries from one occurrence, with paths
> > anonymized and the client Id string (always "SVN/1.6.6 (r40053)
> > neon/0.28.3") omitted for brevity:
> > [[[
> > 127.0.0.1 - UserK [16/Dec/2009:20:25:17 -0800]
> > "MKACTIVITY /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 201 312 "-"
> > 127.0.0.1 - UserK [16/Dec/2009:23:20:45 -0800]
> > "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityA/branches/BranchD/PathG HTTP/1.1" 207 582 "-"
> > 127.0.0.1 - UserM [17/Dec/2009:00:12:45 -0800]
> > "MERGE /svn/RepoJ/branches/BranchH/PathC HTTP/1.1" 200 953 "-"
> > 127.0.0.1 - UserM [17/Dec/2009:00:12:59 -0800]
> > "MERGE /svn/RepoJ/branches/BranchP/PathC HTTP/1.1" 200 965 "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:21:19 -0800]
> > "DELETE /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 204 - "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:32:45 -0800]
> > "MKACTIVITY /svn/RepoJ/!svn/act/ActivityB HTTP/1.1" 201 312 "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:39:58 -0800]
> > "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> > "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityB/branches/BranchD/PathE HTTP/1.1" 207 569 "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> > "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> > ]]]
> 
> This sequence looks weird to me.  In Subversion, there's always exactly one
> MERGE -- the next-to-last step in a commit (which is begun by a MKACTIVITY
> in HTTPv1, a POST in HTTPv2).  You've got two MKACTIVITYs, and four MERGEs.
>  Unless there are more operations that occurred but are just missing from
> your report there, that's just weird.
> 
> > and the corresponding error log:
> > [[[
> > [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> > Could not MERGE resource "/svn/RepoJ/!svn/act/ActivityA" into "/svn/RepoJ/branches/BranchD/PathJ". [500, #0]
> > [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> > There was a problem opening the transaction specified by this activity. [500, #2]
> > [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> > Can't open file '/svn/repos/RepoJ/db/transactions/props': No such file or directory [500, #2]
> > ]]]
> > 
> > The error appears to match the second-last MERGE in the access log.  I
> > don't have a copy of the body of the MERGE request in this case.
> > 
> > The error refers to an activity Id that was the subject of the DELETE
> > that was logged 18 minutes earlier.  A comment from one of our guys
> > looking at the logs: "We are seeing 22,600 changes for Activity A and no
> > merge at the end (just a DELETE). Then we see about 931 commands for
> > Activity B and a MERGE that fails for Activity A."
> > 
> > Is this a case where the client side shouldn't be sending a MERGE at
> > this point?  Even if that is the primary problem, I'm still interested
> > in Subversion's response to this MERGE request.
> 
> Our codebase should definitely not allow a bogus (NULL) txn-id to be
> casually transformed in a path.  You can commit assertions around those
> parts *right now*.

Done in r980046.

- Julian


> 
> If no MERGE occurs at the end of a stream of activity modifications, that's
> usually means that one of those modifications failed and the client has
> bailed on the commit (it then issues the DELETE of the activity as a cleanup
> step).  Weird that something would then try to MERGE the activity that was
> already DELETEd.
> 
> This is not behavior I've seen in Subversion's own codebase -- so maybe just
> a rare WANdisco software bug?
> 


Re: FSFS error in DAV MERGE - Can't open file 'db/transactions/props'

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 06/22/2010 08:57 AM, Julian Foad wrote:
> Here are some Apache access log entries from one occurrence, with paths
> anonymized and the client Id string (always "SVN/1.6.6 (r40053)
> neon/0.28.3") omitted for brevity:
> [[[
> 127.0.0.1 - UserK [16/Dec/2009:20:25:17 -0800]
> "MKACTIVITY /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 201 312 "-"
> 127.0.0.1 - UserK [16/Dec/2009:23:20:45 -0800]
> "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityA/branches/BranchD/PathG HTTP/1.1" 207 582 "-"
> 127.0.0.1 - UserM [17/Dec/2009:00:12:45 -0800]
> "MERGE /svn/RepoJ/branches/BranchH/PathC HTTP/1.1" 200 953 "-"
> 127.0.0.1 - UserM [17/Dec/2009:00:12:59 -0800]
> "MERGE /svn/RepoJ/branches/BranchP/PathC HTTP/1.1" 200 965 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:21:19 -0800]
> "DELETE /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 204 - "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:32:45 -0800]
> "MKACTIVITY /svn/RepoJ/!svn/act/ActivityB HTTP/1.1" 201 312 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:39:58 -0800]
> "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityB/branches/BranchD/PathE HTTP/1.1" 207 569 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> ]]]

This sequence looks weird to me.  In Subversion, there's always exactly one
MERGE -- the next-to-last step in a commit (which is begun by a MKACTIVITY
in HTTPv1, a POST in HTTPv2).  You've got two MKACTIVITYs, and four MERGEs.
 Unless there are more operations that occurred but are just missing from
your report there, that's just weird.

> and the corresponding error log:
> [[[
> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> Could not MERGE resource "/svn/RepoJ/!svn/act/ActivityA" into "/svn/RepoJ/branches/BranchD/PathJ". [500, #0]
> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> There was a problem opening the transaction specified by this activity. [500, #2]
> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> Can't open file '/svn/repos/RepoJ/db/transactions/props': No such file or directory [500, #2]
> ]]]
> 
> The error appears to match the second-last MERGE in the access log.  I
> don't have a copy of the body of the MERGE request in this case.
> 
> The error refers to an activity Id that was the subject of the DELETE
> that was logged 18 minutes earlier.  A comment from one of our guys
> looking at the logs: "We are seeing 22,600 changes for Activity A and no
> merge at the end (just a DELETE). Then we see about 931 commands for
> Activity B and a MERGE that fails for Activity A."
> 
> Is this a case where the client side shouldn't be sending a MERGE at
> this point?  Even if that is the primary problem, I'm still interested
> in Subversion's response to this MERGE request.

Our codebase should definitely not allow a bogus (NULL) txn-id to be
casually transformed in a path.  You can commit assertions around those
parts *right now*.

If no MERGE occurs at the end of a stream of activity modifications, that's
usually means that one of those modifications failed and the client has
bailed on the commit (it then issues the DELETE of the activity as a cleanup
step).  Weird that something would then try to MERGE the activity that was
already DELETEd.

This is not behavior I've seen in Subversion's own codebase -- so maybe just
a rare WANdisco software bug?

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: FSFS error in DAV MERGE - Can't open file 'db/transactions/props'

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
Ping?  Just wondering if anybody else has seen this problem, or looked
at Julian's patch.

-Hyrum

On Tue, Jun 22, 2010 at 1:57 PM, Julian Foad <ju...@wandisco.com> wrote:
> I am investigating the following DAV + FSFS server error message which
> we at WANdisco have seen rarely, but in at least two real installations,
> and am sharing it here in case anyone can shed light on it.  It happens
> during a DAV 'MERGE':
>
> Can't open file '[...]/db/transactions/props': No such file or directory
>
> FSFS is trying to open an invalid path; it should be
> '[...]/db/transactions/<txn-id>/props'.  From the source code, I can see
> how this must happen at a low level: the txn-id path component will be
> omitted if and only if something passes a NULL transaction-id string to
> fs_fs.c:get_txn_proplist().
>
> But why and from where?
>
> The error might be triggered by the WANdisco software sending commands
> in the wrong sequence, or something like that, and so to that extent it
> may not be interesting to the general community.  In one case, detailed
> below, it happened after a DELETE was sent, more than an hour into a
> very long commit.  In another case, it happened after a start-commit
> hook failed.  The server might be receiving a DAV MERGE request when it
> isn't expecting one.  Can anyone comment on that, given this limited
> information?
>
> Here are some Apache access log entries from one occurrence, with paths
> anonymized and the client Id string (always "SVN/1.6.6 (r40053)
> neon/0.28.3") omitted for brevity:
> [[[
> 127.0.0.1 - UserK [16/Dec/2009:20:25:17 -0800]
> "MKACTIVITY /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 201 312 "-"
> 127.0.0.1 - UserK [16/Dec/2009:23:20:45 -0800]
> "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityA/branches/BranchD/PathG HTTP/1.1" 207 582 "-"
> 127.0.0.1 - UserM [17/Dec/2009:00:12:45 -0800]
> "MERGE /svn/RepoJ/branches/BranchH/PathC HTTP/1.1" 200 953 "-"
> 127.0.0.1 - UserM [17/Dec/2009:00:12:59 -0800]
> "MERGE /svn/RepoJ/branches/BranchP/PathC HTTP/1.1" 200 965 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:21:19 -0800]
> "DELETE /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 204 - "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:32:45 -0800]
> "MKACTIVITY /svn/RepoJ/!svn/act/ActivityB HTTP/1.1" 201 312 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:39:58 -0800]
> "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityB/branches/BranchD/PathE HTTP/1.1" 207 569 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> ]]]
>
> and the corresponding error log:
> [[[
> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> Could not MERGE resource "/svn/RepoJ/!svn/act/ActivityA" into "/svn/RepoJ/branches/BranchD/PathJ". [500, #0]
> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> There was a problem opening the transaction specified by this activity. [500, #2]
> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> Can't open file '/svn/repos/RepoJ/db/transactions/props': No such file or directory [500, #2]
> ]]]
>
> The error appears to match the second-last MERGE in the access log.  I
> don't have a copy of the body of the MERGE request in this case.
>
> The error refers to an activity Id that was the subject of the DELETE
> that was logged 18 minutes earlier.  A comment from one of our guys
> looking at the logs: "We are seeing 22,600 changes for Activity A and no
> merge at the end (just a DELETE). Then we see about 931 commands for
> Activity B and a MERGE that fails for Activity A."
>
> Is this a case where the client side shouldn't be sending a MERGE at
> this point?  Even if that is the primary problem, I'm still interested
> in Subversion's response to this MERGE request.
>
> I want to assert(txn-id != NULL) in the relevant FSFS functions, as in
> the attached patch.  I determined the comments in the attached patch
> after manually passing NULL instead of the txn-id.  However, that won't
> get me far on its own.
>
> To try to reproduce the error, do you think I should try sending a
> sequence of DAV commands similar to the sequence shown above?
>
> Any other ideas on what might be happening or how to diagnose this?
>
> - Julian
>
>

Re: FSFS error in DAV MERGE - Can't open file 'db/transactions/props'

Posted by Julian Foad <ju...@wandisco.com>.
On Tue, 2010-06-22 at 13:57 +0100, Julian Foad wrote:
> I am investigating the following DAV + FSFS server error message which
> we at WANdisco have seen rarely, but in at least two real installations,
> and am sharing it here in case anyone can shed light on it.  It happens
> during a DAV 'MERGE':
> 
> Can't open file '[...]/db/transactions/props': No such file or directory
> 
> FSFS is trying to open an invalid path; it should be
> '[...]/db/transactions/<txn-id>/props'.  From the source code, I can see
> how this must happen at a low level: the txn-id path component will be
> omitted if and only if something passes a NULL transaction-id string to
> fs_fs.c:get_txn_proplist().
> 
> But why and from where?

A status update:

I've traced the calls to get_txn_proplist(), a few call levels up, and I
believe one of these must have supplied txn_id = NULL.

Callees of get_txn_proplist(), with txn-id parameter identified by name
and by param-list position:

get_txn_proplist(,,txn_id)
  svn_fs_fs__change_txn_props(txn->id)  # DOESN'T CAUSE THE ERROR
    txn_vtable.change_props(txn->id)
    svn_fs_fs__change_txn_prop(txn->id)
    commit_body()
    svn_fs_fs__begin_txn(using svn_fs_fs__create_txn())
    svn_fs_fs__begin_obliteration_txn(using ...)
  svn_fs_fs__get_txn(txn_id)            # CAN CAUSE THE ERROR
    svn_fs_fs__open_txn(,,name)
      svn_fs_t.vtable.open_txn(,,name)
        svn_fs_open_txn(,,name)
          dav_svn__abort_txn(,txn_name)
          prep_working()  # NO, txn_name != NULL
          prep_private(comb->priv.root.txn_name)
            DAV_RESOURCE_TYPE_PRIVATE
          is_our_resource()  # NO, txn name != NULL
          open_txn(,,txn_name)
            dav_svn__checkout()
            merge()
              dav_svn__hooks_vsn.merge
      fs_merge(,,,target_root->txn)
        svn_fs_root_t.vtable.merge(,,,target_root->txn)
          svn_fs_merge(,,target_root->txn)
            # no callers?
    svn_fs_fs__get_txn_ids(,,,txn_name)
      svn_fs_fs__dag_txn_root(,,txn_id)
        root_node<-open_path<-get_dag(,root->txn)
      svn_fs_fs__dag_txn_base_root(,,txn_id)
        merge_changes(,,txn->id)
          svn_fs_fs__commit_txn(,,txn->id)
            svn_fs_txn_t.vtable.commit(,,txn->id)
              svn_fs_commit_txn(,,txn->id)
          fs_merge()
      svn_fs_fs__dag_clone_root(,,txn_id)
        mutable_root_node(,root->txn)
          make_path_mutable(root->txn)
  commit_body(baton->txn->id)           # DOESN'T CAUSE THE ERROR
    svn_fs_fs__commit(,,txn->id)
      svn_fs_fs__commit_txn(,,txn->id)
  svn_fs_fs__txn_proplist(,txn->id)     # CAN CAUSE THE ERROR
    svn_fs_txn_t.vtable.get_proplist(,txn->id)
      svn_fs_txn_proplist(,,txn->id)
    commit_body()
    svn_fs_fs__txn_prop(,txn->id)
      svn_fs_txn_t.vtable.get_prop(,txn->id)
        svn_fs_txn_prop(,txn->id)
    svn_fs_fs__txn_root(,txn->id)
      svn_fs_txn_t.vtable.root(,txn->id)
        svn_fs_txn_root(,txn->id)

The ones annotated "CAN CAUSE THE ERROR" did result in the "Can't open
'db/transactions/props'" error when I forced txn_id=NULL at these places
in a trivial test scenario, and the ones marked "DOESN'T", didn't.

From the fact that a DAV MERGE was the operation under way, I would
hazard a guess that dav_svn__hooks_vsn.merge() might be the call that's
going wrong.  I haven't tried to trace that back to its caller (in
mod_dav perhaps?).

I don't know if this line of attack has any mileage left in it.  I'm
thinking of leaving it at that for now, unless anyone comes up with
suggestions for how to proceed.  I don't feel capable of setting up a
test system that would replicate the sequence of DAV messages that
resulted in the error condition.

(The issue of the current assertion leaving a DOS attack possible is
still in my sights - email thread "Re: svn commit: r980046".)

- Julian


> The error might be triggered by the WANdisco software sending commands
> in the wrong sequence, or something like that, and so to that extent it
> may not be interesting to the general community.  In one case, detailed
> below, it happened after a DELETE was sent, more than an hour into a
> very long commit.  In another case, it happened after a start-commit
> hook failed.  The server might be receiving a DAV MERGE request when it
> isn't expecting one.  Can anyone comment on that, given this limited
> information?
> 
> Here are some Apache access log entries from one occurrence, with paths
> anonymized and the client Id string (always "SVN/1.6.6 (r40053)
> neon/0.28.3") omitted for brevity:
> [[[
> 127.0.0.1 - UserK [16/Dec/2009:20:25:17 -0800]
> "MKACTIVITY /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 201 312 "-"
> 127.0.0.1 - UserK [16/Dec/2009:23:20:45 -0800]
> "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityA/branches/BranchD/PathG HTTP/1.1" 207 582 "-"
> 127.0.0.1 - UserM [17/Dec/2009:00:12:45 -0800]
> "MERGE /svn/RepoJ/branches/BranchH/PathC HTTP/1.1" 200 953 "-"
> 127.0.0.1 - UserM [17/Dec/2009:00:12:59 -0800]
> "MERGE /svn/RepoJ/branches/BranchP/PathC HTTP/1.1" 200 965 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:21:19 -0800]
> "DELETE /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 204 - "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:32:45 -0800]
> "MKACTIVITY /svn/RepoJ/!svn/act/ActivityB HTTP/1.1" 201 312 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:39:58 -0800]
> "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityB/branches/BranchD/PathE HTTP/1.1" 207 569 "-"
> 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> ]]]
> 
> and the corresponding error log:
> [[[
> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> Could not MERGE resource "/svn/RepoJ/!svn/act/ActivityA" into "/svn/RepoJ/branches/BranchD/PathJ". [500, #0]
> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> There was a problem opening the transaction specified by this activity. [500, #2]
> [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> Can't open file '/svn/repos/RepoJ/db/transactions/props': No such file or directory [500, #2]
> ]]]
> 
> The error appears to match the second-last MERGE in the access log.  I
> don't have a copy of the body of the MERGE request in this case.
> 
> The error refers to an activity Id that was the subject of the DELETE
> that was logged 18 minutes earlier.  A comment from one of our guys
> looking at the logs: "We are seeing 22,600 changes for Activity A and no
> merge at the end (just a DELETE). Then we see about 931 commands for
> Activity B and a MERGE that fails for Activity A."
> 
> Is this a case where the client side shouldn't be sending a MERGE at
> this point?  Even if that is the primary problem, I'm still interested
> in Subversion's response to this MERGE request.
> 
> I want to assert(txn-id != NULL) in the relevant FSFS functions, as in
> the attached patch.  I determined the comments in the attached patch
> after manually passing NULL instead of the txn-id.  However, that won't
> get me far on its own.
> 
> To try to reproduce the error, do you think I should try sending a
> sequence of DAV commands similar to the sequence shown above?
> 
> Any other ideas on what might be happening or how to diagnose this?
> 
> - Julian
>