You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Alexey Neyman <st...@att.net> on 2014/08/10 07:50:11 UTC

Regression in bindings? 1.7/1.8 vs 1.6

Hi Subversion developers,

I am trying to migrate some scripts from 1.6 server that we're currently 
running to a newer SVN version, and encountered something that looks like 
a bug: with 1.7/1.8 the fs.change_node_prop (in Python bindings) is no 
longer able to modify a node's properties unless that node is already 
modified in the transaction being handled.

A reproduction script is attached. It yields PASS on 1.6.11 and FAILs on 1.7.8 
and 1.8.8.

Is it by design, or a regression? If it is by design - what is the proper way to 
modify a node in a transaction in the newer Subversion?

Regards,
Alexey.

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Alexey Neyman <st...@att.net>.
[CC'ing users@]

On Sunday, August 10, 2014 08:35:44 PM Ben Reser wrote:
> > Also, I think it would be a good idea to have the transaction-modifying
> > functions return an error once the transaction reached the stage of
> > pre-commit hooks from functions like svn_fs_change_node_prop() - to avoid
> > getting one's hopes high because it works in the current release.
> > Something like SVN_TXN_READONLY. What do you think?
> 
> Keeping in mind that Subversion doesn't have a central process that owns the
> file system, it'd have to be something we write out to the transaction. 
> I'm not sure how practical that is given our current formats.  But yes this
> might not be a bad idea since it's not something we intend to allow.  I
> haven't done too much digging but that might even be what we're doing.

That's what I meant - the functions like svn_fs_change_node_prop() will return an error 
status, not that some "owner process" will report an error to the system log. Is there a way 
for these functions could check whether the FS root passed as the first argument belongs 
to a transaction that is currently executing the pre-commit hook?

> > But back to use case: I am thinking about alternative approaches to doing
> > such auto-updates of properties and/or other content. I assume that it is
> > not possible to create a transaction B based on a transaction A in the
> > pre-commit hook (so that when transaction A becomes a revision,
> > transaction B uses that new revision as a base), is it?
> > 
> > It seems that the only supported way to do that would be to schedule the
> > "update tasks" to be done in the pre-commit script, but actually execute
> > them in a new transaction. Hence, another question - is a post-commit
> > hook allowed to create and commit a transaction, or does it have to be
> > deferred until after the post-commit hook finishes?
> 
> Perhaps it would be better to tell us what you're trying to do rather than
> talking about how you're trying to do it.  I can't think of a good reason
> why you need to be modifying properties on files like this.

I described this in the previous email: I want to have one file, 
/project/trunk/include/version.h, reflect the last revision that touched any file in the 
project. For that purpose, pre-commit hook (on a server currently using SVN 1.6) checks if 
any file under /project/trunk is modified and if it's the case, modifies a property on the 
/project/trunk/include/version.h. This in turn causes $Revision$ in <version.h> to reflect the 
last revision when ANY file in the project was modified, not when the <version.h> itself was 
modified.

There is another case where our current scripts modified the node properties during pre-
commit: we use a modified version of FreeBSD's verify.py, and among other checks, it uses 
heuristics to determine whether a file is binary or text. The FreeBSD's version checks that 
every time, while we tried to cache the result of that heuristicss in a file's property.

Am I missing some obvious way to do these tasks?

> Additionally, I'd point out that this whole thread should probably be
> happening on users@subversion.apache.org.  I don't think this is a bug. 
> You may also get more responses to your questions there, since you're
> hitting a much broader audience that largely consists of admins instead of
> just developers.  I'd guess some of those admins have dealt with similar
> problems.  By contracts us developers don't typically administer
> repositories, in fact since joining the ASF we don't even admin our own
> repository.

CC'ed users@; feel free to drop dev@ when replying.

Regards,
Alexey.

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Alexey Neyman <st...@att.net>.
[CC'ing users@]

On Sunday, August 10, 2014 08:35:44 PM Ben Reser wrote:
> > Also, I think it would be a good idea to have the transaction-modifying
> > functions return an error once the transaction reached the stage of
> > pre-commit hooks from functions like svn_fs_change_node_prop() - to avoid
> > getting one's hopes high because it works in the current release.
> > Something like SVN_TXN_READONLY. What do you think?
> 
> Keeping in mind that Subversion doesn't have a central process that owns the
> file system, it'd have to be something we write out to the transaction. 
> I'm not sure how practical that is given our current formats.  But yes this
> might not be a bad idea since it's not something we intend to allow.  I
> haven't done too much digging but that might even be what we're doing.

That's what I meant - the functions like svn_fs_change_node_prop() will return an error 
status, not that some "owner process" will report an error to the system log. Is there a way 
for these functions could check whether the FS root passed as the first argument belongs 
to a transaction that is currently executing the pre-commit hook?

> > But back to use case: I am thinking about alternative approaches to doing
> > such auto-updates of properties and/or other content. I assume that it is
> > not possible to create a transaction B based on a transaction A in the
> > pre-commit hook (so that when transaction A becomes a revision,
> > transaction B uses that new revision as a base), is it?
> > 
> > It seems that the only supported way to do that would be to schedule the
> > "update tasks" to be done in the pre-commit script, but actually execute
> > them in a new transaction. Hence, another question - is a post-commit
> > hook allowed to create and commit a transaction, or does it have to be
> > deferred until after the post-commit hook finishes?
> 
> Perhaps it would be better to tell us what you're trying to do rather than
> talking about how you're trying to do it.  I can't think of a good reason
> why you need to be modifying properties on files like this.

I described this in the previous email: I want to have one file, 
/project/trunk/include/version.h, reflect the last revision that touched any file in the 
project. For that purpose, pre-commit hook (on a server currently using SVN 1.6) checks if 
any file under /project/trunk is modified and if it's the case, modifies a property on the 
/project/trunk/include/version.h. This in turn causes $Revision$ in <version.h> to reflect the 
last revision when ANY file in the project was modified, not when the <version.h> itself was 
modified.

There is another case where our current scripts modified the node properties during pre-
commit: we use a modified version of FreeBSD's verify.py, and among other checks, it uses 
heuristics to determine whether a file is binary or text. The FreeBSD's version checks that 
every time, while we tried to cache the result of that heuristicss in a file's property.

Am I missing some obvious way to do these tasks?

> Additionally, I'd point out that this whole thread should probably be
> happening on users@subversion.apache.org.  I don't think this is a bug. 
> You may also get more responses to your questions there, since you're
> hitting a much broader audience that largely consists of admins instead of
> just developers.  I'd guess some of those admins have dealt with similar
> problems.  By contracts us developers don't typically administer
> repositories, in fact since joining the ASF we don't even admin our own
> repository.

CC'ed users@; feel free to drop dev@ when replying.

Regards,
Alexey.

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Ben Reser <be...@reser.org>.
Side note this is in the pre-commit.tmpl that we create when you create a
repository:

[[[
#   ***  NOTE: THE HOOK PROGRAM MUST NOT MODIFY THE TXN, EXCEPT  ***
#   ***  FOR REVISION PROPERTIES (like svn:log or svn:author).   ***
]]]

On 8/10/14 7:36 PM, Alexey Neyman wrote:
> So, the unversioned properties are ok to modify? Our scripts do that, too - so
> I'd like to know if it's ok to depend on this not to be broken by future releases.

I think that's a safe assumption given the above statement in our template file.

> Also, I think it would be a good idea to have the transaction-modifying
> functions return an error once the transaction reached the stage of pre-commit
> hooks from functions like svn_fs_change_node_prop() - to avoid getting one's
> hopes high because it works in the current release. Something like
> SVN_TXN_READONLY. What do you think?

Keeping in mind that Subversion doesn't have a central process that owns the
file system, it'd have to be something we write out to the transaction.  I'm
not sure how practical that is given our current formats.  But yes this might
not be a bad idea since it's not something we intend to allow.  I haven't done
too much digging but that might even be what we're doing.

> But back to use case: I am thinking about alternative approaches to doing such
> auto-updates of properties and/or other content. I assume that it is not
> possible to create a transaction B based on a transaction A in the pre-commit
> hook (so that when transaction A becomes a revision, transaction B uses that
> new revision as a base), is it? 
> 
> It seems that the only supported way to do that would be to schedule the
> "update tasks" to be done in the pre-commit script, but actually execute them
> in a new transaction. Hence, another question - is a post-commit hook allowed
> to create and commit a transaction, or does it have to be deferred until after
> the post-commit hook finishes?

Perhaps it would be better to tell us what you're trying to do rather than
talking about how you're trying to do it.  I can't think of a good reason why
you need to be modifying properties on files like this.

No transactions cannot be based on transactions, they must always be based on
revisions.

Sure a post-commit can execute a commit.

> And finally, is there a way to prevent further transactions from being created
> and/or turned into revisions until the post-commit hook has finished performing
> the scheduled tasks? In other words, can Subversion start another transaction
> before the post-commit hook finishes on the revision just created?

We build transactions up without any sort of locking.  The only sort of locking
that happens is when a transaction is being converted to a revision (which is
of course has to be serialized).  The post-commit hook runs outside that lock.
 There is in fact no guarantee the post-commit hook even runs (say power
failure between the commit completing and post-commit starting).  For that
matter, the post-commit hooks can even run out of order.

For all of those reasons, I think we should spend time discussing what you're
actually trying to do.

Additionally, I'd point out that this whole thread should probably be happening
on users@subversion.apache.org.  I don't think this is a bug.  You may also get
more responses to your questions there, since you're hitting a much broader
audience that largely consists of admins instead of just developers.  I'd guess
some of those admins have dealt with similar problems.  By contracts us
developers don't typically administer repositories, in fact since joining the
ASF we don't even admin our own repository.

So I'd suggest you start a new thread on users@subversion.apache.org asking how
to solve the actual problem you're trying to solve rather than the problem
you've run into solving it.  It's fine to mention that you tried pre-commit and
ran into these problems, but focus on your actual problem rather than the
solution you have in mind.


Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Alexey Neyman <st...@att.net>.
Thanks Ben for a detailed answer. A few follow-up questions below:

On Sunday, August 10, 2014 06:21:04 PM Ben Reser wrote:
[... snip ...]
> Here's the most important sentence form the book:
> 
> "While hook scripts can do almost anything, there is one dimension in which
> hook script authors should show restraint: do not modify a commit
> transaction using hook scripts."
> 
> Yes the word should appears in that sentence.  But we really are intending
> to tell you not to modify a transaction.  I suspect the word should is
> there because there is actually one thing that is ok to modify in a hook
> script and that's the unversioned properties that will become revision
> properties. Perhaps the language could be improved, but I do think that
> italics on the word "not" and the following text should give you a pretty
> good idea that this is not a supported operation.

So, the unversioned properties are ok to modify? Our scripts do that, too - so I'd like to 
know if it's ok to depend on this not to be broken by future releases.

Also, I think it would be a good idea to have the transaction-modifying functions return an 
error once the transaction reached the stage of pre-commit hooks from functions like 
svn_fs_change_node_prop() - to avoid getting one's hopes high because it works in the 
current release. Something like SVN_TXN_READONLY. What do you think?

[... snip ...]
> Actually this still makes the client that's doing the commit have a stale
> cache of the property.  It will believe that "myprop" either has no value
> or whatever the previous value is.
> 
> You probably won't pick up any change to this property until the next time
> the node changes.  However, you can get away this if you're not terribly
> concerned about the client having the correct state of the property because
> we never send deltas for property changes.  We just send the whole property
> value.
> 
> If that ever changes (which is entirely conceivable since there are property
> use case that quite honestly have probably grown beyond the original intent
> of properties, including svn:mergeinfo) you will be in a world of hurt
> since your existing working copies when upgraded and run with clients that
> understand this and servers that handle this will suddenly be busted. 
> Every commit until you remove this will break the working copy.

I understand why it's bad for properties that Subversion interprets. In our case - as I 
explained, we don't care if the developer's build environment keeps a stale version of that 
property forever.

But back to use case: I am thinking about alternative approaches to doing such auto-
updates of properties and/or other content. I assume that it is not possible to create a 
transaction B based on a transaction A in the pre-commit hook (so that when transaction A 
becomes a revision, transaction B uses that new revision as a base), is it?

It seems that the only supported way to do that would be to schedule the "update tasks" 
to be done in the pre-commit script, but actually execute them in a new transaction. 
Hence, another question - is a post-commit hook allowed to create and commit a 
transaction, or does it have to be deferred until after the post-commit hook finishes?

And finally, is there a way to prevent further transactions from being created and/or 
turned into revisions until the post-commit hook has finished performing the scheduled 
tasks? In other words, can Subversion start another transaction before the post-commit 
hook finishes on the revision just created?

[... snip ...]
> I haven't dug in to the specifics of how we broke this particular use case.
> But I suspect you'll find that if you create a transaction, call
> svn_fs_change_node_prop() and commit the transaction that this call works
> just fine.  So I'd bet that the problem only happens when you try to do
> this from the pre-commit hook.

Indeed. So because of this it makes even more sense to have functions like 
svn_fs_change_node_prop() fail explicitly with an error, rather than modifying a transaction 
in an unpredictable way.

Thanks again,
Alexey.

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Alexey Neyman <st...@att.net>.
On Tuesday, August 12, 2014 09:49:08 AM Philip Martin wrote:
> Alexey Neyman <st...@att.net> writes:
> > Indeed, calling svn_fs_txn_root() after execution of the pre-commit
> > hook in svn_repos_fs_commit_txn() makes Subversion behave in the 
same
> > way as it did in 1.6.
> 
> Just so I can be clear about what is happening: can you confirm that you
> are not using apache/mod_dav_svn?

No, I was using file:// and svn:// protocols for this test.

Regards,
Alexey.

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Philip Martin <ph...@wandisco.com>.
Alexey Neyman <st...@att.net> writes:

> Indeed, calling svn_fs_txn_root() after execution of the pre-commit
> hook in svn_repos_fs_commit_txn() makes Subversion behave in the same
> way as it did in 1.6.

Just so I can be clear about what is happening: can you confirm that you
are not using apache/mod_dav_svn?

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Alexey Neyman <st...@att.net>.
Wow. Thanks for the very detailed and on the spot explanation.

On Monday, August 11, 2014 01:50:48 PM Philip Martin wrote:
> Ben Reser <be...@reser.org> writes:
> > I haven't dug in to the specifics of how we broke this particular use
> > case.
> > But I suspect you'll find that if you create a transaction, call
> > svn_fs_change_node_prop() and commit the transaction that this call works
> > just fine.  So I'd bet that the problem only happens when you try to do
> > this from the pre-commit hook.
> 
> A post-commit should not change the versioned data because it confuses
> the client.  However a script in general should be able to modify a
> versioned property and it's perfectly possible to write a python script
> that does it.  I think the problem here is that the process committing
> the transaction has populated the fs_fs_data_t.txn_dir_cache and this
> cache gets outdated when the post-commit runs in a separate process.
>
> Consider the test I added in 1617263, it's one process and so doesn't
> demonstrate the problem:
> 
> +  /* Create txn with changes. */
> +  SVN_ERR(svn_fs_begin_txn(&txn, fs, head_rev, pool));
> +  SVN_ERR(svn_fs_txn_name(&txn_name, txn, pool));
> +  SVN_ERR(svn_fs_txn_root(&root, txn, pool));
> +  SVN_ERR(svn_fs_make_dir(root, "X", pool));
> +
> +  /* Reopen, add more changes. */
> +  SVN_ERR(svn_fs_open_txn(&reopen_txn, fs, txn_name, pool));
> +  SVN_ERR(svn_fs_txn_root(&reopen_root, reopen_txn, pool));
> +  SVN_ERR(svn_fs_change_node_prop(reopen_root, "A", "name",
> +                                  svn_string_create("value", pool),
> +                                  pool));
> +
> +  /* Reopen, commit */
> +  SVN_ERR(svn_fs_open_txn(&reopen_txn, fs, txn_name, pool));
> +  SVN_ERR(test_commit_txn(&head_rev, reopen_txn, NULL, pool));
> 
> If I remove the svn_fs_change_node_prop() line from the test, run the
> test in gdb and stop at the commit I can then run a python script in
> another process to make the equivalent property change.  Then I resume
> the test and the commit includes the expected property change.
> 
> However if I also remove the svn_fs_txn_root() call that reopens the
> transaction in addition to the svn_fs_change_node_prop() call then the
> commit will lose the property change made externally.  In both cases the
> property change is present in the transaction on disk but if the
> txn_dir_cache is out-of-date the property change gets lost.
> 
> svn_fs_txn_root() calls svn_fs_fs__initialize_txn_caches() and if that
> detects concurrent transactions it sets ffd->txn_dir_cache to NULL so
> the out-of-date cache doesn't get used and the property change on disk
> makes it into the revision.

Indeed, calling svn_fs_txn_root() after execution of the pre-commit hook in 
svn_repos_fs_commit_txn() makes Subversion behave in the same way as it did in 1.6.

Still, as far as I understand, Subversion is not going to have svn_repos_fs_commit_txn() 
reload the directory cache for the transaction (unless some future release of Subversion 
allows the pre-commit script to modify the transaction and syncs the client caches with 
such modifications :P). So I would still appreciate suggestions as to how Subversion 1.7+ 
hooks should be configured to handle this use case:

I want to have one file, /project/trunk/include/version.h, reflect the last revision that 
touched any file in the project. For that purpose, pre-commit hook (on a server currently 
using SVN 1.6) checks if any file under /project/trunk is modified and if it's the case, 
modifies a property on the /project/trunk/include/version.h. This in turn causes $Revision$ 
in <version.h> to reflect the last revision when ANY file in the project was modified, not 
when the <version.h> itself was modified.

(the other use case I for 1.6 behavior that I previously described can be easily modified to 
use a combination of pre-commit and post-commit hooks, so that is not a big issue).

Thanks,
Alexey.


Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Philip Martin <ph...@wandisco.com>.
Ivan Zhakov <iv...@visualsvn.com> writes:

> On 12 August 2014 12:46, Philip Martin <ph...@wandisco.com> wrote:
>> Philip Martin <ph...@wandisco.com> writes:
>>
>>> The txn_dir_cache gets cleared when the pool associated with the
>>> txn_root is cleared
>>
>> The net result is that the txn_dir_cache doesn't really work with
>> apache/mod_dav_svn.  Import a directory containing a large number of
>> small files and the rate at which files are added slows as the number of
>> file increases.  For svnserve the rate remains more-or-less constant.
>> The final stage comverting the transaction into a revision is much
>> faster for svnserve than for apache/mod_dav_svn.
>>
> Hi Philp,
>
> Ouch, it looks like the serious problem that could lead invalid data
> committed to repository. Is it true? Did you already filed issue?

Cache invalidation has a couple of effects:

  - the cache doesn't work very well for the most common access pattern,
    i.e. mod_dav_svn doesn't scale as well as svnserve.

  - the cache causes out-of-date data to be written for some unusual
    access pattern

I wrote a regression test (r1617500) for the second part but I have not
raised an issue.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 12 August 2014 12:46, Philip Martin <ph...@wandisco.com> wrote:
> Philip Martin <ph...@wandisco.com> writes:
>
>> The txn_dir_cache gets cleared when the pool associated with the
>> txn_root is cleared
>
> The net result is that the txn_dir_cache doesn't really work with
> apache/mod_dav_svn.  Import a directory containing a large number of
> small files and the rate at which files are added slows as the number of
> file increases.  For svnserve the rate remains more-or-less constant.
> The final stage comverting the transaction into a revision is much
> faster for svnserve than for apache/mod_dav_svn.
>
Hi Philp,

Ouch, it looks like the serious problem that could lead invalid data
committed to repository. Is it true? Did you already filed issue?

-- 
Ivan Zhakov
CTO | VisualSVN | http://www.visualsvn.com

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> The txn_dir_cache gets cleared when the pool associated with the
> txn_root is cleared

The net result is that the txn_dir_cache doesn't really work with
apache/mod_dav_svn.  Import a directory containing a large number of
small files and the rate at which files are added slows as the number of
file increases.  For svnserve the rate remains more-or-less constant.
The final stage comverting the transaction into a revision is much
faster for svnserve than for apache/mod_dav_svn.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Wed, Aug 13, 2014 at 12:32 PM, Philip Martin <ph...@wandisco.com>
wrote:

> Stefan Fuhrmann <st...@wandisco.com> writes:
>
> > The solution seems to be to simply destroy the txn_dir_cache
> > instance at the begin of the commit. That comes at some extra
> > cost but that is still proportional to the size of the commit.
>
> There would be no cost to mod_dav_svn since the txn_dir_cache is already
> NULL when handling MERGE.  There would be a cost to svn:// and file://.
>
> > The only sequence that would get us into real trouble is
> >
> >   thread / process A: modify txn
> >   thread / process B: modify txn
> >   thread / process A: modify txn (via same fs_t as before)
> >
> > Since A won't re-read the directory contents and may therefore
> > return "path not found" errors or miss earlier modifications to a
> > given node / sub-tree and use a new copy for the following mods.
> > But I guess we don't support that sequence at all.
>
> We don't support it right now because it doesn't work, but do we want it
> to work?  fs-test 44 is the new test I wrote: it FAILS for FSFS but is a
> PASS for BDB and FSX.  It is also a PASS for 1.6 FSFS.
>

Hm. Right now, we haven't changed the on-disk representation
of in-txn directories. It is still append-only. So, we could cache
the directory file size along with the contents and use that for
an OOD check.

I wouldn't want to do it for 1.9 but it is certainly an option later.
The driver behind this would be concurrent puts.

-- Stefan^2.

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Philip Martin <ph...@wandisco.com>.
Stefan Fuhrmann <st...@wandisco.com> writes:

> The solution seems to be to simply destroy the txn_dir_cache
> instance at the begin of the commit. That comes at some extra
> cost but that is still proportional to the size of the commit.

There would be no cost to mod_dav_svn since the txn_dir_cache is already
NULL when handling MERGE.  There would be a cost to svn:// and file://.

> The only sequence that would get us into real trouble is
>
>   thread / process A: modify txn
>   thread / process B: modify txn
>   thread / process A: modify txn (via same fs_t as before)
>
> Since A won't re-read the directory contents and may therefore
> return "path not found" errors or miss earlier modifications to a
> given node / sub-tree and use a new copy for the following mods.
> But I guess we don't support that sequence at all.

We don't support it right now because it doesn't work, but do we want it
to work?  fs-test 44 is the new test I wrote: it FAILS for FSFS but is a
PASS for BDB and FSX.  It is also a PASS for 1.6 FSFS.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Tue, Aug 12, 2014 at 12:59 PM, Philip Martin <ph...@wandisco.com>
wrote:

> Philip Martin <ph...@wandisco.com> writes:
>
> >   1. client sends MKCOL
> >   2. apache process MKCOL and populates txn_dir_cache
> >   3. apache sends MKCOL response
> >   4. apache clears MKCOL request pool
> >   5. client sends MERGE
> >   6. apache process MERGE
> >
> > 5 can happen before 4 since two processes are involved but can 6 happen
> > before 4?  I've not seen it happen with the worker MPM but I'm not
> > familiar with all the MPMs.
>
> This could happen if 4 and 6 occur in different threads/processes but
> since the txn_dir_cache is per-thread or per-svn_fs_t any cache from a
> MKCOL will not be visible to MERGE.  So there is no txn_dir_cache when
> apache/mod_dav_svn runs the pre-commit and so no cache to become
> out-of-date.
>

The solution seems to be to simply destroy the txn_dir_cache
instance at the begin of the commit. That comes at some extra
cost but that is still proportional to the size of the commit.

The only sequence that would get us into real trouble is

  thread / process A: modify txn
  thread / process B: modify txn
  thread / process A: modify txn (via same fs_t as before)

Since A won't re-read the directory contents and may therefore
return "path not found" errors or miss earlier modifications to a
given node / sub-tree and use a new copy for the following mods.
But I guess we don't support that sequence at all.

-- Stefan^2.

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

>   1. client sends MKCOL
>   2. apache process MKCOL and populates txn_dir_cache
>   3. apache sends MKCOL response
>   4. apache clears MKCOL request pool
>   5. client sends MERGE
>   6. apache process MERGE
>
> 5 can happen before 4 since two processes are involved but can 6 happen
> before 4?  I've not seen it happen with the worker MPM but I'm not
> familiar with all the MPMs.

This could happen if 4 and 6 occur in different threads/processes but
since the txn_dir_cache is per-thread or per-svn_fs_t any cache from a
MKCOL will not be visible to MERGE.  So there is no txn_dir_cache when
apache/mod_dav_svn runs the pre-commit and so no cache to become
out-of-date.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> svn_fs_txn_root() calls svn_fs_fs__initialize_txn_caches() and if that
> detects concurrent transactions it sets ffd->txn_dir_cache to NULL so
> the out-of-date cache doesn't get used and the property change on disk
> makes it into the revision.

The txn_dir_cache gets cleared when the pool associated with the
txn_root is cleared and so the problem only occurs when the server holds
a root open for long enough: the txn_root used to build the transaction
has to remain open when the transaction is committed.  svnserve and
ra_local both do this and show the problem but for apache/mod_dav_svn it
depends on exactly when apache clears the request pool between requests.

  1. client sends MKCOL
  2. apache process MKCOL and populates txn_dir_cache
  3. apache sends MKCOL response
  4. apache clears MKCOL request pool
  5. client sends MERGE
  6. apache process MERGE

5 can happen before 4 since two processes are involved but can 6 happen
before 4?  I've not seen it happen with the worker MPM but I'm not
familiar with all the MPMs.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Philip Martin <ph...@wandisco.com>.
Ben Reser <be...@reser.org> writes:

> I haven't dug in to the specifics of how we broke this particular use case.
> But I suspect you'll find that if you create a transaction, call
> svn_fs_change_node_prop() and commit the transaction that this call works just
> fine.  So I'd bet that the problem only happens when you try to do this from
> the pre-commit hook.

A post-commit should not change the versioned data because it confuses
the client.  However a script in general should be able to modify a
versioned property and it's perfectly possible to write a python script
that does it.  I think the problem here is that the process committing
the transaction has populated the fs_fs_data_t.txn_dir_cache and this
cache gets outdated when the post-commit runs in a separate process.

Consider the test I added in 1617263, it's one process and so doesn't
demonstrate the problem:

+  /* Create txn with changes. */
+  SVN_ERR(svn_fs_begin_txn(&txn, fs, head_rev, pool));
+  SVN_ERR(svn_fs_txn_name(&txn_name, txn, pool)); 
+  SVN_ERR(svn_fs_txn_root(&root, txn, pool));
+  SVN_ERR(svn_fs_make_dir(root, "X", pool));
+
+  /* Reopen, add more changes. */
+  SVN_ERR(svn_fs_open_txn(&reopen_txn, fs, txn_name, pool));
+  SVN_ERR(svn_fs_txn_root(&reopen_root, reopen_txn, pool));
+  SVN_ERR(svn_fs_change_node_prop(reopen_root, "A", "name",
+                                  svn_string_create("value", pool),
+                                  pool));
+
+  /* Reopen, commit */
+  SVN_ERR(svn_fs_open_txn(&reopen_txn, fs, txn_name, pool));
+  SVN_ERR(test_commit_txn(&head_rev, reopen_txn, NULL, pool));

If I remove the svn_fs_change_node_prop() line from the test, run the
test in gdb and stop at the commit I can then run a python script in
another process to make the equivalent property change.  Then I resume
the test and the commit includes the expected property change.

However if I also remove the svn_fs_txn_root() call that reopens the
transaction in addition to the svn_fs_change_node_prop() call then the
commit will lose the property change made externally.  In both cases the
property change is present in the transaction on disk but if the
txn_dir_cache is out-of-date the property change gets lost.

svn_fs_txn_root() calls svn_fs_fs__initialize_txn_caches() and if that
detects concurrent transactions it sets ffd->txn_dir_cache to NULL so
the out-of-date cache doesn't get used and the property change on disk
makes it into the revision.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Ben Reser <be...@reser.org>.
On 8/10/14 5:19 PM, Alexey Neyman wrote:
> - First, if the fs.change_node_prop was not intended for use in scripts, why is
> it exposed in bindings at all?

Bindings are not intended just for writing hook scripts.  Rather the bindings
are intended to wrap the entirety of our C API (though sometimes the wrapping
doesn't always work because we add bits and forget to update it).

We've written all sorts of things with the bindings that are very low level.
It's not uncommon for us to take a higher level feature and implement it with
the bindings to work out if we like how it works before making changes to our C
APIs.

> - Second, if you care to read the quoted paragraph, you'd notice that it is
> phrased as a recommendation, not a restriction. For example, here's how the
> IETF determines the requirement levels:
> 
>  
> 
> tools.ietf.org/html/rfc2119
> 
> 3. SHOULD This word, or the adjective "RECOMMENDED", mean that there
> 
> may exist valid reasons in particular circumstances to ignore a
> 
> particular item, but the full implications must be understood and
> 
> carefully weighed before choosing a different course.

Our book isn't an RFC.

Here's the most important sentence form the book:

"While hook scripts can do almost anything, there is one dimension in which
hook script authors should show restraint: do not modify a commit transaction
using hook scripts."

Yes the word should appears in that sentence.  But we really are intending to
tell you not to modify a transaction.  I suspect the word should is there
because there is actually one thing that is ok to modify in a hook script and
that's the unversioned properties that will become revision properties.
Perhaps the language could be improved, but I do think that italics on the word
"not" and the following text should give you a pretty good idea that this is
not a supported operation.

> - Third, if you care to read the original report I sent - you'd notice that it
> does not involve client caches at all (which is the potential issue that the
> SVN book warns about with respect to modifying the transaction). Instead, the
> transaction is modified in a way different from how the script attempted to
> modify it.

Actually this still makes the client that's doing the commit have a stale cache
of the property.  It will believe that "myprop" either has no value or whatever
the previous value is.

You probably won't pick up any change to this property until the next time the
node changes.  However, you can get away this if you're not terribly concerned
about the client having the correct state of the property because we never send
deltas for property changes.  We just send the whole property value.

If that ever changes (which is entirely conceivable since there are property
use case that quite honestly have probably grown beyond the original intent of
properties, including svn:mergeinfo) you will be in a world of hurt since your
existing working copies when upgraded and run with clients that understand this
and servers that handle this will suddenly be busted.  Every commit until you
remove this will break the working copy.

So I'm sure you've been getting away with this just fine.  We however, won't
encourage you to do it because we don't promise not to break it.

I haven't dug in to the specifics of how we broke this particular use case.
But I suspect you'll find that if you create a transaction, call
svn_fs_change_node_prop() and commit the transaction that this call works just
fine.  So I'd bet that the problem only happens when you try to do this from
the pre-commit hook.  In fact we actually do exactly what I suggest in several
places in our test suite (though it happens to be in C but you're calling the
same function and the Python bindings are hardly wrapped).  For example this
code from subversion/tests/libsvn_repos/dump-load-test.c in
est_dump_r0_mergeinfo():

[[[
  /* Revision 2:  Add bad mergeinfo */
  SVN_ERR(svn_fs_begin_txn2(&txn, fs, youngest_rev, 0, pool));
  SVN_ERR(svn_fs_txn_root(&txn_root, txn, pool));
  SVN_ERR(svn_fs_change_node_prop(txn_root, "/bar", "svn:mergeinfo",
bad_mergeinfo, pool));
  SVN_ERR(svn_repos_fs_commit_txn(NULL, repos, &youngest_rev, txn, pool));
  SVN_TEST_ASSERT(SVN_IS_VALID_REVNUM(youngest_rev));
]]]

As a result my bet would be that someone made the assumption that this wasn't
allowed and made an optimization assuming that nobody does that.

All I can really tell you (without spending more of my Sunday on this), is that
we told you so in the documentation.

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Alexey Neyman <st...@att.net>.
On Monday, August 11, 2014 01:55:41 AM Branko Čibej wrote:
> On 10.08.2014 23:59, Alexey Neyman wrote:
> > On Sunday, August 10, 2014 08:46:45 PM Stefan Sperling wrote:
> > > On Sun, Aug 10, 2014 at 09:14:05AM -0700, Alexey Neyman wrote:
> > > > On Saturday, August 09, 2014 10:50:11 PM Alexey Neyman wrote:
> > > > > Hi Subversion developers,
> > > > > 
> > > > > 
> > > > > 
> > > > > I am trying to migrate some scripts from 1.6 server that we're
> > 
> > currently
> > 
> > > > > running to a newer SVN version, and encountered something that looks
> > > > > 
> > > > > like
> > > > > 
> > > > > a bug: with 1.7/1.8 the fs.change_node_prop (in Python bindings)
> > 
> > is no
> > 
> > > > > longer able to modify a node's properties unless that node is
> > 
> > already
> > 
> > > > > modified in the transaction being handled.
> > > > 
> > > > How come the node is marked as changed, but the changes are lost?
> > > 
> > > Ummm... hooks are not supposed to modify transactions.
> > > 
> > > 
> > > 
> > > Have you seen this paragraph in the svn book?
> > > 
> > > 
> > > 
> > > """
> > > 
> > > While hook scripts can do almost anything, there is one dimension in
> > 
> > which
> > 
> > > hook script authors should show restraint: do not modify a commit
> > > 
> > > transaction using hook scripts. While it might be tempting to use hook
> > > 
> > > scripts to automatically correct errors, shortcomings, or policy
> > 
> > violations
> > 
> > > present in the files being committed, doing so can cause problems.
> > > 
> > > Subversion keeps client-side caches of certain bits of repository
> > 
> > data, and
> > 
> > > if you change a commit transaction in this way, those caches become
> > > 
> > > indetectably stale. This inconsistency can lead to surprising and
> > > 
> > > unexpected behavior. Instead of modifying the transaction, you should
> > > 
> > > simply validate the transaction in the pre-commit hook and reject the
> > > 
> > > commit if it does not meet the desired requirements. As a bonus,
> > 
> > your users
> > 
> > > will learn the value of careful, compliance-minded work habits.
> > > 
> > > """
> > 
> > http://svnbook.red-bean.com/en/1.7/svn.reposadmin.create.html#svn.reposadm
> > in> 
> > > .create.hooks
> > 
> > Yes, I've seen it and I remember there have been discussion in the
> > past on the mailing list that the Subversion server should notify the
> > client that the committed revision is different from the client's
> > submission - so that the client can invalidate those caches.
> > 
> > 
> > 
> > In our case, though, the modification is very benign: the script
> > detects that any file under /project/trunk gets modified, and if it
> > does - updates the property on the /project/trunk/include/version.h.
> > The production build server checks out a fresh copy for the build, so
> > that there is no caches involved; and if the local (committer's)
> > environment has a stale version.h, it is not a big deal.
> > 
> > 
> > 
> > So the question still stands - was this change in
> > svn_fs_change_node_prop() by design, or is it an unwanted side effect
> > of some other change? An in any case - is there way to modify the
> > transaction that *works*, even if not recommended?
> 
> Not in a hook. Let's not quibble about "benign" changes, and what the
> server could do in some parallel universe. Please just follow the
> restrictions plainly stated in our documentation since 1.0.

I would appreciate a more constructive response, for the following three reasons:

- First, if the fs.change_node_prop was not intended for use in scripts, why is it exposed in 
bindings at all?

- Second, if you care to read the quoted paragraph, you'd notice that it is phrased as a 
recommendation, not a restriction. For example, here's how the IETF determines the 
requirement levels:

   tools.ietf.org/html/rfc2119
   3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.

- Third, if you care to read the original report I sent - you'd notice that it does not involve 
client caches at all (which is the potential issue that the SVN book warns about with 
respect to modifying the transaction). Instead, the transaction is modified in a way 
different from how the script attempted to modify it.


Regards,
Alexey.


Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Branko Čibej <br...@wandisco.com>.
On 10.08.2014 23:59, Alexey Neyman wrote:
>
> On Sunday, August 10, 2014 08:46:45 PM Stefan Sperling wrote:
>
> > On Sun, Aug 10, 2014 at 09:14:05AM -0700, Alexey Neyman wrote:
>
> > > On Saturday, August 09, 2014 10:50:11 PM Alexey Neyman wrote:
>
> > > > Hi Subversion developers,
>
> > > >
>
> > > > I am trying to migrate some scripts from 1.6 server that we're
> currently
>
> > > > running to a newer SVN version, and encountered something that looks
>
> > > > like
>
> > > > a bug: with 1.7/1.8 the fs.change_node_prop (in Python bindings)
> is no
>
> > > > longer able to modify a node's properties unless that node is
> already
>
> > > > modified in the transaction being handled.
>
> > >
>
> > > How come the node is marked as changed, but the changes are lost?
>
> >
>
> > Ummm... hooks are not supposed to modify transactions.
>
> >
>
> > Have you seen this paragraph in the svn book?
>
> >
>
> > """
>
> > While hook scripts can do almost anything, there is one dimension in
> which
>
> > hook script authors should show restraint: do not modify a commit
>
> > transaction using hook scripts. While it might be tempting to use hook
>
> > scripts to automatically correct errors, shortcomings, or policy
> violations
>
> > present in the files being committed, doing so can cause problems.
>
> > Subversion keeps client-side caches of certain bits of repository
> data, and
>
> > if you change a commit transaction in this way, those caches become
>
> > indetectably stale. This inconsistency can lead to surprising and
>
> > unexpected behavior. Instead of modifying the transaction, you should
>
> > simply validate the transaction in the pre-commit hook and reject the
>
> > commit if it does not meet the desired requirements. As a bonus,
> your users
>
> > will learn the value of careful, compliance-minded work habits.
>
> > """
>
> >
> http://svnbook.red-bean.com/en/1.7/svn.reposadmin.create.html#svn.reposadmin
>
> > .create.hooks
>
>  
>
> Yes, I've seen it and I remember there have been discussion in the
> past on the mailing list that the Subversion server should notify the
> client that the committed revision is different from the client's
> submission - so that the client can invalidate those caches.
>
>  
>
> In our case, though, the modification is very benign: the script
> detects that any file under /project/trunk gets modified, and if it
> does - updates the property on the /project/trunk/include/version.h.
> The production build server checks out a fresh copy for the build, so
> that there is no caches involved; and if the local (committer's)
> environment has a stale version.h, it is not a big deal.
>
>  
>
> So the question still stands - was this change in
> svn_fs_change_node_prop() by design, or is it an unwanted side effect
> of some other change? An in any case - is there way to modify the
> transaction that *works*, even if not recommended?
>

Not in a hook. Let's not quibble about "benign" changes, and what the
server could do in some parallel universe. Please just follow the
restrictions plainly stated in our documentation since 1.0.

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco | Realising the impossibilities of Big Data
e. brane@wandisco.com

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Alexey Neyman <st...@att.net>.
On Sunday, August 10, 2014 08:46:45 PM Stefan Sperling wrote:
> On Sun, Aug 10, 2014 at 09:14:05AM -0700, Alexey Neyman wrote:
> > On Saturday, August 09, 2014 10:50:11 PM Alexey Neyman wrote:
> > > Hi Subversion developers,
> > > 
> > > I am trying to migrate some scripts from 1.6 server that we're currently
> > > running to a newer SVN version, and encountered something that looks
> > > like
> > > a bug: with 1.7/1.8 the fs.change_node_prop (in Python bindings) is no
> > > longer able to modify a node's properties unless that node is already
> > > modified in the transaction being handled.
> > 
> > How come the node is marked as changed, but the changes are lost?
> 
> Ummm... hooks are not supposed to modify transactions.
> 
> Have you seen this paragraph in the svn book?
> 
> """
> While hook scripts can do almost anything, there is one dimension in which
> hook script authors should show restraint: do not modify a commit
> transaction using hook scripts. While it might be tempting to use hook
> scripts to automatically correct errors, shortcomings, or policy violations
> present in the files being committed, doing so can cause problems.
> Subversion keeps client-side caches of certain bits of repository data, and
> if you change a commit transaction in this way, those caches become
> indetectably stale. This inconsistency can lead to surprising and
> unexpected behavior. Instead of modifying the transaction, you should
> simply validate the transaction in the pre-commit hook and reject the
> commit if it does not meet the desired requirements. As a bonus, your users
> will learn the value of careful, compliance-minded work habits.
> """
> http://svnbook.red-bean.com/en/1.7/svn.reposadmin.create.html#svn.reposadmin
> .create.hooks

Yes, I've seen it and I remember there have been discussion in the past on the mailing list 
that the Subversion server should notify the client that the committed revision is 
different from the client's submission - so that the client can invalidate those caches.

In our case, though, the modification is very benign: the script detects that any file under 
/project/trunk gets modified, and if it does - updates the property on the 
/project/trunk/include/version.h. The production build server checks out a fresh copy for 
the build, so that there is no caches involved; and if the local (committer's) environment 
has a stale version.h, it is not a big deal.

So the question still stands - was this change in svn_fs_change_node_prop() by design, or is 
it an unwanted side effect of some other change? An in any case - is there way to modify 
the transaction that *works*, even if not recommended?

Regards,
Alexey.

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Stefan Sperling <st...@elego.de>.
On Sun, Aug 10, 2014 at 09:14:05AM -0700, Alexey Neyman wrote:
> On Saturday, August 09, 2014 10:50:11 PM Alexey Neyman wrote:
> > Hi Subversion developers,
> > 
> > I am trying to migrate some scripts from 1.6 server that we're currently
> > running to a newer SVN version, and encountered something that looks like
> > a bug: with 1.7/1.8 the fs.change_node_prop (in Python bindings) is no
> > longer able to modify a node's properties unless that node is already
> > modified in the transaction being handled.
 
> How come the node is marked as changed, but the changes are lost?

Ummm... hooks are not supposed to modify transactions.

Have you seen this paragraph in the svn book?

"""
While hook scripts can do almost anything, there is one dimension in which hook
script authors should show restraint: do not modify a commit transaction using
hook scripts. While it might be tempting to use hook scripts to automatically
correct errors, shortcomings, or policy violations present in the files being
committed, doing so can cause problems. Subversion keeps client-side caches of
certain bits of repository data, and if you change a commit transaction in this
way, those caches become indetectably stale. This inconsistency can lead to
surprising and unexpected behavior. Instead of modifying the transaction, you
should simply validate the transaction in the pre-commit hook and reject the
commit if it does not meet the desired requirements. As a bonus, your users
will learn the value of careful, compliance-minded work habits.
"""
http://svnbook.red-bean.com/en/1.7/svn.reposadmin.create.html#svn.reposadmin.create.hooks

Re: Regression in bindings? 1.7/1.8 vs 1.6

Posted by Alexey Neyman <st...@att.net>.
On Saturday, August 09, 2014 10:50:11 PM Alexey Neyman wrote:
> Hi Subversion developers,
> 
> I am trying to migrate some scripts from 1.6 server that we're currently
> running to a newer SVN version, and encountered something that looks like
> a bug: with 1.7/1.8 the fs.change_node_prop (in Python bindings) is no
> longer able to modify a node's properties unless that node is already
> modified in the transaction being handled.

Correction: "unless that node's ancestor is already modified".

More: if I read back the node property immediately after changing it, the new value is 
returned by fs.node_prop even in 1.7/1.8. However, when the transaction is turned into a 
revision - the changed value is lost. Also, the dump in 1.7/1.8 still shows the node as 
modified - but has no modifications reported on it:

[[[
Revision-number: 2
Prop-content-length: 105
Content-length: 105

K 10
svn:author
V 7
aneyman
K 8
svn:date
V 27
2014-08-10T16:11:08.311598Z
K 7
svn:log
V 4
test
PROPS-END

Node-path: baz
Node-kind: dir
Node-action: change
Prop-content-length: 22
Content-length: 22

K 1
x
V 1
y
PROPS-END


Node-path: foo/bar
Node-kind: file
Node-action: change
]]]

How come the node is marked as changed, but the changes are lost?

Regards,
Alexey.

> 
> A reproduction script is attached. It yields PASS on 1.6.11 and FAILs on
> 1.7.8 and 1.8.8.
> 
> Is it by design, or a regression? If it is by design - what is the proper
> way to modify a node in a transaction in the newer Subversion?
> 
> Regards,
> Alexey.