You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Johan Corveleyn <jc...@gmail.com> on 2015/09/21 16:38:20 UTC

No-op changes no longer dumped by 'svnadmin dump' in 1.9

Behaviour of 'svnadmin dump' has changed in 1.9: a no-op change to
path X comes into the dump file without "Text-content" block (there is
still a "Node-action: change" to the path X). The consequence is that
the no-op change disappears when you load this into a new repository.
When you dump from the new repository again, the node-change is
completely gone from the revision.

For an example, run svnmucc_tests.py 2 'basic svnmucc tests' without
cleanup. The test repository contains a no-op modification to an empty
file, in revision 16.

If you 'svnadmin dump' this revision with 1.8 (you'll have to run the
svnmucc test with 1.8, otherwise you'll not be able to dump it), it
contains:

[[[
Node-path: boozle/buz/svnmucc-test.py
Node-kind: file
Node-action: change
Text-content-length: 0
Text-content-md5: d41d8cd98f00b204e9800998ecf8427e
Text-content-sha1: da39a3ee5e6b4b0d3255bfef95601890afd80709
Content-length: 0
]]]

If you 'svnadmin dump' it with 1.9, you'll see:

[[[
Node-path: boozle/buz/svnmucc-test.py
Node-kind: file
Node-action: change
]]]


In fact, Julian already observed this as a difference between
'svnadmin dump' and 'svnrdump dump', in [1]. He didn't realize that
this was a behaviour change in 'svnadmin dump' ... if he would have
done the same test with 1.8, he wouldn't have seen a difference
between dump and svnrdump.

The change was introduced in r1572363 and further tweaked in r1573111,
by stefan2. When discussing this last week in Berlin, Stefan agreed
that this was most likely an unintended / unwanted change.

As already discussed by others on the mailinglist [2], I think it's
important that dump and load try to preserve repository data as much
as possible (except for necessary fixes, normalizations, ...). In this
case, information is being lost, and the newly loaded repository is
visibly different from the original (see below). I think this should
be fixed in 'svnadmin dump'.

For now, I will probably use 1.8 to perform the dump that I want to load in 1.9.

To illustrate a visible difference:
- 'svn log path/with/no-op/change/in/rev/R' will show rev R.
- the same command will no longer show rev R after a dump/load with 1.9.

In our repository, there are a couple of such revisions, dating from
our conversion from cvs to svn (actually, revision N-1 contains a real
change, but only a short log message; and revision N has a no-op
change to that same path, and a very informative log message
describing the real change in detail). It's not like we consciously
used this as a specific use-case, but now the information is like that
in our repository, and I'd hate to lose it.


[1] https://mail-archives.apache.org/mod_mbox/subversion-dev/201501.mbox/%3C1421067283.19093.YahooMailNeo@web87701.mail.ir2.yahoo.com%3E

[2] https://mail-archives.apache.org/mod_mbox/subversion-dev/201412.mbox/%3C1418991791.49595.YahooMailNeo@web87702.mail.ir2.yahoo.com%3E

-- 
Johan

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, Sep 21, 2015 at 4:38 PM, Johan Corveleyn <jc...@gmail.com> wrote:
...
> In our repository, there are a couple of such revisions, dating from
> our conversion from cvs to svn (actually, revision N-1 contains a real
> change, but only a short log message; and revision N has a no-op
> change to that same path, and a very informative log message
> describing the real change in detail). It's not like we consciously
> used this as a specific use-case, but now the information is like that
> in our repository, and I'd hate to lose it.

With "our repository" I'm referring to my company's repository, not
the one from the Subversion Project or the ASF (just realized I had my
company hat on when I wrote that :-)).

-- 
Johan

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Shahaf wrote on Wed, Sep 23, 2015 at 13:07:31 +0000:
> Julian Foad wrote on Tue, Sep 22, 2015 at 12:54:58 +0200:
> > Daniel Shahaf wrote:
> > > Johan Corveleyn wrote:
> > >> [...], revision N-1 contains a real
> > >> change, but only a short log message; and revision N has a no-op
> > >> change to that same path, and a very informative log message [...]
> > >
> > > The FreeBSD project used to intentionally make no-op commits (they term it
> > > "forced commits") as part of their new committer workflow.  I don't know
> > > whether they still do that.
> > 
> > We need to be careful with terminology. We're not talking about a
> > no-op commit, we're talking about a path that is marked as 'changed'
> > within a commit, but whose content did not change. (The same commit
> > might or might not also contain other paths that have real changes.)
> > 
> 
> Yes, that is what I was referring to.
> 
> > In fact, I think one of the first things we need to do is a precise
> > analysis of the issue:
> > 
> >   * What exactly are the existing possible forms of 'no-op change'
> > that any part of Subversion can represent?
> >     - Text-change?
> 
> There's another hair to split here: a "no-op text change" can take two
> forms: either an _empty_ delta — that is, an svndiff stream with no
> windows — or a non-empty delta that, when applied to the source file,
> produces the source file again.
> 
> The former is conceptually
>     --- iota
>     +++ iota
>     [no hunks]
> 
> The latter is conceptually
>     --- iota
>     +++ iota
>     @@ -1 +1 @@
>     -This is the file 'iota'.
>     +This is the file 'iota'.
> 

Both of these are semantically equivalent, of course.  My point is that
we can't _detect_ no-op changes by looking for "an svndiff stream with
no windows", since such a test wouldn't catch all no-op changes: it
would be 100% specific but not 100% sensitive.

Cheers,

Daniel

> (saying "conceptually" because text deltas are _not_ unidiffs)
> 
> Cheers,
> 
> Daniel
> 
> >     - Props-change?
> >     - Whole-node-change?
> >     - Commit? (not so interesting in the current issue)
> >     - Only certain combinations of those?
> > 
> >   * At which APIs can each those changes be (a) made and (b) seen?
> >     - FS API?
> >     - Repos API?
> >     - RA and client-side APIs?
> >     - svnadmin dump/load?
> >     - svnrdump dump/load?
> >     - svnsync?
> > 
> > - Julian

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Julian Foad wrote on Tue, Sep 22, 2015 at 12:54:58 +0200:
> Daniel Shahaf wrote:
> > Johan Corveleyn wrote:
> >> [...], revision N-1 contains a real
> >> change, but only a short log message; and revision N has a no-op
> >> change to that same path, and a very informative log message [...]
> >
> > The FreeBSD project used to intentionally make no-op commits (they term it
> > "forced commits") as part of their new committer workflow.  I don't know
> > whether they still do that.
> 
> We need to be careful with terminology. We're not talking about a
> no-op commit, we're talking about a path that is marked as 'changed'
> within a commit, but whose content did not change. (The same commit
> might or might not also contain other paths that have real changes.)
> 

Yes, that is what I was referring to.

> In fact, I think one of the first things we need to do is a precise
> analysis of the issue:
> 
>   * What exactly are the existing possible forms of 'no-op change'
> that any part of Subversion can represent?
>     - Text-change?

There's another hair to split here: a "no-op text change" can take two
forms: either an _empty_ delta — that is, an svndiff stream with no
windows — or a non-empty delta that, when applied to the source file,
produces the source file again.

The former is conceptually
    --- iota
    +++ iota
    [no hunks]

The latter is conceptually
    --- iota
    +++ iota
    @@ -1 +1 @@
    -This is the file 'iota'.
    +This is the file 'iota'.

(saying "conceptually" because text deltas are _not_ unidiffs)

Cheers,

Daniel

>     - Props-change?
>     - Whole-node-change?
>     - Commit? (not so interesting in the current issue)
>     - Only certain combinations of those?
> 
>   * At which APIs can each those changes be (a) made and (b) seen?
>     - FS API?
>     - Repos API?
>     - RA and client-side APIs?
>     - svnadmin dump/load?
>     - svnrdump dump/load?
>     - svnsync?
> 
> - Julian

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Sep 23, 2015 at 12:55 PM, Julian Foad
<ju...@btopenworld.com> wrote:
>> Johan Corveleyn wrote:
>>> [...] stefan2 told me in person that that part of the
>>> change in r1572363 was unintentional :-). IIUC, he didn't realize that
>>> it would have this effect on the output of dump.
> [...]
>>>>> I think the dump.c part of r1572363 and r1573111 should be reverted /
>>>>> fixed so that we get the previous behaviour again, and this should be
>>>>> backported to 1.9. At this point, IMO 'svnadmin dump' is broken in
>>>>> 1.9.
>
> 1. I pretty much agree now that we should revert the change.
>
> We understand now that the change in 1.9.0 was unintentional, and
> after we analyse the situation, we are very unlikely to conclude that
> that change was a complete bug fix to the whole issue of no-op
> changes. It is surprising and is regarded as losing information, and
> is not justified (yet) by some higher purpose.
>
> It seems fairly clear what the change was, and so how to revert it.
>
> 2. We might also want to make another change to the behaviour of
> 'svnadmin load', so that the result of loading a dump file that people
> have *already* produced using 1.9.0-1.9.2 will be the same as if they
> had dumped and loaded using 1.8.x. I don't yet understand the details
> enough to know whether this option is possible.
>
> 3. I firmly believe that our handling of 'no-op changes' is mistaken
> and a bad idea. I'll explain that, but not in this thread -- that's a
> follow-on task.
>
> Brane wrote:
>> I also suggest adding a note to
>> http://subversion.apache.org/docs/release-notes/1.9.html#issues .
>
> And we need to file an issue.
>
> I'll do both of those things (issue and rel-notes) now.

Thanks. I'm a bit swamped by $dayjob now.

-- 
Johan

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Tue, Sep 29, 2015 at 3:09 PM, Julian Foad <ju...@gmail.com> wrote:

> Thanks for the comments, Stefan.
>
> Stefan Fuhrmann wrote:
> > On Mon, Sep 28, 2015, Stefan Fuhrmann wrote:
> >> I suggest that we apply the patch below. It is more efficient
> >> that the original behaviour but ensures that "no-op" changes
> >> will be retained.
> >
> > Now that I've read through the whole thread, here my view
> > on this problem plus an explanation why I think my patch is
> > the right approach.
> >
> > First some facts and things that I believe:
> >
> > * Internally, FSFS can record no-op prop and no-op content
> >   changes (the delta between new representation and the
> >   respective proplist / text of its predecessor is empty). There
> >   is even the possibility of a no-op node change, i.e. a new
> >   node without touching text nor props (not happening atm).
>
> When you say "not happening ATM", are you referring to 1.9 or all
> released versions of FSFS or something else?
>

It does not happen in FSFS, is not planned to happen
and probably did never happen in FSFS. In FSX, though,
it is a side-effect of rep-sharing: Because the reps in the
noderev struct don't have unifiers any more, they become
identical to the entries in the predecessor node.

> * Relying on implementation details of a lower layer is bad.
> >
> > * Via FS and Repos layer API, new instances of no-op changes
> >   to either text and properties can be created.
> > * Dump files (produced by 3rd party tools or non-incremental
> >   svnadmin dumps) are another source of those no-op changes.
> > * SVN editors don't produce no-op changes.
>
> An svn_delta_editor can implicitly describe all sorts of no-op
> changes. For example, if I add svn_delta__get_debug_editor() into
> svnmucc's commit, we can make no-op changes to a file's text and
> properties as follows:
>
[snip]

I see. I didn't have svnmucc on my radar. It actually supports
my point: "normal" SVN usage via the svn client should never
produce no-op changes but there still are other legal interfaces
that may create new instances.

So, it is not a problem of the past (e.g. created by CVS->SVN
conversions) that could be fixed by e.g. updating the log messages.
We will have to deal with them in the future and need to preserve
that information.

> * A commit links user-controlled revision meta data (e.g. comment)
> >   to a user-controlled list of changes. That link shall not be
> >   lost when tweak implementation details.
>
> When you say "list of changes", I agree if you mean the "difference
> between snapshots" in principle, rather than the specific list that's
> stored at the end of a revision file.
>

Yes, and there is a clear mismatch in the output between diff
and log. It is due to the changes list being (mostly?) redundant
information with a tendency to become inconsistent is some
sense for edge cases like no-op changes.


> > * Dump / load is a compatibility interface between SVN versions
> >   and allows for forward and backward migration.
> > * A plain node modification (no text nor prop change) will be
> >   ignored by "svnadmin load". Even if we changed that on /trunk,
> >   it would not fix old releases.
>
> What's a "plain node modification (no text nor prop change)"?
> Ambiguous w.r.t. no-op changes.
>

A dump file may say "modified node X" but not give the new
props nor the new contents. This is exactly what happens today
for no-op file modifications where neither text nor props actually
differ from their predecessors.

The loader will not create the new node upon seeing the "modified"
record but only as a side-effect of an actual text / prop set - which
also populates the changed paths list in the target repo.


> > From that, we can derive a few requirements:
> >
> > * If the a revision contains a "M" change, svnadmin must produce
> >   a dump file that causes previous releases to produce the "M"
> >   change - within reason.
>
> The problem is, we don't have an agreed definition of an 'M' change.
>

My point here is that we only need to agree that we don't want
to lose the information contained in the association of the rev
meta data and the changed paths list. The patch is simply stops
us losing some information in a somewhat rare case.

Beyond that, we may agree that there shall be no future instances
of 'M' with empty deltas. Personally, I don't think that empty deltas
should break any part of SVN. E.g. the result of a merge can be an
empty delta (which we then do not commit as a individual change
but may only track though mergeinfo). IOW, preventing no-op
changes may end up not actually eliminating edge cases from the
more complicated parts of SVN.


> > * Load shall produce no-op changes for the same data as in previous
> >   releases - within reason. If the FS API would no longer support
> >   creating no-op changes at some point in the future, the load process
> >   should issue a warning.
> > * The dump file contents shall be independent of FS implementation
> >   details, i.e. use "strict" FS API functions.
>
> I'm not sure to what you're referring here. The 'strict' flag in the
> text and props comparison functions? That would omit no-op changes.
>

What I mean is to use the parts of the FS API whose behaviour is
not implementation-specific. I'm not 100% sure whether we will be
fine with applying the same logic as in my patch but using the API
variants that allow for false positives. It feels better to have a more
predictable / stable output.


> > The
> > text / proplist asymmetry in the patch is due to the fact that all
> > files due have a text but maybe no proplist representation. So,
> > dumping the text is always safe.
> >
> > V2 of the patch also covers no-op prop changes to directories.
>
> I don't see that we've decided whether we want to preserve no-op text
> changes, no-op prop changes, no-op node changes, and/or other.
>

Right now, we are losing information. In rare cases and probably
nothing too important but still. This aspect makes me consider
the current behaviour a bug. Whether creating that situation in
the first place was o.k. nor not is a separate issue.


> It looks like your patch is aimed at preserving the no-op node change.
> This correlates with an 'M' line in the 'svn log -v' output.
>

Exactly.


> Do we also care about no-op text and property changes, such as are
> reported in 'svn log -v --xml'? More precisely, we should define what
> we care about in terms of an API.
>

Good point, log -v --xml could say "modified" but not set the
flags for either prop nor text change. That would need fixing,
I guess and it would be quite expensive in terms of CPU load.
But it makes sense at least as an option.

-- Stefan^2.

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Julian Foad <ju...@gmail.com>.
Thanks for the comments, Stefan.

Stefan Fuhrmann wrote:
> On Mon, Sep 28, 2015, Stefan Fuhrmann wrote:
>> I suggest that we apply the patch below. It is more efficient
>> that the original behaviour but ensures that "no-op" changes
>> will be retained.
>
> Now that I've read through the whole thread, here my view
> on this problem plus an explanation why I think my patch is
> the right approach.
>
> First some facts and things that I believe:
>
> * Internally, FSFS can record no-op prop and no-op content
>   changes (the delta between new representation and the
>   respective proplist / text of its predecessor is empty). There
>   is even the possibility of a no-op node change, i.e. a new
>   node without touching text nor props (not happening atm).

When you say "not happening ATM", are you referring to 1.9 or all
released versions of FSFS or something else?

> * SVN versions file trees and the decision how to represent
>   them (e.g. as delta) is an implementation detail.

Yes, this is a key point. It's not quite as simple as that, since we
do need to distinguish whether the node at /foo is a continuation of
the previous node at /foo or is a new node that replaces it, and a
similar thing with identifying copies. But to a large extent, yes, the
system is designed to behave as storing a sequence of tree snapshots.

> * Relying on implementation details of a lower layer is bad.
>
> * Via FS and Repos layer API, new instances of no-op changes
>   to either text and properties can be created.
> * Dump files (produced by 3rd party tools or non-incremental
>   svnadmin dumps) are another source of those no-op changes.
> * SVN editors don't produce no-op changes.

An svn_delta_editor can implicitly describe all sorts of no-op
changes. For example, if I add svn_delta__get_debug_editor() into
svnmucc's commit, we can make no-op changes to a file's text and
properties as follows:

$ svnmucc -U $REPO -m "" put repo/format foo
DBG: open_root : 6
DBG:  open_file : 'foo':6
DBG:   apply_textdelta : (null)
DBG:  close_file : 1dcca23355272056f04fe8bf20edfce0
DBG: close_directory
DBG: close_edit
r7 committed ...

$ svnmucc -U $REPO -m "" propset p v foo
DBG: open_root : 7
DBG:  open_file : 'foo':7
DBG:   change_file_prop : p -> v
DBG:  close_file : (null)
DBG: close_directory
DBG: close_edit
r8 committed ...

$ svn log -v --xml -r7:8 $REPO | grep "revision\|action\|mods"
   revision="7">
   action="M"
   prop-mods="false"
   text-mods="true"
   revision="8">
   action="M"
   prop-mods="true"
   text-mods="false"

$ svn diff -c7 $REPO

$ svn diff -c8 $REPO

And an editor can describe many more kinds of no-op change too, such
as "add /foo; delete /foo" and so on.

It's up to the edit receiver to decide whether to collapse no-op changes.


> * A commit links user-controlled revision meta data (e.g. comment)
>   to a user-controlled list of changes. That link shall not be
>   lost when tweak implementation details.

When you say "list of changes", I agree if you mean the "difference
between snapshots" in principle, rather than the specific list that's
stored at the end of a revision file.

> * Dump / load is a compatibility interface between SVN versions
>   and allows for forward and backward migration.
> * A plain node modification (no text nor prop change) will be
>   ignored by "svnadmin load". Even if we changed that on /trunk,
>   it would not fix old releases.

What's a "plain node modification (no text nor prop change)"?
Ambiguous w.r.t. no-op changes.

> From that, we can derive a few requirements:
>
> * If the a revision contains a "M" change, svnadmin must produce
>   a dump file that causes previous releases to produce the "M"
>   change - within reason.

The problem is, we don't have an agreed definition of an 'M' change.

> * Load shall produce no-op changes for the same data as in previous
>   releases - within reason. If the FS API would no longer support
>   creating no-op changes at some point in the future, the load process
>   should issue a warning.
> * The dump file contents shall be independent of FS implementation
>   details, i.e. use "strict" FS API functions.

I'm not sure to what you're referring here. The 'strict' flag in the
text and props comparison functions? That would omit no-op changes.

> No-op changes are rare, hence including them into a dump file
> does not significantly increase its size or processing time.

Ack.

> The
> text / proplist asymmetry in the patch is due to the fact that all
> files due have a text but maybe no proplist representation. So,
> dumping the text is always safe.
>
> V2 of the patch also covers no-op prop changes to directories.

I don't see that we've decided whether we want to preserve no-op text
changes, no-op prop changes, no-op node changes, and/or other.

It looks like your patch is aimed at preserving the no-op node change.
This correlates with an 'M' line in the 'svn log -v' output.

Do we also care about no-op text and property changes, such as are
reported in 'svn log -v --xml'? More precisely, we should define what
we care about in terms of an API.


- Julian

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Stefan Fuhrmann <st...@apache.org>.
On 26.10.2015 17:45, Evgeny Kotkov wrote:
> Stefan Fuhrmann <st...@wandisco.com> writes:
>>> As for /trunk, I think that we could do that, so I sketched this option in
>>> the attached patch.
>> The patch looks o.k.
> Thanks for giving this patch a look.
>
> I examined the differences between these functions, svn_repos__compare_files()
> and svn_fs_contents_different(), and I think that we also have a problem in
> FSFS's implementation of the new svn_fs_contents_different() API in 1.9.x.
> The implementation relies on representation_t.expanded_size value when doing
> the comparison.  If this value is zero, a representation is considered empty,
> and two empty representations are considered equal.
>
> The problem is that the expanded_size can be zero for empty files and also for
> some of the PLAIN representations, which are not empty.  If I am not mistaken,
> we have a workaround for this in /trunk — r1673875 [1] and follow-ups, but not
> in Subversion 1.9.
To be sure, I just read through the code again
and it seems that on /trunk, expanded_size
indeed always has correct value.

>    Hence, the new svn_fs_contents_different() API can fail to
> distinguish different contents in 1.9.x under the circumstances described in
> issue 4554 [2].
So, svn_fs_contents_different is broken in 1.9.x
and needs to be fixed. I'll take care of that.

-- Stefan^2.


Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Wed, Oct 21, 2015 at 2:21 PM, Evgeny Kotkov <ev...@visualsvn.com>
wrote:

> Stefan Fuhrmann <st...@wandisco.com> writes:
>
> > Could you at least use the new API in svn_repos__compare_files instead
> > of re-implementing parts of the FS back-end but worse? I know this is
> > the code as it has been in 1.8 but that does not make the it any better.
>
> Speaking of /branches/1.9.x, I would like to nominate this change as is.
> It should be easier to review, because we would be restoring things to a
> known previous state, instead of mixing new with old.
>

Old is 1.9, new is whatever delta is being proposed
for backport. Combining both patches will effectively
reduce code churn.

Ultimately, the decision of what to nominate will be
yours of course.


> As for /trunk, I think that we could do that, so I sketched this option in
> the attached patch.


The patch looks o.k.


>   Currently I am not sure that there are no subtle but
> important differences between two implementations, so doing this is going
> to require a bit more time.  Hopefully, I would be able to sort it out
> after
> we're done with the backport.
>

You see, that's the whole point here. People must not
ave to rely on undocumented behaviour but (only) on
API guarantees. If the implementation does not fulfil
those guarantees, that's a bug.

-- Stefan^2.

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Branko Čibej <br...@apache.org>.
On 29.10.2015 17:13, Evgeny Kotkov wrote:
> Julian Foad <ju...@btopenworld.com> writes:
>
>> If the state has been "well defined", then please point us to the
>> definition of it, or to the tests that confirm it works as defined. If
>> you can't, then the state has not been "well defined". All we can say
>> is it has been "unchanged" over a long period of time.
> Okay, the term 'well defined' might be not fully applicable here.  But in the
> meanwhile, if we are looking for a proper term, saying that it's 'unchanged'
> is not enough as well.
>
> We had this unchanged state for many years and releases, and it wasn't causing
> practical problems for the users.  The callers were relying on that behavior,
> and changing it caused several problems.  On the contrary, I think that the
> r1572363 + r1573111 change we're talking about is 'undefined'.  Using the
> same logic, could you please point to the tests that specify the behavior
> after this change?

The fact that we used to expose no-op changes in dumps and logs is
purely an accidental side-effect of the top-down DAG we happened to
chose for the repository structure. In other words, we've been exposing
FS implementation details to public API consumers. This was a mistake,
and the bug was fixed in 1.9.

> My point is that even if state A is not 'well defined', it's still better than
> the current 'undefined' state.  It has been there for years, it wasn't causing
> problems, it had a reasoning behind it [1], and the callers were used to it.

The link you post to basically admits that we're using an API that deals
with details of the FS representation instead of the semantics of a
revision. We've never /defined/ the semantics; the current behaviour
that actually treats a no-op as a no-op is, IMO, quite a bit more
logical than the old behaviour.

-- Brane

-- Brane

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 27 October 2015 at 04:52, Johan Corveleyn <jc...@gmail.com> wrote:
> Bert,
>
> As the OP of this mail-thread, which spun out of the discovery of a
> loss of information by 'dump' in 1.9 [1], I'd like to add some things.
>
> I found out about this problem during the Berlin hackathon, when I
> tested various dumped/loaded repositories. The loss of information is
> real, and is IMO significant (we're losing a, possibly intended,
> relationship between a log message and a particular path [2]).
>
> On Mon, Oct 26, 2015 at 6:16 PM, Bert Huijben <be...@qqmail.nl> wrote:
>>
>>
>>> -----Original Message-----
>>> From: Evgeny Kotkov [mailto:evgeny.kotkov@visualsvn.com]
>>> Sent: maandag 26 oktober 2015 17:45
>>> To: Bert Huijben <be...@qqmail.nl>; Stefan Fuhrmann
>>> <st...@wandisco.com>
>>> Cc: Johan Corveleyn <jc...@gmail.com>; Julian Foad
>>> <ju...@btopenworld.com>; dev <de...@subversion.apache.org>
>>> Subject: Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9
>>
>>
>>> This means that after r1572363 and r1573111, svn_ra_get_file_revs2() and
>>> svn_repos_get_file_revs2() were skipping some of the "interesting"
>>> revisions,
>>> according to the FS API defining the concept.  Moreover, this behavior could
>>> be inconsistent even within a single function like svn_ra_get_file_revs2()
>>> that calls svn_ra_get_log2() for old servers, as get-log notices revisions
>>> with empty deltas.
>>>
>>> I think that it's another example of where r1572363 and r1573111 introduce
>>> an
>>> inconsistent and unwanted behavior change.
>>
>> And 1.9.x assumes that the old behavior is a bug... and in many cases I agree.
>
> Did the old behavior have serious bugs that were visible to users?
> Evgeny seemed to think not [3], and no-one said otherwise. And even
> so: is it okay to introduce new bugs while fixing old ones? IMO a bug
> in 'dump' is a big deal, because it changes your repository / history.
> Moreover, people who do a dump/load might not notice the change until
> years later, after they have piled up tons more history on top of it.
>
> Maybe there is some doubt whether this is a bug or a feature, but
> while we're in doubt I think the safest option is the 0-option: keep
> the old and known behavior (or rollback to it), which didn't lose this
> information.
>
>> This is exactly where the document Julian wrote comes in.
>
> As I said earlier in this thread, I think that document [4] is a great
> effort. But if you read it carefully, you'll see it does not
> contradict having no-op changes in the repository history, and
> exposing them for instance through 'svn log'. If we're supporting
> those (and we have until 1.8), we must be able to dump them.
>
> See specifically the final section titled "ARBITRARY DIFF VS.
> SINGLE-REVISION CHANGE ", where Julian argues:
>
> [[[
> As best I understand it, the idea of recording a no-op-change is meaningful
> and relatively straightforward to define at the level of a single state
> transition. We think of a commit as such a transition, and it is, but as
> mentioned above it's not in general the exact same transition that the
> client described.
>
> Attempting to derive a notion of 'no-op-change' that applies to a
> difference taken between an arbitrary pair of points in the version
> history, on the other hand, is not at all straightforward, and we do not
> have a concept of its meaning in relation to merging and so on.
>
> Now, the "svn log -v" output clearly applies to a single commit, a single
> state transition, and thus we find the indication of no-op changes there to
> be somewhat satisfactory. The code that generates this output, on the other
> hand, uses APIs that compare arbitrary points in history, such as
>
>     svn_fs_contents_changed(root1:path1, root2:path2)
>     svn_fs_props_changed
>
> Comparing arbitrary points in history is an operation that, throughout
> pretty much all of the version control system, is used really only when we
> want and need to know about real changes. Hence the definition of a new
> pair of APIs,
>
>     svn_fs_contents_different
>     svn_fs_props_different
>
> to specifically provide that meaning.
>
> What purpose remains for the original _changed() APIs, then? At first it
> wasn't clear there was any real purpose, but if we want "svn log" etc. to
> continue as before, then we need something like them. Except for this
> purpose we don't need APIs that compare two arbitrary states; we need APIs
> that compare two successive states, because this 'touched' concept only
> makes sense in this context.
> ]]]
>
> Whether we go for a complete redesign of the APIs or not, the above
> text nicely explains some different ways of looking at this, and gives
> "no-op changes" a place in our feature set.
>
>> If we wanted 1.9.x to behave in all ways identical to 1.8.x, we wouldn't
>> have created 1.9. We would have never released something different than
>> the old thing. Stefan spend quite some time in improving things, and
>> upto now most users agreed that this was an improvement. (The time to
>> speak up was during the release candidates)
>>
>>
>> Every new feature or bugfix changes behavior.
>> Just 'thinking that this is another inconsistent behavior change' doesn't
>> make a new argument on why this behavior change should be backported to
>> 1.9.x.
>>
>
> In my opinion the changed behavior of dump is a bug, not just a
> behavior change. Unfortunately, I only found the bug after release.
> But even if you don't think it's a bug, it was definitely an
> unexpected side-effect of the refactoring done by stefan2.
>
> Stefan proposed another way of fixing this, different from Evgeny's
> patch, but both agreed that the dump behavior was a bug and that it
> should be fixed. Julian too agreed that the change (the new code)
> should be reverted [5].
>
>>
>> I don't think reporting something as changed, when it is clearly not
>> changed is a good thing.
>>
>> We should decide when we want to see something as 'changed' and what
>> definition of 'changed' should be used where.
>>
>> Just going back to 1.8 is not the way to approach this.
>> That just changes one 'somehow broken implementation' (in one definition)
>> with a 'somehow broken implementation' (with a different definition).
>> We should define what behavior we really want (where)... and document why
>> we want that behavior there. Until then I don't think we should backport
>> anything.
>>
>> Both the 1.8.x behavior and the 1.9.x behavior are released... Going back
>> to 1.8.x is not going to fix everybody's usecases.
>
> Okay, well, I agree we should eventually go for a clear specification
> and design, and then implement that. But in the meantime we have a
> real bug in 1.9 [1] which can cause damage (or in any case "doubtful
> changes") to repositories (when an admin performs a dump+load). I
> would prefer that we try to fix the dump bug and backport it as soon
> as possible (getting us back to a good working state), and then take
> time to work out the long term solution.
>
>
+1 on all above.


-- 
Ivan Zhakov

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Julian Foad <ju...@apache.org>.
Evgeny Kotkov <ev...@visualsvn.com> wrote:
> Bert Huijben <be...@qqmail.nl> writes:
>> But this 'go back to 1.8' suggestio changes Subversion everywhere. It will
>> make 'svn annotate' slower... Makes 'svn update' slower and report more tree
>> conflicts, etc. etc.
>
> For 'svn blame', the only difference is that we would be processing no-op
> changes in file history, and I don't think that it would have a noticeable
> impact on the speed because no-op changes aren't the common case.

I think you misunderstand what the problem was with the old 'blame -g'
code. We should look up the actual code and As far as I recall, the
interesting 'blame -g' case concerns a comparison between files
'trunk/foo' and 'branch1/foo': it is common for those files to have
the same content, while the old 'blame -g' code relied on the
comparison always reporting a 'difference' in such a case even when
they have the same content. This is not a case where "committing a
no-op change" ever happened to the file.



> 'svn update' isn't going to be slower as well, and isn't going to produce
> more tree conflicts.  The reporter does not send empty deltas since 1.8 —
> see r1442555 and the related discussion in [1, 2].
>
>> We never designed the old behavior; we just used the functions that were
>> there. If we want it back we should document it, probably add regression
>> tests... and determine in which places we want it.
>
> I have to disagree.
>
> From what I see, the original design of svn_fs_contents_changed() and its
> calling sites had all these cases in mind.  Even more, it was done that way to
> support them.  Quoting C. Michael Pilato from the already mentioned thread [3]:
> [[[
>   svn_fs_contents_changed() was not designed to be used to detect when two
>   files have different content, but really to detect when the contents of a
>   given file have changed across two points in its history. For the purposes
>   of preserving accurate history, certain bits of code (such as the repos dump
>   code) needs to care about this distinction. For example, it's not an error
>   from the FS API point of view to call svn_fs_apply_textdelta() and
>   explicitly set a file's contents to exactly what they were before you made
>   the edit. But we try to preserve that fact when dumping rather than claim
>   that the file didn't change at all (despite there being a change of modified
>   parent directories and an associated `changes' table entry which claim
>   otherwise.)
> ]]]

If you will notice, C-Mike didn't address the difference between a
"change" in a linear sequence of commits and a "change" across two
[arbitrary] points in its history. While the meaning of the former is
straightforward, the latter is not, as I discussed in my separate
post. In fact, the idea that there is a "change" across two arbitrary
points in history is degenerative, in that after two versions of file
"foo" have been created (trunk/foo and branch1/foo) and a "change" is
reported when comparing them, there is no way in Subversion to commit
a new state such that no "change" is reported.

So I understand C-Mike's quote as applying only to the case where
svn_fs_contents_changed() is given two points along the history of a
given file or directory. The case where it is given two arbitrary
points in history (such as trunk/foo and branch1/foo) is, so far,
neither defined nor justified.

Do you think there is a way in which reporting a "change" as the
result of comparing two arbitrary points in history can be useful? If
so, can you explain what and how?

- Julian


> In 1.9 we changed how svn_fs_contents_changed() works, added an errata,
> claimed that all of these distinctions are "false positives" [4], and switched
> all the calling sites to the new API that doesn't care about them.  As one
> visible consequence, we no longer dump the changes that the original API
> was designed to preserve.  And there could be more.
>
> [1] https://svn.apache.org/r1442555
> [2] http://svn.haxx.se/dev/archive-2013-01/0439.shtml
> [3] http://svn.haxx.se/dev/archive-2013-02/0002.shtml
> [4] https://subversion.apache.org/docs/api/latest/group__fs__handling.html#gaede66ee7850d389bcdeb5ddef1540fdc
>
>
> Regards,
> Evgeny Kotkov

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Evgeny Kotkov <ev...@visualsvn.com> writes:

> Since this discussion has been lasting for over a month now, I proposed
> what I believe is the proper solution for the problem (and that's (1) )
> in /branches/1.9.x/STATUS.

After giving it more thought, I agree that changing other cases isn't required
and justified in context of this nomination.  I undid the part of the change
touching other calling sites, except for those in dump.c that are required to
solve issue #4598, and updated the nomination.

This leaves us with a couple of related issues in Subversion 1.9.2 (fixed in
trunk), but we can handle them with separate nominations:

(1) svn_fs_contents_different() can produce invalid results under circumstances
    described in issue 4554 [1].  In practice this means that, for instance,
    one can have a working copy that misses an update in a file that happened
    in the repository, and produces "Base checksum mismatch" errors on attempts
    to commit changes in this file.

    This issue was addressed by r1673875 [2] and follow-ups.

(2) svn_fs_props_different() relies on MD5 checksum for the comparison.

    This issue was addressed by r1705638 [3] and follow-ups.

I'll look into doing that.

[1] https://issues.apache.org/jira/browse/SVN-4554
[2] https://svn.apache.org/r1673875
[3] https://svn.apache.org/r1705638


Regards,
Evgeny Kotkov

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Julian Foad <ju...@btopenworld.com> writes:

> If the state has been "well defined", then please point us to the
> definition of it, or to the tests that confirm it works as defined. If
> you can't, then the state has not been "well defined". All we can say
> is it has been "unchanged" over a long period of time.

Okay, the term 'well defined' might be not fully applicable here.  But in the
meanwhile, if we are looking for a proper term, saying that it's 'unchanged'
is not enough as well.

We had this unchanged state for many years and releases, and it wasn't causing
practical problems for the users.  The callers were relying on that behavior,
and changing it caused several problems.  On the contrary, I think that the
r1572363 + r1573111 change we're talking about is 'undefined'.  Using the
same logic, could you please point to the tests that specify the behavior
after this change?

My point is that even if state A is not 'well defined', it's still better than
the current 'undefined' state.  It has been there for years, it wasn't causing
problems, it had a reasoning behind it [1], and the callers were used to it.

[1] http://svn.haxx.se/dev/archive-2013-02/0002.shtml


Regards,
Evgeny Kotkov

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Julian Foad <ju...@btopenworld.com>.
Evgeny, part of the problem is this:

Evgeny Kotkov <ev...@visualsvn.com> wrote:
> Bert Huijben <be...@qqmail.nl> writes:
>> I'm just -0.9 on reverting the rest 'as it *may* have problems too'.
>
> Perhaps I was not clear on my motivation, so here is how I see it.
>
> We had state A, and it has been well-defined during many consecutive releases.

If the state has been "well defined", then please point us to the
definition of it, or to the tests that confirm it works as defined. If
you can't, then the state has not been "well defined". All we can say
is it has been "unchanged" over a long period of time.

Just making this point before reading further.

- Julian

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Bert Huijben <be...@qqmail.nl> writes:

> I'm just -0.9 on reverting the rest 'as it *may* have problems too'.

Perhaps I was not clear on my motivation, so here is how I see it.

We had state A, and it has been well-defined during many consecutive releases.
Then we made an arbitrary change that transitioned everything to state B.  This
transition to state B doesn't fix any user-visible problems.  One year later we
discover a couple of broken cases that affect users in concrete ways — and,
unfortunately, one of them was only noticed after we released 1.9.

What can we do about that?

(1) Get back to state A, as the change was arbitrary.  We couldn't foresee
    the consequences of the change and that resulted in problems.  The change
    itself didn't have a specific use case in mind and didn't target a specific
    user-visible problem.  As opposed to that, state A existed for a long time,
    had a reasoning behind the chosen design, and everyone was happy with it.

(2) Move into state B* that mitigates the concrete problem for concrete users.

If you're thinking that we should do (2), this leaves some questions:

- Why is the arbitrary change made without a specific use case in mind suddenly
  more sacred than state A that has been working for years?

- Should we be fixing other problems possibly caused by this change in patch
  releases, as they happen to unveil themselves?

- What are the benefits from going to B* from the end-user's perspective?
  Perhaps, there are more important things that we could concentrate on,
  instead of possibly dealing with other problems caused by an arbitrary change
  in 1.9?  Although this is particularly out of scope of this discussion, we
  are spending time on solving a problem that wasn't bothering anyone and
  probably didn't exist in the past.

> Yes, there are still more users that use 1.0-1.8.x, but that doesn't make
> the 1.9.0-1.9.2 behavior directly 'broken'... 1.8.x has different behavior,
> relying on unexpected/unintended features but for many users they are useful.

I hardly believe that the previous behavior that dates back to year 2001 is
less expected/intended than the behavior done by a commit that doesn't target
a specific use case and has as well broken several use cases.

Since this discussion has been lasting for over a month now, I proposed
what I believe is the proper solution for the problem (and that's (1) )
in /branches/1.9.x/STATUS.


Regards,
Evgeny Kotkov

RE: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Evgeny Kotkov [mailto:evgeny.kotkov@visualsvn.com]
> Sent: dinsdag 27 oktober 2015 14:38
> To: Bert Huijben <be...@qqmail.nl>
> Cc: Johan Corveleyn <jc...@gmail.com>; Stefan Fuhrmann
> <st...@wandisco.com>; Julian Foad
> <ju...@btopenworld.com>; dev <de...@subversion.apache.org>
> Subject: Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9


> In 1.9 we changed how svn_fs_contents_changed() works, added an errata,
> claimed that all of these distinctions are "false positives" [4], and switched
> all the calling sites to the new API that doesn't care about them.  As one
> visible consequence, we no longer dump the changes that the original API
> was designed to preserve.  And there could be more.

You are still focusing on the dump scenario.

I'm +1 on backporting that when documentation and regression tests are added.
(I think for that specific case the tests are done)


I'm just -0.9 on reverting the rest 'as it *may* have problems too'.

That is not a good enough reason for me to backport a 'revert to 1.8 style' thing. That part needs a better reason. The other behavior changes need their own tests, their own documentation and their own argumentation.


Backporting now and only then trying to determine where this affects behavior is the wrong order.



If somebody wants to backport more than the dump/load fix it should be for a good reason. And every part of a backport with such impact needs a reason.



We should find out where backporting the change affects behavior *now* and only backport after that.  Not the other way around (Revert, release and only then review)


We can't undo the release of 1.9.0, 1.9.1 and 1.9.2... and going back to the old behavior is a huge change.


Yes, there are still more users that use 1.0-1.8.x, but that doesn't make the 1.9.0-1.9.2 behavior directly 'broken'... 1.8.x has different behavior, relying on unexpected/unintended features but for many users they are useful.


But I haven't seen a document update explaining why those errata you mentioned where *all false*.




I still see a reasoning for fixing 'dump'... and a patch (not even a nomination yet) for changing much more than dump.


If the errata were bad, I also miss the errata patch.

	Bert


Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Bert Huijben <be...@qqmail.nl> writes:

> But this 'go back to 1.8' suggestio changes Subversion everywhere. It will
> make 'svn annotate' slower... Makes 'svn update' slower and report more tree
> conflicts, etc. etc.

For 'svn blame', the only difference is that we would be processing no-op
changes in file history, and I don't think that it would have a noticeable
impact on the speed because no-op changes aren't the common case.

'svn update' isn't going to be slower as well, and isn't going to produce
more tree conflicts.  The reporter does not send empty deltas since 1.8 —
see r1442555 and the related discussion in [1, 2].

> We never designed the old behavior; we just used the functions that were
> there. If we want it back we should document it, probably add regression
> tests... and determine in which places we want it.

I have to disagree.

>From what I see, the original design of svn_fs_contents_changed() and its
calling sites had all these cases in mind.  Even more, it was done that way to
support them.  Quoting C. Michael Pilato from the already mentioned thread [3]:
[[[
  svn_fs_contents_changed() was not designed to be used to detect when two
  files have different content, but really to detect when the contents of a
  given file have changed across two points in its history. For the purposes
  of preserving accurate history, certain bits of code (such as the repos dump
  code) needs to care about this distinction. For example, it's not an error
  from the FS API point of view to call svn_fs_apply_textdelta() and
  explicitly set a file's contents to exactly what they were before you made
  the edit. But we try to preserve that fact when dumping rather than claim
  that the file didn't change at all (despite there being a change of modified
  parent directories and an associated `changes' table entry which claim
  otherwise.)
]]]

In 1.9 we changed how svn_fs_contents_changed() works, added an errata,
claimed that all of these distinctions are "false positives" [4], and switched
all the calling sites to the new API that doesn't care about them.  As one
visible consequence, we no longer dump the changes that the original API
was designed to preserve.  And there could be more.

[1] https://svn.apache.org/r1442555
[2] http://svn.haxx.se/dev/archive-2013-01/0439.shtml
[3] http://svn.haxx.se/dev/archive-2013-02/0002.shtml
[4] https://subversion.apache.org/docs/api/latest/group__fs__handling.html#gaede66ee7850d389bcdeb5ddef1540fdc


Regards,
Evgeny Kotkov

RE: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Bert Huijben <be...@qqmail.nl>.
Summarizing a few mental notes:

 

Broken behavior (1 revision to the direct next/previous)

·         Replay / svnadmin dump
(svnadmin dump is tightly integrated with replay. Svnrdump works on top of replay)

·         Log

Mostly caused by having the old behavior for years… never intended/designed, but ‘works’.

 

è Needs design/documentation + tests + backport

 

 

 

Good behavior (calculating deltas of multiple changes):

·         Update/checkout/switch/diff/merge

No-op changes make no sense when we collapse changes

·         What about last-* values?

 

 

Design issues:

·         Editor v2

o   can’t express no-op changes yet

o   Is used as a 100% replacement for the delta editor even in these cases

o   Partially released in 1.9 via JavaHL

·         (Editor v3??)

 

 

File revisions:

·         1.9 behavior works best for ‘svn annotate’ (avoids calculations on not-changed files)

·         1.5-1.8 behavior required for ‘svn annotate -g’ (Wants to know all revisions. Uses that to trigger behavior. Deltas sometimes between unexpected revisions)

Api only used for blame or otherwise determining actual changes.

 

                Bert

 

 

From: bert@qqmail.nl [mailto:bert@qqmail.nl] 
Sent: dinsdag 27 oktober 2015 08:44
To: Johan Corveleyn <jc...@gmail.com>
Cc: Evgeny Kotkov <ev...@visualsvn.com>; Stefan Fuhrmann <st...@wandisco.com>; Julian Foad <ju...@btopenworld.com>; dev <de...@subversion.apache.org>
Subject: RE: No-op changes no longer dumped by 'svnadmin dump' in 1.9

 

But as Julian and Branko pointed out Subversion's update operation works on calculating deltas over the actual changes. Seeing non-changes as changes there introduces unwanted behavior.

 

Going back to the old code that assumes something is changed in these cases + in some unknown/undocumented/unintended other cases is not the way to design our software.

 

We should *decide* when we want which behaior. We should not decide we want to go back to that unknown/undocumented/unintended everywhere, without documenting this.

 

 

Just going back *everywhere* without deciding why will make it impossible to improve Subversion in the future as we will always break things.

 

 

We never designed the old behavior; we just used the functions that were there. If we want it back we should document it, probably add regression tests... and determine in which places we want it.

 

 

The original request is about legacy behavior of a cvs import during svn log.

 

 

But this 'go back to 1.8' suggestio changes Subversion everywhere. It will make 'svn annotate' slower... Makes 'svn update' slower and report more tree conflicts, etc. etc.

 

Handling non-changes as changes make a lot less sense in those cases... while we already have the code to fix those cases.

 

 

We should revert the behavior where it makes sense. Reverting it everywhere 'just because' doesn't make sense.

 

 

    Bert

 

Sent from Mail <http://go.microsoft.com/fwlink/?LinkId=550986>  for Windows 10

 

 


From: Johan Corveleyn
Sent: dinsdag 27 oktober 2015 02:53
To: Bert Huijben
Cc: Evgeny Kotkov;Stefan Fuhrmann;Julian Foad;dev
Subject: Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

 

 

Bert,

 

As the OP of this mail-thread, which spun out of the discovery of a

loss of information by 'dump' in 1.9 [1], I'd like to add some things.

 

I found out about this problem during the Berlin hackathon, when I

tested various dumped/loaded repositories. The loss of information is

real, and is IMO significant (we're losing a, possibly intended,

relationship between a log message and a particular path [2]).

 

On Mon, Oct 26, 2015 at 6:16 PM, Bert Huijben <bert@qqmail.nl <ma...@qqmail.nl> > wrote:

> 

> 

>> -----Original Message-----

>> From: Evgeny Kotkov [mailto:evgeny.kotkov@visualsvn.com]

>> Sent: maandag 26 oktober 2015 17:45

>> To: Bert Huijben <bert@qqmail.nl <ma...@qqmail.nl> >; Stefan Fuhrmann

>> <stefan.fuhrmann@wandisco.com <ma...@wandisco.com> >

>> Cc: Johan Corveleyn <jcorvel@gmail.com <ma...@gmail.com> >; Julian Foad

>> <julianfoad@btopenworld.com <ma...@btopenworld.com> >; dev <dev@subversion.apache.org <ma...@subversion.apache.org> >

>> Subject: Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

> 

> 

>> This means that after r1572363 and r1573111, svn_ra_get_file_revs2() and

>> svn_repos_get_file_revs2() were skipping some of the "interesting"

>> revisions,

>> according to the FS API defining the concept.  Moreover, this behavior could

>> be inconsistent even within a single function like svn_ra_get_file_revs2()

>> that calls svn_ra_get_log2() for old servers, as get-log notices revisions

>> with empty deltas.

>> 

>> I think that it's another example of where r1572363 and r1573111 introduce

>> an

>> inconsistent and unwanted behavior change.

> 

> And 1.9.x assumes that the old behavior is a bug... and in many cases I agree.

 

Did the old behavior have serious bugs that were visible to users?

Evgeny seemed to think not [3], and no-one said otherwise. And even

so: is it okay to introduce new bugs while fixing old ones? IMO a bug

in 'dump' is a big deal, because it changes your repository / history.

Moreover, people who do a dump/load might not notice the change until

years later, after they have piled up tons more history on top of it.

 

Maybe there is some doubt whether this is a bug or a feature, but

while we're in doubt I think the safest option is the 0-option: keep

the old and known behavior (or rollback to it), which didn't lose this

information.

 

> This is exactly where the document Julian wrote comes in.

 

As I said earlier in this thread, I think that document [4] is a great

effort. But if you read it carefully, you'll see it does not

contradict having no-op changes in the repository history, and

exposing them for instance through 'svn log'. If we're supporting

those (and we have until 1.8), we must be able to dump them.

 

See specifically the final section titled "ARBITRARY DIFF VS.

SINGLE-REVISION CHANGE ", where Julian argues:

 

[[[

As best I understand it, the idea of recording a no-op-change is meaningful

and relatively straightforward to define at the level of a single state

transition. We think of a commit as such a transition, and it is, but as

mentioned above it's not in general the exact same transition that the

client described.

 

Attempting to derive a notion of 'no-op-change' that applies to a

difference taken between an arbitrary pair of points in the version

history, on the other hand, is not at all straightforward, and we do not

have a concept of its meaning in relation to merging and so on.

 

Now, the "svn log -v" output clearly applies to a single commit, a single

state transition, and thus we find the indication of no-op changes there to

be somewhat satisfactory. The code that generates this output, on the other

hand, uses APIs that compare arbitrary points in history, such as

 

    svn_fs_contents_changed(root1:path1, root2:path2)

    svn_fs_props_changed

 

Comparing arbitrary points in history is an operation that, throughout

pretty much all of the version control system, is used really only when we

want and need to know about real changes. Hence the definition of a new

pair of APIs,

 

    svn_fs_contents_different

    svn_fs_props_different

 

to specifically provide that meaning.

 

What purpose remains for the original _changed() APIs, then? At first it

wasn't clear there was any real purpose, but if we want "svn log" etc. to

continue as before, then we need something like them. Except for this

purpose we don't need APIs that compare two arbitrary states; we need APIs

that compare two successive states, because this 'touched' concept only

makes sense in this context.

]]]

 

Whether we go for a complete redesign of the APIs or not, the above

text nicely explains some different ways of looking at this, and gives

"no-op changes" a place in our feature set.

 

> If we wanted 1.9.x to behave in all ways identical to 1.8.x, we wouldn't have created 1.9. We would have never released something different than the old thing. Stefan spend quite some time in improving things, and upto now most users agreed that this was an improvement. (The time to speak up was during the release candidates)

> 

> 

> Every new feature or bugfix changes behavior.

> Just 'thinking that this is another inconsistent behavior change' doesn't make a new argument on why this behavior change should be backported to 1.9.x.

> 

 

In my opinion the changed behavior of dump is a bug, not just a

behavior change. Unfortunately, I only found the bug after release.

But even if you don't think it's a bug, it was definitely an

unexpected side-effect of the refactoring done by stefan2.

 

Stefan proposed another way of fixing this, different from Evgeny's

patch, but both agreed that the dump behavior was a bug and that it

should be fixed. Julian too agreed that the change (the new code)

should be reverted [5].

 

> 

> I don't think reporting something as changed, when it is clearly not changed is a good thing.

> 

> We should decide when we want to see something as 'changed' and what definition of 'changed' should be used where.

> 

> Just going back to 1.8 is not the way to approach this.

> That just changes one 'somehow broken implementation' (in one definition) with a 'somehow broken implementation' (with a different definition).

> We should define what behavior we really want (where)... and document why we want that behavior there. Until then I don't think we should backport anything.

> 

> Both the 1.8.x behavior and the 1.9.x behavior are released... Going back to 1.8.x is not going to fix everybody's usecases.

 

Okay, well, I agree we should eventually go for a clear specification

and design, and then implement that. But in the meantime we have a

real bug in 1.9 [1] which can cause damage (or in any case "doubtful

changes") to repositories (when an admin performs a dump+load). I

would prefer that we try to fix the dump bug and backport it as soon

as possible (getting us back to a good working state), and then take

time to work out the long term solution.

 

 

[1] Issue #4598 "No-op changes no longer dumped by 'svnadmin dump' in

1.9", http://subversion.tigris.org/issues/show_bug.cgi?id=4598

 

[2] http://svn.haxx.se/dev/archive-2015-09/0290.shtml

 

[3] http://svn.haxx.se/dev/archive-2015-10/0085.shtml

 

[4] http://svn.haxx.se/dev/archive-2015-10/0082.shtml

 

[5] http://svn.haxx.se/dev/archive-2015-09/0292.shtml

 

-- 

Johan

 

 


RE: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by be...@qqmail.nl.
But as Julian and Branko pointed out Subversion's update operation works on calculating deltas over the actual changes. Seeing non-changes as changes there introduces unwanted behavior.

Going back to the old code that assumes something is changed in these cases + in some unknown/undocumented/unintended other cases is not the way to design our software.

We should *decide* when we want which behaior. We should not decide we want to go back to that unknown/undocumented/unintended everywhere, without documenting this.


Just going back *everywhere* without deciding why will make it impossible to improve Subversion in the future as we will always break things.


We never designed the old behavior; we just used the functions that were there. If we want it back we should document it, probably add regression tests... and determine in which places we want it.


The original request is about legacy behavior of a cvs import during svn log.


But this 'go back to 1.8' suggestio changes Subversion everywhere. It will make 'svn annotate' slower... Makes 'svn update' slower and report more tree conflicts, etc. etc.

Handling non-changes as changes make a lot less sense in those cases... while we already have the code to fix those cases.


We should revert the behavior where it makes sense. Reverting it everywhere 'just because' doesn't make sense.


    Bert

Sent from Mail for Windows 10



From: Johan Corveleyn
Sent: dinsdag 27 oktober 2015 02:53
To: Bert Huijben
Cc: Evgeny Kotkov;Stefan Fuhrmann;Julian Foad;dev
Subject: Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9


Bert,

As the OP of this mail-thread, which spun out of the discovery of a
loss of information by 'dump' in 1.9 [1], I'd like to add some things.

I found out about this problem during the Berlin hackathon, when I
tested various dumped/loaded repositories. The loss of information is
real, and is IMO significant (we're losing a, possibly intended,
relationship between a log message and a particular path [2]).

On Mon, Oct 26, 2015 at 6:16 PM, Bert Huijben <be...@qqmail.nl> wrote:
>
>
>> -----Original Message-----
>> From: Evgeny Kotkov [mailto:evgeny.kotkov@visualsvn.com]
>> Sent: maandag 26 oktober 2015 17:45
>> To: Bert Huijben <be...@qqmail.nl>; Stefan Fuhrmann
>> <st...@wandisco.com>
>> Cc: Johan Corveleyn <jc...@gmail.com>; Julian Foad
>> <ju...@btopenworld.com>; dev <de...@subversion.apache.org>
>> Subject: Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9
>
>
>> This means that after r1572363 and r1573111, svn_ra_get_file_revs2() and
>> svn_repos_get_file_revs2() were skipping some of the "interesting"
>> revisions,
>> according to the FS API defining the concept.  Moreover, this behavior could
>> be inconsistent even within a single function like svn_ra_get_file_revs2()
>> that calls svn_ra_get_log2() for old servers, as get-log notices revisions
>> with empty deltas.
>>
>> I think that it's another example of where r1572363 and r1573111 introduce
>> an
>> inconsistent and unwanted behavior change.
>
> And 1.9.x assumes that the old behavior is a bug... and in many cases I agree.

Did the old behavior have serious bugs that were visible to users?
Evgeny seemed to think not [3], and no-one said otherwise. And even
so: is it okay to introduce new bugs while fixing old ones? IMO a bug
in 'dump' is a big deal, because it changes your repository / history.
Moreover, people who do a dump/load might not notice the change until
years later, after they have piled up tons more history on top of it.

Maybe there is some doubt whether this is a bug or a feature, but
while we're in doubt I think the safest option is the 0-option: keep
the old and known behavior (or rollback to it), which didn't lose this
information.

> This is exactly where the document Julian wrote comes in.

As I said earlier in this thread, I think that document [4] is a great
effort. But if you read it carefully, you'll see it does not
contradict having no-op changes in the repository history, and
exposing them for instance through 'svn log'. If we're supporting
those (and we have until 1.8), we must be able to dump them.

See specifically the final section titled "ARBITRARY DIFF VS.
SINGLE-REVISION CHANGE ", where Julian argues:

[[[
As best I understand it, the idea of recording a no-op-change is meaningful
and relatively straightforward to define at the level of a single state
transition. We think of a commit as such a transition, and it is, but as
mentioned above it's not in general the exact same transition that the
client described.

Attempting to derive a notion of 'no-op-change' that applies to a
difference taken between an arbitrary pair of points in the version
history, on the other hand, is not at all straightforward, and we do not
have a concept of its meaning in relation to merging and so on.

Now, the "svn log -v" output clearly applies to a single commit, a single
state transition, and thus we find the indication of no-op changes there to
be somewhat satisfactory. The code that generates this output, on the other
hand, uses APIs that compare arbitrary points in history, such as

    svn_fs_contents_changed(root1:path1, root2:path2)
    svn_fs_props_changed

Comparing arbitrary points in history is an operation that, throughout
pretty much all of the version control system, is used really only when we
want and need to know about real changes. Hence the definition of a new
pair of APIs,

    svn_fs_contents_different
    svn_fs_props_different

to specifically provide that meaning.

What purpose remains for the original _changed() APIs, then? At first it
wasn't clear there was any real purpose, but if we want "svn log" etc. to
continue as before, then we need something like them. Except for this
purpose we don't need APIs that compare two arbitrary states; we need APIs
that compare two successive states, because this 'touched' concept only
makes sense in this context.
]]]

Whether we go for a complete redesign of the APIs or not, the above
text nicely explains some different ways of looking at this, and gives
"no-op changes" a place in our feature set.

> If we wanted 1.9.x to behave in all ways identical to 1.8.x, we wouldn't have created 1.9. We would have never released something different than the old thing. Stefan spend quite some time in improving things, and upto now most users agreed that this was an improvement. (The time to speak up was during the release candidates)
>
>
> Every new feature or bugfix changes behavior.
> Just 'thinking that this is another inconsistent behavior change' doesn't make a new argument on why this behavior change should be backported to 1.9.x.
>

In my opinion the changed behavior of dump is a bug, not just a
behavior change. Unfortunately, I only found the bug after release.
But even if you don't think it's a bug, it was definitely an
unexpected side-effect of the refactoring done by stefan2.

Stefan proposed another way of fixing this, different from Evgeny's
patch, but both agreed that the dump behavior was a bug and that it
should be fixed. Julian too agreed that the change (the new code)
should be reverted [5].

>
> I don't think reporting something as changed, when it is clearly not changed is a good thing.
>
> We should decide when we want to see something as 'changed' and what definition of 'changed' should be used where.
>
> Just going back to 1.8 is not the way to approach this.
> That just changes one 'somehow broken implementation' (in one definition) with a 'somehow broken implementation' (with a different definition).
> We should define what behavior we really want (where)... and document why we want that behavior there. Until then I don't think we should backport anything.
>
> Both the 1.8.x behavior and the 1.9.x behavior are released... Going back to 1.8.x is not going to fix everybody's usecases.

Okay, well, I agree we should eventually go for a clear specification
and design, and then implement that. But in the meantime we have a
real bug in 1.9 [1] which can cause damage (or in any case "doubtful
changes") to repositories (when an admin performs a dump+load). I
would prefer that we try to fix the dump bug and backport it as soon
as possible (getting us back to a good working state), and then take
time to work out the long term solution.


[1] Issue #4598 "No-op changes no longer dumped by 'svnadmin dump' in
1.9", http://subversion.tigris.org/issues/show_bug.cgi?id=4598

[2] http://svn.haxx.se/dev/archive-2015-09/0290.shtml

[3] http://svn.haxx.se/dev/archive-2015-10/0085.shtml

[4] http://svn.haxx.se/dev/archive-2015-10/0082.shtml

[5] http://svn.haxx.se/dev/archive-2015-09/0292.shtml

-- 
Johan



Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Johan Corveleyn <jc...@gmail.com>.
Bert,

As the OP of this mail-thread, which spun out of the discovery of a
loss of information by 'dump' in 1.9 [1], I'd like to add some things.

I found out about this problem during the Berlin hackathon, when I
tested various dumped/loaded repositories. The loss of information is
real, and is IMO significant (we're losing a, possibly intended,
relationship between a log message and a particular path [2]).

On Mon, Oct 26, 2015 at 6:16 PM, Bert Huijben <be...@qqmail.nl> wrote:
>
>
>> -----Original Message-----
>> From: Evgeny Kotkov [mailto:evgeny.kotkov@visualsvn.com]
>> Sent: maandag 26 oktober 2015 17:45
>> To: Bert Huijben <be...@qqmail.nl>; Stefan Fuhrmann
>> <st...@wandisco.com>
>> Cc: Johan Corveleyn <jc...@gmail.com>; Julian Foad
>> <ju...@btopenworld.com>; dev <de...@subversion.apache.org>
>> Subject: Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9
>
>
>> This means that after r1572363 and r1573111, svn_ra_get_file_revs2() and
>> svn_repos_get_file_revs2() were skipping some of the "interesting"
>> revisions,
>> according to the FS API defining the concept.  Moreover, this behavior could
>> be inconsistent even within a single function like svn_ra_get_file_revs2()
>> that calls svn_ra_get_log2() for old servers, as get-log notices revisions
>> with empty deltas.
>>
>> I think that it's another example of where r1572363 and r1573111 introduce
>> an
>> inconsistent and unwanted behavior change.
>
> And 1.9.x assumes that the old behavior is a bug... and in many cases I agree.

Did the old behavior have serious bugs that were visible to users?
Evgeny seemed to think not [3], and no-one said otherwise. And even
so: is it okay to introduce new bugs while fixing old ones? IMO a bug
in 'dump' is a big deal, because it changes your repository / history.
Moreover, people who do a dump/load might not notice the change until
years later, after they have piled up tons more history on top of it.

Maybe there is some doubt whether this is a bug or a feature, but
while we're in doubt I think the safest option is the 0-option: keep
the old and known behavior (or rollback to it), which didn't lose this
information.

> This is exactly where the document Julian wrote comes in.

As I said earlier in this thread, I think that document [4] is a great
effort. But if you read it carefully, you'll see it does not
contradict having no-op changes in the repository history, and
exposing them for instance through 'svn log'. If we're supporting
those (and we have until 1.8), we must be able to dump them.

See specifically the final section titled "ARBITRARY DIFF VS.
SINGLE-REVISION CHANGE ", where Julian argues:

[[[
As best I understand it, the idea of recording a no-op-change is meaningful
and relatively straightforward to define at the level of a single state
transition. We think of a commit as such a transition, and it is, but as
mentioned above it's not in general the exact same transition that the
client described.

Attempting to derive a notion of 'no-op-change' that applies to a
difference taken between an arbitrary pair of points in the version
history, on the other hand, is not at all straightforward, and we do not
have a concept of its meaning in relation to merging and so on.

Now, the "svn log -v" output clearly applies to a single commit, a single
state transition, and thus we find the indication of no-op changes there to
be somewhat satisfactory. The code that generates this output, on the other
hand, uses APIs that compare arbitrary points in history, such as

    svn_fs_contents_changed(root1:path1, root2:path2)
    svn_fs_props_changed

Comparing arbitrary points in history is an operation that, throughout
pretty much all of the version control system, is used really only when we
want and need to know about real changes. Hence the definition of a new
pair of APIs,

    svn_fs_contents_different
    svn_fs_props_different

to specifically provide that meaning.

What purpose remains for the original _changed() APIs, then? At first it
wasn't clear there was any real purpose, but if we want "svn log" etc. to
continue as before, then we need something like them. Except for this
purpose we don't need APIs that compare two arbitrary states; we need APIs
that compare two successive states, because this 'touched' concept only
makes sense in this context.
]]]

Whether we go for a complete redesign of the APIs or not, the above
text nicely explains some different ways of looking at this, and gives
"no-op changes" a place in our feature set.

> If we wanted 1.9.x to behave in all ways identical to 1.8.x, we wouldn't have created 1.9. We would have never released something different than the old thing. Stefan spend quite some time in improving things, and upto now most users agreed that this was an improvement. (The time to speak up was during the release candidates)
>
>
> Every new feature or bugfix changes behavior.
> Just 'thinking that this is another inconsistent behavior change' doesn't make a new argument on why this behavior change should be backported to 1.9.x.
>

In my opinion the changed behavior of dump is a bug, not just a
behavior change. Unfortunately, I only found the bug after release.
But even if you don't think it's a bug, it was definitely an
unexpected side-effect of the refactoring done by stefan2.

Stefan proposed another way of fixing this, different from Evgeny's
patch, but both agreed that the dump behavior was a bug and that it
should be fixed. Julian too agreed that the change (the new code)
should be reverted [5].

>
> I don't think reporting something as changed, when it is clearly not changed is a good thing.
>
> We should decide when we want to see something as 'changed' and what definition of 'changed' should be used where.
>
> Just going back to 1.8 is not the way to approach this.
> That just changes one 'somehow broken implementation' (in one definition) with a 'somehow broken implementation' (with a different definition).
> We should define what behavior we really want (where)... and document why we want that behavior there. Until then I don't think we should backport anything.
>
> Both the 1.8.x behavior and the 1.9.x behavior are released... Going back to 1.8.x is not going to fix everybody's usecases.

Okay, well, I agree we should eventually go for a clear specification
and design, and then implement that. But in the meantime we have a
real bug in 1.9 [1] which can cause damage (or in any case "doubtful
changes") to repositories (when an admin performs a dump+load). I
would prefer that we try to fix the dump bug and backport it as soon
as possible (getting us back to a good working state), and then take
time to work out the long term solution.


[1] Issue #4598 "No-op changes no longer dumped by 'svnadmin dump' in
1.9", http://subversion.tigris.org/issues/show_bug.cgi?id=4598

[2] http://svn.haxx.se/dev/archive-2015-09/0290.shtml

[3] http://svn.haxx.se/dev/archive-2015-10/0085.shtml

[4] http://svn.haxx.se/dev/archive-2015-10/0082.shtml

[5] http://svn.haxx.se/dev/archive-2015-09/0292.shtml

-- 
Johan

RE: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Evgeny Kotkov [mailto:evgeny.kotkov@visualsvn.com]
> Sent: maandag 26 oktober 2015 17:45
> To: Bert Huijben <be...@qqmail.nl>; Stefan Fuhrmann
> <st...@wandisco.com>
> Cc: Johan Corveleyn <jc...@gmail.com>; Julian Foad
> <ju...@btopenworld.com>; dev <de...@subversion.apache.org>
> Subject: Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9


> This means that after r1572363 and r1573111, svn_ra_get_file_revs2() and
> svn_repos_get_file_revs2() were skipping some of the "interesting"
> revisions,
> according to the FS API defining the concept.  Moreover, this behavior could
> be inconsistent even within a single function like svn_ra_get_file_revs2()
> that calls svn_ra_get_log2() for old servers, as get-log notices revisions
> with empty deltas.
> 
> I think that it's another example of where r1572363 and r1573111 introduce
> an
> inconsistent and unwanted behavior change.

And 1.9.x assumes that the old behavior is a bug... and in many cases I agree.


This is exactly where the document Julian wrote comes in.

If we wanted 1.9.x to behave in all ways identical to 1.8.x, we wouldn't have created 1.9. We would have never released something different than the old thing. Stefan spend quite some time in improving things, and upto now most users agreed that this was an improvement. (The time to speak up was during the release candidates)


Every new feature or bugfix changes behavior. 
Just 'thinking that this is another inconsistent behavior change' doesn't make a new argument on why this behavior change should be backported to 1.9.x.



I don't think reporting something as changed, when it is clearly not changed is a good thing.

We should decide when we want to see something as 'changed' and what definition of 'changed' should be used where.



Just going back to 1.8 is not the way to approach this.

That just changes one 'somehow broken implementation' (in one definition) with a 'somehow broken implementation' (with a different definition).



We should define what behavior we really want (where)... and document why we want that behavior there. Until then I don't think we should backport anything.


Both the 1.8.x behavior and the 1.9.x behavior are released... Going back to 1.8.x is not going to fix everybody's usecases.


	Bert


Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Bert Huijben <be...@qqmail.nl> writes:

> Noting everything as a change even if there is no actual delta... still
> looks like a bug we should fix (and we did) instead of something that we
> should restore.
>
> For the reasoning on why:
>
> For blame we are really interested if there are deltas... and if there
> aren't any changes the revision is not 'interesting'. (See docs of the api).

The place where we define "interesting" revisions is svn_fs_history_prev2():

  /** Set @a *prev_history_p to an opaque node history object which
   * represents the previous (or "next oldest") interesting history
   * location for the filesystem node represented by @a history, or @c
   * NULL if no such previous history exists.  If @a cross_copies is @c
   * FALSE, also return @c NULL if stepping backwards in history to @a
   * *prev_history_p would cross a filesystem copy operation.
   *
   * @note If this is the first call to svn_fs_history_prev() for the @a
   * history object, it could return a history object whose location is
   * the same as the original.  This will happen if the original
   * location was an interesting one (where the node was modified, or
   * took place in a copy event).  This behavior allows looping callers
   * to avoid the calling svn_fs_history_location() on the object
   * returned by svn_fs_node_history(), and instead go ahead and begin
   * calling svn_fs_history_prev().
   ...

Functions like svn_ra_get_file_revs2() that you mentioned just link to it:

  /**
   * Retrieve a subset of the interesting revisions of a file @a path
   * as seen in revision @a end (see svn_fs_history_prev() for a
   * definition of "interesting revisions").
   ...

The svn_fs_history_prev2() function reports revisions with empty deltas as
interesting, and has been doing that for years (see the attached patch with
a test).  This behavior did not change in 1.9.

This means that after r1572363 and r1573111, svn_ra_get_file_revs2() and
svn_repos_get_file_revs2() were skipping some of the "interesting" revisions,
according to the FS API defining the concept.  Moreover, this behavior could
be inconsistent even within a single function like svn_ra_get_file_revs2()
that calls svn_ra_get_log2() for old servers, as get-log notices revisions
with empty deltas.

I think that it's another example of where r1572363 and r1573111 introduce an
inconsistent and unwanted behavior change.

Stefan Fuhrmann <st...@wandisco.com> writes:

>> As for /trunk, I think that we could do that, so I sketched this option in
>> the attached patch.
>
> The patch looks o.k.

Thanks for giving this patch a look.

I examined the differences between these functions, svn_repos__compare_files()
and svn_fs_contents_different(), and I think that we also have a problem in
FSFS's implementation of the new svn_fs_contents_different() API in 1.9.x.
The implementation relies on representation_t.expanded_size value when doing
the comparison.  If this value is zero, a representation is considered empty,
and two empty representations are considered equal.

The problem is that the expanded_size can be zero for empty files and also for
some of the PLAIN representations, which are not empty.  If I am not mistaken,
we have a workaround for this in /trunk — r1673875 [1] and follow-ups, but not
in Subversion 1.9.  Hence, the new svn_fs_contents_different() API can fail to
distinguish different contents in 1.9.x under the circumstances described in
issue 4554 [2].

So, we can use it in trunk, but this API may produce invalid results in 1.9.x.

I'll commit the patch, and then I am going to prepare the nomination consisting
of r1709388 and the new dump_no_op_change() test.

[1] https://svn.apache.org/r1673875
[2] https://issues.apache.org/jira/browse/SVN-4554


Regards,
Evgeny Kotkov

RE: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by be...@qqmail.nl.
The 'revert' to something like '1.8' also introduces most changes on 1.9.x behavior. If you look at it that way, you have more to review, the more you revert to 1.8.

I'm still not convinced that the 1.8 behavior is really the behavior we want *everywhere*. Julian's great document outlines the different ways we look at this in different scenarios.

Noting everything as a change even if there is no actual delta... still looks like a bug we should fix (and we did) instead of something that we should restore.

For the reasoning on why:
For blame we are really interested if there are deltas... and if there aren't any changes the revision is not 'interesting'. (See docs of the api).

Blame with mergeinfo is just a hack, which has all kinds of assumptions on how closely all merged branches are related. We simply don't store the information to really obtain where the change originated*. The way this code (build as a summer of code project if I remember correct) assumed that every revision was reported as interesting is just one of the many limitations.

Keeping this broken code working is not a goal by itself, that should force us to keep our apis inefficient.

Bert

* I think we once (far before 1.5) had a branch that started to really store what was needed here. Perhaps Julian's current work will improve things later on.

Sent from Mail for Windows 10



From: Evgeny Kotkov
Sent: woensdag 21 oktober 2015 14:22
To: Stefan Fuhrmann
Cc: Johan Corveleyn;Julian Foad;dev
Subject: Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9


Stefan Fuhrmann <st...@wandisco.com> writes:

> Could you at least use the new API in svn_repos__compare_files instead
> of re-implementing parts of the FS back-end but worse? I know this is
> the code as it has been in 1.8 but that does not make the it any better.

Speaking of /branches/1.9.x, I would like to nominate this change as is.
It should be easier to review, because we would be restoring things to a
known previous state, instead of mixing new with old.

As for /trunk, I think that we could do that, so I sketched this option in
the attached patch.  Currently I am not sure that there are no subtle but
important differences between two implementations, so doing this is going
to require a bit more time.  Hopefully, I would be able to sort it out after
we're done with the backport.


Regards,
Evgeny Kotkov



Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Stefan Fuhrmann <st...@wandisco.com> writes:

> Could you at least use the new API in svn_repos__compare_files instead
> of re-implementing parts of the FS back-end but worse? I know this is
> the code as it has been in 1.8 but that does not make the it any better.

Speaking of /branches/1.9.x, I would like to nominate this change as is.
It should be easier to review, because we would be restoring things to a
known previous state, instead of mixing new with old.

As for /trunk, I think that we could do that, so I sketched this option in
the attached patch.  Currently I am not sure that there are no subtle but
important differences between two implementations, so doing this is going
to require a bit more time.  Hopefully, I would be able to sort it out after
we're done with the backport.


Regards,
Evgeny Kotkov

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Mon, Oct 19, 2015 at 1:06 PM, Evgeny Kotkov <ev...@visualsvn.com>
wrote:

> Evgeny Kotkov <ev...@visualsvn.com> writes:
>
> > Irrespectively of what we plan for 1.10, there's a problem in a 1.9 that
> we
> > need to solve.  The associated issues are hard to diagnose and hard to
> > stumble upon, and I don't think that fixing them one by one as we become
> > aware of their existence is a way to go, as opposed to just restoring the
> > stable behavior.  That's why I would like to commit the patch that does
> > that, and backport it to /branches/1.9.x — in order to fix the regression
> > in svnadmin dump, and to be safe against the unknown amount of similar
> > problems.
>
> I adjusted some of the docstrings and committed the patch in r1709388.
>

Could you at least use the new API in svn_repos__compare_files
instead of re-implementing parts of the FS back-end but worse?
I know this is the code as it has been in 1.8 but that does not
make the it any better.

-- Stefan^2.

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Evgeny Kotkov <ev...@visualsvn.com> writes:

> Irrespectively of what we plan for 1.10, there's a problem in a 1.9 that we
> need to solve.  The associated issues are hard to diagnose and hard to
> stumble upon, and I don't think that fixing them one by one as we become
> aware of their existence is a way to go, as opposed to just restoring the
> stable behavior.  That's why I would like to commit the patch that does
> that, and backport it to /branches/1.9.x — in order to fix the regression
> in svnadmin dump, and to be safe against the unknown amount of similar
> problems.

I adjusted some of the docstrings and committed the patch in r1709388.


Regards,
Evgeny Kotkov

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Johan Corveleyn <jc...@gmail.com> writes:

> I don't know if a full revert, or a forward-working fix is the best
> course of action. The immediate concern for me is fixing the
> regression of losing no-op change information in dumps. Perhaps if
> stefan2's work can be fitted in the larger picture of a design such as
> Julian's (and then fix the bugs from that perspective), that might be
> the best way forward. Or perhaps this is too risky to do in a short
> time frame (since we already have the bug in released versions), and
> it's best to rollback and perform the "rework" with a proper design
> for 1.10 (so there is more time to stabilize design and
> implementation)? Dunno.
>
> Would the rollback reintroduce old bugs or unpredictable behavior that
> we know of?

As far as I know, there are no bugs associated with the old code.

On the contrary, the new code has a history of problems.  The corresponding
changesets, r1572363 and r1573111, update every single caller and state that
all of them would benefit from the new strict behavior.  As we now know, at
least two of them don't, and I also do not think that we can foresee the
consequences from changing the behavior of other four, because this low
level change propagates to higher level public functions used by many other
callers.

Irrespectively of what we plan for 1.10, there's a problem in a 1.9 that we
need to solve.  The associated issues are hard to diagnose and hard to stumble
upon, and I don't think that fixing them one by one as we become aware of
their existence is a way to go, as opposed to just restoring the stable
behavior.  That's why I would like to commit the patch that does that, and
backport it to /branches/1.9.x — in order to fix the regression in svnadmin
dump, and to be safe against the unknown amount of similar problems.

The new API would still be there and available for us, in case we require it
somewhere in the future.


Regards,
Evgeny Kotkov

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Oct 14, 2015 at 6:53 PM, Julian Foad <ju...@btopenworld.com> wrote:
> All,
>
> Please see the email I've just sent, entitled "Changes, Differences,
> and States". I wanted to put down some thoughts and that's the best I
> can do in the time. I hope it may shed a few glimmers of light in this
> area.

Thanks, Julian, that's a great effort. I've quickly read through it,
and the concept of "touched" vs. "modified" (also mentioned by stefan2
earlier in this thread) sounds interesting. As well as the fact that
no-op changes may be relevant for "single state transitions" (such as
reported by log -v, or when dumping), but not interesting /
well-defined when comparing between artibitrary points in history.
That resonates well with me.

> On 14 October 2015 at 16:37, Evgeny Kotkov <ev...@visualsvn.com> wrote:
>> Stefan Fuhrmann <st...@wandisco.com> writes:
>>
>>>> Exactly how does this change make the situation better in practice?  In
>>>> other words, what particular use cases is it supposed to fix?
>>>
>>> Any use-case that depends on this information, IOW all those that you
>>> are afraid may have been broken by the new API.
>>
>> [...]
>>
>>> * non-bubble-up directory representations, e.g. Brane's approach
>>> * "native" support for move / rename possibly not creating a new ID
>>>
>>> I'm not saying that any of these will be implemented in the near future
>>> but those ideas and proposals demonstrate that we cannot assume specifics
>>> of the DAG implementation.
>>
>> So, this change doesn't improve any real use cases in the current code?
>>
>> Overall, I don't see how it's worth it, as there were no user-visible problems
>> associated with the old code.  The new approach, on the contrary, has a history
>> of breaking at least two use cases, and there could be more:
>>
>>  - We were lucky to find and fix svn blame -g a year after the change was
>>    committed.
>>
>>  - Now, a year and a half later, we have an issue with repositories behaving
>>    differently in client-side operations like 'svn log' after dump/load. As
>>    the dump/load is a well-known procedure, I am thinking that it's a serious
>>    problem that could affect many users.
>>
>>  - This could only be a part of the problems we need to deal with, because
>>    the low level behavior change from r1572363 + r1573111 propagates up to
>>    higher levels like svn_repos or svn_ra and alters the behavior of many
>>    different callers like svn_ra_get_file_revs2() or the update reporter.
>>    Third-party API callers could not be ready for it as well, because public
>>    API functions like svn_ra_get_file_revs2() didn't receive the erratum or
>>    a bump.
>>
>> Johan Corveleyn <jc...@gmail.com> writes:
>>
>>> I think the dump.c part of r1572363 and r1573111 should be reverted /
>>> fixed so that we get the previous behaviour again, and this should be
>>> backported to 1.9. At this point, IMO 'svnadmin dump' is broken in
>>> 1.9.
>>
>> I would like to commit the patch that restores 1.8 behavior in all relevant
>> places.  It fixes the svnadmin dump issue and should prevent other similar
>> problems; see the attachment.
>>
>> Please note that the patch doesn't do something about our experimental FSX
>> back end.  As it turns out, the behavior of the svn_fs_contents_changed() and
>> svn_fs_props_changed() in FSX received a change even prior to the introduction
>> of the new API, and happened in r1568600 [1].  Although we could restore the
>> previous behavior for it, I am thinking that we should begin from fixing the
>> problem for the stable FSFS and BDB back ends.
>>
>> Thoughts?
>>
>> [1] https://svn.apache.org/r1568600
>>
>>
>> Regards,
>> Evgeny Kotkov

I don't know if a full revert, or a forward-working fix is the best
course of action. The immediate concern for me is fixing the
regression of losing no-op change information in dumps. Perhaps if
stefan2's work can be fitted in the larger picture of a design such as
Julian's (and then fix the bugs from that perspective), that might be
the best way forward. Or perhaps this is too risky to do in a short
time frame (since we already have the bug in released versions), and
it's best to rollback and perform the "rework" with a proper design
for 1.10 (so there is more time to stabilize design and
implementation)? Dunno.

Would the rollback reintroduce old bugs or unpredictable behavior that
we know of?

-- 
Johan

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Julian Foad <ju...@btopenworld.com>.
All,

Please see the email I've just sent, entitled "Changes, Differences,
and States". I wanted to put down some thoughts and that's the best I
can do in the time. I hope it may shed a few glimmers of light in this
area.

- Julian


On 14 October 2015 at 16:37, Evgeny Kotkov <ev...@visualsvn.com> wrote:
> Stefan Fuhrmann <st...@wandisco.com> writes:
>
>>> Exactly how does this change make the situation better in practice?  In
>>> other words, what particular use cases is it supposed to fix?
>>
>> Any use-case that depends on this information, IOW all those that you
>> are afraid may have been broken by the new API.
>
> [...]
>
>> * non-bubble-up directory representations, e.g. Brane's approach
>> * "native" support for move / rename possibly not creating a new ID
>>
>> I'm not saying that any of these will be implemented in the near future
>> but those ideas and proposals demonstrate that we cannot assume specifics
>> of the DAG implementation.
>
> So, this change doesn't improve any real use cases in the current code?
>
> Overall, I don't see how it's worth it, as there were no user-visible problems
> associated with the old code.  The new approach, on the contrary, has a history
> of breaking at least two use cases, and there could be more:
>
>  - We were lucky to find and fix svn blame -g a year after the change was
>    committed.
>
>  - Now, a year and a half later, we have an issue with repositories behaving
>    differently in client-side operations like 'svn log' after dump/load. As
>    the dump/load is a well-known procedure, I am thinking that it's a serious
>    problem that could affect many users.
>
>  - This could only be a part of the problems we need to deal with, because
>    the low level behavior change from r1572363 + r1573111 propagates up to
>    higher levels like svn_repos or svn_ra and alters the behavior of many
>    different callers like svn_ra_get_file_revs2() or the update reporter.
>    Third-party API callers could not be ready for it as well, because public
>    API functions like svn_ra_get_file_revs2() didn't receive the erratum or
>    a bump.
>
> Johan Corveleyn <jc...@gmail.com> writes:
>
>> I think the dump.c part of r1572363 and r1573111 should be reverted /
>> fixed so that we get the previous behaviour again, and this should be
>> backported to 1.9. At this point, IMO 'svnadmin dump' is broken in
>> 1.9.
>
> I would like to commit the patch that restores 1.8 behavior in all relevant
> places.  It fixes the svnadmin dump issue and should prevent other similar
> problems; see the attachment.
>
> Please note that the patch doesn't do something about our experimental FSX
> back end.  As it turns out, the behavior of the svn_fs_contents_changed() and
> svn_fs_props_changed() in FSX received a change even prior to the introduction
> of the new API, and happened in r1568600 [1].  Although we could restore the
> previous behavior for it, I am thinking that we should begin from fixing the
> problem for the stable FSFS and BDB back ends.
>
> Thoughts?
>
> [1] https://svn.apache.org/r1568600
>
>
> Regards,
> Evgeny Kotkov

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Stefan Fuhrmann <st...@wandisco.com> writes:

>> Exactly how does this change make the situation better in practice?  In
>> other words, what particular use cases is it supposed to fix?
>
> Any use-case that depends on this information, IOW all those that you
> are afraid may have been broken by the new API.

[...]

> * non-bubble-up directory representations, e.g. Brane's approach
> * "native" support for move / rename possibly not creating a new ID
>
> I'm not saying that any of these will be implemented in the near future
> but those ideas and proposals demonstrate that we cannot assume specifics
> of the DAG implementation.

So, this change doesn't improve any real use cases in the current code?

Overall, I don't see how it's worth it, as there were no user-visible problems
associated with the old code.  The new approach, on the contrary, has a history
of breaking at least two use cases, and there could be more:

 - We were lucky to find and fix svn blame -g a year after the change was
   committed.

 - Now, a year and a half later, we have an issue with repositories behaving
   differently in client-side operations like 'svn log' after dump/load. As
   the dump/load is a well-known procedure, I am thinking that it's a serious
   problem that could affect many users.

 - This could only be a part of the problems we need to deal with, because
   the low level behavior change from r1572363 + r1573111 propagates up to
   higher levels like svn_repos or svn_ra and alters the behavior of many
   different callers like svn_ra_get_file_revs2() or the update reporter.
   Third-party API callers could not be ready for it as well, because public
   API functions like svn_ra_get_file_revs2() didn't receive the erratum or
   a bump.

Johan Corveleyn <jc...@gmail.com> writes:

> I think the dump.c part of r1572363 and r1573111 should be reverted /
> fixed so that we get the previous behaviour again, and this should be
> backported to 1.9. At this point, IMO 'svnadmin dump' is broken in
> 1.9.

I would like to commit the patch that restores 1.8 behavior in all relevant
places.  It fixes the svnadmin dump issue and should prevent other similar
problems; see the attachment.

Please note that the patch doesn't do something about our experimental FSX
back end.  As it turns out, the behavior of the svn_fs_contents_changed() and
svn_fs_props_changed() in FSX received a change even prior to the introduction
of the new API, and happened in r1568600 [1].  Although we could restore the
previous behavior for it, I am thinking that we should begin from fixing the
problem for the stable FSFS and BDB back ends.

Thoughts?

[1] https://svn.apache.org/r1568600


Regards,
Evgeny Kotkov

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Thu, Oct 8, 2015 at 1:07 PM, Evgeny Kotkov <ev...@visualsvn.com>
wrote:

> Stefan Fuhrmann <st...@wandisco.com> writes:
>
> > This is not about performance, it is about API guarantees and functional
> > stability. So, yes that is an optimization in the sense of "making it
> better
> > than before". And no, we should not go back to worse unless there is no
> other
> > practical way.
>
> Exactly how does this change make the situation better in practice?  In
> other
> words, what particular use cases is it supposed to fix?
>

Any use-case that depends on this information, IOW all
those that you are afraid may have been broken by the
new API.

Decoupling API behaviour from implementation details is
about preventing future breakage in the callers of that API.
Two proposed backend changes come to my mind that
may affect the conditions under which new noderevs will
be created:

* non-bubble-up directory representations, e.g. Brane's approach
* "native" support for move / rename possibly not creating a new ID

I'm not saying that any of these will be implemented in
the near future but those ideas and proposals demonstrate
that we cannot assume specifics of the DAG implementation.


> I see it as an abstract change that alters the behavior of many different
> calling sites.  Given that the practical benefits are unknown, I think that
> we should restore the known stable behavior, and avoid problems coming
> from various places.
>

As shown above, the behaviour is not know stable. In fact,
SVN 1.5 had to jump through loops to mimic the original
behaviour by adding "uniquifiers" to shared representations.
If it was a mere theoretical issue, I wouldn't have bothered
addressing it.

-- Stefan^2.

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Stefan Fuhrmann <st...@wandisco.com> writes:

> This is not about performance, it is about API guarantees and functional
> stability. So, yes that is an optimization in the sense of "making it better
> than before". And no, we should not go back to worse unless there is no other
> practical way.

Exactly how does this change make the situation better in practice?  In other
words, what particular use cases is it supposed to fix?

I see it as an abstract change that alters the behavior of many different
calling sites.  Given that the practical benefits are unknown, I think that
we should restore the known stable behavior, and avoid problems coming
from various places.


Regards,
Evgeny Kotkov

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Sun, Oct 4, 2015 at 4:28 PM, Evgeny Kotkov <ev...@visualsvn.com>
wrote:

> Stefan Fuhrmann <st...@wandisco.com> writes:
>
> > Right now, we are losing information. In rare cases and probably nothing
> too
> > important but still. This aspect makes me consider the current behaviour
> a
> > bug. Whether creating that situation in the first place was o.k. nor not
> is
> > a separate issue.
>
> To my mind, the current behavior is a regression, because:
>
> - svnadmin dump produces different dumps for the same repository, depending
>   on the version (1.8 or 1.9)
>

That is not a problem because we don't guarantee identical
dump files between releases. And only since 1.8 (?) do we
even create reproducible dump files for a given release.


> - svnadmin dump and svnrdump dump in 1.9 produce different dumps for the
>   same repository
>

Not sure whether that is new to 1.9 but when it is, it needs
to be fixed.


> - After a dump-load sequence with svnadmin from 1.9, the resulting
> repository
>   behaves differently from the original in operations like 'svn log'
>

Different "behaviour" is expected: I will most likely see
different transaction IDs, different rep sharing, different
pack status etc.

Ideally, none if these would change the output of client-
side operations, but I would settle for "no information
lost".


> I committed a failing test in r1706428, and I also think that we have quite
> a problem to solve — see below.
>

Thanks for the test case. My patch should fix that. I
agree that the broader scope you are asking for new
feature that will require some work.


> [...]
>
> > I suggest that we apply the patch below. It is more efficient that the
> > original behaviour but ensures that "no-op" changes will be retained.
>
> [...]
>
> > V2 of the patch also covers no-op prop changes to directories.
>
> Although the V2 patch seems to mitigate the problem in one particular case,
> it does not solve it generally.  For instance, here is another scenario
> where
> 1.8 and 1.9 binaries produce different dumps, even with the patch applied:
>
>   svnmucc cp 1 bar baz put empty-file baz
>
>   Node-copyfrom-path: bar
>   Text-copy-source-md5: d41d8cd98f00b204e9800998ecf8427e
>   Text-copy-source-sha1: da39a3ee5e6b4b0d3255bfef95601890afd80709
>  -Text-content-length: 0
>  -Text-content-md5: d41d8cd98f00b204e9800998ecf8427e
>  -Text-content-sha1: da39a3ee5e6b4b0d3255bfef95601890afd80709
>  -Content-length: 0
>

What information is being lost here? That change should show
up as added-with-history. It simply no longer claims to also
change the file contents.

You are basically asking here for a new "T" ("touched") state
alongside the traditional "M". Subversion does not have that
concept formally, yet, and any attempt to mimic it relies heavily
on implementation details of your respective repository backend.

Sadly, some of our logic in the higher layers either tried to exploit
knowledge of the repo's internal workings or simply did not care
to specify whether "touched" is a subset of "modified" or not.
As a result, underspecified APIs leak unspecified behaviour up
to the client.

Introducing "T" consistently will require some work. Most of it
should be moderately easy. Two areas are problematic, though:

* Does a date change in the working copy constitute a "touch"?
  That would break many people's work flows. So, 'svn touch'
  would probably be an explicit sub-command.

* Under what circumstances would a merge produce "T"?
  That leads directly to the question of what the semantics
  of touch would be plus a problem similar to "T" in working
  copies.


> I examined the history of fixing problems caused by r1572363 + r1573111,
> and
> this is the second time when we find ourselves whack-a-moling the calling
> sites
> of svn_fs_props_different() and svn_fs_contents_different().  Previously,
> we
> had issues with misbehaving 'svn blame -g' [1].  This ended up with placing
> a hack in subversion/libsvn_repos/rev_hunt.c, and with rather questionable
> compatibility promises for svn_ra_get_file_revs2():
>
> [[[
>   else if (sb->include_merged_revisions
>            && strcmp(sb->last_path, path_rev->path))
>     {
>       /* ### This is a HACK!!!
>        * Blame -g, in older clients anyways, relies on getting a
> notification
>        * whenever the path changes - even if there was no content change.
>        *
>        * TODO: A future release should take an extra parameter and
> depending
>        * on that either always send a text delta or only send it if there
>        * is a difference. */
>       contents_changed = TRUE;
>     }
>
>  ...
>
>  * @note Prior to Subversion 1.9, this function may request delta handlers
>  * from @a handler even for empty text deltas.  Starting with 1.9, the
>  * delta handler / baton return arguments passed to @a handler will be
>  * #NULL unless there is an actual difference in the file contents between
>  * the current and the previous call.
>  *
>  * @since New in 1.5.
>  */
> svn_error_t *
> svn_ra_get_file_revs2(svn_ra_session_t *session,
>                       const char *path,
>                       ...
>
> ]]]
>
> The r1572363 + r1573111 patchset implicitly changes behavior of around six
> different callers, and two of them have already backfired with problems.
> Moreover, apart from switching everything to the new API, these revisions
> change the behavior of *existing* API like svn_fs_contents_changed().  We
> have an erratum describing the change [2], but I doubt that API users can
> properly adjust their code, if they notice this change — because even we
> failed
> to do so.
>

'svn blame -g' is a prime example of exploiting unspecified
FS API behaviour. 1.9 does fixes that aspect, while leaving
'svn blame -g' in it's "approximate - at best" state. IOW, we
fixed a layering violation and had to add workarounds for
old clients that relied on it.


> Finally, I cannot understand the practical benefits from doing this.  If
> these
> changes are aimed at making the checks more strict, but break existing
> code,
> I don't think we should be doing this.  If this is an optimization, I'd
> favor
> correctness over it.
>

This is not about performance, it is about API guarantees
and functional stability. So, yes that is an optimization in
the sense of "making it better than before". And no, we
should not go back to worse unless there is no other
practical way.


> I propose that, instead of patching the callers in response to problems, we
> restore every relevant part to its 1.8.x state, where they're known to be
> stable
> and work.


It is the *callers* who are the problem and the explicit API
of 1.9 simply exposes that fact. Therefore, fixing the callers
is the right thing to do. To get fully behaviour, we need to
decide upon "M" vs. "T", though. IMO, this is beyond the
scope of this thread.


> This would allow us to remove the hacks like the one associated with
> fixing svn blame -g, would restore the proper behavior of svnadmin dump,
> and
> would also allow us not to deal with similar problems in other places, if
> they
> exist.
>
> How does this sound?
>

Well-intended but, in this case, terrible.

-- Stefan^2.

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Stefan Fuhrmann <st...@wandisco.com> writes:

> Right now, we are losing information. In rare cases and probably nothing too
> important but still. This aspect makes me consider the current behaviour a
> bug. Whether creating that situation in the first place was o.k. nor not is
> a separate issue.

To my mind, the current behavior is a regression, because:

- svnadmin dump produces different dumps for the same repository, depending
  on the version (1.8 or 1.9)
- svnadmin dump and svnrdump dump in 1.9 produce different dumps for the
  same repository
- After a dump-load sequence with svnadmin from 1.9, the resulting repository
  behaves differently from the original in operations like 'svn log'

I committed a failing test in r1706428, and I also think that we have quite
a problem to solve — see below.

[...]

> I suggest that we apply the patch below. It is more efficient that the
> original behaviour but ensures that "no-op" changes will be retained.

[...]

> V2 of the patch also covers no-op prop changes to directories.

Although the V2 patch seems to mitigate the problem in one particular case,
it does not solve it generally.  For instance, here is another scenario where
1.8 and 1.9 binaries produce different dumps, even with the patch applied:

  svnmucc cp 1 bar baz put empty-file baz

  Node-copyfrom-path: bar
  Text-copy-source-md5: d41d8cd98f00b204e9800998ecf8427e
  Text-copy-source-sha1: da39a3ee5e6b4b0d3255bfef95601890afd80709
 -Text-content-length: 0
 -Text-content-md5: d41d8cd98f00b204e9800998ecf8427e
 -Text-content-sha1: da39a3ee5e6b4b0d3255bfef95601890afd80709
 -Content-length: 0

I examined the history of fixing problems caused by r1572363 + r1573111, and
this is the second time when we find ourselves whack-a-moling the calling sites
of svn_fs_props_different() and svn_fs_contents_different().  Previously, we
had issues with misbehaving 'svn blame -g' [1].  This ended up with placing
a hack in subversion/libsvn_repos/rev_hunt.c, and with rather questionable
compatibility promises for svn_ra_get_file_revs2():

[[[
  else if (sb->include_merged_revisions
           && strcmp(sb->last_path, path_rev->path))
    {
      /* ### This is a HACK!!!
       * Blame -g, in older clients anyways, relies on getting a notification
       * whenever the path changes - even if there was no content change.
       *
       * TODO: A future release should take an extra parameter and depending
       * on that either always send a text delta or only send it if there
       * is a difference. */
      contents_changed = TRUE;
    }

 ...

 * @note Prior to Subversion 1.9, this function may request delta handlers
 * from @a handler even for empty text deltas.  Starting with 1.9, the
 * delta handler / baton return arguments passed to @a handler will be
 * #NULL unless there is an actual difference in the file contents between
 * the current and the previous call.
 *
 * @since New in 1.5.
 */
svn_error_t *
svn_ra_get_file_revs2(svn_ra_session_t *session,
                      const char *path,
                      ...

]]]

The r1572363 + r1573111 patchset implicitly changes behavior of around six
different callers, and two of them have already backfired with problems.
Moreover, apart from switching everything to the new API, these revisions
change the behavior of *existing* API like svn_fs_contents_changed().  We
have an erratum describing the change [2], but I doubt that API users can
properly adjust their code, if they notice this change — because even we failed
to do so.

Finally, I cannot understand the practical benefits from doing this.  If these
changes are aimed at making the checks more strict, but break existing code,
I don't think we should be doing this.  If this is an optimization, I'd favor
correctness over it.

I propose that, instead of patching the callers in response to problems, we
restore every relevant part to its 1.8.x state, where they're known to be stable
and work.  This would allow us to remove the hacks like the one associated with
fixing svn blame -g, would restore the proper behavior of svnadmin dump, and
would also allow us not to deal with similar problems in other places, if they
exist.

How does this sound?

[1] http://svn.haxx.se/dev/archive-2015-06/0069.shtml
[2] http://svn.apache.org/repos/asf/subversion/trunk/notes/api-errata/1.9/fs001.txt


Regards,
Evgeny Kotkov

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Johan Corveleyn <jc...@gmail.com>.
On Tue, Sep 29, 2015 at 1:47 PM, Stefan Fuhrmann
<st...@wandisco.com> wrote:
> On Mon, Sep 28, 2015 at 10:55 AM, Stefan Fuhrmann
> <st...@wandisco.com> wrote:
>>
>> On Wed, Sep 23, 2015 at 12:55 PM, Julian Foad <ju...@btopenworld.com>
>> wrote:
>>>
>>> > Johan Corveleyn wrote:
>>> >> [...] stefan2 told me in person that that part of the
>>> >> change in r1572363 was unintentional :-). IIUC, he didn't realize that
>>> >> it would have this effect on the output of dump.
>>> [...]
>>> >>>> I think the dump.c part of r1572363 and r1573111 should be reverted
>>> >>>> /
>>> >>>> fixed so that we get the previous behaviour again, and this should
>>> >>>> be
>>> >>>> backported to 1.9. At this point, IMO 'svnadmin dump' is broken in
>>> >>>> 1.9.
>>>
>>> 1. I pretty much agree now that we should revert the change.
>>
>>
>> I suggest that we apply the patch below. It is more efficient
>> that the original behaviour but ensures that "no-op" changes
>> will be retained.
>
>
> Now that I've read through the whole thread, here my view
> on this problem plus an explanation why I think my patch is
> the right approach.
>
> First some facts and things that I believe:
>
> * Internally, FSFS can record no-op prop and no-op content
>   changes (the delta between new representation and the
>   respective proplist / text of its predecessor is empty). There
>   is even the possibility of a no-op node change, i.e. a new
>   node without touching text nor props (not happening atm).
> * SVN versions file trees and the decision how to represent
>   them (e.g. as delta) is an implementation detail.
> * Relying on implementation details of a lower layer is bad.
>
> * Via FS and Repos layer API, new instances of no-op changes
>   to either text and properties can be created.
> * Dump files (produced by 3rd party tools or non-incremental
>   svnadmin dumps) are another source of those no-op changes.
> * SVN editors don't produce no-op changes.
>
> * A commit links user-controlled revision meta data (e.g. comment)
>   to a user-controlled list of changes. That link shall not be
>   lost when tweak implementation details.
> * Dump / load is a compatibility interface between SVN versions
>   and allows for forward and backward migration.
> * A plain node modification (no text nor prop change) will be
>   ignored by "svnadmin load". Even if we changed that on /trunk,
>   it would not fix old releases.
>
> From that, we can derive a few requirements:
>
> * If the a revision contains a "M" change, svnadmin must produce
>   a dump file that causes previous releases to produce the "M"
>   change - within reason.
> * Load shall produce no-op changes for the same data as in previous
>   releases - within reason. If the FS API would no longer support
>   creating no-op changes at some point in the future, the load process
>   should issue a warning.
> * The dump file contents shall be independent of FS implementation
>   details, i.e. use "strict" FS API functions.
>
> No-op changes are rare, hence including them into a dump file
> does not significantly increase its size or processing time. The
> text / proplist asymmetry in the patch is due to the fact that all
> files due have a text but maybe no proplist representation. So,
> dumping the text is always safe.
>
> V2 of the patch also covers no-op prop changes to directories.

I haven't reviewed the patch, but I agree with all of your points above.

Thanks for taking this on (both you and Julian, and others
participating in the discussion).

-- 
Johan

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Mon, Sep 28, 2015 at 10:55 AM, Stefan Fuhrmann <
stefan.fuhrmann@wandisco.com> wrote:

> On Wed, Sep 23, 2015 at 12:55 PM, Julian Foad <ju...@btopenworld.com>
> wrote:
>
>> > Johan Corveleyn wrote:
>> >> [...] stefan2 told me in person that that part of the
>> >> change in r1572363 was unintentional :-). IIUC, he didn't realize that
>> >> it would have this effect on the output of dump.
>> [...]
>> >>>> I think the dump.c part of r1572363 and r1573111 should be reverted /
>> >>>> fixed so that we get the previous behaviour again, and this should be
>> >>>> backported to 1.9. At this point, IMO 'svnadmin dump' is broken in
>> >>>> 1.9.
>>
>> 1. I pretty much agree now that we should revert the change.
>>
>
> I suggest that we apply the patch below. It is more efficient
> that the original behaviour but ensures that "no-op" changes
> will be retained.
>

Now that I've read through the whole thread, here my view
on this problem plus an explanation why I think my patch is
the right approach.

First some facts and things that I believe:

* Internally, FSFS can record no-op prop and no-op content
  changes (the delta between new representation and the
  respective proplist / text of its predecessor is empty). There
  is even the possibility of a no-op node change, i.e. a new
  node without touching text nor props (not happening atm).
* SVN versions file trees and the decision how to represent
  them (e.g. as delta) is an implementation detail.
* Relying on implementation details of a lower layer is bad.

* Via FS and Repos layer API, new instances of no-op changes
  to either text and properties can be created.
* Dump files (produced by 3rd party tools or non-incremental
  svnadmin dumps) are another source of those no-op changes.
* SVN editors don't produce no-op changes.

* A commit links user-controlled revision meta data (e.g. comment)
  to a user-controlled list of changes. That link shall not be
  lost when tweak implementation details.
* Dump / load is a compatibility interface between SVN versions
  and allows for forward and backward migration.
* A plain node modification (no text nor prop change) will be
  ignored by "svnadmin load". Even if we changed that on /trunk,
  it would not fix old releases.

>From that, we can derive a few requirements:

* If the a revision contains a "M" change, svnadmin must produce
  a dump file that causes previous releases to produce the "M"
  change - within reason.
* Load shall produce no-op changes for the same data as in previous
  releases - within reason. If the FS API would no longer support
  creating no-op changes at some point in the future, the load process
  should issue a warning.
* The dump file contents shall be independent of FS implementation
  details, i.e. use "strict" FS API functions.

No-op changes are rare, hence including them into a dump file
does not significantly increase its size or processing time. The
text / proplist asymmetry in the patch is due to the fact that all
files due have a text but maybe no proplist representation. So,
dumping the text is always safe.

V2 of the patch also covers no-op prop changes to directories.

-- Stefan^2.

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Wed, Sep 23, 2015 at 12:55 PM, Julian Foad <ju...@btopenworld.com>
wrote:

> > Johan Corveleyn wrote:
> >> [...] stefan2 told me in person that that part of the
> >> change in r1572363 was unintentional :-). IIUC, he didn't realize that
> >> it would have this effect on the output of dump.
> [...]
> >>>> I think the dump.c part of r1572363 and r1573111 should be reverted /
> >>>> fixed so that we get the previous behaviour again, and this should be
> >>>> backported to 1.9. At this point, IMO 'svnadmin dump' is broken in
> >>>> 1.9.
>
> 1. I pretty much agree now that we should revert the change.
>

I suggest that we apply the patch below. It is more efficient
that the original behaviour but ensures that "no-op" changes
will be retained.

-- Stefan^2.

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Julian Foad <ju...@btopenworld.com>.
I (Julian Foad) wrote:
> Issue #4598 "No-op changes no longer dumped by 'svnadmin dump' in
> 1.9", http://subversion.tigris.org/issues/show_bug.cgi?id=4598
>
> Added to the release notes in http://svn.apache.org/r1704822 :
> file:///home/julianfoad/src/svn-site/publish/docs/release-notes/1.9.html#no-op-changes

For one line of attack, I started constructing some test cases to
explore the dump behaviour. This is quite a hard direction to start
from, because I don't know what internal details may affect the
output.

For another line of attack, I looked into the source code. The logic
in 1.8.x for 'svnadmin dump' is:

Write a node record when the appropriate 'revision-replay' function
calls the dump editor's open_file() or add_file() method for a file,
or its change_dir_prop() method for a dir. The 'revision-replay'
function is svn_repos_replay2() in most cases, or
svn_repos_dir_delta2() when dumping an initial non-incremental rev.

svn_repos_replay2() calls those methods in a manner close to the
following pseudo-Python:

def svn_repos_replay2():
  for (path, change) in svn_fs_paths_changed2():  # via svn_delta_path_driver2()
    if change.node_kind is 'file':
      add_file() or open_file()
    if change.prop_mod:
      if path is being reported as copied:
        for pc in svn_prop_diffs(new_props, old_props):
          change_*_prop(pc.name, pc.value)
      else:
        change_*_prop(dummy parameters)  # call just once

Back to the dumper. Within a node record, output a properties section
iff svn_fs_fs__noderev_same_rep_key() returns true for the properties
representations, and output a text section iff
svn_fs_fs__noderev_same_rep_key() returns true for the text
representations.

svn_boolean_t
svn_fs_fs__noderev_same_rep_key(representation_t *a,
                                representation_t *b)
{
  if (a == b)
    return TRUE;

  if (a == NULL || b == NULL)
    return FALSE;

  if (a->offset != b->offset)
    return FALSE;

  if (a->revision != b->revision)
    return FALSE;

  if (a->uniquifier == b->uniquifier)
    return TRUE;

  if (a->uniquifier == NULL || b->uniquifier == NULL)
    return FALSE;

  return strcmp(a->uniquifier, b->uniquifier) == 0;
}

(For BDB, svn_fs_base__things_different() calls
svn_fs_base__same_keys() to compare the props rep key, text rep key
text rep uniquifier.)

The function svn_fs_fs__noderev_same_rep_key() does not exist in
1.9.{0...2}, but we could recreate a suitable version of it. (The
.uniquifier member has changed form since 1.8.)

The FS layer APIs are:
  svn_fs_contents_changed()
  svn_fs_props_changed()

In 1.9 the implementations of these no longer compare the rep-keys but
use a different strategy, implemented in
svn_fs_fs__dag_things_different() which returns 'false positive'
outputs in fewer and different cases.

If our aim is to restore exact compatibility with 1.8 we'll have to be
careful to put the exact same logic back.

I'll continue examining the code paths.


One more issue I notice in svn_fs_fs__dag_things_different() is even
in 'strict' mode it reports that two strings are equal if their MD5
checksums are equal. Is this intentional, and is it consistent with
the rest of the Subversion system? The test 'compare_contents()' in
fs-test.c expects this behaviour, but I wonder if that is an
oversight?

- Julian

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Julian Foad <ju...@btopenworld.com>.
> Brane wrote:
>> I also suggest adding a note to
>> http://subversion.apache.org/docs/release-notes/1.9.html#issues .
>
> And we need to file an issue.
>
> I'll do both of those things (issue and rel-notes) now.

Issue #4598 "No-op changes no longer dumped by 'svnadmin dump' in
1.9", http://subversion.tigris.org/issues/show_bug.cgi?id=4598

Added to the release notes in http://svn.apache.org/r1704822 :
file:///home/julianfoad/src/svn-site/publish/docs/release-notes/1.9.html#no-op-changes

- Julian

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Julian Foad <ju...@btopenworld.com>.
> Johan Corveleyn wrote:
>> [...] stefan2 told me in person that that part of the
>> change in r1572363 was unintentional :-). IIUC, he didn't realize that
>> it would have this effect on the output of dump.
[...]
>>>> I think the dump.c part of r1572363 and r1573111 should be reverted /
>>>> fixed so that we get the previous behaviour again, and this should be
>>>> backported to 1.9. At this point, IMO 'svnadmin dump' is broken in
>>>> 1.9.

1. I pretty much agree now that we should revert the change.

We understand now that the change in 1.9.0 was unintentional, and
after we analyse the situation, we are very unlikely to conclude that
that change was a complete bug fix to the whole issue of no-op
changes. It is surprising and is regarded as losing information, and
is not justified (yet) by some higher purpose.

It seems fairly clear what the change was, and so how to revert it.

2. We might also want to make another change to the behaviour of
'svnadmin load', so that the result of loading a dump file that people
have *already* produced using 1.9.0-1.9.2 will be the same as if they
had dumped and loaded using 1.8.x. I don't yet understand the details
enough to know whether this option is possible.

3. I firmly believe that our handling of 'no-op changes' is mistaken
and a bad idea. I'll explain that, but not in this thread -- that's a
follow-on task.

Brane wrote:
> I also suggest adding a note to
> http://subversion.apache.org/docs/release-notes/1.9.html#issues .

And we need to file an issue.

I'll do both of those things (issue and rel-notes) now.

- Julian

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Branko Čibej <br...@apache.org>.
On 23.09.2015 11:58, Johan Corveleyn wrote:
> On Wed, Sep 23, 2015 at 11:44 AM, Branko Čibej <br...@apache.org> wrote:
>> On 23.09.2015 11:03, Johan Corveleyn wrote:
>>> TBH, I'm not really interested in having a really fundamental
>>> discussion about this (but feel free to drive that of course). What I
>>> am interested in, is that we have a regression, and that 'dump' is
>>> losing information (namely, not dumping correctly null-text-changes).
>> Well, without a "really fundamental" discussion you can't really decide
>> if it's a regression or a bug fix.
> Yes I can, because stefan2 told me in person that that part of the
> change in r1572363 was unintentional :-). IIUC, he didn't realize that
> it would have this effect on the output of dump.

Hearsay! :)

>>> I think the dump.c part of r1572363 and r1573111 should be reverted /
>>> fixed so that we get the previous behaviour again, and this should be
>>> backported to 1.9. At this point, IMO 'svnadmin dump' is broken in
>>> 1.9.
>> What we really should determine is whether recording no-op node changes
>> in the repository was intentional or just an unfortunate site-effect of
>> something or other. I'm leaning towards the latter, which would make the
>> 1.9 change in 'svnadmin dump' a bugfix.
>>
>> The point is that Subversion is supposed to track changes to a tree of
>> nodes (directories and files). Unlike empty revisions, no-op changes
>> have no useful value for either working copies or repository history.
> For repository history they do have useful value: the revision's log
> message might be very valuable, and now it's "attached" to that path's
> history (i.e. it's returned by 'svn log path'). By dropping the
> null-text-change, we drop the log message from that path's history.
>
> The relation between the log message and that particular path might be
> valuable / useful to someone.

See, we're already having a really fundamental discussion about what it
all means. Your point about log messages is well made. I hadn't thought
about that aspect.

Since you're a committer ... go ahead and make the fix and propose the
backport. I'm sure people will complain if they don't like it.

I also suggest adding a note to
http://subversion.apache.org/docs/release-notes/1.9.html#issues .

-- Brane


Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Sep 23, 2015 at 11:44 AM, Branko Čibej <br...@apache.org> wrote:
> On 23.09.2015 11:03, Johan Corveleyn wrote:
>> TBH, I'm not really interested in having a really fundamental
>> discussion about this (but feel free to drive that of course). What I
>> am interested in, is that we have a regression, and that 'dump' is
>> losing information (namely, not dumping correctly null-text-changes).
>
> Well, without a "really fundamental" discussion you can't really decide
> if it's a regression or a bug fix.

Yes I can, because stefan2 told me in person that that part of the
change in r1572363 was unintentional :-). IIUC, he didn't realize that
it would have this effect on the output of dump.

>> I think the dump.c part of r1572363 and r1573111 should be reverted /
>> fixed so that we get the previous behaviour again, and this should be
>> backported to 1.9. At this point, IMO 'svnadmin dump' is broken in
>> 1.9.
>
> What we really should determine is whether recording no-op node changes
> in the repository was intentional or just an unfortunate site-effect of
> something or other. I'm leaning towards the latter, which would make the
> 1.9 change in 'svnadmin dump' a bugfix.
>
> The point is that Subversion is supposed to track changes to a tree of
> nodes (directories and files). Unlike empty revisions, no-op changes
> have no useful value for either working copies or repository history.

For repository history they do have useful value: the revision's log
message might be very valuable, and now it's "attached" to that path's
history (i.e. it's returned by 'svn log path'). By dropping the
null-text-change, we drop the log message from that path's history.

The relation between the log message and that particular path might be
valuable / useful to someone.

>> Just one small addition on the fundamental part: I still think (like
>> we discussed during breakfast last Friday in Berlin :-)) there is no
>> problem in having / preserving / exposing null-text-changes to paths
>> in Subversion. I think we have to, for backwards compatibility, and I
>> see no problem in doing so. And to answer the last question you posed
>> me last Friday: the reverse diff of a null-text-changed-path is
>> identical to the forward diff: a diff with no changed lines :-).
>
> There is no problem with recording such changes, but the real question
> is whether they have any useful semantics. Backwards compatibility does
> not mean hanging on to every silly decision we've ever made as if our
> (or our users') lives depended on it.
>
>
> We've already made (IMO) more questionable changes; for example,
> forbidding control characters in file names was in no small way driven
> by the fact that our dump format can't represent them. So instead of
> fixing the dump format we decided to break old repositories. That's a
> far more serious change than dropping no-op node changes from the dump file.

Okay, but in that case we did that for a clear reason: it broke the
dump format. What would be the compelling reason here to break
backward compatibility and possibly break some people's use of the
repository history?

-- 
Johan

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Branko Čibej <br...@apache.org>.
On 23.09.2015 11:03, Johan Corveleyn wrote:
> TBH, I'm not really interested in having a really fundamental
> discussion about this (but feel free to drive that of course). What I
> am interested in, is that we have a regression, and that 'dump' is
> losing information (namely, not dumping correctly null-text-changes).

Well, without a "really fundamental" discussion you can't really decide
if it's a regression or a bug fix.

> I think the dump.c part of r1572363 and r1573111 should be reverted /
> fixed so that we get the previous behaviour again, and this should be
> backported to 1.9. At this point, IMO 'svnadmin dump' is broken in
> 1.9.

What we really should determine is whether recording no-op node changes
in the repository was intentional or just an unfortunate site-effect of
something or other. I'm leaning towards the latter, which would make the
1.9 change in 'svnadmin dump' a bugfix.

The point is that Subversion is supposed to track changes to a tree of
nodes (directories and files). Unlike empty revisions, no-op changes
have no useful value for either working copies or repository history.

> Just one small addition on the fundamental part: I still think (like
> we discussed during breakfast last Friday in Berlin :-)) there is no
> problem in having / preserving / exposing null-text-changes to paths
> in Subversion. I think we have to, for backwards compatibility, and I
> see no problem in doing so. And to answer the last question you posed
> me last Friday: the reverse diff of a null-text-changed-path is
> identical to the forward diff: a diff with no changed lines :-).

There is no problem with recording such changes, but the real question
is whether they have any useful semantics. Backwards compatibility does
not mean hanging on to every silly decision we've ever made as if our
(or our users') lives depended on it.


We've already made (IMO) more questionable changes; for example,
forbidding control characters in file names was in no small way driven
by the fact that our dump format can't represent them. So instead of
fixing the dump format we decided to break old repositories. That's a
far more serious change than dropping no-op node changes from the dump file.

-- Brane

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Wed, Sep 23, 2015 at 11:03:43 +0200:
> Just one small addition on the fundamental part: I still think (like
> we discussed during breakfast last Friday in Berlin :-)) there is no
> problem in having / preserving / exposing null-text-changes to paths
> in Subversion.

I tend to agree: in general, allowing the vacuous case tends to be
useful.  Johan's example of hanging a non-empty log message on
a revision that lacks a non-empty text-delta is a perfect example of
that.

Another example is creating padding revisions.  When we migrated to ASF,
we had to incur a revision numbers offset of 840074.  It would have been
reasonable to create 26 padding revisions to make the offset a round
number.  Those revisions wouldn't have needed any content.  If
Subversion forbade revisions with no changed-paths, we would have had to
create dummy changes — but, at that point, we would be working around
a limitation of the tool.

Cheers,

Daniel


Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Johan Corveleyn <jc...@gmail.com>.
On Tue, Sep 22, 2015 at 12:54 PM, Julian Foad
<ju...@btopenworld.com> wrote:
> Daniel Shahaf wrote:
>> Johan Corveleyn wrote:
>>> [...], revision N-1 contains a real
>>> change, but only a short log message; and revision N has a no-op
>>> change to that same path, and a very informative log message [...]
>>
>> The FreeBSD project used to intentionally make no-op commits (they term it
>> "forced commits") as part of their new committer workflow.  I don't know
>> whether they still do that.
>
> We need to be careful with terminology. We're not talking about a
> no-op commit, we're talking about a path that is marked as 'changed'
> within a commit, but whose content did not change. (The same commit
> might or might not also contain other paths that have real changes.)

OK, let's call that a null-text-change to that path. So, the path is
part of the changed paths, but has a null-text-change. Or something
like that?

> In fact, I think one of the first things we need to do is a precise
> analysis of the issue:
>
>   * What exactly are the existing possible forms of 'no-op change'
> that any part of Subversion can represent?
>     - Text-change?
>     - Props-change?
>     - Whole-node-change?
>     - Commit? (not so interesting in the current issue)
>     - Only certain combinations of those?
>
>   * At which APIs can each those changes be (a) made and (b) seen?
>     - FS API?
>     - Repos API?
>     - RA and client-side APIs?
>     - svnadmin dump/load?
>     - svnrdump dump/load?
>     - svnsync?

TBH, I'm not really interested in having a really fundamental
discussion about this (but feel free to drive that of course). What I
am interested in, is that we have a regression, and that 'dump' is
losing information (namely, not dumping correctly null-text-changes).

I think the dump.c part of r1572363 and r1573111 should be reverted /
fixed so that we get the previous behaviour again, and this should be
backported to 1.9. At this point, IMO 'svnadmin dump' is broken in
1.9.

Just one small addition on the fundamental part: I still think (like
we discussed during breakfast last Friday in Berlin :-)) there is no
problem in having / preserving / exposing null-text-changes to paths
in Subversion. I think we have to, for backwards compatibility, and I
see no problem in doing so. And to answer the last question you posed
me last Friday: the reverse diff of a null-text-changed-path is
identical to the forward diff: a diff with no changed lines :-).

-- 
Johan

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Julian Foad <ju...@btopenworld.com>.
Daniel Shahaf wrote:
> Johan Corveleyn wrote:
>> [...], revision N-1 contains a real
>> change, but only a short log message; and revision N has a no-op
>> change to that same path, and a very informative log message [...]
>
> The FreeBSD project used to intentionally make no-op commits (they term it
> "forced commits") as part of their new committer workflow.  I don't know
> whether they still do that.

We need to be careful with terminology. We're not talking about a
no-op commit, we're talking about a path that is marked as 'changed'
within a commit, but whose content did not change. (The same commit
might or might not also contain other paths that have real changes.)

In fact, I think one of the first things we need to do is a precise
analysis of the issue:

  * What exactly are the existing possible forms of 'no-op change'
that any part of Subversion can represent?
    - Text-change?
    - Props-change?
    - Whole-node-change?
    - Commit? (not so interesting in the current issue)
    - Only certain combinations of those?

  * At which APIs can each those changes be (a) made and (b) seen?
    - FS API?
    - Repos API?
    - RA and client-side APIs?
    - svnadmin dump/load?
    - svnrdump dump/load?
    - svnsync?

- Julian

Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Mon, Sep 21, 2015 at 16:38:20 +0200:
> In [jcorvel's company's] repository, there are a couple of such revisions, dating from
> our conversion from cvs to svn (actually, revision N-1 contains a real
> change, but only a short log message; and revision N has a no-op
> change to that same path, and a very informative log message
> describing the real change in detail). It's not like we consciously
> used this as a specific use-case, but now the information is like that
> in our repository, and I'd hate to lose it.

The FreeBSD project used to intentionally make no-op commits (they term it
"forced commits") as part of their new committer workflow.  I don't know
whether they still do that.