Posted to dev@subversion.apache.org by Julian Foad <ju...@apache.org> on 2019/10/11 14:56:31 UTC

Subversion semantics: no no-op changes

Hello Eric.

TL;DR:  I explain why I am convinced no-op changes don't belong in the 
Subversion versioning semantics.  With your work on Subversion 
repository and dump stream semantics, is this something you can offer a 
view on?  I have previously failed to convince the developer community [1].


In examining the Subversion versioned data semantics and how the 
protocols and APIs represent them, I have come across a number of kinds 
of what could be called a "no-op change" or perhaps better described as 
"I touched this but did not change its value".

Example:
   - I changed the text of file F from T1 to T1;
   - now "svn log -v" tells me the text of F was "modified";
   - some variants of "svn diff" show no output;
   - some variants of "svn diff" show a diff header with no body.

That is the best known user-visible example.  Other kinds are possible 
too, and a number of examples exist on the server side, e.g. [#4623].

A Subversion client generally does not send no-op changes to a 
repository, but in certain cases it does.  A Subversion repository 
generally does not record and play back any no-op changes that may be 
sent to it, but in certain cases it does.

I am convinced "no-op changes" should be considered meaningless and 
removed from the data model presented to the user.  In protocols and 
APIs, a no-op change should be considered a non-canonical form and a 
transient implementation detail of that particular interface, and 
implementations should not attempt to preserve it.
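
A minimal sketch of what "non-canonical form" could mean here, assuming a
toy change list in which each entry records a path with its old and new
text.  The names Change and canonicalize are hypothetical and are not
part of any Subversion API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    """Hypothetical record: the text of `path` went from old to new."""
    path: str
    old_text: Optional[str]
    new_text: Optional[str]


def canonicalize(changes: List[Change]) -> List[Change]:
    """Drop every entry whose old and new values are equal (a no-op)."""
    return [c for c in changes if c.old_text != c.new_text]


changes = [
    Change("F", "T1", "T1"),  # touched, but the value did not change
    Change("G", "T1", "T2"),  # an operative change
]
canonical = canonicalize(changes)
```

Under this model the canonical form is idempotent: canonicalizing an
already canonical list returns it unchanged.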

In the rest of this note I try to explain some angles to the issue.

The Subversion system is built on a main design principle of tree 
snapshots and differencing and merging of trees.  A no-op change is 
out-of-tree metadata about certain pairs of trees.  Carrying such 
metadata around the system in general is fundamentally incompatible with 
that principle.

One practical reason the existing system does not preserve that metadata 
is because, with very few exceptions, the existing interfaces convey 
no-op changes only implicitly, as a side effect of how their explicit 
operations are formulated, and so one differs from another.  For 
example, an interface that represents a file change through multiple 
optional operations, one of which is "on file F, property P changes to 
value V" can convey "property P1's value no-op-changed from V1 to V1, 
while property P2's value was not touched" if we invoke the "change 
property" operation for property P1 but do not invoke it for P2.  On the 
other hand, an interface that represents a file change as a single 
operation, "new file := {text, {properties}}" cannot; the only no-op 
change it can convey is at a coarser granularity, "file F no-op-changed 
its value from {T1,PROPS1} to {T1,PROPS1}".  The kinds of no-op change 
an interface can convey locally are an implementation detail of that 
particular interface, and so cannot be expected to match any other 
interface unless explicitly required and tested, which they mostly are not.
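
The difference between the two interface styles can be sketched as
follows.  Both classes are hypothetical illustrations, not real
Subversion editor APIs: the first takes one optional call per property,
the second takes the whole new file value in a single call.

```python
class FineGrainedEdit:
    """Hypothetical interface: one optional call per changed property."""

    def __init__(self, base_props):
        self.base = dict(base_props)
        self.touched = {}

    def change_prop(self, name, value):
        # "on file F, property P changes to value V"
        self.touched[name] = value

    def noop_touches(self):
        """Properties that were touched but kept their base value."""
        return sorted(n for n, v in self.touched.items()
                      if self.base.get(n) == v)


class WholeFileEdit:
    """Hypothetical interface: new file := {text, {properties}}."""

    def __init__(self, base_text, base_props):
        self.base = (base_text, dict(base_props))
        self.new = None

    def set_file(self, text, props):
        self.new = (text, dict(props))

    def is_noop(self):
        """Only whole-file granularity is available here."""
        return self.new == self.base


props = {"P1": "V1", "P2": "V2"}

fine = FineGrainedEdit(props)
fine.change_prop("P1", "V1")      # P1 no-op-changed; P2 not touched

whole = WholeFileEdit("T1", props)
whole.set_file("T1", props)       # file no-op-changed as a whole
```

Here fine.noop_touches() can name P1 specifically, while the whole-file
interface can only report that the file as a whole was touched without
changing.  Passing the same data through one interface and then the
other loses the per-property detail.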

Because the existing interfaces convey no-op-change information only 
incidentally, the system cannot be expected to preserve any particular 
no-op change when data flows through multiple interfaces, through 
commit, checkout, branch, diff, merge, and so on.  Subversion only 
preserves some within very limited scopes (such as the "file changed" 
flag in the "changed paths" list in "svn log -v").

Some of the existing svn protocols and APIs explicitly preserve certain 
no-op changes.  For example, one user reported [2] that in their svn 
history (converted from CVS) they would "hate to lose" the historical 
record that "svn log -v" reports "file text changed" for a certain no-op 
file change.  When I eliminated this no-op change from "dump", without 
due care to backward compatibility, it was considered a regression and 
reverted [#4598].  There are valid arguments for preserving backward 
compatibility in some places.  However, I propose such behaviour should 
be considered obsolete and broken, and a migration path should be 
planned to get away from it.

The snapshots argument is diluted because we already have at least one 
other kind of metadata outside a pure tree-snapshots system: the 
"copy-from" links.  I am not immediately planning to ditch copy-from 
links, though I think there are good reasons, analogous to the reasoning 
about no-op changes, to replace them in a possible future system.  I 
have given some thought to it.  That would be a more visible change to 
the system, of course, though not so much as it might first appear.

The example of a no-op file text change is a simple one.  An example 
with deeper implications is a directory copy combined with replacing one 
of its implicitly copied children with an explicit copy of that child 
from the same source from which it was implicitly copied.  Addressing a case like 
this may be as simple as declaring one version as the canonical form, or 
may require further travel down the road of copy-from semantics.

In conclusion, I consider svn would be a better system -- more 
predictable, testable, composable, etc.; more generally dependable -- 
and would lose no significant value at all -- if we were to explicitly 
remove no-op changes.

Does this all ring true and obvious to you, or can you explain better 
what I am getting at and what I'm missing?

- Julian


[1] Email: "No no-op changes", from me to dev@, 2014-09-19,
     https://svn.haxx.se/dev/archive-2014-09/0082.shtml
 
https://mail-archives.apache.org/mod_mbox/subversion-dev/201409.mbox/%3c1411138196.98623.YahooMailNeo@web87703.mail.ir2.yahoo.com%3e

[2] Email: "No-op changes no longer dumped by 'svnadmin dump' in 1.9",
     from Johan Corveleyn to dev@, 2015-09-21,
     https://svn.haxx.se/dev/archive-2015-09/0269.shtml
 
http://mail-archives.apache.org/mod_mbox/subversion-dev/201509.mbox/%3CCAB84uBVe8QnEpbPVAb__yQjiDDoYjFn2+M9mPcdBXZCwMCpOLw@mail.gmail.com%3E

[#4598] "No-op changes no longer dumped by 'svnadmin dump' in 1.9",
     https://subversion.apache.org/issue/4598
     https://issues.apache.org/jira/browse/SVN-4598

[#4623] "no-op prop change not preserved across dump/load"
     https://subversion.apache.org/issue/4623
     https://issues.apache.org/jira/browse/SVN-4623

Re: Subversion semantics: no no-op changes

Posted by Julian Foad <ju...@apache.org>.
Eric S. Raymond wrote:
> Then I no longer understand what we are talking about.
 
OK, no worries. I was just hoping you'd twig what I'm clumsily trying to get at, but I'll have to write it up properly some time.
 
- Julian


Re: Subversion semantics: no no-op changes

Posted by "Eric S. Raymond" <es...@thyrsus.com>.
Julian Foad <ju...@apache.org>:
> Eric S. Raymond wrote:
> > Julian Foad <ju...@apache.org>:
> > > [...]  I have come across a number of kinds of what could
> > > be called a "no-op change" or perhaps better described as "I touched this
> > > but did not change its value".
> [...]
> > No-op commits are specifically awkward for
> [...]
> 
> Eric, thank you for your thoughts on no-op commits.
> 
> Unfortunately I failed to make clear that I didn't mean to talk about an
> entire commit that is in total a no-op.  That is of course one manifestation
> of a no-op change, and I would certainly be interested in discussing it.
> 
> However, in this thread I specifically meant to talk about all the possible
> variants of micro-change within a commit (or within a diff or update or
> merge or ...).  For the sake of simplicity, let's say we're talking about a
> commit that also contains at least one "operative" change.
> 
> - Julian

Then I no longer understand what we are talking about.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>



Re: Subversion semantics: no no-op changes

Posted by Julian Foad <ju...@apache.org>.
Eric S. Raymond wrote:
> Julian Foad <ju...@apache.org>:
>> [...]  I have come across a number of kinds of what could
>> be called a "no-op change" or perhaps better described as "I touched this
>> but did not change its value".
[...]
> No-op commits are specifically awkward for 
[...]

Eric, thank you for your thoughts on no-op commits.

Unfortunately I failed to make clear that I didn't mean to talk about an 
entire commit that is in total a no-op.  That is of course one 
manifestation of a no-op change, and I would certainly be interested in 
discussing it.

However, in this thread I specifically meant to talk about all the 
possible variants of micro-change within a commit (or within a diff or 
update or merge or ...).  For the sake of simplicity, let's say we're 
talking about a commit that also contains at least one "operative" change.

- Julian

Re: Subversion semantics: no no-op changes

Posted by "Eric S. Raymond" <es...@thyrsus.com>.
Julian Foad <ju...@apache.org>:
> Hello Eric.
> 
> TL;DR:  I explain why I am convinced no-op changes don't belong in the
> Subversion versioning semantics.  With your work on Subversion repository
> and dump stream semantics, is this something you can offer a view on?  I
> have previously failed to convince the developer community [1].
> 
> 
> In examining the Subversion versioned data semantics and how the protocols
> and APIs represent them, I have come across a number of kinds of what could
> be called a "no-op change" or perhaps better described as "I touched this
> but did not change its value".
> 
> Example:
>   - I changed the text of file F from T1 to T1;
>   - now "svn log -v" tells me the text of F was "modified";
>   - some variants of "svn diff" show no output;
>   - some variants of "svn diff" show a diff header with no body.

Initially I had to struggle with your presentation a bit.  When I
first read that, I thought you were changing the file from the text
"T1" to the text "T1;" and wondered "How is that a no-op change?" But
never mind, that was just me not being awake enough yet.

I am in favor of not shipping or recording no-op changes. They don't
have a natural representation on DVCSes, which are essentially
changeset DAGs in the way that a Subversion history is essentially a
series of file-tree surgical operations.

(The difference can be painful. The one conversion bug I have left is
around copy-delete-copy sequences on tags and branches. What to do
with the dead segment between the first tag/branch creation and the
delete turns out to be a question that does not have an obvious right
answer in changeset-DAG-land.)

No-op commits are specifically awkward for the internal representation
reposurgeon uses, which has git fast-import stream semantics with a bit
of gingerbread added - notably, per-commit properties including a
legacy-ID field which is normally used to carry Subversion revision-IDs.

Git streams don't even want to carry around empty directories, let
alone commits with no file operations or only no-ops. Thus, when reposurgeon
encounters these, the no-op commit is preserved as an annotation tag with a
generated name so a human operator can decide what to do about the
comment text.

Usually the disposition is to toss out the comment - I don't think
I've ever seen an exception.  That makes its own statement about the
value of keeping these things in your representation.

So yeah, Eric sez nuke 'em.  It'd be simpler all around.

(BTW, reposurgeon actually treats Subversion branch copies as no-op
changes in this exact way; the first change after the new branch
creation is what's recorded, and the commit metadata of the copy
operation is shuffled off to become a junk tag.)
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>



Re: Subversion semantics: no no-op changes

Posted by Branko Čibej <br...@apache.org>.
On 15.10.2019 09:10, Julian Foad wrote:
> Branko Čibej wrote:
>> [...] if the dump or the load step drops no-op
>> changes, all existing working copies suddenly are no longer compatible
>  
> No. They remain compatible.
>  
> (Perhaps you are thinking of eliminating no-op commits and renumbering revs. That's not what we're talking about.)

Ah. Hm. You're right, all the pristine files and checksums remain valid.

-- Brane


Re: Subversion semantics: no no-op changes

Posted by Julian Foad <ju...@foad.me.uk>.
Branko Čibej wrote:
> [...] if the dump or the load step drops no-op
> changes, all existing working copies suddenly are no longer compatible
 
No. They remain compatible.
 
(Perhaps you are thinking of eliminating no-op commits and renumbering revs. That's not what we're talking about.)
 
- Julian


Re: Subversion semantics: no no-op changes

Posted by Branko Čibej <br...@apache.org>.
On 14.10.2019 10:07, Johan Corveleyn wrote:
> On Fri, Oct 11, 2019 at 4:56 PM Julian Foad <ju...@apache.org> wrote:
> ...
>> Some of the existing svn protocols and APIs explicitly preserve certain
>> no-op changes.  For example, one user reported [2] that in their svn
>> history (converted from CVS) they would "hate to lose" the historical
>> record that "svn log -v" reports "file text changed" for a certain no-op
>> file change.  When I eliminated this no-op change from "dump", without
>> due care to backward compatibility, it was considered a regression and
>> reverted [#4598].  There are valid arguments for preserving backward
>> compatibility in some places.  However, I propose such behaviour should
>> be considered obsolete and broken, and a migration path should be
>> planned to get away from it.
> Hi Julian,
>
> As the user in question, who reported this loss of history at $company
> while dumping/loading our repo: at the time, I was mainly concerned
> with backwards compatibility, wanting to preserve our existing history
> as close to 100% as possible. I was also very much surprised by this
> change in behaviour. Another couple of years have passed, and now I
> think: meh, it's probably not such a big deal.

Well actually it is. And here's why:

If you have a repository with such no-op changes; and you want to do a
dump/load cycle in order to, I don't know, optimize the repo or make new
features available. Then, if the dump or the load step drops no-op
changes, all existing working copies suddenly are no longer compatible
with the new repository, even if you preserve the UUID. You can't just
'svn relocate' any more (or replace the repo on the server), you have to
do a fresh checkout, or expect possibly corrupting side effects.

-- Brane

Re: Subversion semantics: no no-op changes

Posted by Johan Corveleyn <jc...@gmail.com>.
On Fri, Oct 11, 2019 at 4:56 PM Julian Foad <ju...@apache.org> wrote:
...
> Some of the existing svn protocols and APIs explicitly preserve certain
> no-op changes.  For example, one user reported [2] that in their svn
> history (converted from CVS) they would "hate to lose" the historical
> record that "svn log -v" reports "file text changed" for a certain no-op
> file change.  When I eliminated this no-op change from "dump", without
> due care to backward compatibility, it was considered a regression and
> reverted [#4598].  There are valid arguments for preserving backward
> compatibility in some places.  However, I propose such behaviour should
> be considered obsolete and broken, and a migration path should be
> planned to get away from it.

Hi Julian,

As the user in question, who reported this loss of history at $company
while dumping/loading our repo: at the time, I was mainly concerned
with backwards compatibility, wanting to preserve our existing history
as close to 100% as possible. I was also very much surprised by this
change in behaviour. Another couple of years have passed, and now I
think: meh, it's probably not such a big deal.

Do note however that at the time of that mail in 2015 (and issue
#4598), it was also determined that the change in behaviour in 1.9 (no
longer dumping those no-op changes) was unintentional (side effect of
another change by Stefan Fuhrmann). That's one of the reasons why it
was considered a regression.

Anyway, that's all "history" now. I just wanted to say: if y'all
decide to get rid of no-op changes, I won't oppose it at all costs for
the sake of backwards compatibility any more :-). So FWIW, consider my
opinion "neutral" on the matter.

-- 
Johan

> [2] Email: "No-op changes no longer dumped by 'svnadmin dump' in 1.9",
>      from Johan Corveleyn to dev@, 2015-09-21,
>      https://svn.haxx.se/dev/archive-2015-09/0269.shtml
>
> http://mail-archives.apache.org/mod_mbox/subversion-dev/201509.mbox/%3CCAB84uBVe8QnEpbPVAb__yQjiDDoYjFn2+M9mPcdBXZCwMCpOLw@mail.gmail.com%3E
>
> [#4598] "No-op changes no longer dumped by 'svnadmin dump' in 1.9",
>      https://subversion.apache.org/issue/4598
>      https://issues.apache.org/jira/browse/SVN-4598
>
> [#4623] "no-op prop change not preserved across dump/load"
>      https://subversion.apache.org/issue/4623
>      https://issues.apache.org/jira/browse/SVN-4623

Re: Subversion semantics: no no-op changes

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Julian Foad wrote on Fri, Oct 11, 2019 at 15:56:31 +0100:
> Hello Eric.

Not to preëmpt Eric, but may I share my thoughts too?

> In conclusion, I consider svn would be a better system -- more predictable,
> testable, composable, etc.; more generally dependable -- and would lose no
> significant value at all -- if we were to explicitly remove no-op changes.
> 
> Does this all ring true and obvious to you, or can you explain better what I
> am getting at and what I'm missing?

What layer do you want to forbid no-op changes at?

At the FS level, a filesystem is an array of trees.  The contents of rN
place some restrictions on what the contents of rN+1 may be: for example,
if foo@N+1 has copyfrom=bar@N, then foo and bar have the same node kind.
At this level, I don't see a reason to forbid two successive trees from being
equal.¹  The question isn't whether that's a _useful_ thing to do, but
whether it has a consistent, well-defined semantics — to which, IMO, the
answer is in the affirmative.  That, incidentally, is also the reason
why «x + 0» is a well-formed arithmetic expression.
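
A toy model of this view, assuming a filesystem is just a list of trees
indexed by revision number, with copy-from links kept in a separate
table.  All names are hypothetical and this is not how libsvn_fs stores
anything.

```python
# Each tree maps a path to (node kind, text); index N is revision rN.
fs = [
    {},                                              # r0
    {"bar": ("file", "T1")},                         # r1
    {"bar": ("file", "T1"), "foo": ("file", "T1")},  # r2: foo copied
    {"bar": ("file", "T1"), "foo": ("file", "T1")},  # r3: same as r2
]

# copies[N][dst] = (src path, src revision) for copies made in rN.
copies = {2: {"foo": ("bar", 1)}}


def copyfrom_kinds_match(fs, copies):
    """Check that every copy destination has its source's node kind."""
    for rev, by_path in copies.items():
        for dst, (src, src_rev) in by_path.items():
            if fs[rev][dst][0] != fs[src_rev][src][0]:
                return False
    return True


def equal_to_previous(fs, rev):
    """True if the tree at rev equals the tree at rev - 1."""
    return rev > 0 and fs[rev] == fs[rev - 1]
```

In this model r3 being equal to r2 violates no constraint: the copy-from
check still passes, and equal_to_previous(fs, 3) is simply a
well-defined question with the answer "yes".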

In C, it's perfectly valid to write «if (0) abort();», or even «foo; ;».
It is not a syntax error.  It _will_ generate a compiler warning, but
you will be able to compile and run it.

Similarly, I suspect "no no-op changes" is a semantic that belongs at
the same level as "no mixed-revision or switched subtrees": it is
something a user _usually_ doesn't want to do, but ultimately, it's their
data, not ours.   That's what we do in 'svn merge', which does that
sanity check but makes it opt-outable.  It's actually what we already do
with empty revisions: 'svn commit' won't create an empty revision, but
it _is_ possible to do so if one really wants to.

Use-case: Imagine a company that is regulatorily required to review its
foobar policy once a year.  For that company, it would make perfect
sense to create a commit that _doesn't_ change the contents of the
policy, but does mark the file as changed (in «svn log -qv», etc), with
the log message "Policy reviewed and reaffirmed without changes.".

Yes, it's a metadata-only change.

Makes sense?

Cheers,

Daniel

¹ [Hair splitting: when I say "equal" I mean, "equal as far as a consumer
of the replay or dump API can detect".  That is, node-rev-ids are not
available to distinguish with.]

P.S. Relevant APIs: svn_fs_contents_different() v. svn_fs_contents_changed().
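
A rough illustration of the distinction those two names suggest, as I
understand it: "changed" asks whether the stored node was touched,
"different" asks whether the content actually differs.  The code below
is a toy model with hypothetical names (NodeRev, rep_id); it does not
call libsvn_fs and is not its implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRev:
    rep_id: str  # hypothetical identifier of the stored representation
    text: str    # the file content itself


def contents_changed(a: NodeRev, b: NodeRev) -> bool:
    """Cheap check: True whenever a new representation was written,
    even if the text it holds is identical."""
    return a.rep_id != b.rep_id


def contents_different(a: NodeRev, b: NodeRev) -> bool:
    """Strict check: True only if the text itself differs."""
    return a.text != b.text


before = NodeRev("rep-1", "T1")
after = NodeRev("rep-2", "T1")  # rewritten with the same text
```

For this pair, contents_changed() is true while contents_different() is
false, which is exactly the "I touched this but did not change its
value" case under discussion.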