You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Malcolm Rowe <ma...@farside.org.uk> on 2005/10/09 23:16:55 UTC

What does 'svn diff' do?

Hello list,

I'm hoping that someone can explain to me what output 'svn diff' is
supposed to produce.

That's not as stupid a question as it sounds - I've looked around at the
book, the API documentation, and the help, and although they describe
it in general terms, the specifics are absent.

So here's my attempt at a definition of the _basic_ functionality,
ignoring --notice-ancestry and --no-diff-deleted: (and note that while
the actual client command has three syntax variations, they can all be
implemented using the following)

1. 'svn diff' accepts two path/revision pairs, each representing a
subtree of a repository or working copy at a particular revision (or
BASE or WORKING for the wc). The output of 'svn diff' is something that,
when fed to GNU patch, will convert the first tree into the second tree.
  
2. The output also contains textual comments that describe directory and
file property modifications, except for deleted files or directories,
where property deletions are implicit.

3. The output of 'svn diff' will be identical (ignoring ordering)
regardless of whether the subtrees are in a working copy or repository.


Sounds good? We don't do that, of course, but most of the reasons why
are bugs.

The reason why I'd like a definition is that I'm spending some time
looking at the diff code at the moment. It's pretty complex already,
and it's even harder to understand when we don't seem to have a design
spec for the feature.


Also:

There _is_ one place where we deliberately don't follow the text above,
and I'd also like to understand whether it's the result of a deliberate
design decision, or whether it's a bug that's been codified into
a feature.

Here it is:
For BASE->WORKING and repos->WORKING diffs, we do not show a new copied
(i.e. schedule-add-with-history) file as 'added' in the output of 'svn
diff'. Regardless of the requested revision, for these files, we show only
local modifications (that is, any modifications made _after_ the copy).

We test for this behaviour explicitly in diff_tests.py (22: diff a file
that has been renamed), but I can't understand why it's desirable.

There are several reasons why it's _not_ desirable:

* The result of 'svn diff' changes once you commit, because repos->repos
diffs do _not_ have this behaviour (they report an add-with-history just
the same way as an add: as the complete contents of the file, added).

* The result of 'svn diff' is no longer something that you can feed
into patch. While copies (and property modifications, for that matter)
can't be represented precisely as diffs, we lose even the fact that the
file was created.

* The file shows up in 'svn status', but not 'svn diff'.

* It leads to the utterly weird situation whereby 'svn mv A B; svn diff'
reports the deletion of A, but not the addition of B.


The _only_ advantage I can see for this behaviour is that it permits you
to see what local modifications have been performed to the file since it
was copied, if any. But I'm fairly sure that if you wanted to know that,
you could pass in the old and new targets explicitly: it shouldn't be
the default.


So, two questions: Firstly, am I missing something? If not, do we really
want svn diff to behave like this?

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by Malcolm Rowe <ma...@farside.org.uk>.
Thank you all for your detailed replies!

To summarise:

I think there's complete agreement that we need the ability to print
differences against the copy source. Greg pointed out something that
hadn't occurred to me: while you might be able to specify the source
and target manually for a single file copy, you need a way to get the
current behaviour when you are producing a diff across more than one file.

I think that we also have consensus that there's a need to provide
a switch to force one behaviour or the other, and to rationalise the
wc->repos and repos->repos diffs to work the same way by default. Karl
pointed out the --no-diff-deleted option on 'svn diff', which already
gives users a way to make the diff more useful (at the cost of not
providing a 'complete' diff), while Peter pointed out that svnlook also
has a --diff-copy-from option, which does exactly what we're talking
about here (svnlook _also_ has a --no-diff-added, so it's got the most
flexible of the interfaces).

It sounds like it might also be useful to further develop the draft 'svn
diff' specification that I put together (Julian seemed particularly keen
on this) - what do you all think? What would we do with it afterwards?

In terms of code changes, I think I should be able to conditionalise
the copyfrom behaviour for repos->wc and wc->wc diffs on a command-line
option, and I _may_ be able to do the same for repos->repos diffs as well
(though I haven't looked at that code yet).


One important thing to decide is what the default behaviour should
be. We are already inconsistent: if we do want to fix that inconsistency,
one of the default (wc,repos)->wc / repos->repos codepaths has to change.

So:

I propose that we add the --diff-copy-from option as an option to 'svn
diff', and therefore change the default behaviour for the (wc,repos)->wc
diffs, because...

* I strongly believe that the default behaviour should be to produce a
diff that is a valid input to 'patch' (where 'valid' means 'transforms
tree A to tree B'). We already provide one option (--no-diff-deleted)
on 'svn diff' to change the output format in a non-standard way;
adding --diff-copy-from would just allow us to change it in another
non-standard way.

* The current behaviour violates the principle of least surprise. We
don't document it anywhere very obvious, and the _absence_ of a file
from diff output is easily overlooked (by contrast, if you _want_ to see
your local modifications, you're going to realise that something needs
to change immediately).

* It would be confusing if 'svnlook diff' worked one way, and 'svn diff'
the other. Yes, that might be an argument for changing svnlook :-)

However, that's just my personal belief: consensus is obviously what's
important here.


Finally, there's also the discussion about whether Subversion should
provide a more 'complete' diff format. I'm not too bothered about that
right this moment: it's a conversation that I suspect could go on for
a while.


So, let me know which of the above is useful. I can't promise that I'll
have time to follow up on everything (I don't have a great deal of spare
time generally, though last week was an exception), but I hope I can
make some contribution along the way.

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by Malcolm Rowe <ma...@farside.org.uk>.
(See my other reply for more; just a minor few points here)

On Sun, Oct 09, 2005 at 09:22:23PM -0500, kfogel@collab.net wrote:
> > 2. The output also contains textual comments that describe directory and
> > file property modifications, except for deleted files or directories,
> > where property deletions are implicit.
> 
> Yup.  And you didn't say so, but this meta-information is supposed to
> be transparent to the "patch" program (as I believe it is).  Someday
> we may have a more parseable format, and a patcher that can actually
> do something with it.

Yes, it is ignored. I used the word 'comments', but I guess I could have
been clearer.

> Well, there's --no-diff-deleted, but yeah.  (Though I think the
> deletion might still be reported even then, just not shown as a full
> diff.)

Yes, that's right: we just print '(deleted)' after the header, and have
labels or content for the diff itself.

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by kf...@collab.net.
Malcolm Rowe <ma...@farside.org.uk> writes:
> 1. 'svn diff' accepts two path/revision pairs, each representing a
> subtree of a repository or working copy at a particular revision (or
> BASE or WORKING for the wc). The output of 'svn diff' is something that,
> when fed to GNU patch, will convert the first tree into the second tree.

Exactly.

> 2. The output also contains textual comments that describe directory and
> file property modifications, except for deleted files or directories,
> where property deletions are implicit.

Yup.  And you didn't say so, but this meta-information is supposed to
be transparent to the "patch" program (as I believe it is).  Someday
we may have a more parseable format, and a patcher that can actually
do something with it.

> 3. The output of 'svn diff' will be identical (ignoring ordering)
> regardless of whether the subtrees are in a working copy or repository.

Yup.

> Sounds good? We don't do that, of course, but most of the reasons why
> are bugs.
> 
> The reason why I'd like a definition is that I'm spending some time
> looking at the diff code at the moment. It's pretty complex already,
> and it's even harder to understand when we don't seem to have a design
> spec for the feature.

I think you pretty much nailed it above.

> There _is_ one place where we deliberately don't follow the text above,
> and I'd also like to understand whether it's the result of a deliberate
> design decision, or whether it's a bug that's been codified into
> a feature.
> 
> Here it is:
> For BASE->WORKING and repos->WORKING diffs, we do not show a new copied
> (i.e. schedule-add-with-history) file as 'added' in the output of 'svn
> diff'. Regardless of the requested revision, for these files, we show only
> local modifications (that is, any modifications made _after_ the copy).
> 
> We test for this behaviour explicitly in diff_tests.py (22: diff a file
> that has been renamed), but I can't understand why it's desirable.

I believe it was a deliberate decision (I'm not saying it's right,
merely that it was deliberate).  It's desirable because people reading
diffs are often more interested in the changes made against the copy
source, changes which would be obscured if we simply showed the entire
copy as an add.

What we need, I think, is an option that causes adds to shown as full
adds (i.e., every line added), so at least one can get the formally
correct -- if less useful to humans -- behavior.

It's worth noting that on similar grounds, we already have the
--no-diff-deleted option to 'svn diff'.

> There are several reasons why it's _not_ desirable:
> 
> * The result of 'svn diff' changes once you commit, because repos->repos
> diffs do _not_ have this behaviour (they report an add-with-history just
> the same way as an add: as the complete contents of the file, added).

That seems inconsistent; I wonder if it's an oversight, hmmm.

> * The result of 'svn diff' is no longer something that you can feed
> into patch. While copies (and property modifications, for that matter)
> can't be represented precisely as diffs, we lose even the fact that the
> file was created.

Agreed.  This is a disadvantage.

> * The file shows up in 'svn status', but not 'svn diff'.

Yup.  I think this was discussed during the decision, in fact.

> * It leads to the utterly weird situation whereby 'svn mv A B; svn diff'
> reports the deletion of A, but not the addition of B.

Well, there's --no-diff-deleted, but yeah.  (Though I think the
deletion might still be reported even then, just not shown as a full
diff.)

> The _only_ advantage I can see for this behaviour is that it permits you
> to see what local modifications have been performed to the file since it
> was copied, if any. But I'm fairly sure that if you wanted to know that,
> you could pass in the old and new targets explicitly: it shouldn't be
> the default.
> 
> So, two questions: Firstly, am I missing something? If not, do we really
> want svn diff to behave like this?

I don't think you're missing anything.

I do think we want this behavior.  But, I don't know if we want it as
the default...

-Karl

-- 
www.collab.net  <>  CollabNet  |  Distributed Development On Demand


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sun, 2005-10-09 at 22:20 -0500, Ben Collins-Sussman wrote:
> On 10/9/05, Malcolm Rowe <ma...@farside.org.uk> wrote:
> 
> > The _only_ advantage I can see for this behaviour is that it permits you
> > to see what local modifications have been performed to the file since it
> > was copied, if any. But I'm fairly sure that if you wanted to know that,
> > you could pass in the old and new targets explicitly: it shouldn't be
> > the default.

The advantage is that when you run "svn diff" to review what changes
you're about to commit, modifications you've made to add-with-history
files show up as modifications, rather than the entire contents of the
file.

Sure, you could specify the old and new targets explicitly, but when
you've got a mix of straightforward modifications and adds-with-history,
the usage becomes really muddled.  So while there's an argument for
diffing adds-with-history against empty for definitional elegance, I
think it would detract from our usability.

I note we have this "--strict" flag, which we use for some other
commands when the default usage has been tweaked for usability reasons.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Sun, 9 Oct 2005, Ben Collins-Sussman wrote:

> On 10/9/05, Malcolm Rowe <ma...@farside.org.uk> wrote:
>
> > The _only_ advantage I can see for this behaviour is that it permits you
> > to see what local modifications have been performed to the file since it
> > was copied, if any. But I'm fairly sure that if you wanted to know that,
> > you could pass in the old and new targets explicitly: it shouldn't be
> > the default.
>
> This discussion has come up many times:  should 'svn diff' show a
> copied file as entirely added, or only the parts that have been
> tweaked since the copy?  There are legitimate uses for both cases, and
> my memory is that we all want to support both cases.  Historically,
> it's been a waste of time to declare that one use-case or the other is
> folly.
>
Just for the record, in 1.3, svnlook diff got a --dif-copy-from option.
The default is to show added files as diffs against the empty stream, and
this option making it show what svn diff BASE:WORKING does today. To me,
it is reasonable that the default behaviour should be a format tha patch
can read. I don't know what compatibility problem we might cause if we
make this behaviour consistent across svn diff...

Regards,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by Ben Collins-Sussman <su...@red-bean.com>.
On 10/9/05, Malcolm Rowe <ma...@farside.org.uk> wrote:

> The _only_ advantage I can see for this behaviour is that it permits you
> to see what local modifications have been performed to the file since it
> was copied, if any. But I'm fairly sure that if you wanted to know that,
> you could pass in the old and new targets explicitly: it shouldn't be
> the default.

This discussion has come up many times:  should 'svn diff' show a
copied file as entirely added, or only the parts that have been
tweaked since the copy?  There are legitimate uses for both cases, and
my memory is that we all want to support both cases.  Historically,
it's been a waste of time to declare that one use-case or the other is
folly.

The hard question is (1) deciding which behavior is the default, and
which requires a switch, and (2) being consistent across all three
code-paths of  'svn diff'.  I don't think we've done a good job of
achieving those goals so far, and would love to see a cleanup.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: What does 'svn diff' do?

Posted by Greg Hudson <gh...@MIT.EDU>.
On Fri, 2005-10-14 at 16:36 +0100, Malcolm Rowe wrote:
> Okay - just so that I understand this: in the case where you have only
> modified existing files, the output of 'svn diff' should be always be
> equivalent to the naïve output that GNU diff would produce, given the two
> trees. In all other cases, we will need to choose (or have already chosen)
> between that, and producing 'more useful' output. Is that approximately
> correct?

Yes.

> So, as a concrete example, does that mean that, if we were to decide
> that file or directory renames would be better reported as no-content
> 'renames' in diff output rather than full-content 'delete/add' pairs,
> we would be happy to make that change by default? [I'm not suggesting
> that this _is_ a change we'd want to make, but if we _did_, could we?]

I'd say so.

> If people are relying on the current behaviour, then it sounds the
> answer to my question above is "no". Is the definition of 'svn diff'
> therefore effectively "whatever we have now"?

"svn diff" should, by default, present tree changes in the most
reviewable way possible, because that's how it generally works now and I
think people rely on that.  That's not the same as saying 'the
definition of svn diff is the output svn diff produces'.

> For example, IIRC, issue #2333 basically boils down to the fact that
> repos-repos diffs don't report subtree deletions. The other modes
> (repos->wc, wc->wc) _do_ report such deletions, but since we don't have
> a definition of what 'svn diff' should report, how do we decide which,
> if either, should change?
> 
> Do we really have to do all this on an ad hoc basis?

It's true that "present tree changes in the most reviewable way
possible" is a somewhat more squishy definition than "present tree
changes in a way that they will be reflected by patch".  But I don't
think it's the same as deciding everything on an ad hoc basis.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: What does 'svn diff' do?

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Thu, Oct 13, 2005 at 01:50:28PM -0400, Greg Hudson wrote:
> > Is svn diff primarily intended to be something that produces classical
> > GNU diff unified diffs that can be fed to patch, or is it intended
> > to be a tool that works similarly to GNU diff, but that has differing
> > [default] behaviour in cases where we think those changes would help
> > the user understand the changes better?
> 
> I'd say the latter, by default.  In the particular common case where you
> have modified files but not reorganized the tree, "svn diff" output can
> be fed to patch, and that's a good thing, but outside of that case I
> don't think we can really achieve that goal in a meaningful way.

Okay - just so that I understand this: in the case where you have only
modified existing files, the output of 'svn diff' should be always be
equivalent to the naïve output that GNU diff would produce, given the two
trees. In all other cases, we will need to choose (or have already chosen)
between that, and producing 'more useful' output. Is that approximately
correct?


So, as a concrete example, does that mean that, if we were to decide
that file or directory renames would be better reported as no-content
'renames' in diff output rather than full-content 'delete/add' pairs,
we would be happy to make that change by default? [I'm not suggesting
that this _is_ a change we'd want to make, but if we _did_, could we?]

> At any rate, we've applied the latter philosophy up until now, and
> people have come to rely on it, so I don't think we can reasonably
> change the default behavior to reflect the former philosophy.

If people are relying on the current behaviour, then it sounds the
answer to my question above is "no". Is the definition of 'svn diff'
therefore effectively "whatever we have now"?

I suppose I'm trying to work out how we decide what is a bug in diff (and
can therefore be changed without violating our compatibility guarantees),
and what is a change to the naïve model that we've made deliberately.

For example, IIRC, issue #2333 basically boils down to the fact that
repos-repos diffs don't report subtree deletions. The other modes
(repos->wc, wc->wc) _do_ report such deletions, but since we don't have
a definition of what 'svn diff' should report, how do we decide which,
if either, should change?

Do we really have to do all this on an ad hoc basis?


Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by Greg Hudson <gh...@MIT.EDU>.
On Thu, 2005-10-13 at 11:04 +0100, Malcolm Rowe wrote:
> Is svn diff primarily intended to be something that produces classical
> GNU diff unified diffs that can be fed to patch, or is it intended
> to be a tool that works similarly to GNU diff, but that has differing
> [default] behaviour in cases where we think those changes would help
> the user understand the changes better?

I'd say the latter, by default.  In the particular common case where you
have modified files but not reorganized the tree, "svn diff" output can
be fed to patch, and that's a good thing, but outside of that case I
don't think we can really achieve that goal in a meaningful way.

For example, suppose I do a really simple tree change: adding a file
without history.  "svn diff" produces a diff which will, when fed to
patch, create the file, but patch will of course not "svn add" the file.
So a developer who receives such a diff and applies it in an svn working
copy will have to go to extra work to reflect the tree change.  That's
probably the most common use case of applying the result of "svn diff"
using patch.

At any rate, we've applied the latter philosophy up until now, and
people have come to rely on it, so I don't think we can reasonably
change the default behavior to reflect the former philosophy.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by kf...@collab.net.
Malcolm Rowe <ma...@farside.org.uk> writes:
> Perhaps Karl et al could provide some guidance?

Karl will be mostly offline for ten days starting tomorrow afternoon,
so probably can't contribute much more to this thread.  He hopes Et Al
will be able to provide guidance, though.

(Sorry, I know that wasn't the most important point to make in this
thread, but I also didn't want you to be waiting forever for my input!)

Best,
-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by Malcolm Rowe <ma...@farside.org.uk>.
[I have a bad cold; if any of this email sounds rude, apologies: that's
not the intent.]

On Tue, Oct 11, 2005 at 04:54:18PM +0100, Julian Foad wrote:
> Having thought more deeply, I don't think you can usefully write a single 
> description of the behaviour.

I don't know; I think I'm pretty close already. I'm only aiming to
document what we currently intend the output to be, not what it could
more usefully be.

> I think you need to organise this definition 
> according to purpose:

I think the definition of 'diff' itself can remain static (though it can
say additional things like "If the 'no-diff-deleted' option is used,
the behaviour is as above except that file deletions are reported only
in summary). But you're right that we need some kind of mapping from
purpose to options.

> >>Patch: not exactly; you need to say something about how only the 
> >>modifications to individual files are obeyed by Patch, and whether this 
> >>should include whole-file deletes that are part of a whole-subtree delete.
> >
> >Well, I'm defining the output of diff, not the behaviour of patch. I
> >mention that property modifications are ignored by patch further down;
> >is there anything else that needs to be clarified?
> 
> But you say existing "patch" will convert one tree into another.  Oh, you 
> mean by brute force and heuristics: if you supply an all-lines-deleted 
> patch, it can delete the file, and if all files are deleted, it can delete 
> the directory, etc.  I don't necessarily think that's what we should output 
> by default.
> 

But it _is_ what we output, except for repos-repos diffs (that's issue
2333).

> >Why would whole-file deletes _not_ be included in the output of a subtree
> >delete? (We don't in one or two cases, but they're bugs, no?). They're
> >necessary to remove the files from the target.
> 
> Then renaming a root directory would result in a huge delete-and-add patch. 
> What is this output intended to be used for?  That would be bad for showing 
> the user; it would be bad for submitting a patch to a project for review; 
> it would only be good for generating a patch that can be automatically 
> applied by an exisiting Patch program.  But no exisiting Patch program can 
> apply the whole of an arbitrary Subversion diff automatically, so is that 
> really useful?  Only in limited circumstances, I think.
> 

Agreed, on all points, except: it's called 'diff'. We already do things
to make sure the output of diff is a valid input to patch (for example:
we output no-content diffs for new empty files; one reason noted is so
that patch will create the file).

If we had called it 'svn whatchanged' or something _other_ than diff,
or if its output _wasn't_ typically parsable by patch, I'd have a far
better time accepting that we could arbitrarily change the default output
to something else (for example, putting in a 'renamed' comment rather
than a deleted+added diff, when directories are renamed).

Don't get me wrong: I think it would be great if svn had better tools
to report differences between trees, because, as you quite rightly point
out, a basic 'diff' isn't ideal for reviewing changes in all cases.

You already mentioned the file or directory-renamed problem. Only
yesterday, I was reordering functions in a file, and I thought that it
would be significantly more useful had diff been able to show me that
the text in the file had just been moved rather than added and deleted.

But not by default?

Also, technically, while we continue to represent renames as paired
'delete' and 'add-with-history', it's not possible to reassemble the
operations into a single 'rename' operation in all cases (for example:
delete of 'A', add-with-history B from A, add-with-history C from A. Is
that a rename of A->B and an add-with-history of C, or A->C and add B,
or A->(B and C)). That's not to say we couldn't try, of course.

> Is that really what our current "svn diff" output is intended to be?  

That being my original question, of course :-)
Perhaps Karl et al could provide some guidance?

Is svn diff primarily intended to be something that produces classical
GNU diff unified diffs that can be fed to patch, or is it intended
to be a tool that works similarly to GNU diff, but that has differing
[default] behaviour in cases where we think those changes would help
the user understand the changes better?

> That's why I think we need to specify useful behaviours, and then specify 
> the current implementation in terms of them and decide whether and how much 
> to change it.

Ok. I don't have enough time to drive that, which is why I'd prefer
to document what appears to be the current intended behaviour, and
possibly the deviations from that (though most are in the issue list,
so there'd be some duplication there). Perhaps we could produce something
jointly?

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by Julian Foad <ju...@btopenworld.com>.
Having thought more deeply, I don't think you can usefully write a single 
description of the behaviour.  I think you need to organise this definition 
according to purpose:


* For summarising the changes for a human, and usable by conventional "patch" 
to a reasonable extent.

* For generating a patch that a conventional "patch" program can apply to the 
greatest extent possible, with the remaining information as annotations for a 
human.

* For generating a patch that a future Subversion "patch" program can apply to 
recreate the changes exactly.

* For compatibility with the current "svn diff" (excluding any behaviours that 
we agree are definitely bugs).


File renamed from dir/F1 to dir/F2:
   (human)    "R  dir/F1 -> dir/F2"
   (oldpatch) diff deleting F1 + diff adding F2
   (svnpatch) "R  dir/F1 dir/F2"
   (compat)   "D  dir/F1" + diff adding F2

Property P deleted from file F:
   (human)    Same as (compat).
   (oldpatch) Same as (compat).
   (svnpatch) [...]
   (compat)   "Property change on F:" + [...]

[...]


Malcolm Rowe wrote:
> On Mon, Oct 10, 2005 at 01:55:19PM +0100, Julian Foad wrote:
> 
>>>1. 'svn diff' accepts two path/revision pairs, each representing a
>>>subtree of a repository or working copy at a particular revision (or
>>>BASE or WORKING for the wc). The output of 'svn diff' is something that,
>>>when fed to GNU patch, will convert the first tree into the second tree.
>>
>>Patch: not exactly; you need to say something about how only the 
>>modifications to individual files are obeyed by Patch, and whether this 
>>should include whole-file deletes that are part of a whole-subtree delete.
> 
> Well, I'm defining the output of diff, not the behaviour of patch. I
> mention that property modifications are ignored by patch further down;
> is there anything else that needs to be clarified?

But you say existing "patch" will convert one tree into another.  Oh, you mean 
by brute force and heuristics: if you supply an all-lines-deleted patch, it can 
delete the file, and if all files are deleted, it can delete the directory, 
etc.  I don't necessarily think that's what we should output by default.

> Why would whole-file deletes _not_ be included in the output of a subtree
> delete? (We don't in one or two cases, but they're bugs, no?). They're
> necessary to remove the files from the target.

Then renaming a root directory would result in a huge delete-and-add patch. 
What is this output intended to be used for?  That would be bad for showing the 
user; it would be bad for submitting a patch to a project for review; it would 
only be good for generating a patch that can be automatically applied by an 
exisiting Patch program.  But no exisiting Patch program can apply the whole of 
an arbitrary Subversion diff automatically, so is that really useful?  Only in 
limited circumstances, I think.

Is that really what our current "svn diff" output is intended to be?  That's 
why I think we need to specify useful behaviours, and then specify the current 
implementation in terms of them and decide whether and how much to change it.


>>Also I should think we ought to say explicitly that tree-change information 
>>should be included in the output in an unambiguous form that might be 
>>usable by a hypothetical tree-aware Patch program, and can certainly be 
>>used by a human.
> 
> "should", perhaps, but it's not at the moment. Not that I disagree with
> you (though I don't know whether it's something we should overload onto
> the textual diff format - the old tree-delta XML format was the first
> thing I thought of using), but I'm trying to describe what the intended
> output is at the moment.

Right, this was a case of me not distinguishing between different intentions.


>>>3. The output of 'svn diff' will be identical (ignoring ordering)
>>>regardless of whether the subtrees are in a working copy or repository.
>>
>>And something about what the same diff in reverse order should look like.
> 
> No: there's no such thing in this model. A 'reverse diff' is simply a
> diff with the source and target subtrees swapped.

Yes.  I was trying to get at some sort of wording to make clear the 
relationship between the output of "diff a b" and "diff b a", so as to make 
clear that the current bugs resulting in asymmetrical output for those cases 
really are bugs.  But it's obvious; no such description is necessary.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Mon, Oct 10, 2005 at 01:55:19PM +0100, Julian Foad wrote:
> Thank you _very_ much for asking.  We sorely need this defined and the bugs 
> and quirks in "diff" ironed out.

Oh good, I'm glad it's not just me who thinks the broken bits need fixing!

> >1. 'svn diff' accepts two path/revision pairs, each representing a
> >subtree of a repository or working copy at a particular revision (or
> >BASE or WORKING for the wc). The output of 'svn diff' is something that,
> >when fed to GNU patch, will convert the first tree into the second tree.
> 
> Patch: not exactly; you need to say something about how only the 
> modifications to individual files are obeyed by Patch, and whether this 
> should include whole-file deletes that are part of a whole-subtree delete.
> 

Well, I'm defining the output of diff, not the behaviour of patch. I
mention that property modifications are ignored by patch further down;
is there anything else that needs to be clarified?

Why would whole-file deletes _not_ be included in the output of a subtree
delete? (We don't in one or two cases, but they're bugs, no?). They're
necessary to remove the files from the target.

> Also I should think we ought to say explicitly that tree-change information 
> should be included in the output in an unambiguous form that might be 
> usable by a hypothetical tree-aware Patch program, and can certainly be 
> used by a human.

"should", perhaps, but it's not at the moment. Not that I disagree with
you (though I don't know whether it's something we should overload onto
the textual diff format - the old tree-delta XML format was the first
thing I thought of using), but I'm trying to describe what the intended
output is at the moment.

> >3. The output of 'svn diff' will be identical (ignoring ordering)
> >regardless of whether the subtrees are in a working copy or repository.
> 
> And something about what the same diff in reverse order should look like.

No: there's no such thing in this model. A 'reverse diff' is simply a
diff with the source and target subtrees swapped.

> And, when we put this specification into more detail, something about how 
> the files and revisions are labeled.  In particular, what revision number 
> or "base" or "wc"-type label is appended to a file name in various 
> circumstances.
> 

Yes, that sounds like an excellent idea. For example, there's a bug at
the moment whereby for deleted files, we print the revision number of
the source tree anchor instead of the file. Things like that are very
hard at the moment to work out whether they're bugs or not.

> Hooray for you, Malcolm!

Heh. What have I started? :-)

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by Julian Foad <ju...@btopenworld.com>.
kfogel@collab.net wrote:
> Julian Foad <ju...@btopenworld.com> writes:
> 
>>Also I should think we ought to say explicitly that tree-change
>>information should be included in the output in an unambiguous form
>>that might be usable by a hypothetical tree-aware Patch program, and
>>can certainly be used by a human.
> 
> In the past, this has always gotten bogged down in the realization
> that we need to consult with a zillion other version control projects
[...]

I put the words "might" and "hypothetical" in there on purpose.  What I meant 
is that we should make a quick stab at writing output that states tree changes 
in a human-readable way that is not needlessly unparseable or incomplete, but 
we should not expend significant time and effort trying to design a "proper" 
tree-diff solution now before fixing the current "diff" problems.

> Do we want to give up on that goal?  Do we want to come up with our
> own parseable format that we know will work for Subversion, and then
> later if a common patch format develops (with or without our
> involvement), we can add an option to produce it?

That's not a bad intermediate goal.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by Ben Collins-Sussman <su...@red-bean.com>.
On 10 Oct 2005 14:49:16 -0500, kfogel@collab.net <kf...@collab.net> wrote:

> Well, I think we're past the hand-waving stage now.  To make progress,
> concrete proposals with examples are what's needed, IMHO... Do you
> agree?
>

By the way, last year Erik and I spent some time trying to revive the
old XML format that Subversion used in the early, early days.  Like
svk, it was just another format used to represent editor calls.  (It's
all we had before libsvn_fs existed;  'svn commit' would produce the
XML, and and 'svn up' would read the XML.)

Anyway, you can look at our progress here:

    http://svn.red-bean.com/repos/sussman/software/svnpatch/trunk/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: What does 'svn diff' do?

Posted by kf...@collab.net.
Michael Sweet <mi...@easysw.com> writes:
> I'm sure you could do something very similar, although I don't know
> if the SVK data chunk (base64-encoded, flate compressed, text data)
> would map cleanly to plain Subversion.  Unfortunately, I can't find
> any real documentation on the chunk format, and my Perl-fu skills
> aren't up to the SVK source...
> 
> Aside from that, SVK patches are just unified diffs, so at the very
> least you can come up with additional Subversion-specific data
> chunks that are ignored by patch(1) and SVK, at least initially.
> Perhaps some combination of the dump and unified diff formats?

Well, I think we're past the hand-waving stage now.  To make progress,
concrete proposals with examples are what's needed, IMHO... Do you
agree?

-Karl

-- 
www.collab.net  <>  CollabNet  |  Distributed Development On Demand

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by Michael Sweet <mi...@easysw.com>.
kfogel@collab.net wrote:
> Michael Sweet <mi...@easysw.com> writes:
>> No, however I am more optimistic about an effort that has even an
>> incomplete specification/proposal than one in which people say "this
>> is hard, we need to get with everyone else to define it".
>>
>> It is laudable to have a patch format that is usable by all version
>> control systems, however from a practical standpoint it is more
>> important to have a way to express patches beyond universal diffs
>> with Subversion (I personally don't care about anything else.)
> 
> Are there straightforward transformations that would make the SVK
> format useable for Subversion?  I'd love to stay as compatible with
> SVK as we can.

I'm sure you could do something very similar, although I don't know
if the SVK data chunk (base64-encoded, flate compressed, text data)
would map cleanly to plain Subversion.  Unfortunately, I can't find
any real documentation on the chunk format, and my Perl-fu skills
aren't up to the SVK source...

Aside from that, SVK patches are just unified diffs, so at the very
least you can come up with additional Subversion-specific data
chunks that are ignored by patch(1) and SVK, at least initially.
Perhaps some combination of the dump and unified diff formats?

-- 
______________________________________________________________________
Michael Sweet, Easy Software Products           mike at easysw dot com
Internet Printing and Publishing Software        http://www.easysw.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by kf...@collab.net.
Michael Sweet <mi...@easysw.com> writes:
> No, however I am more optimistic about an effort that has even an
> incomplete specification/proposal than one in which people say "this
> is hard, we need to get with everyone else to define it".
> 
> It is laudable to have a patch format that is usable by all version
> control systems, however from a practical standpoint it is more
> important to have a way to express patches beyond universal diffs
> with Subversion (I personally don't care about anything else.)

Are there straightforward transformations that would make the SVK
format useable for Subversion?  I'd love to stay as compatible with
SVK as we can.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by Michael Sweet <mi...@easysw.com>.
kfogel@collab.net wrote:
> Michael Sweet <mi...@easysw.com> writes:
>> How about just coming up with a proposed format, and then float it
>> among the other projects for comments?  IMHO, nothing will happen
>> until someone writes it up - the rest is "easy", you just tweak
>> things until everything is covered, and if you can't reach agreement
>> then you'll at least have something that will work for Subversion.
> 
> Um, right.
> 
> You should look over the past discussions and see how "easy" this is :-).

:)

FWIW, I *have* read the past discussions, and it sounds like the SVK
folks already have a working patch format, complete with metadata,
for Subversion.  Granted, this is probably not a general solution,
and I personally would prefer a metadata encoding that is human-
readable/easier to generate, however it is a place to start.

> (I don't mean to discourage you, but I do think you're being overly
> optimistic about the effort required.)

No, however I am more optimistic about an effort that has even an
incomplete specification/proposal than one in which people say "this
is hard, we need to get with everyone else to define it".

It is laudable to have a patch format that is usable by all version
control systems, however from a practical standpoint it is more
important to have a way to express patches beyond universal diffs
with Subversion (I personally don't care about anything else.)

-- 
______________________________________________________________________
Michael Sweet, Easy Software Products           mike at easysw dot com
Internet Printing and Publishing Software        http://www.easysw.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by kf...@collab.net.
Michael Sweet <mi...@easysw.com> writes:
> How about just coming up with a proposed format, and then float it
> among the other projects for comments?  IMHO, nothing will happen
> until someone writes it up - the rest is "easy", you just tweak
> things until everything is covered, and if you can't reach agreement
> then you'll at least have something that will work for Subversion.

Um, right.

You should look over the past discussions and see how "easy" this is :-).

(I don't mean to discourage you, but I do think you're being overly
optimistic about the effort required.)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by Michael Sweet <mi...@easysw.com>.
kfogel@collab.net wrote:
> Julian Foad <ju...@btopenworld.com> writes:
>> Also I should think we ought to say explicitly that tree-change
>> information should be included in the output in an unambiguous form
>> that might be usable by a hypothetical tree-aware Patch program, and
>> can certainly be used by a human.
> 
> In the past, this has always gotten bogged down in the realization
> that we need to consult with a zillion other version control projects
> and come up with a common patch format that supports tree- and
> metadata-changes as well as straight diff data.  In other words, a
> patch format that supports the union of all the operations all these
> version control systems support.
> 
> This is perfectly possible, of course, just hard.
> 
> Do we want to give up on that goal?  Do we want to come up with our
> own parseable format that we know will work for Subversion, and then
> later if a common patch format develops (with or without our
> involvement), we can add an option to produce it?

How about just coming up with a proposed format, and then float it
among the other projects for comments?  IMHO, nothing will happen
until someone writes it up - the rest is "easy", you just tweak
things until everything is covered, and if you can't reach agreement
then you'll at least have something that will work for Subversion.

-- 
______________________________________________________________________
Michael Sweet, Easy Software Products           mike at easysw dot com
Internet Printing and Publishing Software        http://www.easysw.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by kf...@collab.net.
Julian Foad <ju...@btopenworld.com> writes:
> Also I should think we ought to say explicitly that tree-change
> information should be included in the output in an unambiguous form
> that might be usable by a hypothetical tree-aware Patch program, and
> can certainly be used by a human.

In the past, this has always gotten bogged down in the realization
that we need to consult with a zillion other version control projects
and come up with a common patch format that supports tree- and
metadata-changes as well as straight diff data.  In other words, a
patch format that supports the union of all the operations all these
version control systems support.

This is perfectly possible, of course, just hard.

Do we want to give up on that goal?  Do we want to come up with our
own parseable format that we know will work for Subversion, and then
later if a common patch format develops (with or without our
involvement), we can add an option to produce it?

-Karl

-- 
www.collab.net  <>  CollabNet  |  Distributed Development On Demand

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What does 'svn diff' do?

Posted by Julian Foad <ju...@btopenworld.com>.
Malcolm Rowe wrote:
> 
> I'm hoping that someone can explain to me what output 'svn diff' is
> supposed to produce.

Thank you _very_ much for asking.  We sorely need this defined and the bugs and 
quirks in "diff" ironed out.


> That's not as stupid a question as it sounds - I've looked around at the
> book, the API documentation, and the help, and although they describe
> it in general terms, the specifics are absent.
> 
> So here's my attempt at a definition of the _basic_ functionality,
> ignoring --notice-ancestry and --no-diff-deleted: (and note that while
> the actual client command has three syntax variations, they can all be
> implemented using the following)
> 
> 1. 'svn diff' accepts two path/revision pairs, each representing a
> subtree of a repository or working copy at a particular revision (or
> BASE or WORKING for the wc). The output of 'svn diff' is something that,
> when fed to GNU patch, will convert the first tree into the second tree.

Patch: not exactly; you need to say something about how only the modifications 
to individual files are obeyed by Patch, and whether this should include 
whole-file deletes that are part of a whole-subtree delete.

Also I should think we ought to say explicitly that tree-change information 
should be included in the output in an unambiguous form that might be usable by 
a hypothetical tree-aware Patch program, and can certainly be used by a human.

> 2. The output also contains textual comments that describe directory and
> file property modifications, except for deleted files or directories,
> where property deletions are implicit.
> 
> 3. The output of 'svn diff' will be identical (ignoring ordering)
> regardless of whether the subtrees are in a working copy or repository.

And something about what the same diff in reverse order should look like.

And, when we put this specification into more detail, something about how the 
files and revisions are labeled.  In particular, what revision number or "base" 
or "wc"-type label is appended to a file name in various circumstances.


> Sounds good? We don't do that, of course, but most of the reasons why
> are bugs.
> 
> The reason why I'd like a definition is that I'm spending some time
> looking at the diff code at the moment. It's pretty complex already,
> and it's even harder to understand when we don't seem to have a design
> spec for the feature.

Here, here!


> Also:
> 
> There _is_ one place where we deliberately don't follow the text above,
> and I'd also like to understand whether it's the result of a deliberate
> design decision, or whether it's a bug that's been codified into
> a feature.
> 
> Here it is:
> For BASE->WORKING and repos->WORKING diffs, we do not show a new copied
> (i.e. schedule-add-with-history) file as 'added' in the output of 'svn
> diff'. Regardless of the requested revision, for these files, we show only
> local modifications (that is, any modifications made _after_ the copy).

As others have commented, a re-think and proper design of this behaviour would 
be good.  Of course we have backward compatibility guarantees, but let that not 
stop us from designing the best long-term solution first and then seeing what 
we are allowed to do towards it in the short term.


Hooray for you, Malcolm!

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org