You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Alexey Neyman <st...@att.net> on 2010/08/14 01:54:19 UTC

svn log --show-diff behavior

[Posting here as 'svn log --show-diff' is currently only available in trunk]

Hi All,

I am trying out 1.7.0 (yes, I know it's a bit early) and noticed that svn log 
now supports --show-diff. It's very convenient, thanks.

As the comments in the code explain, in case the path was created in some 
revision, the function will attempt to go one level up until it finds a parent 
that existed in both 'rev' and 'rev-1' revisions. This may result in LOTS of 
extra files - sometimes, the whole revision diff may be dumped. Consider what 
happens if the file being diff'ed originated as a part of 20Mb 'svn import'.

Judging from the commit message, this option should mimic 'git log -p' 
behavior, but git does not dump the whole diff by default when -p is 
specified. It only diffs the requested path and produces a full diff only if 
--full-diff option is also passed.

From http://www.kernel.org/pub/software/scm/git/docs/git-log.html

[[[
--full-diff 
 Without this flag, "git log -p <path>…" shows commits that touch the 
specified paths, and diffs about the same specified paths. With this, the full 
diff is shown for commits that touch the specified paths; this means that 
"<path>…" limits only commits, and doesn't limit diff for those commits.
]]]

Now, I understand this approach was chosen due to svn_client_diff5() returning 
an error if the path does not exist in the specified revision. So, first 
question is why svn_client_diff5() does not consider missing file as if it 
were an empty file? It does that thing when existing parent directory is 
diff'ed...

And, if svn_client_diff5() has valid reasons to produce an error in such case, 
why doesn't 'svn log --diff' just stop at that point, perhaps with some 
meaningful message that the specified file was created in that revision?

Regards,
Alexey.

Re: svn log --show-diff behavior

Posted by Alexey Neyman <st...@att.net>.
On Sunday, August 15, 2010 12:58:46 pm Stefan Sperling wrote:
> On Sat, Aug 14, 2010 at 11:47:52AM -0700, Alexey Neyman wrote:
> > On Saturday, August 14, 2010 04:31:44 am Stefan Sperling wrote:
> > > > Judging from the commit message, this option should mimic 'git log
> > > > -p' behavior, but git does not dump the whole diff by default when
> > > > -p is specified. It only diffs the requested path and produces a
> > > > full diff only if --full-diff option is also passed.
> > > 
> > > There is no need for a --full-diff option.
> > > If you have a commit with two paths, A and B, and you run svn log
> > > --diff A, you will only see a diff for A. So you can limit the scope
> > > of the diff shown by limiting the log operation to just the paths you
> > > are interested in.
> > 
> > I know that 'svn log' can take multiple paths. But still, there is a use
> > case for --full-diff:
> > 
> > a) One does not know the list of changed paths beforehand. Therefore, one
> > first needs to run 'svn log -v' to find what paths were changed and then
> > run 'svn log --diff' with those paths. This may be quite inconvenient if
> > the paths are scattered over the repository.
> > 
> > b) Suppose A is the on which one wants log with diffs; B and C are paths
> > that were changed alongside A:
> > 
> > rev X1: A B
> > rev X2: B
> > rev X3: A C
> > rev X4: C
> > 
> > Even if full list of paths is supplied, 'svn log --diff A B C' would show
> > revisions X1, X2, X3, X4, while 'svn log --diff --full-diff A' would have
> > shown only revisions X1 and X3 (but with corresponding changes
> > to paths B and C)
> 
> Hmmm. You could also list the revisions you want explicitly, but that
> would be awkward to use in complicated cases so I see your point.
> Would you like to try to write and submit a patch that implements this
> option?

I'll try, but first I'd like to see if diff can be made to treat a missing 
file as empty. Without it, log --diff would not be usable for me: in our 
repository, most of the files originated as a part of some large import. So, 
running 'svn log --diff' currently produces the output, 99% of which is not 
relevant to the file being logged :(

> > > It may be possible to show the diff anyway treating one of the sides as
> > > empty as you suggest, and thus showing a diff with only adds/deletes.
> > > I agree that this would make sense.
> > > 
> > > But making such a change may require substantial work within the
> > > Subversion libraries, so it takes more time than the quick band-aid
> > > fix which I applied to log-cmd.c to make it produce at least some sort
> > > of meaningful output. Sometimes circumstances force people to take
> > > short cuts...
> > 
> > Thing is, current approach does not work at all if the parent directory
> > was also added in the same revision as the path being logged. Try this:
> > 
> > [[[
> > #! /bin/bash
> > rm -rf tst-repo tst-wc
> > svnadmin create tst-repo
> > svn mkdir -m commit_1 "file://`pwd`/tst-repo/a"
> > svn co "file://`pwd`/tst-repo/a" tst-wc
> > svn mkdir tst-wc/b
> > echo 3 > tst-wc/b/3
> > echo 4 > tst-wc/b/4
> > svn add tst-wc/b/[34]
> > svn ci -m commit_2 tst-wc
> > svn log --diff tst-wc/b/3
> > ]]]
> > 
> > 'svn log' at the end of this script hangs indefinitely.
> 
> That was a silly bug I fixed yesterday. Please update your working copy,
> rebuild, and check. Your script runs fine for me with current trunk.

Thanks!

> > > It should show a diff which shows things as added. We need the output
> > > to be suitable for use with svn patch. A special message that only
> > > humans will understand isn't useful. Showing a diff with too broad a
> > > scope is much better than showing no diff at all.
> > 
> > Given that the changes are in the last-revision-first order, this patch
> > would only be applicable in the reverse direction (if it contains more
> > than one revision).
> 
> The last-revision-first order only affects the order in which revisions
> are listed. svn log --diff always shows diffs from revision N-1 to revision
> N, so the revision range of the log operation should not matter.

Oops, I didn't notice that revisions in 'svn log' output can sorted ascending 
or descending, depending on whether the revision range is M:N or N:M. With 
that in mind, I see how the output of 'svn log --diff' could be passed to the 
patch.

However, I still think that 'svn diff' is more convenient as an input to 'svn 
patch'. I think that 'svn log --diff' is most useful for human review - it 
allows to see both log message and changes at the same time.

Regards,
Alexey.

Re: svn log --show-diff behavior

Posted by Stefan Sperling <st...@elego.de>.
On Sat, Aug 14, 2010 at 11:47:52AM -0700, Alexey Neyman wrote:
> On Saturday, August 14, 2010 04:31:44 am Stefan Sperling wrote:
> > > Judging from the commit message, this option should mimic 'git log -p'
> > > behavior, but git does not dump the whole diff by default when -p is
> > > specified. It only diffs the requested path and produces a full diff only
> > > if --full-diff option is also passed.
> > 
> > There is no need for a --full-diff option.
> > If you have a commit with two paths, A and B, and you run svn log --diff A,
> > you will only see a diff for A. So you can limit the scope of the diff
> > shown by limiting the log operation to just the paths you are interested
> > in.
> 
> I know that 'svn log' can take multiple paths. But still, there is a use case 
> for --full-diff:
> 
> a) One does not know the list of changed paths beforehand. Therefore, one 
> first needs to run 'svn log -v' to find what paths were changed and then run 
> 'svn log --diff' with those paths. This may be quite inconvenient if the paths 
> are scattered over the repository.
> 
> b) Suppose A is the on which one wants log with diffs; B and C are paths that 
> were changed alongside A:
> 
> rev X1: A B
> rev X2: B
> rev X3: A C
> rev X4: C
> 
> Even if full list of paths is supplied, 'svn log --diff A B C' would show 
> revisions X1, X2, X3, X4, while 'svn log --diff --full-diff A' would have 
> shown only revisions X1 and X3 (but with corresponding changes
> to paths B and C)

Hmmm. You could also list the revisions you want explicitly, but that
would be awkward to use in complicated cases so I see your point.
Would you like to try to write and submit a patch that implements this option?

> > It may be possible to show the diff anyway treating one of the sides as
> > empty as you suggest, and thus showing a diff with only adds/deletes.
> > I agree that this would make sense.
> >
> > But making such a change may require substantial work within the Subversion
> > libraries, so it takes more time than the quick band-aid fix which I
> > applied to log-cmd.c to make it produce at least some sort of meaningful
> > output. Sometimes circumstances force people to take short cuts...
>  
> Thing is, current approach does not work at all if the parent directory was 
> also added in the same revision as the path being logged. Try this:
> 
> [[[
> #! /bin/bash
> rm -rf tst-repo tst-wc
> svnadmin create tst-repo
> svn mkdir -m commit_1 "file://`pwd`/tst-repo/a"
> svn co "file://`pwd`/tst-repo/a" tst-wc
> svn mkdir tst-wc/b
> echo 3 > tst-wc/b/3
> echo 4 > tst-wc/b/4
> svn add tst-wc/b/[34]
> svn ci -m commit_2 tst-wc
> svn log --diff tst-wc/b/3
> ]]]
> 
> 'svn log' at the end of this script hangs indefinitely.

That was a silly bug I fixed yesterday. Please update your working copy,
rebuild, and check. Your script runs fine for me with current trunk.

> > It should show a diff which shows things as added. We need the output to
> > be suitable for use with svn patch. A special message that only humans
> > will understand isn't useful. Showing a diff with too broad a scope is
> > much better than showing no diff at all.
> 
> Given that the changes are in the last-revision-first order, this patch would 
> only be applicable in the reverse direction (if it contains more than one 
> revision).

The last-revision-first order only affects the order in which revisions
are listed. svn log --diff always shows diffs from revision N-1 to revision N,
so the revision range of the log operation should not matter.

Stefan

Re: svn log --show-diff behavior

Posted by Alexey Neyman <st...@att.net>.
On Saturday, August 14, 2010 04:31:44 am Stefan Sperling wrote:
> > Judging from the commit message, this option should mimic 'git log -p'
> > behavior, but git does not dump the whole diff by default when -p is
> > specified. It only diffs the requested path and produces a full diff only
> > if --full-diff option is also passed.
> 
> There is no need for a --full-diff option.
> If you have a commit with two paths, A and B, and you run svn log --diff A,
> you will only see a diff for A. So you can limit the scope of the diff
> shown by limiting the log operation to just the paths you are interested
> in.

I know that 'svn log' can take multiple paths. But still, there is a use case 
for --full-diff:

a) One does not know the list of changed paths beforehand. Therefore, one 
first needs to run 'svn log -v' to find what paths were changed and then run 
'svn log --diff' with those paths. This may be quite inconvenient if the paths 
are scattered over the repository.

b) Suppose A is the on which one wants log with diffs; B and C are paths that 
were changed alongside A:

rev X1: A B
rev X2: B
rev X3: A C
rev X4: C

Even if full list of paths is supplied, 'svn log --diff A B C' would show 
revisions X1, X2, X3, X4, while 'svn log --diff --full-diff A' would have 
shown only revisions X1 and X3 (but with corresponding changes
to paths B and C)

> It may be possible to show the diff anyway treating one of the sides as
> empty as you suggest, and thus showing a diff with only adds/deletes.
> I agree that this would make sense.
>
> But making such a change may require substantial work within the Subversion
> libraries, so it takes more time than the quick band-aid fix which I
> applied to log-cmd.c to make it produce at least some sort of meaningful
> output. Sometimes circumstances force people to take short cuts...
 
Thing is, current approach does not work at all if the parent directory was 
also added in the same revision as the path being logged. Try this:

[[[
#! /bin/bash
rm -rf tst-repo tst-wc
svnadmin create tst-repo
svn mkdir -m commit_1 "file://`pwd`/tst-repo/a"
svn co "file://`pwd`/tst-repo/a" tst-wc
svn mkdir tst-wc/b
echo 3 > tst-wc/b/3
echo 4 > tst-wc/b/4
svn add tst-wc/b/[34]
svn ci -m commit_2 tst-wc
svn log --diff tst-wc/b/3
]]]

'svn log' at the end of this script hangs indefinitely.

> It should show a diff which shows things as added. We need the output to
> be suitable for use with svn patch. A special message that only humans
> will understand isn't useful. Showing a diff with too broad a scope is
> much better than showing no diff at all.

Given that the changes are in the last-revision-first order, this patch would 
only be applicable in the reverse direction (if it contains more than one 
revision).

Regards,
Alexey.

Re: svn log --show-diff behavior

Posted by Stefan Sperling <st...@elego.de>.
On Fri, Aug 13, 2010 at 06:54:19PM -0700, Alexey Neyman wrote:
> [Posting here as 'svn log --show-diff' is currently only available in trunk]
> 
> Hi All,
> 
> I am trying out 1.7.0 (yes, I know it's a bit early) and noticed that svn log 
> now supports --show-diff. It's very convenient, thanks.
> 
> As the comments in the code explain, in case the path was created in some 
> revision, the function will attempt to go one level up until it finds a parent 
> that existed in both 'rev' and 'rev-1' revisions. This may result in LOTS of 
> extra files - sometimes, the whole revision diff may be dumped. Consider what 
> happens if the file being diff'ed originated as a part of 20Mb 'svn import'.
> 
> Judging from the commit message, this option should mimic 'git log -p' 
> behavior, but git does not dump the whole diff by default when -p is 
> specified. It only diffs the requested path and produces a full diff only if 
> --full-diff option is also passed.

There is no need for a --full-diff option.
If you have a commit with two paths, A and B, and you run svn log --diff A,
you will only see a diff for A. So you can limit the scope of the diff shown
by limiting the log operation to just the paths you are interested in.

This works fine unless in the situation you describe, unfortunately.
As soon as we step up into the parent, the diff will include everything
beneath the parent.

> From http://www.kernel.org/pub/software/scm/git/docs/git-log.html
> 
> [[[
> --full-diff 
>  Without this flag, "git log -p <path>…" shows commits that touch the 
> specified paths, and diffs about the same specified paths. With this, the full 
> diff is shown for commits that touch the specified paths; this means that 
> "<path>…" limits only commits, and doesn't limit diff for those commits.
> ]]]
> 
> Now, I understand this approach was chosen due to svn_client_diff5() returning 
> an error if the path does not exist in the specified revision. So, first 
> question is why svn_client_diff5() does not consider missing file as if it 
> were an empty file? It does that thing when existing parent directory is 
> diff'ed...

I don't really know off-hand why it doesn't do this.
Quite possibly this is simply the result of organic growth of the code
base over the years, or a wrong design decision a long time ago, or a case
of "we'll deal with this later...". And nobody has so far been bothered
enough to fix it.

There are quite a few oddities in the diff implementation, e.g. this one
which was fixed recently (but also by using a band-aid approach):
http://subversion.tigris.org/issues/show_bug.cgi?id=2333
The plan to "rewrite diff" has actually been an in-joke between some of the
developers for a while, similar to "rewriting the working copy" (which some
developers assumed would never happen, but is happening now).

It may be possible to show the diff anyway treating one of the sides as
empty as you suggest, and thus showing a diff with only adds/deletes.
I agree that this would make sense.
But making such a change may require substantial work within the Subversion
libraries, so it takes more time than the quick band-aid fix which I applied
to log-cmd.c to make it produce at least some sort of meaningful output.
Sometimes circumstances force people to take short cuts...

See also this issue: http://subversion.tigris.org/issues/show_bug.cgi?id=2873
If that issue was fixed we could remove the log-cmd.c workaround, I think.
 
> And, if svn_client_diff5() has valid reasons to produce an error in such case, 
> why doesn't 'svn log --diff' just stop at that point, perhaps with some 
> meaningful message that the specified file was created in that revision?

It should show a diff which shows things as added. We need the output to
be suitable for use with svn patch. A special message that only humans
will understand isn't useful. Showing a diff with too broad a scope is
much better than showing no diff at all.

Stefan