You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2006/02/16 18:33:56 UTC
Ignoring space changes, but not all space changes
Right, please don't flame me. This is a rambling train of thought that I might
not still agree with after due consideration and hearing others' thought.
In our plan to support ignoring space changes in "diff", and then "blame" and
presumably "merge" and perhaps "update", Peter has started by implementing
ignore-space-change
ignore-all-space
ignore-eol-style
But what do we really want?
When I said I wanted "svn up" to ignore whitespace, Stefan Haller wrote:
> I must be misunderstanding what you have in mind here, but if it is what
> I think it is, then I don't think this is a good idea. In C, whitespace
> is only insignificant between tokens; in string literals it is probably
> significant. And in python, you sure don't want to silently ignore it
> if someone changes
>
> if x:
> foo()
> bar()
>
> to
>
> if x:
> foo()
> bar()
He's quite right. What we really want, right now, is (Use Case 1) to ignore
changes in whitespace between a function name or expression and the following
opening parenthesis, in C files only. None of the suggested options can do
that properly or even safely.
The second half of that, "in C files only", is an orthogonal issue. It's a
long-standing request for enhancement that we should be able to compare
different types of files in different ways. We can crudely do so already, by
writing a "diff-cmd" script that looks at the file names.
The first half, that we want to ignore only certain space changes, is more
pertinent at the moment. We might be happy to generalise this particular
requirement to "ignore space changes between any C tokens, as long as the
tokens are separate both before and after the change (so "return a;" ->
"returna;" is not allowed). It still should not ignore changes to spacing in
string literals, but probably should ignore spacing inside comments.
Use Case 2: Ignoring indentation changes after, say, inserting an "if () {...}"
around a big block or just after reindenting subsequent lines to match a change
on an initial line. The feature needed for this is ignoring leading
whitespace. That's fairly simple and clear-cut.
Any other real use cases?
Use Case 3: Ignoring spacing changes in XML code. You can see where this is
going. Not only are there different rules for where space is significant, but
also this could only be done to a very limited extent on a single line, so we
really need to include newlines in our space-ignore processing. Oh, but that's
getting a bit more complicated than we were prepared for. Hmm.
I suggest it may be unwise to provide crude tools like "ignore all space" which
require the user to take the result with a pinch of salt, i.e. not trust the
result. It's all very well for just viewing the diff by eye, but what about
when we incorporate this facility in "merge" or "blame" or "update"?
One approach would be instead to introduce a larger set of more specific
options: e.g. ignore-space-at-eol, ignore-space-at-start,
ignore-space-between-c-tokens, etc., as well as ignore-all-space.
One concern is that Subversion's "diff" is starting to evolve into a
general-purpose "diff" engine, and our "diff" support will become less and less
pluggable because commands like "blame" and "update" (?) will be able to use
its special features but not those of an external diff. Is that true? Maybe
not, it depends on where this code is going, and I haven't looked at the patch.
Anyway, I would be very happy if we decided to let Subversion's diff evolve
as a somewhat separate project with a looser coupling to Subversion, one in
which it competed on equal terms with other diff programs.
I don't know. We'd never get anywhere if we required every part of the system
to be completely pure. I just don't want to see us at the end of the year
saying "Look what a mess we've got ourselves into by allowing ourselves to add
such a crude feature in the heat of the moment."
Here's a completely different way of looking at the situation. I don't use
Subversion's internal diff, I have "diff-cmd=/usr/bin/diff". I'll change that
to "diff-cmd=~/bin/my-svn-diff" and write "my-svn-diff" that handles all of
these space-ignoring modes, and more, and better, and make that available as a
contributed tool, at least for unix-like systems. Then we could ask, "Who are
we addressing with these options to the built-in diff?"
Maybe the answer is, "It's just to provide some very basic tools to those who
can't or don't want to use an external diff."
Quite likely Peter's doing the right thing. I needed at least to work through
these doubts in my mind and see if anyone else can shed light on them.
- Julian
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: Ignoring space changes, but not all space changes
Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Fri, 17 Feb 2006, Ph. Marek wrote:
> On Thursday 16 February 2006 23:27, Julian Foad wrote:
> > In languages like C, what we need is:
> >
> > Ignore spaces completely when they're between tokens that don't
> > concatenate; ignore changes to the nonzero amount of space where tokens do
> > concatenate.
> >
> > The sets of tokens that do and do not concatenate are language-specific.
>
> I see lots of problems ahead.
...
>
>
> I believe that this should be solved like the diff-issue, ie. done by specific
> diff-programs, which get chosen based on the mime-type or the file(1) result.
I think it is fine to have the basic functionality that I committed
yesterday. It has been in (GNU?) diff for years and people seem to use
it.
> Note: I'm not sure how much work for blame is done on the server - but if the
> comparisions are done there, all needed diff-programs would have to be
> installed there, too.
>
The diff is run on the client, but always using the internal diff program.
If we allow external diff programs to be plugged into blame, we'd need to
start parsing (various!) diff romats and hope for the best...
> Sure, you could expand the internal diff with various options, but that'll get
> you only part of the way ... a full solution would have to know about the
> file-type.
I don't want to go adding lots of whitespace handling options to
libsvn_diff.
Thanks,
//Peter
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: Ignoring space changes, but not all space changes
Posted by "Ph. Marek" <ph...@bmlv.gv.at>.
On Thursday 16 February 2006 23:27, Julian Foad wrote:
> In languages like C, what we need is:
>
> Ignore spaces completely when they're between tokens that don't
> concatenate; ignore changes to the nonzero amount of space where tokens do
> concatenate.
>
> The sets of tokens that do and do not concatenate are language-specific.
I see lots of problems ahead.
1) In python the whitespace *at the start of the line* is significant.
2) In C, Perl, ..., the whitespace is often the thing which has to be repaired
*for visual reasons*, but should not necessarily show in diffs.
3) How are multiline-strings (valid eg in perl) and document-here handled?
4 ;-) What about whitespace (http://compsoc.dur.ac.uk/whitespace/) and
Acme::Bleach (http://search.cpan.org/dist/Acme-Bleach/lib/Acme/Bleach.pm)?
I believe that this should be solved like the diff-issue, ie. done by specific
diff-programs, which get chosen based on the mime-type or the file(1) result.
Note: I'm not sure how much work for blame is done on the server - but if the
comparisions are done there, all needed diff-programs would have to be
installed there, too.
Sure, you could expand the internal diff with various options, but that'll get
you only part of the way ... a full solution would have to know about the
file-type.
Regards,
Phil
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: Ignoring space changes, but not all space changes
Posted by Julian Foad <ju...@btopenworld.com>.
kfogel@collab.net wrote:
>
> I think this misses the main case for ignoring whitespace changes. In
> many formats, changes in the "nonzero amount" of whitespace are
> insignificant, but changes in the *presence* of whitespace are
> significant. In other words
>
> foo bar
> foo bar
> foo bar
>
> are often semantically all the same, but completely different from
>
> foobar
Ah, but that's only true when the characters on either side concatenate into a
single token.
f ( - - x )
is equivalent to
f(- -x)
but they differ from
f (--x)
> This distinction corresponds to the GNU diff -b and -w options. [...]
Neither of those actually does what we want. Ignoring all space is unsafe,
whereas ignoring changes to existing space is safe but incomplete (it wouldn't
be any use on the recent reformatting).
In languages like C, what we need is:
Ignore spaces completely when they're between tokens that don't concatenate;
ignore changes to the nonzero amount of space where tokens do concatenate.
The sets of tokens that do and do not concatenate are language-specific.
> This is not really language-specific -- it's common to so many formats
> as to be generally useful & deserving of native support, IMHO.
I'm pretty much resigned to having these basic options because they are better
than nothing when used with a bit of human-brain filtering and luck, but they
don't really cut the mustard as tools for the job.
- Julian
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: Ignoring space changes, but not all space changes
Posted by kf...@collab.net.
Julian Foad <ju...@btopenworld.com> writes:
> He's quite right. What we really want, right now, is (Use Case 1) to
> ignore changes in whitespace between a function name or expression and
> the following opening parenthesis, in C files only. None of the
> suggested options can do that properly or even safely.
>
> [...]
>
> Any other real use cases?
>
> Use Case 3: Ignoring spacing changes in XML code. You can see where
> this is going. Not only are there different rules for where space is
> significant, but also this could only be done to a very limited extent
> on a single line, so we really need to include newlines in our
> space-ignore processing. Oh, but that's getting a bit more
> complicated than we were prepared for. Hmm.
>
> I suggest it may be unwise to provide crude tools like "ignore all
> space" which require the user to take the result with a pinch of salt,
> i.e. not trust the result. It's all very well for just viewing the
> diff by eye, but what about when we incorporate this facility in
> "merge" or "blame" or "update"?
>
> One approach would be instead to introduce a larger set of more
> specific options: e.g. ignore-space-at-eol, ignore-space-at-start,
> ignore-space-between-c-tokens, etc., as well as ignore-all-space.
I think this misses the main case for ignoring whitespace changes. In
many formats, changes in the "nonzero amount" of whitespace are
insignificant, but changes in the *presence* of whitespace are
significant. In other words
foo bar
foo bar
foo bar
are often semantically all the same, but completely different from
foobar
This distinction corresponds to the GNU diff -b and -w options. (Not
sure whether to make a 2-by-2 behavior grid differentiating horizontal
from vertical whitespace, but we can talk about that later.)
This is not really language-specific -- it's common to so many formats
as to be generally useful & deserving of native support, IMHO.
I think this has been mentioned in other recent threads. But I didn't
see it discussed anywhere in your summary mail, so I wanted to state
it in direct response.
Best,
-Karl
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: Ignoring space changes, but not all space changes
Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Thu, 16 Feb 2006, Julian Foad wrote:
> Right, please don't flame me. This is a rambling train of thought that I might
> not still agree with after due consideration and hearing others' thought.
>
Flame, flame, flame! :-)
>
> In our plan to support ignoring space changes in "diff", and then "blame" and
> presumably "merge" and perhaps "update", Peter has started by implementing
>
> ignore-space-change
> ignore-all-space
> ignore-eol-style
>
> But what do we really want?
>
> When I said I wanted "svn up" to ignore whitespace, Stefan Haller wrote:
> > I must be misunderstanding what you have in mind here, but if it is what
> > I think it is, then I don't think this is a good idea. In C, whitespace
> > is only insignificant between tokens; in string literals it is probably
> > significant. And in python, you sure don't want to silently ignore it
> > if someone changes
> >
> > if x:
> > foo()
> > bar()
> >
> > to
> >
> > if x:
> > foo()
> > bar()
>
> He's quite right. What we really want, right now, is (Use Case 1) to ignore
...
I think these options are generally useful. Using them for languages tha
relies on whitespace extensively seems wrong to me
For a language like C or Java, they will most often dtrt, and I think that
if you use these options, you need to be careful and check the results.
Adding a lot of different --ignore-whitespace-blah options seems to go
overboard to me.
Thanks,
//Peter
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org