You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2006/02/16 18:33:56 UTC

Ignoring space changes, but not all space changes

Right, please don't flame me.  This is a rambling train of thought that I might 
not still agree with after due consideration and hearing others' thought.


In our plan to support ignoring space changes in "diff", and then "blame" and 
presumably "merge" and perhaps "update", Peter has started by implementing

   ignore-space-change
   ignore-all-space
   ignore-eol-style

But what do we really want?

When I said I wanted "svn up" to ignore whitespace, Stefan Haller wrote:
> I must be misunderstanding what you have in mind here, but if it is what
> I think it is, then I don't think this is a good idea.  In C, whitespace
> is only insignificant between tokens; in string literals it is probably
> significant.  And in python, you sure don't want to silently ignore it
> if someone changes
> 
>     if x:
>       foo()
>       bar()
> 
> to
> 
>     if x:
>       foo()
>     bar()

He's quite right.  What we really want, right now, is (Use Case 1) to ignore 
changes in whitespace between a function name or expression and the following 
opening parenthesis, in C files only.  None of the suggested options can do 
that properly or even safely.

The second half of that, "in C files only", is an orthogonal issue.  It's a 
long-standing request for enhancement that we should be able to compare 
different types of files in different ways.  We can crudely do so already, by 
writing a "diff-cmd" script that looks at the file names.

The first half, that we want to ignore only certain space changes, is more 
pertinent at the moment.  We might be happy to generalise this particular 
requirement to "ignore space changes between any C tokens, as long as the 
tokens are separate both before and after the change (so "return a;" -> 
"returna;" is not allowed).  It still should not ignore changes to spacing in 
string literals, but probably should ignore spacing inside comments.

Use Case 2: Ignoring indentation changes after, say, inserting an "if () {...}" 
around a big block or just after reindenting subsequent lines to match a change 
on an initial line.  The feature needed for this is ignoring leading 
whitespace.  That's fairly simple and clear-cut.

Any other real use cases?

Use Case 3: Ignoring spacing changes in XML code.  You can see where this is 
going.  Not only are there different rules for where space is significant, but 
also this could only be done to a very limited extent on a single line, so we 
really need to include newlines in our space-ignore processing.  Oh, but that's 
getting a bit more complicated than we were prepared for.  Hmm.


I suggest it may be unwise to provide crude tools like "ignore all space" which 
require the user to take the result with a pinch of salt, i.e. not trust the 
result.  It's all very well for just viewing the diff by eye, but what about 
when we incorporate this facility in "merge" or "blame" or "update"?

One approach would be instead to introduce a larger set of more specific 
options: e.g. ignore-space-at-eol, ignore-space-at-start, 
ignore-space-between-c-tokens, etc., as well as ignore-all-space.

One concern is that Subversion's "diff" is starting to evolve into a 
general-purpose "diff" engine, and our "diff" support will become less and less 
pluggable because commands like "blame" and "update" (?) will be able to use 
its special features but not those of an external diff.  Is that true?  Maybe 
not, it depends on where this code is going, and I haven't looked at the patch. 
  Anyway, I would be very happy if we decided to let Subversion's diff evolve 
as a somewhat separate project with a looser coupling to Subversion, one in 
which it competed on equal terms with other diff programs.

I don't know.  We'd never get anywhere if we required every part of the system 
to be completely pure.  I just don't want to see us at the end of the year 
saying "Look what a mess we've got ourselves into by allowing ourselves to add 
such a crude feature in the heat of the moment."


Here's a completely different way of looking at the situation.  I don't use 
Subversion's internal diff, I have "diff-cmd=/usr/bin/diff".  I'll change that 
to "diff-cmd=~/bin/my-svn-diff" and write "my-svn-diff" that handles all of 
these space-ignoring modes, and more, and better, and make that available as a 
contributed tool, at least for unix-like systems.  Then we could ask, "Who are 
we addressing with these options to the built-in diff?"

Maybe the answer is, "It's just to provide some very basic tools to those who 
can't or don't want to use an external diff."

Quite likely Peter's doing the right thing.  I needed at least to work through 
these doubts in my mind and see if anyone else can shed light on them.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Ignoring space changes, but not all space changes

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Fri, 17 Feb 2006, Ph. Marek wrote:

> On Thursday 16 February 2006 23:27, Julian Foad wrote:
> > In languages like C, what we need is:
> >
> >    Ignore spaces completely when they're between tokens that don't
> > concatenate; ignore changes to the nonzero amount of space where tokens do
> > concatenate.
> >
> > The sets of tokens that do and do not concatenate are language-specific.
>
> I see lots of problems ahead.
...
>
>
> I believe that this should be solved like the diff-issue, ie. done by specific
> diff-programs, which get chosen based on the mime-type or the file(1) result.

I think it is fine to have the basic functionality that I committed
yesterday.  It has been in (GNU?) diff for years and people seem to use
it.

> Note: I'm not sure how much work for blame is done on the server - but if the
> comparisions are done there, all needed diff-programs would have to be
> installed there, too.
>
The diff is run on the client, but always using the internal diff program.
If we allow external diff programs to be plugged into blame, we'd need to
start parsing (various!) diff romats and hope for the best...


> Sure, you could expand the internal diff with various options, but that'll get
> you only part of the way ... a full solution would have to know about the
> file-type.

I don't want to go adding lots of whitespace handling options to
libsvn_diff.

Thanks,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Ignoring space changes, but not all space changes

Posted by "Ph. Marek" <ph...@bmlv.gv.at>.
On Thursday 16 February 2006 23:27, Julian Foad wrote:
> In languages like C, what we need is:
>
>    Ignore spaces completely when they're between tokens that don't
> concatenate; ignore changes to the nonzero amount of space where tokens do
> concatenate.
>
> The sets of tokens that do and do not concatenate are language-specific.

I see lots of problems ahead.

1) In python the whitespace *at the start of the line* is significant.
2) In C, Perl, ..., the whitespace is often the thing which has to be repaired 
*for visual reasons*, but should not necessarily show in diffs.
3) How are multiline-strings (valid eg in perl) and document-here handled?
4 ;-) What about whitespace (http://compsoc.dur.ac.uk/whitespace/) and 
Acme::Bleach (http://search.cpan.org/dist/Acme-Bleach/lib/Acme/Bleach.pm)?


I believe that this should be solved like the diff-issue, ie. done by specific 
diff-programs, which get chosen based on the mime-type or the file(1) result.
Note: I'm not sure how much work for blame is done on the server - but if the 
comparisions are done there, all needed diff-programs would have to be 
installed there, too.

Sure, you could expand the internal diff with various options, but that'll get 
you only part of the way ... a full solution would have to know about the 
file-type.


Regards,

Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Ignoring space changes, but not all space changes

Posted by Julian Foad <ju...@btopenworld.com>.
kfogel@collab.net wrote:
> 
> I think this misses the main case for ignoring whitespace changes.  In
> many formats, changes in the "nonzero amount" of whitespace are
> insignificant, but changes in the *presence* of whitespace are
> significant.  In other words
> 
>    foo           bar
>    foo     bar
>    foo bar
> 
> are often semantically all the same, but completely different from
> 
>    foobar

Ah, but that's only true when the characters on either side concatenate into a 
single token.

   f ( - - x )

is equivalent to

   f(- -x)

but they differ from

   f (--x)

> This distinction corresponds to the GNU diff -b and -w options.  [...]

Neither of those actually does what we want.  Ignoring all space is unsafe, 
whereas ignoring changes to existing space is safe but incomplete (it wouldn't 
be any use on the recent reformatting).

In languages like C, what we need is:

   Ignore spaces completely when they're between tokens that don't concatenate;
   ignore changes to the nonzero amount of space where tokens do concatenate.

The sets of tokens that do and do not concatenate are language-specific.


> This is not really language-specific -- it's common to so many formats
> as to be generally useful & deserving of native support, IMHO.

I'm pretty much resigned to having these basic options because they are better 
than nothing when used with a bit of human-brain filtering and luck, but they 
don't really cut the mustard as tools for the job.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Ignoring space changes, but not all space changes

Posted by kf...@collab.net.
Julian Foad <ju...@btopenworld.com> writes:
> He's quite right.  What we really want, right now, is (Use Case 1) to
> ignore changes in whitespace between a function name or expression and
> the following opening parenthesis, in C files only.  None of the
> suggested options can do that properly or even safely.
> 
> [...]
> 
> Any other real use cases?
> 
> Use Case 3: Ignoring spacing changes in XML code.  You can see where
> this is going.  Not only are there different rules for where space is
> significant, but also this could only be done to a very limited extent
> on a single line, so we really need to include newlines in our
> space-ignore processing.  Oh, but that's getting a bit more
> complicated than we were prepared for.  Hmm.
> 
> I suggest it may be unwise to provide crude tools like "ignore all
> space" which require the user to take the result with a pinch of salt,
> i.e. not trust the result.  It's all very well for just viewing the
> diff by eye, but what about when we incorporate this facility in
> "merge" or "blame" or "update"?
> 
> One approach would be instead to introduce a larger set of more
> specific options: e.g. ignore-space-at-eol, ignore-space-at-start,
> ignore-space-between-c-tokens, etc., as well as ignore-all-space.

I think this misses the main case for ignoring whitespace changes.  In
many formats, changes in the "nonzero amount" of whitespace are
insignificant, but changes in the *presence* of whitespace are
significant.  In other words

   foo           bar
   foo     bar
   foo bar

are often semantically all the same, but completely different from

   foobar

This distinction corresponds to the GNU diff -b and -w options.  (Not
sure whether to make a 2-by-2 behavior grid differentiating horizontal
from vertical whitespace, but we can talk about that later.)

This is not really language-specific -- it's common to so many formats
as to be generally useful & deserving of native support, IMHO.

I think this has been mentioned in other recent threads.  But I didn't
see it discussed anywhere in your summary mail, so I wanted to state
it in direct response.

Best,
-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Ignoring space changes, but not all space changes

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Thu, 16 Feb 2006, Julian Foad wrote:

> Right, please don't flame me.  This is a rambling train of thought that I might
> not still agree with after due consideration and hearing others' thought.
>

Flame, flame, flame! :-)
>
> In our plan to support ignoring space changes in "diff", and then "blame" and
> presumably "merge" and perhaps "update", Peter has started by implementing
>
>    ignore-space-change
>    ignore-all-space
>    ignore-eol-style
>
> But what do we really want?
>
> When I said I wanted "svn up" to ignore whitespace, Stefan Haller wrote:
> > I must be misunderstanding what you have in mind here, but if it is what
> > I think it is, then I don't think this is a good idea.  In C, whitespace
> > is only insignificant between tokens; in string literals it is probably
> > significant.  And in python, you sure don't want to silently ignore it
> > if someone changes
> >
> >     if x:
> >       foo()
> >       bar()
> >
> > to
> >
> >     if x:
> >       foo()
> >     bar()
>
> He's quite right.  What we really want, right now, is (Use Case 1) to ignore
...

I think these options are generally useful.  Using them for languages tha
relies on whitespace extensively seems wrong to me

For a language like C or Java, they will most often dtrt, and I think that
if you use these options, you need to be careful and check the results.
Adding a lot of different --ignore-whitespace-blah options seems to go
overboard to me.

Thanks,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org