You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Näslund <da...@longitudo.com> on 2010/04/14 14:21:56 UTC

#3610, How make 'svn patch' able to use the targets lines for intermediate context?

Hi!

#3610 [1] is about ignoring white space changes to be able to apply
patches that have been mangled, i.e. a mailing software converting tabs
to spaces, removing trailing white spaces or leading white spaces.

It means that if a certain option is given, we will match and apply
hunks if the only thing differing is white spaces.

The problem
-------------
The '+' lines will be applied with the white spaces changes
in the patch. That's the intended behaviour. But it would be preferable if
the context lines would be read from the target. For leading and
trailing context, I can do that. But in the case when there are context
lines in the middle of a hunk I can't since svn_hunk_t uses a stream
called modified_text that makes no distinction between '+' lines and context
lines, e.g. the leading ' ' or '+' are removed.

The options
------------
1) Just say that the whole hunk will be applied and we make no
guarantees on what the white spaces will be.

2) Change svn_hunk_t to make it possible to differ between context and
'+' lines. I really hope I don't have to do that since using
modified_text and original_text is a clean solution and all the code in
libsvn_client/patch.c depends on the current representation.

3) Introduce some sort of list of offsets into the hunk for where
intermediate context starts and ends. Yeah, that sucks! 

Any suggestions? I'd go with option 1, since the other two are
complicated and ugly to implement.

cheers,
Daniel

[1] http://subversion.tigris.org/issues/show_bug.cgi?id=3610

Re: #3610, How make 'svn patch' able to use the targets lines for intermediate context?

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Apr 14, 2010 at 05:49:51PM +0200, Daniel Näslund wrote:
> On Wed, Apr 14, 2010 at 05:35:36PM +0200, Stefan Sperling wrote:
> > I.e. we'll add some optional magic in match_hunk(), making it skip over
> > whitespace on either side (as determined by isspace()), but comparing
> > any other characters.
> 
> Note that a whitespace may be tabs beeing translated to spaces, e.g.
> it's not enough to just look at the trailing or leading whitespaces.

Yes. I meant either side as in "original and modified text", not as
in "start and end" :)  I could have phrased this better.

> I've used this approach (comparing the the lines with all whitespaces
> removed):
> [[[
> if (ignore_whitespaces)
>   {
>     char *stripped_hunk_line = apr_pstrdup(pool, 
>                                            hunk_line_translated);
>     char *stripped_target_line = apr_pstrdup(pool, target_line);
> 
>     apr_collapse_spaces(stripped_hunk_line,
>                         hunk_line_translated);
>     apr_collapse_spaces(stripped_target_line, target_line);
>     lines_matched = ! strcmp(stripped_hunk_line,
>                              stripped_target_line);
>   }
> ]]]

Looks fine.

Stefan

Re: #3610, How make 'svn patch' able to use the targets lines for intermediate context?

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Apr 14, 2010 at 05:49:51PM +0200, Daniel Näslund wrote:
> On Wed, Apr 14, 2010 at 05:35:36PM +0200, Stefan Sperling wrote:
> > I.e. we'll add some optional magic in match_hunk(), making it skip over
> > whitespace on either side (as determined by isspace()), but comparing
> > any other characters.
> 
> Note that a whitespace may be tabs beeing translated to spaces, e.g.
> it's not enough to just look at the trailing or leading whitespaces.

Yes. I meant either side as in "original and modified text", not as
in "start and end" :)  I could have phrased this better.

> I've used this approach (comparing the the lines with all whitespaces
> removed):
> [[[
> if (ignore_whitespaces)
>   {
>     char *stripped_hunk_line = apr_pstrdup(pool, 
>                                            hunk_line_translated);
>     char *stripped_target_line = apr_pstrdup(pool, target_line);
> 
>     apr_collapse_spaces(stripped_hunk_line,
>                         hunk_line_translated);
>     apr_collapse_spaces(stripped_target_line, target_line);
>     lines_matched = ! strcmp(stripped_hunk_line,
>                              stripped_target_line);
>   }
> ]]]

Looks fine.

Stefan

Re: #3610, How make 'svn patch' able to use the targets lines for intermediate context?

Posted by Daniel Näslund <da...@longitudo.com>.
On Wed, Apr 14, 2010 at 05:35:36PM +0200, Stefan Sperling wrote:
> On Wed, Apr 14, 2010 at 04:21:56PM +0200, Daniel Näslund wrote:
> > Hi!
> > 
> > #3610 [1] is about ignoring white space changes to be able to apply
> > patches that have been mangled, i.e. a mailing software converting tabs
> > to spaces, removing trailing white spaces or leading white spaces.
> > 
> > It means that if a certain option is given, we will match and apply
> > hunks if the only thing differing is white spaces.
> 
> I'd say it means that if a certain option is given, we will match
> and apply hunks even if amount of whitespace doesn't match (everything
> else must match though).
> 
> I.e. we'll add some optional magic in match_hunk(), making it skip over
> whitespace on either side (as determined by isspace()), but comparing
> any other characters.

Note that a whitespace may be tabs beeing translated to spaces, e.g.
it's not enough to just look at the trailing or leading whitespaces.
I've used this approach (comparing the the lines with all whitespaces
removed):

[[[
if (ignore_whitespaces)
  {
    char *stripped_hunk_line = apr_pstrdup(pool, 
                                           hunk_line_translated);
    char *stripped_target_line = apr_pstrdup(pool, target_line);

    apr_collapse_spaces(stripped_hunk_line,
                        hunk_line_translated);
    apr_collapse_spaces(stripped_target_line, target_line);
    lines_matched = ! strcmp(stripped_hunk_line,
                             stripped_target_line);
  }
]]]

> > The problem
> > -------------
> > The '+' lines will be applied with the white spaces changes
> > in the patch. That's the intended behaviour. But it would be preferable if
> > the context lines would be read from the target. For leading and
> > trailing context, I can do that. But in the case when there are context
> > lines in the middle of a hunk I can't since svn_hunk_t uses a stream
> > called modified_text that makes no distinction between '+' lines and context
> > lines, e.g. the leading ' ' or '+' are removed.
> > 
> > The options
> > ------------
> > 1) Just say that the whole hunk will be applied and we make no
> > guarantees on what the white spaces will be.
> 
> Yes. Just use whatever comes from the patch, including context lines.
> This is consistent with the current behaviour. I think we should avoid
> special cases where this rule is currently not true anymore.

I'll do that.

> (I'm not sure how UNIX patch behaves wrt context lines, actually.
> Might be worth checking.)

> 
> If people don't like the result of patching, they can edit the patch
> to their liking and re-apply it (or ask the submitter to resubmit
> the patch -- mangled patches are usually the submitter's fault).

my thoughts exactly.

Thanks,
Daniel

Re: #3610, How make 'svn patch' able to use the targets lines for intermediate context?

Posted by Daniel Näslund <da...@longitudo.com>.
On Wed, Apr 14, 2010 at 05:35:36PM +0200, Stefan Sperling wrote:
> On Wed, Apr 14, 2010 at 04:21:56PM +0200, Daniel Näslund wrote:
> > Hi!
> > 
> > #3610 [1] is about ignoring white space changes to be able to apply
> > patches that have been mangled, i.e. a mailing software converting tabs
> > to spaces, removing trailing white spaces or leading white spaces.
> > 
> > It means that if a certain option is given, we will match and apply
> > hunks if the only thing differing is white spaces.
> 
> I'd say it means that if a certain option is given, we will match
> and apply hunks even if amount of whitespace doesn't match (everything
> else must match though).
> 
> I.e. we'll add some optional magic in match_hunk(), making it skip over
> whitespace on either side (as determined by isspace()), but comparing
> any other characters.

Note that a whitespace may be tabs beeing translated to spaces, e.g.
it's not enough to just look at the trailing or leading whitespaces.
I've used this approach (comparing the the lines with all whitespaces
removed):

[[[
if (ignore_whitespaces)
  {
    char *stripped_hunk_line = apr_pstrdup(pool, 
                                           hunk_line_translated);
    char *stripped_target_line = apr_pstrdup(pool, target_line);

    apr_collapse_spaces(stripped_hunk_line,
                        hunk_line_translated);
    apr_collapse_spaces(stripped_target_line, target_line);
    lines_matched = ! strcmp(stripped_hunk_line,
                             stripped_target_line);
  }
]]]

> > The problem
> > -------------
> > The '+' lines will be applied with the white spaces changes
> > in the patch. That's the intended behaviour. But it would be preferable if
> > the context lines would be read from the target. For leading and
> > trailing context, I can do that. But in the case when there are context
> > lines in the middle of a hunk I can't since svn_hunk_t uses a stream
> > called modified_text that makes no distinction between '+' lines and context
> > lines, e.g. the leading ' ' or '+' are removed.
> > 
> > The options
> > ------------
> > 1) Just say that the whole hunk will be applied and we make no
> > guarantees on what the white spaces will be.
> 
> Yes. Just use whatever comes from the patch, including context lines.
> This is consistent with the current behaviour. I think we should avoid
> special cases where this rule is currently not true anymore.

I'll do that.

> (I'm not sure how UNIX patch behaves wrt context lines, actually.
> Might be worth checking.)

> 
> If people don't like the result of patching, they can edit the patch
> to their liking and re-apply it (or ask the submitter to resubmit
> the patch -- mangled patches are usually the submitter's fault).

my thoughts exactly.

Thanks,
Daniel

Re: #3610, How make 'svn patch' able to use the targets lines for intermediate context?

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Apr 14, 2010 at 09:27:25PM +0200, Daniel Näslund wrote:
> On Wed, Apr 14, 2010 at 08:18:06PM +0200, Alan Barrett wrote:
> > I don't buy the argument that it's necessary, for consistency with
> > current svn behaviour, to copy context lines from the patch to
> > the output file.

It is consistent with current "svn patch" behaviour.
I never meant to argue that there was some consistency with the
rest of svn.

> Nope, it wouldn't be an incompatible change. It's just that it would
> involve rewriting how we construct the streams from the patch and how we
> apply the hunks.

Yes, and that is why it's a separate problem.

We can implement the whitespace ignore matching parts,
while getting context from the patch as we do now.
If folks want us to copy context lines from the target instead of
the patch, that can probably be done. But it is a different problem.
Let's not mix up two different problems.

Stefan

Re: #3610, How make 'svn patch' able to use the targets lines for intermediate context?

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Apr 14, 2010 at 09:27:25PM +0200, Daniel Näslund wrote:
> On Wed, Apr 14, 2010 at 08:18:06PM +0200, Alan Barrett wrote:
> > I don't buy the argument that it's necessary, for consistency with
> > current svn behaviour, to copy context lines from the patch to
> > the output file.

It is consistent with current "svn patch" behaviour.
I never meant to argue that there was some consistency with the
rest of svn.

> Nope, it wouldn't be an incompatible change. It's just that it would
> involve rewriting how we construct the streams from the patch and how we
> apply the hunks.

Yes, and that is why it's a separate problem.

We can implement the whitespace ignore matching parts,
while getting context from the patch as we do now.
If folks want us to copy context lines from the target instead of
the patch, that can probably be done. But it is a different problem.
Let's not mix up two different problems.

Stefan

Re: #3610, How make 'svn patch' able to use the targets lines for intermediate context?

Posted by Alan Barrett <ap...@cequrux.com>.
On Wed, 14 Apr 2010, Daniel Nslund wrote:
> If we would implement --ignore-whitespaces to have the behaviour you
> suggested we would have to rewrite the parts dealing with
> modified_stream to make us able to distinguish between context lines and
> added lines. At the moment, we filter out the leading characters (' ',
> '+', '-') before writing to the streams.

Yes, I understand that implementing the "better" behaviour is a lot more
work.  Perhaps it would be reasonable to implement the easier algorithm
first, and clearly document it's limitations.

> In my world, I would probably just ask the submitter to retransmit the
> patch using some other mail software.

I have sometimes taken patches from web pages where communicating with
the author is not easy.  But svn doesn't have to solve this problem; we
have external tools for patching and editing.

--apb (Alan Barrett)

Re: #3610, How make 'svn patch' able to use the targets lines for intermediate context?

Posted by Daniel Näslund <da...@longitudo.com>.
On Wed, Apr 14, 2010 at 08:18:06PM +0200, Alan Barrett wrote:
> On Wed, 14 Apr 2010, Stefan Sperling wrote:
> > Yes. Just use whatever comes from the patch, including context lines.
> > This is consistent with the current behaviour.  I think we should avoid
> > special cases where this rule is currently not true anymore.
> > (I'm not sure how UNIX patch behaves wrt context lines, actually.
> > Might be worth checking.)
> 
> If applying a patch ever causes changes to lines that are marked in the
> patch as context lines (not as lines to be removed or added), then I
> expect people to be unhappy.  The context lines in the patch file are
> known or suspected to be damaged (or else the user would not have asked
> to ignore whitespace changes), so copying the context lines from the
> patch is not wanted.
> 
> The GNU patch implementation that I just tried does the right thing
> here: when passed the "--ignore-whitespace" option, it ignores
> whitespace changes in context lines and lines marked as being removed,
> and it copies context lines from the input file, not from the patch.

Darn GNU patch! :) I guess there's a reason for the maze of if
statements that makes up it's source code! It does more than one
expects!

> I don't buy the argument that it's necessary, for consistency with
> current svn behaviour, to copy context lines from the patch to
> the output file.  Current behaviour does not include any kind of
> "ignore whitespace" option, so the context lines in the input file
> are guaranteed not to differ from the context lines in the patch,
> so an outside observer (without knowledge of the internals of the
> implementation) cannot tell whether the current implementation copies
> context lines from the patch or from the input file.  Changing the
> implementation to copy context lines from the input file would therefore
> not be an incompatible change.

Nope, it wouldn't be an incompatible change. It's just that it would
involve rewriting how we construct the streams from the patch and how we
apply the hunks. The current behaviour is (writing it down to remind
myself):

When parsing
-------------
Lines beginning with ' ' and '-' are read to original_stream
Lines beinning with ' ' and '+' are read to modified_stream

When applying
---------------
for each hunk in patch
  determine the line the hunk should be applied at by matching
  original_stream to target

for each hunk in patch
  copy lines from target until we reach match-point
  copy lines from modified_stream

If we would implement --ignore-whitespaces to have the behaviour you
suggested we would have to rewrite the parts dealing with
modified_stream to make us able to distinguish between context lines and
added lines. At the moment, we filter out the leading characters (' ',
'+', '-') before writing to the streams.

In the end it boils down to: From my standpoint as a newbie 
programmer I'm very dense when it comes to any major changes. I do agree
that the behaviour you describe is the preferred but I'm a bit annoyed
about having to do all those changes for something that is a very small
part of the typical use case set. In my world, I would probably just ask
the submitter to retransmit the patch using some other mail software.

I hope I'm not too grumpy. I just don't want to rush into some bigger
change without weighing the options. 

cheers,
Daniel

Re: #3610, How make 'svn patch' able to use the targets lines for intermediate context?

Posted by Alan Barrett <ap...@cequrux.com>.
On Wed, 14 Apr 2010, Stefan Sperling wrote:
> Yes. Just use whatever comes from the patch, including context lines.
> This is consistent with the current behaviour.  I think we should avoid
> special cases where this rule is currently not true anymore.
> (I'm not sure how UNIX patch behaves wrt context lines, actually.
> Might be worth checking.)

If applying a patch ever causes changes to lines that are marked in the
patch as context lines (not as lines to be removed or added), then I
expect people to be unhappy.  The context lines in the patch file are
known or suspected to be damaged (or else the user would not have asked
to ignore whitespace changes), so copying the context lines from the
patch is not wanted.

The GNU patch implementation that I just tried does the right thing
here: when passed the "--ignore-whitespace" option, it ignores
whitespace changes in context lines and lines marked as being removed,
and it copies context lines from the input file, not from the patch.

I don't buy the argument that it's necessary, for consistency with
current svn behaviour, to copy context lines from the patch to
the output file.  Current behaviour does not include any kind of
"ignore whitespace" option, so the context lines in the input file
are guaranteed not to differ from the context lines in the patch,
so an outside observer (without knowledge of the internals of the
implementation) cannot tell whether the current implementation copies
context lines from the patch or from the input file.  Changing the
implementation to copy context lines from the input file would therefore
not be an incompatible change.

--apb (Alan Barrett)

Re: #3610, How make 'svn patch' able to use the targets lines for intermediate context?

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Apr 14, 2010 at 04:21:56PM +0200, Daniel Näslund wrote:
> Hi!
> 
> #3610 [1] is about ignoring white space changes to be able to apply
> patches that have been mangled, i.e. a mailing software converting tabs
> to spaces, removing trailing white spaces or leading white spaces.
> 
> It means that if a certain option is given, we will match and apply
> hunks if the only thing differing is white spaces.

I'd say it means that if a certain option is given, we will match
and apply hunks even if amount of whitespace doesn't match (everything
else must match though).

I.e. we'll add some optional magic in match_hunk(), making it skip over
whitespace on either side (as determined by isspace()), but comparing
any other characters.

> The problem
> -------------
> The '+' lines will be applied with the white spaces changes
> in the patch. That's the intended behaviour. But it would be preferable if
> the context lines would be read from the target. For leading and
> trailing context, I can do that. But in the case when there are context
> lines in the middle of a hunk I can't since svn_hunk_t uses a stream
> called modified_text that makes no distinction between '+' lines and context
> lines, e.g. the leading ' ' or '+' are removed.
> 
> The options
> ------------
> 1) Just say that the whole hunk will be applied and we make no
> guarantees on what the white spaces will be.

Yes. Just use whatever comes from the patch, including context lines.
This is consistent with the current behaviour. I think we should avoid
special cases where this rule is currently not true anymore.
(I'm not sure how UNIX patch behaves wrt context lines, actually.
Might be worth checking.)

If people don't like the result of patching, they can edit the patch
to their liking and re-apply it (or ask the submitter to resubmit
the patch -- mangled patches are usually the submitter's fault).

Stefan

Re: #3610, How make 'svn patch' able to use the targets lines for intermediate context?

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Apr 14, 2010 at 04:21:56PM +0200, Daniel Näslund wrote:
> Hi!
> 
> #3610 [1] is about ignoring white space changes to be able to apply
> patches that have been mangled, i.e. a mailing software converting tabs
> to spaces, removing trailing white spaces or leading white spaces.
> 
> It means that if a certain option is given, we will match and apply
> hunks if the only thing differing is white spaces.

I'd say it means that if a certain option is given, we will match
and apply hunks even if amount of whitespace doesn't match (everything
else must match though).

I.e. we'll add some optional magic in match_hunk(), making it skip over
whitespace on either side (as determined by isspace()), but comparing
any other characters.

> The problem
> -------------
> The '+' lines will be applied with the white spaces changes
> in the patch. That's the intended behaviour. But it would be preferable if
> the context lines would be read from the target. For leading and
> trailing context, I can do that. But in the case when there are context
> lines in the middle of a hunk I can't since svn_hunk_t uses a stream
> called modified_text that makes no distinction between '+' lines and context
> lines, e.g. the leading ' ' or '+' are removed.
> 
> The options
> ------------
> 1) Just say that the whole hunk will be applied and we make no
> guarantees on what the white spaces will be.

Yes. Just use whatever comes from the patch, including context lines.
This is consistent with the current behaviour. I think we should avoid
special cases where this rule is currently not true anymore.
(I'm not sure how UNIX patch behaves wrt context lines, actually.
Might be worth checking.)

If people don't like the result of patching, they can edit the patch
to their liking and re-apply it (or ask the submitter to resubmit
the patch -- mangled patches are usually the submitter's fault).

Stefan