You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Branko Čibej <br...@wandisco.com> on 2012/11/20 11:25:30 UTC

Wrong assumption about unidiff syntax in libsvn_diff?

I'm digging through libsvn_diff and found this snippet in diff_memory.c:

      SVN_ERR(svn_utf_cstring_from_utf8_ex2
              (&out_str,
               /* The string below is intentionally not marked for translation:
                  it's vital to correct operation of the diff(1)/patch(1)
                  program pair. */
               APR_EOL_STR "\\ No newline at end of file" APR_EOL_STR,
               btn->header_encoding, btn->pool));


I was under the impression that we already cleared up this
misconception? Only the leading backslash and space are important for
signaling the no-trailing-eoln state. The text itself can be localized,
or absent.

I'm pretty sure we should mark the "No newline at end of file" for
translation -- but /not/ the "\\ ".

-- Brane


-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com	


Re: Wrong assumption about unidiff syntax in libsvn_diff?

Posted by Julian Foad <ju...@btopenworld.com>.
Branko Čibej wrote:

> I'm digging through libsvn_diff and found this snippet in diff_memory.c:
> 
>     /* The string below is intentionally not marked for translation:
>        it's vital to correct operation of the diff(1)/patch(1)
>        program pair. */
>     APR_EOL_STR "\\ No newline at end of file" 
> 
> I was under the impression that we already cleared up this
> misconception? Only the leading backslash and space are important for
> signaling the no-trailing-eoln state. The text itself can be localized,
> or absent.

References?  The current GNU diff documentation says "The exact message may differ in non-English locales." [1]  I haven't found any more widely authoritative documentation of this "\ No newline..." at all, such as in POSIX/SUS.

There are some reports of other tools not working properly with other messages.  For example, <https://bugs.launchpad.net/ubuntu/+source/diffutils/+bug/708897>.  That's not a good bug report -- it doesn't say exactly what patch tools broke -- and maybe the patch tools should be considered broken, but it is an indication that there may be such breakages.

> I'm pretty sure we should mark the "No newline at end of file" for
> translation -- but /not/ the "\\ ".

Keeping it in English gives a much higher chance of it working with most patch programs.

- Julian

[1] GNU diff documentation, "Incomplete Lines" section, <http://www.gnu.org/software/diffutils/manual/html_node/Incomplete-Lines.html>.

Re: Wrong assumption about unidiff syntax in libsvn_diff?

Posted by Branko Čibej <br...@wandisco.com>.
On 20.11.2012 17:15, Johan Corveleyn wrote:
> On Tue, Nov 20, 2012 at 4:09 PM, Stefan Sperling <st...@elego.de> wrote:
>> On Tue, Nov 20, 2012 at 03:40:20PM +0100, Branko Čibej wrote:
>>> On 20.11.2012 15:05, Stefan Sperling wrote:
>>>> I just checked the UNIX patch code shipped with OpenBSD and it seems
>>>> you're right that it only looks for the backslash and ignores the comment.
>>>>
>>>> However, it seems in practice patches usually contain this string in
>>>> non-localised form. At least nobody has yet complained about svn patch
>>>> misparsing such patches.
>>> I'm distinctly remember a report on this very list, with a unidiff with
>>> a localized message attached. However, I can't find it in the archives.
>>> In any case it would be nice if our diff parser only looked at the \,
>>> not the whole message. I'm less worried about our not localizing that
>>> particular string.
>> Currently, lines such as:
>> \\ this is a comment
>> anywhere in the patch are silently ignored.
>>
>> If we follow your suggestion we must treat any such lines as hunk
>> terminators and assume the hunk lacks a trailing newline.
>> I suppose such a change is correct but it's a behaviour change so
>> I wouldn't want to backport it to 1.7.x.
>>
>> Below is a patch. Diff parser tests and patch tests are still passing.
>>
>> [[[
>> * subversion/libsvn_diff/parse-diff.c
>>   (parse_next_hunk): Treat any line that starts with a backslash as a
>>    hunk terminator, indicating that the hunk does not end with EOL.
>>    Comments following the backslash might be localised or missing
>>    in which case parsing the patch would fail.
>>
>> Suggested by: brane
>> ]]]
>>
>> Index: subversion/libsvn_diff/parse-diff.c
>> ===================================================================
>> --- subversion/libsvn_diff/parse-diff.c (revision 1411078)
>> +++ subversion/libsvn_diff/parse-diff.c (working copy)
>> @@ -555,15 +555,11 @@ parse_next_hunk(svn_diff_hunk_t **hunk,
>>        pos = 0;
>>        SVN_ERR(svn_io_file_seek(apr_file, APR_CUR, &pos, iterpool));
>>
>> -      /* Lines starting with a backslash are comments, such as
>> +      /* Lines starting with a backslash indicate a missing EOL:
>>         * "\ No newline at end of file". */
>>        if (line->data[0] == '\\')
>>          {
>> -          if (in_hunk &&
>> -              ((!*is_property &&
>> -                strcmp(line->data, "\\ No newline at end of file") == 0) ||
>> -               (*is_property &&
>> -                strcmp(line->data, "\\ No newline at end of property") == 0)))
>> +          if (in_hunk)
>>              {
>>                char eolbuf[2];
>>                apr_size_t len;
> I thought Branko said "leading backslash followed by a space":
>
> [[[
> Only the leading backslash and space are important for
> signaling the no-trailing-eoln state.
> ]]]
>
> (your argument about "\\ this is a comment" still holds though)

I'm wrong about the "and space" bit. In unidiff (and context-diff) the
first column is for instruction markers; +/> for imsertions, -/< for
removals, \ for no-newline-at-end, space for surrounding context. So as
far as looking only at the first column is concerned, that's correct.

I've frankly never heard about \ being a comment marker. However, we do
know the size of the hunk -- it's signalled in the hunk header -- and
the \ that signifies no-end-of-line would always appear /after/ the last
line of the hunk. In that respect, it would appear that our diff parser
has a bug, since it doesn't appear to handle these things correctly.

N.B., the original coment in that code is wrong, too; "\ No newline at
end of file" is *not* a comment, it's a processing instruction.

-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com


Re: Wrong assumption about unidiff syntax in libsvn_diff?

Posted by Johan Corveleyn <jc...@gmail.com>.
On Tue, Nov 20, 2012 at 4:09 PM, Stefan Sperling <st...@elego.de> wrote:
> On Tue, Nov 20, 2012 at 03:40:20PM +0100, Branko Čibej wrote:
>> On 20.11.2012 15:05, Stefan Sperling wrote:
>> > I just checked the UNIX patch code shipped with OpenBSD and it seems
>> > you're right that it only looks for the backslash and ignores the comment.
>> >
>> > However, it seems in practice patches usually contain this string in
>> > non-localised form. At least nobody has yet complained about svn patch
>> > misparsing such patches.
>>
>> I'm distinctly remember a report on this very list, with a unidiff with
>> a localized message attached. However, I can't find it in the archives.
>> In any case it would be nice if our diff parser only looked at the \,
>> not the whole message. I'm less worried about our not localizing that
>> particular string.
>
> Currently, lines such as:
> \\ this is a comment
> anywhere in the patch are silently ignored.
>
> If we follow your suggestion we must treat any such lines as hunk
> terminators and assume the hunk lacks a trailing newline.
> I suppose such a change is correct but it's a behaviour change so
> I wouldn't want to backport it to 1.7.x.
>
> Below is a patch. Diff parser tests and patch tests are still passing.
>
> [[[
> * subversion/libsvn_diff/parse-diff.c
>   (parse_next_hunk): Treat any line that starts with a backslash as a
>    hunk terminator, indicating that the hunk does not end with EOL.
>    Comments following the backslash might be localised or missing
>    in which case parsing the patch would fail.
>
> Suggested by: brane
> ]]]
>
> Index: subversion/libsvn_diff/parse-diff.c
> ===================================================================
> --- subversion/libsvn_diff/parse-diff.c (revision 1411078)
> +++ subversion/libsvn_diff/parse-diff.c (working copy)
> @@ -555,15 +555,11 @@ parse_next_hunk(svn_diff_hunk_t **hunk,
>        pos = 0;
>        SVN_ERR(svn_io_file_seek(apr_file, APR_CUR, &pos, iterpool));
>
> -      /* Lines starting with a backslash are comments, such as
> +      /* Lines starting with a backslash indicate a missing EOL:
>         * "\ No newline at end of file". */
>        if (line->data[0] == '\\')
>          {
> -          if (in_hunk &&
> -              ((!*is_property &&
> -                strcmp(line->data, "\\ No newline at end of file") == 0) ||
> -               (*is_property &&
> -                strcmp(line->data, "\\ No newline at end of property") == 0)))
> +          if (in_hunk)
>              {
>                char eolbuf[2];
>                apr_size_t len;

I thought Branko said "leading backslash followed by a space":

[[[
Only the leading backslash and space are important for
signaling the no-trailing-eoln state.
]]]

(your argument about "\\ this is a comment" still holds though)

-- 
Johan

Re: Wrong assumption about unidiff syntax in libsvn_diff?

Posted by Julian Foad <ju...@btopenworld.com>.
I committed a patch implementing this, and incorporating your patch below, in r1411723.

I would say it's a portability fix and should be considered for backport -- but I don't feel strongly.

- Julian




Stefan Sperling wrote:
> Currently, lines such as:
> \\ this is a comment
> anywhere in the patch are silently ignored.
> 
> If we follow your suggestion we must treat any such lines as hunk
> terminators and assume the hunk lacks a trailing newline.
> I suppose such a change is correct but it's a behaviour change so
> I wouldn't want to backport it to 1.7.x.
> 
> Below is a patch. Diff parser tests and patch tests are still passing.
> 
> [[[
> * subversion/libsvn_diff/parse-diff.c
>   (parse_next_hunk): Treat any line that starts with a backslash as a
>    hunk terminator, indicating that the hunk does not end with EOL.
>    Comments following the backslash might be localised or missing
>    in which case parsing the patch would fail.
> 
> Suggested by: brane
> ]]]
> 
> Index: subversion/libsvn_diff/parse-diff.c
> ===================================================================
> --- subversion/libsvn_diff/parse-diff.c    (revision 1411078)
> +++ subversion/libsvn_diff/parse-diff.c    (working copy)
> @@ -555,15 +555,11 @@ parse_next_hunk(svn_diff_hunk_t **hunk,
>        pos = 0;
>        SVN_ERR(svn_io_file_seek(apr_file, APR_CUR, &pos, iterpool));
> 
> -      /* Lines starting with a backslash are comments, such as
> +      /* Lines starting with a backslash indicate a missing EOL:
>         * "\ No newline at end of file". */
>        if (line->data[0] == '\\')
>          {
> -          if (in_hunk &&
> -              ((!*is_property &&
> -                strcmp(line->data, "\\ No newline at end of 
> file") == 0) ||
> -               (*is_property &&
> -                strcmp(line->data, "\\ No newline at end of 
> property") == 0)))
> +          if (in_hunk)
>              {
>                char eolbuf[2];
>                apr_size_t len;
> 

Re: Wrong assumption about unidiff syntax in libsvn_diff?

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Nov 20, 2012 at 03:40:20PM +0100, Branko Čibej wrote:
> On 20.11.2012 15:05, Stefan Sperling wrote:
> > I just checked the UNIX patch code shipped with OpenBSD and it seems
> > you're right that it only looks for the backslash and ignores the comment.
> >
> > However, it seems in practice patches usually contain this string in
> > non-localised form. At least nobody has yet complained about svn patch
> > misparsing such patches.
> 
> I'm distinctly remember a report on this very list, with a unidiff with
> a localized message attached. However, I can't find it in the archives.
> In any case it would be nice if our diff parser only looked at the \,
> not the whole message. I'm less worried about our not localizing that
> particular string.

Currently, lines such as:
\\ this is a comment
anywhere in the patch are silently ignored.

If we follow your suggestion we must treat any such lines as hunk
terminators and assume the hunk lacks a trailing newline.
I suppose such a change is correct but it's a behaviour change so
I wouldn't want to backport it to 1.7.x.

Below is a patch. Diff parser tests and patch tests are still passing.

[[[
* subversion/libsvn_diff/parse-diff.c
  (parse_next_hunk): Treat any line that starts with a backslash as a
   hunk terminator, indicating that the hunk does not end with EOL.
   Comments following the backslash might be localised or missing
   in which case parsing the patch would fail.

Suggested by: brane
]]]

Index: subversion/libsvn_diff/parse-diff.c
===================================================================
--- subversion/libsvn_diff/parse-diff.c	(revision 1411078)
+++ subversion/libsvn_diff/parse-diff.c	(working copy)
@@ -555,15 +555,11 @@ parse_next_hunk(svn_diff_hunk_t **hunk,
       pos = 0;
       SVN_ERR(svn_io_file_seek(apr_file, APR_CUR, &pos, iterpool));
 
-      /* Lines starting with a backslash are comments, such as
+      /* Lines starting with a backslash indicate a missing EOL:
        * "\ No newline at end of file". */
       if (line->data[0] == '\\')
         {
-          if (in_hunk &&
-              ((!*is_property &&
-                strcmp(line->data, "\\ No newline at end of file") == 0) ||
-               (*is_property &&
-                strcmp(line->data, "\\ No newline at end of property") == 0)))
+          if (in_hunk)
             {
               char eolbuf[2];
               apr_size_t len;

Re: Wrong assumption about unidiff syntax in libsvn_diff?

Posted by Julian Foad <ju...@btopenworld.com>.
Branko Čibej wrote:

> On 20.11.2012 15:05, Stefan Sperling wrote:
>>  On Tue, Nov 20, 2012 at 11:25:30AM +0100, Branko Čibej wrote:
>>>  I was under the impression that we already cleared up this
>>>  misconception? Only the leading backslash and space are important
>>> for  signaling the no-trailing-eoln state. The text itself can be
>>> localized,  or absent.
>>> 
>>>  I'm pretty sure we should mark the "No newline at end of file"
>>> for  translation -- but /not/ the "\\ ".
>> 
>>  svn patch relies on the comment being present and not being localised.
>>  See parse_diff.c:parse_next_hunk().
>> 
>>  If we change diff_memory.c the diff parser should also be changed
>>  to not rely on this string being present (we can't recognise
>>  translated versions in the parser obviously).
>> 
>>  I just checked the UNIX patch code shipped with OpenBSD and it seems
>>  you're right that it only looks for the backslash and ignores the 
>> comment.
>> 
>>  However, it seems in practice patches usually contain this string in
>>  non-localised form. At least nobody has yet complained about svn patch
>>  misparsing such patches.
> 
> I'm distinctly remember a report on this very list, with a unidiff with
> a localized message attached. However, I can't find it in the archives.
> In any case it would be nice if our diff parser only looked at the \,
> not the whole message. I'm less worried about our not localizing that
> particular string.

Let's fix the parser to require only '\' and leave the string untranslated.  "Be conservative in what you send and liberal in what you accept" is a good mantra for interoperability.

- Julian

Re: Wrong assumption about unidiff syntax in libsvn_diff?

Posted by Branko Čibej <br...@wandisco.com>.
On 20.11.2012 15:05, Stefan Sperling wrote:
> On Tue, Nov 20, 2012 at 11:25:30AM +0100, Branko Čibej wrote:
>> I'm digging through libsvn_diff and found this snippet in diff_memory.c:
>>
>>       SVN_ERR(svn_utf_cstring_from_utf8_ex2
>>               (&out_str,
>>                /* The string below is intentionally not marked for translation:
>>                   it's vital to correct operation of the diff(1)/patch(1)
>>                   program pair. */
>>                APR_EOL_STR "\\ No newline at end of file" APR_EOL_STR,
>>                btn->header_encoding, btn->pool));
>>
>>
>> I was under the impression that we already cleared up this
>> misconception? Only the leading backslash and space are important for
>> signaling the no-trailing-eoln state. The text itself can be localized,
>> or absent.
>>
>> I'm pretty sure we should mark the "No newline at end of file" for
>> translation -- but /not/ the "\\ ".
> svn patch relies on the comment being present and not being localised.
> See parse_diff.c:parse_next_hunk().
>
> If we change diff_memory.c the diff parser should also be changed
> to not rely on this string being present (we can't recognise
> translated versions in the parser obviously).
>
> I just checked the UNIX patch code shipped with OpenBSD and it seems
> you're right that it only looks for the backslash and ignores the comment.
>
> However, it seems in practice patches usually contain this string in
> non-localised form. At least nobody has yet complained about svn patch
> misparsing such patches.

I'm distinctly remember a report on this very list, with a unidiff with
a localized message attached. However, I can't find it in the archives.
In any case it would be nice if our diff parser only looked at the \,
not the whole message. I'm less worried about our not localizing that
particular string.

-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com


Re: Wrong assumption about unidiff syntax in libsvn_diff?

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Nov 20, 2012 at 11:25:30AM +0100, Branko Čibej wrote:
> I'm digging through libsvn_diff and found this snippet in diff_memory.c:
> 
>       SVN_ERR(svn_utf_cstring_from_utf8_ex2
>               (&out_str,
>                /* The string below is intentionally not marked for translation:
>                   it's vital to correct operation of the diff(1)/patch(1)
>                   program pair. */
>                APR_EOL_STR "\\ No newline at end of file" APR_EOL_STR,
>                btn->header_encoding, btn->pool));
> 
> 
> I was under the impression that we already cleared up this
> misconception? Only the leading backslash and space are important for
> signaling the no-trailing-eoln state. The text itself can be localized,
> or absent.
> 
> I'm pretty sure we should mark the "No newline at end of file" for
> translation -- but /not/ the "\\ ".

svn patch relies on the comment being present and not being localised.
See parse_diff.c:parse_next_hunk().

If we change diff_memory.c the diff parser should also be changed
to not rely on this string being present (we can't recognise
translated versions in the parser obviously).

I just checked the UNIX patch code shipped with OpenBSD and it seems
you're right that it only looks for the backslash and ignores the comment.

However, it seems in practice patches usually contain this string in
non-localised form. At least nobody has yet complained about svn patch
misparsing such patches.