You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by "D.J. Heap" <dj...@gmail.com> on 2006/01/20 14:46:38 UTC

Textual diff bug?

The attached repository demonstrates a diff bug (on Linux or Windows).
 I debugged through it a bit and it seems to detect changed hunks
correctly, but doesn't output any text lines for it -- it might be
some kind of boundary problem since it happens at right around the end
of the file (which is also about 10k in size).

Could someone with more diff expertise take a look?

Just expand the repo somewhere and run a diff for revs 1 & 2:

[dj@vm-fc4 ~]$ svn diff -r1:2 file:///home/dj/tr
Index: file.txt
===================================================================
--- file.txt    (revision 1)
+++ file.txt    (revision 2)
[dj@vm-fc4 ~]$

The file has had a single character changed on line 318.

DJ

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: Textual diff bug?

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Mon, 23 Jan 2006, Malcolm Rowe wrote:

> On Fri, Jan 20, 2006 at 07:46:38AM -0700, D.J. Heap wrote:
> > The attached repository demonstrates a diff bug (on Linux or Windows).
> >  I debugged through it a bit and it seems to detect changed hunks
> > correctly, but doesn't output any text lines for it -- it might be
> > some kind of boundary problem since it happens at right around the end
> > of the file (which is also about 10k in size).
> >
>
> Ok, this is fallout from lundblad's enhancement to handle diffing files
> with different end-of-line markers, in r17624.  In summary, the unified
> diff engine was producing the wrong output for CR- or CR/LF-terminated
> files if any CR fell at the end of a 4k chunk (in your example, this
> happens on line 80).
>
Woops! Sorry for this one:-(

> Anyway, I've implemented the obvious fix for this in r18195.  Peter,
> I'm fairly sure this is correct, but could you cast your eye over it too?
>
Yes, it looks correct. Thanks, D.J. for finding it and Malcolm for fixing!

Regards,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Textual diff bug?

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Fri, Jan 20, 2006 at 07:46:38AM -0700, D.J. Heap wrote:
> The attached repository demonstrates a diff bug (on Linux or Windows).
>  I debugged through it a bit and it seems to detect changed hunks
> correctly, but doesn't output any text lines for it -- it might be
> some kind of boundary problem since it happens at right around the end
> of the file (which is also about 10k in size).
> 

Ok, this is fallout from lundblad's enhancement to handle diffing files
with different end-of-line markers, in r17624.  In summary, the unified
diff engine was producing the wrong output for CR- or CR/LF-terminated
files if any CR fell at the end of a 4k chunk (in your example, this
happens on line 80).

What's happening was that, when we do a diff, we first read the two
files and calculate the diff, then read the files again and output
the lines that we're interested in, using the appropriate markers (see
svn_diff__file_output_unified_line() in libsvn_diff/diff_file.c).

In that output function, we maintain a 4k buffer of data from the file
and a 'current line' pointer into the buffer.  If the last line that we
read from the buffer ends with a CR, then we need to read another 4k into
the buffer to check whether the last line was CR- or CR/LF-terminated
(and if the latter, include the LF as part of the line).

The problem was that although we read the next 4k of the file into the
buffer, we didn't update the 'current line' pointer before leaving the
procedure, so it remained pointing somewhere near the end of the buffer.
Effectively, we skipped exactly 4k of the file and lost track of the
current line completely.  That meant that we'd either output completely
the wrong lines in the diff, or, for lines near the end of the file
(like yours), we'd output nothing, since the function just returns
silently if we attempt to read lines past the end of the file.

Anyway, I've implemented the obvious fix for this in r18195.  Peter,
I'm fairly sure this is correct, but could you cast your eye over it too?

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Textual diff bug?

Posted by "D.J. Heap" <dj...@gmail.com>.
> The attached repository demonstrates a diff bug (on Linux or Windows).

...and here is the repo actually attached this time...

DJ

Re: Textual diff bug?

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Fri, Jan 20, 2006 at 07:46:38AM -0700, D.J. Heap wrote:
>  I debugged through it a bit and it seems to detect changed hunks
> correctly, but doesn't output any text lines for it -- it might be
> some kind of boundary problem since it happens at right around the end
> of the file (which is also about 10k in size).
> 

Everything seems to go fine until we get to
libsvn_diff/diff_file.c:svn_diff__file_output_unified_diff_modified(), at
which point I get lost.  As you say, the original diff line-by-line/LCS
calculation itself looks good, it's just the output that's broken.

I suspect it's a boundary condition as well, but I've not got the time
to debug this right now - perhaps later.

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org