You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp> on 2012/12/24 15:43:42 UTC

[PATCH] fix for diff optimization bug

Hi, Subversion developers.

The optimization of diff inclued in version 1.7 has a bug that
produces incorrect diff on a certain condition.
The attached patch fix it. 

diffstat:
 subversion/libsvn_diff/diff_file.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)


Detail of the bug
-----------------

When the identical suffix begins at the boundary of a chunk,
datasource_get_next_token() defined in subversion/libsvn_diff/diff_file.c
does not stop at head of the identical suffix.
Therefore, when one of the identical suffixes of the original file
and the modified file begin from the boundary of a chunk,
excessive tokens are added to the diff tree.

How to reproduce
----------------

$ for ((i=0;i<8256;i++)); do echo 0123456789abcde; done > test.txt
$ hexdump -C test.txt
00000000  30 31 32 33 34 35 36 37  38 39 61 62 63 64 65 0a  |0123456789abcde.|
*
00020400
$ svn add test.txt; svn ci -m test
A         test.txt
Adding         test.txt
Transmitting file data .
Committed revision 2.
$ echo 0123456789ABCDE |dd of=test.txt bs=16 seek=64 conv=notrunc
1+0 records in
1+0 records out
$ echo 0123456789ABCDE |dd of=test.txt bs=16 seek=8141 conv=notrunc
1+0 records in
1+0 records out
$ echo 0123456789abcde >> test.txt
$ echo 0123456789abcde >> test.txt
$ hexdump -C test.txt
00000000  30 31 32 33 34 35 36 37  38 39 61 62 63 64 65 0a  |0123456789abcde.|
*
00000400  30 31 32 33 34 35 36 37  38 39 41 42 43 44 45 0a  |0123456789ABCDE.|
00000410  30 31 32 33 34 35 36 37  38 39 61 62 63 64 65 0a  |0123456789abcde.|
*
0001fcd0  30 31 32 33 34 35 36 37  38 39 41 42 43 44 45 0a  |0123456789ABCDE.|
0001fce0  30 31 32 33 34 35 36 37  38 39 61 62 63 64 65 0a  |0123456789abcde.|
*
00020420
$ svn cat test.txt | diff -u - test.txt
--- -   2012-12-24 22:30:18.760832000 +0900
+++ test.txt    2012-12-24 22:29:24.000000000 +0900
@@ -62,6 +62,7 @@
 0123456789abcde
 0123456789abcde
 0123456789abcde
+0123456789ABCDE
 0123456789abcde
 0123456789abcde
 0123456789abcde
@@ -8138,6 +8139,7 @@
 0123456789abcde
 0123456789abcde
 0123456789abcde
+0123456789ABCDE
 0123456789abcde
 0123456789abcde
 0123456789abcde
$ svn di test.txt
Index: test.txt
===================================================================
--- test.txt    (revision 2)
+++ test.txt    (working copy)
@@ -62,6 +62,7 @@
 0123456789abcde
 0123456789abcde
 0123456789abcde
+0123456789ABCDE
 0123456789abcde
 0123456789abcde
 0123456789abcde
@@ -8138,6 +8139,7 @@
 0123456789abcde
 0123456789abcde
 0123456789abcde
+0123456789ABCDE
 0123456789abcde
 0123456789abcde
 0123456789abcde
@@ -8188,6 +8190,72 @@
 0123456789abcde
 0123456789abcde
 0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
+0123456789abcde
 0123456789abcde
 0123456789abcde
 0123456789abcde


-- 
Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>

Re: [PATCH] fix for diff optimization bug

Posted by Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>.
Hi.

> It would be very useful to have a regression test for the bug that you have found.? Would you be able to convert your reproduction recipe into a regression test written in C like the that one on trunk?
> Please let us know if you would be willing to write a test for the bug you found, and/or port test 13 to version 1.7, and/or write a patch to fix the bug shown by test number 13.? We can treat them as two entirely separate problems, but maybe you have the skill and wish to help fix both of them.

I converted the recipe into regression test and created a new patch for nightly tarball. 
I will investigate issue #4133 later. 


diffstat:
 subversion/libsvn_diff/diff_file.c             |   10 ++-
 subversion/tests/libsvn_diff/diff-diff3-test.c |   69 +++++++++++++++++++++++++
 2 files changed, 76 insertions(+), 3 deletions(-)


On Mon, 24 Dec 2012 20:51:09 +0000 (GMT)
Julian Foad <ju...@btopenworld.com> wrote:

> Hi, Hideki.? Thank you very much for finding this bug and a fix for it.
> You might be interested to know that on the 'trunk',  a new test has already been added for a similar problem:
> 
> $ ./subversion/tests/libsvn_diff/diff-diff3-test --list
> [...]
> ? 13??? XFAIL? difference at the start of a 128KB window
> 
> We don't have a fix for this bug yet.? I don't know whether this bug also exists in version 1.7.8.
> 
> 
> It would be very useful to have a regression test for the bug that you have found.? Would you be able to convert your reproduction recipe into a regression test written in C like the that one on trunk?
> Please let us know if you would be willing to write a test for the bug you found, and/or port test 13 to version 1.7, and/or write a patch to fix the bug shown by test number 13.? We can treat them as two entirely separate problems, but maybe you have the skill and wish to help fix both of them.
> 
> - Julian
> 
> 
> Hideki IWAMOTO wrote:
> 
> > The optimization of diff inclued in version 1.7 has a bug that
> > produces incorrect diff on a certain condition.
> > The attached patch fix it. 
> > 
> > 
> > Detail of the bug
> > -----------------
> > 
> > When the identical suffix begins at the boundary of a chunk,
> > datasource_get_next_token() defined in subversion/libsvn_diff/diff_file.c
> > does not stop at head of the identical suffix.
> > Therefore, when one of the identical suffixes of the original file
> > and the modified file begin from the boundary of a chunk,
> > excessive tokens are added to the diff tree.
> > 
> > How to reproduce
> > ----------------
> > 
> > $ for ((i=0;i<8256;i++)); do echo 0123456789abcde; done > test.txt
> > $ hexdump -C test.txt
> > 00000000? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > *
> > 00020400
> > $ svn add test.txt; svn ci -m test
> > A? ? ? ?  test.txt
> > Adding? ? ? ?  test.txt
> > Transmitting file data .
> > Committed revision 2.
> > $ echo 0123456789ABCDE |dd of=test.txt bs=16 seek=64 conv=notrunc
> > 1+0 records in
> > 1+0 records out
> > $ echo 0123456789ABCDE |dd of=test.txt bs=16 seek=8141 conv=notrunc
> > 1+0 records in
> > 1+0 records out
> > $ echo 0123456789abcde >> test.txt
> > $ echo 0123456789abcde >> test.txt
> > $ hexdump -C test.txt
> > 00000000? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > *
> > 00000400? 30 31 32 33 34 35 36 37? 38 39 41 42 43 44 45 0a? |0123456789ABCDE.|
> > 00000410? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > *
> > 0001fcd0? 30 31 32 33 34 35 36 37? 38 39 41 42 43 44 45 0a? |0123456789ABCDE.|
> > 0001fce0? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > *
> > 00020420
> > $ svn cat test.txt | diff -u - test.txt
> > --- -?  2012-12-24 22:30:18.760832000 +0900
> > +++ test.txt? ? 2012-12-24 22:29:24.000000000 +0900
> > @@ -62,6 +62,7 @@
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > +0123456789ABCDE
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > @@ -8138,6 +8139,7 @@
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > +0123456789ABCDE
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > $ svn di test.txt
> > Index: test.txt
> > ===================================================================
> > --- test.txt? ? (revision 2)
> > +++ test.txt? ? (working copy)
> > @@ -62,6 +62,7 @@
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > +0123456789ABCDE
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > @@ -8138,6 +8139,7 @@
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > +0123456789ABCDE
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > @@ -8188,6 +8190,72 @@
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > 
> > 
> > -- 
> > Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>
> > 

-- 
Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>

Re: [PATCH] fix for diff optimization bug

Posted by Daniel Shahaf <da...@elego.de>.
Hideki IWAMOTO wrote on Tue, Dec 25, 2012 at 07:26:37 +0900:
> Hi.
> 
> > It would be very useful to have a regression test for the bug that you have found.? Would you be able to convert your reproduction recipe into a regression test written in C like the that one on trunk?
> > Please let us know if you would be willing to write a test for the bug you found, and/or port test 13 to version 1.7, and/or write a patch to fix the bug shown by test number 13.? We can treat them as two entirely separate problems, but maybe you have the skill and wish to help fix both of them.
> 
> Because the client that I built has a problem on access to Subversion repository,
> I want to do these using nightly tarball.  Is it OK?

A patch against a nightly tarball is as good as one against trunk, yes.
As to problems accessing svn.apache.org, feel free to follow up about
them with infrastructure@apache.org mentioning your IP address and what
"svn.apache.org" resolves to for you.

Re: [PATCH] fix for diff optimization bug

Posted by Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>.
Hi.

> It would be very useful to have a regression test for the bug that you have found.? Would you be able to convert your reproduction recipe into a regression test written in C like the that one on trunk?
> Please let us know if you would be willing to write a test for the bug you found, and/or port test 13 to version 1.7, and/or write a patch to fix the bug shown by test number 13.? We can treat them as two entirely separate problems, but maybe you have the skill and wish to help fix both of them.

Because the client that I built has a problem on access to Subversion repository,
I want to do these using nightly tarball.  Is it OK?


On Mon, 24 Dec 2012 20:51:09 +0000 (GMT)
Julian Foad <ju...@btopenworld.com> wrote:

> Hi, Hideki.? Thank you very much for finding this bug and a fix for it.
> You might be interested to know that on the 'trunk',  a new test has already been added for a similar problem:
> 
> $ ./subversion/tests/libsvn_diff/diff-diff3-test --list
> [...]
> ? 13??? XFAIL? difference at the start of a 128KB window
> 
> We don't have a fix for this bug yet.? I don't know whether this bug also exists in version 1.7.8.
> 
> 
> It would be very useful to have a regression test for the bug that you have found.? Would you be able to convert your reproduction recipe into a regression test written in C like the that one on trunk?
> Please let us know if you would be willing to write a test for the bug you found, and/or port test 13 to version 1.7, and/or write a patch to fix the bug shown by test number 13.? We can treat them as two entirely separate problems, but maybe you have the skill and wish to help fix both of them.
> 
> - Julian
> 
> 
> Hideki IWAMOTO wrote:
> 
> > The optimization of diff inclued in version 1.7 has a bug that
> > produces incorrect diff on a certain condition.
> > The attached patch fix it. 
> > 
> > 
> > Detail of the bug
> > -----------------
> > 
> > When the identical suffix begins at the boundary of a chunk,
> > datasource_get_next_token() defined in subversion/libsvn_diff/diff_file.c
> > does not stop at head of the identical suffix.
> > Therefore, when one of the identical suffixes of the original file
> > and the modified file begin from the boundary of a chunk,
> > excessive tokens are added to the diff tree.
> > 
> > How to reproduce
> > ----------------
> > 
> > $ for ((i=0;i<8256;i++)); do echo 0123456789abcde; done > test.txt
> > $ hexdump -C test.txt
> > 00000000? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > *
> > 00020400
> > $ svn add test.txt; svn ci -m test
> > A? ? ? ?  test.txt
> > Adding? ? ? ?  test.txt
> > Transmitting file data .
> > Committed revision 2.
> > $ echo 0123456789ABCDE |dd of=test.txt bs=16 seek=64 conv=notrunc
> > 1+0 records in
> > 1+0 records out
> > $ echo 0123456789ABCDE |dd of=test.txt bs=16 seek=8141 conv=notrunc
> > 1+0 records in
> > 1+0 records out
> > $ echo 0123456789abcde >> test.txt
> > $ echo 0123456789abcde >> test.txt
> > $ hexdump -C test.txt
> > 00000000? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > *
> > 00000400? 30 31 32 33 34 35 36 37? 38 39 41 42 43 44 45 0a? |0123456789ABCDE.|
> > 00000410? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > *
> > 0001fcd0? 30 31 32 33 34 35 36 37? 38 39 41 42 43 44 45 0a? |0123456789ABCDE.|
> > 0001fce0? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > *
> > 00020420
> > $ svn cat test.txt | diff -u - test.txt
> > --- -?  2012-12-24 22:30:18.760832000 +0900
> > +++ test.txt? ? 2012-12-24 22:29:24.000000000 +0900
> > @@ -62,6 +62,7 @@
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > +0123456789ABCDE
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > @@ -8138,6 +8139,7 @@
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > +0123456789ABCDE
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > $ svn di test.txt
> > Index: test.txt
> > ===================================================================
> > --- test.txt? ? (revision 2)
> > +++ test.txt? ? (working copy)
> > @@ -62,6 +62,7 @@
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > +0123456789ABCDE
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > @@ -8138,6 +8139,7 @@
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > +0123456789ABCDE
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > @@ -8188,6 +8190,72 @@
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > 
> > 
> > -- 
> > Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>
> > 

-- 
Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>


Re: [PATCH] fix for diff optimization bug

Posted by Daniel Shahaf <da...@elego.de>.
Hideki IWAMOTO wrote on Sat, Dec 29, 2012 at 22:22:00 +0900:
> Hi, Daniel.
> 
> Sorry for not understanding using of issue tracker.
> 

No worries.  Johan has worked on this code before, but I didn't want to
assume he would have time:
http://www.producingoss.com/en/managing-volunteers.html#delegation-assignment
http://www.producingoss.com/ja/managing-volunteers.html#delegation-assignment

> > OK, thanks for this.  Could you set a more descriptive 'summary' on the
> > issue, please?  i.e., describe the bug, not just state that there is
> > one.
> 
> I change to summary to
>  'When the identical suffix starts at chunk boundary, 
>   "svn diff" produces incorrect diff'.
> 
> > (Also, Johan is just a volunteer and might not have time to look into
> > the bug, despite the fact you've assigned it to him; I suppose you know
> > that, but restating it won't hurt.)
> 
> I also changed assign to the default owner of libsvn_diff.

Thanks, that's good.

Re: [PATCH] fix for diff optimization bug

Posted by Johan Corveleyn <jc...@gmail.com>.
On Sat, Dec 29, 2012 at 2:22 PM, Hideki IWAMOTO
<h-...@kit.hi-ho.ne.jp> wrote:
> Hi, Daniel.
>
> Sorry for not understanding using of issue tracker.
>
>> OK, thanks for this.  Could you set a more descriptive 'summary' on the
>> issue, please?  i.e., describe the bug, not just state that there is
>> one.
>
> I change to summary to
>  'When the identical suffix starts at chunk boundary,
>   "svn diff" produces incorrect diff'.
>
>> (Also, Johan is just a volunteer and might not have time to look into
>> the bug, despite the fact you've assigned it to him; I suppose you know
>> that, but restating it won't hurt.)
>
> I also changed assign to the default owner of libsvn_diff.

Great, thanks for finding and fixing that bug.

It's indeed best in general to assign bugs just to the default owner
(issues@subversion). People will assign to themselves when they start
working on something.

I'm following "from a distance" right now ... too many family
obligations at the moment :-). I'll certainly take a closer look in a
couple of days.

Thanks,
-- 
Johan

Re: [PATCH] fix for diff optimization bug

Posted by Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>.
Hi, Daniel.

Sorry for not understanding using of issue tracker.

> OK, thanks for this.  Could you set a more descriptive 'summary' on the
> issue, please?  i.e., describe the bug, not just state that there is
> one.

I change to summary to
 'When the identical suffix starts at chunk boundary, 
  "svn diff" produces incorrect diff'.

> (Also, Johan is just a volunteer and might not have time to look into
> the bug, despite the fact you've assigned it to him; I suppose you know
> that, but restating it won't hurt.)

I also changed assign to the default owner of libsvn_diff.


On Sat, 29 Dec 2012 14:39:11 +0200
Daniel Shahaf <da...@elego.de> wrote:

> Hideki IWAMOTO wrote on Sat, Dec 29, 2012 at 21:30:32 +0900:
> > Hi.
> > 
> > > Hi Hideki, thanks for the patch.  I hope Johan or Julian will have
> > > a look (probably next year), but if you don't get a response please ping
> > > the thread and I'll have a look.
> > 
> > I created issue #4283 for diff optimization bug and assigned to Johan.
> > 
> 
> OK, thanks for this.  Could you set a more descriptive 'summary' on the
> issue, please?  i.e., describe the bug, not just state that there is
> one.
> 
> (Also, Johan is just a volunteer and might not have time to look into
> the bug, despite the fact you've assigned it to him; I suppose you know
> that, but restating it won't hurt.)
> 
> Cheers
> 
> Daniel
> 

-- 
Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>


Re: [PATCH] fix for diff optimization bug

Posted by Daniel Shahaf <da...@elego.de>.
Hideki IWAMOTO wrote on Sat, Dec 29, 2012 at 21:30:32 +0900:
> Hi.
> 
> > Hi Hideki, thanks for the patch.  I hope Johan or Julian will have
> > a look (probably next year), but if you don't get a response please ping
> > the thread and I'll have a look.
> 
> I created issue #4283 for diff optimization bug and assigned to Johan.
> 

OK, thanks for this.  Could you set a more descriptive 'summary' on the
issue, please?  i.e., describe the bug, not just state that there is
one.

(Also, Johan is just a volunteer and might not have time to look into
the bug, despite the fact you've assigned it to him; I suppose you know
that, but restating it won't hurt.)

Cheers

Daniel

> On Sat, 29 Dec 2012 02:00:02 +0200
> Daniel Shahaf <da...@elego.de> wrote:
> 
> > Hi Hideki, thanks for the patch.  I hope Johan or Julian will have
> > a look (probably next year), but if you don't get a response please ping
> > the thread and I'll have a look.
> > 
> > Hideki IWAMOTO wrote on Sat, Dec 29, 2012 at 08:25:07 +0900:
> > > Hi, Daniel.
> > > 
> > > > Because I assumed using svn_diff_mem_* would reproduce issue #4133.  If
> > > > it doesn't, the test should be amended to use an API that will reproduce
> > > > the issue.
> > > 
> > > The cause of issue #4133 is a bugs in the calculations of
> > > svn_diff__file_token_t::norm_offset.
> > > 
> > > The test that you added is for different problem. (whitespaces at 
> > > the end of line are not ignored even if option `-x -b' option is specified.)
> > > 
> > > patch for issue #4133 is pattached.
> > > 
> > >  subversion/libsvn_diff/diff_file.c             |   13 ++-
> > >  subversion/tests/libsvn_diff/diff-diff3-test.c |   89 ++++++++++++++++++-------
> > >  2 files changed, 75 insertions(+), 27 deletions(-)
> 
> -- 
> Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>
> 

Re: [PATCH] fix for diff optimization bug

Posted by Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>.
Hi.

> Hi Hideki, thanks for the patch.  I hope Johan or Julian will have
> a look (probably next year), but if you don't get a response please ping
> the thread and I'll have a look.

I created issue #4283 for diff optimization bug and assigned to Johan.

On Sat, 29 Dec 2012 02:00:02 +0200
Daniel Shahaf <da...@elego.de> wrote:

> Hi Hideki, thanks for the patch.  I hope Johan or Julian will have
> a look (probably next year), but if you don't get a response please ping
> the thread and I'll have a look.
> 
> Hideki IWAMOTO wrote on Sat, Dec 29, 2012 at 08:25:07 +0900:
> > Hi, Daniel.
> > 
> > > Because I assumed using svn_diff_mem_* would reproduce issue #4133.  If
> > > it doesn't, the test should be amended to use an API that will reproduce
> > > the issue.
> > 
> > The cause of issue #4133 is a bugs in the calculations of
> > svn_diff__file_token_t::norm_offset.
> > 
> > The test that you added is for different problem. (whitespaces at 
> > the end of line are not ignored even if option `-x -b' option is specified.)
> > 
> > patch for issue #4133 is pattached.
> > 
> >  subversion/libsvn_diff/diff_file.c             |   13 ++-
> >  subversion/tests/libsvn_diff/diff-diff3-test.c |   89 ++++++++++++++++++-------
> >  2 files changed, 75 insertions(+), 27 deletions(-)

-- 
Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>


RE: [PATCH] fix for diff optimization bug

Posted by Bert Huijben <be...@qqmail.nl>.
> -----Original Message-----
> From: Hideki IWAMOTO [mailto:h-iwamoto@kit.hi-ho.ne.jp]
> Sent: zaterdag 29 december 2012 17:28
> To: Bert Huijben
> Cc: dev@subversion.apache.org
> Subject: Re: [PATCH] fix for diff optimization bug
> 
> Hi.
> 
> > I retraced your steps and applied your patch with some small tweaks in
> > r1426752
> (https://svn.apache.org/viewvc?view=revision&revision=r1426752)
> 
> There is a small difference in
> issu4133-trunk.patch which I attached to e-mail and
> issu4133-trunk-2.patch which I registered into issue tracker.
> Would you also apply this?

It would be much easier if you told what differences there are between your
patches instead of making me find out myself. (Or you could just send a new
patch)

I think I applied the change you refer to in r1426784.


For future patches I would recommend writing a log message following our
community guidelines
(http://subversion.apache.org/docs/community-guide/conventions.html#log-mess
ages)

This makes the review of your patch (and committing the patch) much easier
for others. 

Thanks,

	Bert


Re: [PATCH] fix for diff optimization bug

Posted by Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>.
Hi.

> I retraced your steps and applied your patch with some small tweaks in
> r1426752 (https://svn.apache.org/viewvc?view=revision&revision=r1426752)

There is a small difference in
issu4133-trunk.patch which I attached to e-mail and
issu4133-trunk-2.patch which I registered into issue tracker.
Would you also apply this? 


$ interdiff issu4133-trunk.patch issu4133-trunk-2.patch | diffstat -p0
 subversion/libsvn_diff/diff_file.c             |    2 +-
 subversion/tests/libsvn_diff/diff-diff3-test.c |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

On Sat, 29 Dec 2012 13:31:21 +0100
"Bert Huijben" <be...@qqmail.nl> wrote:

> 
> 
> > -----Original Message-----
> > From: Daniel Shahaf [mailto:danielsh@elego.de]
> > Sent: zaterdag 29 december 2012 01:00
> > To: Hideki IWAMOTO
> > Cc: dev@subversion.apache.org; Julian Foad
> > Subject: Re: [PATCH] fix for diff optimization bug
> > 
> > Hi Hideki, thanks for the patch.  I hope Johan or Julian will have
> > a look (probably next year), but if you don't get a response please ping
> > the thread and I'll have a look.
> 
> 	Hi Hideki,
> 
> Thanks for doing the research for this issue!
> 
> I retraced your steps and applied your patch with some small tweaks in
> r1426752 (https://svn.apache.org/viewvc?view=revision&revision=r1426752)
> 
> Thanks,
> 	Bert
> 
> > 
> > Hideki IWAMOTO wrote on Sat, Dec 29, 2012 at 08:25:07 +0900:
> > > Hi, Daniel.
> > >
> > > > Because I assumed using svn_diff_mem_* would reproduce issue #4133.
> > If
> > > > it doesn't, the test should be amended to use an API that will
> reproduce
> > > > the issue.
> > >
> > > The cause of issue #4133 is a bugs in the calculations of
> > > svn_diff__file_token_t::norm_offset.
> > >
> > > The test that you added is for different problem. (whitespaces at
> > > the end of line are not ignored even if option `-x -b' option is
> specified.)
> > >
> > > patch for issue #4133 is pattached.
> > >
> > >  subversion/libsvn_diff/diff_file.c             |   13 ++-
> > >  subversion/tests/libsvn_diff/diff-diff3-test.c |   89
> > ++++++++++++++++++-------
> > >  2 files changed, 75 insertions(+), 27 deletions(-)

-- 
Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>


RE: [PATCH] fix for diff optimization bug

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Daniel Shahaf [mailto:danielsh@elego.de]
> Sent: zaterdag 29 december 2012 01:00
> To: Hideki IWAMOTO
> Cc: dev@subversion.apache.org; Julian Foad
> Subject: Re: [PATCH] fix for diff optimization bug
> 
> Hi Hideki, thanks for the patch.  I hope Johan or Julian will have
> a look (probably next year), but if you don't get a response please ping
> the thread and I'll have a look.

	Hi Hideki,

Thanks for doing the research for this issue!

I retraced your steps and applied your patch with some small tweaks in
r1426752 (https://svn.apache.org/viewvc?view=revision&revision=r1426752)

Thanks,
	Bert

> 
> Hideki IWAMOTO wrote on Sat, Dec 29, 2012 at 08:25:07 +0900:
> > Hi, Daniel.
> >
> > > Because I assumed using svn_diff_mem_* would reproduce issue #4133.
> If
> > > it doesn't, the test should be amended to use an API that will
reproduce
> > > the issue.
> >
> > The cause of issue #4133 is a bugs in the calculations of
> > svn_diff__file_token_t::norm_offset.
> >
> > The test that you added is for different problem. (whitespaces at
> > the end of line are not ignored even if option `-x -b' option is
specified.)
> >
> > patch for issue #4133 is pattached.
> >
> >  subversion/libsvn_diff/diff_file.c             |   13 ++-
> >  subversion/tests/libsvn_diff/diff-diff3-test.c |   89
> ++++++++++++++++++-------
> >  2 files changed, 75 insertions(+), 27 deletions(-)


Re: [PATCH] fix for diff optimization bug

Posted by Daniel Shahaf <da...@elego.de>.
Hi Hideki, thanks for the patch.  I hope Johan or Julian will have
a look (probably next year), but if you don't get a response please ping
the thread and I'll have a look.

Hideki IWAMOTO wrote on Sat, Dec 29, 2012 at 08:25:07 +0900:
> Hi, Daniel.
> 
> > Because I assumed using svn_diff_mem_* would reproduce issue #4133.  If
> > it doesn't, the test should be amended to use an API that will reproduce
> > the issue.
> 
> The cause of issue #4133 is a bugs in the calculations of
> svn_diff__file_token_t::norm_offset.
> 
> The test that you added is for different problem. (whitespaces at 
> the end of line are not ignored even if option `-x -b' option is specified.)
> 
> patch for issue #4133 is pattached.
> 
>  subversion/libsvn_diff/diff_file.c             |   13 ++-
>  subversion/tests/libsvn_diff/diff-diff3-test.c |   89 ++++++++++++++++++-------
>  2 files changed, 75 insertions(+), 27 deletions(-)

Re: [PATCH] fix for diff optimization bug

Posted by Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>.
Hi, Daniel.

> Because I assumed using svn_diff_mem_* would reproduce issue #4133.  If
> it doesn't, the test should be amended to use an API that will reproduce
> the issue.

The cause of issue #4133 is a bugs in the calculations of
svn_diff__file_token_t::norm_offset.

The test that you added is for different problem. (whitespaces at 
the end of line are not ignored even if option `-x -b' option is specified.)

patch for issue #4133 is pattached.

 subversion/libsvn_diff/diff_file.c             |   13 ++-
 subversion/tests/libsvn_diff/diff-diff3-test.c |   89 ++++++++++++++++++-------
 2 files changed, 75 insertions(+), 27 deletions(-)

On Tue, 25 Dec 2012 18:25:23 +0200
Daniel Shahaf <da...@elego.de> wrote:

> Hideki IWAMOTO wrote on Wed, Dec 26, 2012 at 00:31:16 +0900:
> > Hi, Daniel.
> > 
> > > You might be interested to know that on the 'trunk',  a new test has already been added for a similar problem:
> > > 
> > > $ ./subversion/tests/libsvn_diff/diff-diff3-test --list
> > > [...]
> > > ? 13??? XFAIL? difference at the start of a 128KB window
> > 
> > I have some questions about the test which you added for issue #4133.
> > 
> > 1. Enum constant corresponding to the option `-w' is svn_diff_file_ignore_space_all. 
> >    But, you have set svn_diff_file_ignore_space_change to ignore_space. 
> >    Why? 
> > 
> 
> That's probably a bug in the test: it is failing now because it expects
> "ba \n" and "ba\n" to be considered equal, but it should be using
> svn_diff_file_ignore_space_all for them to be considered equal.
> 
> I don't care whether the test uses svn_diff_file_ignore_space_change or
> svn_diff_file_ignore_space_all, so long as it tests for issue #4133.
> 
> > subversion/tests/libsvn_diff/diff-diff3-test.c:
> >  2397:     /* Issue #4133, '"diff -x -w" showing wrong change'.
> >              ...
> >  2408:      diff_opts->ignore_space = svn_diff_file_ignore_space_change;
> > 
> > 
> > 2. The 128KB chunk size is used only for diff on file, and it seems not to affect on
> >    diff on memory. 
> >    Why do you use svn_diff_mem_string_diff? 
> > 
> 
> Because I assumed using svn_diff_mem_* would reproduce issue #4133.  If
> it doesn't, the test should be amended to use an API that will reproduce
> the issue.
> 
> > subversion/tests/libsvn_diff/diff-diff3-test.c:
> >  2398:    The magic number used in this test, 1<<17, is
> >  2399:    CHUNK_SIZE from ../../libsvn_diff/diff_file.c
> >             ...
> >  2425:     SVN_ERR(svn_diff_mem_string_diff(&diff, &left, &right, diff_opts, pool));
> > 
> 
> Thanks for catching these problems.  Would you be able to write patches
> to fix them?  It sounds like you're already more familiar with the diff
> code than I am. :)
> 
> Cheers
> 
> Daniel
> 
> > 
> > 
> > On Mon, 24 Dec 2012 20:51:09 +0000 (GMT)
> > Julian Foad <ju...@btopenworld.com> wrote:
> > 
> > > Hi, Hideki.? Thank you very much for finding this bug and a fix for it.
> > > You might be interested to know that on the 'trunk',  a new test has already been added for a similar problem:
> > > 
> > > $ ./subversion/tests/libsvn_diff/diff-diff3-test --list
> > > [...]
> > > ? 13??? XFAIL? difference at the start of a 128KB window
> > > 
> > > We don't have a fix for this bug yet.? I don't know whether this bug also exists in version 1.7.8.
> > > 
> > > 
> > > It would be very useful to have a regression test for the bug that you have found.? Would you be able to convert your reproduction recipe into a regression test written in C like the that one on trunk?
> > > Please let us know if you would be willing to write a test for the bug you found, and/or port test 13 to version 1.7, and/or write a patch to fix the bug shown by test number 13.? We can treat them as two entirely separate problems, but maybe you have the skill and wish to help fix both of them.
> > > 
> > > - Julian
> > > 
> > > 
> > > Hideki IWAMOTO wrote:
> > > 
> > > > The optimization of diff inclued in version 1.7 has a bug that
> > > > produces incorrect diff on a certain condition.
> > > > The attached patch fix it. 
> > > > 
> > > > 
> > > > Detail of the bug
> > > > -----------------
> > > > 
> > > > When the identical suffix begins at the boundary of a chunk,
> > > > datasource_get_next_token() defined in subversion/libsvn_diff/diff_file.c
> > > > does not stop at head of the identical suffix.
> > > > Therefore, when one of the identical suffixes of the original file
> > > > and the modified file begin from the boundary of a chunk,
> > > > excessive tokens are added to the diff tree.
> > > > 
> > > > How to reproduce
> > > > ----------------
> > > > 
> > > > $ for ((i=0;i<8256;i++)); do echo 0123456789abcde; done > test.txt
> > > > $ hexdump -C test.txt
> > > > 00000000? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > > > *
> > > > 00020400
> > > > $ svn add test.txt; svn ci -m test
> > > > A? ? ? ?  test.txt
> > > > Adding? ? ? ?  test.txt
> > > > Transmitting file data .
> > > > Committed revision 2.
> > > > $ echo 0123456789ABCDE |dd of=test.txt bs=16 seek=64 conv=notrunc
> > > > 1+0 records in
> > > > 1+0 records out
> > > > $ echo 0123456789ABCDE |dd of=test.txt bs=16 seek=8141 conv=notrunc
> > > > 1+0 records in
> > > > 1+0 records out
> > > > $ echo 0123456789abcde >> test.txt
> > > > $ echo 0123456789abcde >> test.txt
> > > > $ hexdump -C test.txt
> > > > 00000000? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > > > *
> > > > 00000400? 30 31 32 33 34 35 36 37? 38 39 41 42 43 44 45 0a? |0123456789ABCDE.|
> > > > 00000410? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > > > *
> > > > 0001fcd0? 30 31 32 33 34 35 36 37? 38 39 41 42 43 44 45 0a? |0123456789ABCDE.|
> > > > 0001fce0? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > > > *
> > > > 00020420
> > > > $ svn cat test.txt | diff -u - test.txt
> > > > --- -?  2012-12-24 22:30:18.760832000 +0900
> > > > +++ test.txt? ? 2012-12-24 22:29:24.000000000 +0900
> > > > @@ -62,6 +62,7 @@
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > +0123456789ABCDE
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > @@ -8138,6 +8139,7 @@
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > +0123456789ABCDE
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > $ svn di test.txt
> > > > Index: test.txt
> > > > ===================================================================
> > > > --- test.txt? ? (revision 2)
> > > > +++ test.txt? ? (working copy)
> > > > @@ -62,6 +62,7 @@
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > +0123456789ABCDE
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > @@ -8138,6 +8139,7 @@
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > +0123456789ABCDE
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > @@ -8188,6 +8190,72 @@
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > +0123456789abcde
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > 0123456789abcde
> > > > 
> > > > 
> > > > -- 
> > > > Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>
> > > > 
> > 
> > -- 
> > Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>
> > 

-- 
Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>

Re: [PATCH] fix for diff optimization bug

Posted by Daniel Shahaf <da...@elego.de>.
Hideki IWAMOTO wrote on Wed, Dec 26, 2012 at 00:31:16 +0900:
> Hi, Daniel.
> 
> > You might be interested to know that on the 'trunk',  a new test has already been added for a similar problem:
> > 
> > $ ./subversion/tests/libsvn_diff/diff-diff3-test --list
> > [...]
> > ? 13??? XFAIL? difference at the start of a 128KB window
> 
> I have some questions about the test which you added for issue #4133.
> 
> 1. Enum constant corresponding to the option `-w' is svn_diff_file_ignore_space_all. 
>    But, you have set svn_diff_file_ignore_space_change to ignore_space. 
>    Why? 
> 

That's probably a bug in the test: it is failing now because it expects
"ba \n" and "ba\n" to be considered equal, but it should be using
svn_diff_file_ignore_space_all for them to be considered equal.

I don't care whether the test uses svn_diff_file_ignore_space_change or
svn_diff_file_ignore_space_all, so long as it tests for issue #4133.

> subversion/tests/libsvn_diff/diff-diff3-test.c:
>  2397:     /* Issue #4133, '"diff -x -w" showing wrong change'.
>              ...
>  2408:      diff_opts->ignore_space = svn_diff_file_ignore_space_change;
> 
> 
> 2. The 128KB chunk size is used only for diff on file, and it seems not to affect on
>    diff on memory. 
>    Why do you use svn_diff_mem_string_diff? 
> 

Because I assumed using svn_diff_mem_* would reproduce issue #4133.  If
it doesn't, the test should be amended to use an API that will reproduce
the issue.

> subversion/tests/libsvn_diff/diff-diff3-test.c:
>  2398:    The magic number used in this test, 1<<17, is
>  2399:    CHUNK_SIZE from ../../libsvn_diff/diff_file.c
>             ...
>  2425:     SVN_ERR(svn_diff_mem_string_diff(&diff, &left, &right, diff_opts, pool));
> 

Thanks for catching these problems.  Would you be able to write patches
to fix them?  It sounds like you're already more familiar with the diff
code than I am. :)

Cheers

Daniel

> 
> 
> On Mon, 24 Dec 2012 20:51:09 +0000 (GMT)
> Julian Foad <ju...@btopenworld.com> wrote:
> 
> > Hi, Hideki.? Thank you very much for finding this bug and a fix for it.
> > You might be interested to know that on the 'trunk',  a new test has already been added for a similar problem:
> > 
> > $ ./subversion/tests/libsvn_diff/diff-diff3-test --list
> > [...]
> > ? 13??? XFAIL? difference at the start of a 128KB window
> > 
> > We don't have a fix for this bug yet.? I don't know whether this bug also exists in version 1.7.8.
> > 
> > 
> > It would be very useful to have a regression test for the bug that you have found.? Would you be able to convert your reproduction recipe into a regression test written in C like the that one on trunk?
> > Please let us know if you would be willing to write a test for the bug you found, and/or port test 13 to version 1.7, and/or write a patch to fix the bug shown by test number 13.? We can treat them as two entirely separate problems, but maybe you have the skill and wish to help fix both of them.
> > 
> > - Julian
> > 
> > 
> > Hideki IWAMOTO wrote:
> > 
> > > The optimization of diff inclued in version 1.7 has a bug that
> > > produces incorrect diff on a certain condition.
> > > The attached patch fix it. 
> > > 
> > > 
> > > Detail of the bug
> > > -----------------
> > > 
> > > When the identical suffix begins at the boundary of a chunk,
> > > datasource_get_next_token() defined in subversion/libsvn_diff/diff_file.c
> > > does not stop at head of the identical suffix.
> > > Therefore, when one of the identical suffixes of the original file
> > > and the modified file begin from the boundary of a chunk,
> > > excessive tokens are added to the diff tree.
> > > 
> > > How to reproduce
> > > ----------------
> > > 
> > > $ for ((i=0;i<8256;i++)); do echo 0123456789abcde; done > test.txt
> > > $ hexdump -C test.txt
> > > 00000000? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > > *
> > > 00020400
> > > $ svn add test.txt; svn ci -m test
> > > A? ? ? ?  test.txt
> > > Adding? ? ? ?  test.txt
> > > Transmitting file data .
> > > Committed revision 2.
> > > $ echo 0123456789ABCDE |dd of=test.txt bs=16 seek=64 conv=notrunc
> > > 1+0 records in
> > > 1+0 records out
> > > $ echo 0123456789ABCDE |dd of=test.txt bs=16 seek=8141 conv=notrunc
> > > 1+0 records in
> > > 1+0 records out
> > > $ echo 0123456789abcde >> test.txt
> > > $ echo 0123456789abcde >> test.txt
> > > $ hexdump -C test.txt
> > > 00000000? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > > *
> > > 00000400? 30 31 32 33 34 35 36 37? 38 39 41 42 43 44 45 0a? |0123456789ABCDE.|
> > > 00000410? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > > *
> > > 0001fcd0? 30 31 32 33 34 35 36 37? 38 39 41 42 43 44 45 0a? |0123456789ABCDE.|
> > > 0001fce0? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > > *
> > > 00020420
> > > $ svn cat test.txt | diff -u - test.txt
> > > --- -?  2012-12-24 22:30:18.760832000 +0900
> > > +++ test.txt? ? 2012-12-24 22:29:24.000000000 +0900
> > > @@ -62,6 +62,7 @@
> > > 0123456789abcde
> > > 0123456789abcde
> > > 0123456789abcde
> > > +0123456789ABCDE
> > > 0123456789abcde
> > > 0123456789abcde
> > > 0123456789abcde
> > > @@ -8138,6 +8139,7 @@
> > > 0123456789abcde
> > > 0123456789abcde
> > > 0123456789abcde
> > > +0123456789ABCDE
> > > 0123456789abcde
> > > 0123456789abcde
> > > 0123456789abcde
> > > $ svn di test.txt
> > > Index: test.txt
> > > ===================================================================
> > > --- test.txt? ? (revision 2)
> > > +++ test.txt? ? (working copy)
> > > @@ -62,6 +62,7 @@
> > > 0123456789abcde
> > > 0123456789abcde
> > > 0123456789abcde
> > > +0123456789ABCDE
> > > 0123456789abcde
> > > 0123456789abcde
> > > 0123456789abcde
> > > @@ -8138,6 +8139,7 @@
> > > 0123456789abcde
> > > 0123456789abcde
> > > 0123456789abcde
> > > +0123456789ABCDE
> > > 0123456789abcde
> > > 0123456789abcde
> > > 0123456789abcde
> > > @@ -8188,6 +8190,72 @@
> > > 0123456789abcde
> > > 0123456789abcde
> > > 0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > +0123456789abcde
> > > 0123456789abcde
> > > 0123456789abcde
> > > 0123456789abcde
> > > 
> > > 
> > > -- 
> > > Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>
> > > 
> 
> -- 
> Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>
> 

Re: [PATCH] fix for diff optimization bug

Posted by Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>.
Hi, Daniel.

> You might be interested to know that on the 'trunk',  a new test has already been added for a similar problem:
> 
> $ ./subversion/tests/libsvn_diff/diff-diff3-test --list
> [...]
> ? 13??? XFAIL? difference at the start of a 128KB window

I have some questions about the test which you added for issue #4133.

1. Enum constant corresponding to the option `-w' is svn_diff_file_ignore_space_all. 
   But, you have set svn_diff_file_ignore_space_change to ignore_space. 
   Why? 

subversion/tests/libsvn_diff/diff-diff3-test.c:
 2397:     /* Issue #4133, '"diff -x -w" showing wrong change'.
             ...
 2408:      diff_opts->ignore_space = svn_diff_file_ignore_space_change;


2. The 128KB chunk size is used only for diff on file, and it seems not to affect on
   diff on memory. 
   Why do you use svn_diff_mem_string_diff? 

subversion/tests/libsvn_diff/diff-diff3-test.c:
 2398:    The magic number used in this test, 1<<17, is
 2399:    CHUNK_SIZE from ../../libsvn_diff/diff_file.c
            ...
 2425:     SVN_ERR(svn_diff_mem_string_diff(&diff, &left, &right, diff_opts, pool));



On Mon, 24 Dec 2012 20:51:09 +0000 (GMT)
Julian Foad <ju...@btopenworld.com> wrote:

> Hi, Hideki.? Thank you very much for finding this bug and a fix for it.
> You might be interested to know that on the 'trunk',  a new test has already been added for a similar problem:
> 
> $ ./subversion/tests/libsvn_diff/diff-diff3-test --list
> [...]
> ? 13??? XFAIL? difference at the start of a 128KB window
> 
> We don't have a fix for this bug yet.? I don't know whether this bug also exists in version 1.7.8.
> 
> 
> It would be very useful to have a regression test for the bug that you have found.? Would you be able to convert your reproduction recipe into a regression test written in C like the that one on trunk?
> Please let us know if you would be willing to write a test for the bug you found, and/or port test 13 to version 1.7, and/or write a patch to fix the bug shown by test number 13.? We can treat them as two entirely separate problems, but maybe you have the skill and wish to help fix both of them.
> 
> - Julian
> 
> 
> Hideki IWAMOTO wrote:
> 
> > The optimization of diff inclued in version 1.7 has a bug that
> > produces incorrect diff on a certain condition.
> > The attached patch fix it. 
> > 
> > 
> > Detail of the bug
> > -----------------
> > 
> > When the identical suffix begins at the boundary of a chunk,
> > datasource_get_next_token() defined in subversion/libsvn_diff/diff_file.c
> > does not stop at head of the identical suffix.
> > Therefore, when one of the identical suffixes of the original file
> > and the modified file begin from the boundary of a chunk,
> > excessive tokens are added to the diff tree.
> > 
> > How to reproduce
> > ----------------
> > 
> > $ for ((i=0;i<8256;i++)); do echo 0123456789abcde; done > test.txt
> > $ hexdump -C test.txt
> > 00000000? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > *
> > 00020400
> > $ svn add test.txt; svn ci -m test
> > A? ? ? ?  test.txt
> > Adding? ? ? ?  test.txt
> > Transmitting file data .
> > Committed revision 2.
> > $ echo 0123456789ABCDE |dd of=test.txt bs=16 seek=64 conv=notrunc
> > 1+0 records in
> > 1+0 records out
> > $ echo 0123456789ABCDE |dd of=test.txt bs=16 seek=8141 conv=notrunc
> > 1+0 records in
> > 1+0 records out
> > $ echo 0123456789abcde >> test.txt
> > $ echo 0123456789abcde >> test.txt
> > $ hexdump -C test.txt
> > 00000000? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > *
> > 00000400? 30 31 32 33 34 35 36 37? 38 39 41 42 43 44 45 0a? |0123456789ABCDE.|
> > 00000410? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > *
> > 0001fcd0? 30 31 32 33 34 35 36 37? 38 39 41 42 43 44 45 0a? |0123456789ABCDE.|
> > 0001fce0? 30 31 32 33 34 35 36 37? 38 39 61 62 63 64 65 0a? |0123456789abcde.|
> > *
> > 00020420
> > $ svn cat test.txt | diff -u - test.txt
> > --- -?  2012-12-24 22:30:18.760832000 +0900
> > +++ test.txt? ? 2012-12-24 22:29:24.000000000 +0900
> > @@ -62,6 +62,7 @@
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > +0123456789ABCDE
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > @@ -8138,6 +8139,7 @@
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > +0123456789ABCDE
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > $ svn di test.txt
> > Index: test.txt
> > ===================================================================
> > --- test.txt? ? (revision 2)
> > +++ test.txt? ? (working copy)
> > @@ -62,6 +62,7 @@
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > +0123456789ABCDE
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > @@ -8138,6 +8139,7 @@
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > +0123456789ABCDE
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > @@ -8188,6 +8190,72 @@
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > +0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > 0123456789abcde
> > 
> > 
> > -- 
> > Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>
> > 

-- 
Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>


Re: [PATCH] fix for diff optimization bug

Posted by Julian Foad <ju...@btopenworld.com>.
Hi, Hideki.  Thank you very much for finding this bug and a fix for it.
You might be interested to know that on the 'trunk',  a new test has already been added for a similar problem:

$ ./subversion/tests/libsvn_diff/diff-diff3-test --list
[...]
  13    XFAIL  difference at the start of a 128KB window

We don't have a fix for this bug yet.  I don't know whether this bug also exists in version 1.7.8.


It would be very useful to have a regression test for the bug that you have found.  Would you be able to convert your reproduction recipe into a regression test written in C like the that one on trunk?
Please let us know if you would be willing to write a test for the bug you found, and/or port test 13 to version 1.7, and/or write a patch to fix the bug shown by test number 13.  We can treat them as two entirely separate problems, but maybe you have the skill and wish to help fix both of them.

- Julian


Hideki IWAMOTO wrote:

> The optimization of diff inclued in version 1.7 has a bug that
> produces incorrect diff on a certain condition.
> The attached patch fix it. 
> 
> 
> Detail of the bug
> -----------------
> 
> When the identical suffix begins at the boundary of a chunk,
> datasource_get_next_token() defined in subversion/libsvn_diff/diff_file.c
> does not stop at head of the identical suffix.
> Therefore, when one of the identical suffixes of the original file
> and the modified file begin from the boundary of a chunk,
> excessive tokens are added to the diff tree.
> 
> How to reproduce
> ----------------
> 
> $ for ((i=0;i<8256;i++)); do echo 0123456789abcde; done > test.txt
> $ hexdump -C test.txt
> 00000000  30 31 32 33 34 35 36 37  38 39 61 62 63 64 65 0a  |0123456789abcde.|
> *
> 00020400
> $ svn add test.txt; svn ci -m test
> A         test.txt
> Adding         test.txt
> Transmitting file data .
> Committed revision 2.
> $ echo 0123456789ABCDE |dd of=test.txt bs=16 seek=64 conv=notrunc
> 1+0 records in
> 1+0 records out
> $ echo 0123456789ABCDE |dd of=test.txt bs=16 seek=8141 conv=notrunc
> 1+0 records in
> 1+0 records out
> $ echo 0123456789abcde >> test.txt
> $ echo 0123456789abcde >> test.txt
> $ hexdump -C test.txt
> 00000000  30 31 32 33 34 35 36 37  38 39 61 62 63 64 65 0a  |0123456789abcde.|
> *
> 00000400  30 31 32 33 34 35 36 37  38 39 41 42 43 44 45 0a  |0123456789ABCDE.|
> 00000410  30 31 32 33 34 35 36 37  38 39 61 62 63 64 65 0a  |0123456789abcde.|
> *
> 0001fcd0  30 31 32 33 34 35 36 37  38 39 41 42 43 44 45 0a  |0123456789ABCDE.|
> 0001fce0  30 31 32 33 34 35 36 37  38 39 61 62 63 64 65 0a  |0123456789abcde.|
> *
> 00020420
> $ svn cat test.txt | diff -u - test.txt
> --- -   2012-12-24 22:30:18.760832000 +0900
> +++ test.txt    2012-12-24 22:29:24.000000000 +0900
> @@ -62,6 +62,7 @@
> 0123456789abcde
> 0123456789abcde
> 0123456789abcde
> +0123456789ABCDE
> 0123456789abcde
> 0123456789abcde
> 0123456789abcde
> @@ -8138,6 +8139,7 @@
> 0123456789abcde
> 0123456789abcde
> 0123456789abcde
> +0123456789ABCDE
> 0123456789abcde
> 0123456789abcde
> 0123456789abcde
> $ svn di test.txt
> Index: test.txt
> ===================================================================
> --- test.txt    (revision 2)
> +++ test.txt    (working copy)
> @@ -62,6 +62,7 @@
> 0123456789abcde
> 0123456789abcde
> 0123456789abcde
> +0123456789ABCDE
> 0123456789abcde
> 0123456789abcde
> 0123456789abcde
> @@ -8138,6 +8139,7 @@
> 0123456789abcde
> 0123456789abcde
> 0123456789abcde
> +0123456789ABCDE
> 0123456789abcde
> 0123456789abcde
> 0123456789abcde
> @@ -8188,6 +8190,72 @@
> 0123456789abcde
> 0123456789abcde
> 0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> +0123456789abcde
> 0123456789abcde
> 0123456789abcde
> 0123456789abcde
> 
> 
> -- 
> Hideki IWAMOTO <h-...@kit.hi-ho.ne.jp>
>