You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Philip Martin <ph...@wandisco.com> on 2013/03/25 21:31:27 UTC

Diff suffix scanning invalid read

Philip Martin <ph...@wandisco.com> writes:

> $ valgrind -q .libs/lt-diff-diff3-test 15
> ==6097== Invalid read of size 1
> ==6097==    at 0x503FD83: find_identical_suffix (diff_file.c:586)
> ==6097==    by 0x5040C45: datasources_open (diff_file.c:815)
> ==6097==    by 0x503D6B2: svn_diff_diff3_2 (diff3.c:276)
> ==6097==    by 0x5041D3A: svn_diff_file_diff3_2 (diff_file.c:1327)
> ==6097==    by 0x401F2F: three_way_merge (diff-diff3-test.c:191)
> ==6097==    by 0x4027B7: two_way_diff (diff-diff3-test.c:311)
> ==6097==    by 0x405DF8: test_token_compare (diff-diff3-test.c:2589)
> ==6097==    by 0x4E34C6A: do_test_num (svn_test_main.c:268)
> ==6097==    by 0x4E35686: main (svn_test_main.c:551)
> ==6097==  Address 0x138585af is 1 bytes before a block of size 131,072 alloc'd
> ==6097==    at 0x4C28BED: malloc (vg_replace_malloc.c:263)
> ==6097==    by 0x572DDDB: pool_alloc (apr_pools.c:1463)
> ==6097==    by 0x572DF57: apr_palloc_debug (apr_pools.c:1504)
> ==6097==    by 0x503FACA: find_identical_suffix (diff_file.c:558)
> ==6097==    by 0x5040C45: datasources_open (diff_file.c:815)
> ==6097==    by 0x503D6B2: svn_diff_diff3_2 (diff3.c:276)
> ==6097==    by 0x5041D3A: svn_diff_file_diff3_2 (diff_file.c:1327)
> ==6097==    by 0x401F2F: three_way_merge (diff-diff3-test.c:191)
> ==6097==    by 0x4027B7: two_way_diff (diff-diff3-test.c:311)
> ==6097==    by 0x405DF8: test_token_compare (diff-diff3-test.c:2589)
> ==6097==    by 0x4E34C6A: do_test_num (svn_test_main.c:268)
> ==6097==    by 0x4E35686: main (svn_test_main.c:551)

On entry to find_identical_suffix file[0] is

(gdb) p file[0]
$1 = {path = 0x407ee6 "token-compare-original1", file = 0x127cea0, 
  size = 131072, chunk = 0, buffer = 0x127cf50 '\n' <repeats 200 times>..., 
  curp = 0x129cf48 "    @@@\n\300\312&\001", endp = 0x129cf50 "\300\312&\001", 
  normalize_state = svn_diff__normalize_state_normal, suffix_start_chunk = -1, 
  suffix_offset_in_chunk = 0}

The file data ends at the end of the chunk.  The code does

      if (file_for_suffix[i].chunk == file[i].chunk)
        {
          /* Prefix ended in last chunk, so we can reuse the prefix buffer */
          file_for_suffix[i].buffer = file[i].buffer;
        }
      else
        {
          /* There is at least more than 1 chunk,
             so allocate full chunk size buffer */
          file_for_suffix[i].buffer = apr_palloc(pool, CHUNK_SIZE);
          SVN_ERR(read_chunk(file_for_suffix[i].file, file_for_suffix[i].path,
                             file_for_suffix[i].buffer, length[i],
                             chunk_to_offset(file_for_suffix[i].chunk),
                             pool));
        }
      file_for_suffix[i].endp = file_for_suffix[i].buffer + length[i];
      file_for_suffix[i].curp = file_for_suffix[i].endp - 1;

and allocates a new chunk so that file_for_suffix[0] is

(gdb) p file_for_suffix[0]
$3 = {path = 0x407ee6 "token-compare-original1", file = 0x127cea0, 
  size = 131072, chunk = 1, buffer = 0x12bd0d0 'A' <repeats 200 times>..., 
  curp = 0x12bd0cf "", endp = 0x12bd0d0 'A' <repeats 200 times>..., 
  normalize_state = svn_diff__normalize_state_normal, suffix_start_chunk = 0, 
  suffix_offset_in_chunk = 0}

file_for_suffix[0=.endp is 1 byte before file_for_suffix[0].buffer and
so dereferencing it is not valid.

How are the chunks supposed to work?  When the file data ends at the end
of a chunk is there supposed to be another valid chunk with no data?

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download

Re: Diff suffix scanning invalid read

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, Mar 25, 2013 at 11:09 PM, Johan Corveleyn <jc...@gmail.com> wrote:
> On Mon, Mar 25, 2013 at 9:31 PM, Philip Martin
> <ph...@wandisco.com> wrote:
>> Philip Martin <ph...@wandisco.com> writes:
>>
>>> $ valgrind -q .libs/lt-diff-diff3-test 15
>>> ==6097== Invalid read of size 1
>>> ==6097==    at 0x503FD83: find_identical_suffix (diff_file.c:586)
>>> ==6097==    by 0x5040C45: datasources_open (diff_file.c:815)
>>> ==6097==    by 0x503D6B2: svn_diff_diff3_2 (diff3.c:276)
>>> ==6097==    by 0x5041D3A: svn_diff_file_diff3_2 (diff_file.c:1327)
>>> ==6097==    by 0x401F2F: three_way_merge (diff-diff3-test.c:191)
>>> ==6097==    by 0x4027B7: two_way_diff (diff-diff3-test.c:311)
>>> ==6097==    by 0x405DF8: test_token_compare (diff-diff3-test.c:2589)
>>> ==6097==    by 0x4E34C6A: do_test_num (svn_test_main.c:268)
>>> ==6097==    by 0x4E35686: main (svn_test_main.c:551)
>>> ==6097==  Address 0x138585af is 1 bytes before a block of size 131,072 alloc'd
>>> ==6097==    at 0x4C28BED: malloc (vg_replace_malloc.c:263)
>>> ==6097==    by 0x572DDDB: pool_alloc (apr_pools.c:1463)
>>> ==6097==    by 0x572DF57: apr_palloc_debug (apr_pools.c:1504)
>>> ==6097==    by 0x503FACA: find_identical_suffix (diff_file.c:558)
>>> ==6097==    by 0x5040C45: datasources_open (diff_file.c:815)
>>> ==6097==    by 0x503D6B2: svn_diff_diff3_2 (diff3.c:276)
>>> ==6097==    by 0x5041D3A: svn_diff_file_diff3_2 (diff_file.c:1327)
>>> ==6097==    by 0x401F2F: three_way_merge (diff-diff3-test.c:191)
>>> ==6097==    by 0x4027B7: two_way_diff (diff-diff3-test.c:311)
>>> ==6097==    by 0x405DF8: test_token_compare (diff-diff3-test.c:2589)
>>> ==6097==    by 0x4E34C6A: do_test_num (svn_test_main.c:268)
>>> ==6097==    by 0x4E35686: main (svn_test_main.c:551)
>>
>> On entry to find_identical_suffix file[0] is
>>
>> (gdb) p file[0]
>> $1 = {path = 0x407ee6 "token-compare-original1", file = 0x127cea0,
>>   size = 131072, chunk = 0, buffer = 0x127cf50 '\n' <repeats 200 times>...,
>>   curp = 0x129cf48 "    @@@\n\300\312&\001", endp = 0x129cf50 "\300\312&\001",
>>   normalize_state = svn_diff__normalize_state_normal, suffix_start_chunk = -1,
>>   suffix_offset_in_chunk = 0}
>>
>> The file data ends at the end of the chunk.  The code does
>>
>>       if (file_for_suffix[i].chunk == file[i].chunk)
>>         {
>>           /* Prefix ended in last chunk, so we can reuse the prefix buffer */
>>           file_for_suffix[i].buffer = file[i].buffer;
>>         }
>>       else
>>         {
>>           /* There is at least more than 1 chunk,
>>              so allocate full chunk size buffer */
>>           file_for_suffix[i].buffer = apr_palloc(pool, CHUNK_SIZE);
>>           SVN_ERR(read_chunk(file_for_suffix[i].file, file_for_suffix[i].path,
>>                              file_for_suffix[i].buffer, length[i],
>>                              chunk_to_offset(file_for_suffix[i].chunk),
>>                              pool));
>>         }
>>       file_for_suffix[i].endp = file_for_suffix[i].buffer + length[i];
>>       file_for_suffix[i].curp = file_for_suffix[i].endp - 1;
>>
>> and allocates a new chunk so that file_for_suffix[0] is
>>
>> (gdb) p file_for_suffix[0]
>> $3 = {path = 0x407ee6 "token-compare-original1", file = 0x127cea0,
>>   size = 131072, chunk = 1, buffer = 0x12bd0d0 'A' <repeats 200 times>...,
>>   curp = 0x12bd0cf "", endp = 0x12bd0d0 'A' <repeats 200 times>...,
>>   normalize_state = svn_diff__normalize_state_normal, suffix_start_chunk = 0,
>>   suffix_offset_in_chunk = 0}
>>
>> file_for_suffix[0=.endp is 1 byte before file_for_suffix[0].buffer and
>> so dereferencing it is not valid.
>
> You mean file_for_suffix[0].curp, which is 1 byte before
> file_for_suffix[0].buffer. Bummer, it's not supposed to do that.
>
>> How are the chunks supposed to work?  When the file data ends at the end
>> of a chunk is there supposed to be another valid chunk with no data?
>
> It's been a while, so I'm not sure. I *think* the intention is that
> find_identical_suffix is not executed in this case, because in
> datasources_open, find_identical_suffix is only executed if
> (!reached_one_eof). And find_identical_prefix should have set
> reached_one_eof in this case. I think ... But apparently it doesn't
> ...

Hmmm, I should have looked closer. The case of file.size==131072
(which is CHUNK_SIZE, which is 1<<CHUNK_SHIFT, which is 17), or any
chunk_size multiple, can happen with or without find_identical_prefix
eating it all. The fact is that offset_to_chunk is defined as:

#define offset_to_chunk(offset) ((offset) >> CHUNK_SHIFT)

which means that offset_to_chunk(131072) is 1. And it just so happens
that in multiple places in diff_file.c the "last chunk" is determined
as offset_to_chunk(file.size).

I think this ("last chunk being offset_to_chunk(file.size)") predates
my involvement of working on the prefix/suffix scanning. So I think
it's safe to say that: yes, it seems that there is supposed to be
another chunk with no data. Perhaps someone else with longer
involvement with the origins of the code of diff_file.c can confirm
that this is indeed the intention.

If that's indeed the case (there can be a last chunk with no data), I
suppose it's best to fix find_identical_suffix to check for that.

Sorry to not be any more help ... a bit overworked lately.
-- 
Johan

Re: Diff suffix scanning invalid read

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, Mar 25, 2013 at 9:31 PM, Philip Martin
<ph...@wandisco.com> wrote:
> Philip Martin <ph...@wandisco.com> writes:
>
>> $ valgrind -q .libs/lt-diff-diff3-test 15
>> ==6097== Invalid read of size 1
>> ==6097==    at 0x503FD83: find_identical_suffix (diff_file.c:586)
>> ==6097==    by 0x5040C45: datasources_open (diff_file.c:815)
>> ==6097==    by 0x503D6B2: svn_diff_diff3_2 (diff3.c:276)
>> ==6097==    by 0x5041D3A: svn_diff_file_diff3_2 (diff_file.c:1327)
>> ==6097==    by 0x401F2F: three_way_merge (diff-diff3-test.c:191)
>> ==6097==    by 0x4027B7: two_way_diff (diff-diff3-test.c:311)
>> ==6097==    by 0x405DF8: test_token_compare (diff-diff3-test.c:2589)
>> ==6097==    by 0x4E34C6A: do_test_num (svn_test_main.c:268)
>> ==6097==    by 0x4E35686: main (svn_test_main.c:551)
>> ==6097==  Address 0x138585af is 1 bytes before a block of size 131,072 alloc'd
>> ==6097==    at 0x4C28BED: malloc (vg_replace_malloc.c:263)
>> ==6097==    by 0x572DDDB: pool_alloc (apr_pools.c:1463)
>> ==6097==    by 0x572DF57: apr_palloc_debug (apr_pools.c:1504)
>> ==6097==    by 0x503FACA: find_identical_suffix (diff_file.c:558)
>> ==6097==    by 0x5040C45: datasources_open (diff_file.c:815)
>> ==6097==    by 0x503D6B2: svn_diff_diff3_2 (diff3.c:276)
>> ==6097==    by 0x5041D3A: svn_diff_file_diff3_2 (diff_file.c:1327)
>> ==6097==    by 0x401F2F: three_way_merge (diff-diff3-test.c:191)
>> ==6097==    by 0x4027B7: two_way_diff (diff-diff3-test.c:311)
>> ==6097==    by 0x405DF8: test_token_compare (diff-diff3-test.c:2589)
>> ==6097==    by 0x4E34C6A: do_test_num (svn_test_main.c:268)
>> ==6097==    by 0x4E35686: main (svn_test_main.c:551)
>
> On entry to find_identical_suffix file[0] is
>
> (gdb) p file[0]
> $1 = {path = 0x407ee6 "token-compare-original1", file = 0x127cea0,
>   size = 131072, chunk = 0, buffer = 0x127cf50 '\n' <repeats 200 times>...,
>   curp = 0x129cf48 "    @@@\n\300\312&\001", endp = 0x129cf50 "\300\312&\001",
>   normalize_state = svn_diff__normalize_state_normal, suffix_start_chunk = -1,
>   suffix_offset_in_chunk = 0}
>
> The file data ends at the end of the chunk.  The code does
>
>       if (file_for_suffix[i].chunk == file[i].chunk)
>         {
>           /* Prefix ended in last chunk, so we can reuse the prefix buffer */
>           file_for_suffix[i].buffer = file[i].buffer;
>         }
>       else
>         {
>           /* There is at least more than 1 chunk,
>              so allocate full chunk size buffer */
>           file_for_suffix[i].buffer = apr_palloc(pool, CHUNK_SIZE);
>           SVN_ERR(read_chunk(file_for_suffix[i].file, file_for_suffix[i].path,
>                              file_for_suffix[i].buffer, length[i],
>                              chunk_to_offset(file_for_suffix[i].chunk),
>                              pool));
>         }
>       file_for_suffix[i].endp = file_for_suffix[i].buffer + length[i];
>       file_for_suffix[i].curp = file_for_suffix[i].endp - 1;
>
> and allocates a new chunk so that file_for_suffix[0] is
>
> (gdb) p file_for_suffix[0]
> $3 = {path = 0x407ee6 "token-compare-original1", file = 0x127cea0,
>   size = 131072, chunk = 1, buffer = 0x12bd0d0 'A' <repeats 200 times>...,
>   curp = 0x12bd0cf "", endp = 0x12bd0d0 'A' <repeats 200 times>...,
>   normalize_state = svn_diff__normalize_state_normal, suffix_start_chunk = 0,
>   suffix_offset_in_chunk = 0}
>
> file_for_suffix[0=.endp is 1 byte before file_for_suffix[0].buffer and
> so dereferencing it is not valid.

You mean file_for_suffix[0].curp, which is 1 byte before
file_for_suffix[0].buffer. Bummer, it's not supposed to do that.

> How are the chunks supposed to work?  When the file data ends at the end
> of a chunk is there supposed to be another valid chunk with no data?

It's been a while, so I'm not sure. I *think* the intention is that
find_identical_suffix is not executed in this case, because in
datasources_open, find_identical_suffix is only executed if
(!reached_one_eof). And find_identical_prefix should have set
reached_one_eof in this case. I think ... But apparently it doesn't
...

-- 
Johan