Posted to dev@subversion.apache.org by Johan Corveleyn <jc...@gmail.com> on 2010/12/14 22:35:47 UTC

[RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Hi all,

On the diff-optimizations-bytes branch, in diff_file.c, there are two
functions which are called for every byte of the identical prefix and
suffix: increment_pointers and decrement_pointers. These functions are
actually equivalents of curp++ or curp--, reading the next/previous
byte, but taking into account the chunked-ness of the data (file data
is read in memory in chunks).

As an experiment I changed these functions into macros, eliminating
the function calls. This makes the diff algorithm another 10% - 15%
faster (granted, this was measured with my "extreme" test case of a 1,5
Mb file (60000 lines), of which most lines are identical
prefix/suffix). However, having an entire function like that
implemented as a macro isn't very pretty (see below for an example).

Some considerations:
- Maybe I can use APR_INLINE, with similar results?

- Maybe I can put just the "critical section" into a macro (the curp++
/ curp-- part), and call a function when a chunk boundary is
encountered (~ once every 131072 iterations (chunks are 128 Kb
large)), to read in the new chunk, advancing variables, ...

- Maybe it's not worth it?

Thoughts?

Just for kicks, here is an example of increment_pointers as a macro:
[[[
#define increment_pointers(afile, file_len, pool) {                          \
  int i;                                                                     \
                                                                             \
  for (i = 0; i < file_len; i++)                                             \
    if (afile[i]->chunk == -1) /* indicates before beginning of file */      \
      {                                                                      \
        afile[i]->chunk = 0; /* point to beginning of file again */          \
      }                                                                      \
    else if (afile[i]->curp == afile[i]->endp - 1)                           \
      {                                                                      \
        apr_off_t last_chunk = offset_to_chunk(afile[i]->size);              \
        if (afile[i]->chunk == last_chunk)                                   \
          {                                                                  \
            afile[i]->curp++; /* curp == endp signals end of file */         \
          }                                                                  \
        else                                                                 \
          {                                                                  \
            apr_off_t length;                                                \
            afile[i]->chunk++;                                               \
            length = afile[i]->chunk == last_chunk ?                         \
              offset_in_chunk(afile[i]->size) : CHUNK_SIZE;                  \
            SVN_ERR(read_chunk(afile[i]->file, afile[i]->path, afile[i]->buffer,\
                               length, chunk_to_offset(afile[i]->chunk),     \
                               pool));                                       \
            afile[i]->endp = afile[i]->buffer + length;                      \
            afile[i]->curp = afile[i]->buffer;                               \
          }                                                                  \
      }                                                                      \
    else                                                                     \
      {                                                                      \
        afile[i]->curp++;                                                    \
      }                                                                      \
  }
]]]
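
(Note: a macro that expands to a bare brace block interacts badly with the
trailing semicolon at its call sites inside an unbraced if/else; the
do { ... } while (0) idiom used for the later macro versions in this thread
avoids that. A toy illustration of the idiom, not SVN code:)

[[[
/* Toy illustration: with a bare { ... } macro body, the ';' after the macro
   call below would end the if-statement and the 'else' would no longer
   parse.  Wrapping the body in do { ... } while (0) makes the expansion a
   single statement, so the construct compiles as intended. */
#define SWAP_INT(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

static void
example(int *x, int *y)
{
  if (*x > *y)
    SWAP_INT(*x, *y);   /* expands to one statement; the 'else' still binds */
  else
    *y = *x;
}
]]]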

Cheers,
-- 
Johan

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Tue, Dec 28, 2010 at 7:31 PM, Johan Corveleyn <jc...@gmail.com> wrote:
> On Fri, Dec 24, 2010 at 3:40 PM, Stefan Fuhrmann <eq...@web.de> wrote:
>> On 20.12.2010 02:43, Johan Corveleyn wrote:
>>>
>>> On Wed, Dec 15, 2010 at 10:58 AM, Stefan Fuhrmann<eq...@web.de>  wrote:
>>>>
>>>> On 15.12.2010 02:30, Stefan Fuhrmann wrote:
>>>>>
>>>>> On 14.12.2010 23:35, Johan Corveleyn wrote:
>>>>>
>>>>>> Thoughts?
>>>>>
>>>>> Two things you might try:
>>>>> * introduce a local variable for afile[i]
>>>>> * replace the for() loop with two nested ones, keeping
>>>>>  calls to other functions out of the hot spot:
>>>>>
>>>>> for (i=0; i<  file_len;)
>>>>
>>>> That should read:
>>>> for (i=0; i<  file_len; i++)
>>>>>
>>>>> {
>>>>>  /* hot spot: */
>>>>>  for(; i<  file_len; i++)
>>>>>  {
>>>>>    curFile = afile[i];
>>>>>    if (curFile->chunk==-1)
>>>>>      curFile->chunk = 0;
>>>>>    else if (curFile->curp != curFile->endp -1)
>>>>>      curFile->curp++;
>>>>>    else
>>>>>      break;
>>>>>  }
>>>>>
>>>>>  if (i<  file_len)
>>>>>  {
>>>>>    /* the complex, rarely used stuff goes here */
>>>>>  }
>>>>> }
>>>
>>> Ok, I tried this, but it didn't really help. It's still only about as
>>> fast as before. While the macro version is about 10% faster (I made a
>>> cleaner macro version, only macro'ing the hotspot stuff, with a
>>> function call to the complex, rarely used stuff).
>>>
>>> For the inline version, I tried several variations (always with
>>> APR_INLINE in the function signature, and with local variable for
>>> file[i]): with or without the nested for loop, with or without the
>>> complex stuff factored out to a separate function, ... it made no
>>> difference.
>>>
>>> Maybe I'm still doing something wrong, keeping the compiler from
>>> inlining it (but I'm not really a compiler expert, maybe I have to add
>>> some compiler options or something, ...). OTOH, I now have a macro
>>> implementation which is quite readable (and which uses the do ...
>>> while(0) construct - thanks Peter), and which does the trick. So maybe
>>> I should simply stick with that option ...
>>
>> The key factor here is that file_len is only 2.
>> Basically, that will make my optimization a
>> pessimization.
>>
>> Assuming that for most calls, curp of *both*
>> files will be somewhere *between* BOF and
>> EOF, the unrolling the loop may help:
>>
>> #define INCREMENT_POINTERS
>>
>> /* special code for the common case. 10 .. 12 ticks per execution */
>>
>> static APR_INLINE svn_boolean_t
>> quickly_increment_pointers(struct file_info *afile[])
>> {
>> struct file_info *file0 = afile[0];
>> struct file_info *file1 = afile[1];
>> if ((afile0->chunk != -1) && (afile1->chunk != -1))
>>  {
>>    /* suitable_type */ nextp0 = afile0->curp + 1;
>>    /* suitable_type */ nextp1 = afile1->curp + 1;
>>    if ((nextp0 != afile0->endp) && (nextp1 != afile1->endp))
>>      {
>>        afile0->curp = nextp0;
>>        afile1->curp = nextp1;
>>        return TRUE;
>>      }
>>  }
>> return FALSE;
>> }
>>
>> /* slow code covering all special cases */
>>
>> static svn_error_t*
>> slowly_increment_pointers(struct file_info *file[], apr_size_t file_len,
>> apr_pool_t *pool)
>> {
>>  for (i = 0; i < file_len;i++)
>> ...
>> }
>>
>> /* maybe as macro: */
>> return ((file_len == 2) && quickly_increment_pointers (file))
>>  ? SVN_NO_ERROR
>>  : slowly_increment_pointers(file, file_len, pool);
>
> Nice! I will try this out sometime soon, but I'm not so sure it will
> help more than the current macro version (only macro for the critical
> section, calling a function for increment/decrement chunk - see
> diff_file.c in HEAD of diff-optimizations-bytes branch). I guess I'll
> just have to test it.
>
> With the macro implementation in place my gut feeling is that the
> prefix/suffix scanning is reasonably optimal, and that the remaining
> time spent in the diff code (70 seconds for my example blame of big
> xml file with 2200 changes) is almost all due to the "original" diff
> algorithm (working on the remainder between prefix and suffix). But to
> be sure I should probably measure that first.

Hmmm, my gut feeling seems to be a little bit off. I measured this,
and prefix/suffix scanning is still taking 46 - 50 seconds of the
total 70 seconds spent in "diff-time" during the blame of my example
file (~20 seconds are going to "getting tokens and inserting them into
the token tree", and ~5 seconds in the actual LCS algorithm). So
optimizations here will probably still be very useful.
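
(For reference, the timings above come from instrumentation around the diff
calls in blame.c, described further down in this thread as taking
apr_time_now() before and after. A sketch of what such a wrapper could look
like, assuming the usual svn_diff.h / apr_time.h includes; the function name
and the global counter are hypothetical, not the actual patch:)

[[[
/* Hypothetical timing wrapper: accumulate the time spent in the file-diff
   call into a global counter, using apr_time_now() before and after. */
static apr_time_t diff_time_total = 0;  /* microseconds */

static svn_error_t *
timed_file_diff(svn_diff_t **diff,
                const char *original,
                const char *modified,
                const svn_diff_file_options_t *options,
                apr_pool_t *pool)
{
  apr_time_t start = apr_time_now();

  SVN_ERR(svn_diff_file_diff_2(diff, original, modified, options, pool));

  diff_time_total += apr_time_now() - start;
  return SVN_NO_ERROR;
}
]]]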

I'm going to let this rest for a while, until I get the prefix/suffix
scanning to work for diff3 and diff4. After I get that all working, I
might revisit this for more optimization ...

Cheers,
-- 
Johan

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Fri, Dec 24, 2010 at 3:40 PM, Stefan Fuhrmann <eq...@web.de> wrote:
> On 20.12.2010 02:43, Johan Corveleyn wrote:
>>
>> On Wed, Dec 15, 2010 at 10:58 AM, Stefan Fuhrmann<eq...@web.de>  wrote:
>>>
>>> On 15.12.2010 02:30, Stefan Fuhrmann wrote:
>>>>
>>>> On 14.12.2010 23:35, Johan Corveleyn wrote:
>>>>
>>>>> Thoughts?
>>>>
>>>> Two things you might try:
>>>> * introduce a local variable for afile[i]
>>>> * replace the for() loop with two nested ones, keeping
>>>>  calls to other functions out of the hot spot:
>>>>
>>>> for (i=0; i<  file_len;)
>>>
>>> That should read:
>>> for (i=0; i<  file_len; i++)
>>>>
>>>> {
>>>>  /* hot spot: */
>>>>  for(; i<  file_len; i++)
>>>>  {
>>>>    curFile = afile[i];
>>>>    if (curFile->chunk==-1)
>>>>      curFile->chunk = 0;
>>>>    else if (curFile->curp != curFile->endp -1)
>>>>      curFile->curp++;
>>>>    else
>>>>      break;
>>>>  }
>>>>
>>>>  if (i<  file_len)
>>>>  {
>>>>    /* the complex, rarely used stuff goes here */
>>>>  }
>>>> }
>>
>> Ok, I tried this, but it didn't really help. It's still only about as
>> fast as before. While the macro version is about 10% faster (I made a
>> cleaner macro version, only macro'ing the hotspot stuff, with a
>> function call to the complex, rarely used stuff).
>>
>> For the inline version, I tried several variations (always with
>> APR_INLINE in the function signature, and with local variable for
>> file[i]): with or without the nested for loop, with or without the
>> complex stuff factored out to a separate function, ... it made no
>> difference.
>>
>> Maybe I'm still doing something wrong, keeping the compiler from
>> inlining it (but I'm not really a compiler expert, maybe I have to add
>> some compiler options or something, ...). OTOH, I now have a macro
>> implementation which is quite readable (and which uses the do ...
>> while(0) construct - thanks Peter), and which does the trick. So maybe
>> I should simply stick with that option ...
>
> The key factor here is that file_len is only 2.
> Basically, that will make my optimization a
> pessimization.
>
> Assuming that for most calls, curp of *both*
> files will be somewhere *between* BOF and
> EOF, the unrolling the loop may help:
>
> #define INCREMENT_POINTERS
>
> /* special code for the common case. 10 .. 12 ticks per execution */
>
> static APR_INLINE svn_boolean_t
> quickly_increment_pointers(struct file_info *afile[])
> {
> struct file_info *file0 = afile[0];
> struct file_info *file1 = afile[1];
> if ((afile0->chunk != -1) && (afile1->chunk != -1))
>  {
>    /* suitable_type */ nextp0 = afile0->curp + 1;
>    /* suitable_type */ nextp1 = afile1->curp + 1;
>    if ((nextp0 != afile0->endp) && (nextp1 != afile1->endp))
>      {
>        afile0->curp = nextp0;
>        afile1->curp = nextp1;
>        return TRUE;
>      }
>  }
> return FALSE;
> }
>
> /* slow code covering all special cases */
>
> static svn_error_t*
> slowly_increment_pointers(struct file_info *file[], apr_size_t file_len,
> apr_pool_t *pool)
> {
>  for (i = 0; i < file_len;i++)
> ...
> }
>
> /* maybe as macro: */
> return ((file_len == 2) && quickly_increment_pointers (file))
>  ? SVN_NO_ERROR
>  : slowly_increment_pointers(file, file_len, pool);

Nice! I will try this out sometime soon, but I'm not so sure it will
help more than the current macro version (only macro for the critical
section, calling a function for increment/decrement chunk - see
diff_file.c in HEAD of diff-optimizations-bytes branch). I guess I'll
just have to test it.

With the macro implementation in place my gut feeling is that the
prefix/suffix scanning is reasonably optimal, and that the remaining
time spent in the diff code (70 seconds for my example blame of big
xml file with 2200 changes) is almost all due to the "original" diff
algorithm (working on the remainder between prefix and suffix). But to
be sure I should probably measure that first.

(I should note here that, with my example file, the prefix/suffix
scanning doesn't always work optimally, because in about half of the
revisions there is a change both in one of the first lines (where an
author name is inserted automatically by XML Spy upon saving it), and
then somewhere in the middle where the actual change is done. It's
really a pity, because that means about 20000-30000 lines cannot be
skipped as prefix.)

> Minor remark on C style:
>
>> static APR_INLINE svn_error_t *
>> increment_pointers(struct file_info *file[], int file_len, apr_pool_t
>> *pool)
>
> file_len should be apr_size_t unless it can assume negative
> values in which case it should be apr_ssize_t.That way, you
> know for sure it will never overflow. Int is potentially too short
> (although extreeeemly unlikely in this case) and it is signed
> (often not appropriate for an index value).

Thanks. I didn't know that. I'll change it.

> One more thing that might help speed up your code. The fact
> that the calling overhead of a single & simple function turns
> out to be a significant contribution to the overall runtime tells
> us that the execution is tightly CPU-bound and squeezing
> some extra cycles out of it could help. Try
>
> struct file_info file[] instead of struct file_info *file[]
>
> i.e. array of structs instead of array of pointers. The boring
> background is as follows.
>
> file[index]->member
>
> requires 2 cache accesses: reading the pointer form the
> array (assuming that the array pointer itself and the index
> value are already in some register) followed by an access
> relative to that pointer:
>
>    /* assume file -> reg_0, index -> reg_1 */
> (1.1) mem[reg_0 + 4 * reg_1] -> reg_2    // content of file[index], i.e.
> file_info pointer
> (1.2) mem[reg_2 + member_offset] -> reg_3  // content of member element
>
> The really bad part of this is that (1.2) depends on (1.1) and
> must wait for the data to come in from the cache. On Corei7
> this takes 4 ticks (on others like PPC it takes 2). So, while
> you can issue 1 cache request per tick, the dependency
> makes the whole thing take at least 5 ticks.
>
> Using an array of structures this becomes
>
> file[index].member
>    /* assume file -> reg_0, index -> reg_1 */
> (2.1) reg_1 * 32 -> reg_2 // offset of struct [index], assume 32 == sizeof
> (file_info)
> (2.2) mem[reg_0 + reg_2 + member_offset] -> reg_3  // content of member
> element
>
> Now this is only 2 ticks with (2.1) often being hidden, pre-
> calculated elsewhere or entirely unnecessary (if index is
> fixed).
>
> A final problem with the array of pointers issue is that
> you often cannot reuse the result of (1.1) because in C
> a pointer can point anywhere. e.g.:
>
> file[0] = (file_struct*)file;
> file[0]->member1 = 123; /* might (partially) modify file[0] */
> file[0]->member0 = 123; /* same here */
> file[0]->member2 = 123; /* and here */
>
> The C compiler cannot be *sure* that something to this
> effect has not been done somewhere. Therefore, any
> write access to a "random" address invalidates all pre-
> fetched information from "random" positions. "random"
> basically means "not from stack" or similar well-traceable
> locations.
>
> file_struct *file0 = file[0]
> file0->member1 = 0;
> file0->member2 = 123;
>
> Is thus often faster than
>
> file[0]->member1 = 0;
> file[0]->member2 = 123;
>
> but still much slower than
>
> file[0].member1 = 0;
> file[0].member2 = 123;
>
> Food for thought ;)

Interesting! I'll have to chew a little bit on this food :-), but
perhaps I'll experiment a little bit with it.

As I said above, I think I'll first try to find out if most of the
time is still spent in prefix/suffix scanning, or in the original diff
algorithm on the remaining non-matching section. Just so I don't spend
too much time optimizing the wrong thing :-). Maybe this kind of
optimization could be tried on the original algorithm ...

Thanks for the input.

Cheers,
-- 
Johan

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Stefan Fuhrmann <eq...@web.de>.
On 20.12.2010 02:43, Johan Corveleyn wrote:
> On Wed, Dec 15, 2010 at 10:58 AM, Stefan Fuhrmann<eq...@web.de>  wrote:
>> On 15.12.2010 02:30, Stefan Fuhrmann wrote:
>>> On 14.12.2010 23:35, Johan Corveleyn wrote:
>>>
>>>> Thoughts?
>>> Two things you might try:
>>> * introduce a local variable for afile[i]
>>> * replace the for() loop with two nested ones, keeping
>>>   calls to other functions out of the hot spot:
>>>
>>> for (i=0; i<  file_len;)
>> That should read:
>> for (i=0; i<  file_len; i++)
>>> {
>>>   /* hot spot: */
>>>   for(; i<  file_len; i++)
>>>   {
>>>     curFile = afile[i];
>>>     if (curFile->chunk==-1)
>>>       curFile->chunk = 0;
>>>     else if (curFile->curp != curFile->endp -1)
>>>       curFile->curp++;
>>>     else
>>>       break;
>>>   }
>>>
>>>   if (i<  file_len)
>>>   {
>>>     /* the complex, rarely used stuff goes here */
>>>   }
>>> }
> Ok, I tried this, but it didn't really help. It's still only about as
> fast as before. While the macro version is about 10% faster (I made a
> cleaner macro version, only macro'ing the hotspot stuff, with a
> function call to the complex, rarely used stuff).
>
> For the inline version, I tried several variations (always with
> APR_INLINE in the function signature, and with local variable for
> file[i]): with or without the nested for loop, with or without the
> complex stuff factored out to a separate function, ... it made no
> difference.
>
> Maybe I'm still doing something wrong, keeping the compiler from
> inlining it (but I'm not really a compiler expert, maybe I have to add
> some compiler options or something, ...). OTOH, I now have a macro
> implementation which is quite readable (and which uses the do ...
> while(0) construct - thanks Peter), and which does the trick. So maybe
> I should simply stick with that option ...
The key factor here is that file_len is only 2.
Basically, that will make my optimization a
pessimization.

Assuming that for most calls, curp of *both*
files will be somewhere *between* BOF and
EOF, unrolling the loop may help:

#define INCREMENT_POINTERS

/* special code for the common case. 10 .. 12 ticks per execution */

static APR_INLINE svn_boolean_t
quickly_increment_pointers(struct file_info *afile[])
{
  struct file_info *file0 = afile[0];
  struct file_info *file1 = afile[1];
  if ((file0->chunk != -1) && (file1->chunk != -1))
    {
      /* suitable_type */ nextp0 = file0->curp + 1;
      /* suitable_type */ nextp1 = file1->curp + 1;
      if ((nextp0 != file0->endp) && (nextp1 != file1->endp))
        {
          file0->curp = nextp0;
          file1->curp = nextp1;
          return TRUE;
        }
    }
  return FALSE;
}

/* slow code covering all special cases */

static svn_error_t*
slowly_increment_pointers(struct file_info *file[], apr_size_t file_len, 
apr_pool_t *pool)
{
   for (i = 0; i < file_len;i++)
...
}

/* maybe as macro: */
return ((file_len == 2) && quickly_increment_pointers (file))
   ? SVN_NO_ERROR
   : slowly_increment_pointers(file, file_len, pool);
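
(Spelled out, that "maybe as macro" dispatch could look something like the
sketch below - names as in the pseudo-code above, not code from the branch:)

[[[
/* Sketch of the dispatch as a macro: try the inlined 2-file fast path
   first, and only call the general function when it does not apply. */
#define INCREMENT_POINTERS(file, file_len, pool)                             \
  SVN_ERR(((file_len) == 2 && quickly_increment_pointers(file))              \
            ? SVN_NO_ERROR                                                   \
            : slowly_increment_pointers((file), (file_len), (pool)))
]]]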

Minor remark on C style:

> static APR_INLINE svn_error_t *
> increment_pointers(struct file_info *file[], int file_len, apr_pool_t *pool)
file_len should be apr_size_t unless it can assume negative
values, in which case it should be apr_ssize_t. That way, you
know for sure it will never overflow. Int is potentially too short
(although extreeeemly unlikely in this case) and it is signed
(often not appropriate for an index value).
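
(Applied to the signature quoted above, that would give:)

[[[
static APR_INLINE svn_error_t *
increment_pointers(struct file_info *file[], apr_size_t file_len,
                   apr_pool_t *pool);
]]]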

One more thing that might help speed up your code. The fact
that the calling overhead of a single & simple function turns
out to be a significant contribution to the overall runtime tells
us that the execution is tightly CPU-bound and squeezing
some extra cycles out of it could help. Try

struct file_info file[] instead of struct file_info *file[]

i.e. array of structs instead of array of pointers. The boring
background is as follows.

file[index]->member

requires 2 cache accesses: reading the pointer from the
array (assuming that the array pointer itself and the index
value are already in some register) followed by an access
relative to that pointer:

     /* assume file -> reg_0, index -> reg_1 */
(1.1) mem[reg_0 + 4 * reg_1] -> reg_2    // content of file[index], i.e. file_info pointer
(1.2) mem[reg_2 + member_offset] -> reg_3  // content of member element

The really bad part of this is that (1.2) depends on (1.1) and
must wait for the data to come in from the cache. On Corei7
this takes 4 ticks (on others like PPC it takes 2). So, while
you can issue 1 cache request per tick, the dependency
makes the whole thing take at least 5 ticks.

Using an array of structures this becomes

file[index].member
     /* assume file -> reg_0, index -> reg_1 */
(2.1) reg_1 * 32 -> reg_2 // offset of struct [index], assume 32 == sizeof (file_info)
(2.2) mem[reg_0 + reg_2 + member_offset] -> reg_3  // content of member element

Now this is only 2 ticks with (2.1) often being hidden, pre-
calculated elsewhere or entirely unnecessary (if index is
fixed).

A final problem with the array of pointers issue is that
you often cannot reuse the result of (1.1) because in C
a pointer can point anywhere. e.g.:

file[0] = (file_struct*)file;
file[0]->member1 = 123; /* might (partially) modify file[0] */
file[0]->member0 = 123; /* same here */
file[0]->member2 = 123; /* and here */

The C compiler cannot be *sure* that something to this
effect has not been done somewhere. Therefore, any
write access to a "random" address invalidates all pre-
fetched information from "random" positions. "random"
basically means "not from stack" or similar well-traceable
locations.

file_struct *file0 = file[0];
file0->member1 = 0;
file0->member2 = 123;

is thus often faster than

file[0]->member1 = 0;
file[0]->member2 = 123;

but still much slower than

file[0].member1 = 0;
file[0].member2 = 123;

Food for thought ;)
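
(A minimal sketch of the two layouts, with a simplified, hypothetical
file_info - the real struct in diff_file.c has more members - and not code
from the branch:)

[[[
struct file_info { char *curp; char *endp; };

/* Array of pointers: file[i] must first be loaded from memory, and a write
   through it may alias other data, so that load is hard to reuse. */
static void
advance_both_ptrs(struct file_info *file[2])
{
  file[0]->curp++;
  file[1]->curp++;
}

/* Array of structs: file[i].curp is a single address computation
   (base + i * sizeof(struct file_info) + offset), with no pointer load. */
static void
advance_both_structs(struct file_info file[2])
{
  file[0].curp++;
  file[1].curp++;
}
]]]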

-- Stefan^2.

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, Dec 20, 2010 at 2:16 PM, Johan Corveleyn <jc...@gmail.com> wrote:
> On Mon, Dec 20, 2010 at 11:03 AM, Julian Foad <ju...@wandisco.com> wrote:
>> On Mon, 2010-12-20, Johan Corveleyn wrote:
>>> New macro version (increment only, decrement is similar):
>>> [[[
>>> /* For all files in the FILE array, increment the curp pointer.  If a file
>>>  * points before the beginning of file, let it point at the first byte again.
>>>  * If the end of the current chunk is reached, read the next chunk in the
>>>  * buffer and point curp to the start of the chunk.  If EOF is reached, set
>>>  * curp equal to endp to indicate EOF. */
>>> #define increment_pointers(all_files, files_len, pool)                       \
>>>   do {                                                                       \
>>>     int i;                                                                   \
>>>                                                                              \
>>>     for (i = 0; i < files_len; i++)                                          \
>>>     {                                                                        \
>>>       if (all_files[i]->chunk == -1) /* indicates before beginning of file */\
>>>         all_files[i]->chunk = 0;     /* point to beginning of file again */  \
>>>       else if (all_files[i]->curp != all_files[i]->endp - 1)                 \
>>>         all_files[i]->curp++;                                                \
>>
>> Hi Johan.
>>
>> Here you are having to test for two special cases every time: chunk==-1
>> and curp==endp-1.  I would suggest changing the specification of "before
>> beginning of file" to include the promise that curp==endp-1, so that you
>> don't have to use a separate test here and can instead test for this
>> special case within increment_chunk().
>
> Good idea. I'll try this tonight ...

Ok, this worked pretty well. It's a little bit faster (about 1 second,
which seems right intuitively).

One style question though: should these macros be all upper case
(INCREMENT_POINTERS and DECREMENT_POINTERS)? I guess so.

The code now looks as follows (in attachment a patch relative to
diff-optimizations-bytes branch):

[[[
/* For all files in the FILE array, increment the curp pointer.  If a file
 * points before the beginning of file, let it point at the first byte again.
 * If the end of the current chunk is reached, read the next chunk in the
 * buffer and point curp to the start of the chunk.  If EOF is reached, set
 * curp equal to endp to indicate EOF. */
#define increment_pointers(all_files, files_len, pool)                       \
  do {                                                                       \
    int i;                                                                   \
                                                                             \
    for (i = 0; i < files_len; i++)                                          \
    {                                                                        \
      if (all_files[i]->curp < all_files[i]->endp - 1)                       \
        all_files[i]->curp++;                                                \
      else                                                                   \
        SVN_ERR(increment_chunk(all_files[i], pool));                        \
    }                                                                        \
  } while (0)


/* For all files in the FILE array, decrement the curp pointer.  If the
 * start of a chunk is reached, read the previous chunk in the buffer and
 * point curp to the last byte of the chunk.  If the beginning of a FILE is
 * reached, set chunk to -1 to indicate BOF. */
#define decrement_pointers(all_files, files_len, pool)                       \
  do {                                                                       \
    int i;                                                                   \
                                                                             \
    for (i = 0; i < files_len; i++)                                          \
    {                                                                        \
      if (all_files[i]->curp > all_files[i]->buffer)                         \
        all_files[i]->curp--;                                                \
      else                                                                   \
        SVN_ERR(decrement_chunk(all_files[i], pool));                        \
    }                                                                        \
  } while (0)


static svn_error_t *
increment_chunk(struct file_info *file, apr_pool_t *pool)
{
  apr_off_t length;
  apr_off_t last_chunk = offset_to_chunk(file->size);

  if (file->chunk == -1)
    {
      /* We are at BOF (Beginning Of File). Point to first chunk/byte again. */
      file->chunk = 0;
      file->curp = file->buffer;
    }
  else if (file->chunk == last_chunk)
    {
      /* We are at the last chunk. Indicate EOF by setting curp == endp. */
      file->curp = file->endp;
    }
  else
    {
      /* There are still chunks left. Read next chunk and reset pointers. */
      file->chunk++;
      length = file->chunk == last_chunk ?
        offset_in_chunk(file->size) : CHUNK_SIZE;
      SVN_ERR(read_chunk(file->file, file->path, file->buffer,
                         length, chunk_to_offset(file->chunk),
                         pool));
      file->endp = file->buffer + length;
      file->curp = file->buffer;
    }

  return SVN_NO_ERROR;
}


static svn_error_t *
decrement_chunk(struct file_info *file, apr_pool_t *pool)
{
  if (file->chunk == 0)
    {
      /* We are already at the first chunk. Indicate BOF (Beginning Of File)
         by setting chunk = -1 and curp = endp - 1. Both conditions are
         important. They help the increment step to catch the BOF situation
         in an efficient way. */
      file->chunk--;
      file->curp = file->endp - 1;
    }
  else
    {
      /* Read previous chunk and reset pointers. */
      file->chunk--;
      SVN_ERR(read_chunk(file->file, file->path, file->buffer,
                         CHUNK_SIZE, chunk_to_offset(file->chunk),
                         pool));
      file->endp = file->buffer + CHUNK_SIZE;
      file->curp = file->endp - 1;
    }

  return SVN_NO_ERROR;
}
]]]

Cheers,
-- 
Johan

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, Dec 20, 2010 at 11:03 AM, Julian Foad <ju...@wandisco.com> wrote:
> On Mon, 2010-12-20, Johan Corveleyn wrote:
>> New macro version (increment only, decrement is similar):
>> [[[
>> /* For all files in the FILE array, increment the curp pointer.  If a file
>>  * points before the beginning of file, let it point at the first byte again.
>>  * If the end of the current chunk is reached, read the next chunk in the
>>  * buffer and point curp to the start of the chunk.  If EOF is reached, set
>>  * curp equal to endp to indicate EOF. */
>> #define increment_pointers(all_files, files_len, pool)                       \
>>   do {                                                                       \
>>     int i;                                                                   \
>>                                                                              \
>>     for (i = 0; i < files_len; i++)                                          \
>>     {                                                                        \
>>       if (all_files[i]->chunk == -1) /* indicates before beginning of file */\
>>         all_files[i]->chunk = 0;     /* point to beginning of file again */  \
>>       else if (all_files[i]->curp != all_files[i]->endp - 1)                 \
>>         all_files[i]->curp++;                                                \
>
> Hi Johan.
>
> Here you are having to test for two special cases every time: chunk==-1
> and curp==endp-1.  I would suggest changing the specification of "before
> beginning of file" to include the promise that curp==endp-1, so that you
> don't have to use a separate test here and can instead test for this
> special case within increment_chunk().

Good idea. I'll try this tonight ...

Cheers,
-- 
Johan

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Julian Foad <ju...@wandisco.com>.
On Mon, 2010-12-20, Johan Corveleyn wrote:
> New macro version (increment only, decrement is similar):
> [[[
> /* For all files in the FILE array, increment the curp pointer.  If a file
>  * points before the beginning of file, let it point at the first byte again.
>  * If the end of the current chunk is reached, read the next chunk in the
>  * buffer and point curp to the start of the chunk.  If EOF is reached, set
>  * curp equal to endp to indicate EOF. */
> #define increment_pointers(all_files, files_len, pool)                       \
>   do {                                                                       \
>     int i;                                                                   \
>                                                                              \
>     for (i = 0; i < files_len; i++)                                          \
>     {                                                                        \
>       if (all_files[i]->chunk == -1) /* indicates before beginning of file */\
>         all_files[i]->chunk = 0;     /* point to beginning of file again */  \
>       else if (all_files[i]->curp != all_files[i]->endp - 1)                 \
>         all_files[i]->curp++;                                                \

Hi Johan.

Here you are having to test for two special cases every time: chunk==-1
and curp==endp-1.  I would suggest changing the specification of "before
beginning of file" to include the promise that curp==endp-1, so that you
don't have to use a separate test here and can instead test for this
special case within increment_chunk().
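
(Concretely, with BOF represented as chunk == -1 plus curp == endp - 1, the
hot path needs only one comparison; this is the shape of the version shown
earlier in this archive, with both the BOF and the chunk-boundary handling
folded into increment_chunk():)

[[[
if (all_files[i]->curp < all_files[i]->endp - 1)
  all_files[i]->curp++;
else
  SVN_ERR(increment_chunk(all_files[i], pool));
]]]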

- Julian


Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Dec 15, 2010 at 10:58 AM, Stefan Fuhrmann <eq...@web.de> wrote:
> On 15.12.2010 02:30, Stefan Fuhrmann wrote:
>>
>> On 14.12.2010 23:35, Johan Corveleyn wrote:
>>
>>> Thoughts?
>>
>> Two things you might try:
>> * introduce a local variable for afile[i]
>> * replace the for() loop with two nested ones, keeping
>>  calls to other functions out of the hot spot:
>>
>> for (i=0; i < file_len;)
>
> That should read:
> for (i=0; i < file_len; i++)
>>
>> {
>>  /* hot spot: */
>>  for(; i < file_len; i++)
>>  {
>>    curFile = afile[i];
>>    if (curFile->chunk==-1)
>>      curFile->chunk = 0;
>>    else if (curFile->curp != curFile->endp -1)
>>      curFile->curp++;
>>    else
>>      break;
>>  }
>>
>>  if (i < file_len)
>>  {
>>    /* the complex, rarely used stuff goes here */
>>  }
>> }

Ok, I tried this, but it didn't really help. It's still only about as
fast as before. While the macro version is about 10% faster (I made a
cleaner macro version, only macro'ing the hotspot stuff, with a
function call to the complex, rarely used stuff).

For the inline version, I tried several variations (always with
APR_INLINE in the function signature, and with local variable for
file[i]): with or without the nested for loop, with or without the
complex stuff factored out to a separate function, ... it made no
difference.

Maybe I'm still doing something wrong, keeping the compiler from
inlining it (but I'm not really a compiler expert, maybe I have to add
some compiler options or something, ...). OTOH, I now have a macro
implementation which is quite readable (and which uses the do ...
while(0) construct - thanks Peter), and which does the trick. So maybe
I should simply stick with that option ...

FYI, in attachment some patches for the different variations (to be
applied against HEAD of diff-optimizations-bytes branch). And the two
main versions here below in full for easy review/discussion/context.
At the bottom, some of my measurements. As always, I welcome any
feedback.

New macro version (increment only, decrement is similar):
[[[
/* For all files in the FILE array, increment the curp pointer.  If a file
 * points before the beginning of file, let it point at the first byte again.
 * If the end of the current chunk is reached, read the next chunk in the
 * buffer and point curp to the start of the chunk.  If EOF is reached, set
 * curp equal to endp to indicate EOF. */
#define increment_pointers(all_files, files_len, pool)                       \
  do {                                                                       \
    int i;                                                                   \
                                                                             \
    for (i = 0; i < files_len; i++)                                          \
    {                                                                        \
      if (all_files[i]->chunk == -1) /* indicates before beginning of file */\
        all_files[i]->chunk = 0;     /* point to beginning of file again */  \
      else if (all_files[i]->curp != all_files[i]->endp - 1)                 \
        all_files[i]->curp++;                                                \
      else                                                                   \
        SVN_ERR(increment_chunk(all_files[i], pool));                        \
    }                                                                        \
  } while (0)

static svn_error_t *
increment_chunk(struct file_info *file, apr_pool_t *pool)
{
  apr_off_t length;
  apr_off_t last_chunk = offset_to_chunk(file->size);

  /* Unless this was the last chunk, we need to read a new chunk. */
  if (file->chunk == last_chunk)
    {
      file->curp++; /* curp == endp signals end of file */
    }
  else
    {
      file->chunk++;
      length = file->chunk == last_chunk ?
        offset_in_chunk(file->size) : CHUNK_SIZE;
      SVN_ERR(read_chunk(file->file, file->path, file->buffer,
                         length, chunk_to_offset(file->chunk),
                         pool));
      file->endp = file->buffer + length;
      file->curp = file->buffer;
    }

  return SVN_NO_ERROR;
}
]]]

APR_INLINE version (with nested for loop, and separate function)
[[[
/* For all files in the FILE array, increment the curp pointer.  If a file
 * points before the beginning of file, let it point at the first byte again.
 * If the end of the current chunk is reached, read the next chunk in the
 * buffer and point curp to the start of the chunk.  If EOF is reached, set
 * curp equal to endp to indicate EOF. */
static APR_INLINE svn_error_t *
increment_pointers(struct file_info *file[], int file_len, apr_pool_t *pool)
{
  struct file_info *curFile;
  int i;

  for (i = 0; i < file_len; i++)
  {
    /* hot spot: */
    for(; i < file_len; i++)
    {
      curFile = file[i];
      if (curFile->chunk == -1) /* indicates before beginning of file */
        curFile->chunk = 0;     /* point to beginning of file again */
      else if (curFile->curp != curFile->endp - 1)
        curFile->curp++;
      else
        break;
    }

    if (i < file_len)
    {
      SVN_ERR(increment_chunk(curFile, pool));
    }
  }

  return SVN_NO_ERROR;
}

static svn_error_t *
increment_chunk(struct file_info *file, apr_pool_t *pool)
{
  apr_off_t length;
  apr_off_t last_chunk = offset_to_chunk(file->size);

  /* Unless this was the last chunk, we need to read a new chunk. */
  if (file->chunk == last_chunk)
    {
      file->curp++; /* curp == endp signals end of file */
    }
  else
    {
      file->chunk++;
      length = file->chunk == last_chunk ?
        offset_in_chunk(file->size) : CHUNK_SIZE;
      SVN_ERR(read_chunk(file->file, file->path, file->buffer,
                         length, chunk_to_offset(file->chunk),
                         pool));
      file->endp = file->buffer + length;
      file->curp = file->buffer;
    }

  return SVN_NO_ERROR;
}
]]]


Finally, some measurements (with the help of some instrumentation code
in blame.c, see patch in attachment). This is timing of "svn blame
-x-b" of a ~1,5 Mb file (61000 lines) with 2272 revisions (each time
average of 2nd and 3rd run):

                                      Total blame time (s)      diff time (s)
Normal function (original version):                    118                 80
Inline function (w/nested for)    :                    118                 79
Macro version                     :                    115                 72


Ok, it's nothing huge, but it makes me feel a little bit better :-).

Cheers,
-- 
Johan

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Stefan Fuhrmann <eq...@web.de>.
On 15.12.2010 02:30, Stefan Fuhrmann wrote:
> On 14.12.2010 23:35, Johan Corveleyn wrote:
>
>> Thoughts?
> Two things you might try:
> * introduce a local variable for afile[i]
> * replace the for() loop with two nested ones, keeping
>   calls to other functions out of the hot spot:
>
> for (i=0; i < file_len;)
That should read:
for (i=0; i < file_len; i++)
> {
>   /* hot spot: */
>   for(; i < file_len; i++)
>   {
>     curFile = afile[i];
>     if (curFile->chunk==-1)
>       curFile->chunk = 0;
>     else if (curFile->curp != curFile->endp -1)
>       curFile->curp++;
>     else
>       break;
>   }
>
>   if (i < file_len)
>   {
>     /* the complex, rarely used stuff goes here */
>   }
> }

-- Stefan^2

RE: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Johan Corveleyn [mailto:jcorvel@gmail.com]
> Sent: woensdag 15 december 2010 11:30
> To: Branko Čibej
> Cc: dev@subversion.apache.org
> Subject: Re: [RFC] diff-optimizations-bytes branch: avoiding function
> call overhead (?)
> 

> It was a Windows build (x86), with "Release" configuration, compiled
> with Visual Studio Express 2008 (when measuring performance of this
> branch, I always use a Release build, a Debug build is at least twice
> as slow; I only change to a Debug build if I'm having problems,
> needing to debug etc.).

Not really useful in this discussion, but Visual Studio Express 2008 doesn't have the optimizing C++ compiler that is available in the higher Visual Studio versions.

In debug builds on VC++ some additional checks are enabled which will slow things down, but the optimizing compiler (which *is* available in the free-to-download Platform SDK) will improve performance more than just disabling the debug checks.

	Bert

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Dec 15, 2010 at 10:30 AM, Branko Čibej <br...@xbc.nu> wrote:
> On 15.12.2010 02:30, Stefan Fuhrmann wrote:
>> On 14.12.2010 23:35, Johan Corveleyn wrote:
>>> Some considerations:
>>> - Maybe I can use APR_INLINE, with similar results?
>>>
>>> - Maybe I can put just the "critical section" into a macro (the curp++
>>> / curp-- part), and call a function when a chunk boundary is
>>> encountered (~ once every 131072 iterations (chunks are 128 Kb
>>> large)), to read in the new chunk, advancing variables, ...
>> Prefer inlining because the compiler is free to ignore it.
>> Depending on the target architecture and the compiler,
>> it may be beneficial to "narrow" the scope of optimization:
>> In large functions, the optimizer may guess wrongly about
>> the hot spots.
>
> Certainly. These days it makes no sense to use macros instead of
> inlining, except in really, really extreme cases (of which there should
> be none in Subversion code), or where macros are used for tricks that
> the language proper doesn't support (debug-mode error location reporting
> being such an example).
>
> Which raises the question -- did you run your test case with a
> fully-optimized build, or with some debug-friendly build? Compilers
> these days /should/ be able to automatically inline static functions,
> regardless of explicit "inline" declarations

It was a Windows build (x86), with "Release" configuration, compiled
with Visual Studio Express 2008 (when measuring performance of this
branch, I always use a Release build, a Debug build is at least twice
as slow; I only change to a Debug build if I'm having problems,
needing to debug etc.).

I have not used any special compilation options, so maybe that could
also make a difference (but, to be honest, I'm lacking the expert
knowledge about C compilers, options, platform specific things etc.
for this).

Maybe the function is a little bit too large/complex for the compiler
to automatically inline it without any hints? Or maybe writing it
differently, like Stefan suggests, could help the compiler to inline
it without needing explicit inline declarations... I'll experiment a
little bit (in a couple of days, having a huge shortage of time now
:-)).

-- 
Johan

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Branko Čibej <br...@xbc.nu>.
On 15.12.2010 02:30, Stefan Fuhrmann wrote:
> On 14.12.2010 23:35, Johan Corveleyn wrote:
>> Some considerations:
>> - Maybe I can use APR_INLINE, with similar results?
>>
>> - Maybe I can put just the "critical section" into a macro (the curp++
>> / curp-- part), and call a function when a chunk boundary is
>> encountered (~ once every 131072 iterations (chunks are 128 Kb
>> large)), to read in the new chunk, advancing variables, ...
> Prefer inlining because the compiler is free to ignore it.
> Depending on the target architecture and the compiler,
> it may be beneficial to "narrow" the scope of optimization:
> In large functions, the optimizer may guess wrongly about
> the hot spots.

Certainly. These days it makes no sense to use macros instead of
inlining, except in really, really extreme cases (of which there should
be none in Subversion code), or where macros are used for tricks that
the language proper doesn't support (debug-mode error location reporting
being such an example).

Which raises the question -- did you run your test case with a
fully-optimized build, or with some debug-friendly build? Compilers
these days /should/ be able to automatically inline static functions,
regardless of explicit "inline" declarations

-- Brane

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Dec 15, 2010 at 11:53 AM, Johan Corveleyn <jc...@gmail.com> wrote:
> On Wed, Dec 15, 2010 at 11:50 AM, Johan Corveleyn <jc...@gmail.com> wrote:
>> Also, I did my measurements with a blame operation on this very large
>> file, which has ~2500 revisions. So that's 2500 diffs of a ~1,5 Mb
>> file, with say on average 1 Mb of identical prefix/suffix every time.
>> That's some 2,500,000,000 function calls.
>
> That should be: 5,000,000,000 function calls, because for every diff
> we advance prefix and suffix pointers in two files: the original one
> and the modified one.

Damn. Scratch that last mail. I'm confused. It's 2,500,000,000
function calls (each of which advances the two pointers in a "for"
loop).

(that's sleep-deprivation in action (sick child etc ...))
-- 
Johan

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Dec 15, 2010 at 11:50 AM, Johan Corveleyn <jc...@gmail.com> wrote:
> On Wed, Dec 15, 2010 at 2:30 AM, Stefan Fuhrmann <eq...@web.de> wrote:
>> On 14.12.2010 23:35, Johan Corveleyn wrote:
>>>
>>> Hi all,
>>
>> Hi Johan ;)
>
> Hi Stefan, thanks for the input :). I suspected that you might have
> some ideas about this ...
>
>>> On the diff-optimizations-bytes branch, in diff_file.c, there are two
>>> functions which are called for every byte of the identical prefix and
>>> suffix: increment_pointers and decrement_pointers. These functions are
>>> actually equivalents of curp++ or curp--, reading the next/previous
>>> byte, but taking into account the chunked-ness of the data (file data
>>> is read in memory in chunks).
>>>
>>> As an experiment I changed these functions into macro's, eliminating
>>> the function calls. This makes the diff algorithm another 10% - 15%
>>> faster (granted, this was measured with my "extreme" testcase of a 1,5
>>> Mb file (60000 lines), of which most lines are identical
>>> prefix/suffix). However, having an entire function like that
>>> implemented as a macro isn't very pretty (see below for an example).
>>
>> I'm kind of surprised that the calling overhead is
>> actually noticeable for larger files, i.e. larger values
>> of file_len it should loop many times.
>
> file_len is the size of the array of files, not the length of the
> files themselves. In the typical case (the only case that's currently
> supported) file_len is 2. That's because we're diffing just 2 files:
> the original  one and the modified one. The implementation is made
> with an array and a file_len (and "for" loops), because I wanted it to
> be generically useful also for diff3 (with 3 files: original, modified
> and latest) and diff4 (4 files: original, modified, latest and
> ancestor).
>
> Also, I did my measurements with a blame operation on this very large
> file, which has ~2500 revisions. So that's 2500 diffs of a ~1,5 Mb
> file, with say on average 1 Mb of identical prefix/suffix every time.
> That's some 2,500,000,000 function calls.

That should be: 5,000,000,000 function calls, because for every diff
we advance prefix and suffix pointers in two files: the original one
and the modified one.

Johan

> For measurement, I counted the total time of the blame operation, as
> well as the cumulative time for the svn_diff_diff calls (with
> apr_time_now() before and after).
>
>>> Some considerations:
>>> - Maybe I can use APR_INLINE, with similar results?
>>>
>>> - Maybe I can put just the "critical section" into a macro (the curp++
>>> / curp-- part), and call a function when a chunk boundary is
>>> encountered (~ once every 131072 iterations (chunks are 128 Kb
>>> large)), to read in the new chunk, advancing variables, ...
>>
>> Prefer inlining because the compiler is free to ignore it.
>> Depending on the target architecture and the compiler,
>> it may be beneficial to "narrow" the scope of optimization:
>> In large functions, the optimizer may guess wrongly about
>> the hot spots.
>>
>>> - Maybe it's not worth it?
>>
>> Since inlining is for free from the maintenance POV,
>> any gain should justify it.
>>>
>>> Thoughts?
>>
>> Two things you might try:
>> * introduce a local variable for afile[i]
>> * replace the for() loop with two nested ones, keeping
>>  calls to other functions out of the hot spot:
>>
>> for (i=0; i < file_len;)
>> {
>>  /* hot spot: */
>>  for(; i < file_len; i++)
>>  {
>>    curFile = afile[i];
>>    if (curFile->chunk==-1)
>>      curFile->chunk = 0;
>>    else if (curFile->curp != curFile->endp -1)
>>      curFile->curp++;
>>    else
>>      break;
>>  }
>>
>>  if (i < file_len)
>>  {
>>    /* the complex, rarely used stuff goes here */
>>  }
>> }
>
> Ok, when I have some time, I will experiment a little bit with
> "inline", and see if the compiler "gets it" (I'll try your "nested for
> loop" example (the corrected one, which you just sent in the other
> mail) to help the compiler a little bit). I should be able to easily
> compare that with the macro version.
>
> --
> Johan
>

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Dec 15, 2010 at 2:30 AM, Stefan Fuhrmann <eq...@web.de> wrote:
> On 14.12.2010 23:35, Johan Corveleyn wrote:
>>
>> Hi all,
>
> Hi Johan ;)

Hi Stefan, thanks for the input :). I suspected that you might have
some ideas about this ...

>> On the diff-optimizations-bytes branch, in diff_file.c, there are two
>> functions which are called for every byte of the identical prefix and
>> suffix: increment_pointers and decrement_pointers. These functions are
>> actually equivalents of curp++ or curp--, reading the next/previous
>> byte, but taking into account the chunked-ness of the data (file data
>> is read in memory in chunks).
>>
>> As an experiment I changed these functions into macro's, eliminating
>> the function calls. This makes the diff algorithm another 10% - 15%
>> faster (granted, this was measured with my "extreme" testcase of a 1,5
>> Mb file (60000 lines), of which most lines are identical
>> prefix/suffix). However, having an entire function like that
>> implemented as a macro isn't very pretty (see below for an example).
>
> I'm kind of surprised that the calling overhead is
> actually noticeable for larger files, i.e. for larger values
> of file_len, since it should loop many times.

file_len is the size of the array of files, not the length of the
files themselves. In the typical case (the only case that's currently
supported) file_len is 2. That's because we're diffing just 2 files:
the original one and the modified one. The implementation is made
with an array and a file_len (and "for" loops), because I wanted it to
be generically useful also for diff3 (with 3 files: original, modified
and latest) and diff4 (4 files: original, modified, latest and
ancestor).

Also, I did my measurements with a blame operation on this very large
file, which has ~2500 revisions. So that's 2500 diffs of a ~1,5 Mb
file, with say on average 1 Mb of identical prefix/suffix every time.
That's some 2,500,000,000 function calls.

For measurement, I counted the total time of the blame operation, as
well as the cumulative time for the svn_diff_diff calls (with
apr_time_now() before and after).
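
For reference, the instrumentation boils down to the usual apr_time_now()
pattern. The sketch below is illustrative only (the accumulator and the
wrapper function are invented names; the real code lives in the blame.c
patch attached to the earlier mail), shown here around
svn_diff_file_diff_2(), the file-based diff entry point:
[[[
#include <apr_time.h>

#include "svn_diff.h"
#include "svn_error.h"

/* Total time spent inside the diff calls, in microseconds.
   ("cumulative_diff_time" and "timed_file_diff" are illustrative names.) */
static apr_time_t cumulative_diff_time = 0;

static svn_error_t *
timed_file_diff(svn_diff_t **diff,
                const char *original,
                const char *modified,
                const svn_diff_file_options_t *options,
                apr_pool_t *pool)
{
  apr_time_t start = apr_time_now();

  SVN_ERR(svn_diff_file_diff_2(diff, original, modified, options, pool));

  /* apr_time_now() returns microseconds, so the sum can be divided by
     APR_USEC_PER_SEC at the end to report seconds. */
  cumulative_diff_time += apr_time_now() - start;

  return SVN_NO_ERROR;
}
]]]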

>> Some considerations:
>> - Maybe I can use APR_INLINE, with similar results?
>>
>> - Maybe I can put just the "critical section" into a macro (the curp++
>> / curp-- part), and call a function when a chunk boundary is
>> encountered (~ once every 131072 iterations (chunks are 128 Kb
>> large)), to read in the new chunk, advancing variables, ...
>
> Prefer inlining because the compiler is free to ignore it.
> Depending on the target architecture and the compiler,
> it may be beneficial to "narrow" the scope of optimization:
> In large functions, the optimizer may guess wrongly about
> the hot spots.
>
>> - Maybe it's not worth it?
>
> Since inlining is for free from the maintenance POV,
> any gain should justify it.
>>
>> Thoughts?
>
> Two things you might try:
> * introduce a local variable for afile[i]
> * replace the for() loop with two nested ones, keeping
>  calls to other functions out of the hot spot:
>
> for (i=0; i < file_len;)
> {
>  /* hot spot: */
>  for(; i < file_len; i++)
>  {
>    curFile = afile[i];
>    if (curFile->chunk==-1)
>      curFile->chunk = 0;
>    else if (curFile->curp != curFile->endp -1)
>      curFile->curp++;
>    else
>      break;
>  }
>
>  if (i < file_len)
>  {
>    /* the complex, rarely used stuff goes here */
>  }
> }

Ok, when I have some time, I will experiment a little bit with
"inline", and see if the compiler "gets it" (I'll try your "nested for
loop" example (the corrected one, which you just sent in the other
mail) to help the compiler a little bit). I should be able to easily
compare that with the macro version.
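
For concreteness, a minimal sketch of that combination (APR_INLINE, a local
variable for afile[i], and the nested loops) could look like the following.
"file_info_t" and "advance_to_next_chunk()" are hypothetical names standing
in for the real file baton and for the existing chunk-reading code quoted
elsewhere in this thread:
[[[
/* Sketch only: the slow path would contain the read_chunk() logic from
   the macro version. */
static APR_INLINE svn_error_t *
increment_pointers(file_info_t *afile[], int file_len, apr_pool_t *pool)
{
  int i = 0;

  while (i < file_len)
    {
      /* Hot path: plain pointer arithmetic, no function calls. */
      for (; i < file_len; i++)
        {
          file_info_t *curFile = afile[i];

          if (curFile->chunk == -1)            /* before beginning of file */
            curFile->chunk = 0;
          else if (curFile->curp != curFile->endp - 1)
            curFile->curp++;
          else
            break;                             /* chunk boundary reached */
        }

      /* Cold path (rare): read the next chunk for afile[i], then move on. */
      if (i < file_len)
        {
          SVN_ERR(advance_to_next_chunk(afile[i], pool));
          i++;
        }
    }

  return SVN_NO_ERROR;
}
]]]
With the rarely taken chunk-boundary code behind a real function call, the
compiler only has to optimize the small inner loop, which is exactly the
"narrow the scope of optimization" point above.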

-- 
Johan

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Stefan Fuhrmann <eq...@web.de>.
On 14.12.2010 23:35, Johan Corveleyn wrote:
> Hi all,
Hi Johan ;)
> On the diff-optimizations-bytes branch, in diff_file.c, there are two
> functions which are called for every byte of the identical prefix and
> suffix: increment_pointers and decrement_pointers. These functions are
> actually equivalents of curp++ or curp--, reading the next/previous
> byte, but taking into account the chunked-ness of the data (file data
> is read in memory in chunks).
>
> As an experiment I changed these functions into macro's, eliminating
> the function calls. This makes the diff algorithm another 10% - 15%
> faster (granted, this was measured with my "extreme" testcase of a 1,5
> Mb file (60000 lines), of which most lines are identical
> prefix/suffix). However, having an entire function like that
> implemented as a macro isn't very pretty (see below for an example).
I'm kind of surprised that the calling overhead is
actually noticeable for larger files, i.e. for larger values
of file_len, since it should loop many times.
> Some considerations:
> - Maybe I can use APR_INLINE, with similar results?
>
> - Maybe I can put just the "critical section" into a macro (the curp++
> / curp-- part), and call a function when a chunk boundary is
> encountered (~ once every 131072 iterations (chunks are 128 Kb
> large)), to read in the new chunk, advancing variables, ...
Prefer inlining because the compiler is free to ignore it.
Depending on the target architecture and the compiler,
it may be beneficial to "narrow" the scope of optimization:
In large functions, the optimizer may guess wrongly about
the hot spots.

> - Maybe it's not worth it?
Since inlining is for free from the maintenance POV,
any gain should justify it.
> Thoughts?
Two things you might try:
* introduce a local variable for afile[i]
* replace the for() loop with two nested ones, keeping
   calls to other functions out of the hot spot:

for (i=0; i < file_len;)
{
   /* hot spot: */
   for(; i < file_len; i++)
   {
     curFile = afile[i];
     if (curFile->chunk==-1)
       curFile->chunk = 0;
     else if (curFile->curp != curFile->endp -1)
       curFile->curp++;
     else
       break;
   }

   if (i < file_len)
   {
     /* the complex, rarely used stuff goes here */
   }
}

-- Stefan^2.
> Just for kicks, here is an example of increment_pointers as a macro:
> [[[
> #define increment_pointers(afile, file_len, pool) {                          \
>    int i;                                                                     \
>                                                                               \
>    for (i = 0; i<  file_len; i++)                                             \
>      if (afile[i]->chunk == -1) /* indicates before beginning of file */      \
>        {                                                                      \
>          afile[i]->chunk = 0; /* point to beginning of file again */          \
>        }                                                                      \
>      else if (afile[i]->curp == afile[i]->endp - 1)                           \
>        {                                                                      \
>          apr_off_t last_chunk = offset_to_chunk(afile[i]->size);              \
>          if (afile[i]->chunk == last_chunk)                                   \
>            {                                                                  \
>              afile[i]->curp++; /* curp == endp signals end of file */         \
>            }                                                                  \
>          else                                                                 \
>            {                                                                  \
>              apr_off_t length;                                                \
>              afile[i]->chunk++;                                               \
>              length = afile[i]->chunk == last_chunk ?                         \
>                offset_in_chunk(afile[i]->size) : CHUNK_SIZE;                  \
>              SVN_ERR(read_chunk(afile[i]->file, afile[i]->path,
> afile[i]->buffer,\
>                                 length, chunk_to_offset(afile[i]->chunk),     \
>                                 pool));                                       \
>              afile[i]->endp = afile[i]->buffer + length;                      \
>              afile[i]->curp = afile[i]->buffer;                               \
>            }                                                                  \
>        }                                                                      \
>      else                                                                     \
>        {                                                                      \
>          afile[i]->curp++;                                                    \
>        }                                                                      \
>    }
> ]]]
>
> Cheers,

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Peter Samuelson <pe...@p12n.org>.
[Johan Corveleyn]
> As an experiment I changed these functions into macro's, eliminating
> the function calls. This makes the diff algorithm another 10% - 15%
> faster (granted, this was measured with my "extreme" testcase of a
> 1,5 Mb file (60000 lines), of which most lines are identical
> prefix/suffix).

Seems worth it.  That test case doesn't seem terribly uncommon.

> Some considerations:
> - Maybe I can use APR_INLINE, with similar results?
> 
> - Maybe I can put just the "critical section" into a macro (the curp++
> / curp-- part), and call a function when a chunk boundary is
> encountered (~ once every 131072 iterations (chunks are 128 Kb
> large)), to read in the new chunk, advancing variables, ...

The second option seems best to me, possibly combined with the first.
You don't want really big inline functions.  You probably can't have
them anyway; the compiler will decide to un-inline them behind your
back.  (And that's usually the right decision, for optimal use of the
CPU's L1 instruction cache.)
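
A sketch of what that could look like, with only the per-byte work in the
macro and the rare cases (before-beginning-of-file, crossing a chunk
boundary) pushed into an ordinary function. "increment_chunk" is an assumed
name for that slow path, and the do { } while (0) wrapper is the trick
explained just below:
[[[
#define INCREMENT_POINTERS(afile, file_len, pool)                           \
  do {                                                                      \
    int i_;                                                                 \
    for (i_ = 0; i_ < (file_len); i_++)                                     \
      {                                                                     \
        if ((afile)[i_]->chunk != -1                                        \
            && (afile)[i_]->curp != (afile)[i_]->endp - 1)                  \
          (afile)[i_]->curp++;                          /* hot path  */     \
        else                                                                \
          SVN_ERR(increment_chunk((afile)[i_], pool));  /* cold path */     \
      }                                                                     \
  } while (0)
]]]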

Also, if you're going to define a { } block in a macro, I think you
want to do the trick suggested by gcc docs: wrap your { } with
do...while(0).  There's no semantic change to the macro itself, but
consider cases like this:

   if (foo)
     increment_pointers(file, len, pool);
   else
     ...

Think about what the expansion looks like with and without the "do { }
while (0)", the difference should be clear.

-- 
Peter Samuelson | org-tld!p12n!peter | http://p12n.org/

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Thu, Dec 23, 2010 at 1:05 PM, Julian Foad <ju...@wandisco.com> wrote:
> On Thu, 2010-12-23, Daniel Shahaf wrote:
>> Daniel Shahaf wrote on Thu, Dec 23, 2010 at 13:25:40 +0200:
>> > Johan Corveleyn wrote on Thu, Dec 23, 2010 at 01:51:08 +0100:
>> > > Yes, that's a good idea. I'll try to spend some time on that. But I'm
>> > > wondering about a good way to write such a script.
>> > >
>> > > I'd like the script to generate large files quickly, and with content
>> > > that's not totally random, but also not 1000000 times the exact same
>> > > line (either of those are not going to be representative for real
>> > > world data, might hit some edge behavior of the diff algorithm).
>> >
> [...]
>> >
>> > > (maybe totally random is fine, but is there an easy/fast way to
>> > > generate this?)
>> > >
>> > > As a first attempt, I quickly hacked up a small shell script, writing
>> > > out lines in a for loop, one by one, with a fixed string together with
>> > > the line number (index of the iteration). But that's too slow (10000
>> > > lines of 70 bytes, i.e. 700Kb, is already taking 14 seconds).
>> > >
>> > > Maybe I can start with 10 or 20 different lines (or generate 100 in a
>> > > for loop), and then start doubling that until I have enough (cat
>> > > file.txt >> file.txt). That will probably be faster. And it might be
>> > > "real-worldish" enough (a single source file also contains many
>> > > identical lines, e.g. all lines with a single brace etc.).
>> > >
>> > > Other ideas? Maybe there is already something like this lying around?
>> > >
>> > > Another question: a shell script might not be good, because not
>> > > portable (and not fast)? Should I use python for this? Maybe the
>> > > "write line by line with a line number in a for loop" would be a lot
>> > > faster in Python? I don't know a lot of python, but it might be a good
>> > > opportunity to learn some ...
>> >
>> > IMO, use whatever language is most convenient for you to write the
>> > script in.  (Generating the test data need not be fast since it's
>> > a once-only task.)
>>
>> That is: *in my opinion* it doesn't need to be fast.  But re-reading
>> your mail, I gather you think otherwise.
>>
>> Why?  I assumed you'd run the script once, generate a repository, then
>> (commit that repository to ^/tags somewhere for safekeeping and) work
>> with that repository thereafter without regenerating it each time; so
>> generating wouldn't need to be fast.
>
> That's OK if it's a private test but for a maintainable test it's much
> better to generate any large data set on the fly.  Then we can easily
> tweak it to generate different data sizes, data with mismatching EOL
> style, data with prefix matching only/suffix only/both, etc.  And if the
> test data size is in the order of a megabyte or more it's ugly to check
> it in as part of the test suite in the project repo (even if it is
> usually compressed in the repo and in transmission).

Yes, I wouldn't like to commit them (as you suggest, Julian, we might
want to generate different variants, of different sizes; I wouldn't
like to commit several 100Mb files for instance).

Also, I wouldn't like to depend on a website or wikipedia dump or
something like that (or even just depend on an internet connection,
for that matter).

Taking just a bunch of real sources from svn's source tree also
doesn't feel "quite right". It's not reproducible, since the sources
change.

I'm currently thinking of embedding into the script (or committing
next to the script) a significant chunk of test data with some
"real-worldish" content (say 1 Kb, or 10 or even 100 Kb), and use that
to generate arbitrary-length files by repeating that block (or by
doubling the file (cat file >> file) until it's large enough).

I'm a little bit wondering about what "real-worldish" would mean, but
maybe it's not that terribly important. I could always include
multiple variations (a piece of C code, a chunk of "Lorem ipsum" text,
a large xml file, ...).

Cheers,
-- 
Johan

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Julian Foad <ju...@wandisco.com>.
On Thu, 2010-12-23, Daniel Shahaf wrote:
> Daniel Shahaf wrote on Thu, Dec 23, 2010 at 13:25:40 +0200:
> > Johan Corveleyn wrote on Thu, Dec 23, 2010 at 01:51:08 +0100:
> > > Yes, that's a good idea. I'll try to spend some time on that. But I'm
> > > wondering about a good way to write such a script.
> > > 
> > > I'd like the script to generate large files quickly, and with content
> > > that's not totally random, but also not 1000000 times the exact same
> > > line (either of those are not going to be representative for real
> > > world data, might hit some edge behavior of the diff algorithm).
> > 
[...]
> > 
> > > (maybe totally random is fine, but is there an easy/fast way to
> > > generate this?)
> > > 
> > > As a first attempt, I quickly hacked up a small shell script, writing
> > > out lines in a for loop, one by one, with a fixed string together with
> > > the line number (index of the iteration). But that's too slow (10000
> > > lines of 70 bytes, i.e. 700Kb, is already taking 14 seconds).
> > > 
> > > Maybe I can start with 10 or 20 different lines (or generate 100 in a
> > > for loop), and then start doubling that until I have enough (cat
> > > file.txt >> file.txt). That will probably be faster. And it might be
> > > "real-worldish" enough (a single source file also contains many
> > > identical lines, e.g. all lines with a single brace etc.).
> > > 
> > > Other ideas? Maybe there is already something like this lying around?
> > > 
> > > Another question: a shell script might not be good, because not
> > > portable (and not fast)? Should I use python for this? Maybe the
> > > "write line by line with a line number in a for loop" would be a lot
> > > faster in Python? I don't know a lot of python, but it might be a good
> > > opportunity to learn some ...
> > 
> > IMO, use whatever language is most convenient for you to write the
> > script in.  (Generating the test data need not be fast since it's
> > a once-only task.)
> 
> That is: *in my opinion* it doesn't need to be fast.  But re-reading
> your mail, I gather you think otherwise.
> 
> Why?  I assumed you'd run the script once, generate a repository, then
> (commit that repository to ^/tags somewhere for safekeeping and) work
> with that repository thereafter without regenerating it each time; so
> generating wouldn't need to be fast.

That's OK if it's a private test but for a maintainable test it's much
better to generate any large data set on the fly.  Then we can easily
tweak it to generate different data sizes, data with mismatching EOL
style, data with prefix matching only/suffix only/both, etc.  And if the
test data size is in the order of a megabyte or more it's ugly to check
it in as part of the test suite in the project repo (even if it is
usually compressed in the repo and in transmission).

- Julian


Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Shahaf wrote on Thu, Dec 23, 2010 at 13:25:40 +0200:
> Johan Corveleyn wrote on Thu, Dec 23, 2010 at 01:51:08 +0100:
> > On Wed, Dec 22, 2010 at 11:50 AM, Philip Martin
> > <ph...@wandisco.com> wrote:
> > > Johan Corveleyn <jc...@gmail.com> writes:
> > >
> > >> On Mon, Dec 20, 2010 at 11:19 AM, Philip Martin
> > >> <ph...@wandisco.com> wrote:
> > >>> Johan Corveleyn <jc...@gmail.com> writes:
> > >>>
> > >>>> This makes the diff algorithm another 10% - 15%
> > >>>> faster (granted, this was measured with my "extreme" testcase of a 1,5
> > >>>> Mb file (60000 lines), of which most lines are identical
> > >>>> prefix/suffix).
> > >>>
> > >>> Can you provide a test script?  Or describe the test more fully, please.
> > >>
> > >> Hmm, it's not easy to come up with a test script to test this "from
> > >> scratch" (unless with testing diff directly, see below). I test it
> > >> with a repository (a dump/load of an old version of our production
> > >> repository) which contains this 60000 line xml file (1,5 Mb) with 2272
> > >> revisions.
> > >>
> > >> I run blame on this file, over svnserve protocol on localhost (server
> > >> running on same machine), with an svnserve built from Stefan^2's
> > >> performance branch (with membuffer caching of full-texts, so server
> > >> I/O is not the bottleneck). This gives me an easy way to call 2272
> > >> times diff on this file, and measure it (with the help of some
> > >> instrumentation code in blame.c, see attachment). And it's
> > >> incidentally the actual use case I first started out wanting to
> > >> optimize (blame for large files with many revisions).
> > >
> > > Testing with real-world data is important, perhaps even more important
> > > than artificial test data, but some test data would be useful.  If you
> > > were to write a script to generate two test files of size 100MB, say,
> > > then you could use the tools/diff/diff utility to run Subversion diff on
> > > those two files.  Or tools/diff/diff3 if it's a 3-way diff that matters.
> > > The first run might involve disk IO, but on most machines the OS should
> > > be able to cache the files and subsequent hot-cache runs should be a
> > > good way to profile the diff code, assuming it is CPU limited.
> > 
> > Yes, that's a good idea. I'll try to spend some time on that. But I'm
> > wondering about a good way to write such a script.
> > 
> > I'd like the script to generate large files quickly, and with content
> > that's not totally random, but also not 1000000 times the exact same
> > line (either of those are not going to be representative for real
> > world data, might hit some edge behavior of the diff algorithm).
> 
> How about using
> 
>         cat subversion/libsvn_wc/*.c
> 
> as your test file?
> 

As to time:

t1/subversion% time cat */*c | wc -c
cat: tests/libsvn_wc: Is a directory
9484278
cat */*c  0.00s user 0.05s system 4% cpu 1.248 total
wc -c  0.00s user 0.01s system 0% cpu 1.243 total

(but I ran 'make' earlier, so it might not be a cold cache)

> 
> > (maybe totally random is fine, but is there an easy/fast way to
> > generate this?)
> > 
> > As a first attempt, I quickly hacked up a small shell script, writing
> > out lines in a for loop, one by one, with a fixed string together with
> > the line number (index of the iteration). But that's too slow (10000
> > lines of 70 bytes, i.e. 700Kb, is already taking 14 seconds).
> > 
> > Maybe I can start with 10 or 20 different lines (or generate 100 in a
> > for loop), and then start doubling that until I have enough (cat
> > file.txt >> file.txt). That will probably be faster. And it might be
> > "real-worldish" enough (a single source file also contains many
> > identical lines, e.g. all lines with a single brace etc.).
> > 
> > Other ideas? Maybe there is already something like this lying around?
> > 
> > Another question: a shell script might not be good, because not
> > portable (and not fast)? Should I use python for this? Maybe the
> > "write line by line with a line number in a for loop" would be a lot
> > faster in Python? I don't know a lot of python, but it might be a good
> > opportunity to learn some ...
> > 
> 
> IMO, use whatever language is most convenient for you to write the
> script in.  (Generating the test data need not be fast since it's
> a once-only task.)

That is: *in my opinion* it doesn't need to be fast.  But re-reading
your mail, I gather you think otherwise.

Why?  I assumed you'd run the script once, generate a repository, then
(commit that repository to ^/tags somewhere for safekeeping and) work
with that repository thereafter without regenerating it each time; so
generating wouldn't need to be fast.

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Thu, Dec 23, 2010 at 01:51:08 +0100:
> On Wed, Dec 22, 2010 at 11:50 AM, Philip Martin
> <ph...@wandisco.com> wrote:
> > Johan Corveleyn <jc...@gmail.com> writes:
> >
> >> On Mon, Dec 20, 2010 at 11:19 AM, Philip Martin
> >> <ph...@wandisco.com> wrote:
> >>> Johan Corveleyn <jc...@gmail.com> writes:
> >>>
> >>>> This makes the diff algorithm another 10% - 15%
> >>>> faster (granted, this was measured with my "extreme" testcase of a 1,5
> >>>> Mb file (60000 lines), of which most lines are identical
> >>>> prefix/suffix).
> >>>
> >>> Can you provide a test script?  Or describe the test more fully, please.
> >>
> >> Hmm, it's not easy to come up with a test script to test this "from
> >> scratch" (unless with testing diff directly, see below). I test it
> >> with a repository (a dump/load of an old version of our production
> >> repository) which contains this 60000 line xml file (1,5 Mb) with 2272
> >> revisions.
> >>
> >> I run blame on this file, over svnserve protocol on localhost (server
> >> running on same machine), with an svnserve built from Stefan^2's
> >> performance branch (with membuffer caching of full-texts, so server
> >> I/O is not the bottleneck). This gives me an easy way to call 2272
> >> times diff on this file, and measure it (with the help of some
> >> instrumentation code in blame.c, see attachment). And it's
> >> incidentally the actual use case I first started out wanting to
> >> optimize (blame for large files with many revisions).
> >
> > Testing with real-world data is important, perhaps even more important
> > than artificial test data, but some test data would be useful.  If you
> > were to write a script to generate two test files of size 100MB, say,
> > then you could use the tools/diff/diff utility to run Subversion diff on
> > those two files.  Or tools/diff/diff3 if it's a 3-way diff that matters.
> > The first run might involve disk IO, but on most machines the OS should
> > be able to cache the files and subsequent hot-cache runs should be a
> > good way to profile the diff code, assuming it is CPU limited.
> 
> Yes, that's a good idea. I'll try to spend some time on that. But I'm
> wondering about a good way to write such a script.
> 
> I'd like the script to generate large files quickly, and with content
> that's not totally random, but also not 1000000 times the exact same
> line (either of those are not going to be representative for real
> world data, might hit some edge behavior of the diff algorithm).

How about using

        cat subversion/libsvn_wc/*.c

as your test file?


It's currently 1.8MB, and should give plenty of prefix/suffix... (e.g.,
adm_crawler.c often is untouched)


> (maybe totally random is fine, but is there an easy/fast way to
> generate this?)
> 
> As a first attempt, I quickly hacked up a small shell script, writing
> out lines in a for loop, one by one, with a fixed string together with
> the line number (index of the iteration). But that's too slow (10000
> lines of 70 bytes, i.e. 700Kb, is already taking 14 seconds).
> 
> Maybe I can start with 10 or 20 different lines (or generate 100 in a
> for loop), and then start doubling that until I have enough (cat
> file.txt >> file.txt). That will probably be faster. And it might be
> "real-worldish" enough (a single source file also contains many
> identical lines, e.g. all lines with a single brace etc.).
> 
> Other ideas? Maybe there is already something like this lying around?
> 
> Another question: a shell script might not be good, because not
> portable (and not fast)? Should I use python for this? Maybe the
> "write line by line with a line number in a for loop" would be a lot
> faster in Python? I don't know a lot of python, but it might be a good
> opportunity to learn some ...
> 

IMO, use whatever language is most convenient for you to write the
script in.  (Generating the test data need not be fast since it's
a once-only task.)

> Are there any examples of such "manual test scripts" in svn? So I
> could have a look at the style, coding habits, ... maybe borrow some
> boilerplate code?
> 

If there are, they are in the source tree :-)  There are various Python
scripts under tools/ (particularly tools/dev/) and subversion/tests/,
but I'm not sure how close they are to what you're looking for.

> Cheers,
> -- 
> Johan

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Philip Martin <ph...@wandisco.com>.
Johan Corveleyn <jc...@gmail.com> writes:

> Another question: a shell script might not be good, because not
> portable (and not fast)? Should I use python for this? Maybe the
> "write line by line with a line number in a for loop" would be a lot
> faster in Python? I don't know a lot of python, but it might be a good
> opportunity to learn some ...

A shell script is probably fine.  What I want is some data that I can
use on my machine to test your patches.

Here's a crude python script.  With the default values it generates two
4.3MB files in less than 2 seconds on my machine.  Subversion diff takes
over 10 seconds to compare the files, GNU diff less than one second.

Using --num-prefix=2 makes the script slightly slower, since it generates
more random numbers, and the time to run Subversion diff on the output
goes up to 2min.  GNU diff still takes a fraction of a second, and with
--minimal the time is 35s.  So for big improvements you probably want to
concentrate on shortcut heuristics, rather than low-level optimisation.

#!/usr/bin/python

import random, sys
from optparse import OptionParser

random.seed('abc') # repeatable

def write_file_contents(f, num_lines, num_prefix, num_suffix,
                        percent_middle, unique):
  for i in range(num_lines):
    if num_prefix > 1:
      prefix = random.randint(1, num_prefix)
    else:
      prefix = 1
    line = str(prefix) + "-common-prefix-" + str(prefix)

    middle = random.randint(1, 100)
    if middle <= percent_middle:
       line += " " + str(12345678 + i) + " "
    else:
       line += " " + str(9999999999 + i) + unique + " "

    if num_suffix > 1:
      suffix = random.randint(1, num_suffix)
    else:
      suffix = 1
    line += str(suffix) + "-common-suffix-" + str(suffix)
    f.write(line + '\n')


parser = OptionParser('Generate files for diff')
parser.add_option('--num-lines', type=int, default=100000, dest='num_lines',
                  help='number of lines, default 100000')
parser.add_option('--num-prefix', type=int, default=1, dest='num_prefix',
                  help='number of distinct prefixes, default 1')
parser.add_option('--num-suffix', type=int, default=1, dest='num_suffix',
                  help='number of distinct suffixes, default 1')
parser.add_option('--percent-middle', type=int, default=99,
                  dest='percent_middle',
                  help='percentage matching middles, default 99')
(options, args) = parser.parse_args(sys.argv)

f1 = open('file1.txt', 'w')
write_file_contents(f1, options.num_lines,
                    options.num_prefix, options.num_suffix,
                    options.percent_middle, 'a')

f2 = open('file2.txt', 'w')
write_file_contents(f2, options.num_lines,
                    options.num_prefix, options.num_suffix,
                    options.percent_middle, 'b')
-- 
Philip

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Thu, Dec 23, 2010 at 01:51:08 +0100:
> As a first attempt, I quickly hacked up a small shell script, writing
> out lines in a for loop, one by one, with a fixed string together with
> the line number (index of the iteration). But that's too slow (10000
> lines of 70 bytes, i.e. 700Kb, is already taking 14 seconds).
> 

Use a RAM disk?

I'm not sure what I can suggest for RAM disks on Windows, though...
(Bert or Paul might know?)

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Thu, Dec 23, 2010 at 11:43 PM, Gavin Beau Baumanis
<ga...@thespidernet.com> wrote:
> On 23/12/2010, at 11:36 PM, Johan Corveleyn wrote:
>
>> On Thu, Dec 23, 2010 at 3:44 AM, Gavin Beau Baumanis
>> <ga...@thespidernet.com> wrote:
>>> Hi Johan,
>>
>> Hi Gavin. Thanks for your interest. It's a nice problem isn't it?
>
> Yes - it "should be" so very simple... but a little thought  - and before you know it - it's not!
>
>
>>
>>> I was intrigued by your requirement to create a large file for testing.
>>> I remember from a really long time ago when I learnt C, that we used a specific algorithm for creating "natural" and "random" text.
>>> With some help from Mr. Google I found out about Markov Chains that look promising - I can't remember if that was what I learned about or not - but it looks like it might prove helpful nonetheless.
>>>
>>> A little further Googling and I found this specific post on stackoverflow.
>>> http://stackoverflow.com/questions/1037719/how-can-i-quickly-create-large-1gb-textbinary-files-with-natural-content
>>
>> Interesting link. If I'm reading it correctly, the best suggestion in
>> there was with those Markov Chains. But it seems maybe a little
>> overkill / too much work for me.
>
> I might have an answer for you there.
> I pointed out this thread to our intern.
> He seems to know about Markov chains already - I have asked him to contribute to the thread.
>
>
>>
>>> No Idea if it is going to help you specifically or not... but there are quite a few ideas in the comments;
>>> * Obtain a copy of the first 100MB from wikipedia - for example.
>>
>> Nah, I wouldn't like to depend on some internet resource.
>>
>>> Finally, if you happen to have a large enough file already, could you not use the "split" (unix) function to give you a specific-sized file?
>>
>> Yes, but we'll first need that large enough file :-).
>>
>>>
>>> Actually - I was so intrigued by the "challenge" of this - I have had a think of it over lunch;
>>>
>>> Could you not do this?
>>> (pseudo -code)
>>>
>>> Start with a known "chunk" - say the license file.
>>> get length of license file - for example only  = 125bytes
>>> append the chunk to itself until such time as you have the required size.
>>> write the file once.
>>>
>>> startSize = length(knownFile);
>>> int numberOfLoops = log2(desiredFileSize / knownFile) ;
>>> newFile = knownFile
>>>
>>> Loop for numberOfLoops
>>> {
>>>      newFile = newFile +newFile
>>> }
>>>
>>> write newFile;
>>
>> Yes, that certainly seems a sensible approach. I actually had the same
>> idea (see below, "doubling the file until I have enough (cat file.txt
>>>> file.txt)). Well, it's almost the same: in your suggestion the file
>> is built up completely in memory and only written out in the end.
>> That's also an option, but I think it's better to write it out every
>> time it's doubled (to avoid using up too much memory).
>
> I certainly understand that, but in today's GB-RAM PCs / laptops,
> is a 100 MB file in memory such a worrisome issue?
> It would only be short-lived - until the file was written out. The memory could then be cleared?
> And from your first email it seemed like timing was the hurdle to clear, and the file open / write / close on every iteration of the loop could certainly be the bottleneck of your first attempt.
>
> Anyway - Please be sure not to let me "bully" you into anything you don't think is right for your requirements :)
> Andy (our intern) might be able to provide the missing link for you.
>
> none the less, Good Luck.

Thanks, but maybe it will not be needed. I've found a way which is
good enough for me at the moment (although it will not hurt to have a
better algorithm etc., so feel free to go ahead anyway).

I've thought/tried a bit harder, and improved on my first naive attempt:

>>>> As a first attempt, I quickly hacked up a small shell script, writing
>>>> out lines in a for loop, one by one, with a fixed string together with
>>>> the line number (index of the iteration). But that's too slow (10000
>>>> lines of 70 bytes, i.e. 700Kb, is already taking 14 seconds).

That wasn't very smart :-). Writing line-by-line into a file, tssss.

If I switch to writing lines to a shell variable, and then write it
out to a file, the 10000 lines are done in 1 second:
#!/bin/bash                  # the (( )) arithmetic for-loop is a bashism, not plain sh
FILENAME=big_testfile.txt    # output file name (placeholder for this excerpt)
CHUNK_SIZE=10000             # 10000 lines
LINE="This is a plain line of text in the file. It's actually line number"
CHUNK=`for (( i=0; i < $CHUNK_SIZE; i++ )); do echo "$LINE $i."; done`
echo "$CHUNK" > $FILENAME

Appending such CHUNKS in a for-loop a couple of times, quickly
generates a file that's large enough, with some repetition of lines,
and not too much memory usage.

Attached is a version of the script "generate_big_testfiles.sh". It
writes 10 such 10000-line chunks to the "original file", and the
same to the "modified file" with 100 changed lines inserted in the
middle. That gives me two files of ~7 Mb, in around 7 seconds on my
machine.

I will continue to fiddle a little bit with this script, suggestions
welcome :-).

I'm currently having some trouble building the tools directory, but
once I get that fixed, I'll test the tools/diff executables with these
big test files.

Cheers,
-- 
Johan

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Gavin Beau Baumanis <ga...@thespidernet.com>.
On 23/12/2010, at 11:36 PM, Johan Corveleyn wrote:

> On Thu, Dec 23, 2010 at 3:44 AM, Gavin Beau Baumanis
> <ga...@thespidernet.com> wrote:
>> Hi Johan,
> 
> Hi Gavin. Thanks for your interest. It's a nice problem isn't it?

Yes - it "should be" so very simple... but a little thought  - and before you know it - it's not!


> 
>> I was intrigued by your requirement to create a large file for testing.
>> I remember from a really long time ago when I learnt C, that we used a specific algorithm for creating "natural" and "random" text.
>> With some help from Mr. Google I found out about Markov Chains that look promising - I can't remember if that was what I learned about or not - but it looks like it might prove helpful nonetheless.
>> 
>> A little further Googling and I found this specific post on stackoverflow.
>> http://stackoverflow.com/questions/1037719/how-can-i-quickly-create-large-1gb-textbinary-files-with-natural-content
> 
> Interesting link. If I'm reading it correctly, the best suggestion in
> there was with those Markov Chains. But it seems maybe a little
> overkill / too much work for me.

I might have an answer for you there.
I pointed out this thread to our intern.
He seems to know about Markov chains already - I have asked him to contribute to the thread.


> 
>> No Idea if it is going to help you specifically or not... but there are quite a few ideas in the comments;
>> * Obtain a copy of the first 100MB from wikipedia - for example.
> 
> Nah, I wouldn't like to depend on some internet resource.
> 
>> Finally, if you happen to have a large enough file already, could you not use the "split" (unix) function to give you a specific-sized file?
> 
> Yes, but we'll first need that large enough file :-).
> 
>> 
>> Actually - I was so intrigued by the "challenge" of this - I have had a think of it over lunch;
>> 
>> Could you not do this?
>> (pseudo -code)
>> 
>> Start with a known "chunk" - say the license file.
>> get length of license file - for example only  = 125bytes
>> append the chunk to itself until such time as you have the required size.
>> write the file once.
>> 
>> startSize = length(knownFile);
>> int numberOfLoops = log2(desiredFileSize / knownFile) ;
>> newFile = knownFile
>> 
>> Loop for numberOfLoops
>> {
>>      newFile = newFile +newFile
>> }
>> 
>> write newFile;
> 
> Yes, that certainly seems a sensible approach. I actually had the same
> idea (see below, "doubling the file until I have enough (cat file.txt
>>> file.txt)). Well, it's almost the same: in your suggestion the file
> is built up completely in memory and only written out in the end.
> That's also an option, but I think it's better to write it out every
> time it's doubled (to avoid using up too much memory).

I certainly understand that, but in today's GB-RAM PCs / laptops,
is a 100 MB file in memory such a worrisome issue?
It would only be short-lived - until the file was written out. The memory could then be cleared?
And from your first email it seemed like timing was the hurdle to clear, and the file open / write / close on every iteration of the loop could certainly be the bottleneck of your first attempt.

Anyway - Please be sure not to let me "bully" you into anything you don't think is right for your requirements :)
Andy (our intern) might be able to provide the missing link for you.

none the less, Good Luck.

<snip>


> 
> But good suggestion anyway. I think I will go with some variation of
> this (starting with a large enough chunk of representative content).
> 
> Cheers,
> Johan
> 
>> On 23/12/2010, at 11:51 AM, Johan Corveleyn wrote:
>> 
>>> On Wed, Dec 22, 2010 at 11:50 AM, Philip Martin
>>> <ph...@wandisco.com> wrote:
>>>> Johan Corveleyn <jc...@gmail.com> writes:
>>>> 
>>>>> On Mon, Dec 20, 2010 at 11:19 AM, Philip Martin
>>>>> <ph...@wandisco.com> wrote:
>>>>>> Johan Corveleyn <jc...@gmail.com> writes:
>>>>>> 
>>>>>>> This makes the diff algorithm another 10% - 15%
>>>>>>> faster (granted, this was measured with my "extreme" testcase of a 1,5
>>>>>>> Mb file (60000 lines), of which most lines are identical
>>>>>>> prefix/suffix).
>>>>>> 
>>>>>> Can you provide a test script?  Or describe the test more fully, please.
>>>>> 
>>>>> Hmm, it's not easy to come up with a test script to test this "from
>>>>> scratch" (unless with testing diff directly, see below). I test it
>>>>> with a repository (a dump/load of an old version of our production
>>>>> repository) which contains this 60000 line xml file (1,5 Mb) with 2272
>>>>> revisions.
>>>>> 
>>>>> I run blame on this file, over svnserve protocol on localhost (server
>>>>> running on same machine), with an svnserve built from Stefan^2's
>>>>> performance branch (with membuffer caching of full-texts, so server
>>>>> I/O is not the bottleneck). This gives me an easy way to call 2272
>>>>> times diff on this file, and measure it (with the help of some
>>>>> instrumentation code in blame.c, see attachment). And it's
>>>>> incidentally the actual use case I first started out wanting to
>>>>> optimize (blame for large files with many revisions).
>>>> 
>>>> Testing with real-world data is important, perhaps even more important
>>>> than artificial test data, but some test data would be useful.  If you
>>>> were to write a script to generate two test files of size 100MB, say,
>>>> then you could use the tools/diff/diff utility to run Subversion diff on
>>>> those two files.  Or tools/diff/diff3 if it's a 3-way diff that matters.
>>>> The first run might involve disk IO, but on most machines the OS should
>>>> be able to cache the files and subsequent hot-cache runs should be a
>>>> good way to profile the diff code, assuming it is CPU limited.
>>> 
>>> Yes, that's a good idea. I'll try to spend some time on that. But I'm
>>> wondering about a good way to write such a script.
>>> 
>>> I'd like the script to generate large files quickly, and with content
>>> that's not totally random, but also not 1000000 times the exact same
>>> line (either of those are not going to be representative for real
>>> world data, might hit some edge behavior of the diff algorithm).
>>> (maybe totally random is fine, but is there an easy/fast way to
>>> generate this?)
>>> 
>>> As a first attempt, I quickly hacked up a small shell script, writing
>>> out lines in a for loop, one by one, with a fixed string together with
>>> the line number (index of the iteration). But that's too slow (10000
>>> lines of 70 bytes, i.e. 700Kb, is already taking 14 seconds).
>>> 
>>> Maybe I can start with 10 or 20 different lines (or generate 100 in a
>>> for loop), and then start doubling that until I have enough (cat
>>> file.txt >> file.txt). That will probably be faster. And it might be
>>> "real-worldish" enough (a single source file also contains many
>>> identical lines, e.g. all lines with a single brace etc.).
>>> 
>>> Other ideas? Maybe there is already something like this lying around?
>>> 
>>> Another question: a shell script might not be good, because not
>>> portable (and not fast)? Should I use python for this? Maybe the
>>> "write line by line with a line number in a for loop" would be a lot
>>> faster in Python? I don't know a lot of python, but it might be a good
>>> opportunity to learn some ...
>>> 
>>> Are there any examples of such "manual test scripts" in svn? So I
>>> could have a look at the style, coding habits, ... maybe borrow some
>>> boilerplate code?
>>> 
>>> Cheers,
>>> --
>>> Johan
>> 
>> 

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Thu, Dec 23, 2010 at 3:44 AM, Gavin Beau Baumanis
<ga...@thespidernet.com> wrote:
> Hi Johan,

Hi Gavin. Thanks for your interest. It's a nice problem isn't it?

> I was intrigued by your requirement to create a large file for testing.
> I remember from a really long time ago when I learnt C, that we used a specific algorithm for creating "natural" and "random" text.
> With some help from Mr. Google I found out about Markov Chains that look promising - I can't remember if that was what I learned about or not - but it looks like it might prove helpful nonetheless.
>
> A little further Googling and I found this specific post on stackoverflow.
> http://stackoverflow.com/questions/1037719/how-can-i-quickly-create-large-1gb-textbinary-files-with-natural-content

Interesting link. If I'm reading it correctly, the best suggestion in
there was with those Markov Chains. But it seems maybe a little
overkill / too much work for me.

> No Idea if it is going to help you specifically or not... but there are quite a few ideas in the comments;
>  * Obtain a copy of the first 100MB from wikipedia - for example.

Nah, I wouldn't like to depend on some internet resource.

> Finally, if you happen to have a large enough file already, could you not use the "split" (unix) function to give you a specific-sized file?

Yes, but we'll first need that large enough file :-).

>
> Actually - I was so intrigued by the "challenge" of this - I have had a think of it over lunch;
>
> Could you not do this?
> (pseudo -code)
>
> Start with a known "chunk" - say the license file.
> get length of license file - for example only  = 125bytes
> append the chunk to itself until such time as you have the required size.
> write the file once.
>
> startSize = length(knownFile);
> int numberOfLoops = log2(desiredFileSize / knownFile) ;
> newFile = knownFile
>
> Loop for numberOfLoops
> {
>        newFile = newFile +newFile
> }
>
> write newFile;

Yes, that certainly seems a sensible approach. I actually had the same
idea (see below, "doubling the file until I have enough (cat file.txt
>> file.txt)). Well, it's almost the same: in your suggestion the file
is built up completely in memory and only written out in the end.
That's also an option, but I think it's better to write it out every
time it's doubled (to avoid using up too much memory).

But good suggestion anyway. I think I will go with some variation of
this (starting with a large enough chunk of representative content).

Cheers,
Johan

> On 23/12/2010, at 11:51 AM, Johan Corveleyn wrote:
>
>> On Wed, Dec 22, 2010 at 11:50 AM, Philip Martin
>> <ph...@wandisco.com> wrote:
>>> Johan Corveleyn <jc...@gmail.com> writes:
>>>
>>>> On Mon, Dec 20, 2010 at 11:19 AM, Philip Martin
>>>> <ph...@wandisco.com> wrote:
>>>>> Johan Corveleyn <jc...@gmail.com> writes:
>>>>>
>>>>>> This makes the diff algorithm another 10% - 15%
>>>>>> faster (granted, this was measured with my "extreme" testcase of a 1,5
>>>>>> Mb file (60000 lines), of which most lines are identical
>>>>>> prefix/suffix).
>>>>>
>>>>> Can you provide a test script?  Or describe the test more fully, please.
>>>>
>>>> Hmm, it's not easy to come up with a test script to test this "from
>>>> scratch" (unless with testing diff directly, see below). I test it
>>>> with a repository (a dump/load of an old version of our production
>>>> repository) which contains this 60000 line xml file (1,5 Mb) with 2272
>>>> revisions.
>>>>
>>>> I run blame on this file, over svnserve protocol on localhost (server
>>>> running on same machine), with an svnserve built from Stefan^2's
>>>> performance branch (with membuffer caching of full-texts, so server
>>>> I/O is not the bottleneck). This gives me an easy way to call 2272
>>>> times diff on this file, and measure it (with the help of some
>>>> instrumentation code in blame.c, see attachment). And it's
>>>> incidentally the actual use case I first started out wanting to
>>>> optimize (blame for large files with many revisions).
>>>
>>> Testing with real-world data is important, perhaps even more important
>>> than artificial test data, but some test data would be useful.  If you
>>> were to write a script to generate two test files of size 100MB, say,
>>> then you could use the tools/diff/diff utility to run Subversion diff on
>>> those two files.  Or tools/diff/diff3 if it's a 3-way diff that matters.
>>> The first run might involve disk IO, but on most machines the OS should
>>> be able to cache the files and subsequent hot-cache runs should be a
>>> good way to profile the diff code, assuming it is CPU limited.
>>
>> Yes, that's a good idea. I'll try to spend some time on that. But I'm
>> wondering about a good way to write such a script.
>>
>> I'd like the script to generate large files quickly, and with content
>> that's not totally random, but also not 1000000 times the exact same
>> line (either of those are not going to be representative for real
>> world data, might hit some edge behavior of the diff algorithm).
>> (maybe totally random is fine, but is there an easy/fast way to
>> generate this?)
>>
>> As a first attempt, I quickly hacked up a small shell script, writing
>> out lines in a for loop, one by one, with a fixed string together with
>> the line number (index of the iteration). But that's too slow (10000
>> lines of 70 bytes, i.e. 700Kb, is already taking 14 seconds).
>>
>> Maybe I can start with 10 or 20 different lines (or generate 100 in a
>> for loop), and then start doubling that until I have enough (cat
>> file.txt >> file.txt). That will probably be faster. And it might be
>> "real-worldish" enough (a single source file also contains many
>> identical lines, e.g. all lines with a single brace etc.).
>>
>> Other ideas? Maybe there is already something like this lying around?
>>
>> Another question: a shell script might not be good, because not
>> portable (and not fast)? Should I use python for this? Maybe the
>> "write line by line with a line number in a for loop" would be a lot
>> faster in Python? I don't know a lot of python, but it might be a good
>> opportunity to learn some ...
>>
>> Are there any examples of such "manual test scripts" in svn? So I
>> could have a look at the style, coding habits, ... maybe borrow some
>> boilerplate code?
>>
>> Cheers,
>> --
>> Johan
>
>

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Stefan Fuhrmann <eq...@web.de>.
On 24.12.2010 14:25, Stefan Fuhrmann wrote:
> On 23.12.2010 03:44, Gavin Beau Baumanis wrote:
>> Hi Johan,
>>
>> I was intrigued by your requirement to create a large file for testing.
>> I remember from a really long time ago when I learnt C, that we used 
>> a specific algorithm for creating "natural" and "random" text.
>> With some help from Mr. Google I found out about Markov Chains that look
>> promising - I can't remember if that was what I learned about or not -
>> but it looks like it might prove helpful nonetheless.
>>
>> A little further Googling and I found this specific post on
>> stackoverflow.
>> http://stackoverflow.com/questions/1037719/how-can-i-quickly-create-large-1gb-textbinary-files-with-natural-content 
>>
>>
>>
>> No Idea if it is going to help you specifically or not... but there 
>> are quite a few ideas in the comments;
>>   * Obtain a copy of the first 100MB from wikipedia - for example.
>>
> You might try some recent LINUX tar ball (~400MB).
> It should be
> * mainly but probably not entirely text
> * very close to typical real-world data (large config file
>   sections, lots of source code, maybe some binary /
>   UTF16 data)
> * accessible to everybody for independent testing etc.
>
> Just an idea ;)
> -- Stefan^2.
>
... you may import many versions (including the RCs)
of it to form a deep history.

-- Stefan^2.

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Stefan Fuhrmann <eq...@web.de>.
On 23.12.2010 03:44, Gavin Beau Baumanis wrote:
> Hi Johan,
>
> I was intrigued by your requirement to create a large file for testing.
> I remember from a really long time ago when I learnt C, that we used a specific algorithm for creating "natural" and "random" text.
> With some help from Mr. Google I found out about Markov Chains, which look promising - I can't remember if that was what I learned about or not - but it looks like they might prove helpful nonetheless.
>
> A little further Googling, and I found this specific post on stackoverflow.
> http://stackoverflow.com/questions/1037719/how-can-i-quickly-create-large-1gb-textbinary-files-with-natural-content
>
>
> No Idea if it is going to help you specifically or not... but there are quite a few ideas in the comments;
>   * Obtain a copy of the first 100MB from wikipedia - for example.
>
You might try some recent Linux tarball (~400MB).
It should be
* mainly but probably not entirely text
* very close to typical real-world data (large config file
   sections, lots of source code, maybe some binary /
   UTF16 data)
* accessible to everybody for independent testing etc.

Just an idea ;)
-- Stefan^2.

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Gavin Beau Baumanis <ga...@thespidernet.com>.
Hi Johan,

I was intrigued by your requirement to create a large file for testing.
I remember from a really long time ago, when I learnt C, that we used a specific algorithm for creating "natural" and "random" text.
With some help from Mr. Google I found out about Markov Chains, which look promising - I can't remember if that was what I learned about or not - but it looks like they might prove helpful nonetheless.

A little further Googling, and I found this specific post on stackoverflow.
http://stackoverflow.com/questions/1037719/how-can-i-quickly-create-large-1gb-textbinary-files-with-natural-content


No idea if it is going to help you specifically or not... but there are quite a few ideas in the comments:
 * Obtain a copy of the first 100MB from wikipedia - for example.

Finally, if you happen to have a large enough file already, could you not use the "split" (unix) utility to give you a file of a specific size?


Actually - I was so intrigued by the "challenge" of this - I had a think about it over lunch:

Could you not do this?
(pseudo-code)

Start with a known "chunk" - say the license file.
Get the length of the license file - for example only, 125 bytes.
Append the chunk to itself until you have the required size.
Write the file once.

startSize = length(knownFile);
int numberOfLoops = ceil(log2(desiredFileSize / startSize));
newFile = knownFile

Loop for numberOfLoops
{
	newFile = newFile +newFile
}

write newFile;
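
For what it's worth, here is a minimal Python sketch of that doubling idea (not tested against anything in the tree; the seed file name, the output name and the target size are just placeholders):
[[[
#!/usr/bin/env python
# Minimal sketch of the "keep doubling a seed chunk" idea above.
# Assumptions: seed.txt (e.g. a LICENSE file) exists and is non-empty;
# we stop at the first doubling that reaches or exceeds the target size.

TARGET_SIZE = 100 * 1024 * 1024   # ~100 MB, adjust as needed

f = open('seed.txt', 'rb')
data = f.read()
f.close()

# Double the buffer in memory until it is large enough, then write once.
while len(data) < TARGET_SIZE:
    data = data + data

out = open('big-testfile.txt', 'wb')
out.write(data)
out.close()
]]]
For files much larger than RAM you'd want to double on disk instead of in memory, but for the ~100MB discussed in this thread that should be fine.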



Gavin



On 23/12/2010, at 11:51 AM, Johan Corveleyn wrote:

> On Wed, Dec 22, 2010 at 11:50 AM, Philip Martin
> <ph...@wandisco.com> wrote:
>> Johan Corveleyn <jc...@gmail.com> writes:
>> 
>>> On Mon, Dec 20, 2010 at 11:19 AM, Philip Martin
>>> <ph...@wandisco.com> wrote:
>>>> Johan Corveleyn <jc...@gmail.com> writes:
>>>> 
>>>>> This makes the diff algorithm another 10% - 15%
>>>>> faster (granted, this was measured with my "extreme" testcase of a 1,5
>>>>> Mb file (60000 lines), of which most lines are identical
>>>>> prefix/suffix).
>>>> 
>>>> Can you provide a test script?  Or describe the test more fully, please.
>>> 
>>> Hmm, it's not easy to come up with a test script to test this "from
>>> scratch" (unless with testing diff directly, see below). I test it
>>> with a repository (a dump/load of an old version of our production
>>> repository) which contains this 60000 line xml file (1,5 Mb) with 2272
>>> revisions.
>>> 
>>> I run blame on this file, over svnserve protocol on localhost (server
>>> running on same machine), with an svnserve built from Stefan^2's
>>> performance branch (with membuffer caching of full-texts, so server
>>> I/O is not the bottleneck). This gives me an easy way to call 2272
>>> times diff on this file, and measure it (with the help of some
>>> instrumentation code in blame.c, see attachment). And it's
>>> incidentally the actual use case I first started out wanting to
>>> optimize (blame for large files with many revisions).
>> 
>> Testing with real-world data is important, perhaps even more important
>> than artificial test data, but some test data would be useful.  If you
>> were to write a script to generate two test files of size 100MB, say,
>> then you could use the tools/diff/diff utility to run Subversion diff on
>> those two files.  Or tools/diff/diff3 if it's a 3-way diff that matters.
>> The first run might involve disk IO, but on most machines the OS should
>> be able to cache the files and subsequent hot-cache runs should be a
>> good way to profile the diff code, assuming it is CPU limited.
> 
> Yes, that's a good idea. I'll try to spend some time on that. But I'm
> wondering about a good way to write such a script.
> 
> I'd like the script to generate large files quickly, and with content
> that's not totally random, but also not 1000000 times the exact same
> line (either of those are not going to be representative for real
> world data, might hit some edge behavior of the diff algorithm).
> (maybe totally random is fine, but is there an easy/fast way to
> generate this?)
> 
> As a first attempt, I quickly hacked up a small shell script, writing
> out lines in a for loop, one by one, with a fixed string together with
> the line number (index of the iteration). But that's too slow (10000
> lines of 70 bytes, i.e. 700Kb, is already taking 14 seconds).
> 
> Maybe I can start with 10 or 20 different lines (or generate 100 in a
> for loop), and then start doubling that until I have enough (cat
> file.txt >> file.txt). That will probably be faster. And it might be
> "real-worldish" enough (a single source file also contains many
> identical lines, e.g. all lines with a single brace etc.).
> 
> Other ideas? Maybe there is already something like this lying around?
> 
> Another question: a shell script might not be good, because not
> portable (and not fast)? Should I use python for this? Maybe the
> "write line by line with a line number in a for loop" would be a lot
> faster in Python? I don't know a lot of python, but it might be a good
> opportunity to learn some ...
> 
> Are there any examples of such "manual test scripts" in svn? So I
> could have a look at the style, coding habits, ... maybe borrow some
> boilerplate code?
> 
> Cheers,
> -- 
> Johan

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Thu, Dec 23, 2010 at 01:51:08 +0100:
> On Wed, Dec 22, 2010 at 11:50 AM, Philip Martin
> <ph...@wandisco.com> wrote:
> > Johan Corveleyn <jc...@gmail.com> writes:
> >
> >> On Mon, Dec 20, 2010 at 11:19 AM, Philip Martin
> >> <ph...@wandisco.com> wrote:
> >>> Johan Corveleyn <jc...@gmail.com> writes:
> >>>
> >>>> This makes the diff algorithm another 10% - 15%
> >>>> faster (granted, this was measured with my "extreme" testcase of a 1,5
> >>>> Mb file (60000 lines), of which most lines are identical
> >>>> prefix/suffix).
> >>>
> >>> Can you provide a test script?  Or describe the test more fully, please.
> >>
> >> Hmm, it's not easy to come up with a test script to test this "from
> >> scratch" (unless with testing diff directly, see below). I test it
> >> with a repository (a dump/load of an old version of our production
> >> repository) which contains this 60000 line xml file (1,5 Mb) with 2272
> >> revisions.
> >>
> >> I run blame on this file, over svnserve protocol on localhost (server
> >> running on same machine), with an svnserve built from Stefan^2's
> >> performance branch (with membuffer caching of full-texts, so server
> >> I/O is not the bottleneck). This gives me an easy way to call 2272
> >> times diff on this file, and measure it (with the help of some
> >> instrumentation code in blame.c, see attachment). And it's
> >> incidentally the actual use case I first started out wanting to
> >> optimize (blame for large files with many revisions).
> >
> > Testing with real-world data is important, perhaps even more important
> > than artificial test data, but some test data would be useful.  If you
> > were to write a script to generate two test files of size 100MB, say,
> > then you could use the tools/diff/diff utility to run Subversion diff on
> > those two files.  Or tools/diff/diff3 if it's a 3-way diff that matters.
> > The first run might involve disk IO, but on most machines the OS should
> > be able to cache the files and subsequent hot-cache runs should be a
> > good way to profile the diff code, assuming it is CPU limited.
> 
> Yes, that's a good idea. I'll try to spend some time on that. But I'm
> wondering about a good way to write such a script.
> 
> I'd like the script to generate large files quickly, and with content
> that's not totally random, but also not 1000000 times the exact same
> line (either of those are not going to be representative for real
> world data, might hit some edge behavior of the diff algorithm).

How about using

        cat subversion/libsvn_wc/*.c

as your test file?


It's currently 1.8MB, and should give plenty of prefix/suffix... (e.g.,
adm_crawler.c often is untouched)
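
(Purely as an illustration of how that could be turned into a left/right pair for the diff tools - the output names and the position of the changed line below are arbitrary placeholders, not something from this thread:)
[[[
#!/usr/bin/env python
# Sketch: concatenate the libsvn_wc sources into two files that differ
# in a single line, so there is something realistic to diff.
import glob

lines = []
for path in sorted(glob.glob('subversion/libsvn_wc/*.c')):
    lines.extend(open(path).readlines())

open('left.c', 'w').writelines(lines)
# Change one line in the middle for the "right" side of the diff.
lines[len(lines) // 2] = '/* single changed line in the middle */\n'
open('right.c', 'w').writelines(lines)
]]]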


> (maybe totally random is fine, but is there an easy/fast way to
> generate this?)
> 
> As a first attempt, I quickly hacked up a small shell script, writing
> out lines in a for loop, one by one, with a fixed string together with
> the line number (index of the iteration). But that's too slow (10000
> lines of 70 bytes, i.e. 700Kb, is already taking 14 seconds).
> 
> Maybe I can start with 10 or 20 different lines (or generate 100 in a
> for loop), and then start doubling that until I have enough (cat
> file.txt >> file.txt). That will probably be faster. And it might be
> "real-worldish" enough (a single source file also contains many
> identical lines, e.g. all lines with a single brace etc.).
> 
> Other ideas? Maybe there is already something like this lying around?
> 
> Another question: a shell script might not be good, because not
> portable (and not fast)? Should I use python for this? Maybe the
> "write line by line with a line number in a for loop" would be a lot
> faster in Python? I don't know a lot of python, but it might be a good
> opportunity to learn some ...
> 

IMO, use whatever language is most convenient for you to write the
script in.  (Generating the test data need not be fast since it's
a once-only task.)

> Are there any examples of such "manual test scripts" in svn? So I
> could have a look at the style, coding habits, ... maybe borrow some
> boilerplate code?
> 

If there are, they are in the source tree :-)  There are various Python
scripts under tools/ (particularly tools/dev/) and subversion/tests/,
but I'm not sure how close they are to what you're looking for.

> Cheers,
> -- 
> Johan

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Dec 22, 2010 at 11:50 AM, Philip Martin
<ph...@wandisco.com> wrote:
> Johan Corveleyn <jc...@gmail.com> writes:
>
>> On Mon, Dec 20, 2010 at 11:19 AM, Philip Martin
>> <ph...@wandisco.com> wrote:
>>> Johan Corveleyn <jc...@gmail.com> writes:
>>>
>>>> This makes the diff algorithm another 10% - 15%
>>>> faster (granted, this was measured with my "extreme" testcase of a 1,5
>>>> Mb file (60000 lines), of which most lines are identical
>>>> prefix/suffix).
>>>
>>> Can you provide a test script?  Or describe the test more fully, please.
>>
>> Hmm, it's not easy to come up with a test script to test this "from
>> scratch" (unless with testing diff directly, see below). I test it
>> with a repository (a dump/load of an old version of our production
>> repository) which contains this 60000 line xml file (1,5 Mb) with 2272
>> revisions.
>>
>> I run blame on this file, over svnserve protocol on localhost (server
>> running on same machine), with an svnserve built from Stefan^2's
>> performance branch (with membuffer caching of full-texts, so server
>> I/O is not the bottleneck). This gives me an easy way to call 2272
>> times diff on this file, and measure it (with the help of some
>> instrumentation code in blame.c, see attachment). And it's
>> incidentally the actual use case I first started out wanting to
>> optimize (blame for large files with many revisions).
>
> Testing with real-world data is important, perhaps even more important
> than artificial test data, but some test data would be useful.  If you
> were to write a script to generate two test files of size 100MB, say,
> then you could use the tools/diff/diff utility to run Subversion diff on
> those two files.  Or tools/diff/diff3 if it's a 3-way diff that matters.
> The first run might involve disk IO, but on most machines the OS should
> be able to cache the files and subsequent hot-cache runs should be a
> good way to profile the diff code, assuming it is CPU limited.

Yes, that's a good idea. I'll try to spend some time on that. But I'm
wondering about a good way to write such a script.

I'd like the script to generate large files quickly, and with content
that's not totally random, but also not 1000000 times the exact same
line (neither of those is going to be representative of real-world
data, and might hit some edge behavior of the diff algorithm).
(maybe totally random is fine, but is there an easy/fast way to
generate this?)

As a first attempt, I quickly hacked up a small shell script, writing
out lines in a for loop, one by one, with a fixed string together with
the line number (index of the iteration). But that's too slow (10000
lines of 70 bytes, i.e. 700 KB, already takes 14 seconds).

Maybe I can start with 10 or 20 different lines (or generate 100 in a
for loop), and then start doubling that until I have enough (e.g. cat
file.txt file.txt > tmp.txt && mv tmp.txt file.txt, since appending a
file to itself with >> doesn't work). That will probably be faster. And it might be
"real-worldish" enough (a single source file also contains many
identical lines, e.g. all lines with a single brace etc.).

Other ideas? Maybe there is already something like this lying around?

Another question: a shell script might not be good, because it's not
portable (and not fast)? Should I use Python for this? Maybe the
"write line by line with a line number in a for loop" approach would be
a lot faster in Python? I don't know a lot of Python, but it might be a
good opportunity to learn some ...
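
For what it's worth, a sketch of what such a generator might look like in Python (the file name, the number of lines and the line format are arbitrary placeholders, nothing prescribed here):
[[[
#!/usr/bin/env python
# Rough sketch of the "fixed string plus line number in a for loop" idea.

NUM_LINES = 1000000          # roughly 60-70 MB with the line format below

f = open('generated-testfile.txt', 'w')
for i in range(NUM_LINES):
    # A fixed ~60-byte prefix with the iteration index appended.
    f.write('A fixed chunk of sample text for diff testing, line number %d\n' % i)
f.close()
]]]
Even a million lines should only take a few seconds this way, so the 14-seconds-for-10000-lines problem of the line-at-a-time shell loop shouldn't show up.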

Are there any examples of such "manual test scripts" in svn? So I
could have a look at the style, coding habits, ... maybe borrow some
boilerplate code?

Cheers,
-- 
Johan

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Philip Martin <ph...@wandisco.com>.
Johan Corveleyn <jc...@gmail.com> writes:

> On Mon, Dec 20, 2010 at 11:19 AM, Philip Martin
> <ph...@wandisco.com> wrote:
>> Johan Corveleyn <jc...@gmail.com> writes:
>>
>>> This makes the diff algorithm another 10% - 15%
>>> faster (granted, this was measured with my "extreme" testcase of a 1,5
>>> Mb file (60000 lines), of which most lines are identical
>>> prefix/suffix).
>>
>> Can you provide a test script?  Or describe the test more fully, please.
>
> Hmm, it's not easy to come up with a test script to test this "from
> scratch" (unless with testing diff directly, see below). I test it
> with a repository (a dump/load of an old version of our production
> repository) which contains this 60000 line xml file (1,5 Mb) with 2272
> revisions.
>
> I run blame on this file, over svnserve protocol on localhost (server
> running on same machine), with an svnserve built from Stefan^2's
> performance branch (with membuffer caching of full-texts, so server
> I/O is not the bottleneck). This gives me an easy way to call 2272
> times diff on this file, and measure it (with the help of some
> instrumentation code in blame.c, see attachment). And it's
> incidentally the actual use case I first started out wanting to
> optimize (blame for large files with many revisions).

Testing with real-world data is important, perhaps even more important
than artificial test data, but some test data would be useful.  If you
were to write a script to generate two test files of size 100MB, say,
then you could use the tools/diff/diff utility to run Subversion diff on
those two files.  Or tools/diff/diff3 if it's a 3-way diff that matters.
The first run might involve disk IO, but on most machines the OS should
be able to cache the files and subsequent hot-cache runs should be a
good way to profile the diff code, assuming it is CPU limited.
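
(For illustration only, a hot-cache timing loop along those lines could be as simple as the sketch below; the binary path, the input file names and the run count are placeholders, not anything from this thread:)
[[[
#!/usr/bin/env python
# Sketch: time repeated hot-cache runs of the standalone diff tool.
import os, subprocess, time

DIFF = 'tools/diff/diff'      # built from the Subversion source tree
RUNS = 10

devnull = open(os.devnull, 'w')
timings = []
for i in range(RUNS):
    start = time.time()
    # Output is discarded; only the elapsed time matters here.
    subprocess.call([DIFF, 'file1.txt', 'file2.txt'], stdout=devnull)
    timings.append(time.time() - start)

# The first run may include disk I/O; report it separately.
print('first (cold) run: %.3f s' % timings[0])
print('best of the rest: %.3f s' % min(timings[1:]))
]]]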

-- 
Philip

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, Dec 20, 2010 at 11:19 AM, Philip Martin
<ph...@wandisco.com> wrote:
> Johan Corveleyn <jc...@gmail.com> writes:
>
>> This makes the diff algorithm another 10% - 15%
>> faster (granted, this was measured with my "extreme" testcase of a 1,5
>> Mb file (60000 lines), of which most lines are identical
>> prefix/suffix).
>
> Can you provide a test script?  Or describe the test more fully, please.

Hmm, it's not easy to come up with a test script to test this "from
scratch" (unless with testing diff directly, see below). I test it
with a repository (a dump/load of an old version of our production
repository) which contains this 60000 line xml file (1,5 Mb) with 2272
revisions.

I run blame on this file, over svnserve protocol on localhost (server
running on same machine), with an svnserve built from Stefan^2's
performance branch (with membuffer caching of full-texts, so server
I/O is not the bottleneck). This gives me an easy way to call diff
2272 times on this file, and measure it (with the help of some
instrumentation code in blame.c, see attachment). And it's
incidentally the actual use case I first started out wanting to
optimize (blame for large files with many revisions).

This is the actual command I use, and the output generated by the
instrumentation in blame.c:
[[[
$ time svn blame -x-b svn://localhost/trunk/path/to/settings.xml >/dev/null
### blame took 117546875 usec (117 s)
### file_rev_handler: 3203125 (3 s) - window_handler: 110781250 (110 s)
### wrapped_handler: 37859375 (37 s) - diff: 70921875 (70 s) -
blame_process: 1015625 (1 s)


real    1m58.008s
user    0m0.015s
sys     0m0.031s
]]]

(note: I use the -x-b option in this case, because for some reason this
speeds it up tremendously. This probably has something to do with my
test data, whose history contains some "all tabs to spaces" and
"all spaces to tabs" revisions.)

Some background info on this instrumentation output:
- "blame took ...": timing before and after doing all the stuff
(around the call to svn_ra_get_file_revs2, which includes all the
callbacks).
- "file_rev_handler" and "window_handler": timing of all the useful
work that's done at the client side (so this kind of excludes the time
that the client is simply waiting for the server).
- The last line contains parts of the window_handler time:
  - "wrapped_handler": time taken to build all the full-texts at the client side.
  - "diff": time spent in calls to svn_diff_file_diff_2 (this is the one
    I'm trying to optimize with the diff-optimizations stuff).
  - "blame_process": the time taken to create and insert the blame
    chunks (linked list with blame information). As you can see this is
    quite negligible.

So, I'm mainly looking at the time reported after "diff:".

An alternative way to test this, which may be scriptable: testing diff
directly, by "svn diffing" a large file. I can notice small
differences (in the area of 10 or 20 milliseconds) when simply
executing a single "svn diff" of settings.xml, with one line modified.
But it's too small to make any definite conclusions (inaccuracy,
overhead of program startup, ...). Maybe a simple test in C, with a
for loop with many iterations calling svn_diff_file_diff_2, would be
better.

I guess it would be easy to script the creation of a new repository,
commit a file in it with 100000 lines, modify one line, and diff that
while measuring it.
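
Something along these lines, for instance (a rough sketch using the standard svnadmin/svn command-line tools; the repository location, file size and choice of changed line are all placeholders):
[[[
#!/usr/bin/env python
# Sketch: create a repo, commit a 100000-line file, change one line,
# and time a single "svn diff".
import os, subprocess, time

REPO = os.path.abspath('diff-test-repo')
WC = os.path.abspath('diff-test-wc')
BIG = os.path.join(WC, 'big.txt')

subprocess.call(['svnadmin', 'create', REPO])
subprocess.call(['svn', 'checkout', 'file://' + REPO, WC])

# Commit a file with 100000 numbered lines.
f = open(BIG, 'w')
for i in range(100000):
    f.write('line %d of the generated test file\n' % i)
f.close()
subprocess.call(['svn', 'add', BIG])
subprocess.call(['svn', 'commit', '-m', 'add big test file', WC])

# Modify a single line somewhere in the middle.
lines = open(BIG).readlines()
lines[50000] = 'this line was changed\n'
open(BIG, 'w').writelines(lines)

# Time the diff itself (program startup is included, so repeating this
# a few times would give more stable numbers).
start = time.time()
subprocess.call(['svn', 'diff', BIG], stdout=open(os.devnull, 'w'))
print('svn diff took %.3f s' % (time.time() - start))
]]]
The file:// URL construction assumes a Unix-style absolute path; on Windows it would need adjusting.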

(the best example I found in subversion's own repository on
svn.apache.org was subversion/tests/cmdline/merge_tests.py. It has
~16500 lines and about 660 changes.)

Cheers,
-- 
Johan

Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

Posted by Philip Martin <ph...@wandisco.com>.
Johan Corveleyn <jc...@gmail.com> writes:

> This makes the diff algorithm another 10% - 15%
> faster (granted, this was measured with my "extreme" testcase of a 1,5
> Mb file (60000 lines), of which most lines are identical
> prefix/suffix).

Can you provide a test script?  Or describe the test more fully, please.

-- 
Philip