Posted to dev@subversion.apache.org by Branko Čibej <br...@xbc.nu> on 2002/07/25 21:16:19 UTC

Delta combiner stress test results

I ran some stress tests on the delta combiner today, using revision 
2703, and compared the results with /trunk at the same revision. Here's how I tested:

    * I used the dump file for the Subversion repository (up to revision
      2662) to create two repositories, one with the branch svn and one
      with the trunk svn. Here's what the repositories look like, after
      cleaning out the log files:

[brane@silmaril s]$ ll repo-trunk/db/
total 57921
-rw-r--r--    1 brane    None          328 Jul 25 06:06 DB_CONFIG
-rw-r--r--    1 brane    None         8192 Jul 25 06:06 __db.001
-rw-r--r--    1 brane    None       270336 Jul 25 06:06 __db.002
-rw-r--r--    1 brane    None       327680 Jul 25 06:06 __db.003
-rw-r--r--    1 brane    None       704512 Jul 25 06:06 __db.004
-rw-r--r--    1 brane    None        16384 Jul 25 06:06 __db.005
-rw-r--r--    1 brane    None      1826816 Jul 25 22:16 changes
-rw-r--r--    1 brane    None        16384 Jul 25 22:16 copies
-rw-r--r--    1 brane    None      1720320 Jul 25 22:16 nodes
-rw-r--r--    1 brane    None      2097152 Jul 25 22:16 representations
-rw-r--r--    1 brane    None        73728 Jul 25 22:16 revisions
-rw-r--r--    1 brane    None     49037312 Jul 25 22:16 strings
-rw-r--r--    1 brane    None      3211264 Jul 25 22:16 transactions
[brane@silmaril s]$ ll repo-branch/db/
total 41033
-rw-r--r--    1 brane    None          328 Jul 25 10:13 DB_CONFIG
-rw-r--r--    1 brane    None         8192 Jul 25 10:13 __db.001
-rw-r--r--    1 brane    None       270336 Jul 25 10:13 __db.002
-rw-r--r--    1 brane    None       327680 Jul 25 10:13 __db.003
-rw-r--r--    1 brane    None       704512 Jul 25 10:13 __db.004
-rw-r--r--    1 brane    None        16384 Jul 25 10:13 __db.005
-rw-r--r--    1 brane    None      1826816 Jul 25 21:46 changes
-rw-r--r--    1 brane    None        16384 Jul 25 21:46 copies
-rw-r--r--    1 brane    None      1720320 Jul 25 21:46 nodes
-rw-r--r--    1 brane    None      2088960 Jul 25 21:46 representations
-rw-r--r--    1 brane    None        73728 Jul 25 21:46 revisions
-rw-r--r--    1 brane    None     31752192 Jul 25 21:46 strings
-rw-r--r--    1 brane    None      3211264 Jul 25 21:46 transactions
    

      The branch repo is smaller, because it deltifies files regardless
      of their size. On the trunk, files larger than the delta window
      (100k) aren't deltified.

    * I checked out four working copies, to cover all the possible
      combinations: trunk svn/trunk repo, trunk svn/branch repo, branch
      svn/trunk repo and branch svn/branch repo.

    * I then ran the following tests in all four flavours:
      svn co -r500    # Check out an early version -- lots of
                      # undeltification, few files
      svn up -r2500   # Move to a recent version -- less undeltification,
                      # many files
      svn up -r1500   # Move to an older version -- more undeltification,
                      # fewer files

    * After each set of tests, I recursively compared the working copies
      (except the .svn directories). All the working copies were
      identical, _except_ that the trunk svn/branch repo combination
      corrupted files larger than 100k. I assume there's a latent bug in
      the undeltification code on /trunk that shows up when trying to
      undeltify files larger than the window size. We didn't notice it
      because there are no such files in our repositories. The branch
      Subversion doesn't have this problem.
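
For anyone who wants to repeat the comparison step, here is a minimal
sketch of a recursive working-copy comparison that skips the .svn
directories. It is not the tool used for the tests above; the function
name tree_differences is made up for illustration, and it relies only on
Python's standard filecmp module.

```python
import filecmp
import os


def tree_differences(left, right, ignore=(".svn",)):
    """Return sorted relative paths that differ between two directory trees.

    Directories named in `ignore` are skipped at every level.
    """
    diffs = []

    def walk(cmp, prefix):
        # Names present on one side only, or present on both sides but
        # with different types (file vs. directory).
        for name in cmp.left_only + cmp.right_only + cmp.common_funny:
            diffs.append(os.path.join(prefix, name))
        # dircmp itself only compares os.stat() signatures, so re-check
        # every common file byte for byte.
        for name in cmp.common_files:
            a = os.path.join(cmp.left, name)
            b = os.path.join(cmp.right, name)
            if not filecmp.cmp(a, b, shallow=False):
                diffs.append(os.path.join(prefix, name))
        # Recurse into common subdirectories; the ignore list carries over.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right, ignore=list(ignore)), "")
    return sorted(diffs)
```

Calling tree_differences() on two of the four working copies returns an
empty list when they match; any corrupted file would show up as a
relative path in the result.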


The timing results are in the attached table. Note that the results are 
skewed, especially the "svn up" ones, because I measured ra_local. Based 
on how I observed "svn" to behave over ra_dav, I estimate that working 
copy handling represents a constant 60s and 6-7M of the results.

The tests show that operations on the branch repo are slightly more 
expensive, because there's more undeltification going on. On the plus 
side, the branch repo is 30% smaller.
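
As a quick check of the size figure, the arithmetic can be redone from
the two directory listings above. The numbers are copied from those
listings; the helper name percent_saved is only illustrative.

```python
# Sizes copied from the "ll" listings of repo-trunk/db and repo-branch/db.
trunk_total = 57921         # "total" line for the trunk repo (blocks)
branch_total = 41033        # "total" line for the branch repo (blocks)
trunk_strings = 49037312    # size of the "strings" table, trunk repo (bytes)
branch_strings = 31752192   # size of the "strings" table, branch repo (bytes)


def percent_saved(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


total_saving = percent_saved(trunk_total, branch_total)
strings_saving = percent_saved(trunk_strings, branch_strings)

print(f"whole db directory: {total_saving:.1f}% smaller")
print(f"strings table only: {strings_saving:.1f}% smaller")
```

The whole-directory figure comes out a little under 30%, and nearly all
of the difference is in the strings table, which is where the file
contents and deltas live.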

Memory usage in the branch svn was much more constant than in the trunk 
svn, but both versions displayed a slight but continuous increase in 
working set size. We probably have a memory leak somewhere.


Conclusion: The combiner is ready to be merged into the mainline. Please 
test the code in /branches/issue-531-dev on your repositories, and let 
me know the results. I'd be especially interested in results from really 
huge repositories, e.g., the Linux kernel archives I know some of you 
have created.

If everything goes well, I'll merge the combiner onto the trunk on Monday.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/

Re: Delta combiner stress test results

Posted by Branko Čibej <br...@xbc.nu>.
cmpilato@collab.net wrote:

>Karl Fogel <kf...@newton.ch.collab.net> writes:
>
>>Branko Čibej <br...@xbc.nu> writes:
>>
>>>    * After each set of tests, I recursively compared the working copies
>>>      (except the .svn directories). All the working copies were
>>>      identical, _except_ that the trunk svn/branch repo combination
>>>      corrupted files larger than 100k. I assume there's a latent bug in
>>>      the undeltification code on /trunk that shows up when trying to
>>>      undeltify files larger than the window size. We didn't notice it
>>>      because there are no such files in our repositories. The branch
>>>      Subversion doesn't have this problem.
>>>
>>Heh.  Nice discovery.
>>
>
>I'm confused.  Does large_file_integrity() in fs-test.c not do an
>adequate job of testing?
>

No, because on trunk, large files don't get deltified at all. This bug 
only showed up when using trunk svn with a repository created by the 
branch svn, which _does_ deltify large files.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Delta combiner stress test results

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
cmpilato@collab.net writes:

> In fact, large_file_integrity() does exactly this sort of thing.  Here
> are the in-line comments from the helper function:

Yeah, but the files aren't nearly large enough.  They're larger than
the svn delta window size, but they're not anywhere in the
neighborhood of hundreds of MB, nor even tens of MB :-).


Re: Delta combiner stress test results

Posted by cm...@collab.net.
Karl Fogel <kf...@newton.ch.collab.net> writes:

> Branko Čibej <br...@xbc.nu> writes:
> >     * After each set of tests, I recursively compared the working copies
> >       (except the .svn directories). All the working copies were
> >       identical, _except_ that the trunk svn/branch repo combination
> >       corrupted files larger than 100k. I assume there's a latent bug in
> >       the undeltification code on /trunk that shows up when trying to
> >       undeltify files larger than the window size. We didn't notice it
> >       because there are no such files in our repositories. The branch
> >       Subversion doesn't have this problem.
> 
> Heh.  Nice discovery.

I'm confused.  Does large_file_integrity() in fs-test.c not do an
adequate job of testing?

> > Conclusion: The combiner is ready to be merged on the mainline. Please
> > test the code in /branches/issue-531-dev on your repositories, and let
> > me know the results. I'd be especially interested in results from
> > really huge repositories, e.g., the Linux kernel archives I know some
> > of you have created.
> 
> I'd be *most* interested in results for repositories versioning huge
> files (~100MB), in two separate ways:
> 
>    - Huge file, but each revision is a small delta to it (like adding
>         the string "fish" inside a giant image file)
> 
>    - Huge file, and each revision touches a lot of its bytes

In fact, large_file_integrity() does exactly this sort of thing.  Here
are the in-line comments from the helper function:

  /* Create a big, ugly, pseudo-random-filled file and commit it.  */

  /* Now, let's make some edits to the beginning of our file, and
     commit those. */

  /* Now, let's make some edits to the end of our file. */

  /* How about some edits to both the beginning and the end of the
     file? */

  /* Alright, now we're just going to go crazy.  Let's make many more
     edits -- pseudo-random numbers and offsets of bytes changed to
     more pseudo-random values.  */
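
To make the sequence in those comments concrete, here is a rough Python
sketch of the same edit pattern. It is not the fs-test.c code; the
function names, the 300,000-byte default size, the 20-byte edit length
and the number of scattered edits are all assumptions chosen for
illustration.

```python
import random


def random_bytes(rng, n):
    """Return n pseudo-random bytes drawn from the given generator."""
    return bytes(rng.randrange(256) for _ in range(n))


def edit(data, rng, offset, length):
    """Overwrite `length` bytes at `offset` with pseudo-random bytes."""
    buf = bytearray(data)
    buf[offset:offset + length] = random_bytes(rng, length)
    return bytes(buf)


def make_revisions(size=300_000, seed=1, scattered_edits=50):
    """Build successive file contents following the commented steps."""
    rng = random.Random(seed)           # seeded, so runs are repeatable
    revs = [random_bytes(rng, size)]    # big pseudo-random file
    revs.append(edit(revs[-1], rng, 0, 20))           # edit the beginning
    revs.append(edit(revs[-1], rng, size - 20, 20))   # edit the end
    both = edit(revs[-1], rng, 0, 20)                 # edit both ends
    revs.append(edit(both, rng, size - 20, 20))
    data = revs[-1]
    for _ in range(scattered_edits):    # many edits at random offsets
        length = rng.randrange(1, 100)
        offset = rng.randrange(0, size - length)
        data = edit(data, rng, offset, length)
    revs.append(data)
    return revs
```

Each element of the returned list stands for the file contents of one
commit. Whether a repository actually deltifies those contents depends
on the file size relative to the delta window, which is a separate
question from the edit pattern itself.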

> Our showstopper scalability issue had been that we didn't deltify
> files larger than the delta window size (~100k).  Now that Brane's
> fixed that, wouldn't it be interesting to see where our limits are? :-)

Ooh...perhaps these big files in large_file_integrity() weren't
actually being deltified... ?


Re: Delta combiner stress test results

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Branko Čibej <br...@xbc.nu> writes:
>     * After each set of tests, I recursively compared the working copies
>       (except the .svn directories). All the working copies were
>       identical, _except_ that the trunk svn/branch repo combination
>       corrupted files larger than 100k. I assume there's a latent bug in
>       the undeltification code on /trunk that shows up when trying to
>       undeltify files larger than the window size. We didn't notice it
>       because there are no such files in our repositories. The branch
>       Subversion doesn't have this problem.

Heh.  Nice discovery.

> The tests show that operations on the branch repo are slightly more
> expensive, because there's more undeltification going on. On the plus
> side, the branch repo is 30% smaller.

These time & mem differences are pretty small, insignificant really.

> Conclusion: The combiner is ready to be merged on the mainline. Please
> test the code in /branches/issue-531-dev on your repositories, and let
> me know the results. I'd be especially interested in results from
> really huge repositories, e.g., the Linux kernel archives I know some
> of you have created.

I'd be *most* interested in results for repositories versioning huge
files (~100MB), in two separate ways:

   - Huge file, but each revision is a small delta to it (like adding
        the string "fish" inside a giant image file)

   - Huge file, and each revision touches a lot of its bytes

Our showstopper scalability issue had been that we didn't deltify
files larger than the delta window size (~100k).  Now that Brane's
fixed that, wouldn't it be interesting to see where our limits are? :-)

> If everything goes well, I'll merge the combiner onto the trunk on Monday.

Bravo, sir!
