Posted to dev@subversion.apache.org by Jan Bares <he...@yahoo.com> on 2004/03/16 10:45:55 UTC

vdelta performance

Hi,

How can I test vdelta performance? Can I build vdelta as a standalone
application? Can I get the actual delta size from the database, or at least
the size of all the changes between revisions?
(I am on the Windows platform.)

Why am I asking this? I developed an in-house delta routine for our program
that updates applications over the internet. I found that on big files
(mostly databases), the delta programs are slow and produce big files. I
tested Xdelta 1.1.3 and MSPatch (Microsoft's patch engine: no sources, no
docs, very good for small files). I think the reason the deltas are so big
is the size of the comparison window: if the matching portions are too far
apart, they will not be found. I use a very simple approach: search for the
longest match in the *whole* file and then compress the copy/insert
operations.
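
To make the whole-file idea concrete, here is a minimal Python sketch of a greedy copy/insert encoder. It is not Jan's routine and not Subversion's vdelta; the names make_delta, apply_delta and the seed length BLOCK are hypothetical, and the final compression of the operations is left out. The only point it illustrates is that the index covers the entire source, so a match is found no matter how far apart the two copies are.

```python
from collections import defaultdict

BLOCK = 4  # hypothetical seed length; shorter matches are emitted as inserts


def make_delta(source: bytes, target: bytes, block: int = BLOCK):
    """Return a list of ('copy', offset, length) and ('insert', data) ops."""
    # Index every seed of the whole source, not just a sliding window.
    index = defaultdict(list)
    for i in range(len(source) - block + 1):
        index[source[i:i + block]].append(i)

    ops, pending, pos = [], bytearray(), 0
    while pos < len(target):
        best_off, best_len = -1, 0
        # Try every source position sharing the seed and keep the longest match.
        for off in index.get(target[pos:pos + block], ()):
            n = block
            while (off + n < len(source) and pos + n < len(target)
                   and source[off + n] == target[pos + n]):
                n += 1
            if n > best_len:
                best_off, best_len = off, n
        if best_len >= block:
            if pending:  # flush literal bytes collected so far
                ops.append(("insert", bytes(pending)))
                pending = bytearray()
            ops.append(("copy", best_off, best_len))
            pos += best_len
        else:
            pending.append(target[pos])
            pos += 1
    if pending:
        ops.append(("insert", bytes(pending)))
    return ops


def apply_delta(source: bytes, ops) -> bytes:
    """Rebuild the target from the source and the op list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += source[off:off + length]
        else:
            out += op[1]
    return bytes(out)
```

Indexing every seed costs memory proportional to the source size, which is the price paid for finding distant matches; a windowed encoder bounds memory instead.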

I did a simple test with Subversion 1.0. I inserted a 55MB Microsoft Access
database. The whole size of the repository (strings) was ~55MB. Then I
modified/deleted/added some records and committed the changes. The size of
the repository (strings) is now ~110MB. However, the patch generated by
my delta program is only 25kB (the logs were deleted).

Thanks, Jan




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: vdelta performance

Posted by Jan Bares <he...@yahoo.com>.
Thanks, John, for the answer.

> Remember that BDB is a preallocated database, i.e. the file size only grows.
> After the update, but before the deltify, the database has two full copies of
> the original file (hence 110MB).  After the delta has been calculated, most of
> that space is recovered.  If you were to do the same modify/commit several
> times, you shouldn't see the strings database grow by any significant amount.

Thanks for noting that.

> You could do a dump/load to a fresh database to see the actual usage after a
> single round-trip.

I don't think it will work, because it will face the same problem (the dump
doesn't store the deltas, only full files). Anyway, I dumped the repository
and loaded it into a fresh repository. The strings table is now 3MB bigger
than in the previous repository (113MB)...

So the question remains: how can I check the size of the delta?

Jan





Re: vdelta performance

Posted by Greg Hudson <gh...@MIT.EDU>.
On Wed, 2004-03-17 at 09:33, Ph. Marek wrote:
> I mentioned some time ago that we could store Manber [1] and MD5 hashes for
> blocks whose borders are determined by the Manber hashes.
> So we could run through the file locally (sometimes already done to verify
> that the MD5 differs), on this occasion put together the list of blocks, and
> verify this list with the server - and only transmit differing blocks, which
> *could* be broken down further by the server.

> The difference would be that instead of comparing 2 files of N bytes, we would
> compare 2 lists of e.g. N/4096 entries (which can be sorted prior to
> comparison) - which would bound the computational effort a bit further down.

I don't understand what Subversion operations you think this approach
could optimize.



Re: vdelta performance

Posted by "Ph. Marek" <ph...@bmlv.gv.at>.
> I compiled and tested the svndiff-test. The results are similar (in fact
> slightly better) to my own delta engine.
I mentioned some time ago that we could store Manber [1] and MD5 hashes for
blocks whose borders are determined by the Manber hashes.
So we could run through the file locally (sometimes already done to verify
that the MD5 differs), on this occasion put together the list of blocks, and
verify this list with the server - and only transmit differing blocks, which
*could* be broken down further by the server.

The difference would be that instead of comparing 2 files of N bytes, we would
compare 2 lists of e.g. N/4096 entries (which can be sorted prior to
comparison) - which would bound the computational effort a bit further down.
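
As a rough illustration of the block-list idea, the following Python sketch cuts a file at content-defined borders and lists the MD5 of each block. A plain polynomial rolling hash stands in for the Manber fingerprints of [1]; WINDOW, BASE, MOD and the function names are assumptions made for the example, and nothing here reflects an existing Subversion interface.

```python
import hashlib

WINDOW = 16        # bytes covered by the rolling hash (assumed value)
AVG_BLOCK = 4096   # aimed-for average block size, matching the N/4096 estimate
BASE = 257         # polynomial base (assumed)
MOD = (1 << 31) - 1


def split_blocks(data: bytes, window: int = WINDOW, avg: int = AVG_BLOCK):
    """Cut data after every position whose rolling hash is avg - 1 modulo avg."""
    blocks, start, h = [], 0, 0
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % MOD
        h = (h * BASE + byte) % MOD
        # A window of zero bytes hashes to 0, so testing for avg - 1
        # keeps long zero runs from being cut at every byte.
        if h % avg == avg - 1:
            blocks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        blocks.append(data[start:])
    return blocks


def digests(data: bytes, avg: int = AVG_BLOCK):
    """MD5 hex digest of every block, in file order."""
    return [hashlib.md5(block).hexdigest() for block in split_blocks(data, avg=avg)]


def blocks_to_send(local: bytes, server_digests, avg: int = AVG_BLOCK):
    """Blocks of the local file whose digest the server does not already have."""
    known = set(server_digests)
    return [block for block in split_blocks(local, avg=avg)
            if hashlib.md5(block).hexdigest() not in known]
```

Because a border depends only on the last WINDOW bytes, inserting data near the start of a file shifts the later blocks without changing their contents, so their digests still match.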


Regards,

Phil


[1]:	http://citeseer.nj.nec.com/manber94finding.html



Re: vdelta performance

Posted by Florian Weimer <fw...@deneb.enyo.de>.
Greg Hudson wrote:

> On Wed, 2004-03-17 at 18:10, Florian Weimer wrote:
> > I've compared it to xdelta and rdiff.  While it is significantly better
> > than rdiff, it's also worse than xdelta (both compressed and
> > uncompressed).
> > 
> > xdelta diffing is also faster, but xdelta doesn't seem to use bounded
> > amounts of memory.
> 
> Heh.  If you expand the window size to a few hundred megabytes, I'm sure
> you can get some very nice results from svndiff as well.

D'oh.

I would have expected the speed to be higher, due to improved locality,
but the window management probably eats up the advantage.  xdelta's
linear space nature is thoroughly enshrined in its algorithm, according
to the "File System Support for Delta Compression" paper.

Anyway, my timing tests suggest that spooling the new file into the DB
is the bottleneck, not the delta operation.  (Commit times are much,
much higher than svndiff-test times.)

-- 
Current mail filters: many dial-up/DSL/cable modem hosts, and the
following domains: atlas.cz, bigpond.com, freenet.de, hotmail.com,
libero.it, netscape.net, postino.it, tiscali.co.uk, tiscali.cz,
tiscali.it, voila.fr, wanadoo.fr, yahoo.com.


Re: vdelta performance

Posted by Greg Hudson <gh...@MIT.EDU>.
On Wed, 2004-03-17 at 18:10, Florian Weimer wrote:
> I've compared it to xdelta and rdiff.  While it is significantly better
> than rdiff, it's also worse than xdelta (both compressed and
> uncompressed).
> 
> xdelta diffing is also faster, but xdelta doesn't seem to use bounded
> amounts of memory.

Heh.  If you expand the window size to a few hundred megabytes, I'm sure
you can get some very nice results from svndiff as well.



Re: vdelta performance

Posted by Florian Weimer <fw...@deneb.enyo.de>.
Jan Bares wrote:

> I compiled and tested the svndiff-test. The results are similar (in fact
> slightly better) to my own delta engine.

I've compared it to xdelta and rdiff.  While it is significantly better
than rdiff, it's also worse than xdelta (both compressed and
uncompressed).

xdelta diffing is also faster, but xdelta doesn't seem to use bounded
amounts of memory.



Re: vdelta performance

Posted by Jan Bares <he...@yahoo.com>.
I compiled and tested the svndiff-test. The results are similar (in fact
slightly better) to my own delta engine.

Jan

"Jan Bares" <he...@yahoo.com> wrote in message
news:c390ak$hni$1@sea.gmane.org...
> Thank you both, I will give it a try.
>
> Jan
>
> > I was going to suggest the same thing, but then I noticed it was broken,
> > and then I got sidetracked while fixing it.
> >
> > Upshot: the version of svndiff-test.c we ship in any release so far is
> > only useful for generating nice juicy core files.  Grab the
> > svndiff-test.c from the trunk for now.





Re: vdelta performance

Posted by Jan Bares <he...@yahoo.com>.
Thank you both, I will give it a try.

Jan

> I was going to suggest the same thing, but then I noticed it was broken,
> and then I got sidetracked while fixing it.
>
> Upshot: the version of svndiff-test.c we ship in any release so far is
> only useful for generating nice juicy core files.  Grab the
> svndiff-test.c from the trunk for now.





Re: vdelta performance

Posted by Greg Hudson <gh...@MIT.EDU>.
On Tue, 2004-03-16 at 13:11, Branko Cibej wrote:
> I don't think we actually record these numbers anywhere. I suggest you
> take a look at subversion/tests/libsvn_delta/svndiff-test.c

I was going to suggest the same thing, but then I noticed it was broken,
and then I got sidetracked while fixing it.

Upshot: the version of svndiff-test.c we ship in any release so far is
only useful for generating nice juicy core files.  Grab the
svndiff-test.c from the trunk for now.



Re: vdelta performance

Posted by Branko Cibej <br...@xbc.nu>.
Quoting John Peacock <jp...@rowman.com>:

> Jan Bares wrote:
> > I did a simple test with Subversion 1.0. I inserted a 55MB Microsoft Access
> > database. The whole size of the repository (strings) was ~55MB. Then I
> > modified/deleted/added some records and committed the changes. The size of
> > the repository (strings) is now ~110MB.
> 
> Remember that BDB is a preallocated database, i.e. the file size only grows. 
> After the update, but before the deltify, the database has two full copies of 
> the original file (hence 110MB).  After the delta has been calculated, most of 
> that space is recovered.  If you were to do the same modify/commit several 
> times, you shouldn't see the strings database grow by any significant amount.
> 
> You could do a dump/load to a fresh database to see the actual usage
> after a single round-trip.

That won't help, because the dump file contains full contents, too,
and the load goes through the same contortion.

I don't think we actually record these numbers anywhere. I suggest you
take a look at subversion/tests/libsvn_delta/svndiff-test.c -- that
takes two files, computes the vdelta between them and outputs base64
encoded svndiff, which is 4/3 the size of what goes into the repository.
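
For the 4/3 factor, a small Python sketch shows how a base64 dump could be turned back into a byte count. It assumes the tool's base64 output has already been captured into a string; how svndiff-test is invoked is not covered here, and the function names are hypothetical.

```python
import base64


def raw_svndiff_size(base64_text: str) -> int:
    """Exact size in bytes of the binary data behind a base64 dump."""
    # With the default validate=False, b64decode skips newlines and other
    # characters outside the base64 alphabet before decoding.
    return len(base64.b64decode(base64_text))


def estimated_svndiff_size(base64_text: str) -> int:
    """Estimate from the text length alone: base64 emits 4 chars per 3 bytes."""
    encoded_chars = len("".join(base64_text.split()))  # drop line breaks
    return encoded_chars * 3 // 4
```

The estimate can exceed the exact size by up to two bytes because of base64 padding.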

Note also that the repository stores reverse deltas, not forward deltas,
but the size should be about the same. Oh, also, the window size we use
is 100 kiB; you can change SVN_STREAM_CHUNK_SIZE to get a different
window size.


-- Brane


Re: vdelta performance

Posted by John Peacock <jp...@rowman.com>.
Jan Bares wrote:
> I did a simple test with Subversion 1.0. I inserted a 55MB Microsoft Access
> database. The whole size of the repository (strings) was ~55MB. Then I
> modified/deleted/added some records and committed the changes. The size of
> the repository (strings) is now ~110MB.

Remember that BDB is a preallocated database, i.e. the file size only grows. 
After the update, but before the deltify, the database has two full copies of 
the original file (hence 110MB).  After the delta has been calculated, most of 
that space is recovered.  If you were to do the same modify/commit several 
times, you shouldn't see the strings database grow by any significant amount.

You could do a dump/load to a fresh database to see the actual usage after a 
single round-trip.

John

-- 
John Peacock
Director of Information Research and Technology
Rowman & Littlefield Publishing Group
4720 Boston Way
Lanham, MD 20706
301-459-3366 x.5010
fax 301-429-5747
