Posted to dev@subversion.apache.org by Niels Werensteijn <n....@student.utwente.nl> on 2008/01/17 22:49:32 UTC

[PATCH] Replace vdelta with xdelta variant

Hi,
This is my first contribution to open source, so please tell me everything 
I am doing wrong :)

I have a big binary installer committed in svn. Exporting this binary 
takes a lot of time. I traced this down to vdelta. I noticed that it 
only gets used for chunks which have no source stream. XDelta is supposed 
to be much faster than vdelta. So I thought: why not adjust the xdelta 
algorithm to make a diff against its own data? I copied most of the 
algorithm and added some minor changes.

So far I have tested this patch with 2 repositories.
1. Consists of only the sqlexpres installer (large binary, 32 MB, 
compressed).
2. Is a software project of mine, consisting mainly of Delphi source code.

The export time for repository 1 dropped from 10 secs to 7 secs. For 
repository 2 it dropped from 58 secs to 53 secs. The on-disk sizes are 
negligibly smaller with the patch.

I'd like to hear what others are experiencing, and of course eventually 
to have this patch included in trunk :)

Regards,
Niels Werensteijn

[[[
Replace vdelta with an xdelta variant that can create deltas from its 
own data.

* subversion/libsvn_delta/delta.h
   (svn_txdelta__xdelta_self): New prototype

* subversion/libsvn_delta/text_delta.c
   (compute_window): replace call to svn_txdelta__vdelta with
   a call to svn_txdelta__xdelta_self

* subversion/libsvn_delta/xdelta.c
   (init_blocks_table): Split this function in two. This part
   only initializes the blocks struct
   (init_blocks_table_with_data): This has the same functionality
   as the old init_blocks_table
   (compute_delta): Change the call to old init_blocks_table to the
   new init_blocks_table_with_data
   (compute_delta_self): The routine that calculates the delta from
   own data
   (svn_txdelta__xdelta_self): The entry point for compute_delta_self
]]]

Re: [PATCH] Replace vdelta with xdelta variant

Posted by Daniel Berlin <db...@dberlin.org>.
On Jan 19, 2008 6:07 AM, Niels Werensteijn
<n....@student.utwente.nl> wrote:
> Daniel Berlin wrote:
>
> > On Jan 18, 2008 1:54 PM, Niels Werensteijn
> > <n....@student.utwente.nl> wrote:
> >>> Is this an implementation of any particular algorithm?
> >> No. I just looked at XDelta as it was implemented. I saw how it would be
> >> faster than vdelta, and adjusted the XDelta algorithm so it could make a
> >> diff against itself.
> >
> > So, my question becomes does this do better than running zlib's
> > compress over the stream?
>
> I suspect not. Xdelta does not produce optimal diffs in terms of
> output size, so it is definitely not optimal for compression. I have
> thought about implementing this as 1 insert command that inserts the
> whole stream (chunk). This would speed things up considerably on the cpu
> side, and when sent between client and server, zlib will compress it (I
> only looked at svnserve, I am not sure DAV would use zlib). The
> downside would be that the files would also be stored this way in the
> repository. And there, as far as I know, no zlib compression is applied.

You should look again :)
We do compress the new data using zlib, which is why I asked if this
was worth it compared to that.
The only reason we kept vdelta is because there were clients that did
not understand this new "svndiff1" format (svndiff0 has uncompressed
new data) at the time.  I doubt anyone still uses a client that
doesn't support svndiff1 these days.



>
> I don't know what the policy is on repository size. I think repository
> size has been sacrificed when the switch from vdelta to xdelta was made.
> So if that is no problem, we could explore the "single insert" option.

Errr, not really.
We compress new data using zlib now. Repositories are actually about
20% smaller than they were when we used vdelta.

>
> regards,
> Niels Werensteijn
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Replace vdelta with xdelta variant

Posted by Niels Werensteijn <n....@student.utwente.nl>.
Daniel Berlin wrote:
> On Jan 18, 2008 1:54 PM, Niels Werensteijn
> <n....@student.utwente.nl> wrote:
>>> Is this an implementation of any particular algorithm?
>> No. I just looked at XDelta as it was implemented. I saw how it would be
>> faster than vdelta, and adjusted the XDelta algorithm so it could make a
>> diff against itself.
> 
> So, my question becomes does this do better than running zlib's
> compress over the stream?

I suspect not. Xdelta does not produce optimal diffs in terms of 
output size, so it is definitely not optimal for compression. I have 
thought about implementing this as 1 insert command that inserts the 
whole stream (chunk). This would speed things up considerably on the cpu 
side, and when sent between client and server, zlib will compress it (I 
only looked at svnserve, I am not sure DAV would use zlib). The 
downside would be that the files would also be stored this way in the 
repository. And there, as far as I know, no zlib compression is applied.

I don't know what the policy is on repository size. I think repository 
size was sacrificed when the switch from vdelta to xdelta was made. 
So if that is no problem, we could explore the "single insert" option.

regards,
Niels Werensteijn


Re: [PATCH] Replace vdelta with xdelta variant

Posted by Daniel Berlin <db...@dberlin.org>.
On Jan 18, 2008 1:54 PM, Niels Werensteijn
<n....@student.utwente.nl> wrote:
>
> > Is this an implementation of any particular algorithm?
>
> No. I just looked at XDelta as it was implemented. I saw how it would be
> faster than vdelta, and adjusted the XDelta algorithm so it could make a
> diff against itself.

So, my question becomes does this do better than running zlib's
compress over the stream?


Re: [PATCH] Replace vdelta with xdelta variant

Posted by Niels Werensteijn <n....@student.utwente.nl>.
Malcolm Rowe wrote:
> On Fri, Jan 18, 2008 at 07:54:59PM +0100, Niels Werensteijn wrote:
>> Malcolm Rowe wrote:
>>> On Thu, Jan 17, 2008 at 11:49:32PM +0100, Niels Werensteijn wrote:
>>> Interesting.  This sounds similar to a patch that Dan Berlin (cc'd)
>>> produced around the time of the Subversion conference, which used
>>> Rabin fingerprinting (basically a CRC) and a match table.
>> Is this in the mailinglist archive? If so, please give me a hint where to 
>> find it? :)
>>
> 
> Afraid not.  It's on my laptop :-/, and Dan should also have a copy.
> 
> Thinking about it a little more, we weren't actually planning on
> replacing vdelta -- we were going to replace xdelta, with something
> faster and/or better performing.
Ah ok. I did take a quick look at the paper you mentioned about the 
hsadelta algorithm. It might not even be too complex to implement, 
considering the small chunk sizes subversion uses for deltas (100K). 
The paper itself mentions that some of the data structures could be left 
out at the expense of some cpu efficiency, which might not be a big 
problem with small chunks.

> vdelta-against-empty-stream is essentially just compression, so I wonder
> if we can just adapt/use zlib in some way (sorry, been a while since I
> looked at this).  Though you mentioned that your changes would help if a
> large binary file changed significantly -- why's that? (because as far
> as I'm aware we need to make a decision about what compression/delta
> scheme to use _before_ we see the whole file, which means we can't say
> things like 'use xdelta, or just self-compress if that's better').

My main concern is cpu usage. My main subversion repository is on a 
relatively slow machine. When a new version of the binary is larger 
(starting at 100K boundaries) it will have chunks that have no 
source stream. For those new chunks, vdelta is currently chosen. 
With the patch it is the xdelta variant. The xdelta variant is faster 
than vdelta at the expense of size efficiency.

The patch I made is mainly concerned with reducing the cpu cost of (parts 
of) streams that are diffed against an empty source stream, which seems to 
happen in a few cases.

To give an example: I want to export a big binary (32MB compressed 
windows installer) from my slow VIA cpu powered repository to my 
workstation. If I instruct subversion to only export that file, 
subversion just sends the whole file verbatim and that takes 2 seconds. 
If I instruct it to send the directory that the file is in (containing only 
that file) it takes a whopping 46 seconds, which is 23 times as long :) The 
bulk of that time is spent first in vdelta and then in zlib compression. 
Since the binary is already compressed, neither vdelta (xdelta) nor zlib 
will find much to compress :(.
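
To illustrate the last point: zlib gains almost nothing on data that is 
already compressed, while it shrinks repetitive source text a lot. The 
following Python sketch is not part of the patch. Random bytes stand in 
for the compressed installer, the repeated Delphi-like line stands in for 
source code, and the helper name ratio is made up for this example.

```python
import os
import zlib

# Stand-in for an already-compressed installer: random bytes are,
# like compressed data, essentially incompressible.
packed = os.urandom(100 * 1024)

# Stand-in for typical source text: highly repetitive.
text = b"procedure TForm1.Button1Click(Sender: TObject);\n" * 2000


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


print(f"incompressible input: {ratio(packed):.3f}")
print(f"repetitive text:      {ratio(text):.3f}")
```

The cpu time is spent either way, which is why running a compressor over 
such a file costs a lot and saves nothing.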

My first idea was to just make 1 insert that inserts the whole chunk, 
and then let zlib try to compress it, saving over 50% cpu time. But that 
would grow the repository, because most files are compressible by 
vdelta, and the repository does not use zlib. So I opted for a faster 
diff algorithm.

Of course this is my particular situation. I agree that the best solution 
would be to use some sort of real compression scheme. But then this 
would have to be taken into account when sending chunks over a network. 
svnserve uses zlib to compress data (as said before, I don't know about 
DAV). Using zlib on a zlib stream uses lots of cpu time and produces more 
bytes. I am not yet familiar enough with the source of subversion to 
estimate how much work it would be to implement "zlib 
compression instead of diffing".

Regards,
Niels Werensteijn


Re: [PATCH] Replace vdelta with xdelta variant

Posted by Niels Werensteijn <n....@student.utwente.nl>.
Malcolm Rowe wrote:
> On Mon, Jan 21, 2008 at 02:56:23PM +0100, Niels Werensteijn wrote:
>> Malcolm Rowe wrote:
>>> If I remember correctly, that would disadvantage pre-1.4 clients and
>>> 'svnadmin dump --deltas', but would help in all other cases (in that
>>> vdelta self-compression+zlib is almost guaranteed to be slower/bigger
>>> than zlib alone).
>> It would also probably grow the repository on disk. But it would be a small 
>> grow because it would almost only affect first time checkins and files 
>> growing over a 100kb boundary.
>>
> 
> It shouldn't adversely affect the on-disk size if svndiff1 deltas are in
> use (which is the default in 1.4.x and later).  If anything, it should
> improve the size (and speed of reading/writing).
Well, I implemented this idea and you were right. My two test archives 
were a little smaller (though not significantly). I am assuming this 
also means "over the wire" transmission is reduced when compression is 
used.
The export times were also greatly reduced on my high-power cpu:

Export Repository1 (Delphi source code): from 58.6 to 42.0 seconds
Export Repository2 (big compressed bin): from 10.71 to 5.47 seconds
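
For reference, the relative savings implied by these timings can be 
worked out with a few lines of Python. This is only arithmetic on the 
four numbers above; the function name reduction_percent is made up for 
the example.

```python
# Export timings reported above, in seconds: (before, after).
timings = {
    "Repository1 (Delphi source code)": (58.6, 42.0),
    "Repository2 (big compressed bin)": (10.71, 5.47),
}


def reduction_percent(before: float, after: float) -> float:
    """Percentage of the original time that was saved."""
    return (before - after) / before * 100.0


for name, (before, after) in timings.items():
    print(f"{name}: {reduction_percent(before, after):.1f}% less time")
```

That is roughly 28% less time for the source code repository and roughly 
49% less for the compressed binary.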

Should I submit this patch? And if so, as a new thread or as a 
continuation of this one? (What should I put in the mail subject?)

Regards,
Niels Werensteijn


Re: [PATCH] Replace vdelta with xdelta variant

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Mon, Jan 21, 2008 at 02:56:23PM +0100, Niels Werensteijn wrote:
> Malcolm Rowe wrote:
>> If I remember correctly, that would disadvantage pre-1.4 clients and
>> 'svnadmin dump --deltas', but would help in all other cases (in that
>> vdelta self-compression+zlib is almost guaranteed to be slower/bigger
>> than zlib alone).
> It would also probably grow the repository on disk. But it would be a small 
> grow because it would almost only affect first time checkins and files 
> growing over a 100kb boundary.
>

It shouldn't adversely affect the on-disk size if svndiff1 deltas are in
use (which is the default in 1.4.x and later).  If anything, it should
improve the size (and speed of reading/writing).

Regards,
Malcolm

Re: [PATCH] Replace vdelta with xdelta variant

Posted by Niels Werensteijn <nw...@4-s.nl>.
Malcolm Rowe wrote:
> Sure; I meant "can we use the output from zlib to produce an
> svndiff-compatible stream".  Thinking about it some more, it might be
> more productive to just switch off vdelta entirely, and produce a
> plaintext svndiff output (i.e. ADD <bytes>, ADD <bytes>, ...) for deltas
> against the empty stream, relying on an upper-level protocol to compress
> the data section of the output (i.e. svndiff1).
Ah. I had the same idea. It would be best, from a compression point of 
view, in both time and space.

> If I remember correctly, that would disadvantage pre-1.4 clients and
> 'svnadmin dump --deltas', but would help in all other cases (in that
> vdelta self-compression+zlib is almost guaranteed to be slower/bigger
> than zlib alone).
It would also probably grow the repository on disk. But the growth 
would be small because it would almost only affect first-time checkins 
and files growing over a 100kb boundary.

Regards,
Niels


Re: [PATCH] Replace vdelta with xdelta variant

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Mon, Jan 21, 2008 at 12:01:30PM +0100, Branko Čibej wrote:
> Please note that we currently expect things to be either plaintext, or 
> svndiff; and that we can always combine svndiffs. Introducing a 
> zlib-compressed thing would affect the delta combiner and a lot of other 
> things.
>

Sure; I meant "can we use the output from zlib to produce an
svndiff-compatible stream".  Thinking about it some more, it might be
more productive to just switch off vdelta entirely, and produce a
plaintext svndiff output (i.e. ADD <bytes>, ADD <bytes>, ...) for deltas
against the empty stream, relying on an upper-level protocol to compress
the data section of the output (i.e. svndiff1).

If I remember correctly, that would disadvantage pre-1.4 clients and
'svnadmin dump --deltas', but would help in all other cases (in that
vdelta self-compression+zlib is almost guaranteed to be slower/bigger
than zlib alone).

Or perhaps communicate the existence of the secondary compressor to the
delta engine, and make it switch between vdelta/plaintext based on that.
I've looked at doing that previously, though, and it's not very clean.
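
The insert-only idea above (one ADD per window, with compression left to 
the svndiff1 layer) can be sketched as follows. This is a simplified 
Python model, not the actual svndiff encoding and not Subversion code: 
the function names, the (flag, payload) representation and the window 
size constant (taken from the "100K" mentioned in this thread) are 
assumptions made for the example.

```python
import zlib
from typing import List, Tuple

WINDOW_SIZE = 100 * 1024  # assumed: the "100K" windows mentioned in this thread


def insert_only_windows(data: bytes) -> List[Tuple[int, bytes]]:
    """Split data into windows that each hold a single 'insert all' op.

    Returns (insert_length, new_data) pairs; no matching is attempted.
    """
    windows = []
    for start in range(0, len(data), WINDOW_SIZE):
        chunk = data[start:start + WINDOW_SIZE]
        windows.append((len(chunk), chunk))
    return windows


def pack_new_data(new_data: bytes) -> Tuple[bool, bytes]:
    """Compress a window's data section; keep it raw if zlib does not help."""
    packed = zlib.compress(new_data)
    if len(packed) < len(new_data):
        return True, packed
    return False, new_data


def unpack_new_data(is_compressed: bool, payload: bytes) -> bytes:
    """Inverse of pack_new_data."""
    return zlib.decompress(payload) if is_compressed else payload


data = b"unit Main;\n" * 30000
windows = insert_only_windows(data)
stored = [pack_new_data(chunk) for _, chunk in windows]
restored = b"".join(unpack_new_data(flag, payload) for flag, payload in stored)
print(len(windows), sum(len(payload) for _, payload in stored), restored == data)
```

The delta engine does no matching work at all in this model; whether 
the data shrinks is decided purely by the compression layer, which also 
falls back to raw storage for incompressible input.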

Regards,
Malcolm

Re: [PATCH] Replace vdelta with xdelta variant

Posted by Branko Čibej <br...@xbc.nu>.
Malcolm Rowe wrote:
> On Fri, Jan 18, 2008 at 07:54:59PM +0100, Niels Werensteijn wrote:
>   
>> Malcolm Rowe wrote:
>>     
>>> On Thu, Jan 17, 2008 at 11:49:32PM +0100, Niels Werensteijn wrote:
>>> Interesting.  This sounds similar to a patch that Dan Berlin (cc'd)
>>> produced around the time of the Subversion conference, which used
>>> Rabin fingerprinting (basically a CRC) and a match table.
>>>       
>> Is this in the mailinglist archive? If so, please give me a hint where to 
>> find it? :)
>>
>>     
>
> Afraid not.  It's on my laptop :-/, and Dan should also have a copy.
>
> Thinking about it a little more, we weren't actually planning on
> replacing vdelta -- we were going to replace xdelta, with something
> faster and/or better performing.
>
> vdelta-against-empty-stream is essentially just compression, so I wonder
> if we can just adapt/use zlib in some way (sorry, been a while since I
> looked at this).  Though you mentioned that your changes would help if a
> large binary file changed significantly -- why's that? (because as far
> as I'm aware we need to make a decision about what compression/delta
> scheme to use _before_ we see the whole file, which means we can't say
> things like 'use xdelta, or just self-compress if that's better').
>   

Please note that we currently expect things to be either plaintext, or 
svndiff; and that we can always combine svndiffs. Introducing a 
zlib-compressed thing would affect the delta combiner and a lot of other 
things.

Just noting that this may be slightly more work than you expect.

-- Brane


Re: [PATCH] Replace vdelta with xdelta variant

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Fri, Jan 18, 2008 at 07:54:59PM +0100, Niels Werensteijn wrote:
> Malcolm Rowe wrote:
>> On Thu, Jan 17, 2008 at 11:49:32PM +0100, Niels Werensteijn wrote:
>> Interesting.  This sounds similar to a patch that Dan Berlin (cc'd)
>> produced around the time of the Subversion conference, which used
>> Rabin fingerprinting (basically a CRC) and a match table.
>
> Is this in the mailinglist archive? If so, please give me a hint where to 
> find it? :)
>

Afraid not.  It's on my laptop :-/, and Dan should also have a copy.

Thinking about it a little more, we weren't actually planning on
replacing vdelta -- we were going to replace xdelta, with something
faster and/or better performing.

vdelta-against-empty-stream is essentially just compression, so I wonder
if we can just adapt/use zlib in some way (sorry, been a while since I
looked at this).  Though you mentioned that your changes would help if a
large binary file changed significantly -- why's that? (because as far
as I'm aware we need to make a decision about what compression/delta
scheme to use _before_ we see the whole file, which means we can't say
things like 'use xdelta, or just self-compress if that's better').

Regards,
Malcolm

Re: [PATCH] Replace vdelta with xdelta variant

Posted by Niels Werensteijn <n....@student.utwente.nl>.
Malcolm Rowe wrote:
> On Thu, Jan 17, 2008 at 11:49:32PM +0100, Niels Werensteijn wrote:
>> +   This algorithm works the same as xdelta, with the exception
>> +   that it gets blocks from its own data.
>> +
>> +   1. Checksum the first MATCH_BLOCKSIZE block of bytes using adler32, and
>> +      insert the checksum into a match table with the position of the match.
>> +   2. Go through the target byte by byte, starting at position MATCH_BLOCKSIZE.
>> +      See if that byte starts a match that we have in the match table.
>> +      2a. If so, try to extend the match as far as possible both
>> +          forwards and backwards, and then insert a source copy
>> +          operation into the delta ops builder for the match.
>> +          Care must be taken here to find match data only from "previous" data,
>> +          that we know is already in the target window.
>> +      2b. If not, insert the byte as new data using an insert delta op.
>> +
>> +   Our implementation doesn't immediately insert "insert" operations,
>> +   it waits until we have another copy, or we are done.  The reasoning
>> +   is twofold:
>> +
>> +   1. Otherwise, we would just be building a ton of 1 byte insert
>> +      operations
>> +   2. So that we can extend a source match backwards into a pending
>> +     insert operation, and possibly remove the need for the insert
>> +     entirely.  This can happen due to stream alignment.
> 
> Interesting.  This sounds similar to a patch that Dan Berlin (cc'd)
> produced around the time of the Subversion conference, which used
> Rabin fingerprinting (basically a CRC) and a match table.

Is this in the mailing list archive? If so, could you give me a hint 
where to find it? :)

> 
> Is this an implementation of any particular algorithm?

No. I just looked at XDelta as it was implemented. I saw how it would be 
faster than vdelta, and adjusted the XDelta algorithm so it could make a 
diff against itself.
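
The block-matching scheme from the quoted patch comment can be sketched 
in a few lines of Python. This is an illustration under stated 
assumptions, not the code from the patch: MATCH_BLOCKSIZE is set to 64 
here, the checksum is recomputed for every position with zlib.adler32 
instead of being rolled, matches are only extended forwards, and the 
names self_delta and apply_delta are made up.

```python
import zlib
from typing import Dict, List

MATCH_BLOCKSIZE = 64  # assumed block size for this sketch


def self_delta(data: bytes) -> List[tuple]:
    """Encode data as inserts plus copies from earlier parts of itself."""
    ops: List[tuple] = []
    table: Dict[int, int] = {}  # adler32 of a block -> start offset of the block
    pending = bytearray()       # new data not yet flushed as an insert op
    next_block = 0              # start of the next block to add to the table
    pos = 0
    n = len(data)

    def flush() -> None:
        # Emit the accumulated bytes as one insert instead of many 1-byte ops.
        if pending:
            ops.append(("insert", bytes(pending)))
            pending.clear()

    while pos < n:
        # Register every complete block that lies entirely before pos,
        # so matches can only refer to data that is already "previous".
        while next_block + MATCH_BLOCKSIZE <= pos:
            block = data[next_block:next_block + MATCH_BLOCKSIZE]
            table.setdefault(zlib.adler32(block), next_block)
            next_block += MATCH_BLOCKSIZE

        window = data[pos:pos + MATCH_BLOCKSIZE]
        src = table.get(zlib.adler32(window)) if len(window) == MATCH_BLOCKSIZE else None
        if src is not None and data[src:src + MATCH_BLOCKSIZE] == window:
            # Extend the match forwards, never reading at or beyond pos.
            length = MATCH_BLOCKSIZE
            while (pos + length < n and src + length < pos
                   and data[src + length] == data[pos + length]):
                length += 1
            flush()
            ops.append(("copy", src, length))
            pos += length
        else:
            pending.append(data[pos])
            pos += 1
    flush()
    return ops


def apply_delta(ops: List[tuple]) -> bytes:
    """Rebuild the data from the ops produced by self_delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "insert":
            out += op[1]
        else:
            _, offset, length = op
            out += out[offset:offset + length]
    return bytes(out)
```

A quick check is that apply_delta(self_delta(data)) returns data again 
for any input. Repetitive input yields copy ops, while input without 
repeated 64-byte blocks degenerates into a single insert.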

> There are two papers I was looking at around that time in this area:
> 
> M. Ajtai, R. Burns, R. Fagin, D. D. E. Long, and L. Stockmeyer.
> Compactly encoding unstructured input with differential compression.
> www.almaden.ibm.com/cs/people/stock/diff7.ps, IBM Research Report RJ
> 10187, April 2000.
> http://citeseer.ist.psu.edu/article/ajtai00compactly.html
> 
> (which your algorithm sounds very similar to, in particular the 1.5-pass
> correcting version), and also
> 
> An approximation to the greedy algorithm for differential compression
> by R. C. Agarwal, K. Gupta, S. Jain, S. Amalapurapu
> IBM Journal of Research and Development
> Volume 50, Number 1, Page 149 (2006)
> http://www.research.ibm.com/journal/rd/501/agarwal.pdf
> 
> (which presents a possibly-better-performing, but much more complex
> algorithm).
> 
> I never got the time to evaluate Dan's patch, or do any real work in
> this area, but it's something we should consider doing.

I'll take a look at them. Always interesting!

> I think Dan produced some performance (time, space) results for the gcc
> repository.  It'd be worth comparing them to what you've got.

Well, one more thing I am thinking about: if we are diffing against an 
empty source stream (which is what we are doing here), it is basically 
compression. Perhaps there are better algorithms for this problem.

Regards,
Niels Werensteijn


Re: [PATCH] Replace vdelta with xdelta variant

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Thu, Jan 17, 2008 at 11:49:32PM +0100, Niels Werensteijn wrote:
> +   This algorithm works the same as xdelta, with the exception
> +   that it gets blocks from its own data.
> +
> +   1. Checksum the first MATCH_BLOCKSIZE block of bytes using adler32, and
> +      insert the checksum into a match table with the position of the match.
> +   2. Go through the target byte by byte, starting at position MATCH_BLOCKSIZE.
> +      See if that byte starts a match that we have in the match table.
> +      2a. If so, try to extend the match as far as possible both
> +          forwards and backwards, and then insert a source copy
> +          operation into the delta ops builder for the match.
> +          Care must be taken here to find match data only from "previous" data,
> +          that we know is already in the target window.
> +      2b. If not, insert the byte as new data using an insert delta op.
> +
> +   Our implementation doesn't immediately insert "insert" operations,
> +   it waits until we have another copy, or we are done.  The reasoning
> +   is twofold:
> +
> +   1. Otherwise, we would just be building a ton of 1 byte insert
> +      operations
> +   2. So that we can extend a source match backwards into a pending
> +     insert operation, and possibly remove the need for the insert
> +     entirely.  This can happen due to stream alignment.

Interesting.  This sounds similar to a patch that Dan Berlin (cc'd)
produced around the time of the Subversion conference, which used
Rabin fingerprinting (basically a CRC) and a match table.

Is this an implementation of any particular algorithm?
There are two papers I was looking at around that time in this area:

M. Ajtai, R. Burns, R. Fagin, D. D. E. Long, and L. Stockmeyer.
Compactly encoding unstructured input with differential compression.
www.almaden.ibm.com/cs/people/stock/diff7.ps, IBM Research Report RJ
10187, April 2000.
http://citeseer.ist.psu.edu/article/ajtai00compactly.html

(which your algorithm sounds very similar to, in particular the 1.5-pass
correcting version), and also

An approximation to the greedy algorithm for differential compression
by R. C. Agarwal, K. Gupta, S. Jain, S. Amalapurapu
IBM Journal of Research and Development
Volume 50, Number 1, Page 149 (2006)
http://www.research.ibm.com/journal/rd/501/agarwal.pdf

(which presents a possibly-better-performing, but much more complex
algorithm).

I never got the time to evaluate Dan's patch, or do any real work in
this area, but it's something we should consider doing.

I think Dan produced some performance (time, space) results for the gcc
repository.  It'd be worth comparing them to what you've got.

Regards,
Malcolm

Re: [PATCH] Replace vdelta with xdelta variant

Posted by Niels Werensteijn <n....@student.utwente.nl>.
Mark Phippard wrote:
> 2008/1/17 Niels Werensteijn <n....@student.utwente.nl>:
> 
>> This is my first contribution to open source, so please tell me everything
>> I am doing wrong :)
>>
>> I have a big binary installer committed in svn. Exporting this binary
>> costs a lot of time. I traced this down to vdelta. I noticed that it
>> only gets used for chunks which have no source stream. XDelta is supposed
>> to be much faster than vdelta. So I thought: why not adjust the xdelta
>> algorithm to make a diff from itself. I copy/pasted most of the
>> algorithm, adding some minor stuff.
> 
> I am confused by this.  I assumed someone who understands this more
> would respond.  Since no one has, I'll try.  Have you seen these two
> things from past releases?
> 
> http://subversion.tigris.org/svn_1.2_releasenotes.html#xdelta
> 
> http://subversion.tigris.org/svn_1.4_releasenotes.html#svndiff1
> 
> Basically, SVN has already switched from vdelta to xdelta.  Now, I
> believe you would have to be on 1.4 and have created your repository
> and dumped/loaded to get all of the benefits.  So I did not know if
> your patch was making an incremental improvement for people that have
> not been able to do this, or if you just missed that we switched to
> xdelta altogether.

Yes, subversion uses xdelta except in one case: when the source chunk of 
the two streams being sent has length 0. This is the case when the whole 
source stream has length 0, or in some cases when the target stream is 
(much) larger than the source stream. In this case subversion still uses 
vdelta, because xdelta as implemented cannot make a diff without a 
source stream. And this is where my solution comes in.
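
As a rough illustration of when a chunk ends up without source data, 
here is a small Python model. It is a simplification and not Subversion 
code: it assumes that source and target are cut into windows of the same 
size (the "100K" mentioned in this thread) and that target window i is 
diffed against source window i. The function name is made up.

```python
WINDOW_SIZE = 100 * 1024  # assumed: the "100K" window size from this thread


def windows_without_source(source_len: int, target_len: int) -> list:
    """Return indexes of target windows whose source window is empty.

    Simplified model: target window i is diffed against source bytes
    [i * WINDOW_SIZE, (i + 1) * WINDOW_SIZE).
    """
    n_windows = -(-target_len // WINDOW_SIZE)  # ceiling division
    return [i for i in range(n_windows) if i * WINDOW_SIZE >= source_len]


# Entirely new file: every window lacks a source.
print(windows_without_source(0, 250 * 1024))
# File that grew from 150K to 350K: only the tail windows lack a source.
print(windows_without_source(150 * 1024, 350 * 1024))
```

In this model a brand-new 250K file has three source-less windows, and a 
file growing from 150K to 350K has two, which are the windows that would 
go through vdelta (or the xdelta variant from the patch).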

A source stream has length 0 if the file is entirely new. Also, 
svn_txdelta_send_stream in text_delta.c sends streams as a diff against 
an empty stream. I am not sure if there are more situations where 
streams are sent as a diff against an empty source. (Perhaps export? I 
don't know.)

I assume that the repository shrink comes from:
1. The first time a file is checked in. This is now compressed with my 
version of xdelta, instead of vdelta.
2. When a new version of a file is checked in which is much larger than 
the old version.


Regards,
Niels Werensteijn


Re: [PATCH] Replace vdelta with xdelta variant

Posted by Mark Phippard <ma...@gmail.com>.
2008/1/17 Niels Werensteijn <n....@student.utwente.nl>:

> This is my first contribution to open source, so please tell me everything
> I am doing wrong :)
>
> I have a big binary installer committed in svn. Exporting this binary
> costs a lot of time. I traced this down to vdelta. I noticed that it
> only gets used for chunks which have no source stream. XDelta is supposed
> to be much faster than vdelta. So I thought: why not adjust the xdelta
> algorithm to make a diff from itself. I copy/pasted most of the
> algorithm, adding some minor stuff.

I am confused by this.  I assumed someone who understands this more
would respond.  Since no one has, I'll try.  Have you seen these two
things from past releases?

http://subversion.tigris.org/svn_1.2_releasenotes.html#xdelta

http://subversion.tigris.org/svn_1.4_releasenotes.html#svndiff1

Basically, SVN has already switched from vdelta to xdelta.  Now, I
believe you would have to be on 1.4 and have created your repository
and dumped/loaded to get all of the benefits.  So I did not know if
your patch was making an incremental improvement for people that have
not been able to do this, or if you just missed that we switched to
xdelta altogether.

Thanks

Mark Phippard
http://markphip.blogspot.com/
