You are viewing a plain text version of this content. The canonical link for it is here.
Posted to slide-dev@jakarta.apache.org by Torgeir Veimo <to...@vertech.no> on 2001/12/05 17:37:12 UTC

delta diff implementation

I have the first prototype ready for a java based implementation of Josh 
MacDonalds xdelta binary diff/delta algorithm. See 
http://sourceforge.net/projects/xdelta for more information on this 
algorithm.

It is available for download here; http://www.vertech.no/java-xdelta.tar.gz

This implementation uses the GDIFF file format for storing binary 
deltas. There is also a patcher that takes a source file and a patch 
file in this format and produces the original file.

When this implementation is mature it might be usefull if someone wants 
to implement a versioned store implementation for slide with reduced 
physical storage requirements.

I am seeking feedback on this. Warning; this is pre-alpha quality code.

-- 
-Torgeir


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: delta diff implementation

Posted by Torgeir Veimo <to...@vertech.no>.
Kelvin Tan wrote:

> I couldn't resist peeking...:)


> Why did you choose to implement your own implementation of the Adler32 algo
> in Checksum.java? What about java.util.zip.Adler32?


This is still the weakest part of the implementation. The reason was 
that I wanted to try to use the update method, so that only one byte 
goes into the adler checksum, instead of regenerating the entire 16 byte 
checksum for each new byte.

-- 
-Torgeir


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: delta diff implementation

Posted by Kelvin Tan <ke...@relevanz.com>.
Torgeir,

I couldn't resist peeking...:)

Why did you choose to implement your own implementation of the Adler32 algo
in Checksum.java? What about java.util.zip.Adler32?

Kelvin

----- Original Message -----
From: Torgeir Veimo <to...@vertech.no>
To: Slide Developers List <sl...@jakarta.apache.org>
Sent: Thursday, December 06, 2001 7:36 PM
Subject: Re: delta diff implementation


> Remy Maucherat wrote:
>
> >>I have the first prototype ready for a java based implementation of Josh
> >>MacDonalds xdelta binary diff/delta algorithm. See
> >>http://sourceforge.net/projects/xdelta for more information on this
> >>algorithm.
> >>
> >>It is available for download here;
> >>
> > http://www.vertech.no/java-xdelta.tar.gz
> >
> >>This implementation uses the GDIFF file format for storing binary
> >>deltas. There is also a patcher that takes a source file and a patch
> >>file in this format and produces the original file.
> >>
> >>When this implementation is mature it might be usefull if someone wants
> >>to implement a versioned store implementation for slide with reduced
> >>physical storage requirements.
> >>
> >>I am seeking feedback on this. Warning; this is pre-alpha quality code.
> >>
> >
> > Obviously, that's a very useful library. I'll try to play with it :)
>
>
> Some notes; as you'll see, the source 'file' needs to be seekable, so I
> use a RandomAccessFile in this implementation. This is a property of it
> being a copy-insert algorithm, and not a delete-insert like vanilla
> diff. It is possible to create an implementation that doesn't the source
> doesn't need to be seekable. But it *has* to be readable twice.
>
> There are implication eg. if one wants to store the last revision as the
> full one, and the previous as a delta; since they last revision might be
> available as an input stream from a http put. One way of doing it would
> be to write to the store first, and at the same time generate the hash
> fingerprints, then do the delta run on the source from the store.
> Reading the source from the store would probably imply that it is
> seekable to?
>
> When patching, the source needs to be seekable as well, due to the same
> reason as above.
>
> It is not very space optimized yet, as it uses a hashtable. I will use
> an array as the hashtable in the next release, as the algorithm really
> was designed for this.
>
> The GDIFFWriter could really be a special outputstream. The same goes
> for the patcher; it could be an inputstream that took the source input
> and the delta input as parameters.
>
> I have tried the algorithm on different input data, but it obviously
> needs a lot of testing. It would also be nice if someone could proof
> read my io code; I'm not shure I got all the signed / unsigned stuff
right.
>
> --
> -Torgeir
>
>
> --
> To unsubscribe, e-mail:
<ma...@jakarta.apache.org>
> For additional commands, e-mail:
<ma...@jakarta.apache.org>
>
>


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: delta diff implementation

Posted by Torgeir Veimo <to...@vertech.no>.
Remy Maucherat wrote:

>>I have the first prototype ready for a java based implementation of Josh
>>MacDonalds xdelta binary diff/delta algorithm. See
>>http://sourceforge.net/projects/xdelta for more information on this
>>algorithm.
>>
>>It is available for download here;
>>
> http://www.vertech.no/java-xdelta.tar.gz
> 
>>This implementation uses the GDIFF file format for storing binary
>>deltas. There is also a patcher that takes a source file and a patch
>>file in this format and produces the original file.
>>
>>When this implementation is mature it might be usefull if someone wants
>>to implement a versioned store implementation for slide with reduced
>>physical storage requirements.
>>
>>I am seeking feedback on this. Warning; this is pre-alpha quality code.
>>
> 
> Obviously, that's a very useful library. I'll try to play with it :)


Some notes; as you'll see, the source 'file' needs to be seekable, so I 
use a RandomAccessFile in this implementation. This is a property of it 
being a copy-insert algorithm, and not a delete-insert like vanilla 
diff. It is possible to create an implementation that doesn't the source 
doesn't need to be seekable. But it *has* to be readable twice.

There are implication eg. if one wants to store the last revision as the 
full one, and the previous as a delta; since they last revision might be 
available as an input stream from a http put. One way of doing it would 
be to write to the store first, and at the same time generate the hash 
fingerprints, then do the delta run on the source from the store. 
Reading the source from the store would probably imply that it is 
seekable to?

When patching, the source needs to be seekable as well, due to the same 
reason as above.

It is not very space optimized yet, as it uses a hashtable. I will use 
an array as the hashtable in the next release, as the algorithm really 
was designed for this.

The GDIFFWriter could really be a special outputstream. The same goes 
for the patcher; it could be an inputstream that took the source input 
and the delta input as parameters.

I have tried the algorithm on different input data, but it obviously 
needs a lot of testing. It would also be nice if someone could proof 
read my io code; I'm not shure I got all the signed / unsigned stuff right.

-- 
-Torgeir


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: delta diff implementation

Posted by Remy Maucherat <re...@apache.org>.
> I have the first prototype ready for a java based implementation of Josh
> MacDonalds xdelta binary diff/delta algorithm. See
> http://sourceforge.net/projects/xdelta for more information on this
> algorithm.
>
> It is available for download here;
http://www.vertech.no/java-xdelta.tar.gz
>
> This implementation uses the GDIFF file format for storing binary
> deltas. There is also a patcher that takes a source file and a patch
> file in this format and produces the original file.
>
> When this implementation is mature it might be usefull if someone wants
> to implement a versioned store implementation for slide with reduced
> physical storage requirements.
>
> I am seeking feedback on this. Warning; this is pre-alpha quality code.

Obviously, that's a very useful library. I'll try to play with it :)

Remy


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>