Posted to dev@subversion.apache.org by Jim Blandy <ji...@savonarola.red-bean.com> on 2000/08/08 03:51:52 UTC

Text delta interface

Dear folks working on text deltas ---

I've committed an interface to subversion/include/svn_delta.h, under
the heading "Text deltas", that describes the initial interface we
would like to have to the text delta generator.  If you could
implement to that interface (and point out whatever problems you
encounter), that would be great.

The major points:

- The interface allows deltas to be computed in a stream-like fashion,
one section at a time.  This means that the delta generator can handle
arbitrarily large datasets, without requiring similarly large amounts
of memory for itself.

- The interface provides a C representation of deltas as a sequence of
substring copies from the source string, the target string generated
so far, and a new data string.  Among other things, the server will
use this form to combine a series of text deltas between successive
versions of a file into a single text delta describing the difference
between two distant versions.

- The interface provides functions for converting between the VCDIFF
delta format and the internal format.
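
To make the second point concrete, here is a minimal sketch in C of how a delta expressed as substring copies might be applied. All of the type and function names below are hypothetical and chosen for illustration only; they are not the declarations in svn_delta.h, and the real interface may differ in detail.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; NOT the declarations in svn_delta.h. */
typedef enum {
    SKETCH_FROM_SOURCE,  /* copy a substring of the source string */
    SKETCH_FROM_TARGET,  /* copy a substring of the target generated so far */
    SKETCH_FROM_NEW      /* copy a substring of the new data string */
} sketch_kind_t;

typedef struct {
    sketch_kind_t kind;
    size_t offset;       /* start of the substring within its origin */
    size_t length;       /* number of bytes to copy */
} sketch_op_t;

/* Apply OPS in order, appending each copied substring to TGT.
   Returns the resulting target length, or (size_t)-1 if an op reads
   outside its origin or the output buffer is too small. */
size_t sketch_apply(const sketch_op_t *ops, size_t n_ops,
                    const char *src, size_t src_len,
                    const char *new_data, size_t new_len,
                    char *tgt, size_t tgt_cap)
{
    size_t tgt_len = 0;

    for (size_t i = 0; i < n_ops; i++) {
        const sketch_op_t *op = &ops[i];
        const char *from;
        size_t limit;

        switch (op->kind) {
        case SKETCH_FROM_SOURCE: from = src;      limit = src_len; break;
        case SKETCH_FROM_TARGET: from = tgt;      limit = tgt_len; break;
        case SKETCH_FROM_NEW:    from = new_data; limit = new_len; break;
        default: return (size_t)-1;
        }

        /* The substring must lie entirely inside its origin.  For target
           copies this means only bytes that were already generated. */
        if (op->offset > limit || op->length > limit - op->offset)
            return (size_t)-1;
        if (op->length > tgt_cap - tgt_len)
            return (size_t)-1;

        /* Source and destination never overlap here, because a target
           copy ends at or before tgt_len, where the write begins. */
        memcpy(tgt + tgt_len, from + op->offset, op->length);
        tgt_len += op->length;
    }
    return tgt_len;
}
```

For example, with the source "abcdef" and the new data "XY", the three ops (source, 0, 3), (new, 0, 2), (target, 1, 2) would build the target "abcXYbc". Since every op names its origin explicitly, a sequence like this is also easy to inspect when combining deltas.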

We'll be adding to this interface over time, but I think everything
needed to talk to a basic delta engine is there.

Re: Text delta interface

Posted by Branko Čibej <br...@hermes.si>.
Jim Blandy wrote:
> Perhaps there is some common case that I'm misunderstanding, but
> concatenating the source and target seems to me more obscure than
> significant.  Maintainability is more important.

I couldn't agree more!

> But that's just the way the scales happened to tip for me.  If you
> prefer doing things otherwise, that's fine.

I'm aware the arguments go both ways. I just want to make sure we're
all on the same wavelength.

> I don't think it will make streaming more or less difficult.  The
> vdelta algorithm must process the entire source string before
> producing its first opcode anyway.  But the person writing the delta
> generator is in the best position to answer that question.
> 
> Would you prefer an interface closer to vdelta/vcdiff, in which there
> are only two opcodes, one of which copies from the virtual
> concatenation of the source and the target strings?  If that's easier
> to work with, then I think it's fine to change the interface.

No, let's keep the interface as it is; it's clear and simple.
I'm seeing vcdiff as an external representation, and there's no reason the
internal form should match it exactly. It's the job of the conversion functions
to do any optimisations that seem necessary, including the overlapping copy.

Anyway, unless I've completely missed something, an overlapping copy will
reduce the delta by one opcode. That's not something we need to care about
right now.
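
To illustrate the "one opcode" point, here is a rough sketch in C of what a conversion function might do with a copy addressed in the virtual concatenation of source and target. The names are hypothetical and are not taken from svn_delta.h; this only shows the splitting step, under the assumption that the internal form keeps source copies and target copies separate.

```c
#include <stddef.h>

/* Hypothetical types for illustration; NOT taken from svn_delta.h. */
typedef enum { COPY_FROM_SOURCE, COPY_FROM_TARGET } copy_origin_t;

typedef struct {
    copy_origin_t origin;
    size_t offset;       /* offset within the source or the target */
    size_t length;       /* number of bytes to copy */
} copy_op_t;

/* Convert one copy of LEN bytes starting at POS in the virtual
   concatenation (source followed by target) into internal ops.
   SRC_LEN is the length of the source string.  Writes at most two
   ops to OUT and returns how many were written.  This sketch does
   not check that the target range has already been generated. */
size_t split_concat_copy(size_t pos, size_t len, size_t src_len,
                         copy_op_t out[2])
{
    size_t n = 0;

    if (len == 0)
        return 0;

    if (pos < src_len) {
        /* The part of the copy that lies inside the source string. */
        size_t avail = src_len - pos;
        size_t in_src = len < avail ? len : avail;

        out[n].origin = COPY_FROM_SOURCE;
        out[n].offset = pos;
        out[n].length = in_src;
        n++;

        pos += in_src;
        len -= in_src;
    }

    if (len > 0) {
        /* Whatever remains lies in the target; rebase its offset. */
        out[n].origin = COPY_FROM_TARGET;
        out[n].offset = pos - src_len;
        out[n].length = len;
        n++;
    }
    return n;
}
```

A copy that stays on one side of the boundary maps to a single op; only a copy that crosses the boundary becomes two. That is the whole cost of the restriction: one extra opcode per boundary-crossing copy.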


> We're going to ditch glib presently.  We're not using it.

O.K.


    Brane

-- 
Branko Čibej                 <br...@hermes.si>
HERMES SoftLab, Litijska 51, 1000 Ljubljana, Slovenia
voice: (+386 1) 586 53 49     fax: (+386 1) 586 52 70

Re: Text delta interface

Posted by Jim Blandy <ji...@savonarola.red-bean.com>.
> Reading the comments to svn_delta_op_t, I noticed the restriction that
> a copy from the source to the target (the svn_delta_source op) can't
> cross the source/target boundary. Neither the Vdelta algorithm nor the
> vcdiff format has that restriction. Was this meant to make it easier
> to stream the source and target data?

I left that trick from the papers out because:
- leaving it out makes the window format easier to describe and (it
  seems to me) understand,
- the ability to use substrings that happen to cross from the end of
  the source string onto the beginning of the target string seems
  obscure, and unlikely to provide any significant additional
  compression, and
- it seemed to me that the virtual concatenation could make delta
  composition more complex (although I'm not sure about this).

Perhaps there is some common case that I'm misunderstanding, but
concatenating the source and target seems to me more obscure than
significant.  Maintainability is more important.

But that's just the way the scales happened to tip for me.  If you
prefer doing things otherwise, that's fine.

I don't think it will make streaming more or less difficult.  The
vdelta algorithm must process the entire source string before
producing its first opcode anyway.  But the person writing the delta
generator is in the best position to answer that question.

Would you prefer an interface closer to vdelta/vcdiff, in which there
are only two opcodes, one of which copies from the virtual
concatenation of the source and the target strings?  If that's easier
to work with, then I think it's fine to change the interface.


> > We'll be adding to this interface over time, but I think everything
> > needed to talk to a basic delta engine is there.
> 
> May I suggest a small change: s/ap_off_t/apr_off_t/g .

Yes, thank you.


> This brings me to another question I've been meaning to ask. Is it
> safe to assume we'll be using the APR library throughout the code?
> I noticed a copy of glib is in the repository, too. APR and glib
> overlap in several places. Which is preferred?

We're going to ditch glib presently.  We're not using it.

Re: Text delta interface

Posted by Branko Čibej <br...@hermes.si>.
Jim Blandy wrote:
> 
> Dear folks working on text deltas ---
> 
> I've committed an interface to subversion/include/svn_delta.h, under
> the heading "Text deltas", that describes the initial interface we
> would like to have to the text delta generator.  If you could
> implement to that interface (and point out whatever problems you
> encounter), that would be great.

I had a look at the interface, and it looks fine as a starting point.

Reading the comments to svn_delta_op_t, I noticed the restriction that
a copy from the source to the target (the svn_delta_source op) can't
cross the source/target boundary. Neither the Vdelta algorithm nor the
vcdiff format has that restriction. Was this meant to make it easier
to stream the source and target data?


> We'll be adding to this interface over time, but I think everything
> needed to talk to a basic delta engine is there.

May I suggest a small change: s/ap_off_t/apr_off_t/g .

This brings me to another question I've been meaning to ask. Is it
safe to assume we'll be using the APR library throughout the code?
I noticed a copy of glib is in the repository, too. APR and glib
overlap in several places. Which is preferred?

    Brane

-- 
Branko Čibej                 <br...@hermes.si>
HERMES SoftLab, Litijska 51, 1000 Ljubljana, Slovenia
voice: (+386 1) 586 53 49     fax: (+386 1) 586 52 70