Posted to dev@subversion.apache.org by kf...@collab.net on 2001/09/24 20:14:19 UTC

deltification semi-rewrite starting now

Mike Pilato and I have just reviewed Branko's deltification proposal,
found at

   notes/delta-indexing-and-composition.txt

and like what we see :-).  We have a couple of questions that probably
Branko can answer quickly, but basically we're going to start
implementing it now, completion anticipated in 2 weeks max (thank
goodness all the strings/reps separation is already done, so that
whole wheel doesn't need to be reinvented).

The plan is that we'll also implement a new `svnadmin' subcommand for
deltifying and undeltifying revisions, or particular paths within
revisions.  That way, administrators have a way to make certain trees
very efficient to retrieve -- for example, one might want to do this
to a tagged release -- and also gives us an obvious way to deltify the
storage of the current svn repository without perturbing the revision
numbers. :-)

Branko, a couple of questions regarding your lovely design:

> So, here's my proposal
> 
> 1) Change the delta representation to index and store delta windows 
> separately
> 
>         DELTA ::= (("delta" FLAG ...) (OFFSET WINDOW) ...) ;
>        WINDOW ::= DIFF SIZE CHECKSUM [REP-KEY REP-OFFSET] ;
>        OFFSET ::= number ;
>    REP-OFFSET ::= number;
> 
> 
> The REP-KEY and REP-OFFSET in WINDOW are optional because, if the 
> differences between two file revisions is large enough, the diff could 
> in fact be larger than a compression-only vdelta of the text region. In 
> that case it makes more sense to compress the window than to store a diff.

We're not sure what REP-OFFSET is for.  

We're pretty sure we understand OFFSET.  It's the offset into the
reconstructed fulltext.  The OFFSETs increase with each WINDOW in a
DELTA, and you can tell a given window's reconstruction range either
by adding OFFSET + SIZE, or by subtracting one OFFSET from the next.

Hopefully that's a correct summary. :-)
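
To make that summary concrete, here is a minimal Python sketch. It is an
illustration only, not part of the proposal. It assumes a DELTA's
(OFFSET WINDOW) entries have been reduced to hypothetical (offset, size)
pairs, and that windows are contiguous, so both ways of computing a
window's reconstruction range should agree. The names WINDOWS,
range_from_size and range_from_next_offset are made up for this example.

```python
# Hypothetical in-memory view of a DELTA's (OFFSET WINDOW) entries:
# one (offset, size) pair per window, with offsets strictly increasing.
WINDOWS = [(0, 100), (100, 100), (200, 50)]


def range_from_size(windows, i):
    """Range [start, end) of fulltext rebuilt by window i, via OFFSET + SIZE."""
    offset, size = windows[i]
    return offset, offset + size


def range_from_next_offset(windows, i, fulltext_len):
    """Same range, via the next window's OFFSET.

    The last window has no successor, so the fulltext length is used
    as its end instead.
    """
    start = windows[i][0]
    if i + 1 < len(windows):
        end = windows[i + 1][0]
    else:
        end = fulltext_len
    return start, end


if __name__ == "__main__":
    for i in range(len(WINDOWS)):
        print(i, range_from_size(WINDOWS, i),
              range_from_next_offset(WINDOWS, i, 250))
```

The second method needs the fulltext length for the final window, which is
one reason storing SIZE alongside OFFSET is convenient.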

But what is REP-OFFSET?  We understand the REP-KEY that precedes it.
That's simply the representation against whose fulltext this delta
applies, right?  But why would we want an offset into that rep?  We
had thought the relevant offset(s) are part of the svndiff encoding.
Is it a way of magically jumping over a certain number of windows and
landing on the right one, in next-most-immediate source
representation, or is it something else?

We're still thinking about this, but maybe you can put us out of our
misery quickly. :-)

Also, did you mean

   WINDOW ::= (DIFF SIZE CHECKSUM [REP-KEY REP-OFFSET]) ;

i.e., with parens, rather than without?  Yes, it would work without
being a sublist, but for maintainability a sublist might be
preferable...
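
A rough Python sketch of the difference, modelling skels as nested lists.
This is only an illustration under assumed, made-up field values; the
functions parse_flat and parse_nested are hypothetical and are not
Subversion code.

```python
# Skels modelled as nested Python lists; all values are made up.
# Flat form: WINDOW's fields sit directly next to OFFSET.
flat_entry = [0, "diff-key", 100, "checksum", "rep-key", 0]
# Sublist form: WINDOW is its own list, so an entry is always a pair.
nested_entry = [0, ["diff-key", 100, "checksum", "rep-key", 0]]


def _split_window(fields):
    """Split WINDOW fields; REP-KEY and REP-OFFSET are optional as a pair."""
    diff, size, checksum, *rest = fields
    if len(rest) not in (0, 2):
        raise ValueError("REP-KEY and REP-OFFSET must appear together")
    rep = tuple(rest) if rest else None
    return diff, size, checksum, rep


def parse_flat(entry):
    """Parse (OFFSET DIFF SIZE CHECKSUM [REP-KEY REP-OFFSET])."""
    offset, *window = entry
    return (offset, *_split_window(window))


def parse_nested(entry):
    """Parse (OFFSET (DIFF SIZE CHECKSUM [REP-KEY REP-OFFSET]))."""
    offset, window = entry
    return (offset, *_split_window(window))


if __name__ == "__main__":
    print(parse_flat(flat_entry))
    print(parse_nested(nested_entry))
```

Both forms carry the same information. With the sublist, every entry has
exactly two elements no matter whether the optional pair is present.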

Anyway, we can start coding right away, while awaiting clarification.
Found no holes in the proposal; agree that there is a slight storage
penalty, but the memory usage and speed gains are so overwhelming that
it would be petty to complain about the *very* gently-sloped, albeit
linear, increase in storage per deltified file.

The replacing of distant diffs with ones nearer the fulltext is a
great idea; we'll probably wait on that until after the basic rewrite
is done, however, as it is an optimization, though a very effective
one.

-Karl and Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: deltification semi-rewrite starting now

Posted by kf...@collab.net.
Greg Stein <gs...@lyra.org> writes:
> Maybe we should release an M4 before enabling deltification in the server?
> What issues are left? (just properties?)

That's the plan, yeah, with M5 being only about deltification and the
commit-path rewrite.  M4 bugs are:

   496 -- possible wc-side commit bug
   382 -- DAV property handling rewrite
   487 -- versioned resource url out-of-date problems?

Any word on the latter two?

(The only unprioritized issue is 495, but it doesn't look like it
needs to be an M4 issue.)

-Karl


Re: deltification semi-rewrite starting now

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Sep 24, 2001 at 03:14:19PM -0500, kfogel@collab.net wrote:
>...
> and like what we see :-).  We have a couple of questions that probably
> Branko can answer quickly, but basically we're going to start
> implementing it now, completion anticipated in 2 weeks max (thank
> goodness all the strings/reps separation is already done, so that
> whole wheel doesn't need to be reinvented).

Maybe we should release an M4 before enabling deltification in the server?
What issues are left? (just properties?)

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: deltification semi-rewrite starting now

Posted by kf...@collab.net.
Branko Čibej <br...@xbc.nu> writes:
> I remember, I remember!
> 
> The offsets embedded in the svndiff are stored in a string; these 
> offsets would be in the representation. The point is that you get all 
> the information you need to select the appropriate windows from the rep 
> skel -- without touching a single string. This means a bit more space 
> used in the repository, but lots less memory used on the server.

Okay, thanks.  We'll see how things work out in implementation --
might or might not end up using this, can't tell yet.

-K


Re: deltification semi-rewrite starting now

Posted by Branko Čibej <br...@xbc.nu>.
Branko Čibej wrote:

> kfogel@collab.net wrote:
>
>> But what is REP-OFFSET?  We understand the REP-KEY that precedes it.
>> That's simply the representation against whose fulltext this delta
>> applies, right?
>>
> Let me think ... Yes.
>
>>  But why would we want an offset into that rep?  We
>> had thought the relevant offset(s) are part of the svndiff encoding.
>> Is it a way of magically jumping over a certain number of windows and
>> landing on the right one, in next-most-immediate source
>> representation, or is it something else?
>>
> Although the offset is implicit in the svndiff, in real life you want 
> to find the source (fulltext) *before* decoding the window. Also, as I 
> noted, you might want to just use a (self-referencing) vdelta compress 
> instead of a diff, if the result of the compression is smaller than 
> the diff.
>
> Hmm. It's been a long time since I wrote that, and as usual I left 
> some of the reasoning out. I'll have to think about this again. I sort 
> of remember it had to do with true random access to the text. 


I remember, I remember!

The offsets embedded in the svndiff are stored in a string; these 
offsets would be in the representation. The point is that you get all 
the information you need to select the appropriate windows from the rep 
skel -- without touching a single string. This means a bit more space 
used in the repository, but lots less memory used on the server.
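
A minimal Python sketch of that idea, under stated assumptions: the index
below stands in for what might be read from a rep skel, with one
(offset, size, string_key) entry per window. The layout, the key names
and the function select_windows are hypothetical. The point it tries to
show is that choosing which windows cover a requested range of fulltext
needs only the numbers in the index, never the diff strings themselves.

```python
from bisect import bisect_right

# Hypothetical index as it might be read from a rep skel: one entry per
# window, holding only numbers and keys, never the diff data itself.
# Fields: (offset, size, string_key). The key names are made up.
INDEX = [
    (0, 100, "str-1"),
    (100, 100, "str-2"),
    (200, 50, "str-3"),
]


def select_windows(index, start, length):
    """Return the string keys of the windows that overlap the fulltext
    range [start, start + length), using only the index."""
    if length <= 0:
        return []
    end = start + length
    offsets = [entry[0] for entry in index]
    # Begin at the last window whose OFFSET is <= start.
    i = max(bisect_right(offsets, start) - 1, 0)
    selected = []
    while i < len(index) and index[i][0] < end:
        offset, size, key = index[i]
        if offset + size > start:
            selected.append(key)
        i += 1
    return selected


if __name__ == "__main__":
    # Bytes 150..209 span the second and third windows.
    print(select_windows(INDEX, 150, 60))
```

Only the strings named by the returned keys would then need to be read
and decoded.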

-- 
Brane Čibej   <br...@xbc.nu>            http://www.xbc.nu/brane/





Re: deltification semi-rewrite starting now

Posted by Branko Čibej <br...@xbc.nu>.
kfogel@collab.net wrote:

>Mike Pilato and I have just reviewed Branko's deltification proposal,
>found at
>
>   notes/delta-indexing-and-composition.txt
>
>and like what we see :-).  We have a couple of questions that probably
>Branko can answer quickly, but basically we're going to start
>implementing it now, completion anticipated in 2 weeks max (thank
>goodness all the strings/reps separation is already done, so that
>whole wheel doesn't need to be reinvented).
>
>The plan is that we'll also implement a new `svnadmin' subcommand for
>deltifying and undeltifying revisions, or particular paths within
>revisions.  That way, administrators have a way to make certain trees
>very efficient to retrieve -- for example, one might want to do this
>to a tagged release -- and also gives us an obvious way to deltify the
>storage of the current svn repository without perturbing the revision
>numbers. :-)
>
>Branko, a couple of questions regarding your lovely design:
>
>>So, here's my proposal
>>
>>1) Change the delta representation to index and store delta windows 
>>separately
>>
>>        DELTA ::= (("delta" FLAG ...) (OFFSET WINDOW) ...) ;
>>       WINDOW ::= DIFF SIZE CHECKSUM [REP-KEY REP-OFFSET] ;
>>       OFFSET ::= number ;
>>   REP-OFFSET ::= number;
>>
>>
>>The REP-KEY and REP-OFFSET in WINDOW are optional because, if the 
>>differences between two file revisions is large enough, the diff could 
>>in fact be larger than a compression-only vdelta of the text region. In 
>>that case it makes more sense to compress the window than to store a diff.
>>
>
>We're not sure what REP-OFFSET is for.  
>
>We're pretty sure we understand OFFSET.  It's the offset into the
>reconstructed fulltext.  The OFFSETs increase with each WINDOW in a
>DELTA, and you can tell a given window's reconstruction range either
>by adding OFFSET + SIZE, or by subtracting one OFFSET from the next.
>
>Hopefully that's a correct summary. :-)
>
Yes, that is exactly right.


>But what is REP-OFFSET?  We understand the REP-KEY that precedes it.
>That's simply the representation against whose fulltext this delta
>applies, right?
>
Let me think ... Yes.

>  But why would we want an offset into that rep?  We
>had thought the relevant offset(s) are part of the svndiff encoding.
>Is it a way of magically jumping over a certain number of windows and
>landing on the right one, in next-most-immediate source
>representation, or is it something else?
>
Although the offset is implicit in the svndiff, in real life you want to 
find the source (fulltext) *before* decoding the window. Also, as I 
noted, you might want to just use a (self-referencing) vdelta compress 
instead of a diff, if the result of the compression is smaller than the 
diff.

Hmm. It's been a long time since I wrote that, and as usual I left some 
of the reasoning out. I'll have to think about this again. I sort of 
remember it had to do with true random access to the text.


>We're still thinking about this, but maybe you can put us out of our
>misery quickly. :-)
>
Thanks, you just got me worrying about it. :-)

>Also, did you mean
>
>   WINDOW ::= (DIFF SIZE CHECKSUM [REP-KEY REP-OFFSET]) ;
>
>i.e., with parens, rather than without?  Yes, it would work without
>being a sublist, but for maintainability a sublist might be
>preferable...
>
I meant without parens, but obviously it doesn't hurt to make a sublist 
out of it. Use whatever you find more aesthetically pleasing. :-)

>Anyway, we can start coding right away, while awaiting clarification.
>Found no holes in the proposal; agree that there is a slight storage
>penalty, but the memory usage and speed gains are so overwhelming that
>it would be petty to complain about the *very* gently-sloped, albeit
>linear, increase in storage per deltified file.
>
Wonderful. Now I /really/ have to dust off and finish the delta combiner.


>The replacing of distant diffs with ones nearer the fulltext is a
>great idea; we'll probably wait on that until after the basic rewrite
>is done, however, as it is an optimization, though a very effective
>one.
>
Yes, it's an optimization only. What's more, it can be done entirely 
off-line.


-- 
Brane Čibej   <br...@xbc.nu>            http://www.xbc.nu/brane/



