Posted to dev@subversion.apache.org by kf...@collab.net on 2001/09/24 20:14:19 UTC
deltification semi-rewrite starting now
Mike Pilato and I have just reviewed Branko's deltification proposal,
found at
notes/delta-indexing-and-composition.txt
and like what we see :-). We have a couple of questions that probably
Branko can answer quickly, but basically we're going to start
implementing it now, completion anticipated in 2 weeks max (thank
goodness all the strings/reps separation is already done, so that
whole wheel doesn't need to be reinvented).
The plan is that we'll also implement a new `svnadmin' subcommand for
deltifying and undeltifying revisions, or particular paths within
revisions. That way, administrators have a way to make certain trees
very efficient to retrieve -- for example, one might want to do this
to a tagged release -- and also gives us an obvious way to deltify the
storage of the current svn repository without perturbing the revision
numbers. :-)
Branko, a couple of questions regarding your lovely design:
> So, here's my proposal
>
> 1) Change the delta representation to index and store delta windows
> separately
>
> DELTA ::= (("delta" FLAG ...) (OFFSET WINDOW) ...) ;
> WINDOW ::= DIFF SIZE CHECKSUM [REP-KEY REP-OFFSET] ;
> OFFSET ::= number ;
> REP-OFFSET ::= number;
>
>
> The REP-KEY and REP-OFFSET in WINDOW are optional because, if the
> differences between two file revisions is large enough, the diff could
> in fact be larger than a compression-only vdelta of the text region. In
> that case it makes more sense to compress the window than to store a diff.
We're not sure what REP-OFFSET is for.
We're pretty sure we understand OFFSET. It's the offset into the
reconstructed fulltext. The OFFSETs increase with each WINDOW in a
DELTA, and you can tell a given window's reconstruction range either
by adding OFFSET + SIZE, or by subtracting one OFFSET from the next.
Hopefully that's a correct summary. :-)
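A minimal sketch of the range arithmetic described above, assuming SIZE is the number of fulltext bytes a window reconstructs. The tuple layout and the function names are hypothetical, chosen only for illustration; they are not taken from the proposal or from the Subversion source.

```python
from typing import List, Tuple

# Hypothetical in-memory view of a DELTA's index: one (OFFSET, SIZE) pair
# per WINDOW. OFFSET is the position in the reconstructed fulltext; SIZE is
# assumed to be the number of fulltext bytes the window reconstructs.
Index = List[Tuple[int, int]]


def ranges_by_size(index: Index) -> List[Tuple[int, int]]:
    """Half-open [start, end) range of each window, from OFFSET + SIZE."""
    return [(offset, offset + size) for offset, size in index]


def ranges_by_next_offset(index: Index, total_len: int) -> List[Tuple[int, int]]:
    """Same ranges, computed by subtracting one OFFSET from the next.

    The last window has no successor, so the fulltext length is required.
    """
    starts = [offset for offset, _ in index]
    ends = starts[1:] + [total_len]
    return list(zip(starts, ends))


index = [(0, 100), (100, 100), (200, 37)]
print(ranges_by_size(index))
print(ranges_by_next_offset(index, total_len=237))
```

Both calls yield the same ranges as long as the windows are contiguous, which is what makes the two methods interchangeable.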
But what is REP-OFFSET? We understand the REP-KEY that precedes it.
That's simply the representation against whose fulltext this delta
applies, right? But why would we want an offset into that rep? We
had thought the relevant offset(s) are part of the svndiff encoding.
Is it a way of magically jumping over a certain number of windows and
landing on the right one, in next-most-immediate source
representation, or is it something else?
We're still thinking about this, but maybe you can put us out of our
misery quickly. :-)
Also, did you mean
WINDOW ::= (DIFF SIZE CHECKSUM [REP-KEY REP-OFFSET]) ;
i.e., with parens, rather than without? Yes, it would work without
being a sublist, but for maintainability a sublist might be
preferable...
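To make the two layouts concrete, here is a rough sketch using Python lists as a stand-in for skels. The field values are placeholders and `window_fields` is a hypothetical helper; the real skel contents may differ (DIFF, for instance, may itself be a list).

```python
# Flat form: WINDOW's fields are spliced directly into the (OFFSET WINDOW) list.
flat_entry = ["0", "diff-key", "100", "checksum", "rep-key", "0"]

# Sublist form: WINDOW is its own list, so every entry has exactly two elements.
nested_entry = ["0", ["diff-key", "100", "checksum", "rep-key", "0"]]


def window_fields(entry):
    """Return (offset, list of WINDOW fields) for either layout."""
    offset, rest = entry[0], entry[1:]
    if len(rest) == 1 and isinstance(rest[0], list):
        return int(offset), list(rest[0])  # sublist form
    return int(offset), list(rest)         # flat form


print(window_fields(flat_entry) == window_fields(nested_entry))
```

With the sublist form a reader can check "entry has two elements" without knowing whether the optional REP-KEY and REP-OFFSET are present, which is the maintainability argument.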
Anyway, we can start coding right away, while awaiting clarification.
Found no holes in the proposal; agree that there is a slight storage
penalty, but the memory usage and speed gains are so overwhelming that
it would be petty to complain about the *very* gently-sloped, albeit
linear, increase in storage per deltified file.
The replacing of distant diffs with ones nearer the fulltext is a
great idea; we'll probably wait on that until after the basic rewrite
is done, however, as it is an optimization, though a very effective
one.
-Karl and Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: deltification semi-rewrite starting now
Posted by kf...@collab.net.
Greg Stein <gs...@lyra.org> writes:
> Maybe we should release an M4 before enabling deltification in the server?
> What issues are left? (just properties?)
That's the plan, yeah, with M5 being only about deltification and the
commit-path rewrite. M4 bugs are:
496 -- possible wc-side commit bug
382 -- DAV property handling rewrite
487 -- versioned resource url out-of-date problems?
Any word on the latter two?
(The only unprioritized issue is 495, but it doesn't look like it
needs to be an M4 issue.)
-Karl
---------------------------------------------------------------------
Re: deltification semi-rewrite starting now
Posted by Greg Stein <gs...@lyra.org>.
On Mon, Sep 24, 2001 at 03:14:19PM -0500, kfogel@collab.net wrote:
>...
> and like what we see :-). We have a couple of questions that probably
> Branko can answer quickly, but basically we're going to start
> implementing it now, completion anticipated in 2 weeks max (thank
> goodness all the strings/reps separation is already done, so that
> whole wheel doesn't need to be reinvented).
Maybe we should release an M4 before enabling deltification in the server?
What issues are left? (just properties?)
Cheers,
-g
--
Greg Stein, http://www.lyra.org/
---------------------------------------------------------------------
Re: deltification semi-rewrite starting now
Posted by kf...@collab.net.
Branko Čibej <br...@xbc.nu> writes:
> I remember, I remember!
>
> The offsets embedded in the svndiff are stored in a string; these
> offsets would be in the representation. The point is that you get all
> the information you need to select the appropriate windows from the rep
> skel -- without touching a single string. This means a bit more space
> used in the repository, but lots less memory used on the server.
Okay, thanks. We'll see how things work out in implementation --
might or might not end up using this, can't tell yet.
-K
---------------------------------------------------------------------
Re: deltification semi-rewrite starting now
Posted by Branko Čibej <br...@xbc.nu>.
Branko Čibej wrote:
> kfogel@collab.net wrote:
>
>> But what is REP-OFFSET? We understand the REP-KEY that precedes it.
>> That's simply the representation against whose fulltext this delta
>> applies, right?
>>
> Let me think ... Yes.
>
>> But why would we want an offset into that rep? We
>> had thought the relevant offset(s) are part of the svndiff encoding.
>> Is it a way of magically jumping over a certain number of windows and
>> landing on the right one, in next-most-immediate source
>> representation, or is it something else?
>>
> Although the offset is implicit in the svndiff, in real life you want
> to find the source (fulltext) *before* decoding the window. Also, as I
> noted, you might want to just use a (self-referencing) vdelta compress
> instead of a diff, if the result of the compression is smaller than
> the diff.
>
> Hmm. It's been a long time since I wrote that, and as usual I left
> some of the reasoning out. I'll have to think about this again. I sort
> of remember it had to do with true random access to the text.
I remember, I remember!
The offsets embedded in the svndiff are stored in a string; these
offsets would be in the representation. The point is that you get all
the information you need to select the appropriate windows from the rep
skel -- without touching a single string. This means a bit more space
used in the repository, but lots less memory used on the server.
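A sketch of what "select the appropriate windows from the rep skel" could look like, assuming the index has already been parsed into memory. `WindowEntry`, `select_windows`, and the exact meaning given to REP-OFFSET here are assumptions for illustration, not the actual design. Note that the selection reads only index fields and never the svndiff string contents.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class WindowEntry(NamedTuple):
    offset: int                # OFFSET: start in the reconstructed fulltext
    size: int                  # SIZE: fulltext bytes this window yields (assumed)
    diff_key: str              # DIFF: key of the string holding the svndiff data
    checksum: str              # CHECKSUM
    rep_key: Optional[str]     # REP-KEY: source representation, if any
    rep_offset: Optional[int]  # REP-OFFSET: offset into that source (semantics assumed)


def select_windows(index: List[WindowEntry], start: int, length: int) -> List[WindowEntry]:
    """Return the windows overlapping the fulltext range [start, start + length)."""
    if length <= 0 or not index:
        return []
    end = start + length
    offsets = [w.offset for w in index]
    # Index of the last window whose OFFSET is <= start.
    first = max(bisect_right(offsets, start) - 1, 0)
    chosen = []
    for w in index[first:]:
        if w.offset >= end:
            break
        if w.offset + w.size > start:
            chosen.append(w)
    return chosen


index = [
    WindowEntry(0, 100, "s1", "c1", "rep-a", 0),
    WindowEntry(100, 100, "s2", "c2", "rep-a", 100),
    WindowEntry(200, 37, "s3", "c3", None, None),
]
print([w.diff_key for w in select_windows(index, 150, 100)])
```

Only the strings named by the selected entries would then need to be fetched and decoded.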
--
Brane Čibej <br...@xbc.nu> http://www.xbc.nu/brane/
---------------------------------------------------------------------
Re: deltification semi-rewrite starting now
Posted by Branko Čibej <br...@xbc.nu>.
kfogel@collab.net wrote:
>Mike Pilato and I have just reviewed Branko's deltification proposal,
>found at
>
> notes/delta-indexing-and-composition.txt
>
>and like what we see :-). We have a couple of questions that probably
>Branko can answer quickly, but basically we're going to start
>implementing it now, completion anticipated in 2 weeks max (thank
>goodness all the strings/reps separation is already done, so that
>whole wheel doesn't need to be reinvented).
>
>The plan is that we'll also implement a new `svnadmin' subcommand for
>deltifying and undeltifying revisions, or particular paths within
>revisions. That way, administrators have a way to make certain trees
>very efficient to retrieve -- for example, one might want to do this
>to a tagged release -- and also gives us an obvious way to deltify the
>storage of the current svn repository without perturbing the revision
>numbers. :-)
>
>Branko, a couple of questions regarding your lovely design:
>
>>So, here's my proposal
>>
>>1) Change the delta representation to index and store delta windows
>>separately
>>
>> DELTA ::= (("delta" FLAG ...) (OFFSET WINDOW) ...) ;
>> WINDOW ::= DIFF SIZE CHECKSUM [REP-KEY REP-OFFSET] ;
>> OFFSET ::= number ;
>> REP-OFFSET ::= number;
>>
>>
>>The REP-KEY and REP-OFFSET in WINDOW are optional because, if the
>>differences between two file revisions is large enough, the diff could
>>in fact be larger than a compression-only vdelta of the text region. In
>>that case it makes more sense to compress the window than to store a diff.
>>
>
>We're not sure what REP-OFFSET is for.
>
>We're pretty sure we understand OFFSET. It's the offset into the
>reconstructed fulltext. The OFFSETs increase with each WINDOW in a
>DELTA, and you can tell a given window's reconstruction range either
>by adding OFFSET + SIZE, or by subtracting one OFFSET from the next.
>
>Hopefully that's a correct summary. :-)
>
Yes, that is exactly right.
>But what is REP-OFFSET? We understand the REP-KEY that precedes it.
>That's simply the representation against whose fulltext this delta
>applies, right?
>
Let me think ... Yes.
> But why would we want an offset into that rep? We
>had thought the relevant offset(s) are part of the svndiff encoding.
>Is it a way of magically jumping over a certain number of windows and
>landing on the right one, in next-most-immediate source
>representation, or is it something else?
>
Although the offset is implicit in the svndiff, in real life you want to
find the source (fulltext) *before* decoding the window. Also, as I
noted, you might want to just use a (self-referencing) vdelta compress
instead of a diff, if the result of the compression is smaller than the
diff.
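The size comparison can be sketched roughly as follows, assuming both encodings of a window have already been produced as byte strings. `StoredWindow` and `choose_encoding` are hypothetical names; the sketch only shows why REP-KEY and REP-OFFSET end up optional.

```python
from typing import NamedTuple, Optional


class StoredWindow(NamedTuple):
    data: bytes                # encoded window contents
    rep_key: Optional[str]     # REP-KEY, or None for a self-compressed window
    rep_offset: Optional[int]  # REP-OFFSET, or None likewise


def choose_encoding(diff: bytes, self_compressed: bytes,
                    rep_key: str, rep_offset: int) -> StoredWindow:
    """Keep whichever encoding of the window is smaller."""
    if len(self_compressed) < len(diff):
        # A self-referencing window needs no source representation to decode.
        return StoredWindow(self_compressed, None, None)
    return StoredWindow(diff, rep_key, rep_offset)
```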
Hmm. It's been a long time since I wrote that, and as usual I left some
of the reasoning out. I'll have to think about this again. I sort of
remember it had to do with true random access to the text.
>We're still thinking about this, but maybe you can put us out of our
>misery quickly. :-)
>
Thanks, you just got me worrying about it. :-)
>Also, did you mean
>
> WINDOW ::= (DIFF SIZE CHECKSUM [REP-KEY REP-OFFSET]) ;
>
>i.e., with parens, rather than without? Yes, it would work without
>being a sublist, but for maintainability a sublist might be
>preferable...
>
I meant without parens, but obviously it doesn't hurt to make a sublist
out of it. Use whatever you find more aesthetically pleasing. :-)
>Anyway, we can start coding right away, while awaiting clarification.
>Found no holes in the proposal; agree that there is a slight storage
>penalty, but the memory usage and speed gains are so overwhelming that
>it would be petty to complain about the *very* gently-sloped, albeit
>linear, increase in storage per deltified file.
>
Wonderful. Now I /really/ have to dust off and finish the delta combiner.
>The replacing of distant diffs with ones nearer the fulltext is a
>great idea; we'll probably wait on that until after the basic rewrite
>is done, however, as it is an optimization, though a very effective
>one.
>
Yes, it's an optimization only. What's more, it can be done entirely
off-line.
--
Brane Čibej <br...@xbc.nu> http://www.xbc.nu/brane/