You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by "Hyrum K. Wright" <hy...@mail.utexas.edu> on 2008/03/27 15:45:26 UTC

Rep sharing and circular deltas

I've been poking around on the fs-rep-sharing branch lately to try and
determine how we can avoid circular deltas when reusing representations. 
  For those unfamiliar with the problem, here's a brief synopsis. 
(Apologies if this sounds like a core dump from my brain...)

The purpose of fs-rep-sharing is to eliminate storing duplicate
representations in the fs.  Instead of just writing a representation to 
the database, we first check a cache of checksum->representation-key 
mappings, and see if we can reuse an existing representation.  This is 
currently implemented for Berkeley DB.

Because the representations themselves are mutable in BDB (i.e., the
same representation can be full-text, and later rewritten as a delta 
against another full text), a problem arises where we can get circular 
delta references.  Consider the following:

   +-----+
   |     |
   |     |
   |     |
   +-----+
     A_1

A_1 is the first representation of the object, so we store the full text.

   +-----+        +-----+
   |     | delta  |     |
   |     | ---->  |     |
   |     |        |     |
   +-----+        +-----+
     A_1            A_2

A_2 is a subsequent representation of the object, so we store its full
text, and store A_1 as a delta against it.  Now, imagine we revert back
to the previous text in a later revision.  Without rep-sharing we get:

   +-----+       +-----+       +-----+
   |     | delta |     | delta |     |
   |     | ----> |     | ----> |     |
   |     |       |     |       |     |
   +-----+       +-----+       +-----+
     A_1           A_2           A_3

A_3 is now stored full text and A_2 is a delta against A_3, with A_1
being unchanged.  However, with rep-sharing, we get:

      +-------------------------+
      v                         |
   +-----+       +-----+        |
   |     | delta |     | delta +-+
   |     | ----> |     | ----> |o|
   |     |       |     |       +-+
   +-----+       +-----+
     A_1           A_2         A_3

Where both A_1 and A_2 remain unchanged, and A_3, instead of being a new
representation, just refers to the previously written identical
representation, A_1.  This turns out to be a problem, because A_3 is
really just A_1, which is a delta against A_2, which is a delta against
A_1, which is a delta against A_2, ad nauseam.

See the problem?

The representations don't even have to be this close to each other.  In 
other words, as long as two representations in the same "delta chain" as 
the same, we've got the problem:

      +-------------------------------------------------+
      v                                                 |
   +-----+       +-----+                 +-----+        |
   |     | delta |     | delta     delta |     | delta +-+
   |     | ----> |     | ----> ... ----> |     | ----> |o|
   |     |       |     |                 |     |       +-+
   +-----+       +-----+                 +-----+
     A_1           A_2                    A_n-1        A_n


The solution, would be to *not* rewrite A_n-1 in terms of A_n, but just
create the link to A_1:

      +---------------------------------------------+
      v                                             |
   +-----+       +-----+                 +-----+    |
   |     | delta |     | delta     delta |     |   +-+
   |     | ----> |     | ----> ... ----> |     |   |o|
   |     |       |     |                 |     |   +-+
   +-----+       +-----+                 +-----+
     A_1           A_2                    A_n-1    A_n

This breaks the "delta chain", with A_n-1 remaining a full text 
representation, against which the relevant deltas could apply.  An 
interesting side effect is what happens on the subsequent commit:

      +---------------------------------------------+
      v                                             |
   +-----+       +-----+                 +-----+    |        +-----+
   |     | delta |     | delta     delta |     |   +-+ delta |     |
   |     | ----> |     | ----> ... ----> |     |   |o| ----> |     |
   |     |       |     |                 |     |   +-+       |     |
   +-----+       +-----+                 +-----+             +-----+
     A_1           A_2                    A_n-1    A_n        A_n+1

We represent A_n+1 as full text, and rewrite A_n as a delta against 
that.  But, because A_n is the same representation as A_1, we really get:

      +---------------------------------------------+
      v                                             |
   +-----+       +-----+                 +-----+    |        +-----+
   |     |       |     | delta     delta |     |   +-+       |     |
   |     |--+    |     | ----> ... ----> |     |   |o|   +-> |     |
   |     |  |    |     |                 |     |   +-+   |   |     |
   +-----+  |    +-----+                 +-----+         |   +-----+
     A_1    |      A_2                    A_n-1    A_n   |    A_n+1
            |                 delta                      |
            +--------------------------------------------+

Which works out okay: A_1 still represents the same content, it is just 
being expressed differently.  In a complex repository, where there are 
multiple links to the same representation, that representation would get 
rewritten several times, but the content should *never* change.

So far, the solution looks tenable.  The real problem is determining 
when to not rewrite the previous representation as a delta against a 
linked representation.  In other words, we have to answer the question: 
"Is rep A_1 somehow dependent upon A_n-1?"  I don't know how to do this 
in constant time, and I'm leery of running forward through the delta 
chain to find the full text every time we find a duplicate 
representation.  Are there other alternatives?

(For the curious, we don't have this problem in FSFS.  Because committed 
representations are immutable in FSFS, the representation we are 
referring to always has an appropriate full text for the delta, and we 
know it won't change.  The real barrier on FSFS is creating an efficient 
on-disk cache for checksum->representation mappings.  Hmmmm, SQLite, 
anybody?)

-Hyrum

Re: Rep sharing and circular deltas

Posted by David Glasser <gl...@davidglasser.net>.

On Thu, Mar 27, 2008 at 8:45 AM, Hyrum K. Wright
<hy...@mail.utexas.edu> wrote:
> I've been poking around on the fs-rep-sharing branch lately to try and
>  determine how we can avoid circular deltas when reusing representations.
>   For those unfamiliar with the problem, here's a brief synopsis.
>  (Apologies if this sounds like a core dump from my brain...)
>
>  The purpose of fs-rep-sharing is to eliminate storing duplicate
>  representations in the fs.  Instead of just writing a representation to
>  the database, we first check a cache of checksum->representation-key
>  mappings, and see if we can reuse an existing representation.  This is
>  currently implemented for Berkeley DB.

Have you looked into git's "repacking" algorithm, which is what it
uses to do the equivalent of fs_base's "redeltification"?

--dave

-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Rep sharing and circular deltas

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.

Blair Zajac wrote:
> Hyrum K. Wright wrote:
>> I've been poking around on the fs-rep-sharing branch lately to try and
>> determine how we can avoid circular deltas when reusing 
>> representations.  For those unfamiliar with the problem, here's a 
>> brief synopsis. (Apologies if this sounds like a core dump from my 
>> brain...)
>>
>> The purpose of fs-rep-sharing is to eliminate storing duplicate
>> representations in the fs.  Instead of just writing a representation 
>> to the database, we first check a cache of 
>> checksum->representation-key mappings, and see if we can reuse an 
>> existing representation.  This is currently implemented for Berkeley DB.
>>
>> Because the representations themselves are mutable in BDB (i.e., the
>> same representation can be full-text, and later rewritten as a delta 
>> against another full text), a problem arises where we can get circular 
>> delta references.  Consider the following:
>>
>>   +-----+
>>   |     |
>>   |     |
>>   |     |
>>   +-----+
>>     A_1
>>
>> A_1 is the first representation of the object, so we store the full text.
>>
>>   +-----+        +-----+
>>   |     | delta  |     |
>>   |     | ---->  |     |
>>   |     |        |     |
>>   +-----+        +-----+
>>     A_1            A_2
>>
>> A_2 is a subsequent representation of the object, so we store its full
>> text, and store A_1 as a delta against it.  Now, imagine we revert back
>> to the previous text in a later revision.  Without rep-sharing we get:
>>
>>   +-----+       +-----+       +-----+
>>   |     | delta |     | delta |     |
>>   |     | ----> |     | ----> |     |
>>   |     |       |     |       |     |
>>   +-----+       +-----+       +-----+
>>     A_1           A_2           A_3
>>
>> A_3 is now stored full text and A_2 is a delta against A_3, with A_1
>> being unchanged.  However, with rep-sharing, we get:
>>
>>      +-------------------------+
>>      v                         |
>>   +-----+       +-----+        |
>>   |     | delta |     | delta +-+
>>   |     | ----> |     | ----> |o|
>>   |     |       |     |       +-+
>>   +-----+       +-----+
>>     A_1           A_2         A_3
> 
> Why not have A_3 be full text and A_1 refers to A_3.  That way all 
> deltas and references are from a small revision to a large revisions 
> and you can't have loops?

In thinking about this more, it seems like we'd also need a table of 
REP-KEY -> NODE-REVISION mappings to know which node-revisions to update 
with the new full-text representation.  Does that sound right?

-Hyrum

Re: Rep sharing and circular deltas

Posted by Blair Zajac <bl...@orcaware.com>.

Hyrum K. Wright wrote:
> Blair Zajac wrote:
>> Hyrum K. Wright wrote:
>>> I've been poking around on the fs-rep-sharing branch lately to try and
>>> determine how we can avoid circular deltas when reusing 
>>> representations.  For those unfamiliar with the problem, here's a 
>>> brief synopsis. (Apologies if this sounds like a core dump from my 
>>> brain...)
>>>
>>> The purpose of fs-rep-sharing is to eliminate storing duplicate
>>> representations in the fs.  Instead of just writing a representation 
>>> to the database, we first check a cache of 
>>> checksum->representation-key mappings, and see if we can reuse an 
>>> existing representation.  This is currently implemented for Berkeley DB.
>>>
>>> Because the representations themselves are mutable in BDB (i.e., the
>>> same representation can be full-text, and later rewritten as a delta 
>>> against another full text), a problem arises where we can get 
>>> circular delta references.  Consider the following:
>>>
>>>   +-----+
>>>   |     |
>>>   |     |
>>>   |     |
>>>   +-----+
>>>     A_1
>>>
>>> A_1 is the first representation of the object, so we store the full 
>>> text.
>>>
>>>   +-----+        +-----+
>>>   |     | delta  |     |
>>>   |     | ---->  |     |
>>>   |     |        |     |
>>>   +-----+        +-----+
>>>     A_1            A_2
>>>
>>> A_2 is a subsequent representation of the object, so we store its full
>>> text, and store A_1 as a delta against it.  Now, imagine we revert back
>>> to the previous text in a later revision.  Without rep-sharing we get:
>>>
>>>   +-----+       +-----+       +-----+
>>>   |     | delta |     | delta |     |
>>>   |     | ----> |     | ----> |     |
>>>   |     |       |     |       |     |
>>>   +-----+       +-----+       +-----+
>>>     A_1           A_2           A_3
>>>
>>> A_3 is now stored full text and A_2 is a delta against A_3, with A_1
>>> being unchanged.  However, with rep-sharing, we get:
>>>
>>>      +-------------------------+
>>>      v                         |
>>>   +-----+       +-----+        |
>>>   |     | delta |     | delta +-+
>>>   |     | ----> |     | ----> |o|
>>>   |     |       |     |       +-+
>>>   +-----+       +-----+
>>>     A_1           A_2         A_3
>>
>> Why not have A_3 be full text and A_1 refers to A_3.  That way all 
>> deltas and refereneces are from a small revision to a large revisions 
>> and you can't have loops?
> 
> Just so I understand, instead of just linking to an existing 
> representation, we would re-write that representation as full text, and 
> then delta the previous one, business as usual?

Yes.

> That would avoid cycles 
> in the delta graph, and it wouldn't take any more space.  Would it 
> impact other history-related activities?

I don't know.

Blair

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Rep sharing and circular deltas

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.

Blair Zajac wrote:
> Hyrum K. Wright wrote:
>> I've been poking around on the fs-rep-sharing branch lately to try and
>> determine how we can avoid circular deltas when reusing 
>> representations.  For those unfamiliar with the problem, here's a 
>> brief synopsis. (Apologies if this sounds like a core dump from my 
>> brain...)
>>
>> The purpose of fs-rep-sharing is to eliminate storing duplicate
>> representations in the fs.  Instead of just writing a representation 
>> to the database, we first check a cache of 
>> checksum->representation-key mappings, and see if we can reuse an 
>> existing representation.  This is currently implemented for Berkeley DB.
>>
>> Because the representations themselves are mutable in BDB (i.e., the
>> same representation can be full-text, and later rewritten as a delta 
>> against another full text), a problem arises where we can get circular 
>> delta references.  Consider the following:
>>
>>   +-----+
>>   |     |
>>   |     |
>>   |     |
>>   +-----+
>>     A_1
>>
>> A_1 is the first representation of the object, so we store the full text.
>>
>>   +-----+        +-----+
>>   |     | delta  |     |
>>   |     | ---->  |     |
>>   |     |        |     |
>>   +-----+        +-----+
>>     A_1            A_2
>>
>> A_2 is a subsequent representation of the object, so we store its full
>> text, and store A_1 as a delta against it.  Now, imagine we revert back
>> to the previous text in a later revision.  Without rep-sharing we get:
>>
>>   +-----+       +-----+       +-----+
>>   |     | delta |     | delta |     |
>>   |     | ----> |     | ----> |     |
>>   |     |       |     |       |     |
>>   +-----+       +-----+       +-----+
>>     A_1           A_2           A_3
>>
>> A_3 is now stored full text and A_2 is a delta against A_3, with A_1
>> being unchanged.  However, with rep-sharing, we get:
>>
>>      +-------------------------+
>>      v                         |
>>   +-----+       +-----+        |
>>   |     | delta |     | delta +-+
>>   |     | ----> |     | ----> |o|
>>   |     |       |     |       +-+
>>   +-----+       +-----+
>>     A_1           A_2         A_3
> 
> Why not have A_3 be full text and A_1 refers to A_3.  That way all 
> deltas and refereneces are from a small revision to a large revisions 
> and you can't have loops?

Just so I understand, instead of just linking to an existing 
representation, we would re-write that representation as full text, and 
then delta the previous one, business as usual?  That would avoid cycles 
in the delta graph, and it wouldn't take any more space.  Would it 
impact other history-related activities?

-Hyrum

Re: Rep sharing and circular deltas

Posted by Blair Zajac <bl...@orcaware.com>.

Hyrum K. Wright wrote:
> I've been poking around on the fs-rep-sharing branch lately to try and
> determine how we can avoid circular deltas when reusing representations. 
>  For those unfamiliar with the problem, here's a brief synopsis. 
> (Apologies if this sounds like a core dump from my brain...)
> 
> The purpose of fs-rep-sharing is to eliminate storing duplicate
> representations in the fs.  Instead of just writing a representation to 
> the database, we first check a cache of checksum->representation-key 
> mappings, and see if we can reuse an existing representation.  This is 
> currently implemented for Berkeley DB.
> 
> Because the representations themselves are mutable in BDB (i.e., the
> same representation can be full-text, and later rewritten as a delta 
> against another full text), a problem arises where we can get circular 
> delta references.  Consider the following:
> 
>   +-----+
>   |     |
>   |     |
>   |     |
>   +-----+
>     A_1
> 
> A_1 is the first representation of the object, so we store the full text.
> 
>   +-----+        +-----+
>   |     | delta  |     |
>   |     | ---->  |     |
>   |     |        |     |
>   +-----+        +-----+
>     A_1            A_2
> 
> A_2 is a subsequent representation of the object, so we store its full
> text, and store A_1 as a delta against it.  Now, imagine we revert back
> to the previous text in a later revision.  Without rep-sharing we get:
> 
>   +-----+       +-----+       +-----+
>   |     | delta |     | delta |     |
>   |     | ----> |     | ----> |     |
>   |     |       |     |       |     |
>   +-----+       +-----+       +-----+
>     A_1           A_2           A_3
> 
> A_3 is now stored full text and A_2 is a delta against A_3, with A_1
> being unchanged.  However, with rep-sharing, we get:
> 
>      +-------------------------+
>      v                         |
>   +-----+       +-----+        |
>   |     | delta |     | delta +-+
>   |     | ----> |     | ----> |o|
>   |     |       |     |       +-+
>   +-----+       +-----+
>     A_1           A_2         A_3

Why not have A_3 be full text and A_1 refers to A_3.  That way all deltas and 
refereneces are from a small revision to a large revisions and you can't have loops?

Blair


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Rep sharing and circular deltas

Posted by "C. Michael Pilato" <cm...@collab.net>.

Karl Fogel wrote:
>> being unchanged.  However, with rep-sharing, we get:
>>
>>      +-------------------------+
>>      v                         |
>>   +-----+       +-----+        |
>>   |     | delta |     | delta +-+
>>   |     | ----> |     | ----> |o|
>>   |     |       |     |       +-+
>>   +-----+       +-----+
>>     A_1           A_2         A_3
>>
>> Where both A_1 and A_2 remain unchanged, and A_3, instead of being a new
>> representation, just refers to the previously written identical
>> representation, A_1.  This turns out to be a problem, because A_3 is
>> really just A_1, which is a delta against A_2, which is a delta against
>> A_1, which is a delta against A_2, ad nauseam.
>>
>> See the problem?
> 
> Well, no, wait.  Under a sharing scheme, if when you add a string you
> discover it has the same (fulltext) checksum as some existing string,
> then no new deltification happens -- instead, the new string is just a
> reference to the existing string.  No cycle danger here.
> 
> In other words, both A1 and A3 would be deltas against A2.
> 
> I didn't read the rest of the mail closely, because it seemed to be
> solving a problem whose existence I'm denying here :-).

Deltification happens in a separate phase of the commit, Karl.  By the time 
we perform deltification, we've completely forgotten that we recycled a 
representation/string somewhere along the way and should avoid deltifying 
against that object.

Trust me, this problem *does* exist.  Compile the fs-rep-sharing code with 
BDB support right now, and run the test suite.  But, uh, save your Emacs 
buffer state before doing so.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Re: Rep sharing and circular deltas

Posted by "C. Michael Pilato" <cm...@collab.net>.

Hyrum K. Wright wrote:
> Also, now that I think about it, does this break the "most recent 
> node-revision is full text" characteristic of the BDB backend?  Does it 
> matter?

I'm not sure that was a major feature of the BDB backend design.  A 
side-effect, sure, but not something we were banking on.  Consider the way 
branches work today.  If you copy /trunk to some branch and start committing 
on a file on the branch, that file's representation on /trunk will no longer 
be fulltext (because while it is at the tip of /trunk, it *also* is in the 
non-tip lineage of its branch location).  So, it was never the case that 
HEAD was necessarily all fulltexts.  The most recent node-revision in a line 
of history, yes, but who knows where that might be path-wise.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Re: Rep sharing and circular deltas

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.

Karl Fogel wrote:
> "Hyrum K. Wright" <hy...@mail.utexas.edu> writes:
>> I've been poking around on the fs-rep-sharing branch lately to try and
>> determine how we can avoid circular deltas when reusing
>> representations. For those unfamiliar with the problem, here's a brief
>> synopsis. (Apologies if this sounds like a core dump from my brain...)
>>
>> The purpose of fs-rep-sharing is to eliminate storing duplicate
>> representations in the fs.  Instead of just writing a representation
>> to the database, we first check a cache of
>> checksum->representation-key mappings, and see if we can reuse an
>> existing representation.  This is currently implemented for Berkeley
>> DB.
>>
>> Because the representations themselves are mutable in BDB (i.e., the
>> same representation can be full-text, and later rewritten as a delta
>> against another full text), a problem arises where we can get circular
>> delta references.  Consider the following:
> 
> Sanity check: are you using the word "representation" for what the
> repository (at least BDB) code calls "strings"?  Because there's
> something else in there called "representations", but they're not
> deltified -- they're just a level of indirection between the filesystem
> and the file contents.
> 
> I'll assume you're talking about "strings" from here on.

No, I'm pretty sure I was talking about "representations".  :)  Strings 
are just blobs of bits, which have no knowledge of their relationships 
to other strings, their checksums, deltas, etc.  Representations 
reference strings, but include other information so that a full text of 
the representation can be reconstructed.  Currently, we're trying to 
allow multiple node-revisions to refer to the same representation, not 
multiple representations to refer to the same string.

I think the rest of your analysis holds, though.

>>   +-----+
>>   |     |
>>   |     |
>>   |     |
>>   +-----+
>>     A_1
>>
>> A_1 is the first representation of the object, so we store the full text.
>>
>>   +-----+        +-----+
>>   |     | delta  |     |
>>   |     | ---->  |     |
>>   |     |        |     |
>>   +-----+        +-----+
>>     A_1            A_2
>>
>> A_2 is a subsequent representation of the object, so we store its full
>> text, and store A_1 as a delta against it.  Now, imagine we revert back
>> to the previous text in a later revision.  Without rep-sharing we get:
>>
>>   +-----+       +-----+       +-----+
>>   |     | delta |     | delta |     |
>>   |     | ----> |     | ----> |     |
>>   |     |       |     |       |     |
>>   +-----+       +-----+       +-----+
>>     A_1           A_2           A_3
>>
>> A_3 is now stored full text and A_2 is a delta against A_3, with A_1
>> being unchanged.  However, with rep-sharing, we get:
>>
>>      +-------------------------+
>>      v                         |
>>   +-----+       +-----+        |
>>   |     | delta |     | delta +-+
>>   |     | ----> |     | ----> |o|
>>   |     |       |     |       +-+
>>   +-----+       +-----+
>>     A_1           A_2         A_3
>>
>> Where both A_1 and A_2 remain unchanged, and A_3, instead of being a new
>> representation, just refers to the previously written identical
>> representation, A_1.  This turns out to be a problem, because A_3 is
>> really just A_1, which is a delta against A_2, which is a delta against
>> A_1, which is a delta against A_2, ad nauseam.
>>
>> See the problem?
> 
> Well, no, wait.  Under a sharing scheme, if when you add a string you
> discover it has the same (fulltext) checksum as some existing string,
> then no new deltification happens -- instead, the new string is just a
> reference to the existing string.  No cycle danger here.
> 
> In other words, both A1 and A3 would be deltas against A2.
> 
> I didn't read the rest of the mail closely, because it seemed to be
> solving a problem whose existence I'm denying here :-).

That makes sense.  It seems to be an issue with the current 
implementation where deltification of the previous representation 
happens before we know if there is an existing representation for the 
content we are interested in.  We'd need to we order those steps, so 
that the first can be conditional on the second.

Also, now that I think about it, does this break the "most recent 
node-revision is full text" characteristic of the BDB backend?  Does it 
matter?

-Hyrum

Re: Rep sharing and circular deltas

Posted by Karl Fogel <kf...@red-bean.com>.

"Hyrum K. Wright" <hy...@mail.utexas.edu> writes:
> I've been poking around on the fs-rep-sharing branch lately to try and
> determine how we can avoid circular deltas when reusing
> representations. For those unfamiliar with the problem, here's a brief
> synopsis. (Apologies if this sounds like a core dump from my brain...)
>
> The purpose of fs-rep-sharing is to eliminate storing duplicate
> representations in the fs.  Instead of just writing a representation
> to the database, we first check a cache of
> checksum->representation-key mappings, and see if we can reuse an
> existing representation.  This is currently implemented for Berkeley
> DB.
>
> Because the representations themselves are mutable in BDB (i.e., the
> same representation can be full-text, and later rewritten as a delta
> against another full text), a problem arises where we can get circular
> delta references.  Consider the following:

Sanity check: are you using the word "representation" for what the
repository (at least BDB) code calls "strings"?  Because there's
something else in there called "representations", but they're not
deltified -- they're just a level of indirection between the filesystem
and the file contents.

I'll assume you're talking about "strings" from here on.

>   +-----+
>   |     |
>   |     |
>   |     |
>   +-----+
>     A_1
>
> A_1 is the first representation of the object, so we store the full text.
>
>   +-----+        +-----+
>   |     | delta  |     |
>   |     | ---->  |     |
>   |     |        |     |
>   +-----+        +-----+
>     A_1            A_2
>
> A_2 is a subsequent representation of the object, so we store its full
> text, and store A_1 as a delta against it.  Now, imagine we revert back
> to the previous text in a later revision.  Without rep-sharing we get:
>
>   +-----+       +-----+       +-----+
>   |     | delta |     | delta |     |
>   |     | ----> |     | ----> |     |
>   |     |       |     |       |     |
>   +-----+       +-----+       +-----+
>     A_1           A_2           A_3
>
> A_3 is now stored full text and A_2 is a delta against A_3, with A_1
> being unchanged.  However, with rep-sharing, we get:
>
>      +-------------------------+
>      v                         |
>   +-----+       +-----+        |
>   |     | delta |     | delta +-+
>   |     | ----> |     | ----> |o|
>   |     |       |     |       +-+
>   +-----+       +-----+
>     A_1           A_2         A_3
>
> Where both A_1 and A_2 remain unchanged, and A_3, instead of being a new
> representation, just refers to the previously written identical
> representation, A_1.  This turns out to be a problem, because A_3 is
> really just A_1, which is a delta against A_2, which is a delta against
> A_1, which is a delta against A_2, ad nauseam.
>
> See the problem?

Well, no, wait.  Under a sharing scheme, if when you add a string you
discover it has the same (fulltext) checksum as some existing string,
then no new deltification happens -- instead, the new string is just a
reference to the existing string.  No cycle danger here.

In other words, both A1 and A3 would be deltas against A2.

I didn't read the rest of the mail closely, because it seemed to be
solving a problem whose existence I'm denying here :-).

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org