You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by "Hyrum K. Wright" <hy...@mail.utexas.edu> on 2008/04/04 04:32:04 UTC

Re: Rep sharing and circular deltas

Blair Zajac wrote:
> Hyrum K. Wright wrote:
>> I've been poking around on the fs-rep-sharing branch lately to try and
>> determine how we can avoid circular deltas when reusing 
>> representations.  For those unfamiliar with the problem, here's a 
>> brief synopsis. (Apologies if this sounds like a core dump from my 
>> brain...)
>>
>> The purpose of fs-rep-sharing is to eliminate storing duplicate
>> representations in the fs.  Instead of just writing a representation 
>> to the database, we first check a cache of 
>> checksum->representation-key mappings, and see if we can reuse an 
>> existing representation.  This is currently implemented for Berkeley DB.
>>
>> Because the representations themselves are mutable in BDB (i.e., the
>> same representation can be full-text, and later rewritten as a delta 
>> against another full text), a problem arises where we can get circular 
>> delta references.  Consider the following:
>>
>>   +-----+
>>   |     |
>>   |     |
>>   |     |
>>   +-----+
>>     A_1
>>
>> A_1 is the first representation of the object, so we store the full text.
>>
>>   +-----+        +-----+
>>   |     | delta  |     |
>>   |     | ---->  |     |
>>   |     |        |     |
>>   +-----+        +-----+
>>     A_1            A_2
>>
>> A_2 is a subsequent representation of the object, so we store its full
>> text, and store A_1 as a delta against it.  Now, imagine we revert back
>> to the previous text in a later revision.  Without rep-sharing we get:
>>
>>   +-----+       +-----+       +-----+
>>   |     | delta |     | delta |     |
>>   |     | ----> |     | ----> |     |
>>   |     |       |     |       |     |
>>   +-----+       +-----+       +-----+
>>     A_1           A_2           A_3
>>
>> A_3 is now stored full text and A_2 is a delta against A_3, with A_1
>> being unchanged.  However, with rep-sharing, we get:
>>
>>      +-------------------------+
>>      v                         |
>>   +-----+       +-----+        |
>>   |     | delta |     | delta +-+
>>   |     | ----> |     | ----> |o|
>>   |     |       |     |       +-+
>>   +-----+       +-----+
>>     A_1           A_2         A_3
> 
> Why not have A_3 be full text and A_1 refers to A_3.  That way all 
> deltas and references are from a small revision to a large revisions 
> and you can't have loops?

In thinking about this more, it seems like we'd also need a table of 
REP-KEY -> NODE-REVISION mappings to know which node-revisions to update 
with the new full-text representation.  Does that sound right?

-Hyrum