You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Karl Fogel <kf...@galois.collab.net> on 2001/02/21 15:51:57 UTC

question about fs node IDs

A while ago, Greg Hudson asked (in a message I can't seem to find) why
it's useful to generate IDs of successor nodes in such a way that you
can tell what they succeeded merely by examining the IDs.  This
paragraph in `libsvn_fs/structure' is about that:

   Since nodal revision numbers increase by one each time a delta is
   added, we can compute how many deltas separate two related node
   revisions simply by comparing their ID's.  For example, the
   distance between 100.10.3.2 and 100.12 is the distance from
   100.10.3.2 to their common ancestor, 100.10 (two deltas), plus the
   distance from 100.10 to 100.12 (two deltas).

Jim answered Greg, saying (paraphrased) "Try writing the deltification
code and I think you'll see why it comes in handy."

I'm wondering in exactly what circumstance it's handy.  In particular,
`structure' documents one kind of node representation:

   ("younger" DELTA CHECKSUM)
       We store a delta that transforms the next younger revision of the
       node into this revision.  To find the contents of this revision:
       - We find the unparsed form of the NODE-REVISION skel for the next
         younger revision.
       - We apply DELTA to that string, yielding the unparsed form of the
         NODE-REVISION skel for this revision.

I can see how, given this representation, being able to deduce the ID
of the next-younger node is really handy.  However, for a long time
I've been thinking that a more general representation would be:

   ("delta" BASE-ID DELTA CHECKSUM)
       We store a delta that transforms the contents node BASE-ID into
       this revision.  To find the contents of this revision:
       - We get the unparsed form of the NODE-REVISION skel for the
         node identified by BASE-ID (it may itself require undeltification)
       - We apply DELTA to that string, yielding the unparsed form of the
         NODE-REVISION skel for this revision.

This would leave the door open for arbitrary efficiencies later; for
example, if the filesystem somehow finds out that this node is only a
tiny diff away from some other (perhaps unrelated) node, it could
replace this node's representation with a delta of the above form.

If we used such a representation, is there still an advantage to the
current ID scheme?  (Note: I'm not sure there any disadvantages to it
either, and it does preserve lineage information, so I think I'd still
be uncomfortable changing it.  But as we get deeper into the
filesystem, we'll probably want a clear & complete understanding of
what advantages it brings us.)

-K

Re: question about fs node IDs

Posted by Kevin Pilch-Bisson <ke...@pilch-bisson.net>.

On Wed, Feb 21, 2001 at 09:51:57AM -0600, Karl Fogel wrote:
> A while ago, Greg Hudson asked (in a message I can't seem to find) why
> it's useful to generate IDs of successor nodes in such a way that you
> can tell what they succeeded merely by examining the IDs.  This
> paragraph in `libsvn_fs/structure' is about that:
> 
>    Since nodal revision numbers increase by one each time a delta is
>    added, we can compute how many deltas separate two related node
>    revisions simply by comparing their ID's.  For example, the
>    distance between 100.10.3.2 and 100.12 is the distance from
>    100.10.3.2 to their common ancestor, 100.10 (two deltas), plus the
>    distance from 100.10 to 100.12 (two deltas).
> 
> Jim answered Greg, saying (paraphrased) "Try writing the deltification
> code and I think you'll see why it comes in handy."
> 
> I'm wondering in exactly what circumstance it's handy.  In particular,
> `structure' documents one kind of node representation:
> 
>    ("younger" DELTA CHECKSUM)
>        We store a delta that transforms the next younger revision of the
>        node into this revision.  To find the contents of this revision:
>        - We find the unparsed form of the NODE-REVISION skel for the next
>          younger revision.
>        - We apply DELTA to that string, yielding the unparsed form of the
>          NODE-REVISION skel for this revision.
> 
> I can see how, given this representation, being able to deduce the ID
> of the next-younger node is really handy.  However, for a long time
> I've been thinking that a more general representation would be:
> 
>    ("delta" BASE-ID DELTA CHECKSUM)
>        We store a delta that transforms the contents node BASE-ID into
>        this revision.  To find the contents of this revision:
>        - We get the unparsed form of the NODE-REVISION skel for the
>          node identified by BASE-ID (it may itself require undeltification)
>        - We apply DELTA to that string, yielding the unparsed form of the
>          NODE-REVISION skel for this revision.
> 
> This would leave the door open for arbitrary efficiencies later; for
> example, if the filesystem somehow finds out that this node is only a
> tiny diff away from some other (perhaps unrelated) node, it could
> replace this node's representation with a delta of the above form.

I like this idea much better than the "younger" form, which as I see it,
is a holdover from the days of RCS.  Why should we only be able to have
deltas with respect to neighbourly revisions? 

> 
> If we used such a representation, is there still an advantage to the
> current ID scheme?  (Note: I'm not sure there any disadvantages to it
> either, and it does preserve lineage information, so I think I'd still
> be uncomfortable changing it.  But as we get deeper into the
> filesystem, we'll probably want a clear & complete understanding of
> what advantages it brings us.)
> 
Not having been too involved in the fs code, I am not sure about
remaining advantages, but like you, I also can't see any disadvantages.

All in all, I think this is a good suggestion.

> -K

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kevin Pilch-Bisson                    http://www.pilch-bisson.net
     "Historically speaking, the presences of wheels in Unix
     has never precluded their reinvention." - Larry Wall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~