You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Bill Tutt <ra...@lyra.org> on 2002/05/29 23:34:41 UTC

RE: Re: svnadmin deltify that evil thing



> From: cmpilato@collab.net [mailto:cmpilato@collab.net]
> 
> <<< Bringing this to the list so we can get more heads on it >>>
> 
> "Bill Tutt" <ra...@lyra.org> writes:
> 
> > As I said in AIM TxnIDs don't have any required ordering. They could
be
> > GUIDs, or reused integers, or whatever. The TxnIDs could also be
> > pending, so we can't use that as a guide.
> 
> There's no reason why we can't state that TxnIDs and CopyIDs have
> ordering, though.  Because they do have ordering right now -- they are
> implemented as base-36 numbers ordered by the order in which they were
> created.
> 
> I'll come back to this.
> 
> > I think three straightforward and clean options are:
> > * Add "Deltified against NodeRevisionPK" to the DELTA skel
definition.
> 
> I don't see the usefulness of this.  We won't have DELTA skels until
> *after* we deltify, and we won't be deltifying until *after* we find
> out what to deltify against.  And finding out what to deltify against
> is the whole issue here. :-)
> 
> > * Or, make svnadmin deltify's syntax look like this:
> >   svnadmin deltify deltasrc deltarev srcpath srcrev
> 
> This won't work, either.  The point of `svnadmin deltify' is force a
> path (perhaps recursively) 
> to be stored as a delta (provided it is
> space-wise more efficient to do so), instead of as fulltext.  Your new
> calling semantics break down when:
> 
>    - some path A in DELTAREV didn't change in SRCPATH (the most common
>      case, by far).
> 

Nothing wrong with that. The delta is empty, and we don't have any work
to do, oh darn.
 
>    - some path A in DELTAREV doesn't exist in SRCPATH, but was copied
>      elsewhere, or revived in a younger revision (this is an
>      admittedly weird case)
> 

This indeed is annoying, let's not bother letting them recurse. Let's
just deltify the item that they told us to deltify. You could also
require that the NodeIDs of that the (DELTAPATH, DELTAREV) and (SRCPATH,
SRCREV) are equivalent.

> > * Realize that people won't care about this very often, let them
suffer
> >   and code as above.
> 
> Perhaps this should read: Realize that most people won't care about
> having this functionality at all, and do away with `svnadmin
> deltify/undeltify' altogether.  :-)  I almost like the sound of that!
> 

Works for me. Datastores should be as self-maintaining as possible.
External tuning knobs are bad.

> > Given those above options and facts, I'd pick #2 atm. I think we
have
> > much more important thigns for you to work on than svnadmin deltify.
> >
> > If we ever decided to do skip lists based on perf data for our
combiner
> > we'd need to add #1 in any event, but let's wait until that time
comes
> > for that.
> 
> Now, back to granting an order to TxnIDs and CopyIds.  If we can
> loosen their *definition* to match *reality*, we will gain a really
> straightforward algorithm for finding successors, the one I described
> on IM.
> 
> We know the following things:
> 
>    - To get a successor, you must either clone an immutable node, or
>      copy an immutable node, therefore
> 
>    - No successor of node revision NR has a TXN_ID < NR's TXN_ID, and
> 

Actually this can be false for directories, because of auto-merging.
e.g.:
User A begins TXN1, which contains a modification to /foo. (adds file A)
User B begins TXN2, which contains a modification to /foo. (adds file B)
User B commits TXN2.
User A commits TXN1.

Auto-merging causes this TXNID ordering oddity.

However, this property does hold in the face of a subsequent copy.
(differing CopyIDs)

>    - No successor of node revision NR has a COPY_ID < NR's COPY_ID.
> 

If CopyIDs are monotonically increasing integers this is true. 

>    - Also, the `nodes' table is sorted first by NODE_ID, then by
>      COPY_ID, and then by TXN_ID (as described in `structure').
> 
> What we do not know:
> 
>    - The relative "age" of any two successors of a NR
> 

Not without also looking at the associated Repository Revisions that's
correct.

> As it turns out, for the purposes of svn_fs_deltify, we don't need
> *the oldest successor*.  We just need a successor to deltify against,
> and we don't want to look too far (though, technically speaking, we
> don't even *really* need a successor...you could deltify against
> anything you want to).  I think the following algorithm will get us a
> successor for NR, which is likely (though not guaranteed) to be
> relatively "near" NR.
> 
>    Setup a cursor on NR's node revision id in the `nodes' table;
>    Advance cursor to next row;
>    THIS_NR = Current cursor location;
>    If THIS_NR.NodeId != NR.NodeId:
>       /* unrelated node, no more node revisions of NR */
>       return FAILURE;
>    If THIS_NR.CopyId == NR.CopyId:
>       && THIS_NR.TxnId is not a pending transaction:
>       /* same node_id, same copy_id, must be different (older!) txn_id
*/
I think you mean younger or older txn_id due to auto-merging, but yes,
it will certainly work as the source of a delta. :)

>       return SUCCESS, THIS_NR;
>    ELSE:
>       DO:
>          IF THIS_NR.TxnId > NR.TxnId:
This is implied by the data model, and the column ordering of the table.
i.e.:
having /foo with NodeID1 have the following data in the repository is
illegal:
NodeID1.C1.T1
NodeID1.C2.T1 

That is: While NodeID.CopyID.TxnID is indeed a unique key for the entire
table. The number of rows that have a given NodeID.TxnID with the same
CopyID is 0.

>          && THIS_NR.TxnId is not a pending transaction:
>             /* same node_id, older copy_id, older txn_id */
>             return SUCCESS, THIS_NR;
>          Advance cursor to next row;
>          THIS_NR = Current cursor location;
>       WHILE (THIS_NR.NodeId == NR.NodeId)
>    return FAILURE;
> 

To sum up: Yes, your algorithm works, but it might incorrectly fail,
where mine never will. The major simplification of your algorithm is
that it doesn't have to deal with calculating the appropriate successor
CopyID. That doesn't seem simplification enough to bother not using the
ordering.

> However, I realize that adding ordering to those IDs is probably not a
> popular thought.  :-)
> 

Indeed, this is so. :)

Bill



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svnadmin deltify that evil thing (SQL internal deltification branch, Alpha company)

Posted by "Glenn A. Thompson" <gt...@cdr.net>.

>

Hey,

>
>
> Subversion is complex.
> Lather, rinse repeat. :)
>

Or the newbie variation:
What the hell have I gotten myself into.
Later, rinse, repeat :-)

A new thread ?!?

OK so this SQL implementation is going to be a little harder for me than I
thought.  No problemo. I will keep working it.
Anyway, all these deltification discussions have made me feel I should bring
to light something I've been playing with.
As I'm still a Subversion Feebe, I would have waited until I better
understood subversion if it weren't for this thread.

Anyway,
For a SQL scheme it seems that having directories and properties rolled up
into a delta ball made adhoc queries difficult at best.
So I'm kicking around the idea of list data being stored in one of two
additional tables.  I'm not happy with the names but for now who cares.

Hanging off the Reps table (perhaps Revs?) are collectionAction (for
properties) and nodeCollectionAction (for directories).  Think of a row in
these tables as a method invocation on a list object.  Two methods would be
supported; "add" and "delete".

The tables look something like this:
collectionAction (
coll_id        large number,
action         set("add","delete"),
item_name  text,
item_value   text
)

nodeCollectionAction (
coll_id        largenumber,
action         set("add","delete"),
chld_nod    largenumber,
chld_cpy    largenumber,
chld_txn     largenumber
)


There also needs to be a flag somewhere which indicates that a particular
node-rev is a complete rev with respect to the list.  The moral equiv of
full text if you will.  As such it would contain only add rows for a given
data_id.  These "baselines" could be manufactured anywhere throughout a
lists history.  I also considered having a "has" action which makes this
more obvious.

So the current items in a list would be derived by applying "adds" and
"deletes" to a complete rev using SQL statments.  I assumed that I could use
the nodes predecessor to go back in time.  However, based on what I'm
reading that would be badness. So should I have it explicitly based on a
rep_id and so on and so on?  That seems to track more closely with deltas.
This would also allow me to use a "group by" clause using MAX(coll_id) to
resolve "add" or "delete" conflicts involving the same node-rev for
nodeCollectionAction or item_name in collectionAction.

These heirarchical queries are quite uuuuuuugggly without "connect by".
Does anyone know of any DB besides Oracle that has them?  Can you say temp
tables?

Does anyone have any major grief with the above chit?

Thanks,
gat





---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: Re: svnadmin deltify that evil thing

Posted by Bill Tutt <ra...@lyra.org>.

> From: cmpilato@collab.net [mailto:cmpilato@collab.net]
> 
> "Bill Tutt" <ra...@lyra.org> writes:
> 
> > > This won't work, either.  The point of `svnadmin deltify' is force
a
> > > path (perhaps recursively)
> > > to be stored as a delta (provided it is
> > > space-wise more efficient to do so), instead of as fulltext.  Your
new
> > > calling semantics break down when:
> > >
> > >    - some path A in DELTAREV didn't change in SRCPATH (the most
common
> > >      case, by far).
> > >
> >
> > Nothing wrong with that. The delta is empty, and we don't have any
work
> > to do, oh darn.
> 
> Hm.  Weinie's way out.  :-(
> 
> > >    - some path A in DELTAREV doesn't exist in SRCPATH, but was
copied
> > >      elsewhere, or revived in a younger revision (this is an
> > >      admittedly weird case)
> > >
> >
> > This indeed is annoying, let's not bother letting them recurse.
Let's
> > just deltify the item that they told us to deltify. You could also
> > require that the NodeIDs of that the (DELTAPATH, DELTAREV) and
(SRCPATH,
> > SRCREV) are equivalent.
> 
> Well, yeah, we could do that, I suppose (not recurse, that is).  At
> this stage, though, the "tool" is becoming so useless that its
> existence is questionable.
> 

Yeah, we said that in the last email didn't we? :)
Let's toss it!

> > > > * Realize that people won't care about this very often, let them
> suffer
> > > >   and code as above.
> > >
> > > Perhaps this should read: Realize that most people won't care
about
> > > having this functionality at all, and do away with `svnadmin
> > > deltify/undeltify' altogether.  :-)  I almost like the sound of
that!
> > >
> >
> > Works for me. Datastores should be as self-maintaining as possible.
> > External tuning knobs are bad.
> 
> :-) The problem is that without a delta combiner, people like me run
> out of stack trying to undeltify some old revisions.  The way I see
> it, we either need an amazing external deltifier, or we should
> internally do our deltification in a less prone-to-infinite-growth
> fashion.  Perhaps this is where the skip-lists come into play.
> 

Possibly, although my take on it (if we don't have a delta combiner) for
1.0 would be: Screw it, pretend we're Vesta and just don't deltify
anything. As Vesta correctly points out: disk space is cheap, why
complicate something needlessly? ;) I'm sure there are are lots of ther
things that need help besides just undeltification performance work. I'd
honestly only want to try the skip-list stuff if we figured out that a
delta-combiner just wasn't possible with svndiff.

> > > We know the following things:
> > >
> > >    - To get a successor, you must either clone an immutable node,
or
> > >      copy an immutable node, therefore
> > >
> > >    - No successor of node revision NR has a TXN_ID < NR's TXN_ID,
and
> > >
> >
> > Actually this can be false for directories, because of auto-merging.
> > e.g.:
> > User A begins TXN1, which contains a modification to /foo. (adds
file A)
> > User B begins TXN2, which contains a modification to /foo. (adds
file B)
> > User B commits TXN2.
> > User A commits TXN1.
> >
> > Auto-merging causes this TXNID ordering oddity.
> 
> This situation currently doesn't make my assertion false (and now I'm
> thinking that might be bad).
> 

You're algorithm works, but it can generate false failures because your
assertion is bad. (i.e. the only possible successor might be on a lower
TxnID)

> Currently, the result of this operation is that User A's /foo will
> have a predecessor of the /foo in HEAD at the time TXN1 begins.  User
> B's /foo will have the same predecessor as User A's.  

No, auto-merging should commit the predecessor nodes correctly. 
i.e.:

When TXN2 (User B) commits the predecessor is /foo in HEAD at the time
TXN1 begins.

When TXN1 (User A) commits, it's predecessor starts out as /foo in HEAD
at the time TXN1 begins, but gets fixed up by auto-merging to be /foo in
HEAD at the time TXN2 commits.

Whenever I say "auto-merging" I mean the fact that directories have
looser up-to-date-ness rules than files. i.e. the infamous merge()
routine in libsvn_fs\tree.c.

> Regardless of
> the commit order, A's /foo is not a successor of B's /foo, nor
> vice-versa.  So, given the ordering in your example, asking A for it's
> closest successor will return "there ain't one".  Same for B.
> 
> Um...this is incorrect behavior, isn't it?
> 

If the code really did this then yeah, it'd be incorrect, yes.
Thankfully, it doesn't.

The reason this works is because the auto-merging of directory changes
into HEAD when TXN1 is commiting causes newly mutable NodeRevisions to
be created to be the target of the merge() code in tree.c. And of
course, creating a newly mutable NodeRevision initializes the
predecessor correctly. It just so happens that this new NodeRevision has
a TXNID of TXN1. Have I mentioned recently how the word Transaction and
Perforce Changeset concepts are practically synonymous now in the data
model? :)

Subversion is complex.
Lather, rinse repeat. :)

Am I making sense now?
Bill

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svnadmin deltify that evil thing

Posted by cm...@collab.net.

"Bill Tutt" <ra...@lyra.org> writes:

> > This won't work, either.  The point of `svnadmin deltify' is force a
> > path (perhaps recursively) 
> > to be stored as a delta (provided it is
> > space-wise more efficient to do so), instead of as fulltext.  Your new
> > calling semantics break down when:
> > 
> >    - some path A in DELTAREV didn't change in SRCPATH (the most common
> >      case, by far).
> > 
> 
> Nothing wrong with that. The delta is empty, and we don't have any work
> to do, oh darn.

Hm.  Weinie's way out.  :-(  

> >    - some path A in DELTAREV doesn't exist in SRCPATH, but was copied
> >      elsewhere, or revived in a younger revision (this is an
> >      admittedly weird case)
> > 
> 
> This indeed is annoying, let's not bother letting them recurse. Let's
> just deltify the item that they told us to deltify. You could also
> require that the NodeIDs of that the (DELTAPATH, DELTAREV) and (SRCPATH,
> SRCREV) are equivalent.

Well, yeah, we could do that, I suppose (not recurse, that is).  At
this stage, though, the "tool" is becoming so useless that its
existence is questionable.

> > > * Realize that people won't care about this very often, let them suffer
> > >   and code as above.
> > 
> > Perhaps this should read: Realize that most people won't care about
> > having this functionality at all, and do away with `svnadmin
> > deltify/undeltify' altogether.  :-)  I almost like the sound of that!
> > 
> 
> Works for me. Datastores should be as self-maintaining as possible.
> External tuning knobs are bad.

:-) The problem is that without a delta combiner, people like me run
out of stack trying to undeltify some old revisions.  The way I see
it, we either need an amazing external deltifier, or we should
internally do our deltification in a less prone-to-infinite-growth
fashion.  Perhaps this is where the skip-lists come into play.

> > We know the following things:
> > 
> >    - To get a successor, you must either clone an immutable node, or
> >      copy an immutable node, therefore
> > 
> >    - No successor of node revision NR has a TXN_ID < NR's TXN_ID, and
> > 
> 
> Actually this can be false for directories, because of auto-merging.
> e.g.:
> User A begins TXN1, which contains a modification to /foo. (adds file A)
> User B begins TXN2, which contains a modification to /foo. (adds file B)
> User B commits TXN2.
> User A commits TXN1.
> 
> Auto-merging causes this TXNID ordering oddity.

This situation currently doesn't make my assertion false (and now I'm
thinking that might be bad).

Currently, the result of this operation is that User A's /foo will
have a predecessor of the /foo in HEAD at the time TXN1 begins.  User
B's /foo will have the same predecessor as User A's.  Regardless of
the commit order, A's /foo is not a successor of B's /foo, nor
vice-versa.  So, given the ordering in your example, asking A for it's
closest successor will return "there ain't one".  Same for B.

Um...this is incorrect behavior, isn't it?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: RE: Re: svnadmin deltify that evil thing

Posted by Bill Tutt <ra...@lyra.org>.

> >
> >    Setup a cursor on NR's node revision id in the `nodes' table;

If you could setup a cursor for NR's "NodeID.CopyID." with no
TransactionID specified, then you could search for > TxnIDs, and that
would remove the false positive case at the possible cost of having to
search more rows looking for a > TxnID case.

Still doesn't seem like you're gaining enough
benefit/simplification/speedup to do it this way.

Bill


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org