Posted to dev@subversion.apache.org by Tom Lord <lo...@regexps.com> on 2002/12/16 09:51:45 UTC

revnum considered harmful




I think I see a flaw in the semantic design of svn (revnums) that I
believe is likely to impose a serious limit on performance in the
future, when people try to scale svn for large but realistic
situations:

revnum imposes a total order on all write transactions.

If I'm reading the code correctly, the global revision number of a
write transaction is determined early in the transaction -- before
most of the work is done (I think the ra_svn protocol calls this
step `target_rvn').

Let's suppose that we have two concurrent write transactions.  One of
these precedes the other (by revnum), and which of the two comes first
is determined before the scope of either transaction is known.

Let's suppose that the first of two concurrent transactions is a
large transaction, the second a short transaction.

The short transaction cannot reasonably complete until the long
transaction completes.  Consider:


	client A 	long txn	revision N
	client B	short txn	revision N+1

	Timeline:

	0... A starts ..... B starts ..... B ready to finish ..... ->

	If B completes with A still running, client B expects to be
	able to query the repository at revision N+1.  But the part of
	the repository written by A is not yet known because it is not
	yet known whether A will succeed or fail.  While other clients
	might be limited to seeing only N-1 until the fate of A is
	known, client B expects to see the result of its own write.
	(This invites an alternative conclusion that while B can
	complete early, certain subsequent reads concurrent with A
	must be delayed -- but that alternative doesn't substantially
	change the conclusions reached in this message.)
	

	If B completes, aborting A, then time already spent on A
	is needlessly wasted.

In general, the time of completion of each of a set of concurrent
writes is forced, forever, by the revnum design, to be the time of
completion of the latest-finishing transaction in that set.
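
A minimal Python sketch of the A/B example above, under a toy rule
that is only an assumption for illustration: revnums are handed out
at transaction start, and revision k cannot be published until every
lower-numbered revision has been resolved.  The function name and the
timings are made up; this is not svn code.

```python
def visible_times(finish_times):
    """finish_times[i] is when the txn holding the i-th revnum finishes its work.

    Toy rule: a revision becomes visible only once all lower-numbered
    revisions are resolved, so visibility is the running maximum.
    """
    visible = []
    latest = 0
    for t in finish_times:
        latest = max(latest, t)
        visible.append(latest)
    return visible


# Client A (long txn, revision N) finishes its work at t=100;
# client B (short txn, revision N+1) finishes its work at t=5.
a_visible, b_visible = visible_times([100, 5])
print(a_visible, b_visible)
```

In this model B's result cannot be published before the fate of A is
known, even though B's own work was done long before.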

Consider a shop with O(100) developers and a busy phase of
development: lots of checkins over just a few days.  Some checkins
modify just a few files, others modify hundreds or thousands.

Because of revnum, whenever someone begins to check in a large
modification, the repository "freezes up" for all other committers
until that large checkin completes.

This is by no means a _necessary_ state of affairs.  Consider this
usage scenario: we have a repository that holds multiple projects and
branches.  Our two transactions each modify a different project or
branch.  Thus, in terms of the revision control data that users care
about, the ordering of these two transactions is immaterial because
they modify disjoint sets of data.  If we must regard them as
well-ordered, then we may as well let the order be determined by which
transaction _completes_ first, not which one starts first.

It would seem natural (at least under some conditions which can be
easily detected by the server) to let the short transaction complete
before the longer -- on the reasonable presumption that the longer
transaction will not overlap that data.  But no -- the revnum semantic
will not allow that because the server does not know in advance that
the two transactions do not overlap.
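
For illustration, one way a server could test whether two
transactions overlap once their scopes are known is to compare the
sets of paths they touch.  This is a hedged Python sketch; the
function names and the example paths are hypothetical and do not
come from svn.

```python
def is_ancestor_or_same(a, b):
    """True if path a equals path b or is a directory ancestor of b."""
    a = a.rstrip("/")
    b = b.rstrip("/")
    return b == a or b.startswith(a + "/")


def overlaps(paths_a, paths_b):
    """True if any path touched by one txn is the same as, above, or below
    a path touched by the other txn."""
    return any(
        is_ancestor_or_same(p, q) or is_ancestor_or_same(q, p)
        for p in paths_a
        for q in paths_b
    )


# Two txns on different projects: ordering between them is immaterial.
print(overlaps(["/projA/trunk/foo.c"], ["/projB/trunk/bar.c"]))
# One txn touches a directory that contains the other txn's file.
print(overlaps(["/projA/trunk"], ["/projA/trunk/foo.c"]))
```

The catch described above still applies: the check is only possible
once both path sets are known, which is after the revnum has already
been chosen.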

Several solutions are possible.

One is to modify the protocols so that the scope of each transaction
is declared very early.  This would reduce the impact of the problem,
but not eliminate it, as even declaring the scope of a large
transaction may turn out to be expensive.

Another is to modify the protocols so that the revnum of a write is
determined only when the transaction completes.  This would also
reduce the problem, although it still will impose a bottleneck that 
restricts the potential benefits of dividing a single logical
repository across multiple servers (consider a large code shop, or an
"exotic" application such as a large wiki).

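A rough Python sketch of the second option, assuming a single
in-process counter guarded by a lock.  The Repo class and its method
names are hypothetical.  The revnum is drawn only at commit time, so
a short transaction that finishes first gets the lower number.

```python
import itertools
import threading


class Repo:
    def __init__(self):
        self._lock = threading.Lock()
        self._next_revnum = itertools.count(1)
        self.log = []  # (revnum, txn name), in commit order

    def commit(self, txn_name):
        # The revnum is assigned here, when the txn completes,
        # not when it starts.
        with self._lock:
            revnum = next(self._next_revnum)
            self.log.append((revnum, txn_name))
        return revnum


repo = Repo()
# A started first but is still running when B is ready to finish.
rev_b = repo.commit("B (short)")
rev_a = repo.commit("A (long)")
print(rev_b, rev_a)
```

The lock around the counter is the remaining bottleneck: every
commit, on every server sharing the logical repository, still has to
pass through that one point.
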
Both of those imperfect solutions are likely to have a large impact on
the protocols, the server, and clients.  If large impacts are
unavoidable anyway, one might as well consider a third, clean
solution: eliminate revnum entirely.

A strong case can be made that the primary four virtues of svn
repository semantics and presumed performance are:

	1. Space and time efficient tree cloning.

	2. Update methods that facilitate server-side delta-compressed
	   storage.

	3. Change-based update and access methods that reduce network
           traffic.

	4. Transactions permitting multi-file atomic updates.

and, furthermore, a strong case can be made that those virtuous
capabilities are, in and of themselves, quite sufficient to implement
revision control (with either or both arch-like or cvs-like user
interfaces).  Given those four capabilities, a global revnum is not
needed at all (and an alternative has already been presented).

Currently, it is primarily code that is part of svn itself that would
be impacted by elimination of revnum.  QOTD slashdot (paraphrasing):
"If you can't find the time to do it right in the first place, how are
you going to find the time to fix it later?" :-)

I tend to believe that a great deal of care and effort has gone into 
the low-level implementation of repositories (the fs layer?).  That
work and expertise can, I think, be quickly leveraged to implement the
"four virtues" approach.  Similarly, a great deal of care and effort
has gone into a UI layer with a CVS feel -- that too can be quickly
leveraged.  


It would be a mistake for you to believe I care what color you paint
your bikeshed, 
-t

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: revnum considered harmful

Posted by Tom Lord <lo...@regexps.com>.


       Isn't all this just a special case of a larger issue - namely
       that with two transactions running that *may* affect each
       other, one of them *has* to wait for the other to complete (or
       for the system to determine that they do not overlap)?


It's hard to answer that query concisely.

Yes, this is an instance of a well known, general problem from
database theory: Avoid global txn sequence numbers or equivalent
limits on txn concurrency.  It's a fundamental design blunder every
time.

No, this isn't an insurmountable problem -- not by any means -- and I
proposed the general form of a solution in the message you're replying
to ("four virtues"), and many details of a specific solution in my
FSDB message a day or so ago.

Consider not svn specifically, but an FSDB in general.  Two write
transactions modify different parts of the tree.  Do we have to know
in what order those two transactions occur?  Not in general, no.
A concurrent read that spans the region of both writes can force us to
choose a particular order, but there's kind of a quantum mechanics
effect:  if nobody's looking (in ways that matter), then the two
transactions don't have to be ordered.

The ambiguity about txn order is important: it enables lots of
important optimizations.  That's true not only for svn, but for
databases generally.


You suggested:

	  (1) Prioritize the smaller transactions, letting the larger
	  transactions require a re-try (or simply fail) in case of a
	  conflict.

	  (2) Prioritize the larger transactions, letting the smaller
	  transactions require a re-try (or simply fail) in case of a
	  conflict.

	  (3) Finish transactions on a first-come first-served basis.

but left out (4): Use fine-grained locking and/or heuristics based on
partial information about the txns-so-far to decide in what order to
prioritize the two.  This is quite plausible in the case at hand.  You
have to get past the notion of writes propagating all the way up to
new revisions of the / directory, though -- that might be hard if you
are stuck in the current svn mindset.  Cheap tree cloning -- good.
Every txn versioning / -- bad.

I may as well point to arch, in which locks are
per-line-of-development.  Although arch is not an FSDB, the idea of
per-line locking does map to an FSDB in a natural way.
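
A minimal Python sketch of per-line-of-development locking as it
might map onto an FSDB.  This is not arch's or svn's implementation;
the LineLocks class, the commit helper and the line names are all
assumptions made up for the example.

```python
import threading
from collections import defaultdict


class LineLocks:
    """One lock per line of development (for example, per branch path)."""

    def __init__(self):
        self._guard = threading.Lock()          # protects the lock table itself
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, line):
        with self._guard:
            return self._locks[line]


locks = LineLocks()


def commit(line, apply_change):
    # Writers on the same line serialize; writers on different
    # lines never wait for each other.
    with locks.lock_for(line):
        return apply_change()


print(commit("projA/trunk", lambda: "committed to projA/trunk"))
```

Nothing here versions the / directory on every commit, which is the
point: there is no single object that all writers must contend for.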

And, just for fun (cause it's a fun read): There's a neat (~30 year
old) paper by Leslie Lamport relating concepts from special relativity
to synchronizing concurrent threads.  It's not directly about
databases, but the concepts and approach developed there are helpful
here.  Sorry, though, I don't have the precise reference handy (I
think it was in "Communications of the ACM" -- I seem to remember an
orange cover.)


	 My apologies if I've totally missed the point / am talking
	 out of my ass.

Hardly.  This stuff is hella tricky, IMO.  It took, like, 10 or so
revisions of my message before it was close-enough-to-accurate to
consider sending.  Even so, I'm sure a practiced nitpicker could find
enough little flubs to totally distract attention away from the deep
content.  Such is life.

-t

Re: revnum considered harmful

Posted by Peter Schuller <pe...@infidyne.com>.

</delurk>

[snipped as I'm not commenting on any particular part]

Isn't all this just a special case of a larger issue - namely that with
two transactions running that *may* affect each other, one of them *has*
to wait for the other to complete (or for the system to determine that
they do not overlap)?

This is a problem regardless of whether or not one is using revision
numbers, as long as one desires true transaction handling. Now, assuming
there are multiple transactions running and it has not yet been
determined whether they overlap, there are three major possibilities
that I can see:

(1) Prioritize the smaller transactions, letting the larger transactions
require a re-try (or simply fail) in case of a conflict.
(2) Prioritize the larger transactions, letting the smaller transactions
require a re-try (or simply fail) in case of a conflict.
(3) Finish transactions on a first-come first-served basis.

IMO (2) is better than (1), with (3) being the obvious choice. You seem
to be advocating (1) in order not to force committers of small
transactions to wait.

The problem with that approach is that given a large number of small
transactions, a large transaction might *never* be given the chance to
complete!

The same problem exists with (2) but to a lesser extent.

(3) would ensure all transactions have a chance of completing.

How do you propose to get around this problem while maintaining proper
transactional support?

My apologies if I've totally missed the point / am talking out of my
ass.

-- 
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller <pe...@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org





Re: Tracking CVS in SVN [was: Re: revnum (still) considered harmful]

Posted by Blair Zajac <bl...@orcaware.com>.

Branko Čibej wrote:
> 
> Blair Zajac wrote:
> 
> >Zack Weinberg wrote:
> >
> >
> >>Michael Price <mp...@atl.lmco.com> writes:
> >>
> >>
> >>
> >>>I'm glad there are multiple revision control systems in existence
> >>>(variety is the spice of life) but I only ever use one at a time. I can
> >>>safely say that I've NEVER even thought about needing a "smart merging"
> >>>facility to smart merge between different revision control system
> >>>repositories. I doubt I ever will.
> >>>
> >>>
> >>For the past two weeks I've been writing a horrible script to do
> >>exactly this -- between GCC's CVS repository, and my current client's
> >>internal ClearCase repository (they use GCC to build their product).
> >>
> >>Just wanted to point out that it's not totally unheard of.
> >>
> >>I have no opinion on the global revision number thing.
> >>
> >>
> >
> >Would you be interested in sharing that script?  I need to track a
> >public CVS repository in a Subversion repository, and this sounds
> >like the perfect script I need.
> >
> >
> 
> What about the old trick of checking a CVS working copy into Subversion?
> I'm told it works famously.

True.  But you don't get the same history of commits.  If you just
update to HEAD, then you say, I updated to HEAD.  If you want the
individual commits, you're still stuck writing a script to figure
out what each commit was.

Blair

-- 
Blair Zajac <bl...@orcaware.com>
Plots of your system's performance - http://www.orcaware.com/orca/

Tracking CVS in SVN [was: Re: revnum (still) considered harmful]

Posted by Branko Čibej <br...@xbc.nu>.

Blair Zajac wrote:

>Zack Weinberg wrote:
>  
>
>>Michael Price <mp...@atl.lmco.com> writes:
>>
>>    
>>
>>>I'm glad there are multiple revision control systems in existence
>>>(variety is the spice of life) but I only ever use one at a time. I can
>>>safely say that I've NEVER even thought about needing a "smart merging"
>>>facility to smart merge between different revision control system
>>>repositories. I doubt I ever will.
>>>      
>>>
>>For the past two weeks I've been writing a horrible script to do
>>exactly this -- between GCC's CVS repository, and my current client's
>>internal ClearCase repository (they use GCC to build their product).
>>
>>Just wanted to point out that it's not totally unheard of.
>>
>>I have no opinion on the global revision number thing.
>>    
>>
>
>Would you be interested in sharing that script?  I need to track a
>public CVS repository in a Subversion repository, and this sounds
>like the perfect script I need.
>  
>

What about the old trick of checking a CVS working copy into Subversion?
I'm told it works famously.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


cvs2svn incremental mode

Posted by Marko Macek <Ma...@gmx.net>.

Blair Zajac wrote:

>I would think that much of the CVS end of it would be the same.  Is this
>true?
>
>Would it be possible to see it now, if that's appropriate?  (You may be
>able to tell, I'm anxious to get this other CVS repository tracked :)
>  
>

Attached is a quick hack to make cvs2svn work in incremental mode. If 
you have the CVS repository available locally (via rsync?) this can do 
what you wish.

It adds a --incremental mode which is used after the initial conversion 
is done. You need to keep the cvs2svn-data.revs file from the previous 
run to work incrementally.

It applies to latest /branches/cvs2svn-mmacek in the subversion repository.

WARNING: only lightly tested, I suspect a few bugs.

A big problem is that when something goes wrong (disk full, ^C), there
is no way to recover: you need to start from scratch (create a new
repository). A solution to this problem could be saving the CVS
revision numbers in svn properties.

Regards,
Mark

Re: revnum (still) considered harmful

Posted by Blair Zajac <bl...@orcaware.com>.

Zack Weinberg wrote:
> 
> Blair Zajac <bl...@orcaware.com> writes:
> 
> > Zack Weinberg wrote:
> >
> > Would you be interested in sharing that script?  I need to track a
> > public CVS repository in a Subversion repository, and this sounds
> > like the perfect script I need.
> 
> I'm afraid it is (a) not done, and (b) highly specific to ClearCase.
> However, if you are still curious, I'll send you a copy when I'm done
> writing it.


I would think that much of the CVS end of it would be the same.  Is this
true?

Would it be possible to see it now, if that's appropriate?  (You may be
able to tell, I'm anxious to get this other CVS repository tracked :)

Best,
Blair

-- 
Blair Zajac <bl...@orcaware.com>
Plots of your system's performance - http://www.orcaware.com/orca/

Re: revnum (still) considered harmful

Posted by Zack Weinberg <za...@codesourcery.com>.

Blair Zajac <bl...@orcaware.com> writes:

> Zack Weinberg wrote:
>
> Would you be interested in sharing that script?  I need to track a
> public CVS repository in a Subversion repository, and this sounds
> like the perfect script I need.

I'm afraid it is (a) not done, and (b) highly specific to ClearCase.
However, if you are still curious, I'll send you a copy when I'm done
writing it.

zw

Re: revnum (still) considered harmful

Posted by Blair Zajac <bl...@orcaware.com>.

Zack Weinberg wrote:
> 
> Michael Price <mp...@atl.lmco.com> writes:
> 
> > I'm glad there are multiple revision control systems in existence
> > (variety is the spice of life) but I only ever use one at a time. I can
> > safely say that I've NEVER even thought about needing a "smart merging"
> > facility to smart merge between different revision control system
> > repositories. I doubt I ever will.
> 
> For the past two weeks I've been writing a horrible script to do
> exactly this -- between GCC's CVS repository, and my current client's
> internal ClearCase repository (they use GCC to build their product).
> 
> Just wanted to point out that it's not totally unheard of.
> 
> I have no opinion on the global revision number thing.

Would you be interested in sharing that script?  I need to track a
public CVS repository in a Subversion repository, and this sounds
like the perfect script I need.

Best,
Blair

-- 
Blair Zajac <bl...@orcaware.com>
Plots of your system's performance - http://www.orcaware.com/orca/

Re: revnum (still) considered harmful

Posted by Zack Weinberg <za...@codesourcery.com>.

Michael Price <mp...@atl.lmco.com> writes:

> I'm glad there are multiple revision control systems in existence
> (variety is the spice of life) but I only ever use one at a time. I can
> safely say that I've NEVER even thought about needing a "smart merging"
> facility to smart merge between different revision control system
> repositories. I doubt I ever will.

For the past two weeks I've been writing a horrible script to do
exactly this -- between GCC's CVS repository, and my current client's
internal ClearCase repository (they use GCC to build their product).

Just wanted to point out that it's not totally unheard of.

I have no opinion on the global revision number thing.

zw

Re: revnum (still) considered harmful

Posted by Michael Price <mp...@atl.lmco.com>.

Tom Lord writes:
 >   Realistically (imo), _this_ performance problem can only ever really
 >   be important for utterly huge transaction rates.

Even then I doubt revnums will be the performance bottleneck. As such,
this is a non-issue.

 >   It also becomes possible to have "smart merging" technology not be
 >   specific to any particular rev ctl system -- but to instead have
 >   systems be interoperable in this regard.  I can have a branch in my
 >   svn repository of a line in your arch repository and smart merge
 >   between those.

I'm glad there are multiple revision control systems in existence
(variety is the spice of life) but I only ever use one at a time. I can
safely say that I've NEVER even thought about needing a "smart merging"
facility to smart merge between different revision control system
repositories. I doubt I ever will.

 >   So, I think that both the intra-repository and global revision names
 >   for merging purposes should not be based on revnum, but on an
 >   independent, higher-level namespace.

I like the revnums. Were I forced to pick names for every revision I'd
quickly set up a script to increment an integer and stick it in there for
me. The idea that I'd be required to come up with a unique name for
every revision is sickening. Please never do that.

Michael

Re: revnum (still) considered harmful

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

Greg Hudson <gh...@MIT.EDU> writes:
> At any rate, it's most likely pointless to try to design a merge history
> system right now, given that no one is planning to implement it in the
> immediate future (as far as I know).  So this conversation probably
> shouldn't go on too much longer.

Yup.  Let's be real: we're not going to change how revision numbers
work at this point.  If someone wants to do that, they'll need to fork
the project :-).

Suggest that the rest of this discussion happen Post-1.0.

-K

Re: revnum (still) considered harmful

Posted by Greg Hudson <gh...@MIT.EDU>.

On Mon, 2002-12-16 at 19:06, Tom Lord wrote:
> 	Well, here's how I think we'd implement this if we were going to:
> 
> Already, I think you're off on the wrong foot.

I thought your presentation of the idea was pretty complete.  A design 
helps to estimate how much effort it would take, and also helps to
clarify that I read what you wrote.

>> * Ignoring the merge history aspects, it feels like window dressing.
> "Feels", huh?  Hmmm.

Four days ago, you said, "every week or so some detail goes by on the
svn dev list that strikes me as _wrong_".  Is only one of us allowed to
talk about our gut feelings in response to a design idea?

>       * I don't really buy that smart merging between different pieces
>         of revision control software is a realistic or desirable goal.
> 
> arch is an existence proof that it's realistic.

How can a single piece of software be an existence proof of
interoperability?

>         And even if it does come about, using numbers doesn't mean we
>         can't interoperate; it just means that our revision names are
>         less informative.

> That statement makes presumptions about the namespace and how it is
> best used that are, if not false, at least completely unsupported.

You referred to revision names as being "friendly names for the
changesets in question."  That sounds like information to me.

>       * You can no longer compress merge history using revision ranges (or
>         if you do, you lose the benefit of making the merge history
>         readable). 

> No, you are mistaken.  arch can and does compress merge history while
> maintaining a readable record.  You can ask, of a combined merge, "what
> individuals changes are combined here?".  Smart-merging, not just
> human readers, make use of that information.

With revision numbers, you can say that revisions 100-2000 of foo.c have
been merged into bar.c.

With revision names, you might say that revisions feature-foo through
bugfix-bar have been merged into bar.c, but a human will have no idea
whether docfix-baz is in that range or not.  Postprocessing of the
history record might provide that information by asking the repository
that those changesets came from, but postprocessing of revision numbers
can do the same thing.

Unless arch has magical powers, it can't display the individual
changeset names in a history record without either storing that
information close at hand, or asking for it when it is needed.
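
To illustrate what range compression of revision numbers means here,
a small Python sketch follows.  The function name is hypothetical;
it only assumes that revnums are integers with a natural order,
which is what makes a range like 100-2000 meaningful.

```python
def compress(revnums):
    """Collapse a collection of integer revnums into inclusive (first, last) ranges."""
    ranges = []
    for r in sorted(set(revnums)):
        if ranges and r == ranges[-1][1] + 1:
            ranges[-1][1] = r          # extend the current run
        else:
            ranges.append([r, r])      # start a new run
    return [tuple(x) for x in ranges]


# Revisions 100 through 2000 of foo.c merged into bar.c: one entry.
print(compress(range(100, 2001)))
# Non-contiguous merges need one entry per run.
print(compress([1, 2, 3, 7, 9, 10]))
```

Revision names carry no such ordering by themselves, so the same
trick needs extra information (stored or looked up) to say which
names fall inside a range.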

>       At any rate, it's most likely pointless to try to design a merge
>       history system right now, given that no one is planning to
>       implement it in the immediate future (as far as I know).  So
>       this conversation probably shouldn't go on too much longer.

> In other words: "It isn't worth considering whether or not this is
> worth planning for because nobody is currently planning for it."
> Interesting.

Earlier today you complained about the "shameful tactic called 'pessimal
reading'", and yet here you are, rewording my argument into a circular
statement by reducing two separate antecedents into the same inspecific
noun ("this"/"it").

I did not say "nobody is planning to implement revision names, so it's
pointless to discuss whether we want revision names."  I said, "nobody
is planning to implement merge history soon, and you presented revision
names as a prerequisite for merge history, so it's pointless to discuss
whether we would want revision names in a merge history system."

Re: revnum (still) considered harmful

Posted by Tom Lord <lo...@regexps.com>.


       > (a) the (much reduced) performance limitation:

       I'm not sure how your hypothetical distributed repository is
       going to determine that transactions are non-overlapping more
       cheaply than it can settle revision numbers.  But you've
       admitted this is a small issue.


They can decide in advance by tentatively partitioning regions of the
repository among themselves, coordinating synchronously only as a
fallback for txns that span the tentative boundaries.
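
A hedged Python sketch of that idea: each server tentatively owns
some path prefixes, a txn whose paths all fall under one owner is
handled locally, and anything else falls back to synchronous
coordination.  The partition table, server names and function names
are invented for the example.

```python
def owner_of(path, partitions):
    """Return the server owning the longest matching prefix, or None."""
    best, best_len = None, -1
    for prefix, server in partitions.items():
        p = prefix.rstrip("/")
        if (path == p or path.startswith(p + "/")) and len(p) > best_len:
            best, best_len = server, len(p)
    return best


def route(txn_paths, partitions):
    """Decide whether a txn can be handled by one server on its own."""
    owners = {owner_of(p, partitions) for p in txn_paths}
    if len(owners) == 1 and None not in owners:
        return ("local", owners.pop())
    return ("coordinate", None)   # spans boundaries: synchronize as a fallback


partitions = {"/projA": "server1", "/projB": "server2"}
print(route(["/projA/trunk/a.c", "/projA/trunk/b.c"], partitions))
print(route(["/projA/trunk/a.c", "/projB/trunk/c.c"], partitions))
```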

The performance issue is small for source code management.  It isn't a
small issue for other quite plausible and valuable applications of
FSDB-style technology.



	>   So, I think that both the intra-repository and global
	>   revision names for merging purposes should not be based on
	>   revnum, but on an independent, higher-level namespace.

	Well, here's how I think we'd implement this if we were going to:

Already, I think you're off on the wrong foot.  The namespace is
useful to tools adjacent to revision control, not just revision
control itself.  It is something that can have and plausibly deserves
a "stand alone" design -- independent of revision control technology.
The first question isn't "how do we implement it?", but "what is the
form and function of this namespace? -- what is it exactly?"  You
can't really figure out how to implement it until you understand in a
deeper way what it is.



    I don't really like this idea because:

      * Ignoring the merge history aspects, it feels like window
        dressing.

"Feels", huh?  Hmmm.


      * I don't really buy that smart merging between different pieces
        of revision control software is a realistic or desirable goal.

arch is an existence proof that it's realistic.   Read the recent
project-administrative messages on gcc list (and think about them) to
begin to get a sense of why it's desirable.  Linux kernel development
also provides some relevant development patterns.


        And even if it does come about, using numbers doesn't mean we
        can't interoperate; it just means that our revision names are
        less informative.

That statement makes presumptions about the namespace and how it is
best used that are, if not false, at least completely unsupported.


      * You can no longer compress merge history using revision ranges (or
        if you do, you lose the benefit of making the merge history
        readable). 

No, you are mistaken.  arch can and does compress merge history while
maintaining a readable record.  You can ask, of a combined merge, "what
individual changes are combined here?".  Smart-merging, not just
human readers, makes use of that information.


	I'm already concerned about the bulk of merge history information given
	that we may get stuck storing it for each file.

Well then, that's something to figure out for sure then, isn't it?


      At any rate, it's most likely pointless to try to design a merge
      history system right now, given that no one is planning to
      implement it in the immediate future (as far as I know).  So
      this conversation probably shouldn't go on too much longer.

In other words: "It isn't worth considering whether or not this is
worth planning for because nobody is currently planning for it."
Interesting.

-t


Re: revnum (still) considered harmful

Posted by Greg Hudson <gh...@MIT.EDU>.

On Mon, 2002-12-16 at 16:29, Tom Lord wrote:
> (a) the (much reduced) performance limitation:

I'm not sure how your hypothetical distributed repository is going to
determine that transactions are non-overlapping more cheaply than it can
settle revision numbers.  But you've admitted this is a small issue.

>   So, I think that both the intra-repository and global revision names
>   for merging purposes should not be based on revnum, but on an
>   independent, higher-level namespace.

Well, here's how I think we'd implement this if we were going to:

  * Commits would acquire an optional parameter for the revision name.

  * The revisions table would contain mappings from names as well as
revnums.  (A revnum would map to ("revision" TXN NAME); a name would map
to ("revision" TXN REVNUM).  Or maybe they'd both map to the identical
skel containing both.  Doesn't matter much.)

  * Revision specifications could be given as names as well as the
current options (numbers, dates, HEAD, etc.).  An ra method for
get-named-rev would be needed alongside get-latest-rev and
get-dated-rev.  And possibly a method to get the name given the number,
given the next step.

  * When it comes time to store merge history, use <guid,name> tuples
instead of <guid,number> tuples.
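
As a rough sketch of the steps above, here is the "both map to the
identical record" variant in Python.  This is not the real revisions
table or skel format; the Revisions class, its fields and the sample
names are assumptions, with get_named_rev named after the ra method
suggested above.

```python
class Revisions:
    """Toy revisions table indexed by revnum and, optionally, by name."""

    def __init__(self):
        self.by_num = {}
        self.by_name = {}

    def add(self, revnum, txn, name=None):
        record = {"txn": txn, "revnum": revnum, "name": name}
        self.by_num[revnum] = record
        if name is not None:
            if name in self.by_name:
                raise ValueError("revision name already in use: " + name)
            self.by_name[name] = record   # same record, second index
        return record

    def get_named_rev(self, name):
        return self.by_name[name]["revnum"]

    def get_name(self, revnum):
        return self.by_num[revnum]["name"]


revs = Revisions()
revs.add(7, "txn-a", name="bugfix-bar")   # commit with the optional name
revs.add(8, "txn-b")                      # commit without one
print(revs.get_named_rev("bugfix-bar"), revs.get_name(8))
```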

I don't really like this idea because:

  * Ignoring the merge history aspects, it feels like window dressing.

  * I don't really buy that smart merging between different pieces of
revision control software is a realistic or desirable goal.  And even if
it does come about, using numbers doesn't mean we can't interoperate; it
just means that our revision names are less informative.

  * You can no longer compress merge history using revision ranges (or
if you do, you lose the benefit of making the merge history readable). 
I'm already concerned about the bulk of merge history information given
that we may get stuck storing it for each file.

At any rate, it's most likely pointless to try to design a merge history
system right now, given that no one is planning to implement it in the
immediate future (as far as I know).  So this conversation probably
shouldn't go on too much longer.

Re: revnum (still) considered harmful

Posted by Branko Čibej <br...@xbc.nu>.

Greg Stein wrote:

>On Tue, Dec 17, 2002 at 03:49:13AM +0100, Branko Cibej wrote:
>
>>Greg Stein wrote:
>>
>>>The problem is that txnids are defined as non-integers right now, so they
>>>don't range-compress like revnums do.
>>>
>>Say again? I thought txn id's /were/ integers, they're just not
>>marshalled in base-10 in the repository.
>>
>
>Don't get me started... the FS carries them around as char* values :-(
>
Oh, /that/. I remember the fights we had about that, yup. :-)

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


Re: revnum (still) considered harmful

Posted by Greg Stein <gs...@lyra.org>.

On Tue, Dec 17, 2002 at 03:49:13AM +0100, Branko Cibej wrote:
> Greg Stein wrote:
> 
> >The problem is that txnids are defined as non-integers right now, so they
> >don't range-compress like revnums do.
> >
> Say again? I thought txn id's /were/ integers, they're just not
> marshalled in base-10 in the repository.

Don't get me started... the FS carries them around as char* values :-(

-g

-- 
Greg Stein, http://www.lyra.org/

Re: revnum (still) considered harmful

Posted by Branko Čibej <br...@xbc.nu>.

Greg Stein wrote:

>The problem is that txnids are defined as non-integers right now, so they
>don't range-compress like revnums do.
>
Say again? I thought txn id's /were/ integers, they're just not
marshalled in base-10 in the repository.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


Re: revnum (still) considered harmful

Posted by Greg Stein <gs...@lyra.org>.

On Mon, Dec 16, 2002 at 04:17:58PM -0800, Tom Lord wrote:
>...
>        >   still have either a single thread of execution or a
>        >   distributed commit protocol through which all commits must
>        >   pass.
> 
>        Correct, and we aren't worrying about this right now.
> 
> I understand that.  Although it's tangential to the main points of my
> comments on the 1.0 plans, I'll point out that I think it is worth
> thinking about right now, and here's why: The FSDB sketch I sent this
> list has applications far beyond revision control, including
> applications where very high txn rates are important. At the same
> time, an implementation of that sketch, sufficient for revision
> control, looks from where I sit like a simplification of what svn
> currently has.

For argument's sake, I'll concede these two points as true.

> So, it's worth considering because you can
> simultaneously simplify svn and prepare for applications where huge
> txn rates are important.

The svn architecture isn't going to radically change for 1.0. I don't think
that any of the developers have any interest in doing that. Therefore, if
you would like to build FSDB, then it will need to be a layer on top of the
svn_fs API (rather than below it).

Nobody has performed extensive commit-time benchmarks for SVN right now,
preferring completion of functionality over fine-tuning of performance. Not
to mention benchmarks lie :-) But let's say for argument's sake that we can
only do 10 commits per second on "typical" hardware. I am *SO* fine with
that for a 1.0 release. "High txn rates" isn't really a goal that
I/CollabNet cares much about. I am pretty darn sure there *are* people here
who are, and I am equally sure that they'll work on the problem. "Great!" I
say. But will 1.0 be held up? Will an architecture redesign occur to ensure
that post-1.0 it can hit those rates? I don't think so.

[ yes, there have been a number of benchmarks run, but they're concentrating
  on pretty high-order operations; nothing like what you'd be looking for
  out of an FSDB ]

>     >   Yet within one repository, merge history is expressed wrt. revnum.
>     >   The emerging plan for distributed revision control seems to be aiming
>     >   at recording merge history as <guid,revnum> pairs.
> 
>     Whatever. Those are merely ideas, and they won't become concrete
>     for quite a while. I think it is entirely possible to record the
>     data as <guid,txnid> pairs. Revnum doesn't have to appear.
> 
> [As an aside: did you really mean txnid, not revnum?]

I certainly did. If you use <guid,txnid>, then you would be out of the
conflicting-revnum business. In the current SVN FS data model, the txnid is
the important identifier. The revnum is simply turned into a txnid before
any real work is done. If revnums scare you :-), then use txnid.

The problem is that txnids are defined as non-integers right now, so they
don't range-compress like revnums do. But txnids *will* become integers at
some point, so we'll get range compression back (although it will have holes,
but that's okay as I suspect revnums [as they occur in a merge source] have
holes in the ranges, too).
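
As a rough sketch of that idea (Python, purely illustrative): if the
string txnids can be decoded to integers, runs of consecutive ids
collapse into ranges and the holes simply split the runs.  The base-36
decoding below is an assumption made for the example, and txn_id_to_int
and txn_ranges are hypothetical helpers, not FS functions.

```python
from itertools import groupby
from typing import List, Tuple


def txn_id_to_int(txn_id: str) -> int:
    """Decode a string txn id; base 36 is assumed here for illustration."""
    return int(txn_id, 36)


def txn_ranges(txn_ids: List[str]) -> List[Tuple[int, int]]:
    """Group decoded txn ids into inclusive runs; holes end a run."""
    numbers = sorted({txn_id_to_int(t) for t in txn_ids})
    runs: List[Tuple[int, int]] = []
    # Consecutive values share the same (value - position) difference.
    for _, group in groupby(enumerate(numbers), key=lambda p: p[1] - p[0]):
        members = [value for _, value in group]
        runs.append((members[0], members[-1]))
    return runs
```

For example, the ids x, y, z, 10 and 12 decode to 33, 34, 35, 36 and 38,
which gives one run of four and a singleton after the hole at 37.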

> I think there's now ample evidence that not only doesn't revnum _have_
> to appear in merge history, it _shouldn't_ appear.  "So what?" you
> ask, "This is all in the future, anyway."
> 
> It's not in the future.  It has impacts on UI, on project layout
> within repositories, on repository schema, and on protocols.  Even if
> you want to leave the specific feature of merge history out for now,
> it still has impacts on the features you aren't leaving out.

Yup. It has an impact. And we can solve that later. I'm confident it can be
solved, and I'll also grant that the total time expenditure will be higher
if we defer the thinking on that solution. But I'll *definitely* spend
future time on the problem to get a 1.0 sooner.

It's a simple benefit/cost, and I think you're seeing it whenever the SVN
community talks about SVN 1.0. We get the benefit of a "final" release
sooner, at the cost of more dev work later to compensate for "incorrect"
choices made now.

> Moreover, there's no good reason to leave it in the future.  It's
> basically been solved in prototype form, and it's only a tactical
> effort to figure out how to interpret that prototype in a svn context.

Great. If it is only tactical, then please begin execution :-). Patches and
working code are welcome...

Look. In all seriousness, I believe you have some great ideas. You also
express them well, if at some length. But I think you're also going to have to
step up to the plate and do some coding if you want to see some of these
ideas reduced to practice. *Especially* if you're talking about changing SVN
itself, rather than building on top of it.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: revnum (still) considered harmful

Posted by Tom Lord <lo...@regexps.com>.


       The txn id *is* assigned early. First thing you do when
       building a commit.  Only when the txn is actually committed,
       though, do you associate a revnum with that txn id.

That's what I missed.  I thought the two ids were one and the same.


       >   less serious way).  In particular, if a single repository
       >   is implemented over a distributed database, all of the
       >   participating servers must still synchronize for every
       >   transaction in order to allocate txn numbers -- you'll
       >   still have either a single thread of execution or a
       >   distributed commit protocol through which all commits must
       >   pass.

       Correct, and we aren't worrying about this right now.

I understand that.  Although it's tangential to the main points of my
comments on the 1.0 plans, I'll point out that I think it is worth
thinking about right now, and here's why: The FSDB sketch I sent this
list has applications far beyond revision control, including
applications where very high txn rates are important.   At the same
time, an implementation of that sketch, sufficient for revision
control, looks from where I sit like a simplification of what svn
currently has.   So, it's worth considering because you can
simultaneously simplify svn and prepare for applications where huge
txn rates are important.


    >   Yet within one repository, merge history is expressed wrt. revnum.
    >   The emerging plan for distributed revision control seems to be aiming
    >   at recording merge history as <guid,revnum> pairs.

    Whatever. Those are merely ideas, and they won't become concrete
    for quite a while. I think it is entirely possible to record the
    data as <guid,txnid> pairs. Revnum doesn't have to appear.

[As an aside: did you really mean txnid, not revnum?]

I think there's now ample evidence that not only doesn't revnum _have_
to appear in merge history, it _shouldn't_ appear.  "So what?" you
ask, "This is all in the future, anyway."

It's not in the future.  It has impacts on UI, on project layout
within repositories, on repository schema, and on protocols.  Even if
you want to leave the specific feature of merge history out for now,
it still has impacts on the features you aren't leaving out.

Moreover, there's no good reason to leave it in the future.  It's
basically been solved in prototype form, and it's only a tactical
effort to figure out how to interpret that prototype in a svn context.

-t


Re: revnum (still) considered harmful

Posted by Greg Stein <gs...@lyra.org>.

On Mon, Dec 16, 2002 at 01:29:31PM -0800, Tom Lord wrote:
>...
> An admittedly quick read through the schema document made it seem that
> pending transactions are recorded in the database and that that record
> includes a transaction number -- which implies the txn number is
> assigned early.

The txn id *is* assigned early. First thing you do when building a commit.
Only when the txn is actually committed, though, do you associate a revnum
with that txn id.

>...
> Specifically, it seemed to me that early in
> the transaction, a commit examines the revnum of the repository to
> make sure that the wd is up-to-date wrt that revnum, and refuses to
> proceed if it is not.  That too, implies that the client (effectively)
> knows its new revnum early in the txn.  (I suppose now, in retrospect,
> that the commit is not looking at the global revnum, but only at the
> last revnum at which files being committed previously changed.)

That parenthetical note is correct: we only want to ensure that they are
changing the latest copy of the file/directory. They must be up-to-date for
each file/dir changed before the txn can be committed and receive a revnum.
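
A small Python sketch of that per-path check, for illustration only.
The two dictionaries and the out_of_date_paths helper are hypothetical;
the real check lives in the FS and works on node revisions.

```python
from typing import Dict, List


def out_of_date_paths(
    base_revs: Dict[str, int],     # path -> revnum the working copy is based on
    last_changed: Dict[str, int],  # path -> revnum where the repository last changed it
) -> List[str]:
    """Return the committed paths whose base is older than their last change."""
    return sorted(
        path
        for path, base in base_revs.items()
        if last_changed.get(path, 0) > base
    )
```

Only the paths being committed are compared, so a working copy based on
revision 7 of a.c can commit a.c even when the repository is at
revision 10, as long as a.c itself has not changed since 7.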

There aren't any race conditions in here either. We merge the new changeset
against <current-revnum>. Then we acquire a lock on the revnum->txnid
mapping table. Then we merge against <current-revnum> again, if it changed
from the last merge. Then we alloc a new revnum and associate it with the
txnid, then we release the lock.
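
The sequence above can be sketched as a toy model in Python.  This is
not the svn_fs code: Repository, youngest and the merge callback are
hypothetical, and the callback is assumed to raise on a conflict.

```python
import threading
from typing import Callable, Dict


class Repository:
    """Toy model of the revnum -> txnid table; not the real svn_fs code."""

    def __init__(self) -> None:
        self.revisions: Dict[int, str] = {}  # revnum -> txn id
        self._lock = threading.Lock()        # guards the mapping table

    @property
    def youngest(self) -> int:
        return len(self.revisions)

    def commit(self, txn_id: str, merge: Callable[[int], None]) -> int:
        # 1. Merge against the current revnum without holding the lock.
        merged_against = self.youngest
        merge(merged_against)
        # 2. Acquire the lock on the mapping table.
        with self._lock:
            # 3. Merge again only if another commit landed in the meantime.
            if self.youngest != merged_against:
                merge(self.youngest)
            # 4. Allocate the next revnum and associate it with the txn id.
            revnum = self.youngest + 1
            self.revisions[revnum] = txn_id
        # 5. The lock is released on leaving the with-block.
        return revnum
```

The expensive first merge runs outside the lock; the lock is held only
for the catch-up merge and the revnum allocation.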

>...
>   less serious way).  In particular, if a single repository is
>   implemented over a distributed database, all of the participating
>   servers must still synchronize for every transaction in order to
>   allocate txn numbers -- you'll still have either a single thread of
>   execution or a distributed commit protocol through which all commits
>   must pass.

Correct, and we aren't worrying about this right now.

>...
>   If I'm reading the FAQ correctly ( :-), revnum is, in essence, an
>   implementation detail -- it is "mostly hidden" from users for revision
>   control purposes.

Nah. There is a tension that exists. The revnum rate-of-change should not be
a cause for concern, yet the revnum is also a *very* useful tool. The FAQ
tends towards assuaging concern about revnums, but when people actually
start using SVN, they'll understand their utility quite a bit more.

>   Yet within one repository, merge history is expressed wrt. revnum.
>   The emerging plan for distributed revision control seems to be aiming
>   at recording merge history as <guid,revnum> pairs.

Whatever. Those are merely ideas, and they won't become concrete for quite a
while. I think it is entirely possible to record the data as <guid,txnid>
pairs. Revnum doesn't have to appear.

>...
>   When two related lines are merged or partially merged, those changesets
>   are the ideal "unit of merging".   One might ask "on my branch, what's
>   been merged in from the foo mainline?" and get:

Part of the issue is that SVN imposes a linear ordering to the changesets
and that arbitrary composition is not easily supported. I think with some
work, people could definitely do change-composition-like stuff.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

revnum (still) considered harmful

Posted by Tom Lord <lo...@regexps.com>.


       > You've misunderstood the code (or ghudson's ra_svn protocol
       > is broken, which I highly doubt).

       I think the confusing bit is that the set-target-rev editor
       function is used for updates and similar operations, not for
       commits.

I was confused by reading and misinterpreting the `protocol' file in
the ra_svn directory and the description of the database schema in the
`fs' directory.

An admittedly quick read through the schema document made it seem that
pending transactions are recorded in the database and that that record
includes a transaction number -- which implies the txn number is
assigned early.

The confusion was reinforced by discussion on this list about certain
usage errors / bugs(?).  Specifically, it seemed to me that early in
the transaction, a commit examines the revnum of the repository to
make sure that the wd is up-to-date wrt that revnum, and refuses to
proceed if it is not.  That too, implies that the client (effectively)
knows its new revnum early in the txn.  (I suppose now, in retrospect,
that the commit is not looking at the global revnum, but only at the
last revnum at which files being committed previously changed.)

I think there are still two problems with revnum:  (a) a (much
reduced) performance limitation;  (b) a semantic problem from the
source mgt. perspective.

(a) the (much reduced) performance limitation:

  While assigning revnum late is far better than assigning it early, the
  existence of revnum _still_ limits server scalability (though in a
  less serious way).  In particular, if a single repository is
  implemented over a distributed database, all of the participating
  servers must still synchronize for every transaction in order to
  allocate txn numbers -- you'll still have either a single thread of
  execution or a distributed commit protocol through which all commits
  must pass.

  With no revnum, concurrent, non-overlapping txns can be unordered --
  for example, using a distributed database, synchronization for a set of
  such transactions can be coalesced (reducing the total number of
  syncs) and can take place asynchronously wrt. the txns themselves
  (e.g., well after they have completed and clients have moved on).

  Realistically (imo), _this_ performance problem can only ever really
  be important for utterly huge transaction rates.


(b) the source mgt problem:

  Revnum is harmful for another reason that has nothing to do with
  concurrency.

  If I'm reading the FAQ correctly ( :-), revnum is, in essence, an
  implementation detail -- it is "mostly hidden" from users for revision
  control purposes.

  Yet within one repository, merge history is expressed wrt. revnum.
  The emerging plan for distributed revision control seems to be aiming
  at recording merge history as <guid,revnum> pairs.

  Thus, the plan for merge history keeps track of history in low level
  terms that officially have no high-level rev ctl meaning.

  To understand why that's problematic, it's helpful to consider that
  merge history is not only the underlying support for "smart merging"
  -- it's also a record of reference that humans want to be able to
  read.   It should be expressed in higher level terms.

  This gets into smart changeset management.  For example, in a single line
  of development one would ideally like human-consumable names for each
  revision, and (at least in the branches critical to a large
  development effort), to regard each revision as a particular,
  purposeful changeset.   A query about the revisions for project `foo'
  might generate a list like:

	foo-rev1	added feature xyzzy
	foo-rev2	added feature quux
	foo-rev3	fixed bug #1234
	....

  When two related lines are merged or partially merged, those changesets
  are the ideal "unit of merging".   One might ask "on my branch, what's
  been merged in from the foo mainline?" and get:

	foobranch-rev1
	foobranch-rev3

  or ask "what's missing from foo?" and get:

	foobranch-rev2

  and then, the human reader knows: "The feature `quux' has not been
  merged into foobranch".  And the humans have friendly names for the
  changesets in question.
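
  A minimal Python sketch of those two queries, under the assumption
  that merge history is just the set of mainline changeset names a
  branch has absorbed.  The function names and the choice to key
  everything by the mainline's own names (foo-revN) are illustrative
  only.

```python
from typing import Dict, List, Set

# Mainline changesets in commit order: name -> summary.
MAINLINE: Dict[str, str] = {
    "foo-rev1": "added feature xyzzy",
    "foo-rev2": "added feature quux",
    "foo-rev3": "fixed bug #1234",
}


def merged_from(mainline: Dict[str, str], branch_merged: Set[str]) -> List[str]:
    """Mainline changesets already merged into the branch, in mainline order."""
    return [name for name in mainline if name in branch_merged]


def missing_from(mainline: Dict[str, str], branch_merged: Set[str]) -> List[str]:
    """Mainline changesets the branch has not merged yet, in mainline order."""
    return [name for name in mainline if name not in branch_merged]
```

  With a branch that has merged foo-rev1 and foo-rev3, missing_from
  returns only foo-rev2, and looking that name up in MAINLINE gives the
  summary a human wants to read: "added feature quux".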

  Moreover, by giving revisions more meaningful, less
  repository-specific names like this, it becomes practical to 
  put the tar bundle:

	foo-rev2-patch.tar.gz

  on your site, let people merge that with a `patch'-like tool, and have
  the effect be the same as if they'd done an operation between
  repositories.

  It also becomes possible to have "smart merging" technology not be
  specific to any particular rev ctl system -- but to instead have
  systems be interoperable in this regard.  I can have a branch in my
  svn repository of a line in your arch repository and smart merge
  between those.

  So, I think that both the intra-repository and global revision names
  for merging purposes should not be based on revnum, but on an
  independent, higher-level namespace.

-t

Re: revnum considered harmful

Posted by Greg Hudson <gh...@MIT.EDU>.

On Mon, 2002-12-16 at 08:52, cmpilato@collab.net wrote:
> You've misunderstood the code (or ghudson's ra_svn protocol is broken,
> which I highly doubt).

I think the confusing bit is that the set-target-rev editor function is
used for updates and similar operations, not for commits.


Re: revnum considered harmful

Posted by Tom Lord <lo...@regexps.com>.


       You've misunderstood the code (or ghudson's ra_svn protocol is
       broken, which I highly doubt).  A new revision number is not
       assigned until near the very end of the svn_fs_commit_txn()
       function call

Well, that's easy.  :-)

Thanks.

-t

Re: revnum considered harmful

Posted by cm...@collab.net.

Tom Lord <lo...@regexps.com> writes:

> I think I see a flaw in the semantic design of svn (revnums) that I
> believe is likely to impose a serious limit on performance in the
> future, when people try to scale svn for large but realistic
> situations:
> 
> revnum imposes a total order on all write transactions.
> 
> If I'm reading the code correctly, the global revision number of a
> write transaction is determined early in the transaction -- before
> most of the work is done (I think the ra_svn protocol calls this
> step `target_rvn').

You've misunderstood the code (or ghudson's ra_svn protocol is broken,
which I highly doubt).  A new revision number is not assigned until
near the very end of the svn_fs_commit_txn() function call, after the
transaction (T) has successfully merged in all the changes of any
other transactions that have been committed since the beginning of T's
commit process.
