Posted to dev@subversion.apache.org by Tom Lord <lo...@regexps.com> on 2002/12/16 21:29:31 UTC

revnum (still) considered harmful


       > You've misunderstood the code (or ghudson's ra_svn protocol
       > is broken, which I highly doubt).

       I think the confusing bit is that the set-target-rev editor
       function is used for updates and similar operations, not for
       commits.

I was confused by reading and misinterpreting the `protocol' file in
the ra_svn directory and the description of the database schema in the
`fs' directory.

An admittedly quick read through the schema document made it seem that
pending transactions are recorded in the database and that that record
includes a transaction number -- which implies the txn number is
assigned early.

The confusion was reinforced by discussion on this list about certain
usage errors / bugs(?).  Specifically, it seemed to me that early in
the transaction, a commit examines the revnum of the repository to
make sure that the wd is up-to-date wrt that revnum, and refuses to
proceed if it is not.  That too, implies that the client (effectively)
knows its new revnum early in the txn.  (I suppose now, in retrospect,
that the commit is not looking at the global revnum, but only at the
last revnum at which files being committed previously changed.)

I think there are still two problems with revnum:  (a) a (much
reduced) performance limitation;  (b) a semantic problem from the
source mgt. perspective.

(a) the (much reduced) performance limitation:

  While assigning revnum late is far better than assigning it early, the
  existence of revnum _still_ limits server scalability (though in a
  less serious way).  In particular, if a single repository is
  implemented over a distributed database, all of the participating
  servers must still synchronize for every transaction in order to
  allocate txn numbers -- you'll still have either a single thread of
  execution or a distributed commit protocol through which all commits
  must pass.

  With no revnum, concurrent, non-overlapping txns can be unordered --
  for example, using a distributed database, synchronization for a set of
  such transactions can be coalesced (reducing the total number of
  syncs) and can take place asynchronously wrt the txns themselves
  (e.g., well after they have completed and clients have moved on).

  Realistically (imo), _this_ performance problem can only ever really
  be important for utterly huge transaction rates.


(b) the source mgt problem:

  Revnum is harmful for another reason that has nothing to do with
  concurrency.

  If I'm reading the FAQ correctly ( :-), revnum is, in essence, an
  implementation detail -- it is "mostly hidden" from users for revision
  control purposes.

  Yet within one repository, merge history is expressed wrt. revnum.
  The emerging plan for distributed revision control seems to be aiming
  at recording merge history as <guid,revnum> pairs.

  Thus, the plan for merge history keeps track of history in low level
  terms that officially have no high-level rev ctl meaning.

  To understand why that's problematic, it's helpful to consider that
  merge history is not only the underlying support for "smart merging"
  -- it's also a record of reference that humans want to be able to
  read.   It should be expressed in higher level terms.

  This gets into smart changeset management.  For example, in a single line
  of development one would ideally like human-consumable names for each
  revision, and (at least in the branches critical to a large
  development effort), to regard each revision as a particular,
  purposeful changeset.   A query about the revisions for project `foo'
  might generate a list like:

	foo-rev1	added feature xyzzy
	foo-rev2	added feature quux
	foo-rev3	fixed bug #1234
	....

  When two related lines are merged or partially merged, those changesets
  are the ideal "unit of merging".   One might ask "on my branch, what's
  been merged in from the foo mainline?" and get:

	foobranch-rev1
	foobranch-rev3

  or ask "what's missing from foo?" and get:

	foobranch-rev2

  and then, the human reader knows: "The feature `quux' has not been
  merged into foobranch".  And the humans have friendly names for the
  changesets in question.
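
  As an illustration (a sketch, not part of the original message), the
  two queries above reduce to set membership over changeset names.  It
  assumes a branch's merge history is stored as a set of the mainline's
  changeset names; the helper names and the `foobranch_history' value are
  hypothetical, and the mainline labels foo-revN are used throughout.

```python
# Changeset names and summaries copied from the example list in the message.
# Storing a branch's merge history as a set of names is an assumption.
MAINLINE = {
    "foo-rev1": "added feature xyzzy",
    "foo-rev2": "added feature quux",
    "foo-rev3": "fixed bug #1234",
}


def merged_from(mainline, branch_history):
    """Mainline changesets already merged into the branch, in mainline order."""
    return [name for name in mainline if name in branch_history]


def missing_from(mainline, branch_history):
    """Mainline changesets not yet merged into the branch."""
    return [name for name in mainline if name not in branch_history]


# Hypothetical merge history for foobranch: rev1 and rev3 were merged.
foobranch_history = {"foo-rev1", "foo-rev3"}

for name in missing_from(MAINLINE, foobranch_history):
    # Print the friendly name next to its human-readable summary.
    print(name, MAINLINE[name], sep="\t")
```

  Because the summaries travel with the names, the "what's missing?"
  answer can be shown to a human without consulting revnums at all.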

  Moreover, by giving revisions more meaningful, less
  repository-specific names like this, it becomes practical to 
  put the tar bundle:

	foo-rev2-patch.tar.gz

  on your site, let people merge that with a `patch'-like tool, and have
  the effect be the same as if they'd done an operation between
  repositories.

  It also becomes possible to have "smart merging" technology not be
  specific to any particular rev ctl system -- but to instead have
  systems be interoperable in this regard.  I can have a branch in my
  svn repository of a line in your arch repository and smart merge
  between those.

  So, I think that both the intra-repository and global revision names
  for merging purposes should not be based on revnum, but on an
  independent, higher-level namespace.

-t

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Tracking CVS in SVN [was: Re: revnum (still) considered harmful]

Posted by Blair Zajac <bl...@orcaware.com>.
Branko Čibej wrote:
> 
> Blair Zajac wrote:
> 
> >Zack Weinberg wrote:
> >
> >
> >>Michael Price <mp...@atl.lmco.com> writes:
> >>
> >>
> >>
> >>>I'm glad there are multiple revision control systems in existence
> >>>(variety is the spice of life) but I only ever use one at a time. I can
> >>>safely say that I've NEVER even thought about needing a "smart merging"
> >>>facility to smart merge between different revision control system
> >>>repositories. I doubt I ever will.
> >>>
> >>>
> >>For the past two weeks I've been writing a horrible script to do
> >>exactly this -- between GCC's CVS repository, and my current client's
> >>internal ClearCase repository (they use GCC to build their product).
> >>
> >>Just wanted to point out that it's not totally unheard of.
> >>
> >>I have no opinion on the global revision number thing.
> >>
> >>
> >
> >Would you be interested in sharing that script?  I need to track a
> >public CVS repository in a Subversion repository, and this sounds
> >like the perfect script I need.
> >
> >
> 
> What about the old trick of checking a CVS working copy into Subversion?
> I'm told it works famously.

True.  But you don't get the same history of commits.  If you just
update to HEAD, then you say, I updated to HEAD.  If you want the
individual commits, you're still stuck writing a script to figure
out what each commit was.

Blair

-- 
Blair Zajac <bl...@orcaware.com>
Plots of your system's performance - http://www.orcaware.com/orca/


Tracking CVS in SVN [was: Re: revnum (still) considered harmful]

Posted by Branko Čibej <br...@xbc.nu>.
Blair Zajac wrote:

>Zack Weinberg wrote:
>  
>
>>Michael Price <mp...@atl.lmco.com> writes:
>>
>>    
>>
>>>I'm glad there are multiple revision control systems in existence
>>>(variety is the spice of life) but I only ever use one at a time. I can
>>>safely say that I've NEVER even thought about needing a "smart merging"
>>>facility to smart merge between different revision control system
>>>repositories. I doubt I ever will.
>>>      
>>>
>>For the past two weeks I've been writing a horrible script to do
>>exactly this -- between GCC's CVS repository, and my current client's
>>internal ClearCase repository (they use GCC to build their product).
>>
>>Just wanted to point out that it's not totally unheard of.
>>
>>I have no opinion on the global revision number thing.
>>    
>>
>
>Would you be interested in sharing that script?  I need to track a
>public CVS repository in a Subversion repository, and this sounds
>like the perfect script I need.
>  
>

What about the old trick of checking a CVS working copy into Subversion?
I'm told it works famously.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



cvs2svn incremental mode

Posted by Marko Macek <Ma...@gmx.net>.
Blair Zajac wrote:

>I would think that much of the CVS end of it would be the same.  Is this
>true?
>
>Would it be possible to see it now, if that's appropriate?  (You may be
>able to tell, I'm anxious to get this other CVS repository tracked :)
>  
>

Attached is a quick hack to make cvs2svn work in incremental mode. If 
you have the CVS repository available locally (via rsync?) this can do 
what you wish.

It adds a --incremental mode which is used after the initial conversion 
is done. You need to keep the cvs2svn-data.revs file from the previous 
run to work incrementally.

It applies to latest /branches/cvs2svn-mmacek in the subversion repository.

WARNING: only lightly tested, I suspect a few bugs.

A big problem: if something goes wrong mid-run (disk full, ^C), there is
no way to recover; you need to start from scratch (create a new
repository). A solution to this problem could be saving the CVS revision
numbers in svn properties.

Regards,
Mark

Re: revnum (still) considered harmful

Posted by Blair Zajac <bl...@orcaware.com>.
Zack Weinberg wrote:
> 
> Blair Zajac <bl...@orcaware.com> writes:
> 
> > Zack Weinberg wrote:
> >
> > Would you be interested in sharing that script?  I need to track a
> > public CVS repository in a Subversion repository, and this sounds
> > like the perfect script I need.
> 
> I'm afraid it is (a) not done, and (b) highly specific to ClearCase.
> However, if you are still curious, I'll send you a copy when I'm done
> writing it.


I would think that much of the CVS end of it would be the same.  Is this
true?

Would it be possible to see it now, if that's appropriate?  (You may be
able to tell, I'm anxious to get this other CVS repository tracked :)

Best,
Blair

-- 
Blair Zajac <bl...@orcaware.com>
Plots of your system's performance - http://www.orcaware.com/orca/


Re: revnum (still) considered harmful

Posted by Zack Weinberg <za...@codesourcery.com>.
Blair Zajac <bl...@orcaware.com> writes:

> Zack Weinberg wrote:
>
> Would you be interested in sharing that script?  I need to track a
> public CVS repository in a Subversion repository, and this sounds
> like the perfect script I need.

I'm afraid it is (a) not done, and (b) highly specific to ClearCase.
However, if you are still curious, I'll send you a copy when I'm done
writing it.

zw


Re: revnum (still) considered harmful

Posted by Blair Zajac <bl...@orcaware.com>.
Zack Weinberg wrote:
> 
> Michael Price <mp...@atl.lmco.com> writes:
> 
> > I'm glad there are multiple revision control systems in existence
> > (variety is the spice of life) but I only ever use one at a time. I can
> > safely say that I've NEVER even thought about needing a "smart merging"
> > facility to smart merge between different revision control system
> > repositories. I doubt I ever will.
> 
> For the past two weeks I've been writing a horrible script to do
> exactly this -- between GCC's CVS repository, and my current client's
> internal ClearCase repository (they use GCC to build their product).
> 
> Just wanted to point out that it's not totally unheard of.
> 
> I have no opinion on the global revision number thing.

Would you be interested in sharing that script?  I need to track a
public CVS repository in a Subversion repository, and this sounds
like the perfect script I need.

Best,
Blair

-- 
Blair Zajac <bl...@orcaware.com>
Plots of your system's performance - http://www.orcaware.com/orca/


Re: revnum (still) considered harmful

Posted by Zack Weinberg <za...@codesourcery.com>.
Michael Price <mp...@atl.lmco.com> writes:

> I'm glad there are multiple revision control systems in existence
> (variety is the spice of life) but I only ever use one at a time. I can
> safely say that I've NEVER even thought about needing a "smart merging"
> facility to smart merge between different revision control system
> repositories. I doubt I ever will.

For the past two weeks I've been writing a horrible script to do
exactly this -- between GCC's CVS repository, and my current client's
internal ClearCase repository (they use GCC to build their product).

Just wanted to point out that it's not totally unheard of.

I have no opinion on the global revision number thing.

zw


Re: revnum (still) considered harmful

Posted by Michael Price <mp...@atl.lmco.com>.
Tom Lord writes:
 >   Realistically (imo), _this_ performance problem can only ever really
 >   be important for utterly huge transaction rates.

Even then I doubt revnums will be the performance bottleneck. As such,
this is a non-issue.

 >   It also becomes possible to have "smart merging" technology not be
 >   specific to any particular rev ctl system -- but to instead have
 >   systems be interoperable in this regard.  I can have a branch in my
 >   svn repository of a line in your arch repository and smart merge
 >   between those.

I'm glad there are multiple revision control systems in existence
(variety is the spice of life) but I only ever use one at a time. I can
safely say that I've NEVER even thought about needing a "smart merging"
facility to smart merge between different revision control system
repositories. I doubt I ever will.

 >   So, I think that both the intra-repository and global revision names
 >   for merging purposes should not be based on revnum, but on an
 >   independent, higher-level namespace.

I like the revnums. Were I forced to pick names for every revision I'd
quickly set up a script to increment an integer and stick it in there for
me. The idea that I'd be required to come up with a unique name for
every revision is sickening. Please never do that.

Michael



Re: revnum (still) considered harmful

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Greg Hudson <gh...@MIT.EDU> writes:
> At any rate, it's most likely pointless to try to design a merge history
> system right now, given that no one is planning to implement it in the
> immediate future (as far as I know).  So this conversation probably
> shouldn't go on too much longer.

Yup.  Let's be real: we're not going to change how revision numbers
work at this point.  If someone wants to do that, they'll need to fork
the project :-).

Suggest that the rest of this discussion happen Post-1.0.

-K


Re: revnum (still) considered harmful

Posted by Greg Hudson <gh...@MIT.EDU>.
On Mon, 2002-12-16 at 19:06, Tom Lord wrote:
> 	Well, here's how I think we'd implement this if we were going to:
> 
> Already, I think you're off on the wrong foot.

I thought your presentation of the idea was pretty complete.  A design 
helps to estimate how much effort it would take, and also helps to
clarify that I read what you wrote.

>> * Ignoring the merge history aspects, it feels like window dressing.
> "Feels", huh?  Hmmm.

Four days ago, you said, "every week or so some detail goes by on the
svn dev list that strikes me as _wrong_".  Is only one of us allowed to
talk about our gut feelings in response to a design idea?

>       * I don't really buy that smart merging between different pieces
>         of revision control software is a realistic or desirable goal.
> 
> arch is an existence proof that it's realistic.

How can a single piece of software be an existence proof of
interoperability?

>         And even if it does come about, using numbers doesn't mean we
>         can't interoperate; it just means that our revision names are
>         less informative.

> That statement makes presumptions about the namespace and how it is
> best used that are, if not false, at least completely unsupported.

You referred to revision names as being "friendly names for the
changesets in question."  That sounds like information to me.

>       * You can no longer compress merge history using revision ranges (or
>         if you do, you lose the benefit of making the merge history
>         readable). 

> No, you are mistaken.  arch can and does compress merge history while
> maintaining a readable record.  You can ask, of a combined merge, "what
> individual changes are combined here?".  Smart-merging, not just
> human readers, makes use of that information.

With revision numbers, you can say that revisions 100-2000 of foo.c have
been merged into bar.c.

With revision names, you might say that revisions feature-foo through
bugfix-bar have been merged into bar.c, but a human will have no idea
whether docfix-baz is in that range or not.  Postprocessing of the
history record might provide that information by asking the repository
those changesets came from, but postprocessing of revision numbers can
do the same thing.

Unless arch has magical powers, it can't display the individual
changeset names in a history record without either storing that
information close at hand, or asking for it when it is needed.
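
To illustrate the contrast drawn above (a sketch under stated
assumptions, not anything from the Subversion or arch sources): a set of
merged revision numbers collapses into inclusive ranges that can be
tested directly, while a range of names can only be tested against an
ordering obtained from somewhere else.  The function names and the
`name_order' list are hypothetical.

```python
def compress_ranges(revnums):
    """Collapse integers into sorted, inclusive (start, end) ranges."""
    ranges = []
    for n in sorted(set(revnums)):
        if ranges and n == ranges[-1][1] + 1:
            # Extends the current run by one.
            ranges[-1] = (ranges[-1][0], n)
        else:
            # Starts a new run.
            ranges.append((n, n))
    return ranges


def in_ranges(ranges, revnum):
    """True if revnum falls inside any compressed range."""
    return any(start <= revnum <= end for start, end in ranges)


def name_in_range(order, first, last, name):
    """Same test for names; needs the full commit order as extra input."""
    position = order.index(name)  # raises ValueError for an unknown name
    return order.index(first) <= position <= order.index(last)


# Revisions 100-2000 of foo.c plus one stray revision.
merged = compress_ranges(list(range(100, 2001)) + [2005])

# Hypothetical commit order that would have to be stored or fetched.
name_order = ["feature-foo", "docfix-baz", "bugfix-bar", "feature-later"]
```

With numbers, `in_ranges(merged, 1500)` needs nothing but the two range
endpoints; with names, `name_in_range` cannot answer without the
`name_order' lookup.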

>       At any rate, it's most likely pointless to try to design a merge
>       history system right now, given that no one is planning to
>       implement it in the immediate future (as far as I know).  So
>       this conversation probably shouldn't go on too much longer.

> In other words: "It isn't worth considering whether or not this is
> worth planning for because nobody is currently planning for it."
> Interesting.

Earlier today you complained about the "shameful tactic called 'pessimal
reading'", and yet here you are, rewording my argument into a circular
statement by reducing two separate antecedents into the same inspecific
noun ("this"/"it").

I did not say "nobody is planning to implement revision names, so it's
pointless to discuss whether we want revision names."  I said, "nobody
is planning to implement merge history soon, and you presented revision
names as a prerequisite for merge history, so it's pointless to discuss
whether we would want revision names in a merge history system."



Re: revnum (still) considered harmful

Posted by Tom Lord <lo...@regexps.com>.

       > (a) the (much reduced) performance limitation:

       I'm not sure how your hypothetical distributed repository is
       going to determine that transactions are non-overlapping more
       cheaply than it can settle revision numbers.  But you've
       admitted this is a small issue.


They can decide in advance by tentatively partitioning regions of the
repository among themselves, coordinating synchronously only as a
fallback for txns that span the tentative boundaries.
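
A minimal sketch of that idea, for illustration only: it assumes the
tentative partition is a map from path prefix to owning server, which is
one possible reading and not a description of any existing system.  The
`PARTITIONS' map, server names, and function names are all hypothetical.

```python
# Hypothetical tentative partition: path prefix -> owning server.
PARTITIONS = {
    "/projA/": "server1",
    "/projB/": "server2",
}


def owner(path):
    """Return the server that tentatively owns this path, or None."""
    for prefix, server in PARTITIONS.items():
        if path.startswith(prefix):
            return server
    return None  # path lies outside every tentative region


def commit_route(changed_paths):
    """Decide whether a txn can commit locally or must synchronize.

    A txn stays local only when every changed path has the same known
    owner; anything spanning regions falls back to synchronous
    coordination among the servers.
    """
    owners = {owner(p) for p in changed_paths}
    if len(owners) == 1 and None not in owners:
        return ("local", owners.pop())
    return ("synchronous", None)
```

For example, `commit_route(["/projA/x.c", "/projA/y.c"])` stays on
server1, while a txn touching both /projA/ and /projB/ takes the
synchronous fallback.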

The performance issue is small for source code management.  It isn't a
small issue for other quite plausible and valuable applications of
FSDB-style technology.



	>   So, I think that both the intra-repository and global
	>   revision names for merging purposes should not be based on
	>   revnum, but on an independent, higher-level namespace.

	Well, here's how I think we'd implement this if we were going to:

Already, I think you're off on the wrong foot.  The namespace is
useful to tools adjacent to revision control, not just revision
control itself.  It is something that can have and plausibly deserves
a "stand alone" design -- independent of revision control technology.
The first question isn't "how do we implement it?", but "what is the
form and function of this namespace? -- what is it exactly?"  You
can't really figure out how to implement it until you understand in a
deeper way what it is.



    I don't really like this idea because:

      * Ignoring the merge history aspects, it feels like window
        dressing.

"Feels", huh?  Hmmm.


      * I don't really buy that smart merging between different pieces
        of revision control software is a realistic or desirable goal.

arch is an existence proof that it's realistic.   Read the recent
project-administrative messages on gcc list (and think about them) to
begin to get a sense of why it's desirable.  Linux kernel development
also provides some relevant development patterns.


        And even if it does come about, using numbers doesn't mean we
        can't interoperate; it just means that our revision names are
        less informative.

That statement makes presumptions about the namespace and how it is
best used that are, if not false, at least completely unsupported.


      * You can no longer compress merge history using revision ranges (or
        if you do, you lose the benefit of making the merge history
        readable). 

No, you are mistaken.  arch can and does compress merge history while
maintaining a readable record.  You can ask, of a combined merge, "what
individual changes are combined here?".  Smart-merging, not just
human readers, makes use of that information.


	I'm already concerned about the bulk of merge history information given
	that we may get stuck storing it for each file.

Well then, that's something to figure out for sure then, isn't it?


      At any rate, it's most likely pointless to try to design a merge
      history system right now, given that no one is planning to
      implement it in the immediate future (as far as I know).  So
      this conversation probably shouldn't go on too much longer.

In other words: "It isn't worth considering whether or not this is
worth planning for because nobody is currently planning for it."
Interesting.

-t



Re: revnum (still) considered harmful

Posted by Greg Hudson <gh...@MIT.EDU>.
On Mon, 2002-12-16 at 16:29, Tom Lord wrote:
> (a) the (much reduced) performance limitation:

I'm not sure how your hypothetical distributed repository is going to
determine that transactions are non-overlapping more cheaply than it can
settle revision numbers.  But you've admitted this is a small issue.

>   So, I think that both the intra-repository and global revision names
>   for merging purposes should not be based on revnum, but on an
>   independent, higher-level namespace.

Well, here's how I think we'd implement this if we were going to:

  * Commits would acquire an optional parameter for the revision name.

  * The revisions table would contain mappings from names as well as
revnums.  (A revnum would map to ("revision" TXN NAME); a name would map
to ("revision" TXN REVNUM).  Or maybe they'd both map to the identical
skel containing both.  Doesn't matter much.)

  * Revision specifications could be given as names as well as the
current options (numbers, dates, HEAD, etc.).  An ra method for
get-named-rev would be needed alongside get-latest-rev and
get-dated-rev.  And possibly a method to get the name given the number,
given the next step.

  * When it comes time to store merge history, use <guid,name> tuples
instead of <guid,number> tuples.
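
For illustration, the four steps above can be sketched as a toy
in-memory table.  This is an assumption-laden model, not the svn_fs
schema: the class name, the dict layout, and the method names (loosely
modelled on the get-named-rev / get-latest-rev wording above) are
hypothetical, and numbering starts at 1 for simplicity.

```python
class Revisions:
    """Toy revisions table mapping both revnums and names."""

    def __init__(self):
        self.by_num = {}   # revnum -> (txn, name or None)
        self.by_name = {}  # name   -> (txn, revnum)

    def commit(self, txn, name=None):
        """Record a committed txn; the revision name is optional."""
        if name is not None and name in self.by_name:
            raise ValueError("revision name already in use: " + name)
        revnum = len(self.by_num) + 1
        self.by_num[revnum] = (txn, name)
        if name is not None:
            self.by_name[name] = (txn, revnum)
        return revnum

    def get_latest_rev(self):
        return len(self.by_num)

    def get_named_rev(self, name):
        """Name -> revnum lookup."""
        return self.by_name[name][1]

    def get_rev_name(self, revnum):
        """Revnum -> name lookup (None if the commit was unnamed)."""
        return self.by_num[revnum][1]

    def merge_source(self, guid, revnum):
        """A <guid,name> tuple for merge history, when a name exists."""
        return (guid, self.get_rev_name(revnum))
```

Unnamed commits still get a revnum, which is why the name parameter can
stay optional in this model.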

I don't really like this idea because:

  * Ignoring the merge history aspects, it feels like window dressing.

  * I don't really buy that smart merging between different pieces of
revision control software is a realistic or desirable goal.  And even if
it does come about, using numbers doesn't mean we can't interoperate; it
just means that our revision names are less informative.

  * You can no longer compress merge history using revision ranges (or
if you do, you lose the benefit of making the merge history readable). 
I'm already concerned about the bulk of merge history information given
that we may get stuck storing it for each file.

At any rate, it's most likely pointless to try to design a merge history
system right now, given that no one is planning to implement it in the
immediate future (as far as I know).  So this conversation probably
shouldn't go on too much longer.



Re: revnum (still) considered harmful

Posted by Branko Čibej <br...@xbc.nu>.
Greg Stein wrote:

>On Tue, Dec 17, 2002 at 03:49:13AM +0100, Branko Cibej wrote:
>  
>
>>Greg Stein wrote:
>>
>>    
>>
>>>The problem is that txnids are defined as non-integers right now, so they
>>>don't range-compress like revnums do.
>>>
>>>      
>>>
>>Say again? I thought txn id's /were/ integers, they're just not
>>marshalled in base-10 in the repository.
>>    
>>
>
>Don't get me started... the FS carries them around as char* values :-(
>  
>
Oh, /that/. I remember the fights we had about that, yup. :-)

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: revnum (still) considered harmful

Posted by Greg Stein <gs...@lyra.org>.
On Tue, Dec 17, 2002 at 03:49:13AM +0100, Branko Cibej wrote:
> Greg Stein wrote:
> 
> >The problem is that txnids are defined as non-integers right now, so they
> >don't range-compress like revnums do.
> >
> Say again? I thought txn id's /were/ integers, they're just not
> marshalled in base-10 in the repository.

Don't get me started... the FS carries them around as char* values :-(

-g

-- 
Greg Stein, http://www.lyra.org/


Re: revnum (still) considered harmful

Posted by Branko Čibej <br...@xbc.nu>.
Greg Stein wrote:

>The problem is that txnids are defined as non-integers right now, so they
>don't range-compress like revnums do.
>
Say again? I thought txn id's /were/ integers, they're just not
marshalled in base-10 in the repository.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: revnum (still) considered harmful

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Dec 16, 2002 at 04:17:58PM -0800, Tom Lord wrote:
>...
>        >   still have either a single thread of execution or a
>        >   distributed commit protocol through which all commits must
>        >   pass.
> 
>        Correct, and we aren't worrying about this right now.
> 
> I understand that.  Although it's tangential to the main points of my
> comments on the 1.0 plans, I'll point out that I think it is worth
> thinking about right now, and here's why: The FSDB sketch I sent this
> list has applications far beyond revision control, including
> applications where very high txn rates are important. At the same
> time, an implementation of that sketch, sufficient for revision
> control, looks from where I sit like a simplification of what svn
> currently has.

For argument's sake, I'll concede these two points as true.

> So, it's worth considering because you can
> simultaneously simplify svn and prepare for applications where huge
> txn rates are important.

The svn architecture isn't going to radically change for 1.0. I don't think
that any of the developers have any interest in doing that. Therefore, if
you would like to build FSDB, then it will need to be a layer on top of the
svn_fs API (rather than below it).

Nobody has performed extensive commit-time benchmarks for SVN right now,
preferring completion of functionality over fine-tuning of performance. Not
to mention benchmarks lie :-) But let's say for argument's sake that we can
only do 10 commits per second on "typical" hardware. I am *SO* fine with
that for a 1.0 release. "High txn rates" isn't really a goal that
I/CollabNet cares much about. I am pretty darn sure there *are* people here
who are, and I am equally sure that they'll work on the problem. "Great!" I
say. But will 1.0 be held up? Will an architecture redesign occur to ensure
that post-1.0 it can hit those rates? I don't think so.

[ yes, there have been a number of benchmarks run, but they're concentrating
  on pretty high-order operations; nothing like what you'd be looking for
  out of an FSDB ]

>     >   Yet within one repository, merge history is expressed wrt. revnum.
>     >   The emerging plan for distributed revision control seems to be aiming
>     >   at recording merge history as <guid,revnum> pairs.
> 
>     Whatever. Those are merely ideas, and they won't become concrete
>     for quite a while. I think it is entirely possible to record the
>     data as <guid,txnid> pairs. Revnum doesn't have to appear.
> 
> [As an aside: did you really mean txnid, not revnum?]

I certainly did. If you use <guid,txnid>, then you would be out of the
conflicting-revnum business. In the current SVN FS data model, the txnid is
the important identifier. The revnum is simply turned into a txnid before
any real work is done. If revnums scare you :-), then use txnid.

The problem is that txnids are defined as non-integers right now, so they
don't range-compress like revnums do. But txnids *will* become integers at
some point, so we'll get range compression back (altho it will have holes,
but that's okay as I suspect revnums [as they occur in a merge source] have
holes in the ranges, too).
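
A small sketch of why the string form gets in the way of ranges.  It
assumes, purely for illustration, that txnids are base-36 digit strings;
that encoding is an assumption and the helper names are hypothetical.
Compared as strings, ids of different lengths sort in the wrong order,
so a range test has to parse them into integers first.

```python
def txnid_value(txnid):
    """Parse a txnid string into an integer (base 36 is an assumption)."""
    return int(txnid, 36)


def in_txn_range(txnid, first, last):
    """Inclusive range test on the parsed integer values."""
    return txnid_value(first) <= txnid_value(txnid) <= txnid_value(last)


# "z" is 35 and "10" is 36, yet string comparison puts "10" first.
string_order_says_earlier = "10" < "z"
numeric_order_says_later = txnid_value("10") > txnid_value("z")

# A range a..10 covers every id in between.  Some of those ids may belong
# to txns that never committed (holes), which the range test ignores.
covered = in_txn_range("y", "a", "10")
```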

> I think there's now ample evidence that not only doesn't revnum _have_
> to appear in merge history, it _shouldn't_ appear.  "So what?" you
> ask, "This is all in the future, anyway."
> 
> It's not in the future.  It has impacts on UI, on project layout
> within repositories, on repository schema, and on protocols.  Even if
> you want to leave the specific feature of merge history out for now,
> it still has impacts on the features you aren't leaving out.

Yup. It has an impact. And we can solve that later. I'm confident it can be
solved, and I'll also grant that the total time expenditure will be higher
if we defer the thinking on that solution. But I'll *definitely* spend
future time on the problem to get a 1.0 sooner.

It's a simple benefit/cost, and I think you're seeing it whenever the SVN
community talks about SVN 1.0. We get the benefit of a "final" release
sooner, at the cost of more dev work later to compensate for "incorrect"
choices made now.

> Moreover, there's no good reason to leave it in the future.  It's
> basically been solved in prototype form, and it's only a tactical
> effort to figure out how to interpret that prototype in a svn context.

Great. If it is only tactical, then please begin execution :-). Patches and
working code are welcome...

Look. In all seriousness, I believe you have some great ideas. You also
express them well, if at some length. But I think you're also going to have to
step up to the plate and do some coding if you want to see some of these
ideas reduced to practice. *Especially* if you're talking about changing SVN
itself, rather than building on top of it.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: revnum (still) considered harmful

Posted by Tom Lord <lo...@regexps.com>.

       The txn id *is* assigned early. First thing you do when
       building a commit.  Only when the txn is actually committed,
       though, do you associate a revnum with that txn id.

That's what I missed.  I thought the two ids were one and the same.


       >   less serious way).  In particular, if a single repository
       >   is implemented over a distributed database, all of the
       >   participating servers must still synchronize for every
       >   transaction in order to allocate txn numbers -- you'll
       >   still have either a single thread of execution or a
       >   distributed commit protocol through which all commits must
       >   pass.

       Correct, and we aren't worrying about this right now.

I understand that.  Although it's tangential to the main points of my
comments on the 1.0 plans, I'll point out that I think it is worth
thinking about right now, and here's why: The FSDB sketch I sent this
list has applications far beyond revision control, including
applications where very high txn rates are important.   At the same
time, an implementation of that sketch, sufficient for revision
control, looks from where I sit like a simplification of what svn
currently has.   So, it's worth considering because you can
simultaneously simplify svn and prepare for applications where huge
txn rates are important.


    >   Yet within one repository, merge history is expressed wrt. revnum.
    >   The emerging plan for distributed revision control seems to be aiming
    >   at recording merge history as <guid,revnum> pairs.

    Whatever. Those are merely ideas, and they won't become concrete
    for quite a while. I think it is entirely possible to record the
    data as <guid,txnid> pairs. Revnum doesn't have to appear.

[As an aside: did you really mean txnid, not revnum?]

I think there's now ample evidence that not only doesn't revnum _have_
to appear in merge history, it _shouldn't_ appear.  "So what?" you
ask, "This is all in the future, anyway."

It's not in the future.  It has impacts on UI, on project layout
within repositories, on repository schema, and on protocols.  Even if
you want to leave the specific feature of merge history out for now,
it still has impacts on the features you aren't leaving out.

Moreover, there's no good reason to leave it in the future.  It's
basically been solved in prototype form, and it's only a tactical
effort to figure out how to interpret that prototype in a svn context.

-t



Re: revnum (still) considered harmful

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Dec 16, 2002 at 01:29:31PM -0800, Tom Lord wrote:
>...
> An admittedly quick read through the schema document made it seem that
> pending transactions are recorded in the database and that that record
> includes a transaction number -- which implies the txn number is
> assigned early.

The txn id *is* assigned early. First thing you do when building a commit.
Only when the txn is actually committed, though, do you associate a revnum
with that txn id.

>...
> Specifically, it seemed to me that early in
> the transaction, a commit examines the revnum of the repository to
> make sure that the wd is up-to-date wrt that revnum, and refuses to
> proceed if it is not.  That too, implies that the client (effectively)
> knows its new revnum early in the txn.  (I suppose now, in retrospect,
> that the commit is not looking at the global revnum, but only at the
> last revnum at which files being committed previously changed.)

That parenthetical note is correct: we only want to ensure that they are
changing the latest copy of the file/directory. They must be up-to-date for
each file/dir changed before the txn can be committed and receive a revnum.
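
The per-path check described above can be sketched in Python as follows.
This is an illustration under stated assumptions, not SVN's code: the
function name `out_of_date_paths` and both dictionaries are hypothetical.
Note that the global head revnum never enters the comparison.

```python
def out_of_date_paths(base_revs, last_changed):
    """Return the paths whose working-copy base revnum is older than the
    revnum at which that path last changed in the repository.

    base_revs:    path -> revnum the working copy has for that path
    last_changed: path -> revnum at which the repository last changed it
    """
    return sorted(
        path
        for path, base in base_revs.items()
        if base < last_changed.get(path, 0)  # unknown path: nothing newer exists
    )


# The repository head might be at revnum 9, but only per-path history matters.
base_revs = {"a.c": 5, "b.c": 7}
last_changed = {"a.c": 6, "b.c": 3}
print(out_of_date_paths(base_revs, last_changed))  # ['a.c']
```

Here b.c can be committed even though the working copy is behind the
head, because nobody changed b.c after revnum 3.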

There aren't any race conditions in here either. We merge the new changeset
against <current-revnum>. Then we acquire a lock on the revnum->txnid
mapping table. Then we merge against <current-revnum> again, if it changed
from the last merge. Then we alloc a new revnum and associate it with the
txnid, then we release the lock.
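
The sequence above can be sketched in Python roughly as follows. This is
a toy model under stated assumptions, not the SVN implementation: the
class `Repo`, its methods, and the string txn ids are all hypothetical,
and `merge` is a placeholder that never conflicts.

```python
import itertools
import threading


class Repo:
    def __init__(self):
        # Index is the revnum, value is the txn id; revnum 0 is the empty repo.
        self.rev_to_txn = ["txn-initial"]
        self.rev_lock = threading.Lock()
        self._txn_counter = itertools.count(1)

    def head(self):
        return len(self.rev_to_txn) - 1

    def begin_txn(self):
        # The txn id is assigned early, before any revnum exists for it.
        return "txn-%d" % next(self._txn_counter)

    def merge(self, txn_id, revnum):
        # Placeholder: a real merge would raise here on conflict.
        return revnum

    def commit(self, txn_id):
        # 1. Merge against the current revnum without holding the lock.
        merged_against = self.merge(txn_id, self.head())
        # 2. Take the lock on the revnum->txnid mapping.
        with self.rev_lock:
            # 3. Merge again only if the head moved since step 1.
            if self.head() != merged_against:
                merged_against = self.merge(txn_id, self.head())
            # 4. Allocate the new revnum and bind it to the txn id.
            self.rev_to_txn.append(txn_id)
            return self.head()
        # 5. The lock is released when the with-block exits.


repo = Repo()
t1 = repo.begin_txn()
t2 = repo.begin_txn()
r2 = repo.commit(t2)  # the later txn commits first
r1 = repo.commit(t1)
print(r2, r1, repo.rev_to_txn)
```

In this usage the txn started second receives revnum 1 and the txn
started first receives revnum 2, which shows that txn ids are handed out
when work begins while revnums are handed out only at commit time.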

>...
>   less serious way).  In particular, if a single repository is
>   implemented over a distributed database, all of the participating
>   servers must still synchronize for every transaction in order to
>   allocate txn numbers -- you'll still have either a single thread of
>   execution or a distributed commit protocol through which all commits
>   must pass.

Correct, and we aren't worrying about this right now.

>...
>   If I'm reading the FAQ correctly ( :-), revnum is, in essence, an
>   implementation detail -- it is "mostly hidden" from users for revision
>   control purposes.

Nah. There is a tension that exists. The revnum rate-of-change should not be
a cause for concern, yet the revnum is also a *very* useful tool. The FAQ
tends towards assuaging concern about revnums, but when people actually
start using SVN, they'll understand their utility quite a bit more.

>   Yet within one repository, merge history is expressed wrt. revnum.
>   The emerging plan for distributed revision control seems to be aiming
>   at recording merge history as <guid,revnum> pairs.

Whatever. Those are merely ideas, and they won't become concrete for quite a
while. I think it is entirely possible to record the data as <guid,txnid>
pairs. Revnum doesn't have to appear.

>...
>   When two related lines are merged or partially merged, those changesets
>   are the ideal "unit of merging".   One might ask "on my branch, what's
>   been merged in from the foo mainline?" and get:

Part of the issue is that SVN imposes a linear ordering on the changesets
and that arbitrary composition is not easily supported. I think with some
work, people could definitely do change-composition-like stuff.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
