Posted to dev@subversion.apache.org by Branko Čibej <br...@xbc.nu> on 2004/04/19 06:39:18 UTC

RFC: Revision indexes for 1.1

The idea of labels has been floating around the SVN lists for some time.
In this post I propose a generic mechanism that could be used to
implement one of the label flavours, but would also be useful for other
things.

1. The label proposals

There were several different proposals for label semantics, but they all
boil down to two types:

    * A "label" is a symbolic name for (a set of) revision(s)
    * A "label" is a symbolic collection of specific versions of
      specific files

The mechanism I propose here can be used to implement the first type of
label. The reason for this is simple: we already have a way to uniquely
identify a set of specific versions of specific files; it's called a
branch or a tag. We can even create mixed-revision tags with a WC-to-WC
or WC-to-URL copy. Ease-of-use and other UI considerations aren't
central to this proposal, so I'll just note that all the mechanics
necessary for implementing label manipulations of the second kind in SVN
clients are already available in our libraries.

2. The svn_repos_dated_revision problem

Another issue that needs to be solved is the requirement that the values
of svn:date revprops rise with increasing revision numbers. Apart from the
fact that we don't have any checks in the code that prevent changing the
commit date, this also limits the ways in which "svnadmin load" can be
used for combining repositories (e.g., for replication) or repository
conversion (e.g., cvs2svn).

3. Revision indexes

Last but not least, searching for particular values of revprops is very
slow. All of these problems can be solved by introducing a revision
property index. This index would map a (propname, value) pair to the
list of revision numbers where this key appears. For example, to find
all the commits by one kfogel, you could do a single search for
(svn:author, kfogel) instead of having to traverse all revisions and
read their revprops.
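
To make the mapping concrete, here is a minimal sketch in Python. It
assumes an in-memory dictionary stands in for the index; the class and
method names are hypothetical and are not part of any Subversion API.

```python
from collections import defaultdict


class RevpropIndex:
    """Toy in-memory stand-in for the proposed revision property index."""

    def __init__(self):
        # (propname, value) -> sorted list of revision numbers
        self._index = defaultdict(list)

    def add(self, propname, value, revnum):
        revs = self._index[(propname, value)]
        if revnum not in revs:
            revs.append(revnum)
            revs.sort()

    def search(self, propname, value):
        # An unknown key yields an empty list rather than an error.
        return list(self._index.get((propname, value), []))


index = RevpropIndex()
index.add("svn:author", "kfogel", 3)
index.add("svn:author", "brane", 4)
index.add("svn:author", "kfogel", 7)
kfogel_commits = index.search("svn:author", "kfogel")
```

The single search replaces a walk over every revision's revprops.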

This indexing would happen automatically (probably implemented at the
repos layer, rather than the filesystem layer), and which props are
index sources would be controlled by a repository-specific configuration
file. Two parameters control the indexing behaviour for each
index-enabled property type:

    * Uniqueness: Controls whether a single (propname, value) pair can
      map to more than one revision number
    * Multiplicity: Controls whether, for the same propname, more than
      one (propname, value) pair can map to a single revision.
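
One possible reading of these two parameters, sketched in Python. The
function and exception names are hypothetical, and the index is
modelled as a plain set of (propname, value, revnum) tuples.

```python
class IndexConstraintError(Exception):
    """Raised when an insert would violate an index parameter."""


def checked_insert(entries, propname, value, revnum, unique, multiple):
    """Add (propname, value, revnum) to the set `entries`, or raise."""
    for name, val, rev in entries:
        if name != propname:
            continue
        # unique: one (propname, value) pair maps to at most one revision.
        if unique and val == value and rev != revnum:
            raise IndexConstraintError(
                f"{propname}={value!r} already names r{rev}")
        # not multiple: a revision carries at most one value per propname.
        if not multiple and rev == revnum and val != value:
            raise IndexConstraintError(
                f"r{revnum} already has a value for {propname}")
    entries.add((propname, value, revnum))
```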

3.1 DB schema changes

The filesystem grows a new table, "revpropindex", with the following schema:

    (PROP-NAME PROP-VALUE) -> REVNUM

Non-unique indexes are allowed.

No other changes are necessary. For forward compatibility, servers that
do not implement revision indexes will ignore this table; for backward
compatibility, if the table does not exist in the repository, revision
indexing and search operations are disabled. The dumpfile format does
not change, as the contents of the revision index can be reconstructed
from revprop data.

3.2 Multiple indexes per revision

The values of properties that allow multiple keys per single revision
are represented in a newline-terminated list, one value per line (like
the svn:ignore property on directories). Each value is added as a
separate key to the index.
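
A small sketch of how a property value could be turned into index
keys, assuming that blank lines are skipped; the function name is
hypothetical.

```python
def index_keys(propname, propvalue, multiple):
    """Return the (propname, value) keys to index for one revprop."""
    if not multiple:
        # Single-valued property: the whole value is one key.
        return [(propname, propvalue)]
    # Multi-valued property: one key per non-empty line.
    return [(propname, line) for line in propvalue.splitlines() if line]
```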

3.3 Indexing configuration

The repository grows a new configuration file, conf/revpropindex. The
format of the file is as follows:

    [propname]
    unique = [TRUE/false]
    multiple = [FALSE/true]

Revprops that do not appear in the config file are not indexed. The
default contents of this file are:

    [svn:date]
    unique = false
    multiple = false
    [svn:name]
    unique = false
    multiple = true

The svn:date and svn:name indexing cannot be turned off, nor can their
indexing parameters be changed (in effect, we may as well not actually
enable these in the revpropindex config file).
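
For illustration, a sketch that reads a file in this format with
Python's configparser module. This is only a stand-in for whatever
parser the repos layer would really use. The sketch also assumes that
the capitalised alternatives above (TRUE for unique, FALSE for
multiple) are the defaults when a key is omitted, which is my reading
of the notation; the function name is hypothetical.

```python
import configparser

DEFAULT_CONFIG = """\
[svn:date]
unique = false
multiple = false
[svn:name]
unique = false
multiple = true
"""


def load_index_config(text):
    """Return {propname: {"unique": bool, "multiple": bool}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    config = {}
    for propname in parser.sections():
        section = parser[propname]
        config[propname] = {
            # Assumed defaults: unique = true, multiple = false.
            "unique": section.getboolean("unique", fallback=True),
            "multiple": section.getboolean("multiple", fallback=False),
        }
    return config


config = load_index_config(DEFAULT_CONFIG)
```

Properties without a section are simply absent from the result, which
matches the rule that unlisted revprops are not indexed.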

3.4 FS/Repos API changes

When opening an existing repository, the FS layer must not error out if
the revpropindex table does not exist.

The repos layer grows a new function,

    svn_repos_revision_search(propname, propvalue)

which returns a list of revision numbers. The list can be empty. No
error is returned if a property is not indexed or revision indexing is
not enabled in the repository (i.e., if the repository schema version is
older than the server version).

The propset, propchange and propdel repos-level wrappers must maintain
the revpropindex table (optimization hint: when changing multi-value
properties, only values deleted from or added to the list need to be
processed).
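
The optimization hint amounts to a set difference. A minimal sketch,
with a hypothetical function name, assuming a missing property is
passed as None:

```python
def index_delta(old_value, new_value):
    """Return (values_to_add, values_to_remove) for a multi-value prop."""
    old = {line for line in (old_value or "").splitlines() if line}
    new = {line for line in (new_value or "").splitlines() if line}
    # Values present in both lists need no index update at all.
    return sorted(new - old), sorted(old - new)
```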

The function svn_repos_dated_revision changes: first, it calls
svn_repos_revision_search("svn:date", timestamp). If this returns a
non-empty list, it returns the oldest revision from this list. Otherwise
it performs the current binary search. (The binary-search implementation
must stay for backward compatibility. It can be removed in 2.0.)
svn_repos_committed_info and svn_repos_history get similar changes.
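
A rough sketch of that control flow in Python, not the real C
function. It assumes timestamps are ISO-8601 strings that sort
lexicographically, that the fallback returns the youngest revision
dated at or before the timestamp, and that revision 0 is returned when
the timestamp precedes everything; all names are hypothetical.

```python
import bisect


def dated_revision(timestamp, index_search, rev_dates):
    """Return a revision for `timestamp`.

    index_search(propname, value) stands in for svn_repos_revision_search.
    rev_dates[n] is the svn:date of revision n, assumed non-decreasing.
    """
    hits = index_search("svn:date", timestamp)
    if hits:
        # Exact match in the index: the proposal picks the oldest hit.
        return min(hits)
    # Fallback binary search: youngest revision dated at or before
    # `timestamp`, or 0 if the timestamp precedes every revision.
    pos = bisect.bisect_right(rev_dates, timestamp)
    return max(pos - 1, 0)
```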

4. Implementing revision names

Using the mechanism described above, we can add symbolic names to a
revision or a set of revisions. To do this we introduce a new revision
property, "svn:name", that contains a newline-separated list of symbolic
names assigned to a revision. The values are non-unique: that is, a
single symbolic name can group several distinct revisions.

While the existing "prop(get|set|edit) --revprop" functionality is
sufficient for setting and maintaining revision names, it is not really
useful. I propose the following changes to the UI:

4.1 Extend the format of the "-r" command-line option

Currently the -r command-line option accepts a revision number or a date
(range):

    -r revnum|{date}[:revnum|{date}]

The {date} specifier is internally converted to a revision number. We
add another specifier, [labelname], that is also converted to a revision
number.

Note: Since label values are non-unique, a [label] specifier can refer
to a list of revision numbers. Such lists are useless for "svn update" or
"svn export"; however, "svn merge" could be extended to handle
multi-revision merges (cherry-picking, right?). We should support an
analogous format, "-r revnum,revnum,..." for specifying an explicit list
of revision numbers; this is also needed for defining multi-revision labels.
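
To show that the extended syntax stays easy to parse, here is a sketch
of a parser in Python. It handles only numbers, {date}, [label],
ranges and comma-separated number lists; keywords such as HEAD are
left out, and the function names and return shapes are hypothetical.

```python
import re

# One revision specifier: a number, a {date}, or a [label].
SPEC = r"(?:\d+|\{[^}]+\}|\[[^\]]+\])"
RANGE_RE = re.compile(rf"^({SPEC})(?::({SPEC}))?$")
LIST_RE = re.compile(r"^\d+(?:,\d+)+$")


def parse_spec(text):
    if text.startswith("{"):
        return ("date", text[1:-1])
    if text.startswith("["):
        return ("label", text[1:-1])
    return ("number", int(text))


def parse_revision_option(arg):
    """Parse the argument of -r into a tagged tuple."""
    if LIST_RE.match(arg):
        return ("list", [int(part) for part in arg.split(",")])
    match = RANGE_RE.match(arg)
    if match is None:
        raise ValueError(f"unrecognised revision argument: {arg!r}")
    start, end = match.groups()
    if end is None:
        return ("single", parse_spec(start))
    return ("range", parse_spec(start), parse_spec(end))
```

Matching whole specifiers first means that a colon inside a {date}
is not mistaken for the range separator.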

4.2 svn label [-r revnum/range/list] label-name

Adds a label to the specified revision(s). All forms of the -r option
are supported (including label specifiers, of course). The default is to
label HEAD.

4.3 svn labeldel [-r revnum/range/list] label-name

Remove a label from the specified revision(s). If -r is not specified,
remove all instances of the label.


All these functions need equivalents in the client library; the RA layer
only has to expose svn_repos_revision_search. "svn label" and "svn
labeldel" can be implemented as simple revprop manipulations, although
implementing them on the server would make multi-revision labeling faster.

5. Future notes

Currently no history is recorded about revprop changes. This is an
oversight that makes Subversion behave slightly at cross-purposes with
configuration management philosophy. Unfortunately, recording historical
changes to revprops requires more than a schema and API change, because
those changes would have to be recorded in a new kind of transaction.
Thus this kind of history tracking cannot be implemented before 2.0.



-- Brane


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: Revision indexes for 1.1

Posted by kf...@collab.net.

"C. Michael Pilato" <cm...@collab.net> writes:
> > Is there a reason not to do non-uniqueness like this:
> > 
> >       (PROP-NAME PROP-VALUE) -> (REVNUM1, REVNUM2, REVNUM1 ...)
> 
> A couple come to mind quickly:
> 
>   - It unnecessarily complexifies the value format, making it into a
>     skel, which is a Berkeley-ism (something we would eventually like
>     to get away from).
>   - Because then each addition is now an edit (which for Berkeley DB
>     means we write out more log data than necessary)

Thanks; both of those make sense.  I'll withdraw the question, in the
assumption that Brane would have answered it the same way :-).


Re: RFC: Revision indexes for 1.1

Posted by Branko Čibej <br...@xbc.nu>.

C. Michael Pilato wrote:

>kfogel@collab.net writes:
>
>  
>
>>Is there a reason not to do non-uniqueness like this:
>>
>>      (PROP-NAME PROP-VALUE) -> (REVNUM1, REVNUM2, REVNUM1 ...)
>>    
>>
>
>A couple come to mind quickly:
>
>  - It unnecessarily complexifies the value format, making it into a
>    skel, which is a Berkeley-ism (something we would eventually like
>    to get away from).
>  - Because then each addition is now an edit (which for Berkeley DB
>    means we write out more log data than necessary)
>  
>
Yes, these are exactly the reasons I had. Also it would make the size of 
the record unbounded, and doesn't map nicely to a SQL schema.
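
As an illustration of the one-row-per-revision layout in SQL terms,
here is a sketch using SQLite through Python's sqlite3 module. SQLite
is only a convenient stand-in, and the column and function names are
hypothetical.

```python
import sqlite3


def open_index():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE revpropindex (
               propname  TEXT    NOT NULL,
               propvalue TEXT    NOT NULL,
               revnum    INTEGER NOT NULL,
               PRIMARY KEY (propname, propvalue, revnum))""")
    return conn


def add_entry(conn, propname, propvalue, revnum):
    # Labelling one more revision is a plain INSERT of a new row;
    # no existing record has to be read, grown and rewritten.
    conn.execute("INSERT OR IGNORE INTO revpropindex VALUES (?, ?, ?)",
                 (propname, propvalue, revnum))


def search(conn, propname, propvalue):
    rows = conn.execute(
        "SELECT revnum FROM revpropindex"
        " WHERE propname = ? AND propvalue = ? ORDER BY revnum",
        (propname, propvalue))
    return [row[0] for row in rows]
```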

-- Brane

Re: RFC: Revision indexes for 1.1

Posted by "C. Michael Pilato" <cm...@collab.net>.

kfogel@collab.net writes:

> Is there a reason not to do non-uniqueness like this:
> 
>       (PROP-NAME PROP-VALUE) -> (REVNUM1, REVNUM2, REVNUM1 ...)

A couple come to mind quickly:

  - It unnecessarily complexifies the value format, making it into a
    skel, which is a Berkeley-ism (something we would eventually like
    to get away from).
  - Because then each addition is now an edit (which for Berkeley DB
    means we write out more log data than necessary)

Re: RFC: Revision indexes for 1.1

Posted by Branko Čibej <br...@xbc.nu>.

Greg Hudson wrote:

>On Sun, 2004-04-25 at 13:53, Branko Čibej wrote:
>  
>
>>So in the final analysis, yes, people won't run into the hard cases very 
>>often, and when they do, it'll be because they're trying to diff or 
>>merge unrelated things.
>>    
>>
>
>I don't agree.  It seems like a reasonable question to ask, "What
>changed in this repository between January and February of 2002?" and if
>we've given people the rope to have screwed that up in November of 2003
>by inserting mis-ordered revisions, we've done the user a disservice.
>
>We have a responsibility to define a semantic model which is simple and
>well-defined, not one that we think will just happen to work most of the
>time.
>  
>
Yes, it is a pretty problem. The trouble is, of course, that we've not 
really analysed our model enough. We definitely need _something_ that 
maintains time order, but it doesn't necessarily have to be revisions -- 
after all, revisions are only aliases for transactions, and _those_ I 
definitely agree must be ordered in time. The sad bit is that 
transactions don't have an immutable date attached. That's something for 
2.0 to solve. :-)

>>>>If we keep that restriction, there's no way to optimize cvs2svn, which 
>>>>means that people who start with a converted repository will keep 
>>>>complaining about the size blowup.
>>>>        
>>>>
>
>  
>
>>I may have overstated that; it's probably not impossible, but very hard 
>>because, to create optimal branches and tags from CVS, you have to 
>>globally optimize the sequence of copies, which means you use up 
>>enormous amounts of either and/or memory.
>>    
>>
>
>(Either what and/or memory?)
>  
>
(time. duh)

>I don't think we should be making deep semantic compromises in svn for
>the sake of efficiency gains in cvs2svn, and I strongly suspect the
>branch optimization problem isn't insurmountable.
>  
>
Yup. I was convinced that cvs2svn was affected, but Karl says otherwise. 
I'm a bit confused, but don't have time to go into cvs2svn details, so 
I'll just believe what I hear.

>>>>>I've gotten the impression that cursor walks create locking issues in
>>>>>the BDB implementation.
>>>>>          
>>>>>
>>>>I can't believe BDB needs more than two lock objects to do a linear 
>>>>cursor walk, unless you do the walk in a transaction. And there's no 
>>>>need to do that, it being a read-only operation.
>>>>        
>>>>
>>>But there might be write operations mucking with the table at the same
>>>time, and they need to do so in a transaction.
>>>      
>>>
>>So what? That just means that two identical date queries don't 
>>necessarily return the same range of revisions, but I don't see that as 
>>a problem.
>>    
>>
>
>Context, context.  "So what" meaning "perhaps we'll have locking
>problems."  I don't really understand what leads to BDB locking
>problems; I'm just relying on a statement from CMike that a cursor walk
>of the revisions table during a read-only operation has created locking
>issues in the past.
>  
>
I don't believe we ever did even a read-only cursor walk outside of a 
trail, therefore outside of a transaction (until lately, perhaps).

>>>You've lost me, a bit.  Were you proposing that the revision indices
>>>would all be btree tables?
>>>      
>>>
>
>  
>
>>I was proposing that there be one table for all indexed revision props. 
>>Of course it has to be a btree table, how else can you use it as an 
>>index and get any performance benefit? Well, really.
>>    
>>
>
>You could use a hash table; the only reason to use a btree table would
>be for this date thing.
>  
>
Now this is going to the level of detail I haven't really thought about. 
I can see how a hash table could be better for certain classes of keys, 
but a btree is more predictable given that we don't know what kind of 
keys we'll have.

>>>It's true, your revision index feature is difficult (though I think not
>>>impossible) to implement within a libsvn_fs_fs design, and since I
>>>continue to think that it's of minimal value in general, I'm not very
>>>fond of it.
>>>      
>>>
>>I can't agree that it is of minimal value. The fact that you can't do 
>>efficient searches of revprops is a big limitation. Right now the only 
>>fast index is the revision number, and I see this as a usability 
>>misfeature because it makes CM tracking and reporting so much harder. It 
>>may not be a big deal for your average student project, but it's fairly 
>>major if you want to use SVN to implement any serious quality management 
>>process.
>>    
>>
>
>I feel like we have a fundamental conflict here.  Subversion was
>originally conceived of as a version control tool,
>
(raises hand) I thought 1.0 was conceived as a CVS replacement? That 
doesn't impose constraints for the future...

> and with its current
>feature set we can have an implementation of it which is flexible and
>low-overhead.  If we want to turn it into Clearcase, we'll lose that
>ability, because it will become too unwieldy to index the repository in
>all the desired ways without an Oracle-caliber database.
>
Oh, I can absolutely agree that we don't want to change Subversion into 
ClearCase. If nothing else, we're light years ahead of CC in terms of 
the fundamental CM model, and of course we don't want to take a step 
back in that respect.

I know that's not what you meant by your comment. :-)

>Moreover, our
>learning curve will suffer as our command set grows to encompass a set
>of features most people will never need.
>  
>
I don't buy the idea that a user can only effectively use a tool if she 
can learn the UI by heart. And right now most people have trouble 
understanding tagging in SVN; the labeling set of commands (which is 
only one use of revision indexes) would make their life -- and ours! -- 
simpler.

>Of course, we could implement an SQL back end and let people build
>layered products on top of Subversion which take advantage of whatever
>indexing an SQL database can provide.  But that's different from
>providing core Subversion features aimed at implementing a full-fledged
>CM system.
>  
>
Yes, it is. And note that I'm not proposing, e.g., integrating change 
management or document flow control into the core. Rather, I'm proposing 
a generic feature that will _support_ the implementation of such tools 
regardless of what kind of back-end you happen to use. Labels and date 
indexes are just a spinoff.

I quite understand that you see SVN in a different context than I do; I 
don't expect you're being paid for designing large-scale SCM solutions. 
But even if you look at only the open-source community aspect, there are 
projects out there that could immediately start using what I propose. 
GCC comes to mind, for example (although they'll probably wait for arch 
to mature... dunno), and there are many others of similar eminence and size.

Anyway, I think I can take a step back and agree to keep the revision 
ordering constraint in 1.x, at least. I'd still like to use an index for 
dates rather than doing the binary walk explicitly (what nonsense -- BDB 
already does that for us), and I still think labels would be useful, 
although they're really a feature on top of revision indexes -- but they 
seem to be the killer app in the short term, heh.

-- Brane



Re: RFC: Revision indexes for 1.1

Posted by Greg Hudson <gh...@MIT.EDU>.

On Sun, 2004-04-25 at 13:53, Branko Čibej wrote:
> So in the final analysis, yes, people won't run into the hard cases very 
> often, and when they do, it'll be because they're trying to diff or 
> merge unrelated things.

I don't agree.  It seems like a reasonable question to ask, "What
changed in this repository between January and February of 2002?" and if
we've given people the rope to have screwed that up in November of 2003
by inserting mis-ordered revisions, we've done the user a disservice.

We have a responsibility to define a semantic model which is simple and
well-defined, not one that we think will just happen to work most of the
time.

> >>If we keep that restriction, there's no way to optimize cvs2svn, which 
> >>means that people who start with a converted repository will keep 
> >>complaining about the size blowup.

> I may have overstated that; it's probably not impossible, but very hard 
> because, to create optimal branches and tags from CVS, you have to 
> globally optimize the sequence of copies, which means you use up 
> enormous amounts of either and/or memory.

(Either what and/or memory?)

I don't think we should be making deep semantic compromises in svn for
the sake of efficiency gains in cvs2svn, and I strongly suspect the
branch optimization problem isn't insurmountable.

> >>>I've gotten the impression that cursor walks create locking issues in
> >>>the BDB implementation.
> >>I can't believe BDB needs more than two lock objects to do a linear 
> >>cursor walk, unless you do the walk in a transaction. And there's no 
> >>need to do that, it being a read-only operation.
> >But there might be write operations mucking with the table at the same
> >time, and they need to do so in a transaction.
> So what? That just means that two identical date queries don't 
> necessarily return the same range of revisions, but I don't see that as 
> a problem.

Context, context.  "So what" meaning "perhaps we'll have locking
problems."  I don't really understand what leads to BDB locking
problems; I'm just relying on a statement from CMike that a cursor walk
of the revisions table during a read-only operation has created locking
issues in the past.

> >You've lost me, a bit.  Were you proposing that the revision indices
> >would all be btree tables?

> I was proposing that there be one table for all indexed revision props. 
> Of course it has to be a btree table, how else can you use it as an 
> index and get any performance benefit? Well, really.

You could use a hash table; the only reason to use a btree table would
be for this date thing.

> >It's true, your revision index feature is difficult (though I think not
> >impossible) to implement within a libsvn_fs_fs design, and since I
> >continue to think that it's of minimal value in general, I'm not very
> >fond of it.

> I can't agree that it is of minimal value. The fact that you can't do 
> efficient searches of revprops is a big limitation. Right now the only 
> fast index is the revision number, and I see this as a usability 
> misfeature because it makes CM tracking and reporting so much harder. It 
> may not be a big deal for your average student project, but it's fairly 
> major if you want to use SVN to implement any serious quality management 
> process.

I feel like we have a fundamental conflict here.  Subversion was
originally conceived of as a version control tool, and with its current
feature set we can have an implementation of it which is flexible and
low-overhead.  If we want to turn it into Clearcase, we'll lose that
ability, because it will become too unwieldy to index the repository in
all the desired ways without an Oracle-caliber database.  Moreover, our
learning curve will suffer as our command set grows to encompass a set
of features most people will never need.

Of course, we could implement an SQL back end and let people build
layered products on top of Subversion which take advantage of whatever
indexing an SQL database can provide.  But that's different from
providing core Subversion features aimed at implementing a full-fledged
CM system.

Re: RFC: Revision indexes for 1.1

Posted by Branko Čibej <br...@xbc.nu>.

Greg Hudson wrote:

>On Sun, 2004-04-25 at 12:53, Branko Čibej wrote:
>  
>
>>>I'm not sure what you could do with that information, though.  If you've
>>>got mis-ordered dates such that "{2004-10-10}:{2004-10-11}" results in
>>>revs 4, 5, 800, and 6, in that order, what does "svn diff -r
>>>{2004-10-10}:{2004-10-11}" do?
>>>      
>>>
>>You seem to be forgetting that we also filter by path, not only by date. 
>>The client needs an intersection of the set of revisions in which a 
>>subtree changed, and the set of dates belonging to a range.
>>    
>>
>So, what happens if I run that command on the root of the repository? 
>Your heuristic analysis doesn't seem very convincing to me; you seem to
>be saying "eh, people probably won't run into the hard cases very
>often."
>  
>
I forgot to type part of the answer, regarding your example.

First, note that svn diff does _not_ operate on a range of revisions, it 
operates on two specific revisions. The cmdline syntax is the same, of 
course, but there's a subtle difference in semantics.

Now, if a date range operation returns more than one revision range, 
then obviously svn diff can't use it and will error out. Or we could 
decide that in this case the dates are definitive, and diff could use 
the two closest revisions that match the dates. But diff isn't the only 
operation that uses ranges; branch and merge do, too, and they can work 
just fine with a list of revisions. They don't now, but they can.

Regarding the "not so often": I don't propose to drop _all_ revision 
ordering. Obviously it's a good idea to try to keep them ordered, and 
revisions arising from normal commits would remain so. That means that 
in most cases, a date-range search would return a single revision range.
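
To illustrate what "a list of revision ranges" could look like, here
is a sketch in Python. The function name and the dictionary of dates
are hypothetical, and dates are assumed to be ISO-8601 strings that
sort lexicographically.

```python
def revision_ranges(rev_dates, start, end):
    """Group the revisions dated within [start, end] into ranges.

    rev_dates maps revnum -> date string; the result is a list of
    (first, last) pairs of consecutive revision numbers.
    """
    matching = sorted(r for r, d in rev_dates.items() if start <= d <= end)
    ranges = []
    for rev in matching:
        if ranges and rev == ranges[-1][1] + 1:
            # Extends the current run of consecutive revisions.
            ranges[-1] = (ranges[-1][0], rev)
        else:
            ranges.append((rev, rev))
    return ranges
```

With ordered dates the result has at most one pair; an out-of-order
revision such as r800 in Greg's example shows up as a second pair.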

When merging subtrees of repositories (e.g., for some sort of repo 
replication), revisions would also remain ordered in a particular subtree.

So in the final analysis, yes, people won't run into the hard cases very 
often, and when they do, it'll be because they're trying to diff or 
merge unrelated things.

>>>I remain convinced that enforcing date order is the only sane path to
>>>follow.
>>>      
>>>
>>If we keep that restriction, there's no way to optimize cvs2svn, which 
>>means that people who start with a converted repository will keep 
>>complaining about the size blowup.
>>    
>>
>I have a hard time believing this, but I'm at a bit of a disadvantage
>since I don't follow cvs2svn development.  Perhaps you can spell out
>where the conflict lies.
>  
>
I may have overstated that; it's probably not impossible, but very hard 
because, to create optimal branches and tags from CVS, you have to 
globally optimize the sequence of copies, which means you use up 
enormous amounts of either and/or memory.

>>>I've gotten the impression that cursor walks create locking issues in
>>>the BDB implementation.
>>>      
>>>
>>I can't believe BDB needs more than two lock objects to do a linear 
>>cursor walk, unless you do the walk in a transaction. And there's no 
>>need to do that, it being a read-only operation.
>>    
>>
>But there might be write operations mucking with the table at the same
>time, and they need to do so in a transaction.
>  
>
So what? That just means that two identical date queries don't 
necessarily return the same range of revisions, but I don't see that as 
a problem.

Again, remember that revisions will not be out of order unless you 
specifically fiddle with svnadmin load (or equivalent) to make them so, 
or change svn:date. You can't do that by accident, and I'm inclined to 
assume that the repos administrator knows what she's doing.

>>> And it's also possible to imagine a repository
>>>getting big enough that a cursor walk of a table containing N revisions
>>>is too expensive, if getting a revision by date is a common operation.
>>>      
>>>
>
>  
>
>>You obviously don't walk the whole table; you start with the smallest 
>>matching index and stop when you've passed the largest one.
>>    
>>
>
>You've lost me, a bit.  Were you proposing that the revision indices
>would all be btree tables?
>  
>
I was proposing that there be one table for all indexed revision props. 
Of course it has to be a btree table, how else can you use it as an 
index and get any performance benefit? Well, really.
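
The bounded walk over an ordered table can be imitated in Python with
the bisect module. This is only an in-memory analogue of positioning a
cursor at the first key not less than the start key and walking
forward; it does not use the Berkeley DB API, and the key layout
(propname, value, revnum) and function name are hypothetical.

```python
import bisect


def date_range_scan(entries, start, end):
    """Return revnums whose svn:date lies within [start, end].

    `entries` is a sorted list of (propname, value, revnum) tuples,
    standing in for the keys of an ordered (btree) table.
    """
    # Position at the first entry >= ("svn:date", start).
    pos = bisect.bisect_left(entries, ("svn:date", start))
    revs = []
    while pos < len(entries):
        propname, value, revnum = entries[pos]
        # Stop as soon as we leave the svn:date keys or pass `end`.
        if propname != "svn:date" or value > end:
            break
        revs.append(revnum)
        pos += 1
    return revs
```

Only the entries inside the requested range are touched, however many
other properties and revisions the table holds.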

>>>except BDB doesn't seem to
>>>have a "get the closest match in a btree database" operation
>>>      
>>>
>>Huh? DBcursor->c_get with DB_SET_RANGE
>>    
>>
>Ah, good to know.
>  
>
>>Not to mention that it takes a single SQL query. But it might be a bit 
>>hard to do in libsvn_fs_fs, I imagine. :-)
>>    
>>
>It's true, your revision index feature is difficult (though I think not
>impossible) to implement within a libsvn_fs_fs design, and since I
>continue to think that it's of minimal value in general, I'm not very
>fond of it.
>  
>
I can't agree that it is of minimal value. The fact that you can't do 
efficient searches of revprops is a big limitation. Right now the only 
fast index is the revision number, and I see this as a usability 
misfeature because it makes CM tracking and reporting so much harder. It 
may not be a big deal for your average student project, but it's fairly 
major if you want to use SVN to implement any serious quality management 
process.

-- Brane



Re: RFC: Revision indexes for 1.1

Posted by Greg Hudson <gh...@MIT.EDU>.

On Sun, 2004-04-25 at 12:53, Branko Čibej wrote:
> >I'm not sure what you could do with that information, though.  If you've
> >got mis-ordered dates such that "{2004-10-10}:{2004-10-11}" results in
> >revs 4, 5, 800, and 6, in that order, what does "svn diff -r
> >{2004-10-10}:{2004-10-11}" do?

> You seem to be forgetting that we also filter by path, not only by date. 
> The client needs an intersection of the set of revisions in which a 
> subtree changed, and the set of dates belonging to a range.

So, what happens if I run that command on the root of the repository? 
Your heuristic analysis doesn't seem very convincing to me; you seem to
be saying "eh, people probably won't run into the hard cases very
often."

> >I remain convinced that enforcing date order is the only sane path to
> >follow.

> If we keep that restriction, there's no way to optimize cvs2svn, which 
> means that people who start with a converted repository will keep 
> complaining about the size blowup.

I have a hard time believing this, but I'm at a bit of a disadvantage
since I don't follow cvs2svn development.  Perhaps you can spell out
where the conflict lies.

> >I've gotten the impression that cursor walks create locking issues in
> >the BDB implementation.

> I can't believe BDB needs more than two lock objects to do a linear 
> cursor walk, unless you do the walk in a transaction. And there's no 
> need to do that, it being a read-only operation.

But there might be write operations mucking with the table at the same
time, and they need to do so in a transaction.

> >  And it's also possible to imagine a repository
> >getting big enough that a cursor walk of a table containing N revisions
> >is too expensive, if getting a revision by date is a common operation.

> You obviously don't walk the whole table; you start with the smallest 
> matching index and stop when you've passed the largest one.

You've lost me, a bit.  Were you proposing that the revision indices
would all be btree tables?

> > except BDB doesn't seem to
> >have a "get the closest match in a btree database" operation

> Huh? DBcursor->c_get with DB_SET_RANGE

Ah, good to know.

> Not to mention that it takes a single SQL query. But it might be a bit 
> hard to do in libsvn_fs_fs, I imagine. :-)

It's true, your revision index feature is difficult (though I think not
impossible) to implement within a libsvn_fs_fs design, and since I
continue to think that it's of minimal value in general, I'm not very
fond of it.

Re: RFC: Revision indexes for 1.1

Posted by kf...@collab.net.

Branko Čibej <br...@xbc.nu> writes:
>     * cvs2svn (or any other conversion) can delay branch and tag
>       creation, and optimize them better
>     * merging repositories or importing (via svnadmin load) some
>       project's history
> 
> In both cases you'd typically use date-range queries on a path
> where locally the revisions are ordered, and therefore the
> intersection of changes and dates is a continuous range of revisions.
> 
> >I remain convinced that enforcing date order is the only sane path to
> >follow.
>
> If we keep that restriction, there's no way to optimize cvs2svn, which
> means that people who start with a converted repository will keep
> complaining about the size blowup.

What is the proposed optimization to cvs2svn here?

Since branch and tag creation are not versioned operations in CVS,
cvs2svn is free to put any date it wants to on the SVN revisions used
for those creations.  "Any date it wants to" within the bounds of
reason, of course -- the bounds of reason being too complex to go into
here, unless you really want the gory details.  But in any case, I
don't know of a way that enforcing or not enforcing ordered dates on
SVN revisions affects cvs2svn optimization.

> Not to mention that it takes a single SQL query. But it might be a bit
> hard to do in libsvn_fs_fs, I imagine. :-)

Dang, that was cruel :-).

-Karl


Re: RFC: Revision indexes for 1.1

Posted by Branko Čibej <br...@xbc.nu>.

Greg Hudson wrote:

>On Thu, 2004-04-22 at 20:33, Branko Čibej wrote:
>  
>
>>The API problem is that svn_repos_dated_revision looks for _one_ 
>>revision using _one_ date. So if your revisions aren't ordered by date, 
>>you can't just use this function to find the first and last revision in 
>>a date range and rely on the result. Instead, we need a new function 
>>that accepts a start and end date, and returns a list of revision ranges 
>>that fall within those two dates.
>>    
>>
>
>I'm not sure what you could do with that information, though.  If you've
>got mis-ordered dates such that "{2004-10-10}:{2004-10-11}" results in
>revs 4, 5, 800, and 6, in that order, what does "svn diff -r
>{2004-10-10}:{2004-10-11}" do?
>  
>
You seem to be forgetting that we also filter by path, not only by date. 
The client needs an intersection of the set of revisions in which a 
subtree changed, and the set of dates belonging to a range.

There are two use cases for non-ordered revisions:

    * cvs2svn (or any other conversion) can delay branch and tag
      creation, and optimize them better
    * merging repositories or importing (via svnadmin load) some
      project's history

In both cases you'd typically use date-range queries on a path where 
locally the revisions are ordered, and therefore the intersection of 
changes and dates is a continuous range of revisions.

>I remain convinced that enforcing date order is the only sane path to
>follow.
>  
>
If we keep that restriction, there's no way to optimize cvs2svn, which 
means that people who start with a converted repository will keep 
complaining about the size blowup.

>> The implementation of such a function 
>>is trivial and involves only a cursor traverse of the revision index, 
>>not touching revision props at all.
>>    
>>
>
>I've gotten the impression that cursor walks create locking issues in
>the BDB implementation.
>
I can't believe BDB needs more than two lock objects to do a linear 
cursor walk, unless you do the walk in a transaction. And there's no 
need to do that, it being a read-only operation.

>  And it's also possible to imagine a repository
>getting big enough that a cursor walk of a table containing N revisions
>is too expensive, if getting a revision by date is a common operation.
>  
>
You obviously don't walk the whole table; you start with the smallest 
matching index and stop when you've passed the largest one.

>It seems like the best implementation would be a btree database where we
>could perform a binary search to find the closest match, and then walk
>forward or backward by one key as necessary, except BDB doesn't seem to
>have a "get the closest match in a btree database" operation, or a "walk
>forward or backward by one key from a specified starting point in a
>btree database" operation.  Obviously, such a thing wouldn't dovetail
>with the revision indexes plan, if we could implement it at all.
>  
>
Huh? DBcursor->c_get with DB_SET_RANGE, then DB_NEXT as long as 
necessary (or DB_NEXT_DUP, depending on what you're doing), will do 
exactly that, and will work marvelously with the proposed revision indexes.
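
As a rough illustration of that access pattern (not BDB code), here is
a Python sketch in which a sorted list stands in for the btree. The
names index and revisions_in_date_range are made up for the example;
bisect_left plays the role of DB_SET_RANGE and the loop plays the role
of repeated DB_NEXT calls.

```python
import bisect

# Hypothetical stand-in for the btree: (svn:date, revnum) keys kept sorted.
index = sorted([
    ("2004-10-09T12:00:00", 3),
    ("2004-10-10T08:00:00", 4),
    ("2004-10-10T17:30:00", 5),
    ("2004-10-10T09:00:00", 800),
    ("2004-10-11T10:00:00", 6),
    ("2004-10-12T00:00:00", 7),
])

def revisions_in_date_range(index, start, end):
    """Return revnums whose date lies in [start, end), in date order."""
    # Like DB_SET_RANGE: position on the smallest key >= start.
    pos = bisect.bisect_left(index, (start,))
    revs = []
    # Like repeated DB_NEXT: step forward until the key passes the end.
    while pos < len(index) and index[pos][0] < end:
        revs.append(index[pos][1])
        pos += 1
    return revs

print(revisions_in_date_range(index, "2004-10-10", "2004-10-11"))
```

Only the entries between the two dates are visited, so the cost depends
on the size of the matching range rather than on the total number of
revisions.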

Not to mention that it takes a single SQL query. But it might be a bit 
hard to do in libsvn_fs_fs, I imagine. :-)

-- Brane




Re: RFC: Revision indexes for 1.1

Posted by Greg Hudson <gh...@MIT.EDU>.

On Thu, 2004-04-22 at 20:33, Branko Čibej wrote:
> The API problem is that svn_repos_dated_revision looks for _one_ 
> revision using _one_ date. So if your revisions aren't ordered by date, 
> you can't just use this function to find the first and last revision in 
> a date range and rely on the result. Instead, we need a new function 
> that accepts a start and end date, and returns a list of revision ranges 
> that fall within those two dates.

I'm not sure what you could do with that information, though.  If you've
got mis-ordered dates such that "{2004-10-10}:{2004-10-11}" results in
revs 4, 5, 800, and 6, in that order, what does "svn diff -r
{2004-10-10}:{2004-10-11}" do?

I remain convinced that enforcing date order is the only sane path to
follow.

>  The implementation of such a function 
> is trivial and involves only a cursor traverse of the revision index, 
> not touching revision props at all.

I've gotten the impression that cursor walks create locking issues in
the BDB implementation.  And it's also possible to imagine a repository
getting big enough that a cursor walk of a table containing N revisions
is too expensive, if getting a revision by date is a common operation.

It seems like the best implementation would be a btree database where we
could perform a binary search to find the closest match, and then walk
forward or backward by one key as necessary, except BDB doesn't seem to
have a "get the closest match in a btree database" operation, or a "walk
forward or backward by one key from a specified starting point in a
btree database" operation.  Obviously, such a thing wouldn't dovetail
with the revision indexes plan, if we could implement it at all.


Re: RFC: Revision indexes for 1.1

Posted by Branko Čibej <br...@xbc.nu>.

kfogel@collab.net wrote:

>Branko Čibej <br...@xbc.nu> writes:
>  
>
>>>If revisions have their dates out of order, then -r {date}:{date} will
>>>evaluate to a revision range which does not reflect the date range.  So
>>>I think we're better off trying to enforce in-date-order revisions than
>>>we are trying to make our system work with out-of-date-order revisions.
>>>      
>>>
>>No, I'm totally against enforcing the date order, for the reasons I
>>stated. Instead we should just fix the API.
>>    
>>
>
>The "date-search problem" that Greg Hudson was talking about isn't
>merely an issue of API, it's about implementation.  The reason to
>enforce ordered dates is so we can use binary search, to narrow down a
>date range to a set of revisions very quickly.  Otherwise, resolving
>dates to revisions requires a full scan of the revisions table.
>  
>
It's an issue of both API and implementation, see below.

>You probably know all this already; it's just still not quite clear to
>me how your proposal solves the problem.  Btw, I'm not advocating
>enforced ordering of revision dates, merely stating that it's one way
>to turn an O(N) problem into a O(logN) problem.  The addition of a new
>index table might do the same thing, without enforcing ordering.  (Was
>there such an index in your proposal, and I just missed it?)
>  
>
The proposed revision indexes would by default create an injective 
mapping between svn:date and revision numbers. The implementation change 
is to use that index (when available) instead of the binary search, thus 
doing away with the ordering restriction.

The API problem is that svn_repos_dated_revision looks for _one_ 
revision using _one_ date. So if your revisions aren't ordered by date, 
you can't just use this function to find the first and last revision in 
a date range and rely on the result. Instead, we need a new function 
that accepts a start and end date, and returns a list of revision ranges 
that fall within those two dates. The implementation of such a function 
is trivial and involves only a cursor traverse of the revision index, 
not touching revision props at all.
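
A minimal sketch of what such a function might return, in Python for
brevity. The names dated_revision_ranges and collapse_to_ranges are
hypothetical, and a plain filter over (date, revnum) pairs stands in
for the cursor traverse; dates are compared as ISO-style strings.

```python
def collapse_to_ranges(revnums):
    """Collapse revision numbers into sorted, inclusive (first, last) ranges."""
    ranges = []
    for rev in sorted(set(revnums)):
        if ranges and rev == ranges[-1][1] + 1:
            # Extends the previous range by one revision.
            ranges[-1] = (ranges[-1][0], rev)
        else:
            ranges.append((rev, rev))
    return ranges

def dated_revision_ranges(date_index, start, end):
    """date_index: iterable of (date, revnum) pairs from the revision index."""
    matching = [rev for date, rev in date_index if start <= date <= end]
    return collapse_to_ranges(matching)

# Hypothetical data: r800 carries a date that falls between r5 and r6.
date_index = [
    ("2004-10-10T01:00", 4),
    ("2004-10-10T02:00", 5),
    ("2004-10-10T03:00", 800),
    ("2004-10-10T04:00", 6),
    ("2004-10-12T00:00", 7),
]
print(dated_revision_ranges(date_index, "2004-10-10", "2004-10-11"))
```

With dates in revision order the result is a single range; with
out-of-order dates the caller gets several ranges and has to decide
what to do with them.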

-- Brane



Re: RFC: Revision indexes for 1.1

Posted by kf...@collab.net.

Branko Čibej <br...@xbc.nu> writes:
> >>3.2 Multiple indexes per revision
> >>
> >>The values of properties that allow multiple keys per single revision
> >>are represented in a newline-terminated list, one value per line (like
> >>the svn:ignore property on directories). Each value is added as a
> >>separate key to the index.
> >
> >I didn't understand this, sorry.  Can you maybe give an example, or
> >try saying it another way?  (Probably this is somehow related to my
> >earlier question about unique indexes.)
>
> I hope I answered this adequately in my other reply.

Yup, completely got it now, thanks.

> Not all metadata is associated with a commit (ACLs are an
> example). But apart from that, yes, if the metadata associated with a
> commit is mutable -- as it is in Subversion -- then it is indeed at
> odds with CM philosophy if changes to that metadata aren't
> tracked. Imagine someone changing the value of svn:author; don't you
> agree that it is useful to know what the original value had been, and who
> made the change, and when?

Perfect example, yup.

> There's a nice paper that discusses some of these issues at
> 
>     http://www.accurev.com/accurev/info/timesafe.htm

Yah -- it looks like there's a lot of good stuff there, actually.
[Sigh.  I'd give up my nights for another 12 hours in the day... Oh,
wait.]

The "timesafe" property is one that Subversion has been (informally)
trying to have.  But we fall down in the revprop change area, yes.

> Anyway, I think we can postpone this part of the discussion until the
> 2.0 design phase (or at least take it to another thread).

Agreed.

> P.S.: For the record, I'd consider AccuRev to be our real competition
> going forward, rather than CVS (or BitKeeper). It has a totally
> outstanding SCM model.

Wish I were familiar with AccuRev :-).

The only other thing that still worries me in the labels proposal is
this (from another mail of yours in this thread):

> >If revisions have their dates out of order, then -r {date}:{date} will
> >evaluate to a revision range which does not reflect the date range.  So
> >I think we're better off trying to enforce in-date-order revisions than
> >we are trying to make our system work with out-of-date-order revisions.
>
> No, I'm totally against enforcing the date order, for the reasons I
> stated. Instead we should just fix the API.

The "date-search problem" that Greg Hudson was talking about isn't
merely an issue of API, it's about implementation.  The reason to
enforce ordered dates is so we can use binary search, to narrow down a
date range to a set of revisions very quickly.  Otherwise, resolving
dates to revisions requires a full scan of the revisions table.

You probably know all this already; it's just still not quite clear to
me how your proposal solves the problem.  Btw, I'm not advocating
enforced ordering of revision dates, merely stating that it's one way
to turn an O(N) problem into a O(logN) problem.  The addition of a new
index table might do the same thing, without enforcing ordering.  (Was
there such an index in your proposal, and I just missed it?)
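
For reference, the O(logN) lookup under discussion looks roughly like
the sketch below. This is an illustration in Python, not the actual
svn_repos_dated_revision code; the function name and the list layout
(dates[i] is the svn:date of revision i) are assumptions made for the
example. The bisection is only valid while dates rise with revision
numbers, which is exactly the ordering restriction being debated.

```python
import bisect

def dated_revision(dates, when):
    """Return the youngest revision whose date is <= when, or None.

    dates[i] is the svn:date of revision i. The list MUST be in
    ascending order; on unordered dates the answer is not reliable.
    """
    pos = bisect.bisect_right(dates, when)
    return pos - 1 if pos else None

# Hypothetical repository with four revisions, dates in order.
dates = ["2004-01-01", "2004-02-01", "2004-03-01", "2004-04-01"]
print(dated_revision(dates, "2004-02-15"))
```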

-Karl


Re: RFC: Revision indexes for 1.1

Posted by Branko Čibej <br...@xbc.nu>.

kfogel@collab.net wrote:

>Greg Hudson's comments echo mine, so I won't repeat those here.
>  
>
Then my reply to his post is addressed to you, too. :-)

>Just a few other questions, though:
>
>Branko Čibej <br...@xbc.nu> writes:
>  
>
>>3.1 DB schema changes
>>
>>The filesystem grows a new table, "revpropindex", with the following schema:
>>
>>    (PROP-NAME PROP-VALUE) -> REVNUM
>>
>>Non-unique indexes are allowed.
>>    
>>
>
>Is there a reason not to do non-uniqueness like this:
>
>      (PROP-NAME PROP-VALUE) -> (REVNUM1, REVNUM2, REVNUM3 ...)
>
>and retain unique indexes?
>  
>
Mike answered that one.

>>No other changes are necessary. For forward compatibility, servers that
>>do not implement revision indexes will ignore this table; for backward
>>compatibility, if the table does not exist in the repository, revision
>>indexing and search operations are disabled. The dumpfile format does
>>not change, as the contents of the revision index can be reconstructed
>>from revprop data.
>>    
>>
>
>If the feature is allowed to be unavailable anyway, then I'd prefer
>not to have the special rule that "svn:date" and "svn:name" are always
>indexed.  Why not let them be controlled in the same way as everything
>else?
>  
>
This feature is allowed to be unavailable for 1.x backward compatibility 
only. All repositories created by 1.1 and later would have to index 
dates and labels. Also I don't expect someone to downgrade the server to 
1.0.x yet keep the repository in the 1.1 format, but we have to cover 
that possibility because of our compatibility guarantees. (They'd still 
have to reload the repo at the next upgrade, to regenerate the index.)

It does make sense to not change the dump format, though, since all the 
information in the index is redundant, almost by definition.
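
To make the redundancy concrete, here is a small Python sketch of
rebuilding the index from revprops alone. The function rebuild_index,
the INDEXED table and the sample data are all invented for the
example; multi-value properties are split on newlines as described in
section 3.2.

```python
from collections import defaultdict

# Hypothetical per-property settings, mirroring the proposed conf/revpropindex.
INDEXED = {"svn:date": {"multiple": False}, "svn:name": {"multiple": True}}

def rebuild_index(revprops, indexed=INDEXED):
    """revprops: {revnum: {propname: value}} -> {(propname, value): [revnums]}"""
    index = defaultdict(list)
    for rev in sorted(revprops):
        for name, value in revprops[rev].items():
            if name not in indexed:
                continue  # revprops missing from the config are not indexed
            if indexed[name]["multiple"]:
                # One key per non-empty line of a newline-terminated list.
                keys = [line for line in value.split("\n") if line]
            else:
                keys = [value]
            for key in keys:
                index[(name, key)].append(rev)
    return dict(index)

revprops = {
    1: {"svn:date": "2004-04-19", "svn:name": "alpha\nbeta\n", "svn:log": "x"},
    2: {"svn:date": "2004-04-20", "svn:name": "beta\n"},
}
print(rebuild_index(revprops)[("svn:name", "beta")])
```

Since nothing but revprop data goes in, a load of an unchanged dumpfile
can regenerate the same table.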

>>3.2 Multiple indexes per revision
>>
>>The values of properties that allow multiple keys per single revision
>>are represented in a newline-terminated list, one value per line (like
>>the svn:ignore property on directories). Each value is added as a
>>separate key to the index.
>>    
>>
>
>I didn't understand this, sorry.  Can you maybe give an example, or
>try saying it another way?  (Probably this is somehow related to my
>earlier question about unique indexes.)
>  
>
I hope I answered this adequately in my other reply.

>>3.4 FS/Repos API changes
>>
>>When opening an existing repository, the FS layer must not error out if
>>the revpropindex table does not exist.
>>
>>The repos layer grows a new function,
>>
>>    svn_repos_revision_search(propname, propvalue)
>>
>>which returns a list of revision numbers. The list can be empty. No
>>error is returned if a property is not indexed or revision indexing is
>>not enabled in the repository (i.e., if the repository schema version is
>>older than the server version).
>>    
>>
>
>Why is it better to return an empty list than an
>SVN_ERR_UNSUPPORTED_FEATURE error or something like that?
>  
>
For one thing, I think it's completely irrelevant (from the client's 
perspective) whether the feature is unsupported or the list of matches 
is actually empty. On the other hand it will probably turn out that the 
RA implementation can't avoid throwing the error. This is both an 
implementation detail and a UI issue.

>>The propset, propchange and propdel repos-level wrappers must maintain
>>the revpropindex table (optimization hint: when changing multi-value
>>properties, only values deleted from or added to the list need to be
>>processed).
>>
>>The function svn_repos_dated_revision changes: first, it calls
>>svn_repos_revision_search("svn:date", timestamp). If this returns a
>>non-empty list, it returns the oldest revision from this list. Otherwise
>>it performs the current binary search. (The binary-search implementation
>>must stay for backward compatibility. It can be removed in 2.0.)
>>svn_repos_committed_info and svn_repos_history get similar changes.
>>    
>>
>
>Ooooh.  I have a feeling if I understood your earlier thing about
>"properties that allow multiple keys per single revision", I'd
>understand this part :-(.
>  
>
Nah, actually I screwed up this thing about date-based searching, and 
multi-indexes have no bearing on it...


>>4. Implementing revision names
>>
>>Using the mechanism described above, we can add symbolic names to a
>>revision or a set of revisions. To do this we introduce a new revision
>>property, "svn:name", that contains a newline-separated list of symbolic
>>names assigned to a revision. The values are non-unique: that is, a
>>single symbolic name can group several distinct revisions.
>>    
>>
>
>If we're calling these "labels", then let's use "svn:label".
>  
>
I'm only calling them "labels" here because of the earlier proposals. My 
candidate would be "symbolic revision names", mainly because I expect 
this would stop the flow of "how can I set a label on a single file" 
type of questions. But that's a bikeshed.

>>While the existing "prop(get|set|edit) --revprop" functionality is
>>sufficient for setting and maintaining revision names, it is not really
>>useful. I propose the following changes to the UI:
>>
>>4.1 Extend the format of the "-r" command-line option
>>
>>Currently the -r command-line option accepts a revision number or a date
>>(range):
>>
>>    -r revnum|{date}[:revnum|{date}]
>>
>>The {date} specifier is internally converted to a revision number. We
>>add another specifier, [labelname], that is also converted to a revision
>>number.
>>    
>>
>
>This sounds great (exactly the way CVS does it too).
>  
>
Yes, and it's also nice and logical (funny that, coming from CVS :-)

>>Note: Since label values are non-unique, a [label] specifier can refer
>>to a list of revision numbers. Such lists are useless for "svn update" or
>>"svn export"; however, "svn merge" could be extended to handle
>>multi-revision merges (cherry-picking, right?). We should support an
>>analogous format, "-r revnum,revnum,..." for specifying an explicit list
>>of revision numbers; this is also needed for defining multi-revision labels.
>>    
>>
>
>Yes, and even before that's supported, we can have the [labelname]
>expansion be comma-separated when multiple revisions come back.  That
>way the -r option will give a syntax error for the stuff it can't
>handle yet.
>  
>
Indeed. I'd thought about saying this too, but in fact it's not 
efficient for the actual implementation to do text-based replacement of 
the option values. And cmdline isn't the only client, of course. There's 
some magic to be done in the svn_client API.
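
One way to picture the alternative to text replacement: parse each
specifier into a typed value first, then resolve it to a list of
revision numbers. The Python sketch below is purely illustrative;
parse_spec, resolve and the label_index dictionary are hypothetical
names, and it ignores keywords such as HEAD as well as ranges.

```python
import re

# revnum | {date} | [labelname], as in the proposed -r syntax.
_SPEC = re.compile(
    r"^(?:(?P<rev>\d+)|\{(?P<date>[^}]+)\}|\[(?P<label>[^\]]+)\])$")

def parse_spec(text):
    """Turn one -r specifier into a (kind, value) pair."""
    match = _SPEC.match(text)
    if not match:
        raise ValueError("unrecognized revision specifier: %r" % text)
    if match.group("rev") is not None:
        return ("number", int(match.group("rev")))
    if match.group("date") is not None:
        return ("date", match.group("date"))
    return ("label", match.group("label"))

def resolve(spec, label_index):
    """Resolve a parsed specifier to a sorted list of revision numbers."""
    kind, value = spec
    if kind == "number":
        return [value]
    if kind == "label":
        # A label may name several revisions, or none at all.
        return sorted(label_index.get(value, []))
    raise NotImplementedError("date lookup is left out of this sketch")

print(resolve(parse_spec("[rc]"), {"rc": [12, 7]}))
```

A caller that cannot handle more than one revision can then check the
length of the returned list instead of relying on a syntax error.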


>>4.2 svn label [-r revnum/range/list] label-name
>>
>>Adds a label to the specified revision(s). All forms of the -r option
>>are supported (including label specifiers, of course). The default is to
>>label HEAD.
>>
>>4.3 svn labeldel [-r revnum/range/list] label-name
>>
>>Remove a label from the specified revision(s). If -r is not specified,
>>remove all instances of the label.
>>    
>>
>
>I'm not quite as opposed as Greg Hudson to having new commands for
>this, but would like to first do without and see whether or not it's a
>problem for people.  'svn propset --revprop -rN svn:label VALUE' isn't
>so hard, especially for an early adopter.  It feels premature to add
>dedicated subcommands for workflow-specific uses of properties, before
>we've had a chance to see how often and in what way the properties
>actually get used.
>  
>
The first use I see for multi-revision labels is for marking 
patch-release merge candidates. It would probably make sense to add a 
"labelget" command that returns a list of revision numbers associated 
with a label, too -- helpful until multi-version merges are implemented, 
and probably useful in any case (and incidentally it's yet another 
operation that can't easily be simulated by the current revprop commands).


>>All these functions need equivalents in the client library; the RA layer
>>only has to expose svn_repos_revision_search. "svn label" and "svn
>>labeldel" can be implemented as simple revprop manipulations, although
>>implementing them on the server would make multi-revision labeling faster.
>>    
>>
>
>This optimization is independent of the command set, if we implement
>the '-rN,M,...' syntax.
>  
>
Yes.

>>5. Future notes
>>
>>Currently no history is recorded about revprop changes. This is an
>>oversight that makes Subversion behave slightly at cross-purposes with
>>configuration management philosophy. Unfortunately, in order to record
>>historical changes to revprops, a slightly more drastic change is needed
>>not just to the schema and API, because these changes would have to be
>>recorded in a new kind of transaction. Thus this kind of history
>>tracking cannot be implemented before 2.0.
>>    
>>
>
>I agree, and wonder how high priority this should be.
>  
>
Currently it's an inconvenience, but not a showstopper; after all it can 
always be simulated with a pre-revprop-change hook. But I'd not like to 
see 2.0 without this.
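
A sketch of the hook-based simulation, written in Python. It assumes
the hook is invoked with REPOS, REV, USER and PROPNAME as arguments
and receives the new value on stdin, and that "svnlook propget
--revprop" is available to read the value about to be replaced; check
both against your server version. The log file path and the function
names are made up for the example.

```python
import json
import subprocess
import time

def old_value(repos, rev, propname):
    """Ask svnlook for the revprop value that is about to be replaced."""
    result = subprocess.run(
        ["svnlook", "propget", "--revprop", "-r", str(rev), repos, propname],
        capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None

def make_record(rev, user, propname, old, new, when):
    """Build one line of the change log as JSON."""
    return json.dumps({"rev": int(rev), "user": user, "prop": propname,
                       "old": old, "new": new, "when": when}, sort_keys=True)

def main(argv, stdin, log_path):
    """Hook body: append a record, then allow the change."""
    repos, rev, user, propname = argv[1:5]
    record = make_record(rev, user, propname,
                         old_value(repos, rev, propname),
                         stdin.read(), time.time())
    with open(log_path, "a") as log:
        log.write(record + "\n")
    return 0  # exit status 0 lets the property change proceed
```

A real hook script would end with something like
sys.exit(main(sys.argv, sys.stdin, "/path/to/revprop-changes.log")).
The log lives outside the repository, so it is not replicated or
dumped along with it, which is one reason this is only a stopgap.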

>Look at it this way: assume that *every* change is versioned in the
>sense that
>
>   - it can be rolled back to any previous point
>   - it has some metadata (a log message) associated with it
>  
>
Ah no, the first but not the second -- you don't have to associate a log 
message with every change in the repository; log messages are associated 
with commits, not metadata changes. But you should be able to see, for 
example, what the _original_ log message for a revision was, and also 
replay metadata changes (this is indispensable for asynchronous 
repository replication, for example).

>...then a change to metadata itself must be rollbackable and have
>associated metametadata.  You can see how this begins to stack up.
>  
>
Modify that to "a change to mutable metadata", and the termination 
condition is part of the statement.

>It's not totally impossible to implement the infinite tower, it's just
>a pain.  Subversion has chosen to "bottom out" at the first level --
>the metadata associated with a commit is not versioned, it's just
>metadata.  Is that at odds with CM philosophy, or is it more that if
>one wants something versioned, one should put it under version
>control?
>  
>
Not all metadata is associated with a commit (ACLs are an example). But 
apart from that, yes, if the metadata associated with a commit is 
mutable -- as it is in Subversion -- then it is indeed at odds with CM 
philosophy if changes to that metadata aren't tracked. Imagine someone 
changing the value of svn:author; don't you agree that it is useful to know 
what the original value had been, and who made the change, and when?

There's a nice paper that discusses some of these issues at

    http://www.accurev.com/accurev/info/timesafe.htm

Anyway, I think we can postpone this part of the discussion until the 
2.0 design phase (or at least take it to another thread).

-- Brane

P.S.: For the record, I'd consider AccuRev to be our real competition 
going forward, rather than CVS (or BitKeeper). It has a totally 
outstanding SCM model.


Re: RFC: Revision indexes for 1.1

Posted by kf...@collab.net.

Greg Hudson's comments echo mine, so I won't repeat those here.

Just a few other questions, though:

Branko Čibej <br...@xbc.nu> writes:
> 3.1 DB schema changes
> 
> The filesystem grows a new table, "revpropindex", with the following schema:
> 
>     (PROP-NAME PROP-VALUE) -> REVNUM
> 
> Non-unique indexes are allowed.

Is there a reason not to do non-uniqueness like this:

      (PROP-NAME PROP-VALUE) -> (REVNUM1, REVNUM2, REVNUM3 ...)

and retain unique indexes?

> No other changes are necessary. For forward compatibility, servers that
> do not implement revision indexes will ignore this table; for backward
> compatibility, if the table does not exist in the repository, revision
> indexing and search operations are disabled. The dumpfile format does
> not change, as the contents of the revision index can be reconstructed
> from revprop data.

If the feature is allowed to be unavailable anyway, then I'd prefer
not to have the special rule that "svn:date" and "svn:name" are always
indexed.  Why not let them be controlled in the same way as everything
else?

> 3.2 Multiple indexes per revision
> 
> The values of properties that allow multiple keys per single revision
> are represented in a newline-terminated list, one value per line (like
> the svn:ignore property on directories). Each value is added as a
> separate key to the index.

I didn't understand this, sorry.  Can you maybe give an example, or
try saying it another way?  (Probably this is somehow related to my
earlier question about unique indexes.)

> 3.3 Indexing configuration
> 
> The repository grows a new configuration file, conf/revpropindex. The
> format of the file is as follows:
> 
>     [propname]
>     unique = [TRUE/false]
>     multiple = [FALSE/true]
> 
> Revprops that do not appear in the config file are not indexed. The
> default contents of this file are:
> 
>     [svn:date]
>     unique = false
>     multiple = false
>     [svn:name]
>     unique = false
>     multiple = true
> 
> The svn:date and svn:name indexing cannot be turned off, neither can the
> indexing parameters change (in effect, we may as well not actually
> enable these in the revpropindex config file).

(See earlier comments about forced indexing.)

> 3.4 FS/Repos API changes
> 
> When opening an existing repository, the FS layer must not error out if
> the revpropindex table does not exist.
> 
> The repos layer grows a new function,
> 
>     svn_repos_revision_search(propname, propvalue)
> 
> which returns a list of revision numbers. The list can be empty. No
> error is returned if a property is not indexed or revision indexing is
> not enabled in the repository (i.e., if the repository schema version is
> older than the server version).

Why is it better to return an empty list than an
SVN_ERR_UNSUPPORTED_FEATURE error or something like that?

> The propset, propchange and propdel repos-level wrappers must maintain
> the revpropindex table (optimization hint: when changing multi-value
> properties, only values deleted from or added to the list need to be
> processed).
> 
> The function svn_repos_dated_revision changes: first, it calls
> svn_repos_revision_search("svn:date", timestamp). If this returns a
> non-empty list, it returns the oldest revision from this list. Otherwise
> it performs the current binary search. (The binary-search implementation
> must stay for backward compatibility. It can be removed in 2.0.)
> svn_repos_committed_info and svn_repos_history get similar changes.

Ooooh.  I have a feeling if I understood your earlier thing about
"properties that allow multiple keys per single revision", I'd
understand this part :-(.

> 4. Implementing revision names
> 
> Using the mechanism described above, we can add symbolic names to a
> revision or a set of revisions. To do this we introduce a new revision
> property, "svn:name", that contains a newline-separated list of symbolic
> names assigned to a revision. The values are non-unique: that is, a
> single symbolic name can group several distinct revisions.

If we're calling these "labels", then let's use "svn:label".

> While the existing "prop(get|set|edit) --revprop" functionality is
> sufficient for setting and maintaining revision names, it is not really
> useful. I propose the following changes to the UI:
> 
> 4.1 Extend the format of the "-r" command-line option
> 
> Currently the -r command-line option accepts a revision number or a date
> (range):
> 
>     -r revnum|{date}[:revnum|{date}]
> 
> The {date} specifier is internally converted to a revision number. We
> add another specifier, [labelname], that is also converted to a revision
> number.

This sounds great (exactly the way CVS does it too).

> Note: Since label values are non-unique, a [label] specifier can refer
> to a list of revision numbers. Such lists are useless for "svn update" or
> "svn export"; however, "svn merge" could be extended to handle
> multi-revision merges (cherry-picking, right?). We should support an
> analogous format, "-r revnum,revnum,..." for specifying an explicit list
> of revision numbers; this is also needed for defining multi-revision labels.

Yes, and even before that's supported, we can have the [labelname]
expansion be comma-separated when multiple revisions come back.  That
way the -r option will give a syntax error for the stuff it can't
handle yet.

> 4.2 svn label [-r revnum/range/list] label-name
> 
> Adds a label to the specified revision(s). All forms of the -r option
> are supported (including label specifiers, of course). The default is to
> label HEAD.
>
> 4.3 svn labeldel [-r revnum/range/list] label-name
> 
> Remove a label from the specified revision(s). If -r is not specified,
> remove all instances of the label.

I'm not quite as opposed as Greg Hudson to having new commands for
this, but would like to first do without and see whether or not it's a
problem for people.  'svn propset --revprop -rN svn:label VALUE' isn't
so hard, especially for an early adopter.  It feels premature to add
dedicated subcommands for workflow-specific uses of properties, before
we've had a chance to see how often and in what way the properties
actually get used.

> All these functions need equivalents in the client library; the RA layer
> only has to expose svn_repos_revision_search. "svn label" and "svn
> labeldel" can be implemented as simple revprop manipulations, although
> implementing them on the server would make multi-revision labeling faster.

This optimization is independent of the command set, if we implement
the '-rN,M,...' syntax.
 
> 5. Future notes
> 
> Currently no history is recorded about revprop changes. This is an
> oversight that makes Subversion behave slightly at cross-purposes with
> configuration management philosophy. Unfortunately, in order to record
> historical changes to revprops, a slightly more drastic change is needed
> not just to the schema and API, because these changes would have to be
> recorded in a new kind of transaction. Thus this kind of history
> tracking cannot be implemented before 2.0.

I agree, and wonder how high priority this should be.

Look at it this way: assume that *every* change is versioned in the
sense that

   - it can be rolled back to any previous point
   - it has some metadata (a log message) associated with it

...then a change to metadata itself must be rollbackable and have
associated metametadata.  You can see how this begins to stack up.
It's not totally impossible to implement the infinite tower, it's just
a pain.  Subversion has chosen to "bottom out" at the first level --
the metadata associated with a commit is not versioned, it's just
metadata.  Is that at odds with CM philosophy, or is it more that if
one wants something versioned, one should put it under version
control?

-K


Re: Revision indexes for 1.1

Posted by Branko Čibej <br...@xbc.nu>.

Chris Foote wrote:

>I have one small question though.
>
>----- Original Message ----- 
>From: "Branko Čibej" <br...@xbc.nu>
>  
>
>>3.3 Indexing configuration
>>
>>The repository grows a new configuration file, conf/revpropindex. The
>>format of the file is as follows:
>>
>>    [propname]
>>    unique = [TRUE/false]
>>    multiple = [FALSE/true]
>>
>>    
>>
>Wouldn't it be better to make this the start of the conf/repos.conf file for
>all repository configuration?
>That way only *one* config file would need to be read.
>  
>
Possibly. That would only require a syntax change. (Although the fact 
that we allow colons in property names does complicate matters somewhat.)
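
For what it's worth, here is how the proposed file format reads with
an off-the-shelf INI parser, using Python's configparser as a
stand-in. This says nothing about how Subversion's own config reader
treats colons; the function name and the [custom:ticket] section are
invented for the example. The defaults follow the format above:
unique defaults to true, multiple to false.

```python
import configparser

SAMPLE = """\
[svn:date]
unique = false
multiple = false
[svn:name]
unique = false
multiple = true
[custom:ticket]
"""

def load_revprop_index_config(text):
    """Parse revpropindex-style text into {propname: {unique, multiple}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    settings = {}
    for propname in parser.sections():
        section = parser[propname]
        settings[propname] = {
            # Missing keys fall back to the defaults from the proposal.
            "unique": section.getboolean("unique", fallback=True),
            "multiple": section.getboolean("multiple", fallback=False),
        }
    return settings

for name, opts in load_revprop_index_config(SAMPLE).items():
    print(name, opts)
```

In a shared repos.conf the section names would presumably need a
prefix or some other marker to keep property names apart from
unrelated sections, which is where the syntax change comes in.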

>>Revprops that do not appear in the config file are not indexed. The
>>default contents of this file are:
>>
>>    [svn:date]
>>    unique = false
>>    multiple = false
>>    [svn:name]
>>    unique = false
>>    multiple = true
>>
>>The svn:date and svn:name indexing cannot be turned off, neither can the
>>indexing parameters change (in effect, we may as well not actually
>>enable these in the revpropindex config file).
>>
>>    
>>
>What happens when someone adds a new revprop to the config file?
>If you already have a repo with some props indexed and then decide to add
>another/remove one, would this require a dump/load cycle?
>  
>
Only if you want to add existing instances of that property to the 
index. Of course we could also add a "svnadmin reindex" command, or 
something similar, for maintaining the revprop index. That might be a 
good idea in any case.
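
As a rough illustration of what such a reindex pass might do, the sketch below rebuilds the whole index from revprop data. Everything here is hypothetical: the function name, the in-memory dict standing in for the repository, and the choice to ignore the unique/multiple settings for brevity.

```python
from collections import defaultdict

def rebuild_index(revprops_by_rev, indexed_propnames):
    """Build {(propname, value): [revnum, ...]} from scratch.

    revprops_by_rev is a hypothetical in-memory stand-in for the
    repository: {revnum: {propname: value}}.
    """
    index = defaultdict(list)
    for revnum in sorted(revprops_by_rev):
        for propname, value in revprops_by_rev[revnum].items():
            # Only properties listed in the config are indexed.
            if propname in indexed_propnames:
                index[(propname, value)].append(revnum)
    return dict(index)

if __name__ == "__main__":
    revs = {
        1: {"svn:author": "kfogel", "svn:log": "first"},
        2: {"svn:author": "brane", "svn:log": "second"},
        3: {"svn:author": "kfogel", "svn:log": "third"},
    }
    print(rebuild_index(revs, {"svn:author"}))
```

Because the index is derived entirely from revprops, rerunning this after editing the config picks up existing instances of a newly added property without a dump/load cycle.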

-- Brane

Re: Revision indexes for 1.1

Posted by Chris Foote <cf...@v21.me.uk>.

I have one small question though.

----- Original Message ----- 
From: "Branko Čibej" <br...@xbc.nu>
> 3.3 Indexing configuration
>
> The repository grows a new configuration file, conf/revpropindex. The
> format of the file is as follows:
>
>     [propname]
>     unique = [TRUE/false]
>     multiple = [FALSE/true]
>
Wouldn't it be better to make this the start of the conf/repos.conf file for
all repository configuration?
That way only *one* config file would need to be read.

> Revprops that do not appear in the config file are not indexed. The
> default contents of this file are:
>
>     [svn:date]
>     unique = false
>     multiple = false
>     [svn:name]
>     unique = false
>     multiple = true
>
> The svn:date and svn:name indexing cannot be turned off, neither can the
> indexing parameters change (in effect, we may as well not actually
> enable these in the revpropindex config file).
>
What happens when someone adds a new revprop to the config file?
If you already have a repo with some props indexed and then decide to add
another/remove one, would this require a dump/load cycle?

>
> -- Brane

Regards,
Chris



Re: RFC: Revision indexes for 1.1

Posted by Branko Čibej <br...@xbc.nu>.

Edmund Horner wrote:

> Branko Čibej wrote:
>
> | The filesystem grows a new table, "revpropindex", with the following
> | schema:
> |
> |    (PROP-NAME PROP-VALUE) -> REVNUM
> |
> | Non-unique indexes are allowed.
> |
> | No other changes are necessary. For forward compatibility, servers that
> | do not implement revision indexes will ignore this table; for backward
> | compatibility, if the table does not exist in the repository, revision
> | indexing and search operations are disabled. The dumpfile format does
> | not change, as the contents of the revision index can be reconstructed
> | from revprop data.
>
> Small change in the currency of your choice:
>
> Should that be:
>
> ~    (PROP-NAME PROP-VALUE) -> TXN
>
> instead?

Yes, possibly. Right now IIRC we have revision props, not txn props; 
ideally, we'd have the latter, of course.

-- Brane

Re: RFC: Revision indexes for 1.1

Posted by Edmund Horner <ch...@chrysophylax.cjb.net>.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Branko Čibej wrote:

| The filesystem grows a new table, "revpropindex", with the following
| schema:
|
|    (PROP-NAME PROP-VALUE) -> REVNUM
|
| Non-unique indexes are allowed.
|
| No other changes are necessary. For forward compatibility, servers that
| do not implement revision indexes will ignore this table; for backward
| compatibility, if the table does not exist in the repository, revision
| indexing and search operations are disabled. The dumpfile format does
| not change, as the contents of the revision index can be reconstructed
| from revprop data.

Small change in the currency of your choice:

Should that be:

~    (PROP-NAME PROP-VALUE) -> TXN

instead?

Edmund.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAg6PVEbvImmpUq7gRAoNOAKCNkHIWyUHqy0QikvUMjiPJX0pEJACfacZ/
Jx5nOKAIv4B29zUU9KYpMbs=
=ldFC
-----END PGP SIGNATURE-----

Re: RFC: Revision indexes for 1.1

Posted by Garrett Rooney <ro...@electricjellyfish.net>.

On Apr 19, 2004, at 2:39 AM, Branko Čibej wrote:

> 3.3 Indexing configuration
>
> The repository grows a new configuration file, conf/revpropindex. The
> format of the file is as follows:
>
>    [propname]
>    unique = [TRUE/false]
>    multiple = [FALSE/true]
>
> Revprops that do not appear in the config file are not indexed. The
> default contents of this file are:
>
>    [svn:date]
>    unique = false
>    multiple = false
>    [svn:name]
>    unique = false
>    multiple = true
>
> The svn:date and svn:name indexing cannot be turned off, neither can the
> indexing parameters change (in effect, we may as well not actually
> enable these in the revpropindex config file).

I would suggest also turning svn:author indexing on by default, so that 
something like 'svn log --author kfogel' could be implemented.  I 
personally find myself searching through log history based on username 
quite often at work with perforce, and think that our lack of such 
support is unfortunate.

-garrett


Re: RFC: Revision indexes for 1.1

Posted by Branko Čibej <br...@xbc.nu>.

Greg Hudson wrote:

>I'm not sure this proposal solves any common problems.
>
>On Mon, 2004-04-19 at 02:39, Branko Čibej wrote:
>  
>
>>This indexing would happen automatically (probably implemented at the
>>repos layer, rather than the filesystem layer),
>>    
>>
>
>I don't like the implication that the FS doesn't maintain its own
>integrity here.
>  
>
I agree. The proposal simply follows current practice: for example, 
svn_repos_dated_revision is a repos-layer function, and hooks are not 
triggered by FS code. Going towards 2.0, I'd definitely want the FS 
schema to change in a way that would better support indexing by revprop.

Perhaps we should implement the indexing at the FS level, at that.

>>    * Multiplicity: Controls whether, for the same propname, more than
>>      one (propname, value) pair can map to a single revision.
>>    
>>
>
>How could you get multiplicity?  A propname can't have more than one
>value in a given revision.
>  
>
This is about how that value is _interpreted_ for indexing purposes. The 
svn:name (or svn:label) property's value would contain several lines, e.g.

    $ svn pg --revprop -r 9789 svn:name
    1.0.3_RC1
    Fix_#1435
    1.0.4_merge_candidate

This would produce three keys in the revprop index that point to r9789: 
(svn:name 1.0.3_RC1), (svn:name Fix_#1435) and (svn:name 
1.0.4_merge_candidate).

This allows us to set several labels on the same revision.
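
A minimal sketch of that interpretation, with a made-up helper name and the assumption that each non-blank line of the value is one label:

```python
def index_keys(propname, value, multiple):
    """Return the (propname, value) keys that one revprop contributes.

    With multiple=True the value is treated as one entry per line;
    otherwise the whole value is a single key.
    """
    if multiple:
        return [(propname, line) for line in value.splitlines() if line.strip()]
    return [(propname, value)]

if __name__ == "__main__":
    value = "1.0.3_RC1\nFix_#1435\n1.0.4_merge_candidate\n"
    for key in index_keys("svn:name", value, multiple=True):
        print(key)
```

The property itself still has exactly one value per revision; only the indexer splits it.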

>>The function svn_repos_dated_revision changes: first, it calls
>>svn_repos_revision_search("svn:date", timestamp). If this returns a
>>non-empty list, it returns the oldest revision from this list. Otherwise
>>it performs the current binary search. (The binary-search implementation
>>must stay for backward compatibility. It can be removed in 2.0.)
>>svn_repos_committed_info and svn_repos_history get similar changes.
>>    
>>
>
>I don't think you've made any progress on the date-search problem.  Your
>proposal only works when I ask for a date which *exactly* matches the
>one on the revision.  But -r {date} is supposed to evaluate to the
>revision number whose date is most recent as of the specified
>date.
>  
>
Yeah, I totally borked this one. My bad, lack of sleep. Rather I 
should've said that svn_repos_dated_revision _uses_ the dates in the 
revision index to find the matching revision. Actually we'd have to 
deprecate this function anyway, and add a new one that accepts a start 
and end date, and returns a list of revision numbers that fall between 
these dates.
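
Something along these lines, as a sketch only: the function name is invented, the index is modelled as a list of (timestamp, revnum) pairs kept sorted by timestamp, and timestamps are assumed to be ISO 8601 strings in a single fixed format so that string order matches time order.

```python
import bisect

def revisions_between(date_index, start, end):
    """Return revision numbers whose svn:date lies in [start, end].

    date_index: list of (timestamp, revnum) tuples sorted by timestamp.
    The result is sorted by revision number and need not be contiguous.
    """
    # (start,) sorts before every (start, revnum) entry.
    lo = bisect.bisect_left(date_index, (start,))
    # (end, inf) sorts after every (end, revnum) entry.
    hi = bisect.bisect_right(date_index, (end, float("inf")))
    return sorted(rev for _, rev in date_index[lo:hi])

if __name__ == "__main__":
    # r11 carries an older date than r10, e.g. after "svnadmin load".
    index = sorted([
        ("2004-04-01", 10),
        ("2004-03-15", 11),
        ("2004-04-10", 12),
        ("2004-04-20", 13),
    ])
    print(revisions_between(index, "2004-04-01", "2004-04-15"))
```

Because the lookup goes through the date index rather than assuming dates rise with revision numbers, out-of-order dates yield a list of revisions instead of a misleading range.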

>If revisions have their dates out of order, then -r {date}:{date} will
>evaluate to a revision range which does not reflect the date range.  So
>I think we're better off trying to enforce in-date-order revisions than
>we are trying to make our system work with out-of-date-order revisions.
>  
>
No, I'm totally against enforcing the date order, for the reasons I 
stated. Instead we should just fix the API.

>>4.2 svn label [-r revnum/range/list] label-name
>>4.3 svn labeldel [-r revnum/range/list] label-name
>>    
>>
>
>I have an interest in keeping our command set small; I think having a
>large command set makes our learning curve higher.  So I'd prefer to see
>the usual rev-prop commands used to set and remove revision labels if we
>must have them at all.
>  
>
I would agree, except that adding these two commands makes it much, much 
easier to manipulate a) multiple labels on a single revision and b) a 
single label on multiple revisions (you can't actually do the second 
atomically with the rev-prop commands at all).

-- Brane

Re: RFC: Revision indexes for 1.1

Posted by Greg Hudson <gh...@MIT.EDU>.

I'm not sure this proposal solves any common problems.

On Mon, 2004-04-19 at 02:39, Branko Čibej wrote:
> This indexing would happen automatically (probably implemented at the
> repos layer, rather than the filesystem layer),

I don't like the implication that the FS doesn't maintain its own
integrity here.

>     * Multiplicity: Controls whether, for the same propname, more than
>       one (propname, value) pair can map to a single revision.

How could you get multiplicity?  A propname can't have more than one
value in a given revision.

> The function svn_repos_dated_revision changes: first, it calls
> svn_repos_revision_search("svn:date", timestamp). If this returns a
> non-empty list, it returns the oldest revision from this list. Otherwise
> it performs the current binary search. (The binary-search implementation
> must stay for backward compatibility. It can be removed in 2.0.)
> svn_repos_committed_info and svn_repos_history get similar changes.

I don't think you've made any progress on the date-search problem.  Your
proposal only works when I ask for a date which *exactly* matches the
one on the revision.  But -r {date} is supposed to evaluate to the
revision number whose date is most recent as of the specified
date.

If revisions have their dates out of order, then -r {date}:{date} will
evaluate to a revision range which does not reflect the date range.  So
I think we're better off trying to enforce in-date-order revisions than
we are trying to make our system work with out-of-date-order revisions.

> 4.2 svn label [-r revnum/range/list] label-name
> 4.3 svn labeldel [-r revnum/range/list] label-name

I have an interest in keeping our command set small; I think having a
large command set makes our learning curve higher.  So I'd prefer to see
the usual rev-prop commands used to set and remove revision labels if we
must have them at all.
