You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Shahaf <da...@elego.de> on 2011/09/20 15:38:33 UTC

BDB: implementing 'upgrade' for fs-successor-ids

To implement 'svnadmin upgrade' for fs-successor-ids, it seems we want
code to do the following:

  def construct_successors_in_existing_fs(fs):
    def handle_range(range):
      for revision in range:
        for node-rev-id in fs->revisions[revision]:
          Add to 'successors' table: key=fs->nodes[node-rev-id].pred-id value=node-rev-id

    For range in 0..999, 1000..1999, ..., N*1000..youngest:
      in a txn, handle_range(range)

    in a txn, handle_range( (old youngest)+1..[txn's value of "youngest"] )

Now, handle_range() needs to work with two tables at once --- reading
from NODES and writing to SUCCESSORS --- and I haven't found any
precedent in subversion/libsvn_fs_base/bdb/*.c for code that reads two
tables at once.  (locks-table.c reads from bfd->lock_tokens, but it
doesn't read from bfd->locks in that function; and everything else only
reads from the table the file is named after.)

So, I'm asking what's the proper layout for this code: do I make
a function in bdb/successors-table.c that reads from NODES and writes to
SUCCESSORS?  Or do I have a higher-level function (in, say, node-rev.c)
that runs within a txn, reading NODES and calling SUCCESSORS?

Thanks!

Daniel

Re: implementing 'upgrade' for fs-successor-ids

Posted by Daniel Shahaf <da...@elego.de>.
Daniel Shahaf wrote on Tue, Sep 20, 2011 at 16:38:33 +0300:
> To implement 'svnadmin upgrade' for fs-successor-ids, it seems we want
> code to do the following:
> 
>   def construct_successors_in_existing_fs(fs):
>     def handle_range(range):
>       for revision in range:
>         for node-rev-id in fs->revisions[revision]:
>           Add to 'successors' table: key=fs->nodes[node-rev-id].pred-id value=node-rev-id
> 

Another issue here (which applies to FSFS too): how to store the
'partial progress'?

I'm thinking we could maintain a record of max-successors-revision,
initialized to 0 by 'svnadmin upgrade', updated by the "Construct
successors in existing FS" procedure each time it finished handling
a revision, and finally deleted once the (atomically) youngest revision
was handled.

(In particular, this assumes that we will not require the successors
store to be constructed as a precondition to bumping the FS format
number.)

The public API's implementation can then, as long as the record exist,
directly return an error.  (We could optionally implement a mode where
the API returns the results already stored and in addition says
"I am only returning the successors not younger than rX".)

The record can be implemented as a 'miscellaneous' table entry for BDB
and either as a line in the 'format' file or as a new file in the root
of the FS (sibling to min-unpacked-rev, format, et al) for FSFS.


>     For range in 0..999, 1000..1999, ..., N*1000..youngest:
>       in a txn, handle_range(range)
> 
>     in a txn, handle_range( (old youngest)+1..[txn's value of "youngest"] )

Re: BDB: implementing 'upgrade' for fs-successor-ids

Posted by Daniel Shahaf <da...@elego.de>.
C. Michael Pilato wrote on Wed, Sep 21, 2011 at 11:50:40 -0400:
> On 09/21/2011 11:03 AM, Daniel Shahaf wrote:
> >> But before we press on here, I'd like to understanding your bigger-picture
> >> view.
> > 
> > The branch operates on the assumption that an efficiently-queryable
> > successors store should be managed by the FS.  In this thread I'm
> > further assuming that creating successors would be expensive and
> > therefore 'svnadmin upgrade' should create a 'miscellaneous' table
> > record and bump the format number.
> > 
> > There is a concurrent thread by Stefan2 that challenges both of these
> > assumptions.  I don't know that we have consensus yet whether the design
> > in that thread or the design currently on the branch are better.  (And,
> > yes, figuring that is the second thing at the top of my list, next to
> > figuring out how to implement 'upgrade' on the branch.)
> 
> Yeah, I'm not following Stefan2's thread very closely.  But regardless of
> what he thinks Subversion *should* have, I don't know of any reasons why it
> should *not* have this successor-id mapping.
> 

On a high level, I recall Stefan2 was suggesting a design that focuses
not on node-rev successors but on high-level copy operation, and that is
not FS-backend-specific.

> >> Why are you choosing to this by-revision in fs_base rather than using
> >> a more lower-level, largely-Subversion-ignorant approach?  Is it
> >> specifically so you can have an interruptible/restartable process?  Is it so
> >> you can hook into some pre-existing per-revision subsystem (notification,
> >> perhaps)?
> > 
> > I was simply trying to outline an algorithm for populating the
> > successors store from scratch in a live FS.  (And yes, both
> > restartability and notification are nice properties to have.)
> 
> Okay.  I'm not sure that I would take the same course in a live FS versus an
> offline one, and you've been referring to 'upgrade' which shouldn't be run
> on a live FS -- that is, it should make the FS effectively "not live" for
> the duration of the upgrade.  So, I'm a touch confused about what
> specifically you are aiming at.
> 

What I'm thinking is as follows:

  base_upgrade():
    - create 'miscellaneous' table entry
    - set the stored format number to 5

  add_successors_to_f5_fs():
    - backfill successors and remove 'miscellaneous' tables entry

  base_history_next():
    - assert format >= 5
    - assert no 'miscellaneous' table entry
    - do whatever it does today

This makes base_upgrade() a cheap operation.  I was trying to make
add_successors_to_f5_fs() not block concurrent writers more than
necessary.

For add_successors_to_f5_fs(), I assumed operating by-revision would
result in smaller transactions and thus better behaviour for concurrent
readers/writers.  It's also what I imagined the algorithm for FSFS would
be.

> But here's the extent of my assumptions:  you want to backfill successors as
> quickly, efficiently, and painlessly as possible, ideally without
> interrupting live operation of the repository.  Is that fair?  :-)
> 

Yes :-)

> > It's not clear to me exactly what the alternatives your question refers
> > to are.  Could you elaborate on them, please?
> 
> Well, BDB being a real database, we can do this sort of backfill operation
> without attending to any higher-level Subversion concepts such as revisions
> at all.  A cursor walk through the `nodes' table should suffice:
> 
>    for key, value in nodes_table.rows()
>       successor_id = key
>       node_rev = parse_node_revision_skel(value)
>       successors_table.add_row(node_rev.predecessor_id, successor_id)
> 

I see what you mean now, thanks.  (See above for why I went for
a by-revision algorithm.)

> -- 
> C. Michael Pilato <cm...@collab.net>
> CollabNet   <>   www.collab.net   <>   Distributed Development On Demand
> 



Re: BDB: implementing 'upgrade' for fs-successor-ids

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 09/21/2011 12:58 PM, Daniel Shahaf wrote:
> C. Michael Pilato wrote on Wed, Sep 21, 2011 at 12:13:17 -0400:
>> On 09/21/2011 11:50 AM, C. Michael Pilato wrote:
>>> Well, BDB being a real database, we can do this sort of backfill operation
>>> without attending to any higher-level Subversion concepts such as revisions
>>> at all.  A cursor walk through the `nodes' table should suffice:
>>>
>>>    for key, value in nodes_table.rows()
>>>       successor_id = key
>>>       node_rev = parse_node_revision_skel(value)
>>>       successors_table.add_row(node_rev.predecessor_id, successor_id)
>>
>> I should point out, though, that this approach is definitely *not* to be
>> used on a live repository.
>>
> 
> ... because new entries might be inserted to NODES by concurrent
> writers, and preventing that requires an exclusive lock that would
> block concurrent readers and writers for too long?

Precisely.  We need a sort of check-off list of which nodes-rev have been
processed, right?  That list is as simple as "all previous rows" when using
a cursor in an offline repository, but isn't so clean otherwise.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: BDB: implementing 'upgrade' for fs-successor-ids

Posted by Daniel Shahaf <da...@elego.de>.
C. Michael Pilato wrote on Wed, Sep 21, 2011 at 12:13:17 -0400:
> On 09/21/2011 11:50 AM, C. Michael Pilato wrote:
> > Well, BDB being a real database, we can do this sort of backfill operation
> > without attending to any higher-level Subversion concepts such as revisions
> > at all.  A cursor walk through the `nodes' table should suffice:
> > 
> >    for key, value in nodes_table.rows()
> >       successor_id = key
> >       node_rev = parse_node_revision_skel(value)
> >       successors_table.add_row(node_rev.predecessor_id, successor_id)
> 
> I should point out, though, that this approach is definitely *not* to be
> used on a live repository.
> 

... because new entries might be inserted to NODES by concurrent
writers, and preventing that requires an exclusive lock that would
block concurrent readers and writers for too long?

> -- 
> C. Michael Pilato <cm...@collab.net>
> CollabNet   <>   www.collab.net   <>   Distributed Development On Demand
> 



Re: BDB: implementing 'upgrade' for fs-successor-ids

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 09/21/2011 11:50 AM, C. Michael Pilato wrote:
> Well, BDB being a real database, we can do this sort of backfill operation
> without attending to any higher-level Subversion concepts such as revisions
> at all.  A cursor walk through the `nodes' table should suffice:
> 
>    for key, value in nodes_table.rows()
>       successor_id = key
>       node_rev = parse_node_revision_skel(value)
>       successors_table.add_row(node_rev.predecessor_id, successor_id)

I should point out, though, that this approach is definitely *not* to be
used on a live repository.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: BDB: implementing 'upgrade' for fs-successor-ids

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 09/21/2011 11:03 AM, Daniel Shahaf wrote:
>> But before we press on here, I'd like to understanding your bigger-picture
>> view.
> 
> The branch operates on the assumption that an efficiently-queryable
> successors store should be managed by the FS.  In this thread I'm
> further assuming that creating successors would be expensive and
> therefore 'svnadmin upgrade' should create a 'miscellaneous' table
> record and bump the format number.
> 
> There is a concurrent thread by Stefan2 that challenges both of these
> assumptions.  I don't know that we have consensus yet whether the design
> in that thread or the design currently on the branch are better.  (And,
> yes, figuring that is the second thing at the top of my list, next to
> figuring out how to implement 'upgrade' on the branch.)

Yeah, I'm not following Stefan2's thread very closely.  But regardless of
what he thinks Subversion *should* have, I don't know of any reasons why it
should *not* have this successor-id mapping.

>> Why are you choosing to this by-revision in fs_base rather than using
>> a more lower-level, largely-Subversion-ignorant approach?  Is it
>> specifically so you can have an interruptible/restartable process?  Is it so
>> you can hook into some pre-existing per-revision subsystem (notification,
>> perhaps)?
> 
> I was simply trying to outline an algorithm for populating the
> successors store from scratch in a live FS.  (And yes, both
> restartability and notification are nice properties to have.)

Okay.  I'm not sure that I would take the same course in a live FS versus an
offline one, and you've been referring to 'upgrade' which shouldn't be run
on a live FS -- that is, it should make the FS effectively "not live" for
the duration of the upgrade.  So, I'm a touch confused about what
specifically you are aiming at.

But here's the extent of my assumptions:  you want to backfill successors as
quickly, efficiently, and painlessly as possible, ideally without
interrupting live operation of the repository.  Is that fair?  :-)

> It's not clear to me exactly what the alternatives your question refers
> to are.  Could you elaborate on them, please?

Well, BDB being a real database, we can do this sort of backfill operation
without attending to any higher-level Subversion concepts such as revisions
at all.  A cursor walk through the `nodes' table should suffice:

   for key, value in nodes_table.rows()
      successor_id = key
      node_rev = parse_node_revision_skel(value)
      successors_table.add_row(node_rev.predecessor_id, successor_id)

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: BDB: implementing 'upgrade' for fs-successor-ids

Posted by Daniel Shahaf <da...@elego.de>.
C. Michael Pilato wrote on Wed, Sep 21, 2011 at 10:30:42 -0400:
> On 09/20/2011 09:38 AM, Daniel Shahaf wrote:
> > To implement 'svnadmin upgrade' for fs-successor-ids, it seems we want
> > code to do the following:
> > 
> >   def construct_successors_in_existing_fs(fs):
> >     def handle_range(range):
> >       for revision in range:
> >         for node-rev-id in fs->revisions[revision]:
> >           Add to 'successors' table: key=fs->nodes[node-rev-id].pred-id value=node-rev-id
> > 
> >     For range in 0..999, 1000..1999, ..., N*1000..youngest:
> >       in a txn, handle_range(range)
> > 
> >     in a txn, handle_range( (old youngest)+1..[txn's value of "youngest"] )
> > 
> > Now, handle_range() needs to work with two tables at once --- reading
> > from NODES and writing to SUCCESSORS --- and I haven't found any
> > precedent in subversion/libsvn_fs_base/bdb/*.c for code that reads two
> > tables at once.  (locks-table.c reads from bfd->lock_tokens, but it
> > doesn't read from bfd->locks in that function; and everything else only
> > reads from the table the file is named after.)
> > 
> > So, I'm asking what's the proper layout for this code: do I make
> > a function in bdb/successors-table.c that reads from NODES and writes to
> > SUCCESSORS?  Or do I have a higher-level function (in, say, node-rev.c)
> > that runs within a txn, reading NODES and calling SUCCESSORS?
> 
> I would probably go with a higher-level function that calls the appropriate
> database-specific functions in the bdb/* layer.
> 
> But before we press on here, I'd like to understanding your bigger-picture
> view.

The branch operates on the assumption that an efficiently-queryable
successors store should be managed by the FS.  In this thread I'm
further assuming that creating successors would be expensive and
therefore 'svnadmin upgrade' should create a 'miscellaneous' table
record and bump the format number.

There is a concurrent thread by Stefan2 that challenges both of these
assumptions.  I don't know that we have consensus yet whether the design
in that thread or the design currently on the branch are better.  (And,
yes, figuring that is the second thing at the top of my list, next to
figuring out how to implement 'upgrade' on the branch.)

> Why are you choosing to this by-revision in fs_base rather than using
> a more lower-level, largely-Subversion-ignorant approach?  Is it
> specifically so you can have an interruptible/restartable process?  Is it so
> you can hook into some pre-existing per-revision subsystem (notification,
> perhaps)?
> 

I was simply trying to outline an algorithm for populating the
successors store from scratch in a live FS.  (And yes, both
restartability and notification are nice properties to have.)

It's not clear to me exactly what the alternatives your question refers
to are.  Could you elaborate on them, please?

(I'm not particularly familiar with BDB or with libsvn_fs_base, so it's
conceivable that I overlooked some potential approaches.)

> (As to you follow-up mail, yes, the `miscellaneous' table is the best place
> to store a progress pointer.)
> 

*nod*.

> -- 
> C. Michael Pilato <cm...@collab.net>
> CollabNet   <>   www.collab.net   <>   Distributed Development On Demand
> 

Thanks for the feedback,

Daniel


Re: BDB: implementing 'upgrade' for fs-successor-ids

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 09/20/2011 09:38 AM, Daniel Shahaf wrote:
> To implement 'svnadmin upgrade' for fs-successor-ids, it seems we want
> code to do the following:
> 
>   def construct_successors_in_existing_fs(fs):
>     def handle_range(range):
>       for revision in range:
>         for node-rev-id in fs->revisions[revision]:
>           Add to 'successors' table: key=fs->nodes[node-rev-id].pred-id value=node-rev-id
> 
>     For range in 0..999, 1000..1999, ..., N*1000..youngest:
>       in a txn, handle_range(range)
> 
>     in a txn, handle_range( (old youngest)+1..[txn's value of "youngest"] )
> 
> Now, handle_range() needs to work with two tables at once --- reading
> from NODES and writing to SUCCESSORS --- and I haven't found any
> precedent in subversion/libsvn_fs_base/bdb/*.c for code that reads two
> tables at once.  (locks-table.c reads from bfd->lock_tokens, but it
> doesn't read from bfd->locks in that function; and everything else only
> reads from the table the file is named after.)
> 
> So, I'm asking what's the proper layout for this code: do I make
> a function in bdb/successors-table.c that reads from NODES and writes to
> SUCCESSORS?  Or do I have a higher-level function (in, say, node-rev.c)
> that runs within a txn, reading NODES and calling SUCCESSORS?

I would probably go with a higher-level function that calls the appropriate
database-specific functions in the bdb/* layer.

But before we press on here, I'd like to understanding your bigger-picture
view.  Why are you choosing to this by-revision in fs_base rather than using
a more lower-level, largely-Subversion-ignorant approach?  Is it
specifically so you can have an interruptible/restartable process?  Is it so
you can hook into some pre-existing per-revision subsystem (notification,
perhaps)?

(As to you follow-up mail, yes, the `miscellaneous' table is the best place
to store a progress pointer.)

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand