You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by David Glasser <gl...@davidglasser.net> on 2007/12/14 23:46:28 UTC

What you probably want to know to port the FS changes to BDB (on the reintegrate branch)

The reintegrate branch changes how mergeinfo queries are done, by
adding a couple of optional fields to the noderev structure and using
them instead of the sqlite database for queries.  I implemented this
for FSFS; similar changes need to be made for BDB.  Some of these
changes were made on the sqlite-mergeinfo-without-mergeinfo branch;
others were made directly on the reintegrate branch (which branched
from sqlite-mergeinfo-without-mergeinfo).

Most of the "write" functionality was implemented in r28134 and
r28141.

Note that a side effect of these changes is that the implementations
of the FS APIs svn_fs_get_mergeinfo and svn_fs_get_mergeinfo_for_tree
now raise errors if their arguments are nonexistent paths; this causes
some tests in our suite to fail.  This will need to be fixed in the
higher-level code.

WHAT THE DATA IS
================

There are two new fields on noderevs: a boolean "has-mergeinfo" and a
number (apr_uint64_t) "mergeinfo-count".  "has-mergeinfo" means that
the node itself has the svn:mergeinfo property set on it (possibly to
the empty string).  "mergeinfo-count" is the number of nodes,
including the node itself, in the subtree rooted at the given node
which has the "has-mergeinfo" flag set.  Thus, we can efficiently find
nodes with mergeinfo set by doing a tree crawl, skipping any subtree
where "mergeinfo-count" is not positive.

HOW TO MAINTAIN IT
==================

In general, when a node-revision is being copied (whether internally
to just clone the datastructure, or to make an FS-level copy) the two
fields should be copied as well.

"has-mergeinfo" is set by just one FS API: svn_fs_change_node_prop.
When that API is called on "svn:mergeinfo", the flag is set to
(new_value != NULL).

"mergeinfo-count" is set by several APIs:

  * svn_fs_change_node_prop
  * svn_fs_copy
  * svn_fs_revision_link
  * svn_fs_delete
  * svn_fs_merge

In general, whenever mergeinfo-count is incremented on a node, the
same increment needs to be applied to all of the node's ancestors.
(See libsvn_fs_fs/tree.c(increment_mergeinfo_up_tree).)

In svn_fs_change_node_prop, mergeinfo-count is incremented by 1 if
the has-mergeinfo flag is being changed from false to true, and
decremented by 1 if the flag is being changed from true to false.

svn_fs_copy (and its evil twin svn_fs_revision_link) can change the
mergeinfo-count for the parent of the destination.  Note that copies
can actually be replaces!  So first find the mergeinfo-count for the
target path (or 0 if it's not a replace), and compare that to the
copied node's mergeinfo-count; if they are different, you have to
increment the target's parent's mergeinfo-count by the difference.

svn_fs_delete should subtract the mergeinfo-count of the deleted node
from its parent's mergeinfo-count.

Finally, there's svn_fs_merge, which implements the
pre-commit-finalization merge-up phase.  The bulk of its logic is in
the "merge" function.  I added a mergeinfo_increment_out
output-argument, used for its recursive calls.  Honestly you should
probably just grep that function for "mergeinfo", but here's a quick
overview:

  * Initialize mergeinfo_increment to 0.

  * In the "entry is the same in target and ancestor, but changed in
    the source" block, there are two possibilities:
     * It was deleted in source:
         mergeinfo_increment -= target_entry.mergeinfo_count
     * It was changed in source:
         mergeinfo_increment += (source_entry.mergeinfo_count
                                 - target_entry.mergeinfo_count)
  * When making the recursive call to 'merge', pass it the address of
    a local 'sub_mergeinfo_increment', and then do
         mergeinfo_increment += sub_mergeinfo_increment

  * For each entry added in source, add its mergeinfo-count to
    mergeinfo_increment

  * At the end of the function, increment the target node's
    mergeinfo-count by mergeinfo_increment (but *don't* recurse
    upwards; the fact that we're already recursing through the
    directory structure will take care of that), and return
    mergeinfo_increment in the provided pointer.

The top-level caller of merge can ignore the mergeinfo_increment
return value (or just pass NULL).

HOW TO QUERY IT
===============

Don't use the libsvn_fs_util implementations of
svn_fs_get_mergeinfo[_for_tree] any more.  Write your own that use
this instead.  Basically, just model them on
fs_get_mergeinfo[_for_tree] in libsvn_fs_fs/tree.c.  In fact, some of
that code could maybe even be extracted out into libsvn_fs_util (but
not using sqlite!) if it is too similar.

Ask me if you need this explained in more detail.

Both svn_fs_get_mergeinfo_for_tree and the include_descendents=TRUE
code for svn_fs_get_mergeinfo use a crawler
"crawl_directory_dag_for_mergeinfo", which crawls a tree, only looking
for nodes with mergeinfo, and calls an action on each found node.

Note that the include_descendents parameter only exists on the
reintegrate branch.

ALSO
====

In places where the mergeinfo count is changed (like
svn_fs_fs__dag_increment_mergeinfo_count), I do some minor consistency
checks, like making sure that mergeinfo counts are never negative, and
that they are never greater than 1 on a file.  We'll need to bump the
repository version so that commits from older servers don't mess up
the metadata (and cause these corruption checks to fire).  I haven't
implemented this yet, though.

Let's say you took a repository with some scattered svn:mergeinfo
properties but not metadata, and upgraded it to the new version.
Could the consistency checks start failing?  Actually, no.  The
consistency checks only make sure that the mergeinfo-count numbers are
consistent with each other and with the has-mergeinfo flags.  Note how
in fs_change_node_prop the check to see if we're setting svn:mergeinfo
for the first time is based on the former setting of ther
has-mergeinfo, not whether or not it actually used to have an
svn:mergeinfo property.  This means that if you bump the FS version to
the (proposed) new filesystem format, no consistency checks can fail,
even if there are already svn:mergeinfo properties.  (However, the
queries won't find these properties until they're touched or
something.)

This happens at the FS level, so dump-and-load just magically works.

Pool usage is not so great in the query code (this is mostly marked
with TODO(reint)).


-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What you probably want to know to port the FS changes to BDB (on the reintegrate branch)

Posted by David Glasser <gl...@davidglasser.net>.
On Jan 3, 2008 11:23 AM, Mark Phippard <ma...@gmail.com> wrote:
> On Jan 3, 2008 11:21 AM, David Glasser <gl...@davidglasser.net> wrote:
> > This is the main thing stopping the reintegrate branch from being
> > merged to trunk.  I could work on this, but I'm not too familiar with
> > BDB.  Mike, you mentioned being interested in helping with this; do
> > you think you'll get a chance to any time soon?
>
> I was about to ask what the hold up was.  So thanks for posting this again.

Alternatively, we could temporarily break BDB on trunk.  I'd prefer
not to do that, but it's getting to the point where the main things I
want to do on the reintegrate branch are to fix trunk bugs revealed by
my changes there (mergeinfo queries to nonexistent paths that are
intended to silently succeed) and to combine it with issue-2897 work.
If it's going to be a while before anyone who knows BDB well can port
these changes to BDB, maybe that just means it's OK for it to be a
while before BDB works on trunk?

--dave


-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What you probably want to know to port the FS changes to BDB (on the reintegrate branch)

Posted by Mark Phippard <ma...@gmail.com>.
On Jan 3, 2008 11:21 AM, David Glasser <gl...@davidglasser.net> wrote:
> This is the main thing stopping the reintegrate branch from being
> merged to trunk.  I could work on this, but I'm not too familiar with
> BDB.  Mike, you mentioned being interested in helping with this; do
> you think you'll get a chance to any time soon?

I was about to ask what the hold up was.  So thanks for posting this again.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: What you probably want to know to port the FS changes to BDB (on the reintegrate branch)

Posted by "C. Michael Pilato" <cm...@collab.net>.
David Glasser wrote:
> This is the main thing stopping the reintegrate branch from being
> merged to trunk.  I could work on this, but I'm not too familiar with
> BDB.  Mike, you mentioned being interested in helping with this; do
> you think you'll get a chance to any time soon?

I was sorta waiting for a mail like this one saying, "Go for it."  I've got
a changelist-related fix to test and commit up, but will start working on
the BDB stuff immediately thereafter.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: What you probably want to know to port the FS changes to BDB (on the reintegrate branch)

Posted by David Glasser <gl...@davidglasser.net>.
This is the main thing stopping the reintegrate branch from being
merged to trunk.  I could work on this, but I'm not too familiar with
BDB.  Mike, you mentioned being interested in helping with this; do
you think you'll get a chance to any time soon?

Thanks,
dave

On Dec 14, 2007 6:46 PM, David Glasser <gl...@davidglasser.net> wrote:
> The reintegrate branch changes how mergeinfo queries are done, by
> adding a couple of optional fields to the noderev structure and using
> them instead of the sqlite database for queries.  I implemented this
> for FSFS; similar changes need to be made for BDB.  Some of these
> changes were made on the sqlite-mergeinfo-without-mergeinfo branch;
> others were made directly on the reintegrate branch (which branched
> from sqlite-mergeinfo-without-mergeinfo).
>
> Most of the "write" functionality was implemented in r28134 and
> r28141.
>
> Note that a side effect of these changes is that the implementations
> of the FS APIs svn_fs_get_mergeinfo and svn_fs_get_mergeinfo_for_tree
> now raise errors if their arguments are nonexistent paths; this causes
> some tests in our suite to fail.  This will need to be fixed in the
> higher-level code.
>
> WHAT THE DATA IS
> ================
>
> There are two new fields on noderevs: a boolean "has-mergeinfo" and a
> number (apr_uint64_t) "mergeinfo-count".  "has-mergeinfo" means that
> the node itself has the svn:mergeinfo property set on it (possibly to
> the empty string).  "mergeinfo-count" is the number of nodes,
> including the node itself, in the subtree rooted at the given node
> which has the "has-mergeinfo" flag set.  Thus, we can efficiently find
> nodes with mergeinfo set by doing a tree crawl, skipping any subtree
> where "mergeinfo-count" is not positive.
>
> HOW TO MAINTAIN IT
> ==================
>
> In general, when a node-revision is being copied (whether internally
> to just clone the datastructure, or to make an FS-level copy) the two
> fields should be copied as well.
>
> "has-mergeinfo" is set by just one FS API: svn_fs_change_node_prop.
> When that API is called on "svn:mergeinfo", the flag is set to
> (new_value != NULL).
>
> "mergeinfo-count" is set by several APIs:
>
>   * svn_fs_change_node_prop
>   * svn_fs_copy
>   * svn_fs_revision_link
>   * svn_fs_delete
>   * svn_fs_merge
>
> In general, whenever mergeinfo-count is incremented on a node, the
> same increment needs to be applied to all of the node's ancestors.
> (See libsvn_fs_fs/tree.c(increment_mergeinfo_up_tree).)
>
> In svn_fs_change_node_prop, mergeinfo-count is incremented by 1 if
> the has-mergeinfo flag is being changed from false to true, and
> decremented by 1 if the flag is being changed from true to false.
>
> svn_fs_copy (and its evil twin svn_fs_revision_link) can change the
> mergeinfo-count for the parent of the destination.  Note that copies
> can actually be replaces!  So first find the mergeinfo-count for the
> target path (or 0 if it's not a replace), and compare that to the
> copied node's mergeinfo-count; if they are different, you have to
> increment the target's parent's mergeinfo-count by the difference.
>
> svn_fs_delete should subtract the mergeinfo-count of the deleted node
> from its parent's mergeinfo-count.
>
> Finally, there's svn_fs_merge, which implements the
> pre-commit-finalization merge-up phase.  The bulk of its logic is in
> the "merge" function.  I added a mergeinfo_increment_out
> output-argument, used for its recursive calls.  Honestly you should
> probably just grep that function for "mergeinfo", but here's a quick
> overview:
>
>   * Initialize mergeinfo_increment to 0.
>
>   * In the "entry is the same in target and ancestor, but changed in
>     the source" block, there are two possibilities:
>      * It was deleted in source:
>          mergeinfo_increment -= target_entry.mergeinfo_count
>      * It was changed in source:
>          mergeinfo_increment += (source_entry.mergeinfo_count
>                                  - target_entry.mergeinfo_count)
>   * When making the recursive call to 'merge', pass it the address of
>     a local 'sub_mergeinfo_increment', and then do
>          mergeinfo_increment += sub_mergeinfo_increment
>
>   * For each entry added in source, add its mergeinfo-count to
>     mergeinfo_increment
>
>   * At the end of the function, increment the target node's
>     mergeinfo-count by mergeinfo_increment (but *don't* recurse
>     upwards; the fact that we're already recursing through the
>     directory structure will take care of that), and return
>     mergeinfo_increment in the provided pointer.
>
> The top-level caller of merge can ignore the mergeinfo_increment
> return value (or just pass NULL).
>
> HOW TO QUERY IT
> ===============
>
> Don't use the libsvn_fs_util implementations of
> svn_fs_get_mergeinfo[_for_tree] any more.  Write your own that use
> this instead.  Basically, just model them on
> fs_get_mergeinfo[_for_tree] in libsvn_fs_fs/tree.c.  In fact, some of
> that code could maybe even be extracted out into libsvn_fs_util (but
> not using sqlite!) if it is too similar.
>
> Ask me if you need this explained in more detail.
>
> Both svn_fs_get_mergeinfo_for_tree and the include_descendents=TRUE
> code for svn_fs_get_mergeinfo use a crawler
> "crawl_directory_dag_for_mergeinfo", which crawls a tree, only looking
> for nodes with mergeinfo, and calls an action on each found node.
>
> Note that the include_descendents parameter only exists on the
> reintegrate branch.
>
> ALSO
> ====
>
> In places where the mergeinfo count is changed (like
> svn_fs_fs__dag_increment_mergeinfo_count), I do some minor consistency
> checks, like making sure that mergeinfo counts are never negative, and
> that they are never greater than 1 on a file.  We'll need to bump the
> repository version so that commits from older servers don't mess up
> the metadata (and cause these corruption checks to fire).  I haven't
> implemented this yet, though.
>
> Let's say you took a repository with some scattered svn:mergeinfo
> properties but not metadata, and upgraded it to the new version.
> Could the consistency checks start failing?  Actually, no.  The
> consistency checks only make sure that the mergeinfo-count numbers are
> consistent with each other and with the has-mergeinfo flags.  Note how
> in fs_change_node_prop the check to see if we're setting svn:mergeinfo
> for the first time is based on the former setting of ther
> has-mergeinfo, not whether or not it actually used to have an
> svn:mergeinfo property.  This means that if you bump the FS version to
> the (proposed) new filesystem format, no consistency checks can fail,
> even if there are already svn:mergeinfo properties.  (However, the
> queries won't find these properties until they're touched or
> something.)
>
> This happens at the FS level, so dump-and-load just magically works.
>
> Pool usage is not so great in the query code (this is mostly marked
> with TODO(reint)).
>
>
> --
> David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/
>



-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org