You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Karl Fogel <kf...@galois.collab.net> on 2001/02/13 21:07:53 UTC

Re: New FS API, open_path, cloning and dag_node_t

Branko =?ISO-8859-2?Q?=C8ibej?= <br...@xbc.nu> writes:
> Now, here's the issue: While hammering on tree.c to get it in line with 
> the new FS API, I decded (and it seems that Jim tentatively agrees) that 
> we need a function that will call svn_fs__dag_open repeatedly to 
> traverse a path. It's called open_path in tree.c as of Jim's latest commits.
> 
> Now, often we'll call this function because we only want to read a node, 
> but sometimes we'll want to clone the node (for modifications) or its 
> parent (for delete). The cloning has to bubble up the path until it 
> reaches the root or a mutable node.
> 
> Now, what I'm proposing is that a) dag_node_t keeps track of its parent 
> and its name in the parent, and b) that open_path should construct a 
> list of DAG nodes. This node list could then be used directly to clone 
> the whole path.
> 
> This wouldn't increase processing time or memory footprint. One 
> potential problem is that we'd have to be careful not to hold on to the 
> node list past the lifetime of a trail, but I think we have to watch 
> that anyway.
> 
> Thoughts?

Well...

The parent information should not be recorded in dag_node_t.  A
dag_node_t represents a node in the dag filesystem, and may have
multiple parents (even in one revision tree).

The information you're trying to preserve isn't information the dag
filesystem knows or cares about -- parent information is important
only to the "virtual filesystem", uh, perhaps better called the
"versioning filesystem".  (Whichever it is, I'll call it the vfs from
now on).

A dag_node_t is currently something we can retrieve by key (see
svn_fs_id_t).  The value associated with that key knows nothing about
parent information.  So if dag_node_t had a parent fields, then when
we retrieve a dag_node_t it would have to be incomplete.

Remember svn_fs_node_t?  Tracking this kind of information was one of
the main reasons we had it.  Svn_fs_node_t was a superset of
dag_node_t; it held the dag node and also the metadata needed to
differentiate two nodes in the vfs even if they shared the same
underlying dag node.

So maybe the real question is: should we still have svn_fs_node_t or
something like it, at least to preserve the
path-by-which-we-got-to-this-dag-node and any other information
layered on top of dag nodes?

-Karl

> (Just to illustrate the proposed change:
> 
> Index: dag.c
> ===================================================================
> RCS file: /cvs/subversion/subversion/libsvn_fs/dag.c,v
> retrieving revision 1.14
> diff -u -r1.14 dag.c
> --- dag.c	2001/02/13 16:38:35	1.14
> +++ dag.c	2001/02/13 23:10:35
> @@ -34,6 +34,12 @@
>       into an svn_fs_id_t). */
>    svn_fs_id_t *id;
>  
> +  /* The node's parent node, NULL if this is the root. */
> +  dag_node_t* parent;
> +
> +  /* The node's name in PARENT. */
> +  const char* name;
> +
>    /* The contents of the node (i.e., the DB value, parsed into a
>       skel). */
>    skel_t *contents;
> @@ -204,6 +210,10 @@
>                                 const char *name,
>                                 trail_t *trail)
>  {
> +  /* Do the opening bits, then ... */
> +  child->parent = parent;
> +  child->name = name;
> +
>    abort();
>    /* NOTREACHED */
>    return NULL;
> 
> 
> 
> -- 
> Brane �ibej
>     home:   <br...@xbc.nu>             http://www.xbc.nu/brane/
>     work:   <br...@hermes.si>   http://www.hermes-softlab.com/
>      ACM:   <br...@acm.org>            http://www.acm.org/

Re: New FS API, open_path, cloning and dag_node_t

Posted by Karl Fogel <kf...@galois.collab.net>.

Jim Blandy <ji...@zwingli.cygnus.com> writes:
> >    svn_fs__get_node_revision()
> >    svn_fs__get_txn()
> >    svn_fs__create_successor()
> 
> No.  Those functions will continue to use svn_fs_id_t.  They are
> *used* by the dag filesystem.  I'm talking about removing this
> function *exported* by the dag filesystem:
> 
>     /* Open the mutable node in the transaction named TXN whose ID is ID
>        in FS, as part of TRAIL; set *NODE_P to the new node.  Allocate the
>        node in TRAIL->pool.  */
>     svn_error_t *svn_fs__dag_txn_node (dag_node_t **node_p,
> 				       svn_fs_t *fs,
> 				       const char *txn,
> 				       const svn_fs_id_t *id,
> 				       trail_t *trail);

Cool, thanks.  I hadn't realized that you meant exactly what you
said. :-)

-K

Re: New FS API, open_path, cloning and dag_node_t

Posted by Jim Blandy <ji...@zwingli.cygnus.com>.

Karl Fogel <kf...@galois.collab.net> writes:
> Jim Blandy <ji...@zwingli.cygnus.com> writes:
> > For what it's worth, that interface (creating a dag_node_t given a
> > node ID) was only needed by the clone tracking stuff.  So that
> > function could go away.
> 
> I already responded to this saying "sure, that's cool".
> 
> But before we make such a change, I'd want to understand what methods
> we would use to retrieve dag_node_t's from the database.  The pattern
> would be: first grab a revision or txn root, then to get stuff beneath
> that using svn_fs__dag_open(), right?

Yep.

> And of course, functions that currently return svn_fs_id_t (by
> reference) would return dag_node_t instead... functions such as:
> 
>    svn_fs__get_node_revision()
>    svn_fs__get_txn()
>    svn_fs__create_successor()

No.  Those functions will continue to use svn_fs_id_t.  They are
*used* by the dag filesystem.  I'm talking about removing this
function *exported* by the dag filesystem:

    /* Open the mutable node in the transaction named TXN whose ID is ID
       in FS, as part of TRAIL; set *NODE_P to the new node.  Allocate the
       node in TRAIL->pool.  */
    svn_error_t *svn_fs__dag_txn_node (dag_node_t **node_p,
				       svn_fs_t *fs,
				       const char *txn,
				       const svn_fs_id_t *id,
				       trail_t *trail);

Re: New FS API, open_path, cloning and dag_node_t

Posted by Karl Fogel <kf...@galois.collab.net>.

Jim Blandy <ji...@zwingli.cygnus.com> writes:
> > A dag_node_t is currently something we can retrieve by key (see
> > svn_fs_id_t).  The value associated with that key knows nothing about
> > parent information.  So if dag_node_t had a parent fields, then when
> > we retrieve a dag_node_t it would have to be incomplete.
> 
> For what it's worth, that interface (creating a dag_node_t given a
> node ID) was only needed by the clone tracking stuff.  So that
> function could go away.

I already responded to this saying "sure, that's cool".

But before we make such a change, I'd want to understand what methods
we would use to retrieve dag_node_t's from the database.  The pattern
would be: first grab a revision or txn root, then to get stuff beneath
that using svn_fs__dag_open(), right?

And of course, functions that currently return svn_fs_id_t (by
reference) would return dag_node_t instead... functions such as:

   svn_fs__get_node_revision()
   svn_fs__get_txn()
   svn_fs__create_successor()

Hmmm.  Okay.  If it's the case that everywhere we currently retrieve
an id, the very next thing we do is use that id as a key to get the
node's contents, then the change makes a lot of sense -- why impose an
extra step on callers?  But if that's not the case, then maybe the
status quo is fine.

-K

Re: New FS API, open_path, cloning and dag_node_t

Posted by Karl Fogel <kf...@galois.collab.net>.

Jim Blandy <ji...@zwingli.cygnus.com> writes:
> > A dag_node_t is currently something we can retrieve by key (see
> > svn_fs_id_t).  The value associated with that key knows nothing about
> > parent information.  So if dag_node_t had a parent fields, then when
> > we retrieve a dag_node_t it would have to be incomplete.
> 
> For what it's worth, that interface (creating a dag_node_t given a
> node ID) was only needed by the clone tracking stuff.  So that
> function could go away.

Sure.  I'm calling it right now (grin), but would be happy to change.

-K

Re: open by ID (was: Re: New FS API, open_path, cloning and dag_node_t)

Posted by Greg Stein <gs...@lyra.org>.

On Mon, Feb 19, 2001 at 04:01:19PM -0500, Jim Blandy wrote:
> 
> Greg Stein <gs...@lyra.org> writes:
> > That would be fine, as long as the ID remains the same for an file that
> > wasn't changed from R1 to R2.
> 
> So the ID would be the basis for caching?

Yup.

-- 
Greg Stein, http://www.lyra.org/

Re: open by ID (was: Re: New FS API, open_path, cloning and dag_node_t)

Posted by Jim Blandy <ji...@zwingli.cygnus.com>.

Greg Stein <gs...@lyra.org> writes:
> That would be fine, as long as the ID remains the same for an file that
> wasn't changed from R1 to R2.

So the ID would be the basis for caching?

Re: open by ID (was: Re: New FS API, open_path, cloning and dag_node_t)

Posted by Greg Stein <gs...@lyra.org>.

On Fri, Feb 16, 2001 at 11:07:59AM -0500, Jim Blandy wrote:
> Greg Stein <gs...@lyra.org> writes:
> > > What's the larger picture?  Do you need this for nodes in a
> > > transaction, committed nodes, or both?
> > 
> > Committed nodes only. This is for "svn checkout" or "svn update" -- fetching
> > the file contents and properties. The node ID is the most stable form
> > (perfectly stable, in fact) for referring to a node (imagine that).
> 
> How about I generate ID's for you?  They'd be opaque byte strings.

That would be fine, as long as the ID remains the same for an file that
wasn't changed from R1 to R2.

I'm using "ID/PATH" as the unique value.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: open by ID (was: Re: New FS API, open_path, cloning and dag_node_t)

Posted by Jim Blandy <ji...@zwingli.cygnus.com>.

Greg Stein <gs...@lyra.org> writes:
> > What's the larger picture?  Do you need this for nodes in a
> > transaction, committed nodes, or both?
> 
> Committed nodes only. This is for "svn checkout" or "svn update" -- fetching
> the file contents and properties. The node ID is the most stable form
> (perfectly stable, in fact) for referring to a node (imagine that).

How about I generate ID's for you?  They'd be opaque byte strings.
[looks innocent]

Re: open by ID (was: Re: New FS API, open_path, cloning and dag_node_t)

Posted by Greg Stein <gs...@lyra.org>.

On Thu, Feb 15, 2001 at 11:24:23AM -0500, Jim Blandy wrote:
> Greg Stein <gs...@lyra.org> writes:
>...
> > Um... a while back, I requested a way to get at a node, given an ID. The
> > only thing that I need is read-access to the file contents and its
> > properties. No changes, no parent info, nada.
> 
> Sorry --- I'd forgotten about that.
> 
> This is going to be problematic.  For example, you can't tell whether
> you're authorized to access the file without knowing whether you're
> authorized to access its parent directories.

Hrm. You're right. I *do* have the path, tho!

However, I can't tell you the root (revision) that the path is relative to.
It can be inferred from "use largest value R such that path P refers to node
id N." I'll give you P and N.

It would be an error to have P that doesn't refer to N in any revision.
(this bad query form could happen due to a bad client or a user typing in a
bad URL)

[ note: from an ACL standpoint, I'd think the ACL applies mostly to the node
  itself so that a copy/move doesn't accidentally expose something on
  another path; but I know that's a big debate and I don't think it is
  actually relevant to the above ]

[ I will also posit that the ACL would be the same across all revisions
  where P refers to N, so it shouldn't matter which one you select; if we
  say that it *does* matter, then using the latest should be fine; I would
  hope that somebody can't access rev 6 to skip the ACL that was emplaced in
  rev 7. ]

> What's the larger picture?  Do you need this for nodes in a
> transaction, committed nodes, or both?

Committed nodes only. This is for "svn checkout" or "svn update" -- fetching
the file contents and properties. The node ID is the most stable form
(perfectly stable, in fact) for referring to a node (imagine that).

Transaction nodes appear in a different URL space; I'll always have the
transaction ID and and path.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: open by ID (was: Re: New FS API, open_path, cloning and dag_node_t)

Posted by Jim Blandy <ji...@zwingli.cygnus.com>.

Greg Stein <gs...@lyra.org> writes:
> On Wed, Feb 14, 2001 at 09:24:07AM -0500, Jim Blandy wrote:
> > Karl Fogel <kf...@galois.ch.collab.net> writes:
> >...
> > > A dag_node_t is currently something we can retrieve by key (see
> > > svn_fs_id_t).  The value associated with that key knows nothing about
> > > parent information.  So if dag_node_t had a parent fields, then when
> > > we retrieve a dag_node_t it would have to be incomplete.
> > 
> > For what it's worth, that interface (creating a dag_node_t given a
> > node ID) was only needed by the clone tracking stuff.  So that
> > function could go away.
> 
> Um... a while back, I requested a way to get at a node, given an ID. The
> only thing that I need is read-access to the file contents and its
> properties. No changes, no parent info, nada.

Sorry --- I'd forgotten about that.

This is going to be problematic.  For example, you can't tell whether
you're authorized to access the file without knowing whether you're
authorized to access its parent directories.

What's the larger picture?  Do you need this for nodes in a
transaction, committed nodes, or both?

open by ID (was: Re: New FS API, open_path, cloning and dag_node_t)

Posted by Greg Stein <gs...@lyra.org>.

On Wed, Feb 14, 2001 at 09:24:07AM -0500, Jim Blandy wrote:
> Karl Fogel <kf...@galois.ch.collab.net> writes:
>...
> > A dag_node_t is currently something we can retrieve by key (see
> > svn_fs_id_t).  The value associated with that key knows nothing about
> > parent information.  So if dag_node_t had a parent fields, then when
> > we retrieve a dag_node_t it would have to be incomplete.
> 
> For what it's worth, that interface (creating a dag_node_t given a
> node ID) was only needed by the clone tracking stuff.  So that
> function could go away.

Um... a while back, I requested a way to get at a node, given an ID. The
only thing that I need is read-access to the file contents and its
properties. No changes, no parent info, nada.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: New FS API, open_path, cloning and dag_node_t

Posted by Jim Blandy <ji...@zwingli.cygnus.com>.

Karl Fogel <kf...@galois.ch.collab.net> writes:
> The parent information should not be recorded in dag_node_t.  A
> dag_node_t represents a node in the dag filesystem, and may have
> multiple parents (even in one revision tree).
> 
> The information you're trying to preserve isn't information the dag
> filesystem knows or cares about -- parent information is important
> only to the "virtual filesystem", uh, perhaps better called the
> "versioning filesystem".  (Whichever it is, I'll call it the vfs from
> now on).

Exactly right.

I'd amplify by saying that the whole point of the DAG layer is to be
as simple as possible while still having the ability to ensure the
integrity of the filesystem.  Path traversal is hardly a complicated
thing (although I think it's a bit more complex than one might guess),
but it can be implemented cleanly in terms of a dumb dag, and
*anything* we can keep out of the dag module makes the system a bit
better.

> A dag_node_t is currently something we can retrieve by key (see
> svn_fs_id_t).  The value associated with that key knows nothing about
> parent information.  So if dag_node_t had a parent fields, then when
> we retrieve a dag_node_t it would have to be incomplete.

For what it's worth, that interface (creating a dag_node_t given a
node ID) was only needed by the clone tracking stuff.  So that
function could go away.

Re: New FS API, open_path, cloning and dag_node_t

Posted by Karl Fogel <kf...@galois.collab.net>.

Branko =?ISO-8859-2?Q?=C8ibej?= <br...@xbc.nu> writes:
> > The parent information should not be recorded in dag_node_t.  A
> > dag_node_t represents a node in the dag filesystem, and may have
> > multiple parents (even in one revision tree).
> 
> Yes, but during the lifetime of a trail -- that is, within one 
> "tree.c"-level operation -- we're interested only in the path we used to 
> reach the node.

Hah.  Get your caller out of my interface. :-)

For you, one trail corresponds to one tree.c operation.  But I, in my
splendiferous, magnificent, glittering editorusness, may do any old
operation I please on a dag_node_t, so long as it matches the dag
semantics.

Awa-ha-ha-ha-haaaaah.

> > So maybe the real question is: should we still have svn_fs_node_t or
> > something like it, at least to preserve the
> > path-by-which-we-got-to-this-dag-node and any other information
> > layered on top of dag nodes?
> 
> We do. Otherwise we can't clone a whole path. Remember that we /do/ need 
> to clone a (sub) path in every FS operation that modifies a node. We can 
> only do that if we have a list of dag nodes that corresponds to the path.
> 
> Actually, we need a list of (node, name) pairs, where each "name" is a 
> path component. That's because we have to modify the dir entry called 
> "name" when we clone a node's child.
> 
> open_node can create this list (it has all the bits at its fingertips). 
> Maybe we should have something like this:
> 
>     struct named_node {
>       const char *name;
>       dag_node_t *node;
>     };
>
>     svn_error_t *open_path (apr_array_header_t **named_nodes,
>                             svn_fs_root_t *root,
>                             const char *path,
>                             trail_t *trail);

Sure.  I think we're talking about a data structure at roughly the
same exposure level as skel_t and stuff like that, so go ahead and
whip together whatever you need.

-K

Re: New FS API, open_path, cloning and dag_node_t

Posted by Branko Čibej <br...@xbc.nu>.

Karl Fogel wrote:

> The parent information should not be recorded in dag_node_t.  A
> dag_node_t represents a node in the dag filesystem, and may have
> multiple parents (even in one revision tree).

Yes, but during the lifetime of a trail -- that is, within one 
"tree.c"-level operation -- we're interested only in the path we used to 
reach the node.

> The information you're trying to preserve isn't information the dag
> filesystem knows or cares about -- parent information is important
> only to the "virtual filesystem", uh, perhaps better called the
> "versioning filesystem".  (Whichever it is, I'll call it the vfs from
> now on).
> 
> A dag_node_t is currently something we can retrieve by key (see
> svn_fs_id_t).  The value associated with that key knows nothing about
> parent information.  So if dag_node_t had a parent fields, then when
> we retrieve a dag_node_t it would have to be incomplete.

O.K., that's reasonable. I'm mixing the layers again.

> Remember svn_fs_node_t?  Tracking this kind of information was one of
> the main reasons we had it.  Svn_fs_node_t was a superset of
> dag_node_t; it held the dag node and also the metadata needed to
> differentiate two nodes in the vfs even if they shared the same
> underlying dag node.
> 
> So maybe the real question is: should we still have svn_fs_node_t or
> something like it, at least to preserve the
> path-by-which-we-got-to-this-dag-node and any other information
> layered on top of dag nodes?

We do. Otherwise we can't clone a whole path. Remember that we /do/ need 
to clone a (sub) path in every FS operation that modifies a node. We can 
only do that if we have a list of dag nodes that corresponds to the path.

Actually, we need a list of (node, name) pairs, where each "name" is a 
path component. That's because we have to modify the dir entry called 
"name" when we clone a node's child.

open_node can create this list (it has all the bits at its fingertips). 
Maybe we should have something like this:

    struct named_node {
      const char *name;
      dag_node_t *node;
    };

    svn_error_t *open_path (apr_array_header_t **named_nodes,
                            svn_fs_root_t *root,
                            const char *path,
                            trail_t *trail);




-- 
Brane ďż˝ibej
    home:   <br...@xbc.nu>             http://www.xbc.nu/brane/
    work:   <br...@hermes.si>   http://www.hermes-softlab.com/
     ACM:   <br...@acm.org>            http://www.acm.org/