You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@wandisco.com> on 2011/10/07 15:56:42 UTC

[RFC] API for reading trees from repo or WC

This is a Request For Comments on the creation of an API for reading a
tree of dirs and files from an arbitrary source via a common API.

RATIONALE

1.

I want to re-write svn_client_diff() so that it can diff any tree
against any other tree, where a tree is any of:

  * versioned (rooted at URL@REV in a repository)

  * WC base (rooted at a local abspath in a WC)

  * WC working (rooted at a local abspath in a WC)

  * unversioned on disk (rooted at a local abspath)

and potentially other sources.

I envisage two main code paths:

  diff_two_trees(tree1, tree2)

    Takes two references to trees, and reads directories and files from
tree1 and tree2 as required to find differences and present a unidiff
(or whatever kind of output).

  diff_tree_with_delta(tree1, delta)

    Takes a reference to a base tree, and an svn_delta_editor_t type
delta based on tree1, and reads dirs and files from tree1 as necessary
to present the delta as a unidiff (or whatever kind of output).

2.

It's the right way to design software.  Witness how successful the
pluggable RA system and the delta_editor interfaces have been.  (Note
that the need for editor v2 does not mean the editor was a bad idea.)

3.

I want to re-write all our libsvn_client read-from-tree APIs such as
'cat', 'propget', 'export' etc. so that they use a common "pull from a
tree" API, in order to reduce complexity and unintended differences and
bugs in those implementations.

4.

I want other people to be able to write such functions/features easily.


DESIGN

I'm thinking something like this for a start.

/* Present as a tree:
 *   an unversioned disk tree;
 *   a WC base tree
 *   a WC working tree
 *   a repository tree
 * 
 * The consumer "pulls" parts of the tree and can omit unwanted parts.
 * Consumer can pull any subtree "recursively" for efficient streaming.
 */

/**
 * A readable tree.  This object is used to perform read requests to a
 * repository tree or a working-copy (base or working) tree or any other
 * readable tree.
 *
 * @since New in 1.8.
 */
typedef struct svn_client_tree_t svn_client_tree_t;

/* */
typedef svn_io_dirent2_t svn_client_tree_dirent_t;

/* V-table for #svn_client_tree_t.
 *
 * Paths are relpaths, relative to the tree root.
 * Revision numbers and repository ids are #SVN_INVALID_REVNUM and NULL
 * for an unversioned node (including a node that is a local add/copy/move
 * in a WC working tree).
 */
typedef struct svn_client_tree__vtable_t
{
  /* Fetch the node kind of the node at @a relpath.
   * (### and other metadata? revnum? props?)
   *
   * Set @a *kind to the node kind.
   */
  svn_error_t *(*get_kind)(svn_client_tree_t *tree,
                           svn_node_kind_t *kind,
                           const char *relpath,
                           apr_pool_t *scratch_pool);

  /* Fetch the contents and properties of the file at @a relpath.
   *
   * If @a stream is non-NULL, set @a *stream to a readable stream yielding
   * the contents of the file at @a relpath.  (### ? The stream
   * handlers for @a stream may not perform any operations on @a tree.)
   *
   * If @a props is non-NULL, set @a *props to contain the regular
   * versioned properties of the file (not 'wcprops', 'entryprops', etc.).
   * The hash maps (const char *) names to (#svn_string_t *) values.
   */
  svn_error_t *(*get_file)(svn_client_tree_t *tree,
                           svn_stream_t **stream,
                           apr_hash_t **props,
                           const char *relpath,
                           apr_pool_t *result_pool,
                           apr_pool_t *scratch_pool);

  /* Fetch the entries and properties of the directory at @a relpath.
   *
   * If @a dirents is non-NULL, set @a *dirents to contain all the entries
   * of directory @a relpath.  The keys will be (<tt>const char *</tt>)
   * entry names, and the values (#svn_client_tree_dirent_t *) dirents.
   * Only the @c kind and @c filesize fields are filled in.
   * ### @c special would be useful too.
   *
   * If @a props is non-NULL, set @a *props to contain the regular
   * versioned properties of the file (not 'wcprops', 'entryprops', etc.).
   * The hash maps (const char *) names to (#svn_string_t *) values.
   */
  svn_error_t *(*get_dir)(svn_client_tree_t *tree,
                          apr_hash_t **dirents,
                          apr_hash_t **props,
                          const char *relpath,
                          apr_pool_t *result_pool,
                          apr_pool_t *scratch_pool);

  /* Push a sub-tree into an editor, as a delta against an empty tree.
   * This is useful for efficiency when streaming a (sub-)tree from a
   * remote source. */
  svn_error_t *(*push_as_delta_edit)(svn_client_tree_t *tree,
                                     const char *relpath,
                                     svn_delta_editor_t *editor,
                                     void *edit_baton,
                                     apr_pool_t *result_pool,
                                     apr_pool_t *scratch_pool)
} svn_client_tree__vtable_t;

/* */
struct svn_client_tree_t
{
  const svn_client_tree__vtable_t *vtable;

  /* Pool used to manage this session. */
  apr_pool_t *pool;

  /* Private data for the tree implementation. */
  void *priv;
};



It will no doubt need a bit more sophistication, which I'll discover
when I try to implement and use it.

Thoughts and comments so far?  Any objection to me starting such a thing
in trunk if it sounds like a good idea?

- Julian



Re: [RFC] API for reading trees from repo or WC

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Julian Foad wrote on Fri, Oct 07, 2011 at 18:12:09 +0100:
> On Fri, 2011-10-07 at 17:29 +0200, Daniel Shahaf wrote:
> > No comments on the general idea as I'm not sufficiently familiar with
> > the internals of the client library.  That said, ...
> 
> Thanks for the comments below.  I've checked in an initial API and
> implementation in the branch 'tree-read-api', without addressing your
> comments yet.

> As for public ... well, if it's good, yes I'd hope so;
> and you said on IRC that in that case we'd prefer a hidden-vtable
> implementation (separate API functions) so I'll change it to that format
> at some point.
> 

Thanks.  For the record, http://subversion.tigris.org/issues/show_bug.cgi?id=1931
is what prompted me to ask about vtable or no vtable; and on IRC other
halfway proposals were suggested, including

  SVN_ERR(vtable->verb(vtable, args));

> - Julian
> 
> 
> > Julian Foad wrote on Fri, Oct 07, 2011 at 14:56:42 +0100:
> > > This is a Request For Comments on the creation of an API for reading a
> > > tree of dirs and files from an arbitrary source via a common API.
> > > 
> > 
> > Would it be a public API?
> > 
> > > /* V-table for #svn_client_tree_t.
> > >  *
> > >  * Paths are relpaths, relative to the tree root.
> > >  * Revision numbers and repository ids are #SVN_INVALID_REVNUM and NULL
> > >  * for an unversioned node (including a node that is a local add/copy/move
> > >  * in a WC working tree).
> > >  */
> > > typedef struct svn_client_tree__vtable_t
> > > {
> > >   /* Fetch the node kind of the node at @a relpath.
> > >    * (### and other metadata? revnum? props?)
> > >    *
> > >    * Set @a *kind to the node kind.
> > >    */
> > >   svn_error_t *(*get_kind)(svn_client_tree_t *tree,
> > >                            svn_node_kind_t *kind,
> > >                            const char *relpath,
> > >                            apr_pool_t *scratch_pool);
> > 
> > Return an svn_client_info2_t ?
> > 
> > > 
> > >   /* Fetch the contents and properties of the file at @a relpath.
> > >   svn_error_t *(*get_file)(svn_client_tree_t *tree,
> > >                            svn_stream_t **stream,
> > >                            apr_hash_t **props,
> > >                            const char *relpath,
> > >                            apr_pool_t *result_pool,
> > >                            apr_pool_t *scratch_pool);
> > > 
> > >   /* Fetch the entries and properties of the directory at @a relpath.
> > >   svn_error_t *(*get_dir)(svn_client_tree_t *tree,
> > >                           apr_hash_t **dirents,
> > >                           apr_hash_t **props,
> > >                           const char *relpath,
> > >                           apr_pool_t *result_pool,
> > >                           apr_pool_t *scratch_pool);
> > > 
> > >   /* Push a sub-tree into an editor, as a delta against an empty tree.
> > >    * This is useful for efficiency when streaming a (sub-)tree from a
> > >    * remote source. */
> > >   svn_error_t *(*push_as_delta_edit)(svn_client_tree_t *tree,
> > >                                      const char *relpath,
> > >                                      svn_delta_editor_t *editor,
> > >                                      void *edit_baton,
> > >                                      apr_pool_t *result_pool,
> > >                                      apr_pool_t *scratch_pool)
> > 
> > Add a "describe this tree" member?  e.g., it could return the
> > PATH_OR_URL@PEG being described.
> > 
> > > } svn_client_tree__vtable_t;
> 
> 

Re: [RFC] API for reading trees from repo or WC

Posted by Julian Foad <ju...@wandisco.com>.
On Fri, 2011-10-07 at 17:29 +0200, Daniel Shahaf wrote:
> No comments on the general idea as I'm not sufficiently familiar with
> the internals of the client library.  That said, ...

Thanks for the comments below.  I've checked in an initial API and
implementation in the branch 'tree-read-api', without addressing your
comments yet.  As for public ... well, if it's good, yes I'd hope so;
and you said on IRC that in that case we'd prefer a hidden-vtable
implementation (separate API functions) so I'll change it to that format
at some point.

- Julian


> Julian Foad wrote on Fri, Oct 07, 2011 at 14:56:42 +0100:
> > This is a Request For Comments on the creation of an API for reading a
> > tree of dirs and files from an arbitrary source via a common API.
> > 
> 
> Would it be a public API?
> 
> > /* V-table for #svn_client_tree_t.
> >  *
> >  * Paths are relpaths, relative to the tree root.
> >  * Revision numbers and repository ids are #SVN_INVALID_REVNUM and NULL
> >  * for an unversioned node (including a node that is a local add/copy/move
> >  * in a WC working tree).
> >  */
> > typedef struct svn_client_tree__vtable_t
> > {
> >   /* Fetch the node kind of the node at @a relpath.
> >    * (### and other metadata? revnum? props?)
> >    *
> >    * Set @a *kind to the node kind.
> >    */
> >   svn_error_t *(*get_kind)(svn_client_tree_t *tree,
> >                            svn_node_kind_t *kind,
> >                            const char *relpath,
> >                            apr_pool_t *scratch_pool);
> 
> Return an svn_client_info2_t ?
> 
> > 
> >   /* Fetch the contents and properties of the file at @a relpath.
> >   svn_error_t *(*get_file)(svn_client_tree_t *tree,
> >                            svn_stream_t **stream,
> >                            apr_hash_t **props,
> >                            const char *relpath,
> >                            apr_pool_t *result_pool,
> >                            apr_pool_t *scratch_pool);
> > 
> >   /* Fetch the entries and properties of the directory at @a relpath.
> >   svn_error_t *(*get_dir)(svn_client_tree_t *tree,
> >                           apr_hash_t **dirents,
> >                           apr_hash_t **props,
> >                           const char *relpath,
> >                           apr_pool_t *result_pool,
> >                           apr_pool_t *scratch_pool);
> > 
> >   /* Push a sub-tree into an editor, as a delta against an empty tree.
> >    * This is useful for efficiency when streaming a (sub-)tree from a
> >    * remote source. */
> >   svn_error_t *(*push_as_delta_edit)(svn_client_tree_t *tree,
> >                                      const char *relpath,
> >                                      svn_delta_editor_t *editor,
> >                                      void *edit_baton,
> >                                      apr_pool_t *result_pool,
> >                                      apr_pool_t *scratch_pool)
> 
> Add a "describe this tree" member?  e.g., it could return the
> PATH_OR_URL@PEG being described.
> 
> > } svn_client_tree__vtable_t;



Re: [RFC] API for reading trees from repo or WC

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
No comments on the general idea as I'm not sufficiently familiar with
the internals of the client library.  That said, ...

Julian Foad wrote on Fri, Oct 07, 2011 at 14:56:42 +0100:
> This is a Request For Comments on the creation of an API for reading a
> tree of dirs and files from an arbitrary source via a common API.
> 

Would it be a public API?

> /* V-table for #svn_client_tree_t.
>  *
>  * Paths are relpaths, relative to the tree root.
>  * Revision numbers and repository ids are #SVN_INVALID_REVNUM and NULL
>  * for an unversioned node (including a node that is a local add/copy/move
>  * in a WC working tree).
>  */
> typedef struct svn_client_tree__vtable_t
> {
>   /* Fetch the node kind of the node at @a relpath.
>    * (### and other metadata? revnum? props?)
>    *
>    * Set @a *kind to the node kind.
>    */
>   svn_error_t *(*get_kind)(svn_client_tree_t *tree,
>                            svn_node_kind_t *kind,
>                            const char *relpath,
>                            apr_pool_t *scratch_pool);

Return an svn_client_info2_t ?

> 
>   /* Fetch the contents and properties of the file at @a relpath.
>   svn_error_t *(*get_file)(svn_client_tree_t *tree,
>                            svn_stream_t **stream,
>                            apr_hash_t **props,
>                            const char *relpath,
>                            apr_pool_t *result_pool,
>                            apr_pool_t *scratch_pool);
> 
>   /* Fetch the entries and properties of the directory at @a relpath.
>   svn_error_t *(*get_dir)(svn_client_tree_t *tree,
>                           apr_hash_t **dirents,
>                           apr_hash_t **props,
>                           const char *relpath,
>                           apr_pool_t *result_pool,
>                           apr_pool_t *scratch_pool);
> 
>   /* Push a sub-tree into an editor, as a delta against an empty tree.
>    * This is useful for efficiency when streaming a (sub-)tree from a
>    * remote source. */
>   svn_error_t *(*push_as_delta_edit)(svn_client_tree_t *tree,
>                                      const char *relpath,
>                                      svn_delta_editor_t *editor,
>                                      void *edit_baton,
>                                      apr_pool_t *result_pool,
>                                      apr_pool_t *scratch_pool)

Add a "describe this tree" member?  e.g., it could return the
PATH_OR_URL@PEG being described.

> } svn_client_tree__vtable_t;

Re: [RFC] API for reading trees from repo or WC

Posted by Greg Stein <gs...@gmail.com>.
On Mon, Oct 10, 2011 at 05:39, Julian Foad <ju...@wandisco.com> wrote:
>...
>> relpath against what? I presume the root of the tree is defined at
>> construction time. Thus, all API calls are relative to that implied
>> root.
>
> Yes; this is stated in the overall doc string.  May need to clarify or
> cross-ref the docs.

Oh, I probably just missed that part. No biggie.

>...
>> [...] that would be "baton" (we never call it "priv").
>
> Wrong; I copied this straight out of
>
> /* The RA session object. */
> struct svn_ra_session_t {

Wow. Had no idea. Stupid thing is that I probably wrote that structure
a decade ago :-P

In any case... we've certainly seemed to settle on "baton" for
private/callback data.

>...

Thanks,
-g

Re: [RFC] API for reading trees from repo or WC

Posted by Julian Foad <ju...@wandisco.com>.
Hi Greg.  Thanks for the feedback.  r1180843 addresses most of this.

On Fri, 2011-10-07, Greg Stein wrote:
> On Fri, Oct 7, 2011 at 09:56, Julian Foad <ju...@wandisco.com> wrote:
> >...
> >  diff_tree_with_delta(tree1, delta)
> >
> >    Takes a reference to a base tree, and an svn_delta_editor_t type
> > delta based on tree1, and reads dirs and files from tree1 as necessary
> > to present the delta as a unidiff (or whatever kind of output).
> 
> You should be using Ev2 rather than the delta editor.

Will do.

> > typedef struct svn_client_tree__vtable_t
> 
> The vtable should be private/encapsulated. 

Done; danielsh mentioned this.


> >  /* Fetch the node kind of the node at @a relpath.
> >   * (### and other metadata? revnum? props?)
> >   *
> >   * Set @a *kind to the node kind.
> >   */
> >  svn_error_t *(*get_kind)(svn_client_tree_t *tree,
> >                           svn_node_kind_t *kind,
> >                           const char *relpath,
> >                           apr_pool_t *scratch_pool);
> 
> relpath against what? I presume the root of the tree is defined at
> construction time. Thus, all API calls are relative to that implied
> root.

Yes; this is stated in the overall doc string.  May need to clarify or
cross-ref the docs.


> >  /* Fetch the contents and properties of the file at @a relpath.
> >   *
> >   * If @a stream is non-NULL, set @a *stream to a readable stream yielding
> >   * the contents of the file at @a relpath.  (### ? The stream
> >   * handlers for @a stream may not perform any operations on @a tree.)
> >   *
> >   * If @a props is non-NULL, set @a *props to contain the regular
> >   * versioned properties of the file (not 'wcprops', 'entryprops', etc.).
> >   * The hash maps (const char *) names to (#svn_string_t *) values.
> >   */
> >  svn_error_t *(*get_file)(svn_client_tree_t *tree,
> >                           svn_stream_t **stream,
> >                           apr_hash_t **props,
> >                           const char *relpath,
> >                           apr_pool_t *result_pool,
> >                           apr_pool_t *scratch_pool);
> 
> I would like to see the notion of a symlink be a first-order item in
> this API.
[...]
> I would suggest svn_kind_t. We can then get rid of svn_wc__db_kind_t,
> and various APIs can be versioned from svn_node_kind_t over to the new
> svn_kind_t.

Good; done.  In r1180839 I declared & defined svn_kind_t in svn_types.h,
and revved a couple of functions (svn_io_check_path, for starters).

> >  /* Fetch the entries and properties of the directory at @a relpath.
> >   *
> >   * If @a dirents is non-NULL, set @a *dirents to contain all the entries
> >   * of directory @a relpath.  The keys will be (<tt>const char *</tt>)
> >   * entry names, and the values (#svn_client_tree_dirent_t *) dirents.
> >   * Only the @c kind and @c filesize fields are filled in.
> 
> Just names are easiest.

Right, will do.

I was thinking, don't want another round trip to fetch the kind, then a
third round trip to fetch the node's contents.  However, that was the
wrong way to think about it; instead, the implementation should feel
free to fetch and cache info in whatever way is efficient for it, so for
example an implementation over an RA session might fetch the node-kinds
at the same time as the entry names (using svn_ra_get_dir), and then the
get_kind() calls can look at that cached info and return fast without
another round trip.

The same argument explains why I initially added the below
'push-to-delta-editor' call ...

> >  /* Push a sub-tree into an editor, as a delta against an empty tree.
> >   * This is useful for efficiency when streaming a (sub-)tree from a
> >   * remote source. */
> 
> I don't see the utility here. Every single node becomes an add_* call
> into the Ev2 interface. Not very complicated, so I'm not sure how this
> helps (or what the problem it is trying to solve).

Well, I was thinking how to stream a large chunk of a tree without
network round trips.  Initially I thought we needed some kind of "send
me a whole subtree" call.  But the better answer is: this interface
doesn't need to know how we do that efficiently, it's just an interface
for reading the data once we've received it.  Make another interface for
setting up the request for a whole subtree, and use this 'tree' API for
examining the response.

> >...
> > struct svn_client_tree_t
> > {
> >  const svn_client_tree__vtable_t *vtable;
> >
> >  /* Pool used to manage this session. */
> >  apr_pool_t *pool;
> >
> >  /* Private data for the tree implementation. */
> >  void *priv;
> > };
> 
> [...] that would be "baton" (we never call it "priv").

Wrong; I copied this straight out of 

/* The RA session object. */
struct svn_ra_session_t {
  const svn_ra__vtable_t *vtable;

  /* Pool used to manage this session. */
  apr_pool_t *pool;

  /* Private data for the RA implementation. */
  void *priv;
};

:-)

But I'm not personally attached to this name; we can change them.

- Julian



Re: [RFC] API for reading trees from repo or WC

Posted by Greg Stein <gs...@gmail.com>.
On Fri, Oct 7, 2011 at 09:56, Julian Foad <ju...@wandisco.com> wrote:
>...
>  diff_tree_with_delta(tree1, delta)
>
>    Takes a reference to a base tree, and an svn_delta_editor_t type
> delta based on tree1, and reads dirs and files from tree1 as necessary
> to present the delta as a unidiff (or whatever kind of output).

You should be using Ev2 rather than the delta editor.

>...
> /**
>  * A readable tree.  This object is used to perform read requests to a
>  * repository tree or a working-copy (base or working) tree or any other
>  * readable tree.
>  *
>  * @since New in 1.8.
>  */
> typedef struct svn_client_tree_t svn_client_tree_t;
>
> /* */
> typedef svn_io_dirent2_t svn_client_tree_dirent_t;
>
> /* V-table for #svn_client_tree_t.
>  *
>  * Paths are relpaths, relative to the tree root.
>  * Revision numbers and repository ids are #SVN_INVALID_REVNUM and NULL
>  * for an unversioned node (including a node that is a local add/copy/move
>  * in a WC working tree).
>  */
> typedef struct svn_client_tree__vtable_t

The vtable should be private/encapsulated. See how svn_editor_t
handles its vtable, along with the various callback-setters. The
code/feature-set is much easier to extend, and the API is much easier
to modify when the vtable is hidden.

> {
>  /* Fetch the node kind of the node at @a relpath.
>   * (### and other metadata? revnum? props?)
>   *
>   * Set @a *kind to the node kind.
>   */
>  svn_error_t *(*get_kind)(svn_client_tree_t *tree,
>                           svn_node_kind_t *kind,
>                           const char *relpath,
>                           apr_pool_t *scratch_pool);

relpath against what? I presume the root of the tree is defined at
construction time. Thus, all API calls are relative to that implied
root.

>
>  /* Fetch the contents and properties of the file at @a relpath.
>   *
>   * If @a stream is non-NULL, set @a *stream to a readable stream yielding
>   * the contents of the file at @a relpath.  (### ? The stream
>   * handlers for @a stream may not perform any operations on @a tree.)
>   *
>   * If @a props is non-NULL, set @a *props to contain the regular
>   * versioned properties of the file (not 'wcprops', 'entryprops', etc.).
>   * The hash maps (const char *) names to (#svn_string_t *) values.
>   */
>  svn_error_t *(*get_file)(svn_client_tree_t *tree,
>                           svn_stream_t **stream,
>                           apr_hash_t **props,
>                           const char *relpath,
>                           apr_pool_t *result_pool,
>                           apr_pool_t *scratch_pool);

I would like to see the notion of a symlink be a first-order item in
this API. The hope is to move away from exposing svn:special and
treating all symlinks as their own type. (and yes, this kinda messes
with your get_kind using svn_node_kind_t)

>
>  /* Fetch the entries and properties of the directory at @a relpath.
>   *
>   * If @a dirents is non-NULL, set @a *dirents to contain all the entries
>   * of directory @a relpath.  The keys will be (<tt>const char *</tt>)
>   * entry names, and the values (#svn_client_tree_dirent_t *) dirents.
>   * Only the @c kind and @c filesize fields are filled in.

Just names are easiest. ie. return an array of child names. If you
start returning structures, then you're going to get into versioning
hell for those structures. They become giant gloms of random data.
(ref: the old svn_wc_entry_t, the various svn_info_t structures, and
the haphazard svn_wc_status_t structures).

>   * ### @c special would be useful too.

Screw special. Use kind properly.

I would suggest svn_kind_t. We can then get rid of svn_wc__db_kind_t,
and various APIs can be versioned from svn_node_kind_t over to the new
svn_kind_t.

>   *
>   * If @a props is non-NULL, set @a *props to contain the regular
>   * versioned properties of the file (not 'wcprops', 'entryprops', etc.).
>   * The hash maps (const char *) names to (#svn_string_t *) values.
>   */
>  svn_error_t *(*get_dir)(svn_client_tree_t *tree,
>                          apr_hash_t **dirents,
>                          apr_hash_t **props,
>                          const char *relpath,
>                          apr_pool_t *result_pool,
>                          apr_pool_t *scratch_pool);
>
>  /* Push a sub-tree into an editor, as a delta against an empty tree.
>   * This is useful for efficiency when streaming a (sub-)tree from a
>   * remote source. */

I don't see the utility here. Every single node becomes an add_* call
into the Ev2 interface. Not very complicated, so I'm not sure how this
helps (or what the problem it is trying to solve).

>...
> struct svn_client_tree_t
> {
>  const svn_client_tree__vtable_t *vtable;
>
>  /* Pool used to manage this session. */
>  apr_pool_t *pool;
>
>  /* Private data for the tree implementation. */
>  void *priv;
> };

This should be encapsulated. See svn_editor_t. And that would be
"baton" (we never call it "priv").

>...

Cheers,
-g