You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@wandisco.com> on 2011/11/03 19:10:22 UTC

Editor v2 - suggestions and queries

Here, in the form of a patch, are many suggestions and queries I've
made on Ev2. I'm not intimately familiar with the design and goals of
it, I'm just responding to how it's currently written up in the header
file, with some memory of past discussions.  Discussion welcomed.

[[[
Index: subversion/include/svn_editor.h
===================================================================
--- subversion/include/svn_editor.h	(revision 1197094)
+++ subversion/include/svn_editor.h	(working copy)
@@ -177,18 +177,25 @@
  *    \n\n
  *    Just before each callback invocation is carried out, the @a cancel_func
  *    that was passed to svn_editor_create() is invoked to poll any
  *    external reasons to cancel the sequence of operations.  Unless it
  *    overrides the cancellation (denoted by #SVN_ERR_CANCELLED), the driver
  *    aborts the transmission by invoking the svn_editor_abort() callback.
  *    Exceptions to this are calls to svn_editor_complete() and
  *    svn_editor_abort(), which cannot be canceled externally.
  *
+ *    ### JAF: It should check for cancellation before 'complete'.
+ *        Imagine the user runs 'svn commit large-new-file' and then
+ *        tries to cancel it, and imagine svn_editor_add() does not
+ *        internally respond to cancellation. If the driver then calls
+ *        'complete' without intercepting the cancellation, the user
+ *        would not be happy to see that commit being completed.
+ *
  * - @b Receive: While the driver invokes operations upon the editor, the
  *    receiver finds its callback functions called with the information
  *    to operate on its tree. Each actual callback function receives those
  *    arguments that the driver passed to the "driving" functions, plus these:
  *    -  @a baton: This is the @a editor_baton pointer originally passed to
  *       svn_editor_create().  It may be freely used by the callback
  *       implementation to store information across all callbacks.
  *    -  @a scratch_pool: This temporary pool is cleared directly after
  *       each callback returns.  See "Pool Usage".
@@ -203,76 +210,111 @@
  *
  * <h3>Driving Order Restrictions</h3>
  * In order to reduce complexity of callback receivers, the editor callbacks
  * must be driven in adherence to these rules:
  *
  * - svn_editor_add_directory() -- Another svn_editor_add_*() call must
  *   follow for each child mentioned in the @a children argument of any
  *   svn_editor_add_directory() call.
  *
+ *   ### JAF: ... must follow at any time after the parent add_directory(),
+ *       and not before it.
+ *
  * - svn_editor_set_props()
  *   - The @a complete argument must be TRUE if no more calls will follow on
  *     the same path. @a complete must always be TRUE for directories.
  *   - If @a complete is FALSE, and:
  *     - if @a relpath is a file, this must (at some point) be followed by
  *       an svn_editor_set_text() call on the same path.
  *     - if @a relpath is a symlink, this must (at some point) be followed by
  *       an svn_editor_set_target() call on the same path.
  *
  * - svn_editor_set_text() and svn_editor_set_target() must always occur
  *   @b after an svn_editor_set_props() call on the same path, if any.
  *
  *   In other words, if there are two calls coming in on the same path, the
  *   first of them has to be svn_editor_set_props().
  *
+ * ### JAF: The set_* functions treat the properties of a node as
+ *     separate from the node's text or symlink target, whereas the
+ *     add_* functions treat the properties as an integral part of each
+ *     kind of node. This seems like a needless difference. It would make
+ *     sense for set_text() to take the properties with the text, and let
+ *     the implementation worry about whether the props and/or the text
+ *     have actually changed (as well as how to transmit the changes
+ *     most efficiently). For symlinks, include the properties with the
+ *     'set target' call (rename to 'set symlink') and for directories
+ *     introduce a 'set directory' function, and remove the set_props()
+ *     function. This would eliminate the exception in the "Once Rule".
+ *
  * - Other than the above two pairs of linked operations, a path should
  *   never be referenced more than once by the add_* and set_* and the
  *   delete operations (the "Once Rule"). The source path of a copy (and
  *   its children, if a directory) may be copied many times, and are
  *   otherwise subject to the Once Rule. The destination path of a copy
  *   or move may have set_* operations applied, but not add_* or delete.
  *   If the destination path of a copy or move is a directory, then its
  *   children are subject to the Once Rule. The source path of a move
  *   (and its child paths) may be referenced in add_*, or as the
  *   destination of a copy (where these new, copied nodes are subject to
  *   the Once Rule).
  *
  * - The ancestor of an added or modified node may not be deleted. The
  *   ancestor may not be moved (instead: perform the move, *then* the edits).
+ *
+ * ### JAF: Ancestor == parent dir?
+ *
+ * ### JAF: The ancestor of a node that is modified or added or copied-here
+ *     or moved-here ...?
+ *
+ * ### JAF: The ancestor of ... [can | may not?] be copied or moved.
  *
  * - svn_editor_delete() must not be used to replace a path -- i.e.
  *   svn_editor_delete() must not be followed by an svn_editor_add_*() on
  *   the same path, nor by an svn_editor_copy() or svn_editor_move() with
  *   the same path as the copy/move target.
+ *
+ * ### JAF: Nor followed by set_*().
+ *
+ * ### JAF: Nor followed by copy() or move() with the same path as the
+ *     copy/move source?
  *
  *   Instead of a prior delete call, the add/copy/move callbacks should be
  *   called with the @a replaces_rev argument set to the revision number of
  *   the node at this path that is being replaced.  Note that the path and
  *   revision number are the key to finding any other information about the
  *   replaced node, like node kind, etc.
  *   @todo say which function(s) to use.
+ *
+ * ### JAF: Can we have a way to refer to replacing a node in a tree that
+ *     is not a committed revision and so doesn't have a revision number?
+ *     This seems to be the only place where the editor definition requires
+ *     that a target node is a (copy of a) committed repository node.
  *
- * - svn_editor_delete() must not be used to move a path -- i.e.
- *   svn_editor_delete() must not delete the source path of a previous
+ * - svn_editor_delete() should not be used to move a path -- i.e.
+ *   svn_editor_delete() should not delete the source path of a previous
  *   svn_editor_copy() call. Instead, svn_editor_move() must be used.
  *   Note: if the desired semantics is one (or more) copies, followed
  *   by a delete... that is fine. It is simply that svn_editor_move()
  *   should be used to describe a semantic move.
  *
  * - One of svn_editor_complete() or svn_editor_abort() must be called
  *   exactly once, which must be the final call the driver invokes.
  *   Invoking svn_editor_complete() must imply that the set of changes has
  *   been transmitted completely and without errors, and invoking
  *   svn_editor_abort() must imply that the transformation was not completed
  *   successfully.
  *
  * - If any callback invocation returns with an error, the driver must
  *   invoke svn_editor_abort() and stop transmitting operations.
+ *
+ * - ### JAF: Some restriction analogous to the receiver's "All callbacks
+ *       must complete their handling of a path before they return ..."?
  * \n\n
  *
  * <h3>Receiving Restrictions</h3>
  * All callbacks must complete their handling of a path before they
  * return, except for the following pairs, where a change must be completed
  * when receiving the second callback in each pair:
  *  - svn_editor_set_props() (if @a complete is FALSE) and
  *    svn_editor_set_text() (if the node is a file)
  *  - svn_editor_set_props() (if @a complete is FALSE) and
@@ -328,25 +370,45 @@
  * calling any of the driving functions (e.g. svn_editor_add_directory()).
  * As with any other error, the driver must then invoke svn_editor_abort()
  * and abort the transformation sequence. See #svn_cancel_func_t.
  *
  * The @a cancel_baton argument to svn_editor_create() is passed
  * unchanged to each poll of @a cancel_func.
  *
  * The cancellation function and baton are typically provided by the client
  * context.
+ *
+ * ### JAF: Bring the section on cancellation under the Life-Cycle heading
+ *     down to here and combine it.
  *
  *
  * @todo ### TODO anything missing? -- allow text and prop change to follow
- * a move or copy. -- set_text() vs. apply_text_delta()? -- If a
+ * a move or copy. -- set_text() vs. apply_text_delta()?
+ *
+ * ### JAF: Those two seem to be done already.
+ *
+ *  -- If a
  * set_props/set_text/set_target/copy/move/delete in a merge source is
  * applied to a different branch, which side will REVISION arguments reflect
  * and is there still a problem?
+ *
+ * ### JAF: The 'revision' argument will always refer to the 'before'
+ *     state of the source node that the editor is editing. The receiver
+ *     must know about this source node to make sense of the edit. If
+ *     the receiver is merging the source-node edits into some other
+ *     node on a different target branch, the editor knows nothing about
+ *     that other node (which may not even have a revision number if
+ *     it's a working version that's been modified already by some
+ *     previous merge or user intervention).
+ *
+ * ### JAF: As well as the text checksum for files, consider adding an
+ *     (optional?) checksum for the full content of every node kind --
+ *     text with props, symlink with props, dir with list of children.
  *
  * @since New in 1.8.
  */
 typedef struct svn_editor_t svn_editor_t;


 /** These function types define the callback functions a tree delta consumer
  * implements.
  *
@@ -698,57 +760,51 @@
  *
  * Create a new directory at @a relpath. The immediate parent of @a relpath
  * is expected to exist.
  *
  * Set the properties of the new directory to @a props, which is an
  * apr_hash_t holding key-value pairs. Each key is a const char* of a
  * property name, each value is a const svn_string_t*. If no properties are
  * being set on the new directory, @a props must be NULL.
  *
- * If this add is expected to replace a previously existing file or
- * directory at @a relpath, the revision number of the node to be replaced
+ * If this add is expected to replace a previously existing node
+ * at @a relpath, the revision number of the node to be replaced
  * must be given in @a replaces_rev. Otherwise, @a replaces_rev must be
  * SVN_INVALID_REVNUM.  Note: it is not allowed to call a "delete" followed
  * by an "add" on the same path. Instead, an "add" with @a replaces_rev set
  * accordingly MUST be used.
  *
  * A complete listing of the immediate children of @a relpath that will be
  * added subsequently is given in @a children. @a children is an array of
  * const char*s, each giving the basename of an immediate child.
  *
  * For all restrictions on driving the editor, see #svn_editor_t.
  */
 svn_error_t *
 svn_editor_add_directory(svn_editor_t *editor,
                          const char *relpath,
                          const apr_array_header_t *children,
                          apr_hash_t *props,
                          svn_revnum_t replaces_rev);
+/* ### JAF: Let's keep the 'relpath' and 'replaces_rev' params together,
+ *     like they are in the delete/copy/move callbacks, because
+ *     logically they're tightly related. Also in other add_*(). */

 /** Drive @a editor's #svn_editor_cb_add_file_t callback.
  *
  * Create a new file at @a relpath. The immediate parent of @a relpath
  * is expected to exist.
  *
  * The file's contents are specified in @a contents which has a checksum
  * matching @a checksum.
  *
- * Set the properties of the new file to @a props, which is an
- * apr_hash_t holding key-value pairs. Each key is a const char* of a
- * property name, each value is a const svn_string_t*. If no properties are
- * being set on the new file, @a props must be NULL.
- *
- * If this add is expected to replace a previously existing file or
- * directory at @a relpath, the revision number of the node to be replaced
- * must be given in @a replaces_rev. Otherwise, @a replaces_rev must be
- * SVN_INVALID_REVNUM.  Note: it is not allowed to call a "delete" followed
- * by an "add" on the same path. Instead, an "add" with @a replaces_rev set
- * accordingly MUST be used.
+ * For descriptions of @a props and @a replaces_rev, see
+ * svn_editor_add_file().
  *
  * For all restrictions on driving the editor, see #svn_editor_t.
  * @since New in 1.8.
  */
 svn_error_t *
 svn_editor_add_file(svn_editor_t *editor,
                     const char *relpath,
                     const svn_checksum_t *checksum,
                     svn_stream_t *contents,
@@ -772,18 +828,19 @@
                        const char *target,
                        apr_hash_t *props,
                        svn_revnum_t replaces_rev);

 /** Drive @a editor's #svn_editor_cb_add_absent_t callback.
  *
  * Create an "absent" node of kind @a kind at @a relpath. The immediate
  * parent of @a relpath is expected to exist.
  * ### TODO @todo explain "absent".
+ * ### JAF: What are the allowed values of 'kind'?
  *
  * For a description of @a replaces_rev, see svn_editor_add_file().
  *
  * For all restrictions on driving the editor, see #svn_editor_t.
  * @since New in 1.8.
  */
 svn_error_t *
 svn_editor_add_absent(svn_editor_t *editor,
                       const char *relpath,
@@ -794,18 +851,20 @@
  *
  * Set or change properties on the existing node at @a relpath.  This
  * function sends *all* properties, both existing and changes.
  * ### TODO @todo What is REVISION for?
  * ### HKW: This is puzzling to me as well...
  * ###
  * ### what about "entry props"? will these still be handled via
  * ### the general prop function?
  *
+ * For a description of @a props, see svn_editor_add_file().
+ *
  * @a complete must be FALSE if and only if
  * - @a relpath is a file and an svn_editor_set_text() call will follow on
  *   the same path, or
  * - @a relpath is a symbolic link and an svn_editor_set_target() call will
  *   follow on the same path.
  *
  * For all restrictions on driving the editor, see #svn_editor_t.
  * @since New in 1.8.
  */
@@ -815,18 +874,47 @@
                      svn_revnum_t revision,
                      apr_hash_t *props,
                      svn_boolean_t complete);

 /** Drive @a editor's #svn_editor_cb_set_text_t callback.
  *
  * Set/change the text content of a file at @a relpath to @a contents
  * with checksum @a checksum.
  * ### TODO @todo Does this send the *complete* content, always?
+ * ### JAF: I think the idea is that 'contents' is the new full text and,
+ *     if wanted, the driver and receiver implementations should diff and
+ *     patch (respectively) it against the old full text.
+ *
+ * ### JAF: This may be inefficient for an implementation that already
+ *     has a diff available but doesn't have a full text available
+ *     (perhaps neither the 'before' nor the 'after' full text). We
+ *     should consider providing an alternative driver-side API so
+ *     that the driver can choose to bypass this full-text phase and
+ *     send the new text directly in any representation agreed with
+ *     the receiver, including as a diff against the old. On the
+ *     receiver side the same consideration applies.
+ *
+ * ### JAF: For the driver: @a contents is a readable stream. The editor
+ *     implementation may pull text from it as required, and will then
+ *     close it before returning. The editor [### need not | will]
+ *     pull all the text from the stream before closing it.
+ *
+ * ### JAF: For the receiver: @a contents is a readable stream. The
+ *     receiver may pull text from it as required, and will then
+ *     close it before returning. The receiver [### need not | will]
+ *     pull all the text from the stream before closing it.
+ *
+ * ### JAF: Is it permissible for the text change to be a no-op? The
+ *     driver may wish to avoid doing so for efficiency if it knows,
+ *     but I'd say we should not forbid it. Either way we should
+ *     consider how to be consistent in this regard across all the
+ *     whole editor.
+ *
  * ### TODO @todo What is REVISION for?
  *
  * For all restrictions on driving the editor, see #svn_editor_t.
  * @since New in 1.8.
  */
 svn_error_t *
 svn_editor_set_text(svn_editor_t *editor,
                     const char *relpath,
                     svn_revnum_t revision,
]]]

- Julian

Re: Editor v2 - suggestions and queries

Posted by Hyrum K Wright <hy...@wandisco.com>.
Only a couple of nit-pick comments.

Generally, since there is already discussion in the file itself, I'd
support committing additional discussion directly, and then discussing it
on the mailing list if needed.

-Hyrum

On Thu, Nov 3, 2011 at 1:10 PM, Julian Foad <ju...@wandisco.com>wrote:

> Here, in the form of a patch, are many suggestions and queries I've
> made on Ev2. I'm not intimately familiar with the design and goals of
> it, I'm just responding to how it's currently written up in the header
> file, with some memory of past discussions.  Discussion welcomed.
>
> [[[
> Index: subversion/include/svn_editor.h
> ===================================================================
> --- subversion/include/svn_editor.h     (revision 1197094)
> +++ subversion/include/svn_editor.h     (working copy)
> @@ -177,18 +177,25 @@
>  *    \n\n
>  *    Just before each callback invocation is carried out, the @a
> cancel_func
>  *    that was passed to svn_editor_create() is invoked to poll any
>  *    external reasons to cancel the sequence of operations.  Unless it
>  *    overrides the cancellation (denoted by #SVN_ERR_CANCELLED), the
> driver
>  *    aborts the transmission by invoking the svn_editor_abort() callback.
>  *    Exceptions to this are calls to svn_editor_complete() and
>  *    svn_editor_abort(), which cannot be canceled externally.
>  *
> + *    ### JAF: It should check for cancellation before 'complete'.
> + *        Imagine the user runs 'svn commit large-new-file' and then
> + *        tries to cancel it, and imagine svn_editor_add() does not
> + *        internally respond to cancellation. If the driver then calls
> + *        'complete' without intercepting the cancellation, the user
> + *        would not be happy to see that commit being completed.
> + *
>  * - @b Receive: While the driver invokes operations upon the editor, the
>  *    receiver finds its callback functions called with the information
>  *    to operate on its tree. Each actual callback function receives those
>  *    arguments that the driver passed to the "driving" functions, plus
> these:
>  *    -  @a baton: This is the @a editor_baton pointer originally passed to
>  *       svn_editor_create().  It may be freely used by the callback
>  *       implementation to store information across all callbacks.
>  *    -  @a scratch_pool: This temporary pool is cleared directly after
>  *       each callback returns.  See "Pool Usage".
> @@ -203,76 +210,111 @@
>  *
>  * <h3>Driving Order Restrictions</h3>
>  * In order to reduce complexity of callback receivers, the editor
> callbacks
>  * must be driven in adherence to these rules:
>  *
>  * - svn_editor_add_directory() -- Another svn_editor_add_*() call must
>  *   follow for each child mentioned in the @a children argument of any
>  *   svn_editor_add_directory() call.
>  *
> + *   ### JAF: ... must follow at any time after the parent
> add_directory(),
> + *       and not before it.
> + *
>  * - svn_editor_set_props()
>  *   - The @a complete argument must be TRUE if no more calls will follow
> on
>  *     the same path. @a complete must always be TRUE for directories.
>  *   - If @a complete is FALSE, and:
>  *     - if @a relpath is a file, this must (at some point) be followed by
>  *       an svn_editor_set_text() call on the same path.
>  *     - if @a relpath is a symlink, this must (at some point) be followed
> by
>  *       an svn_editor_set_target() call on the same path.
>  *
>  * - svn_editor_set_text() and svn_editor_set_target() must always occur
>  *   @b after an svn_editor_set_props() call on the same path, if any.
>  *
>  *   In other words, if there are two calls coming in on the same path, the
>  *   first of them has to be svn_editor_set_props().
>  *
> + * ### JAF: The set_* functions treat the properties of a node as
> + *     separate from the node's text or symlink target, whereas the
> + *     add_* functions treat the properties as an integral part of each
> + *     kind of node. This seems like a needless difference. It would make
> + *     sense for set_text() to take the properties with the text, and let
> + *     the implementation worry about whether the props and/or the text
> + *     have actually changed (as well as how to transmit the changes
> + *     most efficiently). For symlinks, include the properties with the
> + *     'set target' call (rename to 'set symlink') and for directories
> + *     introduce a 'set directory' function, and remove the set_props()
> + *     function. This would eliminate the exception in the "Once Rule".
> + *
>  * - Other than the above two pairs of linked operations, a path should
>  *   never be referenced more than once by the add_* and set_* and the
>  *   delete operations (the "Once Rule"). The source path of a copy (and
>  *   its children, if a directory) may be copied many times, and are
>  *   otherwise subject to the Once Rule. The destination path of a copy
>  *   or move may have set_* operations applied, but not add_* or delete.
>  *   If the destination path of a copy or move is a directory, then its
>  *   children are subject to the Once Rule. The source path of a move
>  *   (and its child paths) may be referenced in add_*, or as the
>  *   destination of a copy (where these new, copied nodes are subject to
>  *   the Once Rule).
>  *
>  * - The ancestor of an added or modified node may not be deleted. The
>  *   ancestor may not be moved (instead: perform the move, *then* the
> edits).
> + *
> + * ### JAF: Ancestor == parent dir?
> + *
> + * ### JAF: The ancestor of a node that is modified or added or
> copied-here
> + *     or moved-here ...?
> + *
> + * ### JAF: The ancestor of ... [can | may not?] be copied or moved.
>  *
>  * - svn_editor_delete() must not be used to replace a path -- i.e.
>  *   svn_editor_delete() must not be followed by an svn_editor_add_*() on
>  *   the same path, nor by an svn_editor_copy() or svn_editor_move() with
>  *   the same path as the copy/move target.
> + *
> + * ### JAF: Nor followed by set_*().
> + *
> + * ### JAF: Nor followed by copy() or move() with the same path as the
> + *     copy/move source?
>  *
>  *   Instead of a prior delete call, the add/copy/move callbacks should be
>  *   called with the @a replaces_rev argument set to the revision number of
>  *   the node at this path that is being replaced.  Note that the path and
>  *   revision number are the key to finding any other information about the
>  *   replaced node, like node kind, etc.
>  *   @todo say which function(s) to use.
> + *
> + * ### JAF: Can we have a way to refer to replacing a node in a tree that
> + *     is not a committed revision and so doesn't have a revision number?
> + *     This seems to be the only place where the editor definition
> requires
> + *     that a target node is a (copy of a) committed repository node.
>  *
> - * - svn_editor_delete() must not be used to move a path -- i.e.
> - *   svn_editor_delete() must not delete the source path of a previous
> + * - svn_editor_delete() should not be used to move a path -- i.e.
> + *   svn_editor_delete() should not delete the source path of a previous
>

These should be left as-is, per RFC 2119.


>  *   svn_editor_copy() call. Instead, svn_editor_move() must be used.
>  *   Note: if the desired semantics is one (or more) copies, followed
>  *   by a delete... that is fine. It is simply that svn_editor_move()
>  *   should be used to describe a semantic move.
>  *
>  * - One of svn_editor_complete() or svn_editor_abort() must be called
>  *   exactly once, which must be the final call the driver invokes.
>  *   Invoking svn_editor_complete() must imply that the set of changes has
>  *   been transmitted completely and without errors, and invoking
>  *   svn_editor_abort() must imply that the transformation was not
> completed
>  *   successfully.
>  *
>  * - If any callback invocation returns with an error, the driver must
>  *   invoke svn_editor_abort() and stop transmitting operations.
> + *
> + * - ### JAF: Some restriction analogous to the receiver's "All callbacks
> + *       must complete their handling of a path before they return ..."?
>  * \n\n
>  *
>  * <h3>Receiving Restrictions</h3>
>  * All callbacks must complete their handling of a path before they
>  * return, except for the following pairs, where a change must be completed
>  * when receiving the second callback in each pair:
>  *  - svn_editor_set_props() (if @a complete is FALSE) and
>  *    svn_editor_set_text() (if the node is a file)
>  *  - svn_editor_set_props() (if @a complete is FALSE) and
> @@ -328,25 +370,45 @@
>  * calling any of the driving functions (e.g. svn_editor_add_directory()).
>  * As with any other error, the driver must then invoke svn_editor_abort()
>  * and abort the transformation sequence. See #svn_cancel_func_t.
>  *
>  * The @a cancel_baton argument to svn_editor_create() is passed
>  * unchanged to each poll of @a cancel_func.
>  *
>  * The cancellation function and baton are typically provided by the client
>  * context.
> + *
> + * ### JAF: Bring the section on cancellation under the Life-Cycle heading
> + *     down to here and combine it.
>  *
>  *
>  * @todo ### TODO anything missing? -- allow text and prop change to follow
> - * a move or copy. -- set_text() vs. apply_text_delta()? -- If a
> + * a move or copy. -- set_text() vs. apply_text_delta()?
> + *
> + * ### JAF: Those two seem to be done already.
> + *
> + *  -- If a
>  * set_props/set_text/set_target/copy/move/delete in a merge source is
>  * applied to a different branch, which side will REVISION arguments
> reflect
>  * and is there still a problem?
> + *
> + * ### JAF: The 'revision' argument will always refer to the 'before'
> + *     state of the source node that the editor is editing. The receiver
> + *     must know about this source node to make sense of the edit. If
> + *     the receiver is merging the source-node edits into some other
> + *     node on a different target branch, the editor knows nothing about
> + *     that other node (which may not even have a revision number if
> + *     it's a working version that's been modified already by some
> + *     previous merge or user intervention).
> + *
> + * ### JAF: As well as the text checksum for files, consider adding an
> + *     (optional?) checksum for the full content of every node kind --
> + *     text with props, symlink with props, dir with list of children.
>  *
>  * @since New in 1.8.
>  */
>  typedef struct svn_editor_t svn_editor_t;
>
>
>  /** These function types define the callback functions a tree delta
> consumer
>  * implements.
>  *
> @@ -698,57 +760,51 @@
>  *
>  * Create a new directory at @a relpath. The immediate parent of @a relpath
>  * is expected to exist.
>  *
>  * Set the properties of the new directory to @a props, which is an
>  * apr_hash_t holding key-value pairs. Each key is a const char* of a
>  * property name, each value is a const svn_string_t*. If no properties are
>  * being set on the new directory, @a props must be NULL.
>  *
> - * If this add is expected to replace a previously existing file or
> - * directory at @a relpath, the revision number of the node to be replaced
> + * If this add is expected to replace a previously existing node
>

Use the more specific "directory" rather than "node".  (This is a doctoring
which only applies to directories.)


> + * at @a relpath, the revision number of the node to be replaced
>  * must be given in @a replaces_rev. Otherwise, @a replaces_rev must be
>  * SVN_INVALID_REVNUM.  Note: it is not allowed to call a "delete" followed
>  * by an "add" on the same path. Instead, an "add" with @a replaces_rev set
>  * accordingly MUST be used.
>  *
>  * A complete listing of the immediate children of @a relpath that will be
>  * added subsequently is given in @a children. @a children is an array of
>  * const char*s, each giving the basename of an immediate child.
>  *
>  * For all restrictions on driving the editor, see #svn_editor_t.
>  */
>  svn_error_t *
>  svn_editor_add_directory(svn_editor_t *editor,
>                          const char *relpath,
>                          const apr_array_header_t *children,
>                          apr_hash_t *props,
>                          svn_revnum_t replaces_rev);
> +/* ### JAF: Let's keep the 'relpath' and 'replaces_rev' params together,
> + *     like they are in the delete/copy/move callbacks, because
> + *     logically they're tightly related. Also in other add_*(). */
>
>  /** Drive @a editor's #svn_editor_cb_add_file_t callback.
>  *
>  * Create a new file at @a relpath. The immediate parent of @a relpath
>  * is expected to exist.
>  *
>  * The file's contents are specified in @a contents which has a checksum
>  * matching @a checksum.
>  *
> - * Set the properties of the new file to @a props, which is an
> - * apr_hash_t holding key-value pairs. Each key is a const char* of a
> - * property name, each value is a const svn_string_t*. If no properties
> are
> - * being set on the new file, @a props must be NULL.
> - *
> - * If this add is expected to replace a previously existing file or
> - * directory at @a relpath, the revision number of the node to be replaced
> - * must be given in @a replaces_rev. Otherwise, @a replaces_rev must be
> - * SVN_INVALID_REVNUM.  Note: it is not allowed to call a "delete"
> followed
> - * by an "add" on the same path. Instead, an "add" with @a replaces_rev
> set
> - * accordingly MUST be used.
> + * For descriptions of @a props and @a replaces_rev, see
> + * svn_editor_add_file().
>  *
>  * For all restrictions on driving the editor, see #svn_editor_t.
>  * @since New in 1.8.
>  */
>  svn_error_t *
>  svn_editor_add_file(svn_editor_t *editor,
>                     const char *relpath,
>                     const svn_checksum_t *checksum,
>                     svn_stream_t *contents,
> @@ -772,18 +828,19 @@
>                        const char *target,
>                        apr_hash_t *props,
>                        svn_revnum_t replaces_rev);
>
>  /** Drive @a editor's #svn_editor_cb_add_absent_t callback.
>  *
>  * Create an "absent" node of kind @a kind at @a relpath. The immediate
>  * parent of @a relpath is expected to exist.
>  * ### TODO @todo explain "absent".
> + * ### JAF: What are the allowed values of 'kind'?
>  *
>  * For a description of @a replaces_rev, see svn_editor_add_file().
>  *
>  * For all restrictions on driving the editor, see #svn_editor_t.
>  * @since New in 1.8.
>  */
>  svn_error_t *
>  svn_editor_add_absent(svn_editor_t *editor,
>                       const char *relpath,
> @@ -794,18 +851,20 @@
>  *
>  * Set or change properties on the existing node at @a relpath.  This
>  * function sends *all* properties, both existing and changes.
>  * ### TODO @todo What is REVISION for?
>  * ### HKW: This is puzzling to me as well...
>  * ###
>  * ### what about "entry props"? will these still be handled via
>  * ### the general prop function?
>  *
> + * For a description of @a props, see svn_editor_add_file().
> + *
>  * @a complete must be FALSE if and only if
>  * - @a relpath is a file and an svn_editor_set_text() call will follow on
>  *   the same path, or
>  * - @a relpath is a symbolic link and an svn_editor_set_target() call will
>  *   follow on the same path.
>  *
>  * For all restrictions on driving the editor, see #svn_editor_t.
>  * @since New in 1.8.
>  */
> @@ -815,18 +874,47 @@
>                      svn_revnum_t revision,
>                      apr_hash_t *props,
>                      svn_boolean_t complete);
>
>  /** Drive @a editor's #svn_editor_cb_set_text_t callback.
>  *
>  * Set/change the text content of a file at @a relpath to @a contents
>  * with checksum @a checksum.
>  * ### TODO @todo Does this send the *complete* content, always?
> + * ### JAF: I think the idea is that 'contents' is the new full text and,
> + *     if wanted, the driver and receiver implementations should diff and
> + *     patch (respectively) it against the old full text.
> + *
> + * ### JAF: This may be inefficient for an implementation that already
> + *     has a diff available but doesn't have a full text available
> + *     (perhaps neither the 'before' nor the 'after' full text). We
> + *     should consider providing an alternative driver-side API so
> + *     that the driver can choose to bypass this full-text phase and
> + *     send the new text directly in any representation agreed with
> + *     the receiver, including as a diff against the old. On the
> + *     receiver side the same consideration applies.
>

Please, please, please, please, no secret backdoors!  Let's get this
implementation figured out, and then worry about potential optimizations.


> + *
> + * ### JAF: For the driver: @a contents is a readable stream. The editor
> + *     implementation may pull text from it as required, and will then
> + *     close it before returning. The editor [### need not | will]
> + *     pull all the text from the stream before closing it.
> + *
> + * ### JAF: For the receiver: @a contents is a readable stream. The
> + *     receiver may pull text from it as required, and will then
> + *     close it before returning. The receiver [### need not | will]
> + *     pull all the text from the stream before closing it.
> + *
> + * ### JAF: Is it permissible for the text change to be a no-op? The
> + *     driver may wish to avoid doing so for efficiency if it knows,
> + *     but I'd say we should not forbid it. Either way we should
> + *     consider how to be consistent in this regard across all the
> + *     whole editor.
>

No-op text changes are permissible.


> + *
>  * ### TODO @todo What is REVISION for?
>  *
>  * For all restrictions on driving the editor, see #svn_editor_t.
>  * @since New in 1.8.
>  */
>  svn_error_t *
>  svn_editor_set_text(svn_editor_t *editor,
>                     const char *relpath,
>                     svn_revnum_t revision,
> ]]]
>
> - Julian
>



-- 

uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/

Re: Editor v2 - suggestions and queries

Posted by Greg Stein <gs...@gmail.com>.
On Fri, Nov 4, 2011 at 11:16, Julian Foad <ju...@wandisco.com> wrote:
>...
>>> + *        internally respond to cancellation. If the driver then calls
>>> + *        'complete' without intercepting the cancellation, the user
>>> + *        would not be happy to see that commit being completed.
>>
>> That would be a bug in the add, which is not checking. I see no reason
>> to work around a bug.
>
> The 'add' method *should* be cancellable internally if it's taking a
> long time, of course.  My argument still applies if 'add' is doing
> cancellation checks internally but happens not to check during the
> last (say) one second before it returns.  I disagree that that would
> constitute a bug in the add.
>
>> If the caller wants to abort the editor, then it
>> should be able to do so, regardless of the cancellation setting.
>
> I agree that 'abort' should not be cancellable, but I'm talking about
> 'complete'.
>
> I understand it's a feature of the editor design that the driver side
> implementation can batch up the effects of editor calls, buffering the
> data and returning quickly from each of them, and then the 'complete'
> call can take a long time as it flushes the buffer and waits for all
> the transfer to complete.  In this way, the 'complete' call acts like
> a 'flush' followed by a 'close'.  Would we not expect the 'complete'
> function to check internally for cancellation while its 'flush' phase
> is proceeding?  It seems necessary that it should do so, if
> cancellation is to be effective for the user.
>
> As an aside, I found it helpful to think about what the driver is
> allowed to do when an editor function returns 'cancelled'.  "Unless it
> overrides the cancellation (denoted by #SVN_ERR_CANCELLED), the driver
> aborts the transmission by invoking the svn_editor_abort() callback."
> The driver should certainly be allowed to call 'abort', and that call
> should reach the receiver side if at all possible.  For a little time
> I wondered whether we also want the option for a driver to treat the
> cancellation request as a polite request from the user that the driver
> has now sent everything the (driver-side) user cares about and so the
> editor should be closed 'successfully' at this point.  A driver that
> is performing a commit should certainly not behave like that.
> Performing a WC update?  No, that would corrupt the WC.  Printing a
> diff to a remote user's screen?  Even in that hypothetical case,
> although it might be harmless, it's probably much more useful and
> correct to call 'abort' and thus discard any unsent output.  So I
> don't see a need for a driver to be able to call 'complete' after an
> editor call returns 'cancelled', nor can I see that being logically
> sound, given that some editor function may have returned early and
> left some state undefined.
>
> So one point from that is I don't see a way in which the driver should
> be allowed to 'override' the cancellation.  If that were allowed, that
> would imply either that cancellation is only checked between editor
> calls and not at arbitrary points inside them, which is (I think) not
> right, or that the driver has knowledge about the editor
> implementation.
>
> If you're with me so far, then I'd suggest that in terms of
> cancellation 'complete' is more similar to 'add' than it is similar to
> 'abort'.

Hmm. Some interesting points here. And with some ties into your responses later.

It seems that we want to specify some behavior here.. some ideals.
We're kind of getting into the territory that not all drivers will
obey, but we can simply hope. In this case, what will the driver do in
the face of desired cancellation? What do we want the editor API to
do? (can we enforce any of that within the interior editor code?)

For example, one thing we should enforce: an editor should be
completed, or aborted. In debug mode, we can register a pool cleanup
and verify this fact. (though none of our tests attempt cancellation,
so I don't know how useful this is)

I don't think we need to worry too much about whether complete()
checks for cancellation. You're just talking about a race condition
there.

It seems reasonable for a driver to get a cancellation error [from the
editor/receiver] and then wrap things up and call complete(). That
kind of behavior would be totally fine for an "update" drive. There
are no semantic problems with doing a half-update. In this kind of
atmosphere, the notion of "each receiver callback completes before
returning" is very important. Of course, if the receiver defers some
of the work, then the driver *might* be upset, but as long as the
receiver doesn't leave behind integrity problems, then we're all good.

>...
>>> + * ### JAF: The set_* functions treat the properties of a node as
>>> + *     separate from the node's text or symlink target, whereas the
>>> + *     add_* functions treat the properties as an integral part of each
>>> + *     kind of node. This seems like a needless difference. It would make
>>> + *     sense for set_text() to take the properties with the text, and let
>>
>> Sure, that makes sense. Then the set_props() can lose the COMPLETE
>> argument, and we add the restriction that set_text() and set_target()
>> MUST NOT be called before/after the set_props() call.
>>
>> We could also state that props==NULL means "no change". "No
>> properties" (the empty set) would be designated by an empty hash.
>
> Right; and similarly allow contents==NULL and target==NULL.

Hrm. See below.

>>> + *     the implementation worry about whether the props and/or the text
>>> + *     have actually changed (as well as how to transmit the changes
>>> + *     most efficiently). For symlinks, include the properties with the
>>> + *     'set target' call (rename to 'set symlink') and for directories
>>
>> Keep it set target. You are not setting the symlink, you are setting
>> the *target* of the symlink. This is the nomenclature used everywhere
>> already.
>
> Huh?  We're now talking about a single call that sets the target and
> the properties together.  If we take this approach, I suggest naming
> the three calls 'set_symlink' (like 'add_symlink'), 'set_file' (like
> 'add_file'), and 'set_dir_props' (which is not quite like
> 'add_directory').

It was never intended to be set_dir_props(). You could set properties
on any node. That should stick around, cuz it kind of sucks for the
caller to have to know the node type just to set some properties.

But I do like where you're going with this. Rather than set_$kind,
let's go with alter_file(), alter_symlink(), and alter_directory(). I
chose "alter" rather than "change" since change_directory might throw
people off with some implied stateful semantics. Each of the alter_*
functions would provide for changing the properties, symlink target,
file contents, etc. Maybe we eliminate alter_directory() since that
would be exactly the same as set_props() [though maybe that becomes
alter_props?].

>...
>>> + * ### JAF: Nor followed by copy() or move() with the same path as the
>>> + *     copy/move source?
>>
>> You cannot copy/move a missing node.
>
> That sounds right for 'move', because I would assume the source node
> refers to the node as found in the current partially-edited state of
> the tree at the time of the call.
>
> For 'copy', I'm not clear what the semantics are.  Sometimes an edit
> driver might want to make a copy of relpath FOO as found in the
> current partially-edited state of the tree at the time of the call.
> At other times it may want to make a copy of relpath FOO as found in
> the 'old' tree that the edit is based on (not the partially-edited
> state).  How would an edit driver tell the receiver to make a copy
> from some arbitrary node in some (presumably older) committed revision
> that the receiver might or might not know about?

These are all very good concerns. But copy() already talks about a
source revision. I think a real question is how SRC_RELPATH is
specified in that function. Is that a repository relpath? (the
docstring doesn't clarify) Probably should be, since a revision is
specified, and (thus) implying something from a repository.

I do *not* think that copy() should be called for uncommitted items.
That is just a set of add_* calls which mirror the uncommitted source.

>...
>>> - * - svn_editor_delete() must not be used to move a path -- i.e.
>>> - *   svn_editor_delete() must not delete the source path of a previous
>>> + * - svn_editor_delete() should not be used to move a path -- i.e.
>>> + *   svn_editor_delete() should not delete the source path of a previous
>>
>> MUST NOT. Why the change?
>
> The interpretation was rather dependent on the reader's understanding
> of what 'to move' means.  How about this for a better phrasing of the
> whole paragraph:
>
>  - svn_editor_move() is the only way to describe a semantic move.  If
>    the desired semantics is one (or more) copies followed by a delete,
>    you may call svn_editor_delete() to delete the source path of a
>    previous svn_editor_copy() call, but that is not the same as a "move".

Sure.

>...
>>> + * ### JAF: This may be inefficient for an implementation that already
>>> + *     has a diff available but doesn't have a full text available
>>
>> Don't worry about this. In all cases that I identified, the base text
>> was available to apply a patch against to derive the desired new full
>> text.
>>
>> If you can identify a counter-example, then we can discuss. But
>> otherwise, this isn't an issue.
>
> It just seems to me that if I were writing a driver on the repository
> side I'd want to be able to transmit my exising svndiff streams, or
> combined svndiff streams, in the cases where I have them, and only
> resort to building full texts when necessary.  But I haven't looked at
> how the current repository code could do this in a concrete way.

As I've said: I investigated, and nobody *sources* svndiff streams.
Everybody has full text first, and then constructs diffs based on
context. So AFAIK, your hypothesis is false for our current codebase.
And I'm not willing to design for an arbitrary future, rather than our
past decade of experience.

>>>...
>>> + * ### JAF: For the driver: @a contents is a readable stream. The editor
>>> + *     implementation may pull text from it as required, and will then
>>> + *     close it before returning. The editor [### need not | will]
>>> + *     pull all the text from the stream before closing it.
>>
>> Somewhere in the doc, it should say that the receiver MAY close the
>> stream before reading anything [... or ...] all text from it [...].
>
> Good, will do.
>
> I wanted to clarify three separate things here.
>
>  (1) Partial read is allowed.  Good.
>
>  (2) It's a 'pull-mode' interface.  Fine.
>
>  (3) The editor is not allowed to return early and defer the reading
> of this stream until it's ready.  I wonder if we might want to let the
> editor keep several streams open and read from them as and when its
> transmit buffer allows, especially if it wants to be able to send two
> or more file streams in parallel.  These are just shallow thoughts at
> the moment.

Actually... damn. One of the reasons for a separate of set_props() vs
set_text() was to allow for the delayed delivery of contents. Same
thing for the add_file() and set_text(). We adjusted add_file() to
take contents, but that may have been a mistake.

At commit time, we want to delay the delivery of the content streams.

>...
>>> + * ### JAF: Is it permissible for the text change to be a no-op? [...]
>>
>> It is permissible, but stupid. I would be fine with saying "not
>> allowed". Or at least making a statement like "set_text() SHOULD make
>> changes to the contents."
>
> It may be 'stupid' if the driver already knows that the content is
> unchanged, but there may be cases where the driver wouldn't know
> except by scanning the stream, and for those cases it makes sense to
> allow it.

Right. That's why I think "SHOULD" is the appropriate term here.

> Now that we're talking about combining text & props into a single
> call, if we do that then I suggest that ==NULL is recommended but a
> stream with unchanged content is allowed.

Well... see above, ref: delayed content delivery. The API as I
originally designed provided for a delayed delivery. I forgot about
that aspect when I acceded to combining add_file/set_text.

Cheers,
-g

Re: Editor v2 - suggestions and queries

Posted by Julian Foad <ju...@wandisco.com>.
Hi Greg.  Thanks for the detailed response.

First on one point from Hyrum:
Hyrum Wright wrote:
    @@ -698,57 +760,51 @@
     * Create a new directory at @a relpath. [...]
     *
    - * If this add is expected to replace a previously existing file or
    - * directory at @a relpath, the revision number of the node to be replaced
    + * If this add is expected to replace a previously existing node
> Use the more specific "directory" rather than "node".  (This is a doctoring which only applies to directories.)

No, what we're talking about here is replacing some previous node.  A
new directory should be able to replace any kind of previous node.

(Heh, at first I interpreted 'doctoring' as 'editing', but now I guess
'doc-string'.)


Greg Stein wrote:
> Julian Foad wrote:
>>...
>> @@ -177,18 +177,25 @@
>>  *    \n\n
>>  *    Just before each callback invocation is carried out, the @a cancel_func
>>  *    that was passed to svn_editor_create() is invoked to poll any
>>  *    external reasons to cancel the sequence of operations.  Unless it
>>  *    overrides the cancellation (denoted by #SVN_ERR_CANCELLED), the driver
>>  *    aborts the transmission by invoking the svn_editor_abort() callback.
>>  *    Exceptions to this are calls to svn_editor_complete() and
>>  *    svn_editor_abort(), which cannot be canceled externally.
>>  *
>> + *    ### JAF: It should check for cancellation before 'complete'.
>> + *        Imagine the user runs 'svn commit large-new-file' and then
>> + *        tries to cancel it, and imagine svn_editor_add() does not
>> + *        internally respond to cancellation. If the driver then calls
>> + *        'complete' without intercepting the cancellation, the user
>> + *        would not be happy to see that commit being completed.
>
> That would be a bug in the add, which is not checking. I see no reason
> to work around a bug.

The 'add' method *should* be cancellable internally if it's taking a
long time, of course.  My argument still applies if 'add' is doing
cancellation checks internally but happens not to check during the
last (say) one second before it returns.  I disagree that that would
constitute a bug in the add.

> If the caller wants to abort the editor, then it
> should be able to do so, regardless of the cancellation setting.

I agree that 'abort' should not be cancellable, but I'm talking about
'complete'.

I understand it's a feature of the editor design that the driver side
implementation can batch up the effects of editor calls, buffering the
data and returning quickly from each of them, and then the 'complete'
call can take a long time as it flushes the buffer and waits for all
the transfer to complete.  In this way, the 'complete' call acts like
a 'flush' followed by a 'close'.  Would we not expect the 'complete'
function to check internally for cancellation while its 'flush' phase
is proceeding?  It seems necessary that it should do so, if
cancellation is to be effective for the user.

As an aside, I found it helpful to think about what the driver is
allowed to do when an editor function returns 'cancelled'.  "Unless it
overrides the cancellation (denoted by #SVN_ERR_CANCELLED), the driver
aborts the transmission by invoking the svn_editor_abort() callback."
The driver should certainly be allowed to call 'abort', and that call
should reach the receiver side if at all possible.  For a little time
I wondered whether we also want the option for a driver to treat the
cancellation request as a polite request from the user that the driver
has now sent everything the (driver-side) user cares about and so the
editor should be closed 'successfully' at this point.  A driver that
is performing a commit should certainly not behave like that.
Performing a WC update?  No, that would corrupt the WC.  Printing a
diff to a remote user's screen?  Even in that hypothetical case,
although it might be harmless, it's probably much more useful and
correct to call 'abort' and thus discard any unsent output.  So I
don't see a need for a driver to be able to call 'complete' after an
editor call returns 'cancelled', nor can I see that being logically
sound, given that some editor function may have returned early and
left some state undefined.

So one point from that is I don't see a way in which the driver should
be allowed to 'override' the cancellation.  If that were allowed, that
would imply either that cancellation is only checked between editor
calls and not at arbitrary points inside them, which is (I think) not
right, or that the driver has knowledge about the editor
implementation.

If you're with me so far, then I'd suggest that in terms of
cancellation 'complete' is more similar to 'add' than it is similar to
'abort'.


>>...
>> @@ -203,76 +210,111 @@
>>  *
>>  * <h3>Driving Order Restrictions</h3>
>>  * In order to reduce complexity of callback receivers, the editor callbacks
>>  * must be driven in adherence to these rules:
>>  *
>>  * - svn_editor_add_directory() -- Another svn_editor_add_*() call must
>>  *   follow for each child mentioned in the @a children argument of any
>>  *   svn_editor_add_directory() call.
>>  *
>> + *   ### JAF: ... must follow at any time after the parent add_directory(),
>> + *       and not before it.
>
> Huh? The statement already says the children must follow. I don't
> understand this seeming redundancy.

Just wanted to clarify it's *any time* after, making sure there's no
implication that the children must follow the parent before we mention
any path outside of the added tree, or anything like that.  I'll add a
general note somewhere to that effect.

>>...
>>  *   In other words, if there are two calls coming in on the same path, the
>>  *   first of them has to be svn_editor_set_props().
>>  *
>> + * ### JAF: The set_* functions treat the properties of a node as
>> + *     separate from the node's text or symlink target, whereas the
>> + *     add_* functions treat the properties as an integral part of each
>> + *     kind of node. This seems like a needless difference. It would make
>> + *     sense for set_text() to take the properties with the text, and let
>
> Sure, that makes sense. Then the set_props() can lose the COMPLETE
> argument, and we add the restriction that set_text() and set_target()
> MUST NOT be called before/after the set_props() call.
>
> We could also state that props==NULL means "no change". "No
> properties" (the empty set) would be designated by an empty hash.

Right; and similarly allow contents==NULL and target==NULL.

>> + *     the implementation worry about whether the props and/or the text
>> + *     have actually changed (as well as how to transmit the changes
>> + *     most efficiently). For symlinks, include the properties with the
>> + *     'set target' call (rename to 'set symlink') and for directories
>
> Keep it set target. You are not setting the symlink, you are setting
> the *target* of the symlink. This is the nomenclature used everywhere
> already.

Huh?  We're now talking about a single call that sets the target and
the properties together.  If we take this approach, I suggest naming
the three calls 'set_symlink' (like 'add_symlink'), 'set_file' (like
'add_file'), and 'set_dir_props' (which is not quite like
'add_directory').

>> + *     introduce a 'set directory' function, and remove the set_props()
>> + *     function. This would eliminate the exception in the "Once Rule".
>
> Sure.
>
>>...
>>  * - The ancestor of an added or modified node may not be deleted. The
>>  *   ancestor may not be moved (instead: perform the move, *then* the edits).
>> + *
>> + * ### JAF: Ancestor == parent dir?
>
> Any ancestor, not just the parent.

Right.  (My doubt was about whether it could mean a time-wise rather
than path-wise 'ancestor'.)

>> + *
>> + * ### JAF: The ancestor of a node that is modified or added or copied-here
>> + *     or moved-here ...?
>
> Sure.

OK, I've updated the text.

>> + *
>> + * ### JAF: The ancestor of ... [can | may not?] be copied or moved.
>
> The statement says the ancestor may not be moved. Since it says
> nothing about copying: sure, that is allowed (as you'd expect, so we
> don't have to call it out).

If only we could always be so sure that anything not explictly stated
in our docs is not true!  Seriously, not knowing how thoroughly the
text has been reviewed, I couldn't assume this with any certainty.

>>  * - svn_editor_delete() must not be used to replace a path -- i.e.
>>  *   svn_editor_delete() must not be followed by an svn_editor_add_*() on
>>  *   the same path, nor by an svn_editor_copy() or svn_editor_move() with
>>  *   the same path as the copy/move target.
>> + *
>> + * ### JAF: Nor followed by set_*().
>
> You cannot call set_*() on a missing node, so there is no reason to state this.

OK.

>> + * ### JAF: Nor followed by copy() or move() with the same path as the
>> + *     copy/move source?
>
> You cannot copy/move a missing node.

That sounds right for 'move', because I would assume the source node
refers to the node as found in the current partially-edited state of
the tree at the time of the call.

For 'copy', I'm not clear what the semantics are.  Sometimes an edit
driver might want to make a copy of relpath FOO as found in the
current partially-edited state of the tree at the time of the call.
At other times it may want to make a copy of relpath FOO as found in
the 'old' tree that the edit is based on (not the partially-edited
state).  How would an edit driver tell the receiver to make a copy
from some arbitrary node in some (presumably older) committed revision
that the receiver might or might not know about?

> The point of the above statement is to enforce that replacement is not
> allowed using a delete, followed by something that (re)constructs a
> node at that path. You don't need to (re)define the fact that set,
> copy, move must have a valid source.

OK.

>>  *   Instead of a prior delete call, the add/copy/move callbacks should be
>>  *   called with the @a replaces_rev argument set to the revision number of
>>  *   the node at this path that is being replaced.  Note that the path and
>>  *   revision number are the key to finding any other information about the
>>  *   replaced node, like node kind, etc.
>>  *   @todo say which function(s) to use.
>> + *
>> + * ### JAF: Can we have a way to refer to replacing a node in a tree that
>> + *     is not a committed revision and so doesn't have a revision number?
>> + *     This seems to be the only place where the editor definition requires
>> + *     that a target node is a (copy of a) committed repository node.
>
> Makes sense. We can state that SVN_INVALID_REVNUM means "uncommitted"
> (rather than "wildcard").

We'll have to think a bit deeper on this.

>>  *
>> - * - svn_editor_delete() must not be used to move a path -- i.e.
>> - *   svn_editor_delete() must not delete the source path of a previous
>> + * - svn_editor_delete() should not be used to move a path -- i.e.
>> + *   svn_editor_delete() should not delete the source path of a previous
>
> MUST NOT. Why the change?

The interpretation was rather dependent on the reader's understanding
of what 'to move' means.  How about this for a better phrasing of the
whole paragraph:

  - svn_editor_move() is the only way to describe a semantic move.  If
    the desired semantics is one (or more) copies followed by a delete,
    you may call svn_editor_delete() to delete the source path of a
    previous svn_editor_copy() call, but that is not the same as a "move".

>>...
>>  * - If any callback invocation returns with an error, the driver must
>>  *   invoke svn_editor_abort() and stop transmitting operations.
>> + *
>> + * - ### JAF: Some restriction analogous to the receiver's "All callbacks
>> + *       must complete their handling of a path before they return ..."?
>
> I don't see the analogy.

Oops, ignore me.  I didn't mean the paragraph above about invoking
abort() is analogous, I meant do we need to specify anything on the
driver side about all editor functions needing to complete their
handling of a path before return.  But I think that's basically what
all the many paragraphs above are already describing, so nothing more
to say here.

But now I think about it, I'm not clear on what "must complete their
handling of a path" really means for the receiver.


[...]
>>  * @todo ### TODO anything missing? -- allow text and prop change to follow
>> - * a move or copy. -- set_text() vs. apply_text_delta()? -- If a
>> + * a move or copy. -- set_text() vs. apply_text_delta()?
>> + *
>> + * ### JAF: Those two seem to be done already.
>
> Yup.
>
>> + *
>> + *  -- If a
>>  * set_props/set_text/set_target/copy/move/delete in a merge source is
>>  * applied to a different branch, which side will REVISION arguments reflect
>>  * and is there still a problem?
>> + *
>> + * ### JAF: The 'revision' argument will always refer to the 'before'
>> + *     state of the source node that the editor is editing. The receiver
>> + *     must know about this source node to make sense of the edit. If
>> + *     the receiver is merging the source-node edits into some other
>> + *     node on a different target branch, the editor knows nothing about
>> + *     that other node (which may not even have a revision number if
>> + *     it's a working version that's been modified already by some
>> + *     previous merge or user intervention).
>
> Right. svn_editor changes a tree from one state to another. "Merge"
> doesn't come into play whatsoever. "Make the tree look like <X>".
> Effectively, the driver computes the merge result, and then applies
> that to the target tree via svn_editor.

OK, I've removing these three notes.


>> + * ### JAF: As well as the text checksum for files, consider adding an
>> + *     (optional?) checksum for the full content of every node kind --
>> + *     text with props, symlink with props, dir with list of children.
>
> Huh? We've never talking about adding more checksums. I don't think
> this is the place to start.

OK, it was just a thought.

>>  /** Drive @a editor's #svn_editor_cb_add_absent_t callback.
>>  *
>>  * Create an "absent" node of kind @a kind at @a relpath. The immediate
>>  * parent of @a relpath is expected to exist.
>>  * ### TODO @todo explain "absent".
>> + * ### JAF: What are the allowed values of 'kind'?
>
> Not sure. My initial assumption was {dir, file, symlink}.
>
> Do we need to be able to set server-excluded, and excluded? I believe
> these latter two concepts are derived from the notion of "this $kind
> node should be marked absent", and it just depends on who receives
> such a notice. ie. the update receiver takes an absent marker as
> "server-excluded".


>>  /** Drive @a editor's #svn_editor_cb_set_text_t callback.
>>  *
>>  * Set/change the text content of a file at @a relpath to @a contents
>>  * with checksum @a checksum.
>>  * ### TODO @todo Does this send the *complete* content, always?
>> + * ### JAF: I think the idea is that 'contents' is the new full text and,
>> + *     if wanted, the driver and receiver implementations should diff and
>> + *     patch (respectively) it against the old full text.
>
> The CONTENTS param is always the new full text. Always.
>
> If the receiver is an RA committing editor, then it MAY choose to
> generate a diff to send to the server. But that is private to the
> receiver. If the driver is an RA update driver, then it MAY need to
> apply a diff received from the server against the (known) base text
> producing the new full text.
>
> No other formats should be used within the CONTENTS stream (e.g never
> an svndiff).

Yup, that's what I meant.

>> + * ### JAF: This may be inefficient for an implementation that already
>> + *     has a diff available but doesn't have a full text available
>
> Don't worry about this. In all cases that I identified, the base text
> was available to apply a patch against to derive the desired new full
> text.
>
> If you can identify a counter-example, then we can discuss. But
> otherwise, this isn't an issue.

It just seems to me that if I were writing a driver on the repository
side I'd want to be able to transmit my exising svndiff streams, or
combined svndiff streams, in the cases where I have them, and only
resort to building full texts when necessary.  But I haven't looked at
how the current repository code could do this in a concrete way.

>>...
>> + * ### JAF: For the driver: @a contents is a readable stream. The editor
>> + *     implementation may pull text from it as required, and will then
>> + *     close it before returning. The editor [### need not | will]
>> + *     pull all the text from the stream before closing it.
>
> Somewhere in the doc, it should say that the receiver MAY close the
> stream before reading anything [... or ...] all text from it [...].

Good, will do.

I wanted to clarify three separate things here.

  (1) Partial read is allowed.  Good.

  (2) It's a 'pull-mode' interface.  Fine.

  (3) The editor is not allowed to return early and defer the reading
of this stream until it's ready.  I wonder if we might want to let the
editor keep several streams open and read from them as and when its
transmit buffer allows, especially if it wants to be able to send two
or more file streams in parallel.  These are just shallow thoughts at
the moment.

> [...] drivers may choose to implement a "lazy-open" stream. [...]

>> + * ### JAF: For the receiver: @a contents is a readable stream. The
>> + *     receiver may pull text from it as required, and will then
>> + *     close it before returning. The receiver [### need not | will]
>> + *     pull all the text from the stream before closing it.
>
> This is basically above. The editor framework will never touch the
> stream, so we're only talking about the receiver's callback here.

It's worth being clear about the same three points, as I don't see
that the receiver callback's rules must necessarily be the same as the
driver's.

>> + * ### JAF: Is it permissible for the text change to be a no-op? [...]
>
> It is permissible, but stupid. I would be fine with saying "not
> allowed". Or at least making a statement like "set_text() SHOULD make
> changes to the contents."

It may be 'stupid' if the driver already knows that the content is
unchanged, but there may be cases where the driver wouldn't know
except by scanning the stream, and for those cases it makes sense to
allow it.

Now that we're talking about combining text & props into a single
call, if we do that then I suggest that ==NULL is recommended but a
stream with unchanged content is allowed.

- Julian

Re: Editor v2 - suggestions and queries

Posted by Greg Stein <gs...@gmail.com>.
On Thu, Nov 3, 2011 at 14:10, Julian Foad <ju...@wandisco.com> wrote:
>...
> @@ -177,18 +177,25 @@
>  *    \n\n
>  *    Just before each callback invocation is carried out, the @a cancel_func
>  *    that was passed to svn_editor_create() is invoked to poll any
>  *    external reasons to cancel the sequence of operations.  Unless it
>  *    overrides the cancellation (denoted by #SVN_ERR_CANCELLED), the driver
>  *    aborts the transmission by invoking the svn_editor_abort() callback.
>  *    Exceptions to this are calls to svn_editor_complete() and
>  *    svn_editor_abort(), which cannot be canceled externally.
>  *
> + *    ### JAF: It should check for cancellation before 'complete'.
> + *        Imagine the user runs 'svn commit large-new-file' and then
> + *        tries to cancel it, and imagine svn_editor_add() does not
> + *        internally respond to cancellation. If the driver then calls
> + *        'complete' without intercepting the cancellation, the user
> + *        would not be happy to see that commit being completed.

That would be a bug in the add, which is not checking. I see no reason
to work around a bug. If the caller wants to abort the editor, then it
should be able to do so, regardless of the cancellation setting.

>...
> @@ -203,76 +210,111 @@
>  *
>  * <h3>Driving Order Restrictions</h3>
>  * In order to reduce complexity of callback receivers, the editor callbacks
>  * must be driven in adherence to these rules:
>  *
>  * - svn_editor_add_directory() -- Another svn_editor_add_*() call must
>  *   follow for each child mentioned in the @a children argument of any
>  *   svn_editor_add_directory() call.
>  *
> + *   ### JAF: ... must follow at any time after the parent add_directory(),
> + *       and not before it.

Huh? The statement already says the children must follow. I don't
understand this seeming redundancy.

>...
>  *   In other words, if there are two calls coming in on the same path, the
>  *   first of them has to be svn_editor_set_props().
>  *
> + * ### JAF: The set_* functions treat the properties of a node as
> + *     separate from the node's text or symlink target, whereas the
> + *     add_* functions treat the properties as an integral part of each
> + *     kind of node. This seems like a needless difference. It would make
> + *     sense for set_text() to take the properties with the text, and let

Sure, that makes sense. Then the set_props() can lose the COMPLETE
argument, and we add the restriction that set_text() and set_target()
MUST NOT be called before/after the set_props() call.

We could also state that props==NULL means "no change". "No
properties" (the empty set) would be designated by an empty hash.

> + *     the implementation worry about whether the props and/or the text
> + *     have actually changed (as well as how to transmit the changes
> + *     most efficiently). For symlinks, include the properties with the
> + *     'set target' call (rename to 'set symlink') and for directories

Keep it set target. You are not setting the symlink, you are setting
the *target* of the symlink. This is the nomenclature used everywhere
already.

> + *     introduce a 'set directory' function, and remove the set_props()
> + *     function. This would eliminate the exception in the "Once Rule".

Sure.

>...
>  * - The ancestor of an added or modified node may not be deleted. The
>  *   ancestor may not be moved (instead: perform the move, *then* the edits).
> + *
> + * ### JAF: Ancestor == parent dir?

Any ancestor, not just the parent.

> + *
> + * ### JAF: The ancestor of a node that is modified or added or copied-here
> + *     or moved-here ...?

Sure.

> + *
> + * ### JAF: The ancestor of ... [can | may not?] be copied or moved.

The statement says the ancestor may not be moved. Since it says
nothing about copying: sure, that is allowed (as you'd expect, so we
don't have to call it out).

>  * - svn_editor_delete() must not be used to replace a path -- i.e.
>  *   svn_editor_delete() must not be followed by an svn_editor_add_*() on
>  *   the same path, nor by an svn_editor_copy() or svn_editor_move() with
>  *   the same path as the copy/move target.
> + *
> + * ### JAF: Nor followed by set_*().

You cannot call set_*() on a missing node, so there is no reason to state this.

> + * ### JAF: Nor followed by copy() or move() with the same path as the
> + *     copy/move source?

You cannot copy/move a missing node.

The point of the above statement is to enforce that replacement is not
allowed using a delete, followed by something that (re)constructs a
node at that path. You don't need to (re)define the fact that set,
copy, move must have a valid source.

>  *   Instead of a prior delete call, the add/copy/move callbacks should be
>  *   called with the @a replaces_rev argument set to the revision number of
>  *   the node at this path that is being replaced.  Note that the path and
>  *   revision number are the key to finding any other information about the
>  *   replaced node, like node kind, etc.
>  *   @todo say which function(s) to use.
> + *
> + * ### JAF: Can we have a way to refer to replacing a node in a tree that
> + *     is not a committed revision and so doesn't have a revision number?
> + *     This seems to be the only place where the editor definition requires
> + *     that a target node is a (copy of a) committed repository node.

Makes sense. We can state that SVN_INVALID_REVNUM means "uncommitted"
(rather than "wildcard").

>  *
> - * - svn_editor_delete() must not be used to move a path -- i.e.
> - *   svn_editor_delete() must not delete the source path of a previous
> + * - svn_editor_delete() should not be used to move a path -- i.e.
> + *   svn_editor_delete() should not delete the source path of a previous

MUST NOT. Why the change?

>...
>  * - If any callback invocation returns with an error, the driver must
>  *   invoke svn_editor_abort() and stop transmitting operations.
> + *
> + * - ### JAF: Some restriction analogous to the receiver's "All callbacks
> + *       must complete their handling of a path before they return ..."?

I don't see the analogy.

>...
> @@ -328,25 +370,45 @@
>  * calling any of the driving functions (e.g. svn_editor_add_directory()).
>  * As with any other error, the driver must then invoke svn_editor_abort()
>  * and abort the transformation sequence. See #svn_cancel_func_t.
>  *
>  * The @a cancel_baton argument to svn_editor_create() is passed
>  * unchanged to each poll of @a cancel_func.
>  *
>  * The cancellation function and baton are typically provided by the client
>  * context.
> + *
> + * ### JAF: Bring the section on cancellation under the Life-Cycle heading
> + *     down to here and combine it.

*shrug*

>  *
>  *
>  * @todo ### TODO anything missing? -- allow text and prop change to follow
> - * a move or copy. -- set_text() vs. apply_text_delta()? -- If a
> + * a move or copy. -- set_text() vs. apply_text_delta()?
> + *
> + * ### JAF: Those two seem to be done already.

Yup.

> + *
> + *  -- If a
>  * set_props/set_text/set_target/copy/move/delete in a merge source is
>  * applied to a different branch, which side will REVISION arguments reflect
>  * and is there still a problem?
> + *
> + * ### JAF: The 'revision' argument will always refer to the 'before'
> + *     state of the source node that the editor is editing. The receiver
> + *     must know about this source node to make sense of the edit. If
> + *     the receiver is merging the source-node edits into some other
> + *     node on a different target branch, the editor knows nothing about
> + *     that other node (which may not even have a revision number if
> + *     it's a working version that's been modified already by some
> + *     previous merge or user intervention).

Right. svn_editor changes a tree from one state to another. "Merge"
doesn't come into play whatsoever. "Make the tree look like <X>".
Effectively, the driver computes the merge result, and then applies
that to the target tree via svn_editor.

> + *
> + * ### JAF: As well as the text checksum for files, consider adding an
> + *     (optional?) checksum for the full content of every node kind --
> + *     text with props, symlink with props, dir with list of children.

Huh? We've never talking about adding more checksums. I don't think
this is the place to start.

>...
>  svn_error_t *
>  svn_editor_add_directory(svn_editor_t *editor,
>                          const char *relpath,
>                          const apr_array_header_t *children,
>                          apr_hash_t *props,
>                          svn_revnum_t replaces_rev);
> +/* ### JAF: Let's keep the 'relpath' and 'replaces_rev' params together,
> + *     like they are in the delete/copy/move callbacks, because
> + *     logically they're tightly related. Also in other add_*(). */

*shrug*

>...
> + * For descriptions of @a props and @a replaces_rev, see
> + * svn_editor_add_file().

Sure.

>...
> @@ -772,18 +828,19 @@
>                        const char *target,
>                        apr_hash_t *props,
>                        svn_revnum_t replaces_rev);
>
>  /** Drive @a editor's #svn_editor_cb_add_absent_t callback.
>  *
>  * Create an "absent" node of kind @a kind at @a relpath. The immediate
>  * parent of @a relpath is expected to exist.
>  * ### TODO @todo explain "absent".
> + * ### JAF: What are the allowed values of 'kind'?

Not sure. My initial assumption was {dir, file, symlink}.

Do we need to be able to set server-excluded, and excluded? I believe
these latter two concepts are derived from the notion of "this $kind
node should be marked absent", and it just depends on who receives
such a notice. ie. the update receiver takes an absent marker as
"server-excluded".

>...
> @@ -794,18 +851,20 @@
>  *
>  * Set or change properties on the existing node at @a relpath.  This
>  * function sends *all* properties, both existing and changes.
>  * ### TODO @todo What is REVISION for?
>  * ### HKW: This is puzzling to me as well...
>  * ###
>  * ### what about "entry props"? will these still be handled via
>  * ### the general prop function?
>  *
> + * For a description of @a props, see svn_editor_add_file().

Sure.

>...
> @@ -815,18 +874,47 @@
>                      svn_revnum_t revision,
>                      apr_hash_t *props,
>                      svn_boolean_t complete);
>
>  /** Drive @a editor's #svn_editor_cb_set_text_t callback.
>  *
>  * Set/change the text content of a file at @a relpath to @a contents
>  * with checksum @a checksum.
>  * ### TODO @todo Does this send the *complete* content, always?
> + * ### JAF: I think the idea is that 'contents' is the new full text and,
> + *     if wanted, the driver and receiver implementations should diff and
> + *     patch (respectively) it against the old full text.

The CONTENTS param is always the new full text. Always.

If the receiver is an RA committing editor, then it MAY choose to
generate a diff to send to the server. But that is private to the
receiver. If the driver is an RA update driver, then it MAY need to
apply a diff received from the server against the (known) base text
producing the new full text.

No other formats should be used within the CONTENTS stream (e.g never
an svndiff).

> + * ### JAF: This may be inefficient for an implementation that already
> + *     has a diff available but doesn't have a full text available

Don't worry about this. In all cases that I identified, the base text
was available to apply a patch against to derive the desired new full
text.

If you can identify a counter-example, then we can discuss. But
otherwise, this isn't an issue.

>...
> + * ### JAF: For the driver: @a contents is a readable stream. The editor
> + *     implementation may pull text from it as required, and will then
> + *     close it before returning. The editor [### need not | will]
> + *     pull all the text from the stream before closing it.

Somewhere in the doc, it should say that the receiver MAY close the
stream before reading anything from it. It MAY choose to close the
stream before reading all text from it, but that is unlikely. The
typical scenario is that the receiver chooses to not use the contents,
or that it will use the entire contents.

Because receivers may close the stream before reading anything,
drivers may choose to implement a "lazy-open" stream. ie. only on the
first read, would the driver make the necessary calls to wc_db to get
the pristine stream.

Note: this lazy-open concept is in the Ev2 notes, but it looks like it
didn't make it to svn_editor.h.

> + *
> + * ### JAF: For the receiver: @a contents is a readable stream. The
> + *     receiver may pull text from it as required, and will then
> + *     close it before returning. The receiver [### need not | will]
> + *     pull all the text from the stream before closing it.

This is basically above. The editor framework will never touch the
stream, so we're only talking about the receiver's callback here.

> + * ### JAF: Is it permissible for the text change to be a no-op? The
> + *     driver may wish to avoid doing so for efficiency if it knows,
> + *     but I'd say we should not forbid it. Either way we should
> + *     consider how to be consistent in this regard across all the
> + *     whole editor.

It is permissible, but stupid. I would be fine with saying "not
allowed". Or at least making a statement like "set_text() SHOULD make
changes to the contents."

>...

Cheers,
-g