You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Philip Martin <ph...@wandisco.com> on 2014/03/28 20:52:59 UTC

Re: Review of lock-many API

Julian Foad <ju...@btopenworld.com> writes:

> Philip Martin wrote:
>> Julian Foad writes:
>>>>  URL: http://svn.apache.org/r1577280
>>>>  * subversion/include/svn_fs.h
>>>>    (svn_fs_lock_target_t, svn_fs_lock_result_t,
>>>>     svn_fs_lock2, svn_fs_unlock2): new.
>>>> 
>>>>  * subversion/include/svn_repos.h
>>>>    (svn_repos_fs_lock2, svn_repos_fs_unlock2): new.
>>> 
>>>  Do we intend to deprecate the old versions? If not, then it would be
>>>  better to name the new functions something like ...lock_many() instead
>>>  of ...lock2().
>> 
>> I'm unsure.  In some ways it is convenient to have a single path
>> function as it avoids the need to construct hashes and the error
>> handling is simpler and less prone to leak.  On the other hand when
>> svn_client_copy4 introduced multi-path copy source the single path
>> version was deprecated.
>
> Thoughts...
>
> A low level API such as libsvn_fs should aim to be clean and
> efficient, more so than to provide functions that are convenient for
> the casual user. (A high level API such as libsvn_client or the
> bindings is a more appropriate place to provide convenience
> functions.) We do have a number of APIs already, in several libraries,
> where the caller is expected to pass in a list of targets even if they
> only want to operate on one. Maybe in general the thing to aim for is
> to make it easy for a caller to do so.
>
> We could provide a singleton constructor for the
> hash-of-svn_fs_lock_target_t argument. Should we? Probably not at the
> libsvn_fs level; maybe at a high level.

I'l look at doing that.

> I started reading further through the new code. Here are some more
> review comments.
>
> The 'comment' parameter to svn_fs_lock2 should be a member of the
> 'target' struct, since it is inherently a per-target attribute. Of
> course there are common scenarios where a client wants to lock a bunch
> of paths all with the same comment, but the benefits of a lock-many
> API should also be made available to clients which supply different
> comments. This applies all the way up the call stack. However, I note
> that from the RA layer upwards we have had lock-many APIs since v1.2
> which take a single comment for all the paths, so changing to
> per-target comments throughout the stack is perhaps beyond the scope
> of this issue. We could still do it at the FS and repos layers now in
> preparation; I don't see that it would add significant overhead in
> run-time or in maintenance.

I think per-lock comments in a multi-path lock are unneccesary.  I don't
think anybody locking dozens/hundreds of files will bother defining
different comments.  Anyone who wants two or three locks with distinct
comments can call the function multiple times.

> In principle, the 'is_dav_comment' and 'expiration_date' parameters
> should similarly be per target, but that makes less sense in practice
> as they're only used for generic DAV clients via mod_dav_svn. As an
> alternative, we might consider dropping them from this API and keeping
> the original single-lock API (not deprecated) to cater for such
> locks. RA-local and svnserve and 'svnadmin lock' don't support
> expiration and always pass is_dav_comment=FALSE. Only mod_dav_svn
> supplies these two options, and it currently only uses the old
> one-lock-at-a-time API anyway.

Eventually we will define a new request and mod_dav_svn will start
calling the multiple path version.  It will still handle the current
LOCK request but may use the multi-path function to do it.

> We should provide a destructor for the hash-of-svn_fs_lock_result_t
> results, which contains errors that need to be cleared. svnserve's
> lock_many() already contains such code twice. It's simple code --
> around 5 lines -- but if we're demanding that callers ensure they do
> it, it's nice to provide a ready-made way for them to do it. This
> would also help in future-proofing against us revising the API in
> later releases.

I have a look at that.

> The semantics relating to returning one error per path and an overall
> error needs to be fully documented and/or changed. For example,
> svn_repos_fs_lock() assumes that svn_repos_fs_lock2() has set *results
> to a valid hash if it returns any overall error, while svn_fs_lock()
> assumes that svn_fs_lock2() WON'T have put any error in *results if it
> returns an error overall. These look at least surprising.
>

That's an error in libsvn_fs, libsvn_repos does it right.  Fixed in
r1582845.

> +  /* [JAF] Is that a question to reviewers? It depends. What errors can
> +           svn_fs_lock2 return, and do any of them justify not running the
> +           post-lock hook? Can it even return an error and also create
> +           locks? Its doc string needs to say. */
>    if (err)
>      return svn_error_trace(err);

It's almost always possible for fs functions to return arbitrary errors
due to filesystem permissions, lack of disk space etc.  This could
happen part way through creating locks.  I'm not sure whether we should
attempt to run the post-hook.


> Index: subversion/svnserve/serve.c
> ===================================================================
> --- subversion/svnserve/serve.c    (revision 1582225)
> +++ subversion/svnserve/serve.c    (working copy)
> @@ -2733,7 +2733,8 @@ static svn_error_t *lock_many(svn_ra_svn
>       an error. */
>    SVN_ERR(must_have_access(conn, pool, b, svn_authz_write, NULL, TRUE));
>
> -  /* Loop through the lock requests. */
> +  /* Loop through the lock requests
> +     ### and do what? */

I just copied that comment from the old code. I probably should have
removed it.

>    for (i = 0; i < path_revs->nelts; ++i)
>      {
>        const char *path, *full_path;
> @@ -2759,6 +2760,8 @@ static svn_error_t *lock_many(svn_ra_svn
>        target->current_rev = current_rev;
>
>        /* We could check for duplicate paths and reject the request? */
> +      /* ### [JAF] Is that a question to reviewers? We could, but I
> +         don't think it's useful to do so. */

I brought up the handling of canonical paths on the dev list, it wasn't
conclusive from my point of view so I just implemented something.  Given
that FS generally accepts non-canonical paths it is hard to know what to
do.

>        svn_hash_sets(targets, full_path, target);
>      }
>
> @@ -2767,6 +2770,11 @@ static svn_error_t *lock_many(svn_ra_svn
>
>    /* From here on we need to make sure any errors in authz_results, or
>       results, are cleared before returning from this function. */
> +
> +  /* Check authz access for each target, because ...
> +     ### Why? We don't want svn_repos_fs_lock2() to do this for us ...?
> +         Or, we want to log such errors and it's easier to do so before
> +         than afterwards? */

That's the way servers/repos functions work.

>    for (hi = apr_hash_first(pool, targets); hi; hi = apr_hash_next(hi))
>      {
>        const char *full_path = svn__apr_hash_index_key(hi);
> @@ -2791,7 +2799,8 @@ static svn_error_t *lock_many(svn_ra_svn
>                             0, /* No expiration time. */
>                             steal_lock, pool, subpool);
>
> -  /* The client expects results in the same order as paths were supplied. */
> +  /* Send the results. The client expects results in the same order as
> +     paths were supplied. */
>    for (i = 0; i < path_revs->nelts; ++i)
>      {
>        const char *path, *full_path;
> @@ -2816,6 +2825,16 @@ static svn_error_t *lock_many(svn_ra_svn
>          result = svn_hash_gets(authz_results, full_path);
>        if (!result)
>          /* No result?  Should we return some sort of placeholder error? */
> +        /* ### [JAF] Is that a question to reviewers? Certainly something
> +           has gone wrong. Maybe svn_repos_fs_lock2 returned an error
> +           overall, having processed none or only some paths -- is it
> +           allowed to do so?

It could run out of disk space half way through.

> Breaking here would signal to the client that
> +           something went wrong, because they'll see a too-short response
> +           list, but would also potentially hide information about further
> +           targets that were processed, some of which perhaps were locked
> +           successfully. (Note: we're scanning them here in a different
> +           order from the order in which svn_repos_fs_lock2() processed
> +           them.) */
>          break;
>
>        if (result->err)


-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Review of lock-many API

Posted by Julian Foad <ju...@btopenworld.com>.

Hi Philip.

Philip Martin wrote:

> Julian Foad writes:
>>  Index: subversion/svnserve/serve.c
>>  ===================================================================

>>  -  /* The client expects results in the same order as paths were supplied. */
>>  +  /* Send the results. The client expects results in the same order as
>>  +     paths were supplied. */
>>     for (i = 0; i < path_revs->nelts; ++i)
>>       {
>>         const char *path, *full_path;
>>  @@ -2816,6 +2825,16 @@ static svn_error_t *lock_many(svn_ra_svn
>>           result = svn_hash_gets(authz_results, full_path);
>>         if (!result)
>>           /* No result?  Should we return some sort of placeholder error? */
>>  +        /* ### [JAF] Is that a question to reviewers? Certainly something
>>  +           has gone wrong. Maybe svn_repos_fs_lock2 returned an error
>>  +           overall, having processed none or only some paths -- is it
>>  +           allowed to do so?
> 
> It could run out of disk space half way through.

AFAICT, if I'm reading this right, then a valid reason for not finding a result here is if svn_repos_fs_lock_many returned an error overall, having locked some but not all of the paths. Doesn't really matter why.

Breaking here seems wrong. We're processing the targets in this loop in a different order from the hash order that svn_repos_fs_lock_many() was using to lock them. Therefore we are potentially stopping before we have given responses to all the paths that were successfully locked. We would stop the response list at this point, and terminate it with a successful "done" keyword, but the response list would have too few elements. This seems arbitrary and broken.

I can think of two alternatives. We could just not generate a response list in the case where svn_repos_fs_lock_many returns an error, but simply report the error. That's effectively no worse than the current behaviour. Or we could insert an error response or some sort of place-holder entry in the list for this target, and then continue to the next target, to give a full response.

Does that make sense and would either of those options work?

- Julian

>>                          Breaking here would signal to the client that
>>  +           something went wrong, because they'll see a too-short response
>>  +           list, but would also potentially hide information about further
>>  +           targets that were processed, some of which perhaps were locked
>>  +           successfully. (Note: we're scanning them here in a different
>>  +           order from the order in which svn_repos_fs_lock2() processed
>>  +           them.) */
>>           break;

Re: Review of lock-many API

Posted by Julian Foad <ju...@btopenworld.com>.

Philip Martin wrote:
> Julian Foad <ju...@btopenworld.com> writes:
>> Imagine the caller is an svn GUI that has an offline mode in which the
>> user can queue up lock requests, and the software will send them to
>> the server when it's back on line. Does the GUI's back end need to run
>> through the queued lock requests and re-group them into batches such
>> that all the lock requests in a batch have the same comment?
>> 
>> Or imagine we want to teach 'svnsync' to synchronize locks: it reads
>> all the locks in the source repo and writes them to the target
>> repo. Do we want to force this software to re-group the locks into
>> batches or write them one at a time? Wouldn't it be better if it can
>> just say "Here is a bunch of locks; please write them"?
>> 
>> The principle that makes sense to me is, if the semantics of the data
>> objects being written is one comment per lock, then the API should
>> match that, and not restrict the caller unnecessarily.
> 
> Those examples rely on a new RA API that we don't have.  Making the FS
> API support more than the RA API might future-proof the FS API but it
> might also be a dead end that is never used.  We don't know exactly how
> lock replication will work and whether it will require changes to the
> storage backend, but if it does require backend changes then any FS API
> we create today may not be sufficient.
> 
> Lock usernames are another problem. At present the username is passed
> via an svn_fs_access_t and stored in each lock but there is no mechanism
> to pass per-lock usernames and no mechanism to modify usernames on
> locks.  Should we pass per-lock usernames with the rest of the lock
> data, or should we introduce per-lock usernames in svn_fs_access_t?  The
> first is contrary to all other FS APIs, the second is a totally new API
> that nobody yet needs.  I don't know the answer and since we don't need
> this feature yet I prefer not to make a decision at this time.

Ack. Okay.

> I have made svn_fs_lock_target_t opaque in the public API which should
> make it easier to add new fields in future.

OK.

Thanks.

- Julian

Re: Review of lock-many API

Posted by Philip Martin <ph...@wandisco.com>.

Julian Foad <ju...@btopenworld.com> writes:

> Imagine the caller is an svn GUI that has an offline mode in which the
> user can queue up lock requests, and the software will send them to
> the server when it's back on line. Does the GUI's back end need to run
> through the queued lock requests and re-group them into batches such
> that all the lock requests in a batch have the same comment?
>
> Or imagine we want to teach 'svnsync' to synchronize locks: it reads
> all the locks in the source repo and writes them to the target
> repo. Do we want to force this software to re-group the locks into
> batches or write them one at a time? Wouldn't it be better if it can
> just say "Here is a bunch of locks; please write them"?
>
> The principle that makes sense to me is, if the semantics of the data
> objects being written is one comment per lock, then the API should
> match that, and not restrict the caller unnecessarily.

Those examples rely on a new RA API that we don't have.  Making the FS
API support more than the RA API might future-proof the FS API but it
might also be a dead end that is never used.  We don't know exactly how
lock replication will work and whether it will require changes to the
storage backend, but if it does require backend changes then any FS API
we create today may not be sufficient.

Lock usernames are another problem. At present the username is passed
via an svn_fs_access_t and stored in each lock but there is no mechanism
to pass per-lock usernames and no mechanism to modify usernames on
locks.  Should we pass per-lock usernames with the rest of the lock
data, or should we introduce per-lock usernames in svn_fs_access_t?  The
first is contrary to all other FS APIs, the second is a totally new API
that nobody yet needs.  I don't know the answer and since we don't need
this feature yet I prefer not to make a decision at this time.

I have made svn_fs_lock_target_t opaque in the public API which should
make it easier to add new fields in future.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Review of lock-many API

Posted by Julian Foad <ju...@btopenworld.com>.

Hi Philip. Thanks for the responses.

(I see you've changed the API implementation to a callback pattern now.)

Philip Martin wrote:
> Julian Foad <ju...@btopenworld.com> writes:
>> The 'comment' parameter to svn_fs_lock2 should be a member of the
>> 'target' struct, since it is inherently a per-target attribute. Of
>> course there are common scenarios where a client wants to lock a bunch
>> of paths all with the same comment, but the benefits of a lock-many
>> API should also be made available to clients which supply different
>> comments. This applies all the way up the call stack. However, I note
>> that from the RA layer upwards we have had lock-many APIs since v1.2
>> which take a single comment for all the paths, so changing to
>> per-target comments throughout the stack is perhaps beyond the scope
>> of this issue. We could still do it at the FS and repos layers now in
>> preparation; I don't see that it would add significant overhead in
>> run-time or in maintenance.
> 
> I think per-lock comments in a multi-path lock are unneccesary.  I don't
> think anybody locking dozens/hundreds of files will bother defining
> different comments.  Anyone who wants two or three locks with distinct
> comments can call the function multiple times.

For a high level client API, I agree with you. But the same argument doesn't make sense for the low level APIs. Here, it's not a matter of what *people* are likely to do, but another layer of *software*.

Imagine the caller is an svn GUI that has an offline mode in which the user can queue up lock requests, and the software will send them to the server when it's back on line. Does the GUI's back end need to run through the queued lock requests and re-group them into batches such that all the lock requests in a batch have the same comment?

Or imagine we want to teach 'svnsync' to synchronize locks: it reads all the locks in the source repo and writes them to the target repo. Do we want to force this software to re-group the locks into batches or write them one at a time? Wouldn't it be better if it can just say "Here is a bunch of locks; please write them"?

The principle that makes sense to me is, if the semantics of the data objects being written is one comment per lock, then the API should match that, and not restrict the caller unnecessarily.

>> In principle, the 'is_dav_comment' and 'expiration_date' parameters
>> should similarly be per target, but that makes less sense in practice
>> as they're only used for generic DAV clients via mod_dav_svn. As an
>> alternative, we might consider dropping them from this API and keeping
>> the original single-lock API (not deprecated) to cater for such
>> locks. RA-local and svnserve and 'svnadmin lock' don't support
>> expiration and always pass is_dav_comment=FALSE. Only mod_dav_svn
>> supplies these two options, and it currently only uses the old
>> one-lock-at-a-time API anyway.
> 
> Eventually we will define a new request and mod_dav_svn will start
> calling the multiple path version.  It will still handle the current
> LOCK request but may use the multi-path function to do it.

Eventually we will define a new request and mod_dav_svn MAY start calling the multiple path version of the API, but it will never need to set multiple locks at once, because mod_dav_svn talks to DAV via the dav_hooks_locks vtable which only deals with one path at a time. Look at the calls to svn_repos_fs_lock() in mod_dav_svn/lock.c. (A slight confusion is that the vtable methods support multiple locks per path; but that's irrelevant as our implementation doesn't.)

So it could still just as efficiently use the old single-path version of the API, if we support that, couldn't it?

[...]
>>  Index: subversion/svnserve/serve.c
>>  ===================================================================
>>  --- subversion/svnserve/serve.c    (revision 1582225)
>>  +++ subversion/svnserve/serve.c    (working copy)
>>  @@ -2733,7 +2733,8 @@ static svn_error_t *lock_many(svn_ra_svn
>>        an error. */
>>     SVN_ERR(must_have_access(conn, pool, b, svn_authz_write, NULL, TRUE));
>> 
>>  -  /* Loop through the lock requests. */
>>  +  /* Loop through the lock requests
>>  +     ### and do what? */
> 
> I just copied that comment from the old code. I probably should have
> removed it.

Well, I find it very useful to have a comment at the top of a non-trivial block of code summarizing what it's supposed to be doing. How about

    /* Parse PATH_REVS into TARGETS. */

here?

>>     for (i = 0; i < path_revs->nelts; ++i)
>>       {
>>         const char *path, *full_path;
>>  @@ -2759,6 +2760,8 @@ static svn_error_t *lock_many(svn_ra_svn
>>         target->current_rev = current_rev;
>> 
>>         /* We could check for duplicate paths and reject the request? */
>>  +      /* ### [JAF] Is that a question to reviewers? We could, but I
>>  +         don't think it's useful to do so. */
> 
> I brought up the handling of canonical paths on the dev list, it wasn't
> conclusive from my point of view so I just implemented something.  Given
> that FS generally accepts non-canonical paths it is hard to know what to
> do.

Agreed. Let's just update the comment here to say we decided not to?

Please could you mark comments with '###' or similar when they are questions to reviewers about the code or design being unfinished or questionable, etc.

- Julian