You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2012/07/25 00:50:24 UTC

Re: svn commit: r1365324 - reparenting an RA session each time it's used

Bert pointed out on IRC that changes like this one (and I have been making other similar changes for some time) could potentially have an adverse performance effect in some cases, because reparenting an RA session is not free.  Our IRC chat [1]:


Bert julianf: I hope you can get rid of those two extra ra sessions in your improvements (re: r1365182) 
julianf Bert: Sure. 
julianf Doing RA session rationalization right now, in fact, for other functions in the merge code, so that I can do it there. 
julianf Many merge functions were taking two different sessions just because they 
relied on the session's current 'parent' URL matching the URL to be 
operated on. By making all the functions not care where the session is 
parented, most functions can work with just one session. 
julianf (Only functions where the two sessions might be opened to two different 
repositories will still need two -- and there are not many such 
functions because we don't support diffing between two repos. If we ever do in the future, I'm confident the changes we'll need will not just be reverting this.) 
julianf There are some functions where the merge source and the merge target can be 
in different repos. But most of those functions don't take a session for the target anyway. 
Bert julianf: Reparenting an ra session is very cheap compared to creating a new ra 
session, but in our current ra implementations it might use network io 
(e.g. in ra_svn) and it uses a bit of memory in the session pool. So if 
we really need two locations in some place during merge it might help to keep using two throughout the merge code. 
julianf We discussed this a bit on the list some weeks or months back and it's 
more forward-thinking (towards a time when we 'cache' and re-use 
sessions more globally) and cleaner overall to re-use a single session. 
Bert (We should be able to resolve the leak for all reasonably modern servers, 
but very old ra_svn servers don't allow reparenting and we just reopen 
an ra session) 
julianf Another way of looking at it is it's moving towards the design model where a 
session isn't "parented" at an interesting URL but always at the root of the repo and callers always specify a root-relative path. 
julianf Also note that the low-level reparenting functions optimize out that case where the URL isn't actually changing. 
Bert *nod* But if you hop between two urls its always changing. 
Bert I'm not sure if we do, but by switching from 2 to 1 that might make a difference. 
julianf Well, the main big deal is that simplifying the merge code is far more 
significant than an extra round-trip or ten when talking to an old 
ra_svn server or any other minor time penalties. 
Bert For every ra server it is a few roundtrips (as reparenting is forwarded to 
the server). For very old servers its adding another connection (and if I remember correctly not even closing the old one). 
Bert s/For every ra server/For every ra_svn server/ (for ra_serf it is a local operation) 
Bert For those old versions it might even be typing your password again. 
julianf Reparenting is widespread in our code already. 
julianf If it's that big a problem, wouldn't we know about it? 
Bert julianf: Did we know about those slow sqlite queries that did a table scan? 
Bert I'm not sure if it is a big problem, but switching from 2 to 1 ra session 
might double the reparentings. And that *might* make a difference for 
some use-cases. 
Bert I'm not sure and I doubt anyone knows. 
julianf OK, please raise it on dev@. 
Bert In 1.6 we just opened 4-6 ra sessions in a common merge, probably more. 
julianf I'll try to find the thread where we last discussed it. 
Bert I just tried to add that a reparent is not a 'free' operation... It is 
still an expensive operation compared to many other operations... But 
compared to opening a new ra session it is still at least 10 times 
faster. 
Bert Just like every wc_db operation by itself is cheap, but adding all of them 
they still slow down some common operations that operate per node. 
julianf OK, I know that. Do you think we need to discuss on dev@ and try to find 
the scenarios where it might be slower than today and measure it? 
Bert And maybe during merge most urls do match in both sessions so they are optimized away in most cases. I haven't tested that. 
julianf Tell you what... I#ll email dev@. 

My gut feeling is this isn't going to be significant, but does anyone have any concern here or think we should try to figure out what the worst cases could be and test or measure them?
- Julian

[1] <http://colabti.org/irclogger/irclogger_log/svn-dev?date=2012-07-24#l180>.

julianfoad@apache.org wrote:

> * subversion/libsvn_client/merge.c
>   (find_unsynced_ranges): Don't care where the RA session is parented. This
>     will make it easy for the caller to use any session that is open to the
>     right repository.

> +      const char *old_session_url;
> +
> +      SVN_ERR(svn_client__ensure_ra_session_url(
> +                &old_session_url, ra_session, target_loc->url, scratch_pool));

>        [...] find_unsynced_ranges(const svn_client__p
>                        TRUE, /* discover_changed_paths */
>                        log_find_operative_revs, &log_baton,
>                        scratch_pool));
> +
> +      SVN_ERR(svn_ra_reparent(ra_session, old_session_url, scratch_pool));
>      }

Re: svn commit: r1365324 - reparenting an RA session each time it's used

Posted by Greg Stein <gs...@gmail.com>.
On Jul 24, 2012 9:32 PM, "C. Michael Pilato" <cm...@collab.net> wrote:
>...
> For ra_serf:  I was right again.  It's just a string operation when the
> repos root URL is known.  But I overestimated how common it was for this
> value to be known.  All HTTPv2 servers transmit the repos root URL when
the
> session is initialized via the OPTIONS command.  For older servers, the
> repos root URL is calculated and cached opportunistically as part of other
> operations, but isn't strictly guaranteed to have been determined before
any
> particular reparent operation occurs.  And when discovered in this
fashion,
> it's done using the very sort of PROPFIND dance that HTTPv2 was introduced
> to eliminate.  So yeah, potentially pretty costly.

But that dance *must* occur at least once for old servers. Whether it
happens at reparent, or some other operation... as long as the result is
saved, then there is zero *additional* cost to a reparent in ra_serf.

Cheers,
-g

Re: svn commit: r1365324 - reparenting an RA session each time it's used

Posted by Julian Foad <ju...@btopenworld.com>.
I (Julian Foad) wrote on 25 July 2012:
> I can't see the session URL semantics documented where I'm looking for 

> it (in svn_ra.h).  Would the following doc string update for ra_open4() be an 
> improvement?

Committed in r1367794.  This doesn't touch on the authz implications which is the thing I most need to learn about, but just documents the basic existence of the session URL.

- Julian


> Index: subversion/include/svn_ra.h
> ===================================================================
> --- subversion/include/svn_ra.h    (revision 1365585)
> +++ subversion/include/svn_ra.h    (working copy)
> @@ -601,39 +601,52 @@
>  /**
>   * Open a repository access session to the repository at @a repos_URL,
>   * or inform the caller regarding a correct URL by which to access
>   * that repository.
>   *
>   * If @a repos_URL can be used successfully to access the repository,
>   * set @a *session_p to an opaque object representing a repository
>   * session for the repository and (if @a corrected_url is non-NULL)
>   * set @a *corrected_url to NULL.  If there's a better URL that the
>   * caller should try and @a corrected_url is non-NULL, set
>   * @a *session_p to NULL and @a *corrected_url to the corrected URL.  If
>   * there's a better URL that the caller should try, and @a
>   * corrected_url is NULL, return an #SVN_ERR_RA_SESSION_URL_MISMATCH
>   * error.  Allocate all returned items in @a pool.
>   *
> + * The @a repos_URL need not point to the root of the repository: subject
> + * to authorization, it may point to any path within the repository, even
> + * a path at which no node exists in the repository.  The session will
> + * remember this URL as its "session URL" (also called "session 
> root URL"),
> + * until changed by svn_ra_reparent().  Many RA functions take or return
> + * paths that are relative to the session URL.
> + *
> + * If a @a corrected_url is returned, it will point to the same path
> + * within the new repository root URL that @a repos_URL pointed to within
> + * the old repository root URL.
> + *
>   * Return @c SVN_ERR_RA_UUID_MISMATCH if @a uuid is non-NULL and not equal
>   * to the UUID of the repository at @c repos_URL.
>   *
>   * @a callbacks/@a callback_baton is a table of callbacks provided by the
>   * client; see @c svn_ra_callbacks2_t.
>   *
>   * @a config is a hash mapping <tt>const char *</tt> keys to
>   * @c svn_config_t * values.  For example, the @c svn_config_t for the
>   * "~/.subversion/config" file is under the key "config".
>   *
>   * All RA requests require a session; they will continue to
>   * use @a pool for memory allocation.
>   *
>   * @see svn_client_open_ra_session().
>   *
>   * @since New in 1.7.
>   */
>  svn_error_t *
>  svn_ra_open4(svn_ra_session_t **session_p,
>               const char **corrected_url,
>               const char *repos_URL,
>               const char *uuid,
>               const svn_ra_callbacks2_t *callbacks,
>               void *callback_baton,
> 

Re: svn commit: r1365324 - reparenting an RA session each time it's used

Posted by Julian Foad <ju...@btopenworld.com>.
I (Julian Foad) wrote:
> I (Julian Foad) wrote:
> Using the attached 'reparenting-monitor.patch' and running 
> merge_reintegrate_tests-10, I found that each of the merge 
> commands executed in that test performs between 2 and 50 reparentings.
> 
> DBG: ... sessions:  3, reparentings:  4 (+ no-ops  9)
> DBG: ... sessions:  3, reparentings:  4 (+ no-ops  7)
> DBG: ... sessions:  4, reparentings: 12 (+ no-ops 11)
> DBG: ... sessions:  6, reparentings: 22 (+ no-ops 19)
> DBG: ... sessions:  8, reparentings: 28 (+ no-ops 25)
> DBG: ... sessions:  2, reparentings:  2 (+ no-ops  7)
> DBG: ... sessions:  3, reparentings: 12 (+ no-ops 12)
> DBG: ... sessions:  3, reparentings:  8 (+ no-ops 10)
> DBG: ... sessions:  6, reparentings: 28 (+ no-ops 19)
> DBG: ... sessions:  8, reparentings: 36 (+ no-ops 25)

And I meant to say, the numbers increased when I applied the attached 'reparenting-one-session-1.patch' which uses a single session instead of two sessions in some of the merge code:

DBG: ... sessions:  3, reparentings:  4 (+ no-ops  9)
DBG: ... sessions:  3, reparentings:  4 (+ no-ops  7)
DBG: ... sessions:  4, reparentings: 12 (+ no-ops 11)
DBG: ... sessions:  6, reparentings: 30 (+ no-ops 15)
DBG: ... sessions:  8, reparentings: 36 (+ no-ops 21)
DBG: ... sessions:  2, reparentings:  2 (+ no-ops  7)
DBG: ... sessions:  3, reparentings: 12 (+ no-ops 12)
DBG: ... sessions:  3, reparentings:  8 (+ no-ops 10)
DBG: ... sessions:  6, reparentings: 36 (+ no-ops 15)
DBG: ... sessions:  8, reparentings: 44 (+ no-ops 21)

So I won't make changes like that.  In fact, I might do the reverse: look for places where it's already flip-flopping between the source and target trees and would benefit from being given two separate sessions.  I don't know yet if such places are due to changes I made in the recent-ish past; that's possible.

- Julian


[...]
> 
> So it seems it is wise to keep two main RA sessions -- one for the merge source 
> tree and one for the merge target tree (in the repo).
> 
> But what about when we examine paths in the history of the merge source (or 
> target) tree if the source (or target) has been renamed -- we follow the rename, 
> but that means we'll be asking about paths that are no longer within either 
> of those trees. (The 'session root' concept is strictly by path, not by 
> node object).

Re: svn commit: r1365324 - reparenting an RA session each time it's used

Posted by Julian Foad <ju...@btopenworld.com>.
I (Julian Foad) wrote:
> Julian Foad wrote:
>> Daniel Shahaf wrote:
>>> So... should we revv ra_svn so 1.8 clients/servers can talk to each
>>> other exclusively in repos-root-relative paths?
>> 
>> That sounds good to me, but I don't understand the authz impact.
>> 
>> In much of the merge code, it is convenient to carry around just one RA
>> session and reparent it as necessary to address (parts of) the source
>> and target trees in the repository.  I have been changing more of the
>> code to work that way, and would like to continue.
>> 
>> If we can alleviate any ra_svn performance impact by adjusting RA_svn,
>> that would seem to be best.

Using the attached 'reparenting-monitor.patch' and running merge_reintegrate_tests-10, I found that each of the merge 
commands executed in that test performs between 2 and 50 reparentings.

DBG: ... sessions:  3, reparentings:  4 (+ no-ops  9)
DBG: ... sessions:  3, reparentings:  4 (+ no-ops  7)
DBG: ... sessions:  4, reparentings: 12 (+ no-ops 11)
DBG: ... sessions:  6, reparentings: 22 (+ no-ops 19)
DBG: ... sessions:  8, reparentings: 28 (+ no-ops 25)
DBG: ... sessions:  2, reparentings:  2 (+ no-ops  7)
DBG: ... sessions:  3, reparentings: 12 (+ no-ops 12)
DBG: ... sessions:  3, reparentings:  8 (+ no-ops 10)
DBG: ... sessions:  6, reparentings: 28 (+ no-ops 19)
DBG: ... sessions:  8, reparentings: 36 (+ no-ops 25)

> To reduce the load on ra_sv, there's also a coding style change I could 
> make.  In relatively low-level merge functions, instead of unconditionally 
> reparenting (temporarily) I could reparent only if the requested URL is
> not a child of the current session URL.

The numbers did not reduce when I taught some of the code to only reparent if the new URL is not inside the old URL, because nearly all of the reparentings are flip-flopping between the merge source tree and the merge target tree.  That's the worst-case scenario that Bert mentioned.

So it seems it is wise to keep two main RA sessions -- one for the merge source tree and one for the merge target tree (in the repo).

But what about when we examine paths in the history of the merge source (or target) tree if the source (or target) has been renamed -- we follow the rename, but that means we'll be asking about paths that are no longer within either of those trees. (The 'session root' concept is strictly by path, not by node object).

- Julian

Re: svn commit: r1365324 - reparenting an RA session each time it's used

Posted by Julian Foad <ju...@btopenworld.com>.
Julian Foad wrote:

> Daniel Shahaf wrote:
>> C. Michael Pilato wrote on Tue, Jul 24, 2012 at 21:31:49 -0400:
>>>  For ra_svn:  I was totally wrong.  This thing always requires network
>>>  activity: a "reparent" command/response at best; at worst, the complete
>>>  teardown and re-opening of the session.  This is just a side-effect of the
>>>  stateful protocol.  Unlike with HTTP, the server here is privy to the
>>>  "session URL" concept -- clients only perform operations using relpaths
>>>  against that URL -- and so the server must be told when that has changed.
>> 
>> So... should we revv ra_svn so 1.8 clients/servers can talk to each
>> other exclusively in repos-root-relative paths?
> 
> That sounds good to me, but I don't understand the authz impact.
> 
> In much of the merge code, it is convenient to carry around just one RA session 
> and reparent it as necessary to address (parts of) the source and target trees 
> in the repository.  I have been changing more of the code to work that way, and 
> would like to continue.
> 
> If we can alleviate any ra_svn performance impact by adjusting RA_svn, that 
> would seem to be best.

To reduce the load on ra_sv, there's also a coding style change I could make.  In relatively low-level merge functions, instead of unconditionally reparenting (temporarily) I could reparent only if the requested URL is not a child of the current session URL.

Instead of

  fetch_foo(url, session)
  {
    old_url = svn_ra_reparent(url)
    result = foo(relpath="", session)
    svn_ra_reparent(old_url)
  }

change to

  fetch_foo(url, session)
  {
    session_url = svn_ra_get_session_url(session)
    old_url = NULL

    if (! (relpath = svn_url_skip_ancestor(session_url, url)))
      {
        old_url = svn_ra_reparent(url)
        relpath = ""
      }
    result = foo(relpath, session)
    svn_ra_reparent(old_url)
  }

- Julian

Re: svn commit: r1365324 - reparenting an RA session each time it's used

Posted by Julian Foad <ju...@btopenworld.com>.
Daniel Shahaf wrote:

>C. Michael Pilato wrote on Tue, Jul 24, 2012 at 21:31:49 -0400:
>> For ra_svn:  I was totally wrong.  This thing always requires network
>> activity: a "reparent" command/response at best; at worst, the complete
>> teardown and re-opening of the session.  This is just a side-effect of the
>> stateful protocol.  Unlike with HTTP, the server here is privy to the
>> "session URL" concept -- clients only perform operations using relpaths
>> against that URL -- and so the server must be told when that has changed.
>
>So... should we revv ra_svn so 1.8 clients/servers can talk to each
>other exclusively in repos-root-relative paths?

That sounds good to me, but I don't understand the authz impact.

In much of the merge code, it is convenient to carry around just one RA session and reparent it as necessary to address (parts of) the source and target trees in the repository.  I have been changing more of the code to work that way, and would like to continue.


If we can alleviate any ra_svn performance impact by adjusting RA_svn, that would seem to be best.

I can't see the session URL semantics documented where I'm looking for it (in svn_ra.h).  Would the following doc string update for ra_open4() be an improvement?

- Julian


Index: subversion/include/svn_ra.h
===================================================================
--- subversion/include/svn_ra.h    (revision 1365585)
+++ subversion/include/svn_ra.h    (working copy)
@@ -601,39 +601,52 @@
 /**
  * Open a repository access session to the repository at @a repos_URL,
  * or inform the caller regarding a correct URL by which to access
  * that repository.
  *
  * If @a repos_URL can be used successfully to access the repository,
  * set @a *session_p to an opaque object representing a repository
  * session for the repository and (if @a corrected_url is non-NULL)
  * set @a *corrected_url to NULL.  If there's a better URL that the
  * caller should try and @a corrected_url is non-NULL, set
  * @a *session_p to NULL and @a *corrected_url to the corrected URL.  If
  * there's a better URL that the caller should try, and @a
  * corrected_url is NULL, return an #SVN_ERR_RA_SESSION_URL_MISMATCH
  * error.  Allocate all returned items in @a pool.
  *
+ * The @a repos_URL need not point to the root of the repository: subject
+ * to authorization, it may point to any path within the repository, even
+ * a path at which no node exists in the repository.  The session will
+ * remember this URL as its "session URL" (also called "session root URL"),
+ * until changed by svn_ra_reparent().  Many RA functions take or return
+ * paths that are relative to the session URL.
+ *
+ * If a @a corrected_url is returned, it will point to the same path
+ * within the new repository root URL that @a repos_URL pointed to within
+ * the old repository root URL.
+ *
  * Return @c SVN_ERR_RA_UUID_MISMATCH if @a uuid is non-NULL and not equal
  * to the UUID of the repository at @c repos_URL.
  *
  * @a callbacks/@a callback_baton is a table of callbacks provided by the
  * client; see @c svn_ra_callbacks2_t.
  *
  * @a config is a hash mapping <tt>const char *</tt> keys to
  * @c svn_config_t * values.  For example, the @c svn_config_t for the
  * "~/.subversion/config" file is under the key "config".
  *
  * All RA requests require a session; they will continue to
  * use @a pool for memory allocation.
  *
  * @see svn_client_open_ra_session().
  *
  * @since New in 1.7.
  */
 svn_error_t *
 svn_ra_open4(svn_ra_session_t **session_p,
              const char **corrected_url,
              const char *repos_URL,
              const char *uuid,
              const svn_ra_callbacks2_t *callbacks,
              void *callback_baton,

Re: svn commit: r1365324 - reparenting an RA session each time it's used

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
C. Michael Pilato wrote on Tue, Jul 24, 2012 at 21:31:49 -0400:
> For ra_svn:  I was totally wrong.  This thing always requires network
> activity: a "reparent" command/response at best; at worst, the complete
> teardown and re-opening of the session.  This is just a side-effect of the
> stateful protocol.  Unlike with HTTP, the server here is privy to the
> "session URL" concept -- clients only perform operations using relpaths
> against that URL -- and so the server must be told when that has changed.

So... should we revv ra_svn so 1.8 clients/servers can talk to each
other exclusively in repos-root-relative paths?

Re: svn commit: r1365324 - reparenting an RA session each time it's used

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 07/24/2012 06:50 PM, Julian Foad wrote:
> Bert 	I just tried to add that a reparent is not a 'free' operation... It is
> still an expensive operation compared to many other operations... But
> compared to opening a new ra session it is still at least 10 times faster.

I was curious about just how expensive reparent was.  I naively expected it
was a simple string operation (assuming the repos root URL was cached in the
session baton).  For others who were wondering, here's what I found.

For ra_local:  I was right.  String math only.

For ra_serf:  I was right again.  It's just a string operation when the
repos root URL is known.  But I overestimated how common it was for this
value to be known.  All HTTPv2 servers transmit the repos root URL when the
session is initialized via the OPTIONS command.  For older servers, the
repos root URL is calculated and cached opportunistically as part of other
operations, but isn't strictly guaranteed to have been determined before any
particular reparent operation occurs.  And when discovered in this fashion,
it's done using the very sort of PROPFIND dance that HTTPv2 was introduced
to eliminate.  So yeah, potentially pretty costly.

For ra_svn:  I was totally wrong.  This thing always requires network
activity: a "reparent" command/response at best; at worst, the complete
teardown and re-opening of the session.  This is just a side-effect of the
stateful protocol.  Unlike with HTTP, the server here is privy to the
"session URL" concept -- clients only perform operations using relpaths
against that URL -- and so the server must be told when that has changed.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development


Re: svn commit: r1365324 - reparenting an RA session each time it's used

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On Wed, Jul 25, 2012 at 2:50 AM, Julian Foad <ju...@btopenworld.com> wrote:
>
> Bert pointed out on IRC that changes like this one (and I have been making
> other similar changes for some time) could potentially have an adverse
> performance effect in some cases, because reparenting an RA session is
> not free.  Our IRC chat [1]:
>
I think that best solution would be completely deprecate concept of RA
session root URL and switch to repos-root relative path or full URL in
svn_ra_* API.

It was discussed three years ago:
[RFC] Concept of RA session relative paths
http://svn.haxx.se/dev/archive-2009-10/0334.shtml


--
Ivan Zhakov