You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Greg Stein <gs...@gmail.com> on 2015/02/04 03:22:00 UTC

Re: svn commit: r1656616 - in /subversion/branches/reuse-ra-session: BRANCH-README subversion/libsvn_client/client.h subversion/libsvn_client/ra_cache.c

On Mon, Feb 2, 2015 at 7:52 PM, <br...@apache.org> wrote:
>...

> +++ subversion/branches/reuse-ra-session/BRANCH-README Tue Feb  3 01:52:26
> 2015
> @@ -8,12 +8,19 @@ all changes made in the branch.
>
>  STATUS
>  ======
> -+ Initial implementation
> -- Do not call svn_ra_* methods in find_session_by_url() because callback
> -  table may be destroyed at that time.
> +
> +done:
> +- Initial implementation.
> +- Separate active and inactive session lists.
> +
> +todo:
> +- Fix timeout in davautocheck tests at log_tests.py::log_diff_moved.
> +- Limit the number of unused open sessions.
> +- Run performance comparisons between trunk and branch to prove that
> +  the RA session cache does in fact speed things up.
>

This is the part that I wonder about. If we had 1000 sessions, then I
*might* start to believe a separation would be interesting.

A cache of sessions: sure; that's why we already had a cache.

But to split that apart and keep multiple lists? Did you have an indicator
somewhere that this split could help? That "get me a connected RA session"
was somehow noticeably slow, relative to a simple iteration sessions?

Thanks,
-g

Re: svn commit: r1656616 - in /subversion/branches/reuse-ra-session: BRANCH-README subversion/libsvn_client/client.h subversion/libsvn_client/ra_cache.c

Posted by Branko Čibej <br...@wandisco.com>.

On 06.02.2015 09:54, Greg Stein wrote:
>
>
> On Wed, Feb 4, 2015 at 3:35 AM, Branko Čibej <brane@wandisco.com
> <ma...@wandisco.com>> wrote:
>
>     On 04.02.2015 03:22, Greg Stein wrote:
>     > On Mon, Feb 2, 2015 at 7:52 PM, <brane@apache.org
>     <ma...@apache.org>> wrote:
>     >> ...
>     >> +++ subversion/branches/reuse-ra-session/BRANCH-README Tue Feb 
>     3 01:52:26
>     >> 2015
>     >> @@ -8,12 +8,19 @@ all changes made in the branch.
>     >>
>     >>  STATUS
>     >>  ======
>     >> -+ Initial implementation
>     >> -- Do not call svn_ra_* methods in find_session_by_url()
>     because callback
>     >> -  table may be destroyed at that time.
>     >> +
>     >> +done:
>     >> +- Initial implementation.
>     >> +- Separate active and inactive session lists.
>     >> +
>     >> +todo:
>     >> +- Fix timeout in davautocheck tests at
>     log_tests.py::log_diff_moved.
>     >> +- Limit the number of unused open sessions.
>     >> +- Run performance comparisons between trunk and branch to
>     prove that
>     >> +  the RA session cache does in fact speed things up.
>     >>
>     > This is the part that I wonder about. If we had 1000 sessions,
>     then I
>     > *might* start to believe a separation would be interesting.
>     >
>     > A cache of sessions: sure; that's why we already had a cache.
>     >
>     > But to split that apart and keep multiple lists? Did you have an
>     indicator
>     > somewhere that this split could help? That "get me a connected
>     RA session"
>     > was somehow noticeably slow, relative to a simple iteration
>     sessions?
>
>     The important reason is that with this split it's much easier to
>     manage
>     the cached idle sessions than it was before the change. Before,
>     all the
>     sessions were stored in an unordered hash table, and the difference
>     between 'idle' and 'active' was in a few bits of data in the cache
>     entry
>     struct.
>
>
> "Code simplification" is a fine goal. But your log message said:
>
>   - it speeds up the search for an inactive session to reuse, since
>     the search will no longer have to iterate over active sessions;
>
> So I imagine you can understand my wondering :-P
>  
>
>
>     Now, the idle sessions are stored in a doubly-linked list, which is
>     ordered. This will make it easier in future to limit the number of
>     cached idle sessions and to remove the least-recently-used ones
>     from the
>     cache. It also ensures that when we want to reuse a session, we'll
>     always pick the one that was released most recently, which reduces the
>     chance that the session might have become invalid (connection
>     timed out,
>     Kerberos token invalidated, etc.) while it's been idle.
>
>
> Yups.
>  
>
>
>     And yes, I do expect that iterating through a shorter list will save
>     nanoseconds. :)
>
>
> But of course! :-P

So I went and updated the log message. Should be less confusing, if more
self-referential. :)

-- Brane

Re: svn commit: r1656616 - in /subversion/branches/reuse-ra-session: BRANCH-README subversion/libsvn_client/client.h subversion/libsvn_client/ra_cache.c

Posted by Greg Stein <gs...@gmail.com>.

On Wed, Feb 4, 2015 at 3:35 AM, Branko Čibej <br...@wandisco.com> wrote:

> On 04.02.2015 03:22, Greg Stein wrote:
> > On Mon, Feb 2, 2015 at 7:52 PM, <br...@apache.org> wrote:
> >> ...
> >> +++ subversion/branches/reuse-ra-session/BRANCH-README Tue Feb  3
> 01:52:26
> >> 2015
> >> @@ -8,12 +8,19 @@ all changes made in the branch.
> >>
> >>  STATUS
> >>  ======
> >> -+ Initial implementation
> >> -- Do not call svn_ra_* methods in find_session_by_url() because
> callback
> >> -  table may be destroyed at that time.
> >> +
> >> +done:
> >> +- Initial implementation.
> >> +- Separate active and inactive session lists.
> >> +
> >> +todo:
> >> +- Fix timeout in davautocheck tests at log_tests.py::log_diff_moved.
> >> +- Limit the number of unused open sessions.
> >> +- Run performance comparisons between trunk and branch to prove that
> >> +  the RA session cache does in fact speed things up.
> >>
> > This is the part that I wonder about. If we had 1000 sessions, then I
> > *might* start to believe a separation would be interesting.
> >
> > A cache of sessions: sure; that's why we already had a cache.
> >
> > But to split that apart and keep multiple lists? Did you have an
> indicator
> > somewhere that this split could help? That "get me a connected RA
> session"
> > was somehow noticeably slow, relative to a simple iteration sessions?
>
> The important reason is that with this split it's much easier to manage
> the cached idle sessions than it was before the change. Before, all the
> sessions were stored in an unordered hash table, and the difference
> between 'idle' and 'active' was in a few bits of data in the cache entry
> struct.
>

"Code simplification" is a fine goal. But your log message said:

  - it speeds up the search for an inactive session to reuse, since
    the search will no longer have to iterate over active sessions;

So I imagine you can understand my wondering :-P


>
> Now, the idle sessions are stored in a doubly-linked list, which is
> ordered. This will make it easier in future to limit the number of
> cached idle sessions and to remove the least-recently-used ones from the
> cache. It also ensures that when we want to reuse a session, we'll
> always pick the one that was released most recently, which reduces the
> chance that the session might have become invalid (connection timed out,
> Kerberos token invalidated, etc.) while it's been idle.
>

Yups.


>
> And yes, I do expect that iterating through a shorter list will save
> nanoseconds. :)
>

But of course! :-P

Thx,
-g

Re: svn commit: r1656616 - in /subversion/branches/reuse-ra-session: BRANCH-README subversion/libsvn_client/client.h subversion/libsvn_client/ra_cache.c

Posted by Branko Čibej <br...@wandisco.com>.

On 04.02.2015 03:22, Greg Stein wrote:
> On Mon, Feb 2, 2015 at 7:52 PM, <br...@apache.org> wrote:
>> ...
>> +++ subversion/branches/reuse-ra-session/BRANCH-README Tue Feb  3 01:52:26
>> 2015
>> @@ -8,12 +8,19 @@ all changes made in the branch.
>>
>>  STATUS
>>  ======
>> -+ Initial implementation
>> -- Do not call svn_ra_* methods in find_session_by_url() because callback
>> -  table may be destroyed at that time.
>> +
>> +done:
>> +- Initial implementation.
>> +- Separate active and inactive session lists.
>> +
>> +todo:
>> +- Fix timeout in davautocheck tests at log_tests.py::log_diff_moved.
>> +- Limit the number of unused open sessions.
>> +- Run performance comparisons between trunk and branch to prove that
>> +  the RA session cache does in fact speed things up.
>>
> This is the part that I wonder about. If we had 1000 sessions, then I
> *might* start to believe a separation would be interesting.
>
> A cache of sessions: sure; that's why we already had a cache.
>
> But to split that apart and keep multiple lists? Did you have an indicator
> somewhere that this split could help? That "get me a connected RA session"
> was somehow noticeably slow, relative to a simple iteration sessions?

The important reason is that with this split it's much easier to manage
the cached idle sessions than it was before the change. Before, all the
sessions were stored in an unordered hash table, and the difference
between 'idle' and 'active' was in a few bits of data in the cache entry
struct.

Now, the idle sessions are stored in a doubly-linked list, which is
ordered. This will make it easier in future to limit the number of
cached idle sessions and to remove the least-recently-used ones from the
cache. It also ensures that when we want to reuse a session, we'll
always pick the one that was released most recently, which reduces the
chance that the session might have become invalid (connection timed out,
Kerberos token invalidated, etc.) while it's been idle.

And yes, I do expect that iterating through a shorter list will save
nanoseconds. :)

-- Brane