Posted to dev@subversion.apache.org by Daniel Shahaf <da...@elego.de> on 2012/12/17 11:25:25 UTC

Re: svn commit: r1413451 - /subversion/branches/cache-server/BRANCH-README

stefan2@apache.org wrote on Mon, Nov 26, 2012 at 00:21:26 -0000:
> Author: stefan2
> Date: Mon Nov 26 00:21:26 2012
> New Revision: 1413451
> 
> URL: http://svn.apache.org/viewvc?rev=1413451&view=rev
> Log:
> On the cache-server branch.
> 
> * BRANCH-README: add
> 
> Added:
>     subversion/branches/cache-server/BRANCH-README
> 
> Added: subversion/branches/cache-server/BRANCH-README
> URL: http://svn.apache.org/viewvc/subversion/branches/cache-server/BRANCH-README?rev=1413451&view=auto
> ==============================================================================
> --- subversion/branches/cache-server/BRANCH-README (added)
> +++ subversion/branches/cache-server/BRANCH-README Mon Nov 26 00:21:26 2012
> @@ -0,0 +1,109 @@
> +Goal
> +====
> +
> +Provide a stand-alone executable that will provide a svn_cache__t 
> +implementation based on a single shared memory.  The core data
> +structure and access logic can be taken from / shared with today's
> +membuffer cache.  The latter shall remain available as it is now.

memcached solves the problem you're stating above, and it's an
independent third-party project.  Your solution is specific to
Subversion (it's in libsvn_subr and is not in the public API).  If
you're solving the same problem memcached does, why does your solution
need to be specific to svn?  Should it be a standalone tool that
Subversion interfaces to as an optional dependency, and any other
memcached consumer can switch to too?

I don't mean to discourage you from doing this work; I just wonder
whether the non-public parts of libsvn_subr are the right place for
it to live in.

Re: svn commit: r1413451 - /subversion/branches/cache-server/BRANCH-README

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Thu, Dec 20, 2012 at 5:08 AM, Branko Čibej <br...@wandisco.com> wrote:

> On 20.12.2012 02:08, Stefan Fuhrmann wrote:
> > The ineffectiveness of our use of memcached in 1.6 had
> > prompted the development of membuffer in the first place.
> >
> > Despite the relevant APR bug that got fixed only recently,
> > there are fundamental limitations compared to a SHM-based
> > implementation:
> >
> > * Instead of reading directly addressable memory, memcached
> >   requires inter-process calls over TCP/IP. That translated into
> >   ~10us latency. The performance numbers I found (~200k
> >   requests/s on larger SMP machines) are 1 order of magnitude
> >   less than what membuffer achieves with 1 thread.
>
> What kind of latency do you expect when you share this cache amongst
> several processes that have to use some other kind of RPC and/or locking
> to access the shared-memory segment? I'm ignoring marshalling since it
> costs the same in both cases.
>

A read lock is expected to take ~10ns in typical cases; a
write lock would take 100 .. 200ns (if no wait is required).
IOW, the latency is more or less the same as we have today
with our in-process caches.
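
As a rough sanity check on those figures (a back-of-the-envelope
sketch, not a measurement; the helper name is made up for
illustration): a single client that issues one request at a time can
complete at most 1/latency requests per second.

```c
#include <assert.h>

/* Upper bound on the number of strictly sequential requests one client
   can complete per second, given the per-request latency in nanoseconds.
   Hypothetical helper for illustration only. */
static double
max_requests_per_second(double latency_ns)
{
  assert(latency_ns > 0.0);
  return 1e9 / latency_ns;
}
```

With ~10us per round trip this gives about 100k requests/s per
client; with ~10ns it gives about 100M requests/s.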

No RPC is required once the connection to the shared mem
cache has been set up. The cache server only:

* creates & initializes the cache memory
* provides a registry for cache clients
* periodically checks for dead clients (to release zombie locks)
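
The dead-client check in the last bullet might look roughly like the
sketch below. Everything in it is an assumption for illustration: the
registry layout, the names (registry_t, registry_add, registry_reap)
and the use of kill(pid, 0) as a liveness probe are not taken from the
branch. It also ignores concurrency; a real registry living in shared
memory would need atomic updates or a lock of its own.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_CLIENTS 64      /* arbitrary limit for this sketch */

/* One registry slot per client process (hypothetical layout). */
typedef struct client_slot_t
{
  pid_t pid;                /* 0 means the slot is unused */
  int read_locks;           /* read locks currently held by this client */
} client_slot_t;

typedef struct registry_t
{
  client_slot_t slots[MAX_CLIENTS];
  int active_readers;       /* read locks held across all clients */
} registry_t;

/* Register PID in the first free slot; return its index or -1 if full. */
static int
registry_add(registry_t *reg, pid_t pid)
{
  size_t i;

  assert(pid > 0);
  for (i = 0; i < MAX_CLIENTS; ++i)
    if (reg->slots[i].pid == 0)
      {
        reg->slots[i].pid = pid;
        reg->slots[i].read_locks = 0;
        return (int)i;
      }
  return -1;
}

/* Convenience wrapper: a client registers its own process. */
static int
registry_add_self(registry_t *reg)
{
  return registry_add(reg, getpid());
}

/* Non-zero if PID no longer exists.  Signal 0 sends nothing; kill()
   only performs the existence and permission checks. */
static int
client_is_dead(pid_t pid)
{
  return kill(pid, 0) == -1 && errno == ESRCH;
}

/* Free the slots of dead clients and give back the read locks they
   still held.  Returns the number of clients removed. */
static int
registry_reap(registry_t *reg)
{
  size_t i;
  int reaped = 0;

  for (i = 0; i < MAX_CLIENTS; ++i)
    {
      client_slot_t *slot = &reg->slots[i];
      if (slot->pid != 0 && client_is_dead(slot->pid))
        {
          reg->active_readers -= slot->read_locks;
          slot->read_locks = 0;
          slot->pid = 0;
          ++reaped;
        }
    }
  return reaped;
}
```

The server would call something like registry_reap() from a timer;
clients never have to talk to the server on the data path.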


> > * More critically, memcached does not support partial data
> >   access, e.g. reading or adding a single directory entry. That's
> >   1.6-esque O(n^2) instead of O(n) runtime for large folders.
>
> That's an issue of cache layout. But I concede the point since it's a
> time vs. space decision.
>

-- Stefan^2.

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download

Re: svn commit: r1413451 - /subversion/branches/cache-server/BRANCH-README

Posted by Branko Čibej <br...@wandisco.com>.
On 20.12.2012 02:08, Stefan Fuhrmann wrote:
> The ineffectiveness of our use of memcached in 1.6 had
> prompted the development of membuffer in the first place.
>
> Despite the relevant APR bug that got fixed only recently,
> there are fundamental limitations compared to a SHM-based
> implementation:
>
> * Instead of reading directly addressable memory, memcached
>   requires inter-process calls over TCP/IP. That translated into
>   ~10us latency. The performance numbers I found (~200k
>   requests/s on larger SMP machines) are 1 order of magnitude
>   less than what membuffer achieves with 1 thread.

What kind of latency do you expect when you share this cache amongst
several processes that have to use some other kind of RPC and/or locking
to access the shared-memory segment? I'm ignoring marshalling since it
costs the same in both cases.

> * More critically, memcached does not support partial data
>   access, e.g. reading or adding a single directory entry. That's
>   1.6-esque O(n^2) instead of O(n) runtime for large folders.

That's an issue of cache layout. But I concede the point since it's a
time vs. space decision.


-- Brane


-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com


Re: svn commit: r1413451 - /subversion/branches/cache-server/BRANCH-README

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Mon, Dec 17, 2012 at 1:29 PM, Branko Čibej <br...@wandisco.com> wrote:

> Resent to the correct list.
>
> On 17.12.2012 11:41, Branko Čibej wrote:
> > On 17.12.2012 11:25, Daniel Shahaf wrote:
> >> stefan2@apache.org wrote on Mon, Nov 26, 2012 at 00:21:26 -0000:
> >>> Author: stefan2
> >>> Date: Mon Nov 26 00:21:26 2012
> >>> New Revision: 1413451
> >>>
> >>> URL: http://svn.apache.org/viewvc?rev=1413451&view=rev
> >>> Log:
> >>> On the cache-server branch.
> >>>
> >>> * BRANCH-README: add
> >>>
> >>> Added:
> >>>     subversion/branches/cache-server/BRANCH-README
> >>>
> >>> Added: subversion/branches/cache-server/BRANCH-README
> >>> URL: http://svn.apache.org/viewvc/subversion/branches/cache-server/BRANCH-README?rev=1413451&view=auto
> >>> ==============================================================================
> >>> --- subversion/branches/cache-server/BRANCH-README (added)
> >>> +++ subversion/branches/cache-server/BRANCH-README Mon Nov 26 00:21:26 2012
> >>> @@ -0,0 +1,109 @@
> >>> +Goal
> >>> +====
> >>> +
> >>> +Provide a stand-alone executable that will provide a svn_cache__t
> >>> +implementation based on a single shared memory.  The core data
> >>> +structure and access logic can be taken from / shared with today's
> >>> +membuffer cache.  The latter shall remain available as it is now.
> >> memcached solves the problem you're stating above, and it's an
> >> independent third-party project.
>

The ineffectiveness of our use of memcached in 1.6 had
prompted the development of membuffer in the first place.

Despite the relevant APR bug that got fixed only recently,
there are fundamental limitations compared to a SHM-based
implementation:

* Instead of reading directly addressable memory, memcached
  requires inter-process calls over TCP/IP. That translated into
  ~10us latency. The performance numbers I found (~200k
  requests/s on larger SMP machines) are 1 order of magnitude
  less than what membuffer achieves with 1 thread.

* More critically, memcached does not support partial data
  access, e.g. reading or adding a single directory entry. That's
  1.6-esque O(n^2) instead of O(n) runtime for large folders.
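
To make the O(n^2) vs. O(n) point concrete, here is a small counting
sketch (for illustration only, with hypothetical function names; it is
not code from Subversion or memcached). It counts how many directory
entries have to be written to the cache while a directory grows to n
entries, once when every update must store the whole directory again
and once when a single entry can be added in place.

```c
#include <assert.h>

/* Entries written when each of the N additions has to re-store the
   whole directory, as with a cache that only offers whole-value
   get/set. */
static unsigned long long
entries_written_whole_value(unsigned long long n)
{
  unsigned long long total = 0, size;

  assert(n <= 4000000000ULL);   /* keeps the sum within 64 bits */
  for (size = 1; size <= n; ++size)
    total += size;              /* directory now has SIZE entries */
  return total;                 /* n * (n + 1) / 2, i.e. O(n^2) */
}

/* Entries written when the cache can add a single entry in place. */
static unsigned long long
entries_written_partial(unsigned long long n)
{
  unsigned long long total = 0, i;

  for (i = 0; i < n; ++i)
    total += 1;                 /* one entry per addition */
  return total;                 /* O(n) */
}
```

For a folder with 1000 entries that is 500500 entry writes versus
1000.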


> >> Your solution is specific to
> >> Subversion (it's in libsvn_subr and is not in the public API).  If
> >> you're solving the same problem memcached does, why does your solution
> >> need to be specific to svn?  Should it be a standalone tool that
> >> Subversion interfaces to as an optional dependency, and any other
> >> memcached consumer can switch to too?
>

The cache process does not need to be SVN-specific. Other tools
might, in theory, use it as well. However, there will only be some
negotiation and locking API while the actual data access is done
by the client (it's shared memory after all).
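
As an illustration of how small such a locking API could be, here is
a sketch of a try-lock style reader/writer lock built on a single C11
atomic counter that could be placed in the shared segment. It is a
sketch under assumptions, not the branch's design: the names are
hypothetical, it presumes atomic_int is lock-free and address-free on
the platform (so that it works across processes), and it leaves out
waiting, fairness and recovery from dead lock holders.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* STATE > 0: number of readers; 0: free; -1: held by one writer. */
typedef struct shm_rwlock_t
{
  atomic_int state;
} shm_rwlock_t;

/* Try to add one reader; fails only while a writer holds the lock. */
static bool
try_read_lock(shm_rwlock_t *lock)
{
  int seen = atomic_load(&lock->state);

  while (seen >= 0)
    if (atomic_compare_exchange_weak(&lock->state, &seen, seen + 1))
      return true;              /* on failure SEEN is refreshed */
  return false;
}

static void
read_unlock(shm_rwlock_t *lock)
{
  int prev = atomic_fetch_sub(&lock->state, 1);

  assert(prev > 0);
  (void)prev;
}

/* Try to take the lock exclusively; succeeds only if it is free. */
static bool
try_write_lock(shm_rwlock_t *lock)
{
  int expected = 0;

  return atomic_compare_exchange_strong(&lock->state, &expected, -1);
}

static void
write_unlock(shm_rwlock_t *lock)
{
  int prev = atomic_exchange(&lock->state, 0);

  assert(prev == -1);
  (void)prev;
}
```

The uncontended read path is one atomic load plus one
compare-and-swap on memory the client has already mapped, with no
call into the server process.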

So, 3rd party tools using the SVN cache server would need to
talk to svn_cache__t, for instance.


> >> I don't mean to discourage you from doing this work; I just wonder
> >> whether the non-public parts of libsvn_subr are the right place for
> >> it to live in.
> > I've been wondering about all this caching, actually. There's memcached,
> > as Daniel mentions, and there's redis, and a bunch of other caching
> > solutions that have different strengths and weaknesses. Yet here we are,
> > reinventing the wheel (and if I read the mails on the topic correctly,
> > having lots of fun while doing that).
>

I'm not reinventing the wheel. I'm constructing a new one because
the old one does not fit.


> > It would be much better if fsfs could be configured to use one of
> > several caching servers and then the administrator would worry about the
> > rest. I think it's perfectly fine to require one of them.
>

Well then, I've got good news for you, sir. SVN supports memcached
since 1.6. Simply configure it properly.


> > I realize it's too late to do this for 1.8. But I doubt rolling our own
> > cache server makes any kind of sense.
>

It does. Proven with SVN 1.6.

-- Stefan^2.


Re: svn commit: r1413451 - /subversion/branches/cache-server/BRANCH-README

Posted by Branko Čibej <br...@wandisco.com>.
Resent to the correct list.

On 17.12.2012 11:41, Branko Čibej wrote:
> On 17.12.2012 11:25, Daniel Shahaf wrote:
>> stefan2@apache.org wrote on Mon, Nov 26, 2012 at 00:21:26 -0000:
>>> Author: stefan2
>>> Date: Mon Nov 26 00:21:26 2012
>>> New Revision: 1413451
>>>
>>> URL: http://svn.apache.org/viewvc?rev=1413451&view=rev
>>> Log:
>>> On the cache-server branch.
>>>
>>> * BRANCH-README: add
>>>
>>> Added:
>>>     subversion/branches/cache-server/BRANCH-README
>>>
>>> Added: subversion/branches/cache-server/BRANCH-README
>>> URL: http://svn.apache.org/viewvc/subversion/branches/cache-server/BRANCH-README?rev=1413451&view=auto
>>> ==============================================================================
>>> --- subversion/branches/cache-server/BRANCH-README (added)
>>> +++ subversion/branches/cache-server/BRANCH-README Mon Nov 26 00:21:26 2012
>>> @@ -0,0 +1,109 @@
>>> +Goal
>>> +====
>>> +
>>> +Provide a stand-alone executable that will provide a svn_cache__t 
>>> +implementation based on a single shared memory.  The core data
>>> +structure and access logic can be taken from / shared with today's
>>> +membuffer cache.  The latter shall remain available as it is now.
>> memcached solves the problem you're stating above, and it's an
>> independent third-party project.  Your solution is specific to
>> Subversion (it's in libsvn_subr and is not in the public API).  If
>> you're solving the same problem memcached does, why does your solution
>> need to be specific to svn?  Should it be a standalone tool that
>> Subversion interfaces to as an optional dependency, and any other
>> memcached consumer can switch to too?
>>
>> I don't mean to discourage you from doing this work; I just wonder
>> whether the non-public parts of libsvn_subr are the right place for
>> it to live in.
> I've been wondering about all this caching, actually. There's memcached,
> as Daniel mentions, and there's redis, and a bunch of other caching
> solutions that have different strengths and weaknesses. Yet here we are,
> reinventing the wheel (and if I read the mails on the topic correctly,
> having lots of fun while doing that).
>
> It would be much better if fsfs could be configured to use one of
> several caching servers and then the administrator would worry about the
> rest. I think it's perfectly fine to require one of them.
>
> I realize it's too late to do this for 1.8. But I doubt rolling our own
> cache server makes any kind of sense.
>
> -- Brane
>


Re: svn commit: r1413451 - /subversion/branches/cache-server/BRANCH-README

Posted by Branko Čibej <br...@wandisco.com>.
On 17.12.2012 11:25, Daniel Shahaf wrote:
> stefan2@apache.org wrote on Mon, Nov 26, 2012 at 00:21:26 -0000:
>> Author: stefan2
>> Date: Mon Nov 26 00:21:26 2012
>> New Revision: 1413451
>>
>> URL: http://svn.apache.org/viewvc?rev=1413451&view=rev
>> Log:
>> On the cache-server branch.
>>
>> * BRANCH-README: add
>>
>> Added:
>>     subversion/branches/cache-server/BRANCH-README
>>
>> Added: subversion/branches/cache-server/BRANCH-README
>> URL: http://svn.apache.org/viewvc/subversion/branches/cache-server/BRANCH-README?rev=1413451&view=auto
>> ==============================================================================
>> --- subversion/branches/cache-server/BRANCH-README (added)
>> +++ subversion/branches/cache-server/BRANCH-README Mon Nov 26 00:21:26 2012
>> @@ -0,0 +1,109 @@
>> +Goal
>> +====
>> +
>> +Provide a stand-alone executable that will provide a svn_cache__t 
>> +implementation based on a single shared memory.  The core data
>> +structure and access logic can be taken from / shared with today's
>> +membuffer cache.  The latter shall remain available as it is now.
> memcached solves the problem you're stating above, and it's an
> independent third-party project.  Your solution is specific to
> Subversion (it's in libsvn_subr and is not in the public API).  If
> you're solving the same problem memcached does, why does your solution
> need to be specific to svn?  Should it be a standalone tool that
> Subversion interfaces to as an optional dependency, and any other
> memcached consumer can switch to too?
>
> I don't mean to discourage you from doing this work; I just wonder
> whether the non-public parts of libsvn_subr are the right place for
> it to live in.

I've been wondering about all this caching, actually. There's memcached,
as Daniel mentions, and there's redis, and a bunch of other caching
solutions that have different strengths and weaknesses. Yet here we are,
reinventing the wheel (and if I read the mails on the topic correctly,
having lots of fun while doing that).

It would be much better if fsfs could be configured to use one of
several caching servers and then the administrator would worry about the
rest. I think it's perfectly fine to require one of them.

I realize it's too late to do this for 1.8. But I doubt rolling our own
cache server makes any kind of sense.

-- Brane
