Posted to dev@jackrabbit.apache.org by Bart van der Schans <b....@onehippo.com> on 2010/02/17 23:29:48 UTC

[jr3] Use JCache JSR-107 for (all) caches

Hi,

Right now there are several "homegrown" caches in Jackrabbit. Some
configurable, some based on soft/weak references. Using JCache it
would make it possible to leverage existing caching implementations.
This could help in making the caches better configurable and tunable
and have features like overflow to disk, which could help with large
transactions, persist caches to disk during restart for cache warming
and clustered caches. For example it could be interesting to share
bundle/item state caches between cluster nodes.

Regards,
Bart

Re: [jr3] Use JCache JSR-107 for (all) caches

Posted by Bart van der Schans <b....@onehippo.com>.
Hi Stefan,


On Thu, Feb 18, 2010 at 11:00 AM, Stefan Guggisberg
<st...@gmail.com> wrote:
> hi bart,
>
> On Wed, Feb 17, 2010 at 11:29 PM, Bart van der Schans
> <b....@onehippo.com> wrote:
>> Hi,
>>
>> Right now there are several "homegrown" caches in Jackrabbit. Some
>> configurable, some based on soft/weak references. Using JCache it
>> would make it possible to leverage existing caching implementations.
>
> jcache and friends have been suggested a number of times before.
Yes, I know. To me that indicates that maybe more people would like to
see a change in this area ;-)

> i had a look at jcache, ecache etc quite while ago, they didn't fit the
> the needs of jackrabbit core. the 'homegrown' caches in the core
> are not simple caches (holding serializable objects), the have special
> semantics and are fundamental to jackrabbit's implementation of
> isolation levels. none of the caching framworks i looked at supported
> the required semantics.
I agree that not all (most?) of Jackrabbit's current caches would
benefit from a move to a general purpose cache, if such a move is even
possible. But the BundleCache, for example, would be a good candidate.

> general purpose caching frameworks are probably fine at an application
> level, at the core level i'd rather rely on custom implementations that
> exactly do what we need, nothing more and nothing less.
>
> the core should IMO be small and higly optimized, not bloated with
> general purpose frameworks/black boxes ;)
I agree totally. The core could contain its own simple default
implementation, but it could be nice if you could swap it for another
more bloated solution if that's what you require.

A lot would of course depend on how the new architecture is going to
be. If there aren't any caches, like Jukka mentioned to let the
persistence handle the caching, then there's of course no need to look
at generalized caches. But if we do need an equivalent of the current
BundleCache we should at least (re)consider making the caching
implementation pluggable.
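
A minimal sketch of what such a pluggable cache could look like,
assuming a small interface with a simple built-in default. The names
PluggableCache and SimpleLruCache are hypothetical; they are neither
existing Jackrabbit classes nor part of JSR-107. A heavier
implementation (for example one backed by an external caching library)
would implement the same interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical SPI; not an existing Jackrabbit or JSR-107 interface. */
interface PluggableCache<K, V> {
    V get(K key);
    void put(K key, V value);
    void remove(K key);
    int size();
}

/** Simple built-in default: a bounded LRU map with no external dependencies. */
final class SimpleLruCache<K, V> implements PluggableCache<K, V> {
    private final Map<K, V> entries;

    SimpleLruCache(final int maxEntries) {
        // accessOrder=true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > maxEntries;
            }
        };
    }

    // LinkedHashMap is not thread-safe, and get() reorders entries in
    // access-order mode, so every operation is synchronized.
    @Override public synchronized V get(K key) { return entries.get(key); }
    @Override public synchronized void put(K key, V value) { entries.put(key, value); }
    @Override public synchronized void remove(K key) { entries.remove(key); }
    @Override public synchronized int size() { return entries.size(); }
}
```

Callers would depend only on the interface, so the default could be
swapped without touching the code that uses the cache.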

Regards,
Bart


>
> cheers
> stefan
>
>> This could help in making the caches better configurable and tunable
>> and have features like overflow to disk, which could help with large
>> transactions, persist caches to disk during restart for cache warming
>> and clustered caches. For example it could be interesting to share
>> bundle/item state caches between cluster nodes.
>>
>> Regards,
>> Bart
>>
>



-- 
Hippo B.V.  -  Amsterdam
Oosteinde 11, 1017 WT, Amsterdam, +31(0)20-5224466

Hippo USA Inc.  -  San Francisco
101 H Street, Suite Q, Petaluma CA, 94952-3329, +1 (707) 773-4646
-----------------------------------------------------------------
http://www.onehippo.com   -  info@onehippo.com
-----------------------------------------------------------------

Re: [jr3] Use JCache JSR-107 for (all) caches

Posted by Stefan Guggisberg <st...@gmail.com>.
hi bart,

On Wed, Feb 17, 2010 at 11:29 PM, Bart van der Schans
<b....@onehippo.com> wrote:
> Hi,
>
> Right now there are several "homegrown" caches in Jackrabbit. Some
> configurable, some based on soft/weak references. Using JCache it
> would make it possible to leverage existing caching implementations.

jcache and friends have been suggested a number of times before.

i had a look at jcache, ehcache etc quite a while ago, they didn't fit
the needs of jackrabbit core. the 'homegrown' caches in the core
are not simple caches (holding serializable objects), they have special
semantics and are fundamental to jackrabbit's implementation of
isolation levels. none of the caching frameworks i looked at supported
the required semantics.

general purpose caching frameworks are probably fine at an application
level, at the core level i'd rather rely on custom implementations that
exactly do what we need, nothing more and nothing less.

the core should IMO be small and highly optimized, not bloated with
general purpose frameworks/black boxes ;)

cheers
stefan

> This could help in making the caches better configurable and tunable
> and have features like overflow to disk, which could help with large
> transactions, persist caches to disk during restart for cache warming
> and clustered caches. For example it could be interesting to share
> bundle/item state caches between cluster nodes.
>
> Regards,
> Bart
>

Re: [jr3] Use JCache JSR-107 for (all) caches

Posted by Bart van der Schans <b....@onehippo.com>.
On Thu, Feb 18, 2010 at 9:32 AM, Thomas Müller <th...@day.com> wrote:
> Hi,
>
> Is Jackrabbit too slow for you? Or do you have out of memory problems?
> Or why do you want to use your own cache?
Jackrabbit is not too slow ;-) The main goals for using JCache and
something like ehcache would be:
- monitoring (jmx), hit/miss counts, sizes
- management, easily adjustable, maybe runtime configurable
- clustering, for example having a separate cache cluster as a large
(100+GB) shared cache between Jackrabbit clustered nodes.
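
As an illustration of the monitoring goal, a minimal sketch of a cache
wrapper that tracks hit/miss counts and size. The class CountingCache
and its method names are hypothetical. The getters have the shape that
a JMX MBean interface would typically expose; registering them with an
MBean server is not shown here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical wrapper that counts lookups; names are illustrative only. */
final class CountingCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    V get(K key) {
        V value = entries.get(key);
        if (value != null) {
            hits.incrementAndGet();
        } else {
            misses.incrementAndGet();
        }
        return value;
    }

    void put(K key, V value) { entries.put(key, value); }

    // Read-only statistics, suitable for exposing through a management interface.
    long getHitCount() { return hits.get(); }
    long getMissCount() { return misses.get(); }
    int getSize() { return entries.size(); }

    /** Fraction of lookups served from the cache; 0.0 before any lookup. */
    double getHitRatio() {
        long h = hits.get();
        long total = h + misses.get();
        return total == 0 ? 0.0 : (double) h / total;
    }
}
```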

>> features like overflow to disk
>
> I would try to avoid that. It's not really a 'cache' if it has to be
> stored to disk, if the original data is also on disk.
Of course it is something to avoid and it won't help you that much if
your data is on disk, but I still think there are some advantages when
caches can overflow to disk:
- prevent OOMs (at the cost of a big slowdown)
- speed improvement for slow backends (for example cloud backends)
- caching of large binaries (100+MB)

> I would try to solve the root cause of the problem (problems
> supporting large transactions, improving performance) instead of
> trying to work around the issues on some higher level.
I totally agree. That's why I didn't mention large transactions or speed.

Regards,
Bart

Re: [jr3] Use JCache JSR-107 for (all) caches

Posted by Thomas Müller <th...@day.com>.
Hi,

About clustering: there are two main use cases:

A) to improve read throughput and to achieve high availability. In
this case writes can be serialized.

B) to improve write throughput. In this case writes should not be
serialized, instead writes should be merged later on (eventually
consistent).

I guess at some point we need to support both, but personally I think A
is important as well (if not more important than B).

Regards,
Thomas

Re: [jr3] Use JCache JSR-107 for (all) caches

Posted by Ian Boston <ie...@tfd.co.uk>.
On 19 Feb 2010, at 15:21, Marcel Reutegger wrote:

> On Thu, Feb 18, 2010 at 13:55, Jukka Zitting <ju...@gmail.com> wrote:
>> Agreed. Ideally (not sure if that's feasible) we'd push all caching
>> down below the unified persistence layer.
> 
> IMO that's exactly where a distributed cache belongs to.
> 
> the current design already implements some aspects but lacks other
> important things. currently each cluster node has it's own bundlecache
> with invalidation triggered on external changes. but filling the cache
> is always done through shared storage. I think this can become a
> bottleneck when the number of cluster nodes is more than just a few.
> e.g. consider that each of the cluster nodes has event listener
> registered that are interested in current changes. this will cause a
> rush on the shared storage whenever a change is done. ideally the
> cluster nodes (or more specifically the micro-kernels) would share
> their cached items with other cluster nodes. shared storage is only
> accessed when none of the cluster nodes can provide the request
> version of an item.

Agreed, 
Although distributed caches appear to be a fix-all solution, replicated (and to some extent invalidated) caches always seem to choke at the traffic levels where a cache really starts to matter. Having used ehcache (which is good as a cache impl with JMX) reasonably extensively, some observations.

Under the Cache layer (as opposed to a layer that knows something about the app)

  Replication is best totally avoided as a way of making a cluster scale to more than a few nodes.
  Invalidation can be effective, provided it's not used to trigger reload and the invalidation traffic isn't massive.
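
A minimal sketch of invalidation that does not trigger a reload,
assuming a cache that simply drops the entry on an external change and
loads it again only when it is next requested. The class
InvalidatingCache and the loader function are hypothetical; the loader
stands in for a read from the persistence layer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache that invalidates lazily instead of replicating values. */
final class InvalidatingCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for a persistence read

    InvalidatingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Called when another cluster node reports a change: drop, do not reload. */
    void onExternalChange(K key) {
        entries.remove(key);
    }

    /** Loads lazily, only when a caller actually asks for the item. */
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }
}
```

With this shape an invalidation message costs one map removal per node,
and items nobody reads again are never fetched.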

Over in Shindig, there is a simple Cache API with memcached and ehcache default implementations. Most deployers tune those or replace with their own local implementations, putting cluster wide code below the Cache API and above the cache implementation.

Putting the cache below the persistence API, or using a cache API would, IMHO, address most of these issues.

BTW, JSR-107 looks rather dead, and I found it lacking in certain areas.

Ian
> 
> regards
> marcel


Re: [jr3] Use JCache JSR-107 for (all) caches

Posted by Marcel Reutegger <ma...@gmx.net>.
On Thu, Feb 18, 2010 at 13:55, Jukka Zitting <ju...@gmail.com> wrote:
> Agreed. Ideally (not sure if that's feasible) we'd push all caching
> down below the unified persistence layer.

IMO that's exactly where a distributed cache belongs.

the current design already implements some aspects but lacks other
important things. currently each cluster node has its own bundlecache
with invalidation triggered on external changes. but filling the cache
is always done through shared storage. I think this can become a
bottleneck when the number of cluster nodes is more than just a few.
e.g. consider that each of the cluster nodes has event listeners
registered that are interested in current changes. this will cause a
rush on the shared storage whenever a change is made. ideally the
cluster nodes (or more specifically the micro-kernels) would share
their cached items with other cluster nodes. shared storage would only
be accessed when none of the cluster nodes can provide the requested
version of an item.
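
The lookup order described above could be sketched roughly as follows,
assuming a read path that tries the local cache first, then the other
cluster nodes, and only then the shared storage. ItemSource and
TieredReader are hypothetical names, items are plain strings, and item
versions are ignored to keep the sketch short.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical source of items: a peer node's cache or the shared storage. */
interface ItemSource {
    Optional<String> fetch(String itemId);
}

/** Reads local cache first, then peers, and shared storage only as a last resort. */
final class TieredReader {
    private final Map<String, String> local = new ConcurrentHashMap<>();
    private final List<ItemSource> peers;
    private final ItemSource sharedStorage;

    TieredReader(List<ItemSource> peers, ItemSource sharedStorage) {
        this.peers = peers;
        this.sharedStorage = sharedStorage;
    }

    Optional<String> read(String itemId) {
        String cached = local.get(itemId);
        if (cached != null) {
            return Optional.of(cached);
        }
        // Ask the other cluster nodes before touching shared storage.
        for (ItemSource peer : peers) {
            Optional<String> fromPeer = peer.fetch(itemId);
            if (fromPeer.isPresent()) {
                local.put(itemId, fromPeer.get());
                return fromPeer;
            }
        }
        // No node could provide the item: fall back to shared storage.
        Optional<String> stored = sharedStorage.fetch(itemId);
        stored.ifPresent(value -> local.put(itemId, value));
        return stored;
    }
}
```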

regards
 marcel

Re: [jr3] Use JCache JSR-107 for (all) caches

Posted by Bart van der Schans <b....@onehippo.com>.
On Fri, Feb 19, 2010 at 11:20 AM, Stefan Guggisberg
<st...@gmail.com> wrote:
> On Fri, Feb 19, 2010 at 11:08 AM, Bart van der Schans
> <b....@onehippo.com> wrote:
>> On Thu, Feb 18, 2010 at 1:55 PM, Jukka Zitting <ju...@gmail.com> wrote:
>>> Hi,
>>>
>>> On Thu, Feb 18, 2010 at 1:30 PM, Alexander Klimetschek <ak...@day.com> wrote:
>>>> I would also first find the right persistence architecture and care
>>>> about what caches we need later (and avoid them as much as possible).
>>>
>>> Agreed. Ideally (not sure if that's feasible) we'd push all caching
>>> down below the unified persistence layer.
>>
>> If the persistence architecture will be plugable like now you'll never
>> have any garantees that the persistence layer will cache anything and
>> even read operations can become quite expensive or slow. In that case,
>> caching just "above" the persistence layer like we do now with the
>> BundleCache would make a lot of sense.
>
> BundleCache is not above, it's part of the persistence layer.
Ah yes of course ;-)

> btw, the persistence architecture should IMO not be plugable in the
> common sense, i.e. an operator shouldn't be able to switch them.
>
> the persistence managers in the current architecture aren't plugable
> either, for a good reason.
I agree.

We should probably postpone the "caching discussions" until the dust
has settled over the unified persistence thread.

Regards,
Bart


>
> cheers
> stefan
>
>>
>> For such a cache I do see benefits of using existing cache solutions
>> that provide monitoring, management and clustering.
>>
>> Regards,
>> Bart
>>
>




Re: [jr3] Use JCache JSR-107 for (all) caches

Posted by Thomas Müller <th...@day.com>.
Hi,

> the persistence architecture should IMO not be plugable in the
> common sense, i.e. an operator shouldn't be able to switch them.

I agree.

The repository URL should define which persistence backend to use. So
we get rid of the repository.xml and workspace.xml files, at least in
the normal case. There may be some cases (maybe clustering or cloud
storage) where some minimal external configuration is required (for
example a properties file).
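
A rough sketch of how a repository URL could select the persistence
backend. The thread does not define a URL format, so the schemes used
here ("file", "jdbc", "mem"), the Backend enum and the BackendSelector
class are all assumptions made for illustration only.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical backend kinds; the thread does not define a URL format. */
enum Backend { FILE, JDBC, MEMORY }

final class BackendSelector {
    private BackendSelector() { }

    /** Picks a backend from the URL scheme; the schemes are assumptions. */
    static Backend fromRepositoryUrl(String url) {
        String scheme = URI.create(url).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("No scheme in repository URL: " + url);
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "file":
                return Backend.FILE;
            case "jdbc":
                return Backend.JDBC;
            case "mem":
                return Backend.MEMORY;
            default:
                throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
    }
}
```

For example, BackendSelector.fromRepositoryUrl("file:///var/repo")
would select the file backend without reading any configuration file.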

Regards,
Thomas

Re: [jr3] Use JCache JSR-107 for (all) caches

Posted by Stefan Guggisberg <st...@gmail.com>.
On Fri, Feb 19, 2010 at 11:08 AM, Bart van der Schans
<b....@onehippo.com> wrote:
> On Thu, Feb 18, 2010 at 1:55 PM, Jukka Zitting <ju...@gmail.com> wrote:
>> Hi,
>>
>> On Thu, Feb 18, 2010 at 1:30 PM, Alexander Klimetschek <ak...@day.com> wrote:
>>> I would also first find the right persistence architecture and care
>>> about what caches we need later (and avoid them as much as possible).
>>
>> Agreed. Ideally (not sure if that's feasible) we'd push all caching
>> down below the unified persistence layer.
>
> If the persistence architecture will be plugable like now you'll never
> have any garantees that the persistence layer will cache anything and
> even read operations can become quite expensive or slow. In that case,
> caching just "above" the persistence layer like we do now with the
> BundleCache would make a lot of sense.

BundleCache is not above, it's part of the persistence layer.

btw, the persistence architecture should IMO not be pluggable in the
common sense, i.e. an operator shouldn't be able to switch them.

the persistence managers in the current architecture aren't pluggable
either, for a good reason.

cheers
stefan

>
> For such a cache I do see benefits of using existing cache solutions
> that provide monitoring, management and clustering.
>
> Regards,
> Bart
>

Re: [jr3] Use JCache JSR-107 for (all) caches

Posted by Bart van der Schans <b....@onehippo.com>.
On Thu, Feb 18, 2010 at 1:55 PM, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On Thu, Feb 18, 2010 at 1:30 PM, Alexander Klimetschek <ak...@day.com> wrote:
>> I would also first find the right persistence architecture and care
>> about what caches we need later (and avoid them as much as possible).
>
> Agreed. Ideally (not sure if that's feasible) we'd push all caching
> down below the unified persistence layer.

If the persistence architecture will be pluggable like now you'll never
have any guarantees that the persistence layer will cache anything and
even read operations can become quite expensive or slow. In that case,
caching just "above" the persistence layer like we do now with the
BundleCache would make a lot of sense.

For such a cache I do see benefits of using existing cache solutions
that provide monitoring, management and clustering.

Regards,
Bart

Re: [jr3] Use JCache JSR-107 for (all) caches

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Thu, Feb 18, 2010 at 1:30 PM, Alexander Klimetschek <ak...@day.com> wrote:
> I would also first find the right persistence architecture and care
> about what caches we need later (and avoid them as much as possible).

Agreed. Ideally (not sure if that's feasible) we'd push all caching
down below the unified persistence layer.

BR,

Jukka Zitting

Re: [jr3] Use JCache JSR-107 for (all) caches

Posted by Alexander Klimetschek <ak...@day.com>.
On Thu, Feb 18, 2010 at 09:32, Thomas Müller <th...@day.com> wrote:
> I would try to solve the root cause of the problem (problems
> supporting large transactions, improving performance) instead of
> trying to work around the issues on some higher level.

I would also first find the right persistence architecture and care
about what caches we need later (and avoid them as much as possible).

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: [jr3] Use JCache JSR-107 for (all) caches

Posted by Thomas Müller <th...@day.com>.
Hi,

Is Jackrabbit too slow for you? Or do you have out of memory problems?
Or why do you want to use your own cache?

> features like overflow to disk

I would try to avoid that. It's not really a 'cache' if it has to be
stored to disk, if the original data is also on disk.

I would try to solve the root cause of the problem (problems
supporting large transactions, improving performance) instead of
trying to work around the issues on some higher level.

Regards,
Thomas

Re: [jr3] Use JCache JSR-107 for (all) caches

Posted by Bart van der Schans <b....@onehippo.com>.
On Thu, Feb 18, 2010 at 4:26 AM, Justin Edelson <ju...@gmail.com> wrote:
> On 2/17/10 5:29 PM, Bart van der Schans wrote:
>> Hi,
>>
>> Right now there are several "homegrown" caches in Jackrabbit. Some
>> configurable, some based on soft/weak references. Using JCache it
>> would make it possible to leverage existing caching implementations.
>> This could help in making the caches better configurable and tunable
>> and have features like overflow to disk, which could help with large
>> transactions, persist caches to disk during restart for cache warming
>> and clustered caches. For example it could be interesting to share
>> bundle/item state caches between cluster nodes.
>>
>> Regards,
>> Bart
> One thing to add to this list - monitorability of caches. AFAIK, there's
> no instrumentation available for Jackrabbit caches.
That would also be one of the benefits. A lot of the JCache
implementations support JMX management and monitoring. For an example
see:
http://ehcache.org/modules/monitor.html
http://ehcache.org/documentation/jmx.html

> In fact, JMX instrumentation in general may warrant its own thread.
We should start a separate thread for that. I would also love to have
all kinds of JCR statistics from Jackrabbit through JMX like node read
and write counts, active sessions, total number of
nodes/props/revisions, etc.

Bart

Re: [jr3] Use JCache JSR-107 for (all) caches

Posted by Justin Edelson <ju...@gmail.com>.
On 2/17/10 5:29 PM, Bart van der Schans wrote:
> Hi,
> 
> Right now there are several "homegrown" caches in Jackrabbit. Some
> configurable, some based on soft/weak references. Using JCache it
> would make it possible to leverage existing caching implementations.
> This could help in making the caches better configurable and tunable
> and have features like overflow to disk, which could help with large
> transactions, persist caches to disk during restart for cache warming
> and clustered caches. For example it could be interesting to share
> bundle/item state caches between cluster nodes.
> 
> Regards,
> Bart
One thing to add to this list - monitorability of caches. AFAIK, there's
no instrumentation available for Jackrabbit caches.

In fact, JMX instrumentation in general may warrant its own thread.

Justin