You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by Berry van Halderen <b....@1hippo.com> on 2010/05/10 09:43:28 UTC

Modified externally exception due to eviction from shared ism cache

With some specifc data sets we've been experiencing some modified
externally exceptions in Jackrabbit 1.5.7.  I know that these are
valid exception in cases, but now I've experiencing them in a single
threaded, no observation, just one writing session (other dorment).
The modified externally exceptions all happen on a save at the
jcr:mixinTypes property, but it's not very deterministic.  With
multiple runs it only happens between 25% to 5% of the runs (the more
logging one puts in, the less frequent it occurs.  After checking my
own code I've turned to the jackrabbit code to see it there might be
an issue there.

My first attempt to resolve the issue was to alight the
o.a.j.c.NodeImpl.setMixinTypesProperty with the regular setProperty
code as that seems to be out of sync, but without success.  Therefore
I no longer think it is directly related to the jcr:mixinTypes
property, just that I'm doing a lot (thousands) of addMixin,
removeMixin.  There are other operations (copy nodes, properties set
and intermediate save to flush state). Digging deeper I stumbled on
the following thing that seems to be happening:
- A property A gets retrieved.  It will be cached by both shared Item
State Manager (ISM) and Local ISM.
- Property A gets modified and saved, the modcount in the ItemState
will be upped from the initial valie 0 to 1.
- Now a lot of other operations occur, Property A will get older and
older and will be evicted from local and shared ISM.
- Now it does look like it that A can get evicted from the shared ISM,
but not the local ISM.  This is unlikely, but this is what is in my
logs.
- Property A gets retrieved again from the same session, it is on the
local ISM cache, with modcount 1.  It gets modified.
- Now the session state is saved, to persist the state, the shared ISM
retrieved the state from the persistence manager.
- the initial modcount of this state will be 0 as it is re-retrieved.
- the conflict now is that the changelog sent from the Local ISM
contains a modcount of 1 for property A, while the related state for
property A from the shared ISM has a modcount 0.  The verification to
connect the two states will fail with a slate item state exception.

Now I can probably fix it or discuss this, but before going into
changing this, I'd like to verify with you all I've I'm not completely
in the wrong on what is happening here.  Is there a crutial step or
mechanism in place that I'm overlooking?

With kind regards,
Berry van Halderen

Re: Modified externally exception due to eviction from shared ism cache

Posted by Berry van Halderen <b....@1hippo.com>.
On Mon, May 17, 2010 at 9:46 PM, Marcel Reutegger <mr...@day.com> wrote:
> I just stumbled over the method
> AbstractBundlePersistenceManager.load(PropertyId), which I think might
> be related to this issue.
>
> what if the items states in the SISM indeed get cleared because the
> local item states go into the change log? I guess this is more likely
> the bigger the change set is, right?
>
> usually, this is not a problem because the item states are reconnected
> with possibly new ones loaded from the persistence manager. now the
> problem with above mentioned method could be the way it loads the
> jcr:mixinTypes property. the property is not stored as a regular
> property. the names are serialized with the node and the property is
> created on the fly and always with modCount 0 (I think).
>
> opinions?
>
> I think there is also another issue with that method. the
> jcr:mixinTypes property always seems to exist from the persistence
> manager POV. IIUC, it will create a property with an empty value array
> even if there is no mixin types set.

I actually was able to create a unit test that recreates the problem
AND found the root cause.  I will share this on the list a bit later
today.  It is a combination of factors, but a true problem.

\Berry

Re: Modified externally exception due to eviction from shared ism cache

Posted by Marcel Reutegger <mr...@day.com>.
hi,

I just stumbled over the method
AbstractBundlePersistenceManager.load(PropertyId), which I think might
be related to this issue.

what if the items states in the SISM indeed get cleared because the
local item states go into the change log? I guess this is more likely
the bigger the change set is, right?

usually, this is not a problem because the item states are reconnected
with possibly new ones loaded from the persistence manager. now the
problem with above mentioned method could be the way it loads the
jcr:mixinTypes property. the property is not stored as a regular
property. the names are serialized with the node and the property is
created on the fly and always with modCount 0 (I think).

opinions?

I think there is also another issue with that method. the
jcr:mixinTypes property always seems to exist from the persistence
manager POV. IIUC, it will create a property with an empty value array
even if there is no mixin types set.

regards
 marcel

On Mon, May 10, 2010 at 10:29 AM, Stefan Guggisberg
<st...@day.com> wrote:
> hi berry
>
> On Mon, May 10, 2010 at 9:43 AM, Berry van Halderen
> <b....@1hippo.com> wrote:
>> With some specifc data sets we've been experiencing some modified
>> externally exceptions in Jackrabbit 1.5.7.  I know that these are
>> valid exception in cases, but now I've experiencing them in a single
>> threaded, no observation, just one writing session (other dorment).
>> The modified externally exceptions all happen on a save at the
>> jcr:mixinTypes property, but it's not very deterministic.  With
>> multiple runs it only happens between 25% to 5% of the runs (the more
>> logging one puts in, the less frequent it occurs.  After checking my
>> own code I've turned to the jackrabbit code to see it there might be
>> an issue there.
>>
>> My first attempt to resolve the issue was to alight the
>> o.a.j.c.NodeImpl.setMixinTypesProperty with the regular setProperty
>> code as that seems to be out of sync, but without success.  Therefore
>> I no longer think it is directly related to the jcr:mixinTypes
>> property, just that I'm doing a lot (thousands) of addMixin,
>> removeMixin.  There are other operations (copy nodes, properties set
>> and intermediate save to flush state). Digging deeper I stumbled on
>> the following thing that seems to be happening:
>> - A property A gets retrieved.  It will be cached by both shared Item
>> State Manager (ISM) and Local ISM.
>> - Property A gets modified and saved, the modcount in the ItemState
>> will be upped from the initial valie 0 to 1.
>> - Now a lot of other operations occur, Property A will get older and
>> older and will be evicted from local and shared ISM.
>> - Now it does look like it that A can get evicted from the shared ISM,
>> but not the local ISM.  This is unlikely, but this is what is in my
>> logs.
>> - Property A gets retrieved again from the same session, it is on the
>> local ISM cache, with modcount 1.  It gets modified.
>> - Now the session state is saved, to persist the state, the shared ISM
>> retrieved the state from the persistence manager.
>> - the initial modcount of this state will be 0 as it is re-retrieved.
>> - the conflict now is that the changelog sent from the Local ISM
>> contains a modcount of 1 for property A, while the related state for
>> property A from the shared ISM has a modcount 0.  The verification to
>> connect the two states will fail with a slate item state exception.
>>
>> Now I can probably fix it or discuss this, but before going into
>> changing this, I'd like to verify with you all I've I'm not completely
>> in the wrong on what is happening here.  Is there a crutial step or
>> mechanism in place that I'm overlooking?
>
> the SISM cache is reference-based, i.e. items are not evicted as long as they're
> currently being referenced. items managed by the LISM do keep a reference
> to the underlying (shared) item state.
>
> from the top of my head... the only situation where a local item state
> clears its
> reference to the underlying shared state is when it is added to the change log.
>
> if there's any chance to reproduce the issue with a junit case using
> plain jackrabbit
> i'll further investigate.
>
> cheers
> stefan
>
>>
>> With kind regards,
>> Berry van Halderen
>>
>

Re: Modified externally exception due to eviction from shared ism cache

Posted by Stefan Guggisberg <st...@day.com>.
hi berry,


On Tue, May 18, 2010 at 12:22 PM, Berry van Halderen
<b....@1hippo.com> wrote:
> I've created an issue in Jira for the modified externally problem I've
> stumbled upon as I've now been able to reproduce it with a unit test.  Quite a
> large amount of investigation because a lot of factors must be in place.
> In the Jira issue the unit test, and patches are attached, so I won't have to
> post them here.
>
>  https://issues.apache.org/jira/browse/JCR-2633
>
> The unit test itself is mostly trivial, the problem arrises when you add a
> mixin to a node, save it, do this again with another mixin and then remove
> both mixins and in the following save the shared item state cache is shrunk,
> and the garbage collectors hits at exactly the right time.  In the unit test a
> reference to the jcr:mixinTypes property must have been held.
>
> What will happen is that the ItemState of the jcr:mixinTypes property will get
> a modification count higher than 0 when addin the mixins.  Because a reference
> to the property is kept, it will not be evicted from the primary cache in the
> local item state manager.  When removing all mixins, an overlayed state will
> be created of this ItemState.  Because the state and overlayed state are
> linked, the ItemState won't be dropped from the primary cache of the shared
> item state manager.
>
> But is MAY be evicted from the secondary cache implementing the LRU/FIFO
> functionality.  If this is the case when while persisting the changes there is
> a small window where the overlayed state will be disconnected from the state.
> This happens just before collecting the changelog at:
>  o.a.j.c.state.ItemState.disconnect():211
>  o.a.j.c.state.SessionItemStateManager.disconnectTransientItemState():674
>  o.a.j.c.PropertyImpl.makePersistent():143
>  o.a.j.c.ItemImpl.persistTransientItems():609
>  o.a.j.c.ItemImpl.save():1087
>  o.a.j.c.SessionImpl.save():858
> Or in the when actually collecting the changelog in one of the methods
> Changelog.{modified(),deleted() or both.  I think the latter, but not really
> sure.
>
> In any case, this breaks the bondage that prevents the cached state in the
> shared item state manager.
>
> Now if the shared item state cache has been shrunk enough and the GC hits
> before o.a.j.c.state.SharedItemStateManager.Update.begin():650 when the
> disconnected state will be purged from the shared item state cache.  Just
> below line 650 the check for stale items will then cause a re-retrieval from
> the persisted store of the property.  Because that state will have a
> modification count of 0, it will conflict with the modification count of the
> mixin property type that is being persisted.
>
> It is true that the GC needs to go off at exactly the right time and the
> secondary cache needs to have shrunk to be able to evict the item.  This can
> however happen in extreme cases.  The patch that contains the unit test
> contains code that will force the purging of the item.
>
> There is still the matter why the modcount comes back at 0 when retrieving the
> property from the persistence manager, basically the assumption made in the
> code between session, local, and shared item state managers, their caches,
> etcetera is that the modification count in the shared item state is always
> incremented, and never reset.
>
> There is an apparent contract (partially documented) that the modification
> count is to be persisted.  Which is in fact the case for the InMemPersistence-
> Manager, but all bundle derived persistence managers do not persist the
> jcr:mixinType property at all, but just the mixintypes as part of the
> nodedefinition information.  This means that the jcr:mixinTypes property will
> always be re-created with modcount of 0, which leads to clashes.

phew! ;) thanks a lot for sharing your very detailed and excellent analysis!
i know how difficult it is to track down those kind of issues...

>
> There are multiple approaches to fix this:
> - One could prevent the overlayed states being evicted by holding some
>  alternative reference to it (patch provided (holdrefs.patch));
> - modify the cache implementation at the shared item state manager such that
>  it will simply never release states from its secondary cache unless all
>  items are also gone from all local item state caches;
> - replace the modcount implementation, where instead of counting modifications
>  a state is marked dirty.  This is probably too much work, but has the
>  advantage of avoiding problems with a modcount which can actually reset
>  anyway (it being a short making it not unlikely);
> - ensure the modification count of the jcr:mixinTypes is stored by the
>  persistence manager (patch provided in jira issue (persistMixinTypes.patch).
>
> I would prefer the latter.  It holds more firm to the JCR model where
> meta-data as mixinTypes just properties, and it holds the apparent contract
> that modcount is to be persisted.  In my opinion it is unjust that the
> mixinTypes property wasn't persisted.  Apart from jcr:mixinTypes, only
> jcr:uuid and jcr:primaryType aren't persisted, but those are immutable
> therefor they are really in a different league.

i agree, i.e.  +1 for ensure the modification count of the
jcr:mixinTypes is stored by the
persistence manager

>
> The patches also contrain a small modification as suggestion such that the
> caching mechanism may be overridden.  Caching strategies are never a one-size
> fits all, therefor it is good to make it pluggable.

i tend to disagree ;) the 'cache' of the SISM is not a 'cache' in the
traditional sense
but has special semantics which are essential for maintaining data
integrity. i have
tried to explain this repeatedly in the past on this list. making it
'pluggable' would
probably open the door for lots of new issues where people used some supposedly
'better' generic caching framework and are surprised when they
encounter corrupted
repositories...

-1 for making the SISM cache 'pluggable'

cheers
stefan

> And a small modification
> to make jackrabbit-core compilable with Java5.
>
> \Berry
>

Re: Modified externally exception due to eviction from shared ism cache

Posted by Berry van Halderen <b....@1hippo.com>.
I've created an issue in Jira for the modified externally problem I've
stumbled upon as I've now been able to reproduce it with a unit test.  Quite a
large amount of investigation because a lot of factors must be in place.
In the Jira issue the unit test, and patches are attached, so I won't have to
post them here.

  https://issues.apache.org/jira/browse/JCR-2633

The unit test itself is mostly trivial, the problem arrises when you add a
mixin to a node, save it, do this again with another mixin and then remove
both mixins and in the following save the shared item state cache is shrunk,
and the garbage collectors hits at exactly the right time.  In the unit test a
reference to the jcr:mixinTypes property must have been held.

What will happen is that the ItemState of the jcr:mixinTypes property will get
a modification count higher than 0 when addin the mixins.  Because a reference
to the property is kept, it will not be evicted from the primary cache in the
local item state manager.  When removing all mixins, an overlayed state will
be created of this ItemState.  Because the state and overlayed state are
linked, the ItemState won't be dropped from the primary cache of the shared
item state manager.

But is MAY be evicted from the secondary cache implementing the LRU/FIFO
functionality.  If this is the case when while persisting the changes there is
a small window where the overlayed state will be disconnected from the state.
This happens just before collecting the changelog at:
  o.a.j.c.state.ItemState.disconnect():211
  o.a.j.c.state.SessionItemStateManager.disconnectTransientItemState():674
  o.a.j.c.PropertyImpl.makePersistent():143
  o.a.j.c.ItemImpl.persistTransientItems():609
  o.a.j.c.ItemImpl.save():1087
  o.a.j.c.SessionImpl.save():858
Or in the when actually collecting the changelog in one of the methods
Changelog.{modified(),deleted() or both.  I think the latter, but not really
sure.

In any case, this breaks the bondage that prevents the cached state in the
shared item state manager.

Now if the shared item state cache has been shrunk enough and the GC hits
before o.a.j.c.state.SharedItemStateManager.Update.begin():650 when the
disconnected state will be purged from the shared item state cache.  Just
below line 650 the check for stale items will then cause a re-retrieval from
the persisted store of the property.  Because that state will have a
modification count of 0, it will conflict with the modification count of the
mixin property type that is being persisted.

It is true that the GC needs to go off at exactly the right time and the
secondary cache needs to have shrunk to be able to evict the item.  This can
however happen in extreme cases.  The patch that contains the unit test
contains code that will force the purging of the item.

There is still the matter why the modcount comes back at 0 when retrieving the
property from the persistence manager, basically the assumption made in the
code between session, local, and shared item state managers, their caches,
etcetera is that the modification count in the shared item state is always
incremented, and never reset.

There is an apparent contract (partially documented) that the modification
count is to be persisted.  Which is in fact the case for the InMemPersistence-
Manager, but all bundle derived persistence managers do not persist the
jcr:mixinType property at all, but just the mixintypes as part of the
nodedefinition information.  This means that the jcr:mixinTypes property will
always be re-created with modcount of 0, which leads to clashes.

There are multiple approaches to fix this:
- One could prevent the overlayed states being evicted by holding some
  alternative reference to it (patch provided (holdrefs.patch));
- modify the cache implementation at the shared item state manager such that
  it will simply never release states from its secondary cache unless all
  items are also gone from all local item state caches;
- replace the modcount implementation, where instead of counting modifications
  a state is marked dirty.  This is probably too much work, but has the
  advantage of avoiding problems with a modcount which can actually reset
  anyway (it being a short making it not unlikely);
- ensure the modification count of the jcr:mixinTypes is stored by the
  persistence manager (patch provided in jira issue (persistMixinTypes.patch).

I would prefer the latter.  It holds more firm to the JCR model where
meta-data as mixinTypes just properties, and it holds the apparent contract
that modcount is to be persisted.  In my opinion it is unjust that the
mixinTypes property wasn't persisted.  Apart from jcr:mixinTypes, only
jcr:uuid and jcr:primaryType aren't persisted, but those are immutable
therefor they are really in a different league.

The patches also contrain a small modification as suggestion such that the
caching mechanism may be overridden.  Caching strategies are never a one-size
fits all, therefor it is good to make it pluggable.  And a small modification
to make jackrabbit-core compilable with Java5.

\Berry

Re: Modified externally exception due to eviction from shared ism cache

Posted by Stefan Guggisberg <st...@day.com>.
hi berry

On Mon, May 10, 2010 at 9:43 AM, Berry van Halderen
<b....@1hippo.com> wrote:
> With some specifc data sets we've been experiencing some modified
> externally exceptions in Jackrabbit 1.5.7.  I know that these are
> valid exception in cases, but now I've experiencing them in a single
> threaded, no observation, just one writing session (other dorment).
> The modified externally exceptions all happen on a save at the
> jcr:mixinTypes property, but it's not very deterministic.  With
> multiple runs it only happens between 25% to 5% of the runs (the more
> logging one puts in, the less frequent it occurs.  After checking my
> own code I've turned to the jackrabbit code to see it there might be
> an issue there.
>
> My first attempt to resolve the issue was to alight the
> o.a.j.c.NodeImpl.setMixinTypesProperty with the regular setProperty
> code as that seems to be out of sync, but without success.  Therefore
> I no longer think it is directly related to the jcr:mixinTypes
> property, just that I'm doing a lot (thousands) of addMixin,
> removeMixin.  There are other operations (copy nodes, properties set
> and intermediate save to flush state). Digging deeper I stumbled on
> the following thing that seems to be happening:
> - A property A gets retrieved.  It will be cached by both shared Item
> State Manager (ISM) and Local ISM.
> - Property A gets modified and saved, the modcount in the ItemState
> will be upped from the initial valie 0 to 1.
> - Now a lot of other operations occur, Property A will get older and
> older and will be evicted from local and shared ISM.
> - Now it does look like it that A can get evicted from the shared ISM,
> but not the local ISM.  This is unlikely, but this is what is in my
> logs.
> - Property A gets retrieved again from the same session, it is on the
> local ISM cache, with modcount 1.  It gets modified.
> - Now the session state is saved, to persist the state, the shared ISM
> retrieved the state from the persistence manager.
> - the initial modcount of this state will be 0 as it is re-retrieved.
> - the conflict now is that the changelog sent from the Local ISM
> contains a modcount of 1 for property A, while the related state for
> property A from the shared ISM has a modcount 0.  The verification to
> connect the two states will fail with a slate item state exception.
>
> Now I can probably fix it or discuss this, but before going into
> changing this, I'd like to verify with you all I've I'm not completely
> in the wrong on what is happening here.  Is there a crutial step or
> mechanism in place that I'm overlooking?

the SISM cache is reference-based, i.e. items are not evicted as long as they're
currently being referenced. items managed by the LISM do keep a reference
to the underlying (shared) item state.

from the top of my head... the only situation where a local item state
clears its
reference to the underlying shared state is when it is added to the change log.

if there's any chance to reproduce the issue with a junit case using
plain jackrabbit
i'll further investigate.

cheers
stefan

>
> With kind regards,
> Berry van Halderen
>