Posted to oak-dev@jackrabbit.apache.org by Chetan Mehrotra <ch...@gmail.com> on 2013/10/21 12:38:43 UTC

Oak JCR Observation scalability aspects and concerns

Below are the details of the discussion held so far [0] on factors
that may hinder the scaling of JCR Observation. It also touches upon
areas where the usage patterns of applications running on JCR (like
Sling) affect performance.

Concerns
-------------

Marcel - The basic concern raised is that listeners without any filter
would cause lots of reads on the repository. These kinds of listeners
would pull in modifications of all sessions performing distributed
writes. In our view this will not work well because it puts a very
high load on each of the cluster nodes and will likely delay delivery
of events.

Thomas - As for the theoretical bottleneck, it is quite clear: let's
assume there are 1000 cluster nodes, and each one writes 1000 nodes
per second; there would be 1 million events per second on _each_
cluster node, and 1 billion events per second in the system. It
cannot possibly scale. Where exactly the bottleneck is (diffing, creating
the events, whatever) doesn't matter all that much in my view.
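
The arithmetic behind these figures can be written out as a small
sketch. It assumes one observation event per written node and that
every cluster node has at least one unfiltered listener; the class and
method names are made up for illustration.

```java
// Back-of-the-envelope fan-out: with unfiltered listeners, every cluster
// node has to observe every write made anywhere in the cluster.
final class ObservationFanOut {

    /** Events one cluster node must process per second. */
    static long eventsPerNodePerSecond(long clusterNodes, long writesPerNodePerSecond) {
        return clusterNodes * writesPerNodePerSecond;
    }

    /** Events processed per second across the whole cluster. */
    static long eventsInSystemPerSecond(long clusterNodes, long writesPerNodePerSecond) {
        return clusterNodes * eventsPerNodePerSecond(clusterNodes, writesPerNodePerSecond);
    }

    public static void main(String[] args) {
        long nodes = 1000;
        long writes = 1000;
        System.out.println(eventsPerNodePerSecond(nodes, writes));   // 1000000
        System.out.println(eventsInSystemPerSecond(nodes, writes));  // 1000000000
    }
}
```

The total grows with the square of the cluster size, which is why the
exact location of the bottleneck matters less than the pattern itself.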

Data from an Adobe CQ Instance
------------------------------------------------

To get an idea of the kind of JCR observation being used on an Oak-based
system, we pulled in data from a default Adobe CQ setup which uses Oak.

Complete details are provided at [1]. Some observations based on the data:
* Around 20 listeners are registered for the complete repo, i.e. at /
* All listeners are using an admin user session [2]
* Many listeners are using the same set of attributes, i.e. filter
  path+user+isDeep. Roughly 24 unique listeners excluding
  Sling JCR installer [3]


Current implementation
----------------------

The current logic works like this for each listener:

1. ChangeProcessor is notified of two states, before and after
2. It pulls in the tree for the two revisions, restricted to the filter
   path for that listener
3. It then performs a diff, which results in pulling in the state at each
   revision from the persistent store. The diff logic is optimized to
   traverse only changed tree paths
4. The diff result is then filtered based on access checks and
   delivered to the listener

So the observation logic pulls in changes (performs a diff) for each
listener separately. Focusing only on the listeners at root, we still pull
in the same set of information 21 times and then deliver it. This access
pattern would cause such entries to weigh high in the cache (as they are
accessed multiple times), and they might occupy more space than their
actual importance warrants.

If we change the logic such that we have one Oak listener which listens
for changes on root and then delivers them to all JCR listeners (after
filtering by path -> nodeType -> user), we would be able to reduce the
number of calls to the persistent store or cache considerably. So it
changes the current logic, where N listeners independently pull in
changes, to one where we have 1 Oak listener per cluster node which pulls
in changes and then delivers them to all, serving the same role as the
Sling listener does for the OSGi stack.
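
A minimal sketch of the aggregated design, to make the "one diff, many
deliveries" idea concrete. The names AggregatingDispatcher and
Registration are hypothetical and not Oak or JCR API; the diff is a
stand-in that just counts invocations, the path filter is simplified,
and the nodeType and access-control filtering steps are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: hypothetical names, not Oak or JCR API.
final class AggregatingDispatcher {

    /** Stand-in for one registered JCR listener with its path filter. */
    static final class Registration {
        final String path;
        final boolean deep;
        final List<String> received = new ArrayList<>();

        Registration(String path, boolean deep) {
            this.path = path;
            this.deep = deep;
        }

        /** Simplified filter: deep covers the subtree, shallow only direct children. */
        boolean matches(String changedPath) {
            if (changedPath.equals(path)) {
                return true;
            }
            if (deep) {
                return path.equals("/") || changedPath.startsWith(path + "/");
            }
            int idx = changedPath.lastIndexOf('/');
            String parent = idx <= 0 ? "/" : changedPath.substring(0, idx);
            return parent.equals(path);
        }
    }

    private final List<Registration> registrations = new ArrayList<>();
    int diffCount = 0; // how many times the expensive diff ran

    Registration register(String path, boolean deep) {
        Registration r = new Registration(path, deep);
        registrations.add(r);
        return r;
    }

    /** Stand-in for diffing two revisions; the changed paths are simply handed in. */
    private List<String> diff(List<String> changedPaths) {
        diffCount++;
        return changedPaths;
    }

    /** Called once per commit: one diff, then fan-out to every matching registration. */
    void contentChanged(List<String> changedPaths) {
        List<String> changes = diff(changedPaths);
        for (Registration r : registrations) {
            for (String p : changes) {
                if (r.matches(p)) {
                    r.received.add(p);
                }
            }
        }
    }
}
```

With N registrations the diff still runs once per commit, whereas the
per-listener design would run it N times over largely the same data.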

Comments
--------

* Marcel - Having a large number of fine-grained listeners (ones which are
  registered on different paths) is better than having listeners registered
  for the whole repo

* Thomas - Using a dedicated messaging system (maybe JMS based). As far as
  I understand, some Sling / CQ components currently use observation where
  in fact messages wouldn't need to be persisted. At least for some use cases,
  messaging might be an alternative to observation, and would need less
  overhead. I know adding a messaging system would add complexity, but on the
  other hand it might improve performance / scalability because it is
  more flexible than observation.

* Michael Duerig - For layers working above JCR it would help to do some
  analysis of how observation is being used. Something like [4].
  Also, when we aggregate listeners we need to ensure that they are
  compatible in terms of the filter used, user session etc., and it also
  has to take into account the listeners' lifecycle

* Carsten - We need to confirm where exactly the performance problem is:
  Is it when/where the diff is generated?
  When the JCR observation events are created based on the diff?
  When the JCR observation events are processed by the listeners?

  Also we can possibly modify the Sling eventing such that
  a) it uses an Oak API (instead of JCR Observation) for increased
     efficiency and
  b) adds more data to the OSGi events, so that subsequent consumers
     can readily access the data without additional reads.

Next Steps
----------

* Look into the possibility of changing the design to aggregate listeners
  and thus reduce the number of calls to the backend persistent system
* Optimize the default setup for a smaller number of cluster nodes
* Provide a way to scale for a large number of cluster nodes, e.g. by using
  an external messaging system
* Collect more data around the current design to confirm whether this is a
  problem at all :)

If I missed any part, kindly add it to the discussion.

regards
Chetan
[0] Participants - Michael Marth, Thomas Mueller, Marcel Reutegger,
    Michael Duerig, Carsten Ziegeler, Chetan Mehrotra
[1] https://gist.github.com/chetanmeh/7081328
[2] https://gist.github.com/chetanmeh/7081328/raw/listeners-list-filtered.txt
[3] https://git.corp.adobe.com/gist/chetanm/863/raw/listerners-per-path.txt
[4] https://cwiki.apache.org/confluence/display/SLING/Observation+usage+patterns

Re: Oak JCR Observation scalability aspects and concerns

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Tue, Oct 22, 2013 at 4:19 PM, Thomas Mueller <mu...@adobe.com> wrote:
>>This is the contract we have to maintain in Sling.
> Well, we can't maintain this contract, because it blocks scalability....

I tend to agree, and OTOH not everybody will need the kind of
scalability that we're discussing here.

Many of the usage patterns described at [1] can be solved without
observation, in more scalable ways.

Providing a way for users who actually need high scalability to move
away from the "catch all events" patterns, using Jukka's whiteboard
suggestion for example, sounds good to me. That might require work
from such users, but at some point we have to admit that there's no
magic.

-Bertrand

[1] https://cwiki.apache.org/confluence/display/SLING/Observation+usage+patterns#

Re: Oak JCR Observation scalability aspects and concerns

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Tue, Oct 22, 2013 at 9:43 AM, Carsten Ziegeler <cz...@apache.org> wrote:
> This is the contract we have to maintain in Sling.

I repeat from my earlier post:

> Right, it just means that a deployment with such an observer will have
> a built-in scalability limit as at some point the listener will no
> longer be able to keep up with all concurrent writes across a large
> and busy cluster.
>
> For now I'd document this limitation and possibly deprecate the
> JcrResourceListener functionality. A deployment can turn the
> functionality off once it reaches the scalability limit and has
> identified/fixed all affected code.

In addition to the earlier ideas I listed, the proposed
ContentChangeListener is one way we could use to reduce the reliance
of application code on the current JcrResourceListener contract, and
thus make it easier for a deployment to eventually break through that
inherent scalability limit.

BR,

Jukka Zitting

Re: Oak JCR Observation scalability aspects and concerns

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>This is the contract we have to maintain in Sling.

Well, we can't maintain this contract, because it blocks scalability.

Regards,
Thomas

Re: Oak JCR Observation scalability aspects and concerns

Posted by Carsten Ziegeler <cz...@apache.org>.

Sling's jcr listener provides an API/contract - so this is an
infrastructure component used by application code (or other infrastructure
code). And as the listener is delegating the promotion of events to the
EventAdmin, the jcr listener does not know if there are event listeners at
all or what kind of events they are interested in. And most OSGi event
listeners listen for all modification events and do their own filtering.
The events sent out by the listener contain additional information like the
resource type; that's why the listener is reading every created / changed
node.
This is the contract we have to maintain in Sling.

Carsten


2013/10/22 Jukka Zitting <ju...@gmail.com>

> Hi,
>
> On Tue, Oct 22, 2013 at 5:21 AM, Felix Meschberger <fm...@adobe.com>
> wrote:
> > On 22.10.2013 at 11:17, Chetan Mehrotra wrote:
> >> I think in Sling case it would make sense for it to be implemented as
> >> an Observer. And I had a look at implementation of some of the
> >> listener implementations of [1] and I think they can be easily moved
> >> to Sling OSGi events
> >
> > To be discussed on the Sling list -- though wearing my Sling hat I am
> > extremely wary of implementing an Oak-dependency in Sling.
> > Sling uses JCR.
>
> Yet Sling is actively looking to expand its support for other non-JCR
> backends. ;-)
>
> I think we should do the same thing here, i.e. have an
> implementation-independent abstraction in Sling that can be
> implemented both by plain JCR and directly by Oak.
>
> As discussed, the main scalability problem with the current
> JcrResourceListener design is that it needs to handle *all* changes
> and the event producer has no way to know which events really are
> needed. To avoid that problem and to make life easier for most typical
> listeners, I would suggest adding a whiteboard service interface like
> the following:
>
>     interface ContentChangeListener {
>         void contentAdded(String pathToAddedNode);
>         void contentChanged(String pathToChangedNode);
>         void contentRemoved(String pathToRemovedNode);
>     }
>
> By registering such a service with a set of filter properties that
> identify which content changes are of interest, the client will start
> receiving callbacks at these methods whenever such changes are
> detected. The filter properties could be something like this:
>
>     paths - the paths under which to listen for changes
>     types - the types of nodes of interest
>     nodes - the names of nodes of interest
>     properties - the names of properties of interest
>
> For example, the following declaration would result in callbacks
> whenever there's a base version change of a versionable README.txt
> node somewhere under /source subtree:
>
>     paths = "/source"
>     types = "mix:versionable"
>     nodes = "README.txt"
>     properties = "jcr:baseVersion"
>
> Additionally, a "granularity" property could be set to "coarse" to
> indicate that it's fine to deliver events just at the top of a
> modified subtree. For example, if changes are detected both at /foo
> and /foo/bar, a coarsely grained listener would only need to be
> notified at /foo. Setting the property to "fine" would result in
> callbacks for both /foo and /foo/bar.
>
> For proper access controls, the service would also need to have a
> "credentials" property that contains the access credentials to be used
> for determining which events the listener is entitled to.
>
> It should be fairly straightforward to support such a service
> interface both with plain JCR observers and with an Oak Observer, with
> the latter being potentially orders of magnitude faster.
>
> BR,
>
> Jukka Zitting
>
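
To make the quoted filter properties a bit more tangible, here is a
rough sketch of how a producer might evaluate them against a single
change, plus the "coarse" granularity reduction. ListenerFilter and its
methods are hypothetical names, each filter property is simplified to a
single value, and credentials handling is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: a hypothetical evaluation of the proposed filter properties.
final class ListenerFilter {
    final String path;      // "paths" property, single value for simplicity
    final String type;      // "types" property
    final String nodeName;  // "nodes" property
    final String property;  // "properties" property

    ListenerFilter(String path, String type, String nodeName, String property) {
        this.path = path;
        this.type = type;
        this.nodeName = nodeName;
        this.property = property;
    }

    /** True if a change at changedPath satisfies all four filter properties. */
    boolean matches(String changedPath, Set<String> nodeTypes, Set<String> changedProperties) {
        String name = changedPath.substring(changedPath.lastIndexOf('/') + 1);
        boolean underPath = changedPath.equals(path) || changedPath.startsWith(path + "/");
        return underPath
                && nodeTypes.contains(type)
                && name.equals(nodeName)
                && changedProperties.contains(property);
    }

    /** "coarse" granularity: keep only the topmost path of each modified subtree. */
    static List<String> coarse(List<String> changedPaths) {
        List<String> result = new ArrayList<>();
        // Sorted order guarantees an ancestor is visited before its descendants.
        for (String p : new TreeSet<String>(changedPaths)) {
            boolean covered = false;
            for (String top : result) {
                if (top.equals("/") || p.startsWith(top + "/")) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                result.add(p);
            }
        }
        return result;
    }
}
```

For the README.txt example, a change to jcr:baseVersion on
/source/docs/README.txt would match, and coarse() applied to changes at
/foo and /foo/bar would leave only /foo.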



-- 
Carsten Ziegeler
cziegeler@apache.org

Re: Oak JCR Observation scalability aspects and concerns

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Tue, Oct 22, 2013 at 4:30 PM, Carsten Ziegeler <cz...@apache.org> wrote:
> ...4. same as 3. but keep the old Sling API with a bold marker when it's used
> that this does not scale...

That's my favorite choice - along with providing a way for users of
those Sling events to specify more precisely what they actually need,
so that they can improve the situation over time.

-Bertrand

Re: Oak JCR Observation scalability aspects and concerns

Posted by Felix Meschberger <fm...@adobe.com>.

Again: can we please have sling debates on dev@sling ?

Thanks
Felix

Sent from my iPad

> On 22.10.2013 at 16:39, "Carsten Ziegeler" <cz...@apache.org> wrote:
> 
> Just to reiterate :) if we go with 3 or 4, someone has to do the work in
> Sling (and other places) and adapt the code. Obviously, as soon as a
> single listener is using the old pattern, the whole mechanism is moot.
> 
> Carsten
> 
> 
> 2013/10/22 Dominik Süß <do...@gmail.com>
> 
>> +1 on 4 since I fear 3 will create some overhead for existing solutions
>> that won't need this kind of scalability (and therefore create unnecessary
>> efforts for migration).  This is the old "compat" pattern seen so often.
>> 
>> IMHO this should be an extension that "can" be installed but is not
>> available by default (to force devs to decide on that instead of being
>> lazy and not caring about deprecation).
>> 
>> 
>> On Tue, Oct 22, 2013 at 4:30 PM, Carsten Ziegeler <cziegeler@apache.org
>>> wrote:
>> 
>>> I really would like to have a constructive discussion here. I think the
>>> Sling use case is pretty well explained now - that's an api Sling offers
>>> and which is used by a lot of code out there (a great part of Sling is
>>> based on the OSGi events and layers on top of Sling are using it as
>> well).
>>> That's a fact and it's also a fact that listeners for the OSGi events
>>> usually listen for all events.
>>> 
>>> Now basically we have three/four options:
>>> 1. we leave everything as is - it works but might be slow with larger
>>> installations and heavy writes
>>> 2. we maintain the API as-is in Sling and try to make the implementation
>> as
>>> fast as possible
>>> 3. we break compatibility in Sling, find a better solution, rewrite parts
>>> of Sling and require all downstream users to rewrite their stuff
>>> well, the fourth option would be
>>> 4. same as 3. but keep the old Sling API with a bold marker when it's
>> used
>>> that this does not scale
>>> 
>>> For the sake of compatibility I really would like to go with 2 which
>> might
>>> require changes in Sling and Oak but sounds to me as the best compromise.
>>> In addition, it really would help the discussion if we would have
>>> performance tests showing us the real boundaries in terms of scalability
>>> with observation with some real figures.
>>> 
>>> Thanks
>>> Carsten
>>> --
>>> Carsten Ziegeler
>>> cziegeler@apache.org
> 
> 
> 
> -- 
> Carsten Ziegeler
> cziegeler@apache.org

Re: Oak JCR Observation scalability aspects and concerns

Posted by Alexander Klimetschek <ak...@adobe.com>.

I created OAK-1133 for the "Observation listener PLUS" as I proposed it, to separate it from the Sling-specific case.

https://issues.apache.org/jira/browse/OAK-1133

Cheers,
Alex

On 27.10.2013, at 23:57, Thomas Mueller <mu...@adobe.com> wrote:

> Hi,
> 
>> I've created OAK-1120 to start with the simple case.
> 
> OAK-1120 doesn't address scalability aspects.
> 
> Regards,
> Thomas
> 
> On 10/27/13 10:44 AM, "Carsten Ziegeler" <cz...@apache.org> wrote:
> 
>> I've created OAK-1120 to start with the simple case.
>> 
>> Thanks
>> Carsten
>> 
>> 
>> 2013/10/25 Alexander Klimetschek <ak...@adobe.com>
>> 
>>> On 25.10.2013, at 14:11, Alexander Klimetschek <ak...@adobe.com>
>>> wrote:
>>>> Maybe it would be useful to additionally allow a generic "matches"
>>> function that can be passed upon listener registration that could check
>>> whatever it wants, working on the diff or change set directly.
>>> 
>>> Actually it needs to be able to work on the full tree, not just the
>>> diff.
>>> Say you have a listener registered on sling:resourceType=foo, and any
>>> other
>>> property of that node changed, you want the listener to trigger, even
>>> though the sling:resourceType property wasn't modified and isn't in the
>>> diff.
>>> 
>>> Oh, and another big issue used to be that you cannot check for any of
>>> this
>>> in a jcr observation listener in a REMOVED event, since the node is
>>> already
>>> gone (you only got the path - I think there are some dirty tricks we are
>>> using here, caching info etc.). If we move that down into Oak, we
>>> should be
>>> able to have access to the content before it actually gets removed.
>>> Right?
>>> 
>>> Cheers,
>>> Alex
>> 
>> 
>> 
>> 
>> -- 
>> Carsten Ziegeler
>> cziegeler@apache.org
>

Re: Oak JCR Observation scalability aspects and concerns

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

> I've created OAK-1120 to start with the simple case.

OAK-1120 doesn't address scalability aspects.

Regards,
Thomas

On 10/27/13 10:44 AM, "Carsten Ziegeler" <cz...@apache.org> wrote:

>I've created OAK-1120 to start with the simple case.
>
>Thanks
>Carsten
>
>
>2013/10/25 Alexander Klimetschek <ak...@adobe.com>
>
>> On 25.10.2013, at 14:11, Alexander Klimetschek <ak...@adobe.com>
>>wrote:
>> > Maybe it would be useful to additionally allow a generic "matches"
>> function that can be passed upon listener registration that could check
>> whatever it wants, working on the diff or change set directly.
>>
>> Actually it needs to be able to work on the full tree, not just the
>>diff.
>> Say you have a listener registered on sling:resourceType=foo, and any
>>other
>> property of that node changed, you want the listener to trigger, even
>> though the sling:resourceType property wasn't modified and isn't in the
>> diff.
>>
>> Oh, and another big issue used to be that you cannot check for any of
>>this
>> in a jcr observation listener in a REMOVED event, since the node is
>>already
>> gone (you only got the path - I think there are some dirty tricks we are
>> using here, caching info etc.). If we move that down into Oak, we
>>should be
>> able to have access to the content before it actually gets removed.
>>Right?
>>
>> Cheers,
>> Alex
>
>
>
>
>-- 
>Carsten Ziegeler
>cziegeler@apache.org

Re: Oak JCR Observation scalability aspects and concerns

Posted by Carsten Ziegeler <cz...@apache.org>.

I've created OAK-1120 to start with the simple case.

Thanks
Carsten


2013/10/25 Alexander Klimetschek <ak...@adobe.com>

> On 25.10.2013, at 14:11, Alexander Klimetschek <ak...@adobe.com> wrote:
> > Maybe it would be useful to additionally allow a generic "matches"
> function that can be passed upon listener registration that could check
> whatever it wants, working on the diff or change set directly.
>
> Actually it needs to be able to work on the full tree, not just the diff.
> Say you have a listener registered on sling:resourceType=foo, and any other
> property of that node changed, you want the listener to trigger, even
> though the sling:resourceType property wasn't modified and isn't in the
> diff.
>
> Oh, and another big issue used to be that you cannot check for any of this
> in a jcr observation listener in a REMOVED event, since the node is already
> gone (you only got the path - I think there are some dirty tricks we are
> using here, caching info etc.). If we move that down into Oak, we should be
> able to have access to the content before it actually gets removed. Right?
>
> Cheers,
> Alex




-- 
Carsten Ziegeler
cziegeler@apache.org

Re: Oak JCR Observation scalability aspects and concerns

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 25.10.2013, at 14:11, Alexander Klimetschek <ak...@adobe.com> wrote:
> Maybe it would be useful to additionally allow a generic "matches" function that can be passed upon listener registration that could check whatever it wants, working on the diff or change set directly.

Actually it needs to be able to work on the full tree, not just the diff. Say you have a listener registered on sling:resourceType=foo, and any other property of that node changed, you want the listener to trigger, even though the sling:resourceType property wasn't modified and isn't in the diff.

Oh, and another big issue used to be that you cannot check for any of this in a jcr observation listener in a REMOVED event, since the node is already gone (you only got the path - I think there are some dirty tricks we are using here, caching info etc.). If we move that down into Oak, we should be able to have access to the content before it actually gets removed. Right?
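
Roughly what I have in mind, as a sketch: the matcher gets the complete
before and after state of the node rather than only the changed
properties, so it can still evaluate sling:resourceType on a removal.
FullStateMatcher is a made-up name, and plain maps stand in for
whatever node state abstraction Oak would actually hand over.

```java
import java.util.Map;

// Sketch only: property maps stand in for the node state before and after a commit.
final class FullStateMatcher {

    /**
     * Matches on the node's resource type using the full state, not the diff.
     * For a removed node the after state is null, so the before state is used.
     */
    static boolean matches(String expectedType, Map<String, String> before, Map<String, String> after) {
        Map<String, String> state = after != null ? after : before;
        return state != null && expectedType.equals(state.get("sling:resourceType"));
    }
}
```

A change to any other property of a node with sling:resourceType=foo
would then trigger the listener, and so would the removal of that node.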

Cheers,
Alex

Re: Oak JCR Observation scalability aspects and concerns

Posted by Alexander Klimetschek <ak...@adobe.com>.

Hi,

IMHO it would be nice to support more matching functionality for events down in Oak. See my mail on the Sling thread [0].

The general advantage would be that for non-matches there would be no event created at all, no thread started, and none of whatever else is involved. I assume that could happen most efficiently in the diff of the editing session already (or similar).

The use cases are (taken from the Adobe CQ workflow launcher, should overlap with Sling's jcr listener):
- paths with globbing support (for example /content/foo/*/something)
- check for property values (equal, not equal, contains etc.), most importantly sling:resourceType in Sling apps
- allow to check properties on child nodes as well, typically jcr:content
- node types (already in jcr observation)
- created/modified/deleted events, separate from move/copy
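
As an illustration of the first item, a path glob check could look like
the sketch below. It assumes that * stands for exactly one path segment,
which is my reading of the example and not a defined Oak or JCR
semantic; PathGlob is a made-up name.

```java
// Sketch only: "*" is assumed to match exactly one non-empty path segment.
final class PathGlob {

    static boolean matches(String glob, String path) {
        String[] g = glob.split("/");
        String[] p = path.split("/");
        if (g.length != p.length) {
            return false;
        }
        for (int i = 0; i < g.length; i++) {
            boolean ok = g[i].equals("*") ? !p[i].isEmpty() : g[i].equals(p[i]);
            if (!ok) {
                return false;
            }
        }
        return true;
    }
}
```

With that, /content/foo/*/something matches /content/foo/bar/something
but not /content/foo/a/b/something.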

Regarding move events, afaiu you often want to filter them out, to avoid reprocessing things if the contents of a node tree didn't actually change but just moved around. So having a created/modified event that does not get triggered by a move or copy would be required. (Probably repeating here, just wanted to clarify).

Maybe it would be useful to additionally allow a generic "matches" function that can be passed upon listener registration that could check whatever it wants, working on the diff or change set directly. Of course one would have to program it carefully to make sure it's fast, but at least it would run as early as possible.

Still, the common things like check on property etc. should be pre-implemented to avoid every listener having to come up with its own.

Maybe this can all be seen as a custom extension to Oak, but it would make sense to standardize it, since it is so common. Also, the part where the listener is triggered probably involves some thread work that should definitely not be left to the application level.

WDYT?

[0] http://markmail.org/message/dvs2o5p5x45fjyxy (and following)

Cheers,
Alex

On 25.10.2013, at 06:14, Angela Schreiber <an...@adobe.com> wrote:

> hi bertrand
> 
> +1 for everything you said.
> 
> in general i think that we have split that section in the docu:
> 
> a) summary from OAK point of view (for those that know the very details)
> b) summary from a JCR/Jackrabbit point of view. IMHO we can't expect
>   our user community to understand how these subtle differences
>   affect the behaviour of the JCR implementation and the event handling.
>   a more comprehensive list would IMO make sense in order to allow
>   users to get a feeling what they can expect.
> 
> in general i am sure michael has the best overview as he is
> implementing observation in OAK and knows best which cases will
> work and where the differences are. afaik he is also a sling
> committer :-)
> 
> kind regards
> angela
> 
> 
> 
> On 10/25/13 2:37 PM, "Bertrand Delacretaz" <bd...@apache.org> wrote:
> 
>> Hi Angela,
>> 
>> On Fri, Oct 25, 2013 at 11:21 AM, Angela Schreiber <an...@adobe.com>
>> wrote:
>>> Bertrand wrote:
>>>> The OSGi events that Sling rebroadcasts are less granular than JCR
>>>> events, so this might not be a problem for that case.
>>> 
>>> do you know it or are you guessing?...
>> 
>> I'm making an educated guess.
>> 
>> To get hard facts, as you rightly ask for, we need a test suite that
>> compares those OSGi events that Sling rebroadcasts, when doing various
>> things on Jackrabbit and Oak. I don't think we'll ever get 100%
>> identical observation behavior between Jackrabbit and Oak, so we'll
>> need to define which differences are acceptable. To me the only sane
>> way to do that is via a test suite.
>> 
>> This can be added to the Sling it-jackrabbit-oak module [1] which
>> makes it easy to run the exact same tests against Jackrabbit and Oak
>> in an OSGi environment, but someone will need to write those tests. We
>> might want to grant write access to both Sling + Oak committers to
>> that module to make things easier.
>> 
>> Thanks for the examples that you mention. I see the referenceable node
>> issue at http://jackrabbit.apache.org/oak/docs/differences.html, but
>> not the setPrimaryType one, could it be added?
>> 
>> Ciao,
>> -Bertrand
>> 
>> [1] 
>> http://svn.apache.org/repos/asf/sling/trunk/bundles/jcr/it-jackrabbit-oak/
>

Re: Oak JCR Observation scalability aspects and concerns

Posted by Angela Schreiber <an...@adobe.com>.

hi bertrand

+1 for everything you said.

in general i think that we have split that section in the docu:

a) summary from OAK point of view (for those that know the very details)
b) summary from a JCR/Jackrabbit point of view. IMHO we can't expect
   our user community to understand how these subtle differences
   affect the behaviour of the JCR implementation and the event handling.
   a more comprehensive list would IMO make sense in order to allow
   users to get a feeling what they can expect.

in general i am sure michael has the best overview as he is
implementing observation in OAK and knows best which cases will
work and where the differences are. afaik he is also a sling
committer :-)

kind regards
angela

On 10/25/13 2:37 PM, "Bertrand Delacretaz" <bd...@apache.org> wrote:

>Hi Angela,
>
>On Fri, Oct 25, 2013 at 11:21 AM, Angela Schreiber <an...@adobe.com>
>wrote:
>> Bertrand wrote:
>>>The OSGi events that Sling rebroadcasts are less granular than JCR
>>>events, so this might not be a problem for that case.
>>
>> do you know it or are you guessing?...
>
>I'm making an educated guess.
>
>To get hard facts, as you rightly ask for, we need a test suite that
>compares those OSGi events that Sling rebroadcasts, when doing various
>things on Jackrabbit and Oak. I don't think we'll ever get 100%
>identical observation behavior between Jackrabbit and Oak, so we'll
>need to define which differences are acceptable. To me the only sane
>way to do that is via a test suite.
>
>This can be added to the Sling it-jackrabbit-oak module [1] which
>makes it easy to run the exact same tests against Jackrabbit and Oak
>in an OSGi environment, but someone will need to write those tests. We
>might want to grant write access to both Sling + Oak committers to
>that module to make things easier.
>
>Thanks for the examples that you mention. I see the referenceable node
>issue at http://jackrabbit.apache.org/oak/docs/differences.html, but
>not the setPrimaryType one, could it be added?
>
>Ciao,
>-Bertrand
>
>[1] 
>http://svn.apache.org/repos/asf/sling/trunk/bundles/jcr/it-jackrabbit-oak/

Re: Oak JCR Observation scalability aspects and concerns

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi Angela,

On Fri, Oct 25, 2013 at 11:21 AM, Angela Schreiber <an...@adobe.com> wrote:
> Bertrand wrote:
>>The OSGi events that Sling rebroadcasts are less granular than JCR
>>events, so this might not be a problem for that case.
>
> do you know it or are you guessing?...

I'm making an educated guess.

To get hard facts, as you rightly ask for, we need a test suite that
compares those OSGi events that Sling rebroadcasts, when doing various
things on Jackrabbit and Oak. I don't think we'll ever get 100%
identical observation behavior between Jackrabbit and Oak, so we'll
need to define which differences are acceptable. To me the only sane
way to do that is via a test suite.

This can be added to the Sling it-jackrabbit-oak module [1] which
makes it easy to run the exact same tests against Jackrabbit and Oak
in an OSGi environment, but someone will need to write those tests. We
might want to grant write access to both Sling + Oak committers to
that module to make things easier.

Thanks for the examples that you mention. I see the referenceable node
issue at http://jackrabbit.apache.org/oak/docs/differences.html, but
not the setPrimaryType one, could it be added?

Ciao,
-Bertrand

[1] http://svn.apache.org/repos/asf/sling/trunk/bundles/jcr/it-jackrabbit-oak/

Re: Oak JCR Observation scalability aspects and concerns

Posted by Angela Schreiber <an...@adobe.com>.

hi bertrand

>On Fri, Oct 25, 2013 at 8:53 AM, Angela Schreiber <an...@adobe.com>
>wrote:
>> ...the biggest challenge i see in terms of backwards compatibility is
>> that the diff-mechanism in OAK doesn't allow a 1:1 translation
>> to JCR events as they used to be generated in jackrabbit-core...
>
>The OSGi events that Sling rebroadcasts are less granular than JCR
>events, so this might not be a problem for that case.

do you know it or are you guessing?

carsten writes:

"1. Compacts observation events into node added, node changed, and
 node removed events [...]"

and an example where oak differs from jackrabbit-core:
on the diff you will neither get nodeAdded nor nodeRemoved if a
referenceable node is being removed and a new one is being added
with the same name. the only thing that changes is the jcr:uuid property
and you can't find out if this was due to a change of the property
(which was possible on the OAK API) or if the parent node has been removed
and re-added. the reason is that the diff doesn't know about referenceable
nodes and thus ignores that from a JCR point of view these 2 nodes are not
the same.

another example:
there exists Node#setPrimaryType which will modify the jcr:primaryType
property. in the diff you can't find out if someone called
Node#setPrimaryType or if someone called Node#remove followed by
Node#addNode(previousName, newPrimaryType).
in jackrabbit the add/remove scenario always was add + remove because
every single node had a unique identifier... thus the 2 use cases were
easy to distinguish.
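
a toy sketch of why a diff that only compares states cannot tell these
cases apart. this is not the OAK diff implementation; StateDiff is a
made-up class that compares child name -> jcr:uuid maps of one parent
before and after a commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Sketch only: compares two snapshots, knows nothing about the operations in between.
final class StateDiff {

    /** Each map holds child node name -> jcr:uuid for the same parent node. */
    static List<String> diff(Map<String, String> before, Map<String, String> after) {
        List<String> out = new ArrayList<>();
        for (String name : new TreeSet<String>(before.keySet())) {
            if (!after.containsKey(name)) {
                out.add("nodeRemoved " + name);
            } else if (!before.get(name).equals(after.get(name))) {
                out.add("propertyChanged " + name + "/jcr:uuid");
            }
        }
        for (String name : new TreeSet<String>(after.keySet())) {
            if (!before.containsKey(name)) {
                out.add("nodeAdded " + name);
            }
        }
        return out;
    }
}
```

with before = {a: u1} and after = {a: u2} the only result is
"propertyChanged a/jcr:uuid", no matter whether "a" was removed and
re-added or its uuid property was overwritten.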

so, to my knowledge it's currently not possible to reliably generate
the exact same result for the OSGi events when running on OAK compared
to Jackrabbit (irrespective of whether you do it on the OAK level or
reading JCR events).

and what i am asking for is that we stop speculating whether it might
be a problem or not but finally get hard facts.

kind regards
angela

>
>If at some point we can say "low-level JCR events behave slightly
>differently between Oak and Jackrabbit, but the higher-level OSGi
>events that Sling provides are the same" that might be a realistic
>compromise.
>
>> ...we keep getting mixed signals
>> ("be fully compatible" vs "just make it scalable and fast") and need
>> get a more reliable feedback to in...
>
>So maybe being fully compatible (*) at the Sling level, and scalable
>and fast at the JCR level, is a good way forward.
>
>-Bertrand
>
>(*) might require some Sling client code to indicate more precisely
>which events it wants to listen to, but that's a reasonable price to
>pay IMO

Re: Oak JCR Observation scalability aspects and concerns

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi,

On Fri, Oct 25, 2013 at 8:53 AM, Angela Schreiber <an...@adobe.com> wrote:
> ...the biggest challenge i see in terms of backwards compatibility is
> that the diff-mechanism in OAK doesn't allow a 1:1 translation
> to JCR events as they used to be generated in jackrabbit-core...

The OSGi events that Sling rebroadcasts are less granular than JCR
events, so this might not be a problem for that case.

If at some point we can say "low-level JCR events behave slightly
differently between Oak and Jackrabbit, but the higher-level OSGi
events that Sling provides are the same" that might be a realistic
compromise.

> ...we keep getting mixed signals
> ("be fully compatible" vs "just make it scalable and fast") and need
> get a more reliable feedback to in...

So maybe being fully compatible (*) at the Sling level, and scalable
and fast at the JCR level, is a good way forward.

-Bertrand

(*) might require some Sling client code to indicate more precisely
which events it wants to listen to, but that's a reasonable price to
pay IMO

Re: Oak JCR Observation scalability aspects and concerns

Posted by Angela Schreiber <an...@adobe.com>.

hi carsten

IMO that should be feasible as the oak editor/commithook interfaces
(and the diff they may make use of) will provide you with the
required information (e.g. childNodeChanged takes both the nodestate
before and after the modification as params).

the biggest challenge i see in terms of backwards compatibility is
that the diff-mechanism in OAK doesn't allow a 1:1 translation
to JCR events as they used to be generated in jackrabbit-core.

however, since we have to be as backwards compatible as possible
under the given circumstances, the sling way of gathering that event
information would IMO be the same as, or very similar to, the one we
will use to generate the JCR events...

we had some discussions during this week's oakathon on how
to get there and how big the impact of changes in that area might
be; in particular but not exclusively when it comes to NODE_MOVED
events (see also OAK-783 [0]). michael consequently updated the
"Differences to Jackrabbit 2" documentation at [1]. in case of
doubt how a given change may impact Sling (or any other application
using JCR events) please drop a line on the list.

IMO it's crucial that we find out if the impact of these changes
is acceptable and what the consequences are in terms of backwards
compatibility and migration effort... we keep getting mixed signals
("be fully compatible" vs "just make it scalable and fast") and need
to get more reliable feedback in order to be able to calculate
the risk of these changes and take the right decisions for the
next couple of months.

kind regards
angela

[0] https://issues.apache.org/jira/browse/OAK-783
[1] http://jackrabbit.apache.org/oak/docs/differences.html

On 10/25/13 8:03 AM, "Carsten Ziegeler" <cz...@apache.org> wrote:

>Rethinking this, I think it's clear that the current way the jcr listener
>works in Sling is not optimal as it reads each and every changed node. So
>I think it really would be great if we could directly get the information
>from Oak without the need to additionally process.
>
>The listener currently:
>1. Compacts observation events into node added, node changed, and node
>removed events - it also collects added/changed/removed properties from
>the events.
>2. For every added and changed node, it reads the node and the primary
>node type, the sling:resourceType and sling:resourceSuperType property
>3. There is a special case, if a node has been added/changed with the name
>"jcr:content" and the parent node is a file node, then the parent node is
>reported as changed/added.
>
>I've no idea how the current diff mechanism works in Oak, but I would
>assume that the information for 1. is similar to the result of the diff.
>I would hope that the properties for 2. are there as well as they need to
>be diffed against the old value. 3. is more tricky and specific, and I think
>it would be ok if we would do this in Sling as the ratio of files vs other
>content is pretty low usually.
>
>If Oak could provide this information, we could leverage that in Sling and
>would remove the additional reads - which clearly is an improvement. This
>would require zero changes in applications based on Sling. If Sling is run
>on Oak, this implementation is used, otherwise the current JCR-based one is
>used.
>
>WDYT is there any chance to get this done?
>
>Thanks
>Carsten

Re: Oak JCR Observation scalability aspects and concerns

Posted by Carsten Ziegeler <cz...@apache.org>.

Rethinking this, I think it's clear that the current way the jcr listener
works in Sling is not optimal as it reads each and every changed node. So I
think it really would be great if we could directly get the information
from Oak without the need to additionally process.

The listener currently:
1. Compacts observation events into node added, node changed, and node
removed events - it also collects added/changed/removed properties from the
events.
2. For every added and changed node, it reads the node and the primary node
type, the sling:resourceType and sling:resourceSuperType property
3. There is a special case, if a node has been added/changed with the name
"jcr:content" and the parent node is a file node, then the parent node is
reported  as changed/added.
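
Step 1 of the list above can be illustrated with a minimal Java sketch.
It is not the actual JcrResourceListener code: RawEvent, NodeChange and
EventCompactor are hypothetical stand-ins, the three property event kinds
are folded into a single set of property names for brevity, and steps 2
and 3 (reading the node, the jcr:content special case) are left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified raw observation event (real JCR events carry more detail).
final class RawEvent {
    enum Type { NODE_ADDED, NODE_REMOVED, PROPERTY_ADDED, PROPERTY_CHANGED, PROPERTY_REMOVED }

    final Type type;
    final String path;

    RawEvent(Type type, String path) {
        this.type = type;
        this.path = path;
    }
}

// Hypothetical compacted result: one entry per node.
final class NodeChange {
    enum Kind { ADDED, CHANGED, REMOVED }

    Kind kind = Kind.CHANGED;                          // default when only properties changed
    final Set<String> properties = new TreeSet<>();    // names of touched properties
}

final class EventCompactor {
    // Groups raw events by the node they belong to, keeping first-seen order.
    static Map<String, NodeChange> compact(List<RawEvent> events) {
        Map<String, NodeChange> byNode = new LinkedHashMap<>();
        for (RawEvent e : events) {
            boolean propertyEvent = e.type.name().startsWith("PROPERTY_");
            int slash = e.path.lastIndexOf('/');
            // A property event is attributed to its parent node.
            String nodePath = propertyEvent
                    ? (slash == 0 ? "/" : e.path.substring(0, slash))
                    : e.path;
            NodeChange change = byNode.computeIfAbsent(nodePath, p -> new NodeChange());
            if (e.type == RawEvent.Type.NODE_ADDED) {
                change.kind = NodeChange.Kind.ADDED;
            } else if (e.type == RawEvent.Type.NODE_REMOVED) {
                change.kind = NodeChange.Kind.REMOVED;
            } else {
                change.properties.add(e.path.substring(slash + 1));
            }
        }
        return byNode;
    }
}
```

In this sketch a node with only property events ends up as CHANGED, while
a NODE_ADDED or NODE_REMOVED event for the same path overrides that kind.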

I've no idea how the current diff mechanism works in Oak, but I would
assume that the information for 1. is similar to the result of the diff. I
would hope that the properties for 2. are there as well as they need to be
diffed against the old value. 3. is more tricky and specific, and I think
it would be ok if we would do this in Sling as the ratio of files vs other
content is pretty low usually.

If Oak could provide this information, we could leverage that in Sling and
would remove the additional reads - which clearly is an improvement. This
would require zero changes in applications based on Sling. If Sling is run
on Oak, this implementation is used, otherwise the current JCR-based one is
used.

WDYT is there any chance to get this done?

Thanks
Carsten

Re: Oak JCR Observation scalability aspects and concerns

Posted by Dominik Süß <do...@gmail.com>.

I just opened a thread at sling-dev for further discussion about api and
implementation changes on sling side [0]

For discussions around usage of this api within sling please use this
linked thread [0].

Best regards
Dominik


[0] markmail.org/thread/plb7ledhsna33r3g


On Tue, Oct 22, 2013 at 4:54 PM, Jukka Zitting <ju...@gmail.com> wrote:

> Hi,
>
> On Tue, Oct 22, 2013 at 10:39 AM, Carsten Ziegeler <cz...@apache.org>
> wrote:
> > Just to reiterate :) if we go with 3 or 4, someone has to do the work in
> > Sling (and other places) and adapt the code. Obviously, as soon as a
> > single listener is using the old pattern, the whole mechanism is moot.
>
> I think we can lay the groundwork with tools like the ones outlined in
> this thread, and postpone much of the required refactoring work to
> when we do have the appropriate benchmarks in place and a use case
> where such scale is needed in practice. At that point we can also make
> a more reasoned judgement of whether option 2 or 4 is the better
> solution for that particular case, i.e. are we still within such scale
> that normal optimization is good enough and broader design changes
> aren't needed. And we'll also have someone with a bad enough itch to
> scratch.
>
> As for 3 vs. 4, I think option 3 is clearly unworkable, as there's no
> immediate need to break backwards compatibility for most normal
> deployments. I'd go for option 4 of keeping the current mechanism with
> a note detailing the scalability issue and instructions on how to
> prepare for avoiding it. Note that option 4 is compatible with 2, as
> we can proceed on both fronts concurrently.
>
> BR,
>
> Jukka Zitting
>

Re: Oak JCR Observation scalability aspects and concerns

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Tue, Oct 22, 2013 at 10:39 AM, Carsten Ziegeler <cz...@apache.org> wrote:
> Just to reiterate :) if we go with 3 or 4, someone has to do the work in
> Sling (and other places) and adapt the code. Obviously, as soon as a
> single listener is using the old pattern, the whole mechanism is moot.

I think we can lay the groundwork with tools like the ones outlined in
this thread, and postpone much of the required refactoring work to
when we do have the appropriate benchmarks in place and a use case
where such scale is needed in practice. At that point we can also make
a more reasoned judgement of whether option 2 or 4 is the better
solution for that particular case, i.e. are we still within such scale
that normal optimization is good enough and broader design changes
aren't needed. And we'll also have someone with a bad enough itch to
scratch.

As for 3 vs. 4, I think option 3 is clearly unworkable, as there's no
immediate need to break backwards compatibility for most normal
deployments. I'd go for option 4 of keeping the current mechanism with
a note detailing the scalability issue and instructions on how to
prepare for avoiding it. Note that option 4 is compatible with 2, as
we can proceed on both fronts concurrently.

BR,

Jukka Zitting

Re: Oak JCR Observation scalability aspects and concerns

Posted by Carsten Ziegeler <cz...@apache.org>.

Just to reiterate :) if we go with 3 or 4, someone has to do the work in
Sling (and other places) and adapt the code. Obviously, as soon as a
single listener is using the old pattern, the whole mechanism is moot.

Carsten


2013/10/22 Dominik Süß <do...@gmail.com>

> +1 on 4 since I fear 3 will create some overhead for existing solutions
> that won't need this kind of scalability (and therefore create unnecessary
> efforts for migration).  This is the old "compat" pattern seen so often.
>
>  IMHO this should be an extension that "can" be installed but is not
> available by default (to force devs to decide on that instead of being lazy
> and not caring about deprecation).
>
>
> On Tue, Oct 22, 2013 at 4:30 PM, Carsten Ziegeler <cziegeler@apache.org
> >wrote:
>
> > I really would like to have a constructive discussion here. I think the
> > Sling use case is pretty well explained now - that's an api Sling offers
> > and which is used by a lot of code out there (a great part of Sling is
> > based on the OSGi events and layers on top of Sling are using it as
> well).
> > That's a fact and it's also a fact that listeners for the OSGi event
> > usually listen for all events.
> >
> > Now basically we have three/four options:
> > 1. we leave everything as is - it works but might be slow with larger
> > installations and heavy writes
> > 2. we maintain the API as-is in Sling and try to make the implementation
> as
> > fast as possible
> > 3. we break compatibility in Sling, find a better solution, rewrite parts
> > of Sling and require all downstream users to rewrite their stuff
> > well, the fourth option would be
> > 4. same as 3. but keep the old Sling API with a bold marker when it's
> used
> > that this does not scale
> >
> > For the sake of compatibility I really would like to go with 2 which
> might
> > require changes in Sling and Oak but sounds to me as the best compromise.
> > In addition, it really would help the discussion if we would have
> > performance tests showing us the real boundaries in terms of scalability
> > with observation with some real figures.
> >
> > Thanks
> > Carsten
> > --
> > Carsten Ziegeler
> > cziegeler@apache.org
> >
>



-- 
Carsten Ziegeler
cziegeler@apache.org

Re: Oak JCR Observation scalability aspects and concerns

Posted by Dominik Süß <do...@gmail.com>.

+1 on 4 since I fear 3 will create some overhead for existing solutions
that won't need this kind of scalability (and therefore create unnecessary
efforts for migration).  This is the old "compat" pattern seen so often.

 IMHO this should be an extension that "can" be installed but is not
available by default (to force devs to decide on that instead of being lazy
and not caring about deprecation).


On Tue, Oct 22, 2013 at 4:30 PM, Carsten Ziegeler <cz...@apache.org> wrote:

> I really would like to have a constructive discussion here. I think the
> Sling use case is pretty well explained now - that's an api Sling offers
> and which is used by a lot of code out there (a great part of Sling is
> based on the OSGi events and layers on top of Sling are using it as well).
> That's a fact and it's also a fact that listeners for the OSGi event
> usually listen for all events.
>
> Now basically we have three/four options:
> 1. we leave everything as is - it works but might be slow with larger
> installations and heavy writes
> 2. we maintain the API as-is in Sling and try to make the implementation as
> fast as possible
> 3. we break compatibility in Sling, find a better solution, rewrite parts
> of Sling and require all downstream users to rewrite their stuff
> well, the fourth option would be
> 4. same as 3. but keep the old Sling API with a bold marker when it's used
> that this does not scale
>
> For the sake of compatibility I really would like to go with 2 which might
> require changes in Sling and Oak but sounds to me as the best compromise.
> In addition, it really would help the discussion if we would have
> performance tests showing us the real boundaries in terms of scalability
> with observation with some real figures.
>
> Thanks
> Carsten
> --
> Carsten Ziegeler
> cziegeler@apache.org
>

Re: Oak JCR Observation scalability aspects and concerns

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>I really would like to have a constructive discussion here.

Sure. If we want a fully scalable solution, Sling needs to be changed.

> 1. we leave everything as is
> 2. we maintain the API as-is in Sling and try to make the implementation
>as fast as possible

This will limit scalability so I think it's not an option. Making things
as fast as possible will only help a little bit, but doesn't solve the
problem. Therefore, I wouldn't invest too much time trying to do that.

> 4. same as 3. but keep the old Sling API with a bold marker when it's
>used that this does not scale


Sure, this is what is needed.

We also need to change the Jackrabbit API to support listening for local
events. As I already wrote, the current mechanism using the "isExternal()"
flag doesn't scale. I think we should add a marker interface
"LocalObservationListener" or similar.
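
A minimal Java sketch of how such a marker interface might be used. No
such interface exists in the Jackrabbit API at this point;
ChangeListener is a simplified stand-in for the JCR EventListener, and
ObservationDispatcher is a hypothetical dispatcher invented for
illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for javax.jcr.observation.EventListener so the sketch compiles on its own.
interface ChangeListener {
    void onChange(String path);
}

// Hypothetical marker interface: declares nothing, only signals "local events only".
interface LocalObservationListener extends ChangeListener {
}

// Hypothetical dispatcher showing how the marker could be honoured.
final class ObservationDispatcher {
    private final List<ChangeListener> listeners = new ArrayList<>();

    void register(ChangeListener listener) {
        listeners.add(listener);
    }

    // True if at least one listener wants changes from other cluster nodes.
    // If this is false, external changes would not need to be diffed at all.
    boolean needsExternalChanges() {
        for (ChangeListener listener : listeners) {
            if (!(listener instanceof LocalObservationListener)) {
                return true;
            }
        }
        return false;
    }

    // external is true when the change was committed on another cluster node.
    void dispatch(String path, boolean external) {
        for (ChangeListener listener : listeners) {
            if (external && listener instanceof LocalObservationListener) {
                continue; // local-only listeners never see external changes
            }
            listener.onChange(path);
        }
    }
}
```

The difference to checking "isExternal()" inside the listener is that the
decision is known at registration time, so the work of producing external
events can be skipped instead of being done and then thrown away.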

Regards,
Thomas

Re: Oak JCR Observation scalability aspects and concerns

Posted by Carsten Ziegeler <cz...@apache.org>.

I really would like to have a constructive discussion here. I think the
Sling use case is pretty well explained now - that's an api Sling offers
and which is used by a lot of code out there (a great part of Sling is
based on the OSGi events and layers on top of Sling are using it as well).
That's a fact and it's also a fact that listeners for the OSGi event
usually listen for all events.

Now basically we have three/four options:
1. we leave everything as is - it works but might be slow with larger
installations and heavy writes
2. we maintain the API as-is in Sling and try to make the implementation as
fast as possible
3. we break compatibility in Sling, find a better solution, rewrite parts
of Sling and require all downstream users to rewrite their stuff
well, the fourth option would be
4. same as 3. but keep the old Sling API with a bold marker when it's used
that this does not scale

For the sake of compatibility I really would like to go with 2 which might
require changes in Sling and Oak but sounds to me as the best compromise.
In addition, it really would help the discussion if we would have
performance tests showing us the real boundaries in terms of scalability
with observation with some real figures.

Thanks
Carsten
-- 
Carsten Ziegeler
cziegeler@apache.org

Re: Oak JCR Observation scalability aspects and concerns

Posted by Dominik Süß <do...@gmail.com>.

Hi :)

Speaking as developer using the Sling eventing I just wanted to add that in
most cases there are restrictions on Paths (most times not just one but
multiple "searchpaths") and on a resourceType (not just exact match but a
set or pattern to identify a set of resourceTypes) and in some occasions
further constraints like existence or a specific value of a specific
property. Currently it is up to the user to do this check within the
EventListener, but I think it would be feasible to register a listener that
defines those checks that can be processed by the implementation at a low
level. But it would be good to ask around in the sling community how these
events are used in production to be sure not to miss essential patterns.

Cheers
Dominik

On Tue, Oct 22, 2013 at 4:20 PM, Jukka Zitting <ju...@gmail.com> wrote:

> Hi,
>
> On Tue, Oct 22, 2013 at 9:59 AM, Felix Meschberger <fm...@adobe.com>
> wrote:
> > That's one Event object per event -- not one event per listener per
> event.
> > This is completely different to JCR.
>
> You're mistaking the problem here, it's not the number of listeners,
> it's the number of events *per listener*.
>
> What we're looking at here is scaling out to write loads that could
> well have over a million changed items per second. On my laptop just
> instantiating a dummy Event object takes a few hundred nanoseconds, so
> there's no way to process millions of them per second in a single
> listener.
>
> BR,
>
> Jukka Zitting
>

Re: Oak JCR Observation scalability aspects and concerns

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Tue, Oct 22, 2013 at 9:59 AM, Felix Meschberger <fm...@adobe.com> wrote:
> That's one Event object per event -- not one event per listener per event.
> This is completely different to JCR.

You're mistaking the problem here, it's not the number of listeners,
it's the number of events *per listener*.

What we're looking at here is scaling out to write loads that could
well have over a million changed items per second. On my laptop just
instantiating a dummy Event object takes a few hundred nanoseconds, so
there's no way to process millions of them per second in a single
listener.

BR,

Jukka Zitting

Re: Oak JCR Observation scalability aspects and concerns

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>That's one Event object per event -- not one event per listener per
>event. This is completely different to JCR.

Well, this still doesn't scale, if every cluster node needs all events. It
doesn't matter how many observation listeners you have.

To block scalability, it's enough to have one observation listener per
cluster node that listens to all events in the whole cluster.

Regards,
Thomas

Re: Oak JCR Observation scalability aspects and concerns

Posted by Felix Meschberger <fm...@adobe.com>.

Hi

Am 22.10.2013 um 15:52 schrieb Jukka Zitting:

> Hi,
> 
> On Tue, Oct 22, 2013 at 9:41 AM, Felix Meschberger <fm...@adobe.com> wrote:
>> The JcrResourceListener just gets JCR Observation events, creates the OSGi Event objects
>> and hands them over for distribution by the OSGi EventAdmin service. The latter service is
>> then responsible for dispatching taking the EventHandler service registration properties into
>> account for filtering.
> 
> Right, but the problem here is that this design can't scale out to
> millions of events per second, as it requires each individual OSGi
> Event to be instantiated before filtering. The proposed service
> interface doesn't need to do that, so it can achieve a much higher
> throughput, and since no event queue is needed, there's no need to
> worry about the queue filling up.

That's one Event object per event -- not one event per listener per event. This is completely different to JCR.

Regards
Felix

> 
> BR,
> 
> Jukka Zitting

Re: Oak JCR Observation scalability aspects and concerns

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Tue, Oct 22, 2013 at 9:41 AM, Felix Meschberger <fm...@adobe.com> wrote:
> The JcrResourceListener just gets JCR Observation events, creates the OSGi Event objects
> and hands them over for distribution by the OSGi EventAdmin service. The latter service is
> then responsible for dispatching taking the EventHandler service registration properties into
> account for filtering.

Right, but the problem here is that this design can't scale out to
millions of events per second, as it requires each individual OSGi
Event to be instantiated before filtering. The proposed service
interface doesn't need to do that, so it can achieve a much higher
throughput, and since no event queue is needed, there's no need to
worry about the queue filling up.

BR,

Jukka Zitting

Re: Oak JCR Observation scalability aspects and concerns

Posted by Felix Meschberger <fm...@adobe.com>.

Hi

Am 22.10.2013 um 15:27 schrieb Jukka Zitting:

> Hi,
> 
> On Tue, Oct 22, 2013 at 5:21 AM, Felix Meschberger <fm...@adobe.com> wrote:
>> Am 22.10.2013 um 11:17 schrieb Chetan Mehrotra:
>>> I think in Sling case it would make sense for it to be implemented as
>>> an Observer. And I had a look at implementation of some of the
>>> listener implementations of [1] and I think they can be easily moved
>>> to Sling OSGi events
>> 
>> To be discussed on the Sling list -- though wearing my Sling hat I am
>> extremely wary of implementing an Oak-dependency in Sling.
>> Sling uses JCR.
> 
> Yet Sling is actively looking to expand its support for other non-JCR
> backends. ;-)


That raises an interesting question, though: what is the relationship of Oak to JCR?

> 
> I think we should do the same thing here, i.e. have an
> implementation-independent abstraction in Sling that can be
> implemented both by plain JCR and directly by Oak.
> 
> As discussed, the main scalability problem with the current
> JcrResourceListener design is that it needs to handle *all* changes
> and the event producer has no way to know which events really are
> needed. To avoid that problem and to make life easier for most typical
> listeners, I would suggest adding a whiteboard service interface like
> the following:

What you are describing is already implemented in the JcrResourceListener and the OSGi EventAdmin service ;-)

The JcrResourceListener just gets JCR Observation events, creates the OSGi Event objects and hands them over for distribution by the OSGi EventAdmin service. The latter service is then responsible for dispatching, taking the EventHandler service registration properties into account for filtering.

Events are collated on a node level with names of added, removed, and modified properties listed in event properties and the node path provided by the path property. Plus we add an indication of whether the event occurred locally or on another cluster node as well as the event's user id if available.

Adding more properties to filter on would certainly be possible.

Regards
Felix

> 
>    interface ContentChangeListener {
>        void contentAdded(String pathToAddedNode);
>        void contentChanged(String pathToChangedNode);
>        void contentRemoved(String pathToRemovedNode);
>    }
> 
> By registering such a service with a set of filter properties that
> identify which content changes are of interest, the client will start
> receiving callbacks at these methods whenever such changes are
> detected. The filter properties could be something like this:
> 
>    paths - the paths under which to listen for changes
>    types - the types of nodes of interest
>    nodes - the names of nodes of interest
>    properties - the names of properties of interest
> 
> For example, the following declaration would result in callbacks
> whenever there's  a base version change of a versionable README.txt
> node somewhere under /source subtree:
> 
>    paths = "/source"
>    types = "mix:versionable"
>    nodes = "README.txt"
>    properties = "jcr:baseVersion"
> 
> Additionally, a "granularity" property could be set to "coarse" to
> indicate that it's fine to deliver events just at the top of a
> modified subtree. For example, if changes are detected both at /foo
> and /foo/bar, a coarsely grained listener would only need to be
> notified at /foo. Setting the property to "fine" would result in
> callbacks for both /foo and /foo/bar.
> 
> For proper access controls, the service would also need to have a
> "credentials" property that contains the access credentials to be used
> for determining which events the listener is entitled to.
> 
> It should be fairly straightforward to support such a service
> interface both with plain JCR observers and with an Oak Observer, with
> the latter being potentially orders of magnitude faster.
> 
> BR,
> 
> Jukka Zitting

Re: Oak JCR Observation scalability aspects and concerns

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Tue, Oct 22, 2013 at 5:21 AM, Felix Meschberger <fm...@adobe.com> wrote:
> Am 22.10.2013 um 11:17 schrieb Chetan Mehrotra:
>> I think in Sling case it would make sense for it to be implemented as
>> an Observer. And I had a look at implementation of some of the
>> listener implementations of [1] and I think they can be easily moved
>> to Sling OSGi events
>
> To be discussed on the Sling list -- though wearing my Sling hat I am
> extremely wary of implementing an Oak-dependency in Sling.
> Sling uses JCR.

Yet Sling is actively looking to expand its support for other non-JCR
backends. ;-)

I think we should do the same thing here, i.e. have an
implementation-independent abstraction in Sling that can be
implemented both by plain JCR and directly by Oak.

As discussed, the main scalability problem with the current
JcrResourceListener design is that it needs to handle *all* changes
and the event producer has no way to know which events really are
needed. To avoid that problem and to make life easier for most typical
listeners, I would suggest adding a whiteboard service interface like
the following:

    interface ContentChangeListener {
        void contentAdded(String pathToAddedNode);
        void contentChanged(String pathToChangedNode);
        void contentRemoved(String pathToRemovedNode);
    }

By registering such a service with a set of filter properties that
identify which content changes are of interest, the client will start
receiving callbacks at these methods whenever such changes are
detected. The filter properties could be something like this:

    paths - the paths under which to listen for changes
    types - the types of nodes of interest
    nodes - the names of nodes of interest
    properties - the names of properties of interest

For example, the following declaration would result in callbacks
whenever there's  a base version change of a versionable README.txt
node somewhere under /source subtree:

    paths = "/source"
    types = "mix:versionable"
    nodes = "README.txt"
    properties = "jcr:baseVersion"

Additionally, a "granularity" property could be set to "coarse" to
indicate that it's fine to deliver events just at the top of a
modified subtree. For example, if changes are detected both at /foo
and /foo/bar, a coarsely grained listener would only need to be
notified at /foo. Setting the property to "fine" would result in
callbacks for both /foo and /foo/bar.

For proper access controls, the service would also need to have a
"credentials" property that contains the access credentials to be used
for determining which events the listener is entitled to.

It should be fairly straightforward to support such a service
interface both with plain JCR observers and with an Oak Observer, with
the latter being potentially orders of magnitude faster.
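
A minimal Java sketch of how the "paths", "nodes" and "granularity"
properties might be evaluated before any callback is made. Only the
ContentChangeListener interface is taken from the proposal above;
ChangeFilter and ChangeDispatcher are hypothetical names, and the "types",
"properties" and "credentials" filters are omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// The callback interface as proposed above.
interface ContentChangeListener {
    void contentAdded(String pathToAddedNode);
    void contentChanged(String pathToChangedNode);
    void contentRemoved(String pathToRemovedNode);
}

// Hypothetical holder for the "paths" and "nodes" filter properties.
final class ChangeFilter {
    private final List<String> paths;  // subtrees of interest
    private final Set<String> nodes;   // node names of interest; empty means any name

    ChangeFilter(List<String> paths, Set<String> nodes) {
        this.paths = paths;
        this.nodes = nodes;
    }

    boolean matches(String changedPath) {
        String name = changedPath.substring(changedPath.lastIndexOf('/') + 1);
        if (!nodes.isEmpty() && !nodes.contains(name)) {
            return false;
        }
        for (String root : paths) {
            if (isSameOrDescendant(root, changedPath)) {
                return true;
            }
        }
        return false;
    }

    static boolean isSameOrDescendant(String root, String path) {
        if (root.equals("/")) {
            return path.startsWith("/");
        }
        return path.equals(root) || path.startsWith(root + "/");
    }
}

// Hypothetical dispatcher: filters first, then calls the listener.
final class ChangeDispatcher {
    // With coarse == true only the topmost matching path of each subtree is kept.
    static List<String> select(List<String> changedPaths, ChangeFilter filter, boolean coarse) {
        List<String> sorted = new ArrayList<>(changedPaths);
        Collections.sort(sorted); // an ancestor sorts before its descendants
        List<String> selected = new ArrayList<>();
        for (String path : sorted) {
            if (!filter.matches(path)) {
                continue;
            }
            if (coarse && coveredBy(selected, path)) {
                continue;
            }
            selected.add(path);
        }
        return selected;
    }

    private static boolean coveredBy(List<String> selected, String path) {
        for (String top : selected) {
            if (ChangeFilter.isSameOrDescendant(top, path)) {
                return true;
            }
        }
        return false;
    }

    static void dispatchChanged(List<String> changedPaths, ChangeFilter filter,
                                boolean coarse, ContentChangeListener listener) {
        for (String path : select(changedPaths, filter, coarse)) {
            listener.contentChanged(path);
        }
    }
}
```

With changes at /foo and /foo/bar and a filter on "/foo", the coarse mode
selects only /foo, while the fine mode selects both paths. Nothing is
allocated per change for paths that do not pass the filter.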

BR,

Jukka Zitting

Re: Oak JCR Observation scalability aspects and concerns

Posted by Angela Schreiber <an...@adobe.com>.

hi felix

from what i see Sling heavily relies on jackrabbit-core functionality...
i would be very pleased if it would just rely on public API such as JCR,
Jackrabbit API and in the future OAK; it doesn't and this is causing a
lot of troubles.

kind regards
angela

On 10/22/13 11:21 AM, "Felix Meschberger" <fm...@adobe.com> wrote:

>Hi
>
>Am 22.10.2013 um 11:17 schrieb Chetan Mehrotra:
>
>> On Mon, Oct 21, 2013 at 11:39 PM, Jukka Zitting
>><ju...@gmail.com> wrote:
>>> 3) The Observer mechanism allows a listener to look at repository
>>> changes in variable granularity and frequency depending on application
>>> needs and current repository load. Thus an Oak Observer can
>>> potentially process orders of magnitude more changes than a JCR event
>>> listener that needs to look at each individual changed item.
>> 
>> +1
>> 
>> I think in Sling case it would make sense for it to be implemented as
>> an Observer. And I had a look at implementation of some of the
>> listener implementations of [1] and I think they can be easily moved
>> to Sling OSGi events
>
>To be discussed on the Sling list -- though wearing my Sling hat I am
>extremely wary of implementing an Oak-dependency in Sling. Sling uses
>JCR.
>
>Regards
>Felix
>
>> 
>> Chetan Mehrotra
>> [1] 
>>https://gist.github.com/chetanmeh/7081328/raw/listeners-list-filtered.txt
>

Re: Oak JCR Observation scalability aspects and concerns

Posted by Felix Meschberger <fm...@adobe.com>.

Hi

Am 22.10.2013 um 11:17 schrieb Chetan Mehrotra:

> On Mon, Oct 21, 2013 at 11:39 PM, Jukka Zitting <ju...@gmail.com> wrote:
>> 3) The Observer mechanism allows a listener to look at repository
>> changes in variable granularity and frequency depending on application
>> needs and current repository load. Thus an Oak Observer can
>> potentially process orders of magnitude more changes than a JCR event
>> listener that needs to look at each individual changed item.
> 
> +1
> 
> I think in Sling case it would make sense for it to be implemented as
> an Observer. And I had a look at implementation of some of the
> listener implementations of [1] and I think they can be easily moved
> to Sling OSGi events

To be discussed on the Sling list -- though wearing my Sling hat I am extremely wary of implementing an Oak-dependency in Sling. Sling uses JCR.

Regards
Felix

> 
> Chetan Mehrotra
> [1] https://gist.github.com/chetanmeh/7081328/raw/listeners-list-filtered.txt

Re: Oak JCR Observation scalability aspects and concerns

Posted by Chetan Mehrotra <ch...@gmail.com>.

On Mon, Oct 21, 2013 at 11:39 PM, Jukka Zitting <ju...@gmail.com> wrote:
> 3) The Observer mechanism allows a listener to look at repository
> changes in variable granularity and frequency depending on application
> needs and current repository load. Thus an Oak Observer can
> potentially process orders of magnitude more changes than a JCR event
> listener that needs to look at each individual changed item.

+1

I think in Sling case it would make sense for it to be implemented as
an Observer. And I had a look at implementation of some of the
listener implementations of [1] and I think they can be easily moved
to Sling OSGi events

Chetan Mehrotra
[1] https://gist.github.com/chetanmeh/7081328/raw/listeners-list-filtered.txt

Re: Oak JCR Observation scalability aspects and concerns

Posted by Angela Schreiber <an...@adobe.com>.

hi

just one more comment regarding caches from a security perspective

>> However, there are many use cases where a local cache is used on
>> each instance which would make such an approach useless.
>
>Oak has a variety of solutions that could be used to make such code
>more scalable (ordered by increasing effectiveness / decreasing
>implementation-independence):
>
>1) Oak is often faster than Jackrabbit 2.x, especially for
>concurrent access, so there might be no need for caching at all in at
>least some of the cases.

i definitely like that and think we should really try to get rid
of caches where ever possible. they usually tend to be quite
ignorant when it comes to access control and are prone to security
issues.

kind regards
angela

Re: Oak JCR Observation scalability aspects and concerns

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Mon, Oct 21, 2013 at 12:03 PM, Carsten Ziegeler <cz...@apache.org> wrote:
> This Sling listener is providing higher level application support as it
> creates resource events out of jcr observation events. A lot of code in
> Sling now relies on this functionality and even more higher level code
> based on Sling does so. Usually application listeners register for "/" and
> then filter when they handle the event. That's a single string
> operation per handler.
> So imho we should find a solution where this is as fast as possible.

There is ultimately nothing we can do on the repository level to fix this.

Consider a situation where N cluster nodes are each doing just 100
updates per second, changing 10 items per each update, then the Sling
listener would need to be able to process N events per *millisecond*
to keep up. That should be no problem for small values of N and
moderate write loads, but when you consider larger clusters (N > 10,
possibly > 100) and higher write loads (Oak can easily do 1k such
updates per second, and the CreateManyChildNodes benchmark currently
shows TarMK writing over 20k items per second) we reach loads where
the listener would have to process each event in less than a
microsecond, which quickly becomes challenging on modern hardware even
when the handler in most cases just does the mentioned single string
comparison.

The scalability bottleneck here isn't the repository, it's the
non-distributed listener that needs to look at every change across the
cluster.
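
The arithmetic above can be checked with a short back-of-envelope calculation. This is only a sketch: the class and method names are made up for illustration, and the scenarios simply plug in the figures mentioned above (100 or 1k updates per second, 10 items per update).

```java
import java.util.Locale;

/** Back-of-envelope time budget for a single listener that sees every change in the cluster. */
final class EventBudget {

    /** Events per second reaching one listener that observes all cluster nodes. */
    static long eventsPerSecond(long clusterNodes, long updatesPerSecondPerNode, long itemsPerUpdate) {
        return clusterNodes * updatesPerSecondPerNode * itemsPerUpdate;
    }

    /** Average time in nanoseconds the listener may spend per event without falling behind. */
    static double nanosPerEvent(long clusterNodes, long updatesPerSecondPerNode, long itemsPerUpdate) {
        return 1_000_000_000.0 / eventsPerSecond(clusterNodes, updatesPerSecondPerNode, itemsPerUpdate);
    }

    public static void main(String[] args) {
        // {cluster nodes, updates per second per node, items changed per update}
        long[][] scenarios = { {1, 100, 10}, {10, 100, 10}, {100, 1000, 10} };
        for (long[] s : scenarios) {
            System.out.printf(Locale.ROOT,
                    "N=%d, %d updates/s, %d items/update: %d events/s, %.0f ns per event%n",
                    s[0], s[1], s[2],
                    eventsPerSecond(s[0], s[1], s[2]),
                    nanosPerEvent(s[0], s[1], s[2]));
        }
    }
}
```

With 100 cluster nodes at 1k updates per second each, the budget is already down to one microsecond per event, and any larger cluster pushes it below that.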

> Now, if we think about a clustered installation, speaking about the Sling
> application, usually the observation events are not needed on each instance
> in the cluster as the same code is running on all instances. So we could
> think in this direction as well and find out whether delivering the events
> only on the originating instance works.

Right. This would be a much more scalable approach, and would also
play well with the limitation that extra user data is not available
for observation events originating from external cluster nodes.

> However, there are many use cases where a local cache is used on
> each instance which would make such an approach useless.

Oak has a variety of solutions that could be used to make such code
more scalable (ordered by increasing effectiveness / decreasing
implementation-independence):

1) Oak is often faster than Jackrabbit 2.x, especially for
concurrent access, so there might be no need for caching at all in at
least some of the cases.

2) Custom index definitions can be used to turn expensive
traversal/filtering accesses to efficient queries.

3) The Observer mechanism allows a listener to look at repository
changes in variable granularity and frequency depending on application
needs and current repository load. Thus an Oak Observer can
potentially process orders of magnitude more changes than a JCR event
listener that needs to look at each individual changed item.

4) Commit hooks can be used to maintain an in-repository cache in a
fully distributed manner.
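
To illustrate why option 3 can handle more changes, here is a minimal sketch of the coalescing idea. It does not use the actual Oak Observer API: the class, its methods, and the use of a plain Map as a repository snapshot are all assumptions made for illustration. The point is only that a busy consumer can skip intermediate states and compute one diff covering many commits.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical observer that coalesces commits instead of handling every changed item. */
final class CoalescingObserverSketch {

    private Map<String, String> lastProcessed = new HashMap<>();
    private Map<String, String> latest = lastProcessed;
    int diffsComputed;

    /** Called for every commit; cheap, because it only remembers the newest snapshot. */
    void contentChanged(Map<String, String> root) {
        latest = new HashMap<>(root);
    }

    /** Called whenever the consumer has time; one diff covers all commits since the last call. */
    Set<String> processPending() {
        Set<String> changed = new TreeSet<>();
        // Added or modified paths.
        for (Map.Entry<String, String> e : latest.entrySet()) {
            if (!e.getValue().equals(lastProcessed.get(e.getKey()))) {
                changed.add(e.getKey());
            }
        }
        // Removed paths.
        for (String path : lastProcessed.keySet()) {
            if (!latest.containsKey(path)) {
                changed.add(path);
            }
        }
        lastProcessed = latest;
        diffsComputed++;
        return changed;
    }

    public static void main(String[] args) {
        CoalescingObserverSketch observer = new CoalescingObserverSketch();
        Map<String, String> root = new HashMap<>();
        root.put("/a", "1");
        observer.contentChanged(root);   // commit 1
        root.put("/a", "2");
        observer.contentChanged(root);   // commit 2
        root.put("/b", "1");
        observer.contentChanged(root);   // commit 3
        // Three commits, but only one diff is computed.
        System.out.println(observer.processPending() + " after " + observer.diffsComputed + " diff");
    }
}
```

The trade-off is that intermediate states are lost: a value that changed twice between two calls is reported once, which is fine for cache invalidation but not for listeners that need every individual event.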

BR,

Jukka Zitting

Re: Oak JCR Observation scalability aspects and concerns

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

I think it's quite clear that "global" observation listeners are not
scalable. (Observation listeners that listen for all events below root,
from all cluster nodes.) It is needed for backward compatibility, but we
need to find a solution to make this obsolete. It's not enough to just
reduce the number of such "global" observation listeners: *all* such
observation listeners have to go away. If we don't do that, then we can't
provide a scalable solution. If the customer adds such a global
observation listener, then OK it's his problem. But if our product (Sling,
CQ) requires such an observation listener, then this is our problem and we
need to solve it.

The only option I see is ensuring *each* observation listener only
receives a subset of the events. That filtering could be:

- Only events that were generated from the current cluster node, for
example by implementing a marker interface ("LocalObservationListener").
Jackrabbit 2.x doesn't support such a marker interface yet. Could the
Sling listener (that listens on "/") do this?

- Only events in a specific path (not "/", and probably not "/content" if
this is where most nodes are).

- Only events of a given node type: it would be tricky to make this
scalable within Oak (observation might need to use the node type index).
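
For illustration, the three criteria above can be combined into one filter. The sketch below is self-contained and does not use the JCR API; the Change and Filter classes and the "external" flag are hypothetical stand-ins. In JCR itself, ObservationManager.addEventListener already takes absPath, isDeep, nodeTypeName and noLocal arguments, but noLocal only ignores events from the registering session, so a "local cluster node only" filter as proposed above would need something new.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SubsetFilterSketch {

    /** Hypothetical, simplified stand-in for one observation event. */
    static final class Change {
        final String path;
        final String nodeType;
        final boolean external; // true if the change came from another cluster node

        Change(String path, String nodeType, boolean external) {
            this.path = path;
            this.nodeType = nodeType;
            this.external = external;
        }
    }

    /** Hypothetical filter: subtree, optional node type, optionally local changes only. */
    static final class Filter {
        final String basePath;
        final String nodeType; // null accepts any node type
        final boolean localOnly;

        Filter(String basePath, String nodeType, boolean localOnly) {
            this.basePath = basePath;
            this.nodeType = nodeType;
            this.localOnly = localOnly;
        }

        boolean accepts(Change c) {
            if (localOnly && c.external) {
                return false;
            }
            if (nodeType != null && !nodeType.equals(c.nodeType)) {
                return false;
            }
            return isAtOrBelow(basePath, c.path);
        }
    }

    /** True if path is base itself or a descendant of it; "/" matches everything. */
    static boolean isAtOrBelow(String base, String path) {
        if (base.equals("/")) {
            return true;
        }
        return path.equals(base) || path.startsWith(base + "/");
    }

    static List<String> acceptedPaths(Filter filter, List<Change> changes) {
        List<String> result = new ArrayList<>();
        for (Change c : changes) {
            if (filter.accepts(c)) {
                result.add(c.path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Change> changes = Arrays.asList(
                new Change("/content/site-a/page1", "nt:unstructured", false),
                new Change("/content/site-a/page2", "nt:unstructured", true),
                new Change("/content/site-ab/page", "nt:unstructured", false),
                new Change("/var/jobs/1", "nt:unstructured", false));
        Filter filter = new Filter("/content/site-a", null, true);
        System.out.println(acceptedPaths(filter, changes));
    }
}
```

The cheap checks run first, so a rejected event costs at most a boolean test and a string comparison. That only helps scalability if the filter is applied before the event is generated and delivered, not inside the listener.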

Are there other ways to filter events?

Regards,
Thomas

Re: Oak JCR Observation scalability aspects and concerns

Posted by Carsten Ziegeler <cz...@apache.org>.

This Sling listener is providing higher level application support as it
creates resource events out of jcr observation events. A lot of code in
Sling now relies on this functionality and even more higher level code
based on Sling does so. Usually application listeners register for "/" and
then filter when they handle the event. That's a single string
operation per handler.
So imho we should find a solution where this is as fast as possible.

Right now, the Sling listener does two things: it compacts the jcr
observation events and then reads each created/modified node to find out
the resource (super) type (with one additional special case).
Obviously, this isn't optimal, especially the read of every changed node -
but as there is no other way to get the values of the two properties, this
is how it is :)
I'm pretty sure that this can be optimized.

Of course, this should be independent of analysing the other parts as
discussed in this thread.

Now, if we think about a clustered installation, speaking about the Sling
application, usually the observation events are not needed on each instance
in the cluster as the same code is running on all instances. So we could
think in this direction as well and find out whether delivering the events
only on the originating instance works. However, there are many use
cases where a local cache is used on each instance which would make such an
approach useless.

Carsten

2013/10/21 Jukka Zitting <ju...@gmail.com>

> Hi,
>
> On Mon, Oct 21, 2013 at 9:47 AM, Bertrand Delacretaz
> <bd...@apache.org> wrote:
> > On Mon, Oct 21, 2013 at 3:17 PM, Jukka Zitting <ju...@gmail.com>
> wrote:
> >> ...Instead of a repository problem (like diffing, event creation,
> etc.),
> >> this analysis tells me that the bottleneck here is the application
> >> that tries to listen to so many events...
> >
> > FWIW, by default Sling does have a listener that catches everything to
> > rebroadcast JCR observation events as OSGi events.
> >
> > See JcrResourceListener [1] which is created by
> > JcrResourceProviderFactory [2] which by default causes it to listen
> > for all node/property added/removed/changed events on /.
> >
> > That's difficult to modify while staying backwards compatible, as we
> > have no way of knowing which events are actually used.
> >
> > I don't have a suggestion at this point, just wanted to make sure
> > you're aware of this.
>
> Right, it just means that a deployment with such an observer will have
> a built-in scalability limit as at some point the listener will no
> longer be able to keep up with all concurrent writes across a large
> and busy cluster.
>
> For now I'd document this limitation and possibly deprecate the
> JcrResourceListener functionality. A deployment can turn the
> functionality off once it reaches the scalability limit and has
> identified/fixed all affected code.
>
> BR,
>
> Jukka Zitting
>

-- 
Carsten Ziegeler
cziegeler@apache.org

Re: Oak JCR Observation scalability aspects and concerns

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Mon, Oct 21, 2013 at 9:47 AM, Bertrand Delacretaz
<bd...@apache.org> wrote:
> On Mon, Oct 21, 2013 at 3:17 PM, Jukka Zitting <ju...@gmail.com> wrote:
>> ...Instead of a repository problem (like diffing, event creation, etc.),
>> this analysis tells me that the bottleneck here is the application
>> that tries to listen to so many events...
>
> FWIW, by default Sling does have a listener that catches everything to
> rebroadcast JCR observation events as OSGi events.
>
> See JcrResourceListener [1] which is created by
> JcrResourceProviderFactory [2] which by default causes it to listen
> for all node/property added/removed/changed events on /.
>
> That's difficult to modify while staying backwards compatible, as we
> have no way of knowing which events are actually used.
>
> I don't have a suggestion at this point, just wanted to make sure
> you're aware of this.

Right, it just means that a deployment with such an observer will have
a built-in scalability limit as at some point the listener will no
longer be able to keep up with all concurrent writes across a large
and busy cluster.

For now I'd document this limitation and possibly deprecate the
JcrResourceListener functionality. A deployment can turn the
functionality off once it reaches the scalability limit and has
identified/fixed all affected code.

BR,

Jukka Zitting

Re: Oak JCR Observation scalability aspects and concerns

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Mon, Oct 21, 2013 at 3:17 PM, Jukka Zitting <ju...@gmail.com> wrote:
> ...Instead of a repository problem (like diffing, event creation, etc.),
> this analysis tells me that the bottleneck here is the application
> that tries to listen to so many events...

FWIW, by default Sling does have a listener that catches everything to
rebroadcast JCR observation events as OSGi events.

See JcrResourceListener [1] which is created by
JcrResourceProviderFactory [2] which by default causes it to listen
for all node/property added/removed/changed events on /.

That's difficult to modify while staying backwards compatible, as we
have no way of knowing which events are actually used.

I don't have a suggestion at this point, just wanted to make sure
you're aware of this.

-Bertrand

[1] https://svn.apache.org/repos/asf/sling/trunk/bundles/jcr/resource/src/main/java/org/apache/sling/jcr/resource/internal/JcrResourceListener.java

[2] https://svn.apache.org/repos/asf/sling/trunk/bundles/jcr/resource/src/main/java/org/apache/sling/jcr/resource/internal/helper/jcr/JcrResourceProviderFactory.java

Re: Oak JCR Observation scalability aspects and concerns

Posted by Chetan Mehrotra <ch...@gmail.com>.

On Mon, Oct 21, 2013 at 6:47 PM, Jukka Zitting <ju...@gmail.com> wrote:
> -1 This introduces the problem where a single JCR event listener can
> block or slow down all other listeners.

That can be mitigated up to an extent by using some sort of Black List
(OAK-1084). However, the current approach of each listener pulling in the
diff at its own pace is more robust in handling such cases.

> I'm not convinced by the assumption here that the observation
> listeners put undue pressure on the underlying MK or its caching. Do
> we have some data to prove this point? My reasoning is that if in any
> case we have a single (potentially multiplexed as suggested) listener
> that wants to read all the changed nodes, then those nodes will still
> need to be accessed from the MK and placed in the cache. If another
> listener does the same thing, they'll most likely find the items in
> the cache and not repeat the MK accesses. The end result is that the
> main performance cost goes to the first listener and any additional
> ones will come mostly for free, thus the claimed performance benefit
> of multiplexing observers is IMHO questionable.
>

Agreed (and also mentioned earlier) that the current approach does not
cause multiple calls to the MK, as in most cases the NodeState would be
found in the cache. However, due to the access pattern, i.e. the same node
state being fetched multiple times, such entries in the cache would get
higher priority and occupy memory which would otherwise have been used to
cache NodeState for the *latest* revision.

This is just an observation. I currently do not have any numbers
indicating that this would cause a significant performance issue, and
such things are hard to measure.

Chetan Mehrotra

Re: Oak JCR Observation scalability aspects and concerns

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Mon, Oct 21, 2013 at 6:38 AM, Chetan Mehrotra
<ch...@gmail.com> wrote:
> Marcel - Basic concern raised is that listeners without any filter
> would cause lots of reads on the repository. These kinds of listeners
> would pull in modifications of all sessions performing distributed
> writes. In our view this will not work well because it puts a very
> high load on each of the cluster nodes and will likely delay delivery
> of events.
>
> Thomas - As for the theoretical bottleneck, it is quite clear: let's
> assume there are 1000 cluster nodes, and each one writes 1000 nodes
> per second, there would be 1 million events per second on _each_
> cluster node, and 1 billion events per second in the system. It can
> not possibly scale. Where exactly the bottleneck is (diffing, creating
> the events, whatever) doesn't matter all that much in my view.

Instead of a repository problem (like diffing, event creation, etc.),
this analysis tells me that the bottleneck here is the application
that tries to listen to so many events. It doesn't matter how much we
optimize the observation code inside a repository, as no
non-distributed listener is ever going to be able to handle 1 billion
events per second.

> Change the logic such that we have one Oak listener which listens for
> changes on root and then delivers to all JCR listeners (after filtering
> by path -> nodeType -> user); then we would be able to reduce the number
> of calls to the persistent store or cache considerably. So it changes the
> current logic, where N listeners independently pull in changes, to one
> where we have 1 Oak listener per node which pulls in changes and then
> delivers to all, serving the same role as the Sling listener does for the
> OSGi stack.

-1 This introduces the problem where a single JCR event listener can
block or slow down all other listeners.

I'm not convinced by the assumption here that the observation
listeners put undue pressure on the underlying MK or its caching. Do
we have some data to prove this point? My reasoning is that if in any
case we have a single (potentially multiplexed as suggested) listener
that wants to read all the changed nodes, then those nodes will still
need to be accessed from the MK and placed in the cache. If another
listener does the same thing, they'll most likely find the items in
the cache and not repeat the MK accesses. The end result is that the
main performance cost goes to the first listener and any additional
ones will come mostly for free, thus the claimed performance benefit
of multiplexing observers is IMHO questionable.
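
The reasoning above can be made concrete with a toy model. This is a sketch under strong assumptions: CachedStore is a hypothetical stand-in for the MK plus its cache, the cache is unbounded, and both listeners read exactly the same changed paths. It says nothing about how a bounded cache behaves under memory pressure.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SharedCacheSketch {

    /** Hypothetical stand-in for the MK with an unbounded node state cache in front of it. */
    static final class CachedStore {
        private final Map<String, String> cache = new HashMap<>();
        int backendReads;
        int cacheHits;

        String read(String path) {
            String state = cache.get(path);
            if (state != null) {
                cacheHits++;
                return state;
            }
            backendReads++;              // simulated expensive MK access
            state = "state-of-" + path;
            cache.put(path, state);
            return state;
        }
    }

    /** A listener that reads every changed node, as an unfiltered listener would. */
    static void runListener(CachedStore store, List<String> changedPaths) {
        for (String path : changedPaths) {
            store.read(path);
        }
    }

    public static void main(String[] args) {
        CachedStore store = new CachedStore();
        List<String> changed = Arrays.asList("/a", "/a/b", "/c");
        runListener(store, changed); // first listener pays for the MK accesses
        runListener(store, changed); // second listener is served from the cache
        System.out.println("backend reads: " + store.backendReads + ", cache hits: " + store.cacheHits);
    }
}
```

In this model the second listener adds cache lookups but no backend reads, so multiplexing the two listeners into one would save lookups only.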

More generally, the basic premise here seems to be that a single
listener would need to scale to observe an entire highly scaled
repository with lots of concurrent writes. As explained by Thomas
above, that simply isn't going to work for high-end use cases. Thus,
what we really should be doing to ensure full scalability of
observation is to try to get rid of observers that listen to all
changes across a cluster. And where we can't do that, we simply accept
the scalability limit inherent in the application design that requires
such unlimited observers.

BR,

Jukka Zitting