Posted to oak-dev@jackrabbit.apache.org by Davide Giannella <da...@apache.org> on 2015/02/03 12:42:16 UTC

[DISCUSS] Atomic Counters on cluster

Good morning Team,

with OAK-2220 we added support for atomic counters on non-clustered
deployments.

Now with OAK-2472 we're trying to provide a reliable way of making it
work on a Mongo cluster as well; read: some sort of async process.

We already instructed the Editor to write hidden variables with UUID and
consolidate them later on (0). As of now the consolidation mechanism
happens in the same commit (synchronous).

(0) http://goo.gl/lxCLGR
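
To make the write-then-consolidate idea concrete, here is a minimal sketch in Python. It is not the Oak Editor: the `CounterNode` class, the `:delta-` property prefix and the method names are all made up for illustration. Each writer stores its increment under its own UUID-based hidden property, so concurrent writers never touch the same property, and a separate step folds those properties into the visible count.

```python
import uuid


class CounterNode:
    """Toy stand-in for a repository node holding a counter (hypothetical)."""

    HIDDEN_PREFIX = ":delta-"  # assumed naming, not the real Oak property name

    def __init__(self):
        self.properties = {"count": 0}

    def increment(self, delta=1):
        # Each write lands in its own property, so writers never conflict.
        key = self.HIDDEN_PREFIX + uuid.uuid4().hex
        self.properties[key] = delta
        return key

    def consolidate(self):
        # Fold every pending delta into the visible count, then drop it.
        pending = [k for k in self.properties if k.startswith(self.HIDDEN_PREFIX)]
        for key in pending:
            self.properties["count"] += self.properties.pop(key)
        return self.properties["count"]
```

Calling `consolidate()` in the same commit corresponds to the current synchronous behaviour; calling it from a background job would be the asynchronous variant under discussion.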

I was looking at how we could achieve it and thought that the easiest
way is to do something like the NodeCounterEditorProvider(1) where we
register an async index (default)(2).

(1) http://goo.gl/FbdIRi
(2) http://goo.gl/M1OXll

Pros

- it's very easy to achieve
- we don't have to go through any refactoring or reinvent the wheel
- customers running on segment who want fully up-to-date counts simply
have to switch the index configuration to sync

Cons

- it's not really an index and sounds a bit misleading
- we should "share" the async mechanism so that it can be used by both
query indexes and any other additional process
- the point above could involve major refactoring of core functionality
- we should provide a different mechanism for configuring whether it runs
synchronously or asynchronously

Thoughts?

Cheers
Davide

Re: [DISCUSS] Atomic Counters on cluster

Posted by Davide Giannella <da...@apache.org>.

On 05/02/2015 16:43, Thomas Mueller wrote:
> Hi,
>
> I would use the "index" approach, at least for now. By the way, I have
> used the same idea for the approximate node counter mechanism (OAK-1907).
Yes, that's the direction we're taking for now (see my previous email).
And the approximate counter is where I took the inspiration from.
> A generalisation of the "atomic counter" problem is the "atomic sum"
> problem. For that, you don't just want to support +1 and -1, but +x and
> -x, 
It's already this way.
> possibly with a constraint (an allowed range), so it could be used for
> reservation systems (in synchronous mode only; not in a cluster). 
This is a nice new feature that could be added. Would you mind filing an
improvement ticket so that we don't forget it? We'll then see when to
schedule it for a future release.

> One
> possible solution is:
>
> ...
>
We already have a solution in place for the atomic counter. See
https://issues.apache.org/jira/browse/OAK-2220 and the doc attached to
the issue for the details.

Cheers
Davide

Re: [DISCUSS] Atomic Counters on cluster

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

I would use the "index" approach, at least for now. By the way, I have
used the same idea for the approximate node counter mechanism (OAK-1907).

A generalisation of the "atomic counter" problem is the "atomic sum"
problem. For that, you don't just want to support +1 and -1, but +x and
-x, possibly with a constraint (an allowed range), so it could be used for
reservation systems (in synchronous mode only; not in a cluster). One
possible solution is:

The data itself is stored in the content:

/content/flight_345/seats/@seat:count = 100
/content/flight_345/reservation_1/@seat:count = -1
/content/flight_345/reservation_2/@seat:count = -3

/content/flight_1001/seats/@seat:count = 200
/content/flight_1001/reservation_1/@seat:count = -3
/content/flight_1001/reservation_2/@seat:count = -2

The query to get the current count would be:

    select sum(count) from [nt:base]
    where descendantnode('/content/flight_1001')

That query would be very fast, O(1), if there is an index. The index
definition would be:


# index of type "sum", on property "seat:count"
/oak:index/flights/@type = sum
/oak:index/flights/@propertyName = seat:count

# restriction: sum must be at least 0, otherwise the commit fails
# (similar to a unique index)
/oak:index/flights/@min = 0

# aggregate in the parent, so keep one sum(count) per flight
/oak:index/flights/@aggregationLevel = 1


For an async index, the "min" constraint can't be guaranteed.
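
As a rough illustration of how such a "sum" index could behave in synchronous mode, here is a small Python sketch. Nothing in it is existing Oak code: the `SumIndex` class and its methods are hypothetical, and it assumes that an aggregation level of 1 means "keep the sum one level above the node that carries the property". The index keeps one running total per flight, answers the sum in O(1), and rejects a commit that would push a total below the configured minimum.

```python
class SumIndex:
    """Toy 'sum' index: one running total per aggregation path (hypothetical)."""

    def __init__(self, minimum=None, aggregation_level=1):
        self.minimum = minimum
        self.aggregation_level = aggregation_level
        self.sums = {}

    def _bucket(self, path):
        # Aggregate 'aggregation_level' steps above the node that holds the
        # property, e.g. /content/flight_345/seats -> /content/flight_345.
        parts = path.strip("/").split("/")
        return "/" + "/".join(parts[: len(parts) - self.aggregation_level])

    def commit(self, changes):
        """Apply {node_path: delta} as one unit; reject if a sum drops below min."""
        updated = dict(self.sums)
        for path, delta in changes.items():
            bucket = self._bucket(path)
            updated[bucket] = updated.get(bucket, 0) + delta
        if self.minimum is not None:
            for bucket, total in updated.items():
                if total < self.minimum:
                    raise ValueError(f"sum for {bucket} would be {total}")
        self.sums = updated

    def total(self, path):
        # Constant-time lookup instead of scanning all descendants.
        return self.sums.get(path, 0)
```

The check only works because the totals are validated and updated within the same commit, which is why an asynchronous index cannot offer the same guarantee.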

For fast counters, if the count can get very high (page access count for
example), you probably want to avoid creating many nodes. In that case, use
a background thread that updates the content only periodically (once every
10 seconds for example) and aggregates it (replaces all nodes in
/content/x/* once in a while with just one node).
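
A minimal Python sketch of the periodic-update part, under stated assumptions: `BufferedCounter` is a hypothetical class, a plain dict stands in for the repository content, and the scheduling of `flush()` (for instance every 10 seconds from a background thread) is left out. Increments are collected in memory and written out as one value per path when flushed.

```python
import threading


class BufferedCounter:
    """Collects increments in memory and writes them out in batches (hypothetical)."""

    def __init__(self, store):
        self.store = store  # dict standing in for repository content
        self._pending = {}
        self._lock = threading.Lock()

    def increment(self, path, delta=1):
        # Cheap in-memory update; nothing is written to the store yet.
        with self._lock:
            self._pending[path] = self._pending.get(path, 0) + delta

    def flush(self):
        # Swap the buffer out under the lock, then write one value per path.
        with self._lock:
            pending, self._pending = self._pending, {}
        for path, delta in pending.items():
            self.store[path] = self.store.get(path, 0) + delta
        return len(pending)
```

The trade-off is that increments still sitting in the buffer are lost if the process stops before the next flush, which is usually acceptable for something like a page access count.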

Regards,
Thomas

On 04/02/15 10:33, "Davide Giannella" <da...@apache.org> wrote:

>On 03/02/2015 15:41, Michael Dürig wrote:
>>
>> Hi,
>>
>> I think we should keep this independent from indexing. Running the
>> counter consolidation asynchronously might have commonalities with
>> async indexing. If so, I'd first implement the former separately and
>> then factor out the commonalities.
>Discussed off-list with Michael; we came up with the following approach:
>
>- try out the index configuration approach as it should be a quick win.
>- refactor later on
>
>About the refactoring, here's my thinking. I'd welcome feedback, as I
>haven't looked at the details and something could be wrong.
>
>Right now we have a reliable way to run async processes in Oak (indexes,
>specifically): the async index.
>
>It is configured in a well-known way, through
>oak:index/oak:queryIndexDefinition.
>
>Proposal:
>
>- rename AsyncIndexUpdate to something like AsyncProcess
>- create a new area in the repo, beside oak:index, where we can
>configure these aspects, for example oak:processes/oak:processDefinition
>- instruct the AsyncProcess to understand this new area as well.
>
>Thoughts?
>
>D.
>
>

Re: [DISCUSS] Atomic Counters on cluster

Posted by Davide Giannella <da...@apache.org>.

On 03/02/2015 15:41, Michael Dürig wrote:
>
> Hi,
>
> I think we should keep this independent from indexing. Running the
> counter consolidation asynchronously might have commonalities with
> async indexing. If so, I'd first implement the former separately and
> then factor out the commonalities.
Discussed off-list with Michael; we came up with the following approach:

- try out the index configuration approach as it should be a quick win.
- refactor later on

About the refactoring, here's my thinking. I'd welcome feedback, as I
haven't looked at the details and something could be wrong.

Right now we have a reliable way to run async processes in Oak (indexes,
specifically): the async index.

It is configured in a well-known way, through
oak:index/oak:queryIndexDefinition.

Proposal:

- rename AsyncIndexUpdate to something like AsyncProcess
- create a new area in the repo, beside oak:index, where we can
configure these aspects, for example oak:processes/oak:processDefinition
- instruct the AsyncProcess to understand this new area as well.
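
To show the shape of the proposal, here is a very rough Python sketch. It is purely illustrative: the real AsyncIndexUpdate is Java and works quite differently, and the `AsyncProcess` class, the handler registry, the `atomicCounter` type and the use of an `async` flag on each definition are assumptions made for this example. The point is only that one runner scans both areas and dispatches each asynchronous definition to a handler chosen by its type.

```python
class AsyncProcess:
    """Toy generalisation of an async index updater (all names hypothetical)."""

    # Areas scanned for definitions; the second one is the proposed addition.
    AREAS = ("oak:index", "oak:processes")

    def __init__(self, repo, handlers):
        self.repo = repo  # {area: {name: definition dict}}
        self.handlers = handlers  # {type: callable(name, definition)}

    def run_once(self):
        """Run every definition marked async and return the names that ran."""
        ran = []
        for area in self.AREAS:
            for name, definition in sorted(self.repo.get(area, {}).items()):
                if definition.get("async") != "async":
                    continue  # synchronous definitions run inside the commit
                handler = self.handlers.get(definition.get("type"))
                if handler is not None:
                    handler(name, definition)
                    ran.append(f"{area}/{name}")
        return ran
```

With this layout, switching a counter between synchronous and asynchronous consolidation would be a configuration change on its definition rather than a code change.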

Thoughts?

D.

Re: [DISCUSS] Atomic Counters on cluster

Posted by Michael Dürig <md...@apache.org>.

Hi,

I think we should keep this independent from indexing. Running the 
counter consolidation asynchronously might have commonalities with async 
indexing. If so, I'd first implement the former separately and then 
factor out the commonalities.

Michael
