Posted to dev@jackrabbit.apache.org by Michael Dürig <md...@adobe.com> on 2012/02/22 18:36:41 UTC

[jr3 trade consistency for availability]

Hi,

Last week's F2F resulted in an initial draft of goals for jr3 [1]. A 
general direction this is taking is to trade some of the consistency 
guarantees for better availability (especially in a clustered setup). 
As it stands - and as Jukka already noted - the specifics are currently 
too vague and need further refinement.

What are the consistency assumptions a JCR client should be allowed to 
make?

An approach where temporary inconsistencies are tolerated (i.e. eventual 
consistency) increases availability and throughput. In such a case 
do/can/should we tolerate temporary violations of:

- Node type constraints?
- Access control rights?
- Lock enforcement?
- Query index consistency?
- Atomicity of save operations?
- ...?

Should we offer alternatives in some of these cases? That is, give the 
client the ability to choose between consistency and availability.

Michael


[1] 
http://wiki.apache.org/jackrabbit/Goals%20and%20non%20goals%20for%20Jackrabbit%203

Re: [jr3 trade consistency for availability]

Posted by Michael Dürig <md...@apache.org>.
>>>> What are the consistency assumptions a JCR client should be allowed to
>>>> make?
>>>>
>>>> An approach where temporary inconsistencies are tolerated (i.e. eventual
>>>> consistency) increases availability and throughput. In such a case
>>>> do/can/should we tolerate temporary violations of:
>>>>
>>>> - Node type constraints?
>>>
>>> so far we seem to have only discussed edge cases where node type
>>> constraints could be violated. I think they are not too relevant in
>>> a real-life system. I'd be OK to make some compromises in this area.
>>
>> With the current Microkernel, whether these cases (i.e. write skew) [1]
>> are edge cases or not depends on the degree of write concurrency we
>> anticipate. If we fully synchronize all writes, these cases won't occur
>> at all. If OTOH we aim for highly concurrent writes, we will see such
>> cases possibly more often than we like.
>
> I think most applications that have highly concurrent writes usually
> distribute the writes across many nodes. e.g. you have lots of users
> working with the system, but each of them is working with his/her
> own dataset.

This is correct as long as we exclude collaborative workspace use cases 
where users typically work on the same document concurrently.

[...]

> To me the example on the wiki page is a reason to drop support
> for setPrimaryType() for jr3. The specification says:

Agreed. Note however, that the same problem also occurs for mixins.

[...]

> Do we have other examples where we know consistency from a
> JCR perspective is at risk?

Referential integrity for mix:referenceable nodes might break in the 
same way.

The problem occurs anywhere parts of the data in a save depend in some 
way on other parts of that save, for example when two properties of a 
node need to obey a certain condition.
This might also make it hard to implement things like versioning, since 
the implementation must then encode dependent JCR properties into the 
same JSON value of the underlying Microkernel in order to circumvent 
this problem.

As discussed in an earlier thread, the problem is easily fixed for 
direct clients of the Microkernel API if we add some testAndSet 
functionality to the Microkernel.
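To make the idea concrete, here is a minimal sketch in Python. It is purely illustrative: the class and method names are invented for this example and are not the actual Microkernel API. A commit carries a "testAndSet" precondition, a set of path/value pairs that must still hold at the current head, otherwise the commit is rejected:

```python
class ConflictError(Exception):
    pass

class MicroKernelSketch:
    """Toy content store: each commit produces a new immutable revision."""

    def __init__(self):
        self.revisions = [{}]                 # revision 0: empty root

    def head(self):
        return len(self.revisions) - 1

    def get(self, path, rev=None):
        rev = self.head() if rev is None else rev
        return self.revisions[rev].get(path)

    def commit(self, changes, expect=None):
        """Apply `changes` ({path: value}); if `expect` is given, fail
        unless those paths still hold those values at head."""
        if expect is not None:
            for path, value in expect.items():
                if self.get(path) != value:
                    raise ConflictError(path + " changed concurrently")
        state = dict(self.revisions[-1])
        state.update(changes)
        self.revisions.append(state)
        return self.head()
```

With this, a client that read /a == 1 can commit with expect={"/a": 1}; a second client that raced on the same value gets a ConflictError instead of silently producing a skewed state.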

Michael



>
>> [1]
>> http://wiki.apache.org/jackrabbit/Transactional%20model%20of%20the%20Microkernel%20based%20Jackrabbit%20prototype
>>
>>>> - Access control rights?
>>>
>>> I don't think any violations are acceptable here.
>>
>> Me neither. But again we need to be aware of the write skew issue here:
>> an ACL implementation must be very careful about its consistency
>> assumptions or it will eventually fail.
>>
>>>> - Lock enforcement?
>>>
>>> that's definitely a tough one because it depends on
>>> repository-wide state.
>>
>> This is an area where Apache Zookeeper might help out.
>>
>>>> - Query index consistency?
>>>
>>> I think consistency is a prerequisite here, otherwise it's quite
>>> difficult to implement the query functionality. I'd rather
>>> make compromises for availability. eg. terminate a long query
>>> execution with an exception because the snapshot it was
>>> working on is not available anymore.
>>
>> I was more thinking of the other direction: would it be tolerable to
>> have the query index not up to date yet? (i.e. after a possibly large
>> save.) Again, this could either result in incomplete query results, an
>> exception or the query to be deferred until the index is up to date.
>> Maybe we could even let the client choose through 'query hints'.
>
> I like the query hint idea.
>
> alternatively we could also deny access to the most recent revision
> until the index is updated (possibly asynchronously). this way
> reads and writes are fast at the cost of consistency. reads would
> be eventually consistent (once index is updated).
>
> regards
>   marcel
>
>> Michael
>>
>>>
>>>> - Atomicity of save operations?
>>>
>>> what does a temporary violation of atomic saves look like?
>>> are you thinking of partially visible changes?
>>>
>>> regards
>>>    marcel
>>>
>>>> - ...?
>>>>
>>>> Should we offer alternatives in some of these cases? That is, give the
>>>> client the ability to choose between consistency and availability.
>>>>
>>>> Michael
>>>>
>>>>
>>>> [1]
>>>>
>>>> http://wiki.apache.org/jackrabbit/Goals%20and%20non%20goals%20for%20Jackrabbit%203

RE: [jr3 trade consistency for availability]

Posted by Marcel Reutegger <mr...@adobe.com>.
Hi,

> >> What are the consistency assumptions a JCR client should be allowed to
> >> make?
> >>
> >> An approach where temporary inconsistencies are tolerated (i.e. eventual
> >> consistency) increases availability and throughput. In such a case
> >> do/can/should we tolerate temporary violations of:
> >>
> >> - Node type constraints?
> >
> > so far we seem to have only discussed edge cases where node type
> > constraints could be violated. I think they are not too relevant in
> > a real-life system. I'd be OK to make some compromises in this area.
> 
> With the current Microkernel, whether these cases (i.e. write skew) [1]
> are edge cases or not depends on the degree of write concurrency we
> anticipate. If we fully synchronize all writes, these cases won't occur
> at all. If OTOH we aim for highly concurrent writes, we will see such
> cases possibly more often than we like.

I think most applications that have highly concurrent writes usually
distribute the writes across many nodes. e.g. you have lots of users
working with the system, but each of them is working with his/her
own dataset.

I think conflicts are likely (even with low concurrency) when
nodes are added and/or removed on the same parent. These kinds
of conflicts should IMO be resolved efficiently and
consistently. As you mentioned on the wiki page, these kinds
of concurrent changes are usually not incompatible and can be
merged.

To me the example on the wiki page is a reason to drop support
for setPrimaryType() for jr3. The specification says:

"10.10.2 Updating a Node's Primary Type

A repository /may/ permit the primary type of a node to be changed
during its lifetime. Repositories are free to limit the scope of permitted
changes both in terms of which nodes may be changed and which
changes are allowed."

Do we have other examples where we know consistency from a
JCR perspective is at risk?

> [1]
> http://wiki.apache.org/jackrabbit/Transactional%20model%20of%20the%20Microkernel%20based%20Jackrabbit%20prototype
> 
> >> - Access control rights?
> >
> > I don't think any violations are acceptable here.
> 
> Me neither. But again we need to be aware of the write skew issue here:
> an ACL implementation must be very careful about its consistency
> assumptions or it will eventually fail.
> 
> >> - Lock enforcement?
> >
> > that's definitely a tough one because it depends on
> > repository-wide state.
> 
> This is an area where Apache Zookeeper might help out.
> 
> >> - Query index consistency?
> >
> > I think consistency is a prerequisite here, otherwise it's quite
> > difficult to implement the query functionality. I'd rather
> > make compromises for availability. eg. terminate a long query
> > execution with an exception because the snapshot it was
> > working on is not available anymore.
> 
> I was more thinking of the other direction: would it be tolerable to
> have the query index not up to date yet? (i.e. after a possibly large
> save.) Again, this could either result in incomplete query results, an
> exception or the query to be deferred until the index is up to date.
> Maybe we could even let the client choose through 'query hints'.

I like the query hint idea.

alternatively we could also deny access to the most recent revision
until the index is updated (possibly asynchronously). this way 
reads and writes are fast at the cost of consistency. reads would
be eventually consistent (once index is updated).
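The alternative above can be sketched as pinning readers to the last indexed revision while writes return immediately. All names here are illustrative, not a real repository API:

```python
class RepositorySketch:
    """Readers see the newest *indexed* revision, never the raw head, so
    commits stay fast and reads become eventually consistent."""

    def __init__(self):
        self.head = 0          # latest committed revision
        self.indexed = 0       # latest revision covered by the query index

    def commit(self):
        self.head += 1         # write path does no index work at all
        return self.head

    def index_catch_up(self):
        self.indexed = self.head   # asynchronous indexer run

    def visible_revision(self):
        # readers are held back until the index has caught up
        return self.indexed
```

A reader polling visible_revision() simply keeps seeing the older, consistent revision until the asynchronous indexer has processed the newer commits.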

regards
 marcel

> Michael
> 
> >
> >> - Atomicity of save operations?
> >
> > what does a temporary violation of atomic saves look like?
> > are you thinking of partially visible changes?
> >
> > regards
> >   marcel
> >
> >> - ...?
> >>
> >> Should we offer alternatives in some of these cases? That is, give the
> >> client the ability to choose between consistency and availability.
> >>
> >> Michael
> >>
> >>
> >> [1]
> >>
> >> http://wiki.apache.org/jackrabbit/Goals%20and%20non%20goals%20for%20Jackrabbit%203

Re: [jr3 trade consistency for availability]

Posted by Michael Dürig <md...@apache.org>.
>> What are the consistency assumptions a JCR client should be allowed to
>> make?
>>
>> An approach where temporary inconsistencies are tolerated (i.e. eventual
>> consistency) increases availability and throughput. In such a case
>> do/can/should we tolerate temporary violations of:
>>
>> - Node type constraints?
>
> so far we seem to have only discussed edge cases where node type
> constraints could be violated. I think they are not too relevant in
> a real-life system. I'd be OK to make some compromises in this area.

With the current Microkernel, whether these cases (i.e. write skew) [1] 
are edge cases or not depends on the degree of write concurrency we 
anticipate. If we fully synchronize all writes, these cases won't occur 
at all. If OTOH we aim for highly concurrent writes, we will see such 
cases possibly more often than we like.

[1] 
http://wiki.apache.org/jackrabbit/Transactional%20model%20of%20the%20Microkernel%20based%20Jackrabbit%20prototype
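To make the write-skew scenario concrete, here is a toy illustration (the invariant and values are invented for this example): two sessions each validate a constraint against their own snapshot, write disjoint properties, and both commits succeed under snapshot isolation, yet the merged state violates the constraint.

```python
def write_skew_demo():
    """Invariant: a + b >= 0. Each session checks it only against its own
    snapshot, so neither sees the other's write before committing."""
    store = {"a": 1, "b": 1}         # committed state, invariant holds

    snap1 = dict(store)              # session 1 takes a snapshot
    snap2 = dict(store)              # session 2 takes a snapshot

    # session 1 wants a = -1; the check passes on its snapshot (-1 + 1 >= 0)
    assert snap1["b"] + (-1) >= 0
    # session 2 wants b = -1; the check passes on its snapshot (1 + -1 >= 0)
    assert snap2["a"] + (-1) >= 0

    # the write sets are disjoint, so neither commit conflicts:
    store["a"] = -1
    store["b"] = -1
    return store                     # invariant now violated: -1 + -1 < 0
```

Both commits were locally consistent; only their combination breaks the invariant, which is exactly why fully synchronized writes avoid the problem and highly concurrent writes expose it.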

>> - Access control rights?
>
> I don't think any violations are acceptable here.

Me neither. But again we need to be aware of the write skew issue here: 
an ACL implementation must be very careful about its consistency 
assumptions or it will eventually fail.

>> - Lock enforcement?
>
> that's definitely a tough one because it depends on
> repository-wide state.

This is an area where Apache Zookeeper might help out.

>> - Query index consistency?
>
> I think consistency is a prerequisite here, otherwise it's quite
> difficult to implement the query functionality. I'd rather
> make compromises for availability. eg. terminate a long query
> execution with an exception because the snapshot it was
> working on is not available anymore.

I was thinking more of the other direction: would it be tolerable to 
have the query index not up to date yet (i.e. after a possibly large 
save)? Again, this could either result in incomplete query results, an 
exception, or the query being deferred until the index is up to date. 
Maybe we could even let the client choose through 'query hints'.

Michael

>
>> - Atomicity of save operations?
>
> what does a temporary violation of atomic saves look like?
> are you thinking of partially visible changes?
>
> regards
>   marcel
>
>> - ...?
>>
>> Should we offer alternatives in some of these cases? That is, give the
>> client the ability to choose between consistency and availability.
>>
>> Michael
>>
>>
>> [1]
>> http://wiki.apache.org/jackrabbit/Goals%20and%20non%20goals%20for%20Jackrabbit%203

Re: [jr3 trade consistency for availability]

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

>but in a distributed setup we cannot just use a simple counter.

I believe with vector clocks you can, see
http://en.wikipedia.org/wiki/Vector_clock
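A minimal, generic vector-clock sketch (not tied to any Jackrabbit code) showing how per-node counters order events and detect concurrent, potentially conflicting updates:

```python
def vc_increment(clock, node):
    """Return a copy of `clock` with `node`'s counter bumped by one."""
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def vc_merge(a, b):
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

def vc_before(a, b):
    """True iff the event with clock a happened before the one with clock b."""
    return (all(a.get(n, 0) <= b.get(n, 0) for n in set(a) | set(b))
            and a != b)

def vc_concurrent(a, b):
    """Neither happened before the other: a potential conflict."""
    return a != b and not vc_before(a, b) and not vc_before(b, a)
```

Two cluster nodes that each extend the same base clock independently produce concurrent clocks, which is the signal that their changes need merging or conflict resolution.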

Regards,
Thomas


RE: [jr3 trade consistency for availability]

Posted by Marcel Reutegger <mr...@adobe.com>.
> > I understand this would require tagging each node with a timestamp, right?
> > If that's the case, then it's not just about complexity, but also
> > additional storage requirements.
> 
> If the node id is a counter, then there is no additional storage
> requirement.

but in a distributed setup we cannot just use a simple counter.
we'd have to find a way to partition the id (key) space and at
the same time make sure the complete id space is at least
partially ordered.
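The difficulty can be sketched with a toy id scheme (invented for illustration): each cluster node allocates (node, counter) pairs without any coordination, which gives a total order within one node's sequence but only a partial order across nodes.

```python
def new_id(node, counters):
    """Allocate the next id for `node` from its own local counter.
    No coordination with other nodes is needed."""
    counters[node] = counters.get(node, 0) + 1
    return (node, counters[node])

def happened_before(id_a, id_b):
    """Only ids from the same node are comparable; ids from different
    nodes are incomparable under this scheme (a partial order)."""
    return id_a[0] == id_b[0] and id_a[1] < id_b[1]
```

Cross-node ids being incomparable is precisely the gap a vector clock (or similar mechanism) has to close.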

regards
 marcel


Re: [jr3 trade consistency for availability]

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

> I understand this would require tagging each node with a timestamp, right?
> If that's the case, then it's not just about complexity, but also
> additional storage requirements.

If the node id is a counter, then there is no additional storage
requirement.

Regards,
Thomas


RE: [jr3 trade consistency for availability]

Posted by Marcel Reutegger <mr...@adobe.com>.
> On 29.2.12 15:45, Marcel Reutegger wrote:
> >> Vector clocks. See the presentation [1] which I prepared for the last F2F.
> >
> > I understand this would require tagging each node with a timestamp, right?
> > If that's the case, then it's not just about complexity, but also additional
> > storage requirements.
> 
> Right. But since we version nodes already, we might come up with a
> clever way to reuse the id here. Just an idea though...

I was thinking of the same later last night... ;)

regards
 marcel

Re: [jr3 trade consistency for availability]

Posted by Michael Dürig <md...@apache.org>.

On 29.2.12 15:45, Marcel Reutegger wrote:
>> Vector clocks. See the presentation [1] which I prepared for the last F2F.
>
> I understand this would require tagging each node with a timestamp, right?
> If that's the case, then it's not just about complexity, but also additional
> storage requirements.

Right. But since we version nodes already, we might come up with a 
clever way to reuse the id here. Just an idea though...

Michael

>
> regards
>   marcel

RE: [jr3 trade consistency for availability]

Posted by Marcel Reutegger <mr...@adobe.com>.
> Vector clocks. See the presentation [1] which I prepared for the last F2F.

I understand this would require tagging each node with a timestamp, right?
If that's the case, then it's not just about complexity, but also additional
storage requirements.

regards
 marcel

Re: [jr3 trade consistency for availability]

Posted by Dominique Pfister <dp...@adobe.com>.
Vector clocks seem not to work well in systems with a dynamic number of participants, a problem that is addressed by Interval Tree Clocks [1], [2].

[1] https://github.com/ricardobcl/Interval-Tree-Clocks
[2] http://gsd.di.uminho.pt/members/cbm/ps/itc2008.pdf

On Feb 29, 2012, at 3:38 PM, Michael Dürig wrote:

> 
> Vector clocks. See the presentation [1] which I prepared for the last F2F.
> 
> [1] 
> https://docs.google.com/presentation/pub?id=131sVx5s58jAKE2FSVBfUZVQSl1W820_syyzLYRHGH6E&start=false&loop=false&delayms=3000#slide=id.g4272a65_0_39


Re: [jr3 trade consistency for availability]

Posted by Michael Dürig <md...@apache.org>.
On 29.2.12 13:52, Marcel Reutegger wrote:
> Hi,
>
>> On Feb 28, 2012, at 3:54 PM, Marcel Reutegger wrote:
>>
>>> I'd solve this differently. Saves are always performed on one
>>> partition,
>>> even if some of the change set actually goes beyond a given partition.
>>> this is however assuming that our implementation supports dynamic
>>> partitioning and redistribution (e.g. when a new cluster node is added
>>> to the federation). in this case the excessive part of the change set
>>> would eventually be migrated to the correct cluster node.
>>
>> I'd like to better understand your approach: if we have, say,
>> Partitions P  and Q, containing subtrees /p and /q, respectively, then
>> a save that spans elements in both /p and /q might be saved in P
>> first, and later migrated to Q? What happens if this later migration
>> leads to a conflict?
>
> I guess this would be the result of a concurrent save when there's
> an additional conflicting save under /q at the same time. good
> question... CouchDB solves this with a deterministic algorithm
> that simply picks one revision as the latest one and flags the conflict.
> maybe we could use something similar?

Vector clocks. See the presentation [1] which I prepared for the last F2F.

[1] 
https://docs.google.com/presentation/pub?id=131sVx5s58jAKE2FSVBfUZVQSl1W820_syyzLYRHGH6E&start=false&loop=false&delayms=3000#slide=id.g4272a65_0_39

RE: [jr3 trade consistency for availability]

Posted by Marcel Reutegger <mr...@adobe.com>.
> So, this could result in a save on P that initially succeeds but
> ultimately fails, because the concurrent one on Q wins? I'm wondering
> how this could be reflected to an MK client: if a save corresponds to
> a MK commit call that immediately returns a new revision ID, would you
> suggest that the mentioned algorithm adds a "shadow" commit (leading
> to a new head revision ID) on P, that effectively reverts the
> conflicting save on P?

yes, I think that's a reasonable approach. if a mechanism like this is used
it should happen automatically, but it doesn't have to be magical. that
is, a client should be able to reconstruct the process.
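A sketch of such a shadow commit (data structures invented for illustration): the revert is just another commit appended to the journal, so history is never rewritten and a client can reconstruct exactly what was undone and why.

```python
class JournalSketch:
    """Append-only commit journal; a shadow revert is a normal commit
    that records which revision it undoes."""

    def __init__(self):
        self.commits = []            # each entry: {"changes", "reverts"}

    def commit(self, changes, reverts=None):
        self.commits.append({"changes": changes, "reverts": reverts})
        return len(self.commits) - 1  # revision id

    def shadow_revert(self, rev):
        """Append a commit that restores the state the paths touched by
        `rev` had just before it; history stays intact."""
        before = self.state(upto=rev)
        undo = {p: before.get(p) for p in self.commits[rev]["changes"]}
        return self.commit(undo, reverts=rev)

    def state(self, upto=None):
        """Replay the journal up to (excluding) revision `upto`."""
        upto = len(self.commits) if upto is None else upto
        s = {}
        for entry in self.commits[:upto]:
            for p, v in entry["changes"].items():
                if v is None:
                    s.pop(p, None)   # None marks "path did not exist"
                else:
                    s[p] = v
        return s
```

The losing save stays visible in the journal together with the shadow commit that reverted it, which is what makes the process reconstructable rather than magical.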

regards
 marcel

Re: [jr3 trade consistency for availability]

Posted by Dominique Pfister <dp...@adobe.com>.
On Feb 29, 2012, at 5:45 PM, Michael Dürig wrote:

> That's an idea I mentioned earlier already [1]: make cluster sync
> transparent to JCR sessions. That is, any modification required by the
> sync, should look like just another session operation to JCR clients
> (i.e. there should also be observation events for such changes).

Ah, this did not catch my eye: in JR2, cluster syncs are transparent
to JCR sessions as well, although a conflict never needs to be
resolved or a change undone, because of its lock-and-sync contract.
That a cluster node could actually commit a change now and revert it
later on the MK level because of a conflict - as if some other party
had actually performed this revert operation - is a new and
interesting idea.

Dominique

>
> Michael
>
> [1]
> https://docs.google.com/presentation/pub?id=131sVx5s58jAKE2FSVBfUZVQSl1W820_syyzLYRHGH6E&start=false&loop=false&delayms=3000#slide=id.g4272a65_0_39
>
>>
>> Dominique
>>
>>>
>>> regards
>>> marcel
>>>
>>>
>>


Re: [jr3 trade consistency for availability]

Posted by Michael Dürig <md...@apache.org>.

On 29.2.12 16:30, Dominique Pfister wrote:
> Hi,
>
> On Feb 29, 2012, at 2:52 PM, Marcel Reutegger wrote:
>
>> Hi,
>>
>>> On Feb 28, 2012, at 3:54 PM, Marcel Reutegger wrote:
>>>
>>>> I'd solve this differently. Saves are always performed on one
>>>> partition,
>>>> even if some of the change set actually goes beyond a given partition.
>>>> this is however assuming that our implementation supports dynamic
>>>> partitioning and redistribution (e.g. when a new cluster node is added
>>>> to the federation). in this case the excessive part of the change set
>>>> would eventually be migrated to the correct cluster node.
>>>
>>> I'd like to better understand your approach: if we have, say,
>>> Partitions P and Q, containing subtrees /p and /q, respectively, then
>>> a save that spans elements in both /p and /q might be saved in P
>>> first, and later migrated to Q? What happens if this later migration
>>> leads to a conflict?
>>
>> I guess this would be the result of a concurrent save when there's
>> an additional conflicting save under /q at the same time. good
>> question... CouchDB solves this with a deterministic algorithm
>> that simply picks one revision as the latest one and flags the conflict.
>> maybe we could use something similar?
>
> So, this could result in a save on P that initially succeeds but
> ultimately fails, because the concurrent one on Q wins? I'm wondering
> how this could be reflected to an MK client: if a save corresponds to a
> MK commit call that immediately returns a new revision ID, would you
> suggest that the mentioned algorithm adds a "shadow" commit (leading to
> a new head revision ID) on P, that effectively reverts the conflicting
> save on P?

That's an idea I mentioned earlier already [1]: make cluster sync 
transparent to JCR sessions. That is, any modification required by the 
sync should look like just another session operation to JCR clients 
(i.e. there should also be observation events for such changes).

Michael

[1] 
https://docs.google.com/presentation/pub?id=131sVx5s58jAKE2FSVBfUZVQSl1W820_syyzLYRHGH6E&start=false&loop=false&delayms=3000#slide=id.g4272a65_0_39

>
> Dominique
>
>>
>> regards
>> marcel
>>
>>
>

Re: [jr3 trade consistency for availability]

Posted by Dominique Pfister <dp...@adobe.com>.
Hi,

On Feb 29, 2012, at 2:52 PM, Marcel Reutegger wrote:

> Hi,
>
>> On Feb 28, 2012, at 3:54 PM, Marcel Reutegger wrote:
>>
>>> I'd solve this differently. Saves are always performed on one
>>> partition,
>>> even if some of the change set actually goes beyond a given  
>>> partition.
>>> this is however assuming that our implementation supports dynamic
>>> partitioning and redistribution (e.g. when a new cluster node is  
>>> added
>>> to the federation). in this case the excessive part of the change  
>>> set
>>> would eventually be migrated to the correct cluster node.
>>
>> I'd like to better understand your approach: if we have, say,
>> Partitions P  and Q, containing subtrees /p and /q, respectively,  
>> then
>> a save that spans elements in both /p and /q might be saved in P
>> first, and later migrated to Q? What happens if this later migration
>> leads to a conflict?
>
> I guess this would be the result of a concurrent save when there's
> an additional conflicting save under /q at the same time. good
> question... CouchDB solves this with a deterministic algorithm
> that simply picks one revision as the latest one and flags the  
> conflict.
> maybe we could use something similar?

So, this could result in a save on P that initially succeeds but  
ultimately fails, because the concurrent one on Q wins? I'm wondering  
how this could be reflected to an MK client: if a save corresponds to  
a MK commit call that immediately returns a new revision ID, would you  
suggest that the mentioned algorithm adds a "shadow" commit (leading  
to a new head revision ID) on P, that effectively reverts the  
conflicting save on P?

Dominique

>
> regards
> marcel
>
>


RE: [jr3 trade consistency for availability]

Posted by Marcel Reutegger <mr...@adobe.com>.
Hi,

> On Feb 28, 2012, at 3:54 PM, Marcel Reutegger wrote:
> 
> > I'd solve this differently. Saves are always performed on one
> > partition,
> > even if some of the change set actually goes beyond a given partition.
> > this is however assuming that our implementation supports dynamic
> > partitioning and redistribution (e.g. when a new cluster node is added
> > to the federation). in this case the excessive part of the change set
> > would eventually be migrated to the correct cluster node.
> 
> I'd like to better understand your approach: if we have, say,
> Partitions P  and Q, containing subtrees /p and /q, respectively, then
> a save that spans elements in both /p and /q might be saved in P
> first, and later migrated to Q? What happens if this later migration
> leads to a conflict?

I guess this would be the result of a concurrent save when there's
an additional conflicting save under /q at the same time. good
question... CouchDB solves this with a deterministic algorithm
that simply picks one revision as the latest one and flags the conflict.
maybe we could use something similar?
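The CouchDB-style rule can be sketched in a few lines (the revision-id format is borrowed from CouchDB purely for illustration): the winner is picked by a deterministic total order, so every cluster node resolves the same conflict identically without any coordination, and the loser stays flagged for later repair.

```python
def resolve(rev_a, rev_b):
    """Deterministically pick a winner between two conflicting revisions.
    Here the tie-breaker is simply the lexicographically larger id; the
    losing revision is kept and flagged, not discarded."""
    winner, loser = max(rev_a, rev_b), min(rev_a, rev_b)
    return {"winner": winner, "conflicts": [loser]}
```

Because resolve() is a pure function of the two ids, it is order-independent: nodes that see the conflicting revisions in different orders still converge on the same winner.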

regards
 marcel



Re: [jr3 trade consistency for availability]

Posted by Dominique Pfister <dp...@adobe.com>.
Hi,

On Feb 28, 2012, at 3:54 PM, Marcel Reutegger wrote:

> I'd solve this differently. Saves are always performed on one  
> partition,
> even if some of the change set actually goes beyond a given partition.
> this is however assuming that our implementation supports dynamic
> partitioning and redistribution (e.g. when a new cluster node is added
> to the federation). in this case the excessive part of the change set
> would eventually be migrated to the correct cluster node.

I'd like to better understand your approach: if we have, say,  
Partitions P  and Q, containing subtrees /p and /q, respectively, then  
a save that spans elements in both /p and /q might be saved in P  
first, and later migrated to Q? What happens if this later migration  
leads to a conflict?

Regards
Dominique

>
> regards
> marcel


RE: [jr3 trade consistency for availability]

Posted by Marcel Reutegger <mr...@adobe.com>.
> > I'd solve this differently. Saves are always performed on one partition,
> > even if some of the change set actually goes beyond a given partition.
> > this is however assuming that our implementation supports dynamic
> > partitioning and redistribution (e.g. when a new cluster node is added
> > to the federation). in this case the excessive part of the change set
> > would eventually be migrated to the correct cluster node.
> >
> 
> This is a very attractive approach I think. There are still too many
> open questions regarding cluster topology, distribution and scale ATM
> but we should definitely keep this in mind!

btw, one of the drawbacks of this approach is that it is no longer
trivial to tell where the borders of the partitions are.

regards
 marcel

Re: [jr3 trade consistency for availability]

Posted by Michael Dürig <md...@apache.org>.

On 28.2.12 14:54, Marcel Reutegger wrote:
> Hi,
>
>> On 23.2.12 11:43, Michael Dürig wrote:
>>>
>>>
>>>>> - Atomicity of save operations?
>>>>
>>>> how does a temporary violation of atomic saves look like?
>>>> are you thinking of partially visible changes?
>>>>
>>>
>>> I actually had clustering on my mind where the repository is partitioned
>>> across various cluster nodes. If we require atomicity for save operations
>>> across partitions, we will need to implement some form of atomic
>>> commitment protocol (i.e. two-phase commit). This can cause blocking in
>>> the face of network failures (i.e. less availability).
>>
>> Again, Apache Zookeeper might be worth looking into if we decide to
>> implement ACP.
>
> I'd solve this differently. Saves are always performed on one partition,
> even if some of the change set actually goes beyond a given partition.
> this is however assuming that our implementation supports dynamic
> partitioning and redistribution (e.g. when a new cluster node is added
> to the federation). in this case the excessive part of the change set
> would eventually be migrated to the correct cluster node.
>

This is a very attractive approach I think. There are still too many 
open questions regarding cluster topology, distribution and scale ATM 
but we should definitely keep this in mind!

Michael

RE: [jr3 trade consistency for availability]

Posted by Marcel Reutegger <mr...@adobe.com>.
Hi,

> On 23.2.12 11:43, Michael Dürig wrote:
> >
> >
> >>> - Atomicity of save operations?
> >>
> >> what does a temporary violation of atomic saves look like?
> >> are you thinking of partially visible changes?
> >>
> >
> > I actually had clustering on my mind where the repository is partitioned
> > across various cluster nodes. If we require atomicity for save operations
> > across partitions, we will need to implement some form of atomic
> > commitment protocol (i.e. two-phase commit). This can cause blocking in
> > the face of network failures (i.e. less availability).
> 
> Again, Apache Zookeeper might be worth looking into if we decide to
> implement ACP.

I'd solve this differently. Saves are always performed on one partition,
even if some of the change set actually goes beyond a given partition.
this is however assuming that our implementation supports dynamic
partitioning and redistribution (e.g. when a new cluster node is added
to the federation). in this case the excessive part of the change set
would eventually be migrated to the correct cluster node.

regards
 marcel

Re: [jr3 trade consistency for availability]

Posted by Michael Dürig <mi...@gmail.com>.

On 23.2.12 11:43, Michael Dürig wrote:
>
>
>>> - Atomicity of save operations?
>>
>> what does a temporary violation of atomic saves look like?
>> are you thinking of partially visible changes?
>>
>
> I actually had clustering on my mind where the repository is partitioned
> across various cluster nodes. If we require atomicity for save operations
> across partitions, we will need to implement some form of atomic
> commitment protocol (i.e. two-phase commit). This can cause blocking in
> the face of network failures (i.e. less availability).

Again, Apache Zookeeper might be worth looking into if we decide to 
implement ACP.

Michael

>
> Michael

Re: [jr3 trade consistency for availability]

Posted by Michael Dürig <md...@apache.org>.

>> - Atomicity of save operations?
>
> what does a temporary violation of atomic saves look like?
> are you thinking of partially visible changes?
>

I actually had clustering on my mind where the repository is partitioned 
across various cluster nodes. If we require atomicity for save operations 
across partitions, we will need to implement some form of atomic 
commitment protocol (i.e. two-phase commit). This can cause blocking in 
the face of network failures (i.e. less availability).
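The availability cost can be sketched with a toy two-phase-commit coordinator (entirely illustrative; a real protocol would block awaiting the participant rather than time out as this sketch does): a single unreachable partition prevents the whole cross-partition save from committing.

```python
class Partition:
    """Toy 2PC participant holding one partition of the repository."""

    def __init__(self, reachable=True):
        self.reachable = reachable
        self.state = "idle"

    def prepare(self):
        if not self.reachable:
            raise ConnectionError("network partition")
        self.state = "prepared"
        return True

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Phase 1: collect votes. Phase 2: commit only if *all* voted yes."""
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except ConnectionError:
            votes.append(False)        # no vote arrived within the timeout
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"                   # one failure dooms the whole save
```

Note that the healthy partition must roll back too, even though nothing was wrong with its part of the change set: that is the availability price of atomicity across partitions.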

Michael

Re: [jr3 trade consistency for availability]

Posted by Ard Schrijvers <a....@onehippo.com>.
On Fri, Feb 24, 2012 at 11:01 AM, Michael Dürig <md...@apache.org> wrote:
>
> On 24.2.12 9:53, Ard Schrijvers wrote:
>>
>> On Thu, Feb 23, 2012 at 1:48 PM, Thomas Mueller<mu...@adobe.com>  wrote:
>>>
>>> Hi,
>>>
>>>> after a search is executed, you get back a jcr
>>>> NodeIterator from the search result. In the mean time, while
>>>> iterating, a node from the result can be deleted by a different
>>>> session.
>>>
>>>
>>> We plan to use a MVCC model, so you will still see the old data.
>>
>>
>> I missed that! (should catch up first)
>
>
> FYI: There is some information about what has already been done for jr3 on
> the Wiki [1].

Thanks Michael!

Regards Ard

>
> Michael
>
> [1] http://wiki.apache.org/jackrabbit/Jackrabbit%203

Re: [jr3 trade consistency for availability]

Posted by Michael Dürig <md...@apache.org>.
On 24.2.12 9:53, Ard Schrijvers wrote:
> On Thu, Feb 23, 2012 at 1:48 PM, Thomas Mueller<mu...@adobe.com>  wrote:
>> Hi,
>>
>>> after a search is executed, you get back a jcr
>>> NodeIterator from the search result. In the mean time, while
>>> iterating, a node from the result can be deleted by a different
>>> session.
>>
>> We plan to use a MVCC model, so you will still see the old data.
>
> I missed that! (should catch up first)

FYI: There is some information about what has already been done for jr3 
on the Wiki [1].

Michael

[1] http://wiki.apache.org/jackrabbit/Jackrabbit%203

Re: [jr3 trade consistency for availability]

Posted by Ard Schrijvers <a....@onehippo.com>.
On Thu, Feb 23, 2012 at 1:48 PM, Thomas Mueller <mu...@adobe.com> wrote:
> Hi,
>
>>after a search is executed, you get back a jcr
>>NodeIterator from the search result. In the mean time, while
>>iterating, a node from the result can be deleted by a different
>>session.
>
> We plan to use a MVCC model, so you will still see the old data.

I missed that! (should catch up first)

sounds promising (and challenging).

Thanks a lot for your feedback Thomas

Regards Ard

>

>
> Regards,
> Thomas
>

Re: [jr3 trade consistency for availability]

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

>after a search is executed, you get back a JCR
>NodeIterator from the search result. In the meantime, while
>iterating, a node from the result can be deleted by a different
>session.

We plan to use a MVCC model, so you will still see the old data.

>So, the search result is not transactional.

It is (snapshot isolation).
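As a hedged illustration of what snapshot isolation gives readers, here
is a minimal copy-on-write revision store (a sketch with invented names,
not the actual Microkernel data structure): a reader pinned to a
revision keeps seeing that revision's data, even after a later save
changes or deletes the node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of MVCC snapshot reads: each commit produces a new
// immutable revision; readers pinned to an older revision are unaffected.
class MvccStore {
    private final List<Map<String, String>> revisions = new ArrayList<>();

    MvccStore() {
        revisions.add(new HashMap<>()); // revision 0: empty root
    }

    int head() {
        return revisions.size() - 1;
    }

    // Copy-on-write commit: clone the head revision, apply the changes,
    // append the result as a new revision. A null value deletes the path.
    synchronized int commit(Map<String, String> changes) {
        Map<String, String> next = new HashMap<>(revisions.get(head()));
        for (Map.Entry<String, String> e : changes.entrySet()) {
            if (e.getValue() == null) {
                next.remove(e.getKey());
            } else {
                next.put(e.getKey(), e.getValue());
            }
        }
        revisions.add(next);
        return head();
    }

    // Read against a pinned revision: later commits stay invisible.
    String read(int revision, String path) {
        return revisions.get(revision).get(path);
    }
}
```

The point for a JCR client: a NodeIterator obtained against revision r1
can keep reading at r1 while other sessions commit r2, r3, and so on.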

> Which makes much
>sense: We are not building a database!

How I see it is: JCR is a hybrid between a database and a file system.

>Thus, other sessions can query while the
>index queue is non empty.

There is currently no plan to have an index queue.

Regards,
Thomas


Re: [jr3 trade consistency for availability]

Posted by Ard Schrijvers <a....@onehippo.com>.
Hello,

On Thu, Feb 23, 2012 at 12:07 PM, Thomas Mueller <mu...@adobe.com> wrote:

>>then we still don't have
>>transactional searches
>
> My plan was to support searches for data that is persisted (no search in
> the transient space).

Yes, this seems very natural to me, so I very much agree. However, what
I meant was that after a search is executed, you get back a JCR
NodeIterator from the search result. In the meantime, while
iterating, a node from the result can be deleted by a different
session. So, the search result is not transactional. Which makes much
sense: we are not building a database! However, if we cannot and
should not try to implement transactional searches, then why should we
try to create consistent indexes? Would it be consistent enough that
only the JCR session that does a 'save' has to wait for the index to be
updated with its changes? Thus, other sessions can query while the
index queue is non-empty.

Anyway, perhaps too much into the details.

Regards Ard

> But the index should be updated as part of the save
> operation I think (except for fulltext search).
>
> Regards,
> Thomas
>



-- 
Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 1 Broadway, Cambridge, MA 02142

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com

Re: [jr3 trade consistency for availability]

Posted by Ard Schrijvers <a....@onehippo.com>.
On Fri, Feb 24, 2012 at 11:30 PM, Alexander Klimetschek
<ak...@adobe.com> wrote:
> Am 23.02.2012 um 11:26 schrieb Ard Schrijvers:
>> I've come to believe over the years that a generic
>> hierarchical JCR full text index and queries is a bad idea: In the
>> end, it just doesn't scale, is extremely complex to build (Lucene is
>> flat), and even worse, it doesn't seem to satisfy customers/developers
>> in the end: They want to index and search *their* specific model they
>> store in jackrabbit. You can tweak a bit with indexing_configuration
>> kind of things, but in the end, I think a (Lucene) index is just too
>> domain specific
>
> Huge +1.
>
> Being able to create & define separate search indexes based on some rules (subpath, all cq:Page nodes, similar to indexing_config) would be a really helpful improvement.
>
> One of the main CMS search use cases is to have different full text search indexes for different sites, say /site1 and /site2. Currently you have to include a location step "/jcr:root/site1//*[jcr:contains(. 'foo')]" which is not indexed due to the optimization for moves, making this much slower than necessary. Separate indexes for site1 + site2 would solve this easily.

For this reason we chose to index the location of documents as well:
We have fast (simple!) path constraint searches now, and as the price,
expensive moves, since a move requires iterating through the entire
subtree to re-index the locations. However, I prefer fast path
constraint searches over fast moves: moves normally happen much less
frequently.
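This trade-off can be sketched with a toy index that stores the full
path on every document (a hypothetical structure, not Hippo's actual
implementation): subtree queries become a cheap prefix match, while a
move must rewrite the stored path of every descendant.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy index storing the full path per document: subtree queries are a
// simple prefix match, but moving a subtree means re-indexing all of it.
class PathIndex {
    private final Map<String, String> docsByPath = new HashMap<>(); // path -> content

    void index(String path, String content) {
        docsByPath.put(path, content);
    }

    // Fast path constraint: just filter on the stored path prefix.
    List<String> findUnder(String rootPath) {
        List<String> hits = new ArrayList<>();
        for (String path : docsByPath.keySet()) {
            if (path.startsWith(rootPath + "/")) {
                hits.add(path);
            }
        }
        return hits;
    }

    // Expensive part: a move rewrites the stored path of every descendant.
    void move(String from, String to) {
        Map<String, String> moved = new HashMap<>();
        docsByPath.entrySet().removeIf(e -> {
            if (e.getKey().equals(from) || e.getKey().startsWith(from + "/")) {
                moved.put(to + e.getKey().substring(from.length()), e.getValue());
                return true;
            }
            return false;
        });
        docsByPath.putAll(moved);
    }
}
```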

>
> Also, we lose a lot of Lucene's features by hiding them under the search implementation tied to the JCR query specifications. Opening this up would make improvements and/or extensions for search index configurations much easier.

I'll try to summarize a brain dump (I hope I can keep it to the point
:-) in an upcoming jr3 queries mail thread

Regards Ard

>
> Cheers,
> Alex
>
> --
> Alexander Klimetschek
> Developer // Adobe (Day) // Berlin - Basel
>




Re: [jr3 trade consistency for availability]

Posted by Alexander Klimetschek <ak...@adobe.com>.
Am 23.02.2012 um 11:26 schrieb Ard Schrijvers:
> I've come to believe over the years that a generic
> hierarchical JCR full text index and queries is a bad idea: In the
> end, it just doesn't scale, is extremely complex to build (Lucene is
> flat), and even worse, it doesn't seem to satisfy customers/developers
> in the end: They want to index and search *their* specific model they
> store in jackrabbit. You can tweak a bit with indexing_configuration
> kind of things, but in the end, I think a (Lucene) index is just too
> domain specific

Huge +1.

Being able to create & define separate search indexes based on some rules (subpath, all cq:Page nodes, similar to indexing_config) would be a really helpful improvement.

One of the main CMS search use cases is to have different full text search indexes for different sites, say /site1 and /site2. Currently you have to include a location step "/jcr:root/site1//*[jcr:contains(. 'foo')]" which is not indexed due to the optimization for moves, making this much slower than necessary. Separate indexes for site1 + site2 would solve this easily.
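A rule-based routing layer along these lines might look roughly as
follows (a sketch; the rule set and class names are invented for
illustration, not an actual Jackrabbit API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: route nodes to separate per-site indexes by a subpath rule,
// so a site-scoped full text search only consults its own index.
class RoutedIndexes {
    private final Map<String, List<String>> indexes = new HashMap<>(); // rule prefix -> indexed paths

    RoutedIndexes(List<String> rules) {
        for (String rule : rules) {
            indexes.put(rule, new ArrayList<>());
        }
    }

    void index(String path) {
        for (Map.Entry<String, List<String>> e : indexes.entrySet()) {
            if (path.startsWith(e.getKey() + "/")) {
                e.getValue().add(path); // only the matching site index is touched
            }
        }
    }

    // A search scoped to /site1 hits the /site1 index only;
    // no location step in the query is needed.
    List<String> search(String rule) {
        return indexes.getOrDefault(rule, List.of());
    }
}
```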

Also, we lose a lot of Lucene's features by hiding them under the search implementation tied to the JCR query specifications. Opening this up would make improvements and/or extensions for search index configurations much easier.

Cheers,
Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel


Re: [jr3 trade consistency for availability]

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

>But before discussing the details, what is to be understood by 'query
>index consistency'?

I don't know if I mean the same thing, but...

>Does this mean that the indexes should be consistent with the latest
>persisted data? Thus within a single cluster node, after a persist,
>the index must be updated directly?

That would be an option (I would prefer that), except for fulltext
search, which is always updated asynchronously. But updating the index
later on (using the journal) might also be an option. This is not yet
defined.
Maybe both will be possible.

> Would this mean that it blocks new
>search requests until the indexing queue is emptied?

I didn't plan to use any kind of 'indexing queue' by default :-) except
for Lucene.

>then we still don't have
>transactional searches

My plan was to support searches for data that is persisted (no search in
the transient space). But the index should be updated as part of the save
operation I think (except for fulltext search).
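Reading the above as "property indexes are updated synchronously as
part of save, full text goes through a background queue", a minimal
sketch could look like this (the names and structure are my assumptions,
not the planned jr3 design):

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Sketch: save() updates the property index before returning, so queries
// on persisted data see the change immediately; full text indexing is
// deferred to a queue that a background worker drains.
class SavePath {
    final Map<String, String> propertyIndex = new HashMap<>();
    final Queue<String> fulltextQueue = new ArrayDeque<>();
    final Map<String, String> fulltextIndex = new HashMap<>();

    void save(String path, String value) {
        propertyIndex.put(path, value);   // synchronous: part of the save
        fulltextQueue.add(path);          // asynchronous: picked up later
    }

    // Background worker: drains the queue, e.g. after text extraction.
    void drainFulltextQueue() {
        String path;
        while ((path = fulltextQueue.poll()) != null) {
            fulltextIndex.put(path, propertyIndex.get(path));
        }
    }
}
```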

Regards,
Thomas


Re: [jr3 trade consistency for availability]

Posted by Ard Schrijvers <a....@onehippo.com>.
On Thu, Feb 23, 2012 at 9:09 AM, Marcel Reutegger <mr...@adobe.com> wrote:
>
>> - Lock enforcement?
>
> that's definitely a tough one because it depends on repository
> wide state.
>
>> - Query index consistency?
>
> I think consistency is a prerequisite here, otherwise it's quite
> difficult to implement the query functionality. I'd rather

Personally I am a strong advocate of eventual index consistency,
perhaps because I just cannot see how strict index consistency can ever
be implemented without putting hard constraints on performance.

But before discussing the details, what is to be understood by 'query
index consistency'?

Does this mean that the indexes should be consistent with the latest
persisted data? Thus, within a single cluster node, after a persist,
the index must be updated directly? Would this mean that it blocks new
search requests until the indexing queue is emptied? Or that an index
should be consistent across a cluster? The latter isn't the case for
Jackrabbit 2 anyway, right?

Which one of the constraints above is taken into account doesn't
really matter afaics, as I don't see how either one of them could be
implemented efficiently... at least with Lucene in the back of my
head. And even if all that effort is spent and all the burden of
creating index consistency is accepted, we still don't have
transactional searches, so the search results could still contain
nodes that are removed after the search was executed.
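One common way to live with such stale hits is to re-check each result
on access and silently skip nodes that have been deleted since the
query ran; a hypothetical sketch (not Jackrabbit's actual result
iterator):

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Sketch: a result iterator that re-checks each hit on access and skips
// nodes that were deleted after the query ran (stale index hits).
class CheckedResultIterator implements Iterator<String> {
    private final Iterator<String> hits;
    private final Predicate<String> stillExists;
    private String next;

    CheckedResultIterator(List<String> hits, Predicate<String> stillExists) {
        this.hits = hits.iterator();
        this.stillExists = stillExists;
        advance();
    }

    private void advance() {
        next = null;
        while (hits.hasNext()) {
            String candidate = hits.next();
            if (stillExists.test(candidate)) { // skip since-deleted nodes
                next = candidate;
                return;
            }
        }
    }

    @Override public boolean hasNext() { return next != null; }

    @Override public String next() {
        String current = next;
        advance();
        return current;
    }
}
```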

I do understand that relaxing the query index consistency most likely
makes it really hard to implement the (specification) query
functionality. But this might also be a result of the specification
itself. I've come to believe over the years that a generic
hierarchical JCR full text index and queries is a bad idea: In the
end, it just doesn't scale, is extremely complex to build (Lucene is
flat), and even worse, it doesn't seem to satisfy customers/developers
in the end: They want to index and search *their* specific model they
store in jackrabbit. You can tweak a bit with indexing_configuration
kind of things, but in the end, I think a (Lucene) index is just too
domain specific.
If you need consistent queries, because you want to store and query
something like banking accounts, you shouldn't use Jackrabbit (or a
NoSQL db) in the first place imo.

Regards Ard

> make compromises for availability, e.g. terminate a long query

>
> regards
>  marcel
>

RE: [jr3 trade consistency for availability]

Posted by Marcel Reutegger <mr...@adobe.com>.
Hi,

> Last week's F2F resulted in an initial draft of goals for jr3 [1]. A
> general direction this is taking is trading some of the consistency
> guarantees for better availability (especially in a clustered set up).
> As it stands - and as Jukka already noted - the specifics are currently
> too vague and we need further refinements.
> 
> What are the consistency assumptions a JCR client should be allowed to
> make?
> 
> An approach where temporary inconsistencies are tolerated (i.e. eventual
> consistency) increases availability and throughput. In such a case
> do/can/should we tolerate temporary violations of:
> 
> - Node type constraints?

So far we seem to have only discussed edge cases where node type
constraints could be violated. I think they are not too relevant in
a real-life system. I'd be OK to make some compromises in this area.

> - Access control rights?

I don't think any violations are acceptable here.

> - Lock enforcement?

that's definitely a tough one because it depends on repository-wide
state.

> - Query index consistency?

I think consistency is a prerequisite here, otherwise it's quite
difficult to implement the query functionality. I'd rather
make compromises for availability, e.g. terminate a long query
execution with an exception because the snapshot it was
working on is no longer available.

> - Atomicity of save operations?

what does a temporary violation of atomic saves look like?
Are you thinking of partially visible changes?

regards
 marcel

> - ...?
> 
> Should we offer alternatives in some of these cases? That is, give the
> client the ability to choose between consistency and availability.
> 
> Michael
> 
> 
> [1]
> http://wiki.apache.org/jackrabbit/Goals%20and%20non%20goals%20for%20J
> ackrabbit%203