You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-dev@jackrabbit.apache.org by Carsten Ziegeler <cz...@apache.org> on 2014/08/01 07:56:12 UTC

Re: How to implement a queue in Oak?

I'm wondering if anyone has a good idea how to model a queue with efficient
operations in JCR - or is JCR not suited for this use case?

Regards
Carsten


2014-07-30 15:57 GMT+02:00 Carsten Ziegeler <cz...@apache.org>:

> Using a different storage than JCR would be easy in my case, however I
> *want* to use JCR
>
> Carsten
>
>
> 2014-07-30 14:55 GMT+02:00 Lukas Smith <sm...@pooteeweet.org>:
>
> Hi,
>>
>> I can totally see that it might be useful to be able to go through the
>> Oak/JCR API to have a queue but maybe this is stretching Oak a bit far if
>> you end up with 1k+ queues.
>>
>> However I think it would be great to look more into federation for this.
>> I think ModeShape supports this quite well already, ie. being able to hook
>> in another JCR tree, a file system, a git repository, CMIS .. I am sure
>> that it would also be possible to implement on top of some MQ standard.
>>
>> see also https://docs.jboss.org/author/display/MODE/Federation?_sscc=t
>>
>> regards,
>> Lukas
>>
>> > On 30 Jul 2014, at 14:41, Angela Schreiber <an...@adobe.com> wrote:
>> >
>> > hi carsten
>> >
>> > if you are expecting your nodes to be in a given order (e.g. the
>> > order of creation) you need to have a parent that has orderable
>> > children... in which case we don't make any promises about huge
>> > child collections... it will not work well.
>> >
>> > if you don't have the requirement of ordered children, you can
>> > have _many_ but need to make sure that your parent node doesn't
>> > have orderable children (e.g. oak:Unstructured)... but then you
>> > cannot expect that new children are appended at the "end of the
>> > list"... there is no list and there is not guaranteed order.
>> >
>> > i guess you have a little misunderstanding when it comes to
>> > the concept of orderable child nodes -> JSR 283 will be your friend.
>> >
>> > regards
>> > angela
>> >
>> >> On 30/07/14 13:27, "Carsten Ziegeler" <cz...@apache.org> wrote:
>> >>
>> >> Hi,
>> >>
>> >> afaik with Oak the too many child nodes problem of JR2 is solved,
>> >> therefore
>> >> I'm wondering what the best way to store a queue in the repository is?
>> >>
>> >> In my use cases, there are usually not many items within a single
>> queue,
>> >> let's say a few hundreds. In some cases the queue might grow to some
>> >> thousands but not more than maybe 20k.
>> >>
>> >> The idea is that new entries (nodes) are added to the end of the queue,
>> >> and
>> >> processing would read the first node from the queue, update the
>> properties
>> >> and once done, remove it.
>> >>
>> >> My initial design was to simply store all entries as sub nodes of some
>> >> queue root entry without any hierarchy. addNode should add them at the
>> end
>> >> and simply iteration over the child nodes of the root gives the first
>> >> entry. No need for sortable nodes.
>> >>
>> >> Does this make sense? Is there anything else to be considered?
>> >>
>> >> Regards
>> >> Carsten
>> >> --
>> >> Carsten Ziegeler
>> >> Adobe Research Switzerland
>> >> cziegeler@apache.org
>> >
>>
>
>
>
> --
> Carsten Ziegeler
> Adobe Research Switzerland
> cziegeler@apache.org
>



-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: How to implement a queue in Oak?

Posted by Michael Dürig <md...@apache.org>.
There are the utilities in the org.apache.jackrabbit.commons.flat 
package, which were built for mapping flat structures to a JCR 
hierarchy. See the BTreeManager class for a good starting point.

Michael



On 1.8.14 7:56 , Carsten Ziegeler wrote:
> I'm wondering if anyone has a good idea how to model a queue with efficient
> operations in JCR - or is JCR not suited for this use case?
>
> Regards
> Carsten
>
>
> 2014-07-30 15:57 GMT+02:00 Carsten Ziegeler <cz...@apache.org>:
>
>> Using a different storage than JCR would be easy in my case, however I
>> *want* to use JCR
>>
>> Carsten
>>
>>
>> 2014-07-30 14:55 GMT+02:00 Lukas Smith <sm...@pooteeweet.org>:
>>
>> Hi,
>>>
>>> I can totally see that it might be useful to be able to go through the
>>> Oak/JCR API to have a queue but maybe this is stretching Oak a bit far if
>>> you end up with 1k+ queues.
>>>
>>> However I think it would be great to look more into federation for this.
>>> I think ModeShape supports this quite well already, ie. being able to hook
>>> in another JCR tree, a file system, a git repository, CMIS .. I am sure
>>> that it would also be possible to implement on top of some MQ standard.
>>>
>>> see also https://docs.jboss.org/author/display/MODE/Federation?_sscc=t
>>>
>>> regards,
>>> Lukas
>>>
>>>> On 30 Jul 2014, at 14:41, Angela Schreiber <an...@adobe.com> wrote:
>>>>
>>>> hi carsten
>>>>
>>>> if you are expecting your nodes to be in a given order (e.g. the
>>>> order of creation) you need to have a parent that has orderable
>>>> children... in which case we don't make any promises about huge
>>>> child collections... it will not work well.
>>>>
>>>> if you don't have the requirement of ordered children, you can
>>>> have _many_ but need to make sure that your parent node doesn't
>>>> have orderable children (e.g. oak:Unstructured)... but then you
>>>> cannot expect that new children are appended at the "end of the
>>>> list"... there is no list and there is not guaranteed order.
>>>>
>>>> i guess you have a little misunderstanding when it comes to
>>>> the concept of orderable child nodes -> JSR 283 will be your friend.
>>>>
>>>> regards
>>>> angela
>>>>
>>>>> On 30/07/14 13:27, "Carsten Ziegeler" <cz...@apache.org> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> afaik with Oak the too many child nodes problem of JR2 is solved,
>>>>> therefore
>>>>> I'm wondering what the best way to store a queue in the repository is?
>>>>>
>>>>> In my use cases, there are usually not many items within a single
>>> queue,
>>>>> let's say a few hundreds. In some cases the queue might grow to some
>>>>> thousands but not more than maybe 20k.
>>>>>
>>>>> The idea is that new entries (nodes) are added to the end of the queue,
>>>>> and
>>>>> processing would read the first node from the queue, update the
>>> properties
>>>>> and once done, remove it.
>>>>>
>>>>> My initial design was to simply store all entries as sub nodes of some
>>>>> queue root entry without any hierarchy. addNode should add them at the
>>> end
>>>>> and simply iteration over the child nodes of the root gives the first
>>>>> entry. No need for sortable nodes.
>>>>>
>>>>> Does this make sense? Is there anything else to be considered?
>>>>>
>>>>> Regards
>>>>> Carsten
>>>>> --
>>>>> Carsten Ziegeler
>>>>> Adobe Research Switzerland
>>>>> cziegeler@apache.org
>>>>
>>>
>>
>>
>>
>> --
>> Carsten Ziegeler
>> Adobe Research Switzerland
>> cziegeler@apache.org
>>
>
>
>

Re: How to implement a queue in Oak?

Posted by Davide Giannella <da...@apache.org>.
On 01/08/2014 10:42, Tommaso Teofili wrote:
> I don't yet have a proper proposal, but maybe what could be done may be
> something similar to what Davide has done with regards to the ordered index
> (using a skiplist), that is defining a specific structure of the nodes,
> together with a specific implementation, that avoids having to use sortable
> node types but is an inherently sorted structure, e.g. binary search tree.
> Any opinions?

Unfortunately both the above options won't efficiently work for a reason
or another as we discovered during the implementations of the ordered index.

Skip list:

* doesn't really scale with large numbers (500k-1M) as of the randomised
access due to the probabilistic data structure. On the Segment case for
example, fetching the "next" property resulted in an expensive
operations with big data as you end-up most probably fetching a new
segment that is not cached (read from disk).

* in case of mongo back-end is not concurrent due to the storing of the
"next" property under the node. The stack will raise a merge exception
and it can't be catch by the commit hook. That's why we have to keep it
asynchronous when running on mongo. Segment is not affected.

BTree:

Considered it but it requires sooner or later a big tree moving due to
tree restructure and this is not efficient with oak. With the result
that some CUD operations could take long to complete.

Hopefully if the new algorithm will result in a fully working and
scalable one we could consider making it a node type.

Cheers
Davide

 

Re: How to implement a queue in Oak?

Posted by Tommaso Teofili <to...@gmail.com>.
I don't yet have a proper proposal, but maybe what could be done may be
something similar to what Davide has done with regards to the ordered index
(using a skiplist), that is defining a specific structure of the nodes,
together with a specific implementation, that avoids having to use sortable
node types but is an inherently sorted structure, e.g. binary search tree.
Any opinions?

Regards,
Tommaso


2014-08-01 7:56 GMT+02:00 Carsten Ziegeler <cz...@apache.org>:

> I'm wondering if anyone has a good idea how to model a queue with efficient
> operations in JCR - or is JCR not suited for this use case?
>
> Regards
> Carsten
>
>
> 2014-07-30 15:57 GMT+02:00 Carsten Ziegeler <cz...@apache.org>:
>
> > Using a different storage than JCR would be easy in my case, however I
> > *want* to use JCR
> >
> > Carsten
> >
> >
> > 2014-07-30 14:55 GMT+02:00 Lukas Smith <sm...@pooteeweet.org>:
> >
> > Hi,
> >>
> >> I can totally see that it might be useful to be able to go through the
> >> Oak/JCR API to have a queue but maybe this is stretching Oak a bit far
> if
> >> you end up with 1k+ queues.
> >>
> >> However I think it would be great to look more into federation for this.
> >> I think ModeShape supports this quite well already, ie. being able to
> hook
> >> in another JCR tree, a file system, a git repository, CMIS .. I am sure
> >> that it would also be possible to implement on top of some MQ standard.
> >>
> >> see also https://docs.jboss.org/author/display/MODE/Federation?_sscc=t
> >>
> >> regards,
> >> Lukas
> >>
> >> > On 30 Jul 2014, at 14:41, Angela Schreiber <an...@adobe.com> wrote:
> >> >
> >> > hi carsten
> >> >
> >> > if you are expecting your nodes to be in a given order (e.g. the
> >> > order of creation) you need to have a parent that has orderable
> >> > children... in which case we don't make any promises about huge
> >> > child collections... it will not work well.
> >> >
> >> > if you don't have the requirement of ordered children, you can
> >> > have _many_ but need to make sure that your parent node doesn't
> >> > have orderable children (e.g. oak:Unstructured)... but then you
> >> > cannot expect that new children are appended at the "end of the
> >> > list"... there is no list and there is not guaranteed order.
> >> >
> >> > i guess you have a little misunderstanding when it comes to
> >> > the concept of orderable child nodes -> JSR 283 will be your friend.
> >> >
> >> > regards
> >> > angela
> >> >
> >> >> On 30/07/14 13:27, "Carsten Ziegeler" <cz...@apache.org> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> afaik with Oak the too many child nodes problem of JR2 is solved,
> >> >> therefore
> >> >> I'm wondering what the best way to store a queue in the repository
> is?
> >> >>
> >> >> In my use cases, there are usually not many items within a single
> >> queue,
> >> >> let's say a few hundreds. In some cases the queue might grow to some
> >> >> thousands but not more than maybe 20k.
> >> >>
> >> >> The idea is that new entries (nodes) are added to the end of the
> queue,
> >> >> and
> >> >> processing would read the first node from the queue, update the
> >> properties
> >> >> and once done, remove it.
> >> >>
> >> >> My initial design was to simply store all entries as sub nodes of
> some
> >> >> queue root entry without any hierarchy. addNode should add them at
> the
> >> end
> >> >> and simply iteration over the child nodes of the root gives the first
> >> >> entry. No need for sortable nodes.
> >> >>
> >> >> Does this make sense? Is there anything else to be considered?
> >> >>
> >> >> Regards
> >> >> Carsten
> >> >> --
> >> >> Carsten Ziegeler
> >> >> Adobe Research Switzerland
> >> >> cziegeler@apache.org
> >> >
> >>
> >
> >
> >
> > --
> > Carsten Ziegeler
> > Adobe Research Switzerland
> > cziegeler@apache.org
> >
>
>
>
> --
> Carsten Ziegeler
> Adobe Research Switzerland
> cziegeler@apache.org
>

Re: AW: How to implement a queue in Oak?

Posted by Angela Schreiber <an...@adobe.com>.
Hi Thomas

What about making the proposed solution built-in with Oak?

For example by defining a mixin type oak:Created extending
from mix:created... the latter is already widely used and the
oak extension could just offer the additional ability to
be orderable upon search.

In addition the solution would not need to be implemented multiple times.

Kind regards
Angela

On 06/08/14 13:40, "Thomas Mueller" <mu...@adobe.com> wrote:

>Hi,
>
>The following should work:
>
>- use a node type without _orderable_ child nodes
>- in each node, add a property "created" with the timestamp plus a unique
>id
>
>- to get the first entry, use a query ("order by created")
>
>This should scale well once synchronous, orderable indexes are
>implemented.
>
>The problem with orderable child nodes is that in a cluster, this either
>gets many conflicts or doesn't scale well. The problem with using a tree
>structure ("/queue/2014-08/06T13/40_45") is that it's hard to implement.
>
>Please note that in Oak, right now, orderable indexes are not synchronous.
>But I think this is a problem we can and should solve soon. Orderable
>child nodes should also be improved in my view, but that's a area where we
>know there are limitations.
>
>Regards,
>Thomas
>
>
>On 05/08/14 13:37, "Michael Dürig" <md...@apache.org> wrote:
>
>>
>>
>>On 5.8.14 12:22 , Christian Keller wrote:
>>> Jumping in here, where the thread developed to a discussion about the
>>>repository implementation of orderable Nodes.
>>> I like to come back to the use-cases needed.
>>>
>>> These can be summarized that I expect a certain order upon access of
>>>the Node's NodeIterator.
>>
>>That order is up to the implementation and applications should not
>>depend on this. See 23.5 Non-orderable Child Nodes:
>>http://www.day.com/specs/jcr/2.0/23_Orderable_Child_Nodes.html#23.5%20Non
>>-
>>orderable%20Child%20Nodes
>>
>>> In any cases I had in CQ there was never a requirement of strict
>>>ordering.
>>> But it was used to allow a user defined order: The Page-Order in the
>>>navigation.
>>> So I wonder if a basic non strict order to allow for JCR order support
>>>is sufficient.
>>>
>>> For any other orders I wonder if an additional method to
>>>Nod#getNodes(Comparator) would not be sufficient.
>>
>>As I don't think this can scale to many child nodes, you could use an
>>orderable node type as well.
>>
>>Michael
>>
>>>
>>> With this you can move the burden of ordering on access.
>>> A time sequence for child-node could be easily achived on the
>>>jcr:created Property
>>>
>>> Regards
>>> Christian
>>>
>>> ________________________________________
>>> Von: Jukka Zitting <ju...@zitting.name>
>>> Gesendet: Montag, 4. August 2014 20:02
>>> An: oak-dev@jackrabbit.apache.org
>>> Betreff: Re: How to implement a queue in Oak?
>>>
>>> Hi,
>>>
>>> On Mon, Aug 4, 2014 at 8:04 AM, Felix Meschberger <fm...@adobe.com>
>>>wrote:
>>>> I have the impression, that JCR itself can adequately solve the
>>>>problem using ordered child nodes.
>>>>
>>>> The problem is not JCR per-se but the implementation and the early-on
>>>>decision to not optimize
>>>> the use case of ordered child nodes (in favor of having support for
>>>>optimized implementation of
>>>> the unsorted case).
>>>
>>> The preference against ordering is more than just a performance
>>> optimization. Maintaining strict ordering semantics is impossible in
>>> an eventually consistent system with no global synchronization. For
>>> example, if two clients on different cluster nodes add entries at the
>>> end of the list at the same time, which entry should come first? And
>>> how do we ensure that the ordering remains consistent across all
>>> clients?
>>>
>>>> Now with Oak, we also have customizable indices. Would it be possible
>>>>to define an ordering
>>>> property, index that property and then use a query to get these nodes.
>>>>The query could be
>>>> created such that the ordering property index would be considered
>>>>(only). As a result we get
>>>> quick and transparent sorted nodes?
>>>
>>> Yes, that's possible. My "order by node name" suggestion works roughly
>>> in the same way. The problem with these approaches is that you won't
>>> get strict ordering semantics without some external synchronization or
>>> a global clock to decide how the values of the ordering property
>>> should be assigned. However, it sounds like in Carsten's case relaxed
>>> ordering semantics based on millisecond timings from the system clock
>>> should be good enough.
>>>
>>> BR,
>>>
>>> Jukka Zitting
>>>
>


Re: AW: How to implement a queue in Oak?

Posted by Davide Giannella <da...@apache.org>.
On 06/08/2014 13:40, Thomas Mueller wrote:
> Hi,
>
> The following should work:
>
> - use a node type without _orderable_ child nodes
> - in each node, add a property "created" with the timestamp plus a unique
> id
Avoid re-using jcr:created as you risk to bloat the repository size due
to the fact that almost all nodes in a JCR repo has jcr:created.
Therefore avoid indexing the property or use a custom node type on which
you could restrict the index.

> - to get the first entry, use a query ("order by created")
>
> This should scale well once synchronous, orderable indexes are implemented.
>
> The problem with orderable child nodes is that in a cluster, this either
> gets many conflicts or doesn't scale well. The problem with using a tree
> structure ("/queue/2014-08/06T13/40_45") is that it's hard to implement.
>
> Please note that in Oak, right now, orderable indexes are not synchronous.
> But I think this is a problem we can and should solve soon. Orderable
> child nodes should also be improved in my view, but that's a area where we
> know there are limitations.
>
Actually if you run on Segment the index could be synchronous as Segment
serialise the writing and no conflict arise. However the
scalability/performance issues we had with current algorithm still remain.

For sake of records, history and experimental incomplete code :)

http://markmail.org/thread/qdrvm6rwnblu3rdn
https://issues.apache.org/jira/browse/OAK-1717
https://github.com/davidegiannella/jackrabbit-oak/tree/OAK-1717

Cheers
D.



Re: AW: How to implement a queue in Oak?

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

The following should work:

- use a node type without _orderable_ child nodes
- in each node, add a property "created" with the timestamp plus a unique
id

- to get the first entry, use a query ("order by created")

This should scale well once synchronous, orderable indexes are implemented.

The problem with orderable child nodes is that in a cluster, this either
gets many conflicts or doesn't scale well. The problem with using a tree
structure ("/queue/2014-08/06T13/40_45") is that it's hard to implement.

Please note that in Oak, right now, orderable indexes are not synchronous.
But I think this is a problem we can and should solve soon. Orderable
child nodes should also be improved in my view, but that's a area where we
know there are limitations.

Regards,
Thomas


On 05/08/14 13:37, "Michael Dürig" <md...@apache.org> wrote:

>
>
>On 5.8.14 12:22 , Christian Keller wrote:
>> Jumping in here, where the thread developed to a discussion about the
>>repository implementation of orderable Nodes.
>> I like to come back to the use-cases needed.
>>
>> These can be summarized that I expect a certain order upon access of
>>the Node's NodeIterator.
>
>That order is up to the implementation and applications should not
>depend on this. See 23.5 Non-orderable Child Nodes:
>http://www.day.com/specs/jcr/2.0/23_Orderable_Child_Nodes.html#23.5%20Non-
>orderable%20Child%20Nodes
>
>> In any cases I had in CQ there was never a requirement of strict
>>ordering.
>> But it was used to allow a user defined order: The Page-Order in the
>>navigation.
>> So I wonder if a basic non strict order to allow for JCR order support
>>is sufficient.
>>
>> For any other orders I wonder if an additional method to
>>Nod#getNodes(Comparator) would not be sufficient.
>
>As I don't think this can scale to many child nodes, you could use an
>orderable node type as well.
>
>Michael
>
>>
>> With this you can move the burden of ordering on access.
>> A time sequence for child-node could be easily achived on the
>>jcr:created Property
>>
>> Regards
>> Christian
>>
>> ________________________________________
>> Von: Jukka Zitting <ju...@zitting.name>
>> Gesendet: Montag, 4. August 2014 20:02
>> An: oak-dev@jackrabbit.apache.org
>> Betreff: Re: How to implement a queue in Oak?
>>
>> Hi,
>>
>> On Mon, Aug 4, 2014 at 8:04 AM, Felix Meschberger <fm...@adobe.com>
>>wrote:
>>> I have the impression, that JCR itself can adequately solve the
>>>problem using ordered child nodes.
>>>
>>> The problem is not JCR per-se but the implementation and the early-on
>>>decision to not optimize
>>> the use case of ordered child nodes (in favor of having support for
>>>optimized implementation of
>>> the unsorted case).
>>
>> The preference against ordering is more than just a performance
>> optimization. Maintaining strict ordering semantics is impossible in
>> an eventually consistent system with no global synchronization. For
>> example, if two clients on different cluster nodes add entries at the
>> end of the list at the same time, which entry should come first? And
>> how do we ensure that the ordering remains consistent across all
>> clients?
>>
>>> Now with Oak, we also have customizable indices. Would it be possible
>>>to define an ordering
>>> property, index that property and then use a query to get these nodes.
>>>The query could be
>>> created such that the ordering property index would be considered
>>>(only). As a result we get
>>> quick and transparent sorted nodes?
>>
>> Yes, that's possible. My "order by node name" suggestion works roughly
>> in the same way. The problem with these approaches is that you won't
>> get strict ordering semantics without some external synchronization or
>> a global clock to decide how the values of the ordering property
>> should be assigned. However, it sounds like in Carsten's case relaxed
>> ordering semantics based on millisecond timings from the system clock
>> should be good enough.
>>
>> BR,
>>
>> Jukka Zitting
>>


AW: AW: How to implement a queue in Oak?

Posted by Christian Keller <ch...@adobe.com>.
________________________________________
Von: Michael Dürig <md...@apache.org>
Gesendet: Dienstag, 5. August 2014 13:37
An: oak-dev@jackrabbit.apache.org
Betreff: Re: AW: How to implement a queue in Oak?

>> In any cases I had in CQ there was never a requirement of strict ordering.
>> But it was used to allow a user defined order: The Page-Order in the navigation.
>> So I wonder if a basic non strict order to allow for JCR order support is sufficient.
>>
>> For any other orders I wonder if an additional method to Nod#getNodes(Comparator) would not be sufficient.
>
>As I don't think this can scale to many child nodes, you could use an
> orderable node type as well.

For sake of correctness: What you suggest is something different: 
The orderable NodeType is the order of the persistence. 
Where I say offer a convenient way to give an arbitrary order, than applications may stop trying to use persistence to implement their specific order needs. 


Michael

>
> With this you can move the burden of ordering on access.
> A time sequence for child-node could be easily achived on the jcr:created Property
>
> Regards
> Christian
>
> ________________________________________
> Von: Jukka Zitting <ju...@zitting.name>
> Gesendet: Montag, 4. August 2014 20:02
> An: oak-dev@jackrabbit.apache.org
> Betreff: Re: How to implement a queue in Oak?
>
> Hi,
>
> On Mon, Aug 4, 2014 at 8:04 AM, Felix Meschberger <fm...@adobe.com> wrote:
>> I have the impression, that JCR itself can adequately solve the problem using ordered child nodes.
>>
>> The problem is not JCR per-se but the implementation and the early-on decision to not optimize
>> the use case of ordered child nodes (in favor of having support for optimized implementation of
>> the unsorted case).
>
> The preference against ordering is more than just a performance
> optimization. Maintaining strict ordering semantics is impossible in
> an eventually consistent system with no global synchronization. For
> example, if two clients on different cluster nodes add entries at the
> end of the list at the same time, which entry should come first? And
> how do we ensure that the ordering remains consistent across all
> clients?
>
>> Now with Oak, we also have customizable indices. Would it be possible to define an ordering
>> property, index that property and then use a query to get these nodes. The query could be
>> created such that the ordering property index would be considered (only). As a result we get
>> quick and transparent sorted nodes?
>
> Yes, that's possible. My "order by node name" suggestion works roughly
> in the same way. The problem with these approaches is that you won't
> get strict ordering semantics without some external synchronization or
> a global clock to decide how the values of the ordering property
> should be assigned. However, it sounds like in Carsten's case relaxed
> ordering semantics based on millisecond timings from the system clock
> should be good enough.
>
> BR,
>
> Jukka Zitting
>

Re: AW: How to implement a queue in Oak?

Posted by Michael Dürig <md...@apache.org>.

On 5.8.14 12:22 , Christian Keller wrote:
> Jumping in here, where the thread developed to a discussion about the repository implementation of orderable Nodes.
> I like to come back to the use-cases needed.
>
> These can be summarized that I expect a certain order upon access of the Node's NodeIterator.

That order is up to the implementation and applications should not 
depend on this. See 23.5 Non-orderable Child Nodes: 
http://www.day.com/specs/jcr/2.0/23_Orderable_Child_Nodes.html#23.5%20Non-orderable%20Child%20Nodes

> In any cases I had in CQ there was never a requirement of strict ordering.
> But it was used to allow a user defined order: The Page-Order in the navigation.
> So I wonder if a basic non strict order to allow for JCR order support is sufficient.
>
> For any other orders I wonder if an additional method to Nod#getNodes(Comparator) would not be sufficient.

As I don't think this can scale to many child nodes, you could use an 
orderable node type as well.

Michael

>
> With this you can move the burden of ordering on access.
> A time sequence for child-node could be easily achived on the jcr:created Property
>
> Regards
> Christian
>
> ________________________________________
> Von: Jukka Zitting <ju...@zitting.name>
> Gesendet: Montag, 4. August 2014 20:02
> An: oak-dev@jackrabbit.apache.org
> Betreff: Re: How to implement a queue in Oak?
>
> Hi,
>
> On Mon, Aug 4, 2014 at 8:04 AM, Felix Meschberger <fm...@adobe.com> wrote:
>> I have the impression, that JCR itself can adequately solve the problem using ordered child nodes.
>>
>> The problem is not JCR per-se but the implementation and the early-on decision to not optimize
>> the use case of ordered child nodes (in favor of having support for optimized implementation of
>> the unsorted case).
>
> The preference against ordering is more than just a performance
> optimization. Maintaining strict ordering semantics is impossible in
> an eventually consistent system with no global synchronization. For
> example, if two clients on different cluster nodes add entries at the
> end of the list at the same time, which entry should come first? And
> how do we ensure that the ordering remains consistent across all
> clients?
>
>> Now with Oak, we also have customizable indices. Would it be possible to define an ordering
>> property, index that property and then use a query to get these nodes. The query could be
>> created such that the ordering property index would be considered (only). As a result we get
>> quick and transparent sorted nodes?
>
> Yes, that's possible. My "order by node name" suggestion works roughly
> in the same way. The problem with these approaches is that you won't
> get strict ordering semantics without some external synchronization or
> a global clock to decide how the values of the ordering property
> should be assigned. However, it sounds like in Carsten's case relaxed
> ordering semantics based on millisecond timings from the system clock
> should be good enough.
>
> BR,
>
> Jukka Zitting
>

AW: How to implement a queue in Oak?

Posted by Christian Keller <ch...@adobe.com>.
Jumping in here, where the thread developed to a discussion about the repository implementation of orderable Nodes.
I like to come back to the use-cases needed.

These can be summarized that I expect a certain order upon access of the Node's NodeIterator.
In any cases I had in CQ there was never a requirement of strict ordering.
But it was used to allow a user defined order: The Page-Order in the navigation.
So I wonder if a basic non strict order to allow for JCR order support is sufficient.

For any other orders I wonder if an additional method to Nod#getNodes(Comparator) would not be sufficient.

With this you can move the burden of ordering on access.
A time sequence for child-node could be easily achived on the jcr:created Property 

Regards
Christian

________________________________________
Von: Jukka Zitting <ju...@zitting.name>
Gesendet: Montag, 4. August 2014 20:02
An: oak-dev@jackrabbit.apache.org
Betreff: Re: How to implement a queue in Oak?

Hi,

On Mon, Aug 4, 2014 at 8:04 AM, Felix Meschberger <fm...@adobe.com> wrote:
> I have the impression, that JCR itself can adequately solve the problem using ordered child nodes.
>
> The problem is not JCR per-se but the implementation and the early-on decision to not optimize
> the use case of ordered child nodes (in favor of having support for optimized implementation of
> the unsorted case).

The preference against ordering is more than just a performance
optimization. Maintaining strict ordering semantics is impossible in
an eventually consistent system with no global synchronization. For
example, if two clients on different cluster nodes add entries at the
end of the list at the same time, which entry should come first? And
how do we ensure that the ordering remains consistent across all
clients?

> Now with Oak, we also have customizable indices. Would it be possible to define an ordering
> property, index that property and then use a query to get these nodes. The query could be
> created such that the ordering property index would be considered (only). As a result we get
> quick and transparent sorted nodes?

Yes, that's possible. My "order by node name" suggestion works roughly
in the same way. The problem with these approaches is that you won't
get strict ordering semantics without some external synchronization or
a global clock to decide how the values of the ordering property
should be assigned. However, it sounds like in Carsten's case relaxed
ordering semantics based on millisecond timings from the system clock
should be good enough.

BR,

Jukka Zitting

Re: How to implement a queue in Oak?

Posted by Jukka Zitting <ju...@zitting.name>.
Hi,

On Mon, Aug 4, 2014 at 8:04 AM, Felix Meschberger <fm...@adobe.com> wrote:
> I have the impression, that JCR itself can adequately solve the problem using ordered child nodes.
>
> The problem is not JCR per-se but the implementation and the early-on decision to not optimize
> the use case of ordered child nodes (in favor of having support for optimized implementation of
> the unsorted case).

The preference against ordering is more than just a performance
optimization. Maintaining strict ordering semantics is impossible in
an eventually consistent system with no global synchronization. For
example, if two clients on different cluster nodes add entries at the
end of the list at the same time, which entry should come first? And
how do we ensure that the ordering remains consistent across all
clients?

> Now with Oak, we also have customizable indices. Would it be possible to define an ordering
> property, index that property and then use a query to get these nodes. The query could be
> created such that the ordering property index would be considered (only). As a result we get
> quick and transparent sorted nodes?

Yes, that's possible. My "order by node name" suggestion works roughly
in the same way. The problem with these approaches is that you won't
get strict ordering semantics without some external synchronization or
a global clock to decide how the values of the ordering property
should be assigned. However, it sounds like in Carsten's case relaxed
ordering semantics based on millisecond timings from the system clock
should be good enough.

BR,

Jukka Zitting

Re: How to implement a queue in Oak?

Posted by Felix Meschberger <fm...@adobe.com>.
Hi

I have the impression, that JCR itself can adequately solve the problem using ordered child nodes.

The problem is not JCR per-se but the implementation and the early-on decision to not optimize the use case of ordered child nodes (in favor of having support for optimized implementation of the unsorted case).

Now with Oak, we also have customizable indices. Would it be possible to define an ordering property, index that property and then use a query to get these nodes. The query could be created such that the ordering property index would be considered (only). As a result we get quick and transparent sorted nodes ?

Regards
Felix

Am 01.08.2014 um 07:56 schrieb Carsten Ziegeler <cz...@apache.org>:

> I'm wondering if anyone has a good idea how to model a queue with efficient
> operations in JCR - or is JCR not suited for this use case?
> 
> Regards
> Carsten
> 
> 
> 2014-07-30 15:57 GMT+02:00 Carsten Ziegeler <cz...@apache.org>:
> 
>> Using a different storage than JCR would be easy in my case, however I
>> *want* to use JCR
>> 
>> Carsten
>> 
>> 
>> 2014-07-30 14:55 GMT+02:00 Lukas Smith <sm...@pooteeweet.org>:
>> 
>> Hi,
>>> 
>>> I can totally see that it might be useful to be able to go through the
>>> Oak/JCR API to have a queue but maybe this is stretching Oak a bit far if
>>> you end up with 1k+ queues.
>>> 
>>> However I think it would be great to look more into federation for this.
>>> I think ModeShape supports this quite well already, ie. being able to hook
>>> in another JCR tree, a file system, a git repository, CMIS .. I am sure
>>> that it would also be possible to implement on top of some MQ standard.
>>> 
>>> see also https://docs.jboss.org/author/display/MODE/Federation?_sscc=t
>>> 
>>> regards,
>>> Lukas
>>> 
>>>> On 30 Jul 2014, at 14:41, Angela Schreiber <an...@adobe.com> wrote:
>>>> 
>>>> hi carsten
>>>> 
>>>> if you are expecting your nodes to be in a given order (e.g. the
>>>> order of creation) you need to have a parent that has orderable
>>>> children... in which case we don't make any promises about huge
>>>> child collections... it will not work well.
>>>> 
>>>> if you don't have the requirement of ordered children, you can
>>>> have _many_ but need to make sure that your parent node doesn't
>>>> have orderable children (e.g. oak:Unstructured)... but then you
>>>> cannot expect that new children are appended at the "end of the
>>>> list"... there is no list and there is not guaranteed order.
>>>> 
>>>> i guess you have a little misunderstanding when it comes to
>>>> the concept of orderable child nodes -> JSR 283 will be your friend.
>>>> 
>>>> regards
>>>> angela
>>>> 
>>>>> On 30/07/14 13:27, "Carsten Ziegeler" <cz...@apache.org> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> afaik with Oak the too many child nodes problem of JR2 is solved,
>>>>> therefore
>>>>> I'm wondering what the best way to store a queue in the repository is?
>>>>> 
>>>>> In my use cases, there are usually not many items within a single
>>> queue,
>>>>> let's say a few hundreds. In some cases the queue might grow to some
>>>>> thousands but not more than maybe 20k.
>>>>> 
>>>>> The idea is that new entries (nodes) are added to the end of the queue,
>>>>> and
>>>>> processing would read the first node from the queue, update the
>>> properties
>>>>> and once done, remove it.
>>>>> 
>>>>> My initial design was to simply store all entries as sub nodes of some
>>>>> queue root entry without any hierarchy. addNode should add them at the
>>> end
>>>>> and simply iteration over the child nodes of the root gives the first
>>>>> entry. No need for sortable nodes.
>>>>> 
>>>>> Does this make sense? Is there anything else to be considered?
>>>>> 
>>>>> Regards
>>>>> Carsten
>>>>> --
>>>>> Carsten Ziegeler
>>>>> Adobe Research Switzerland
>>>>> cziegeler@apache.org
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Carsten Ziegeler
>> Adobe Research Switzerland
>> cziegeler@apache.org
>> 
> 
> 
> 
> -- 
> Carsten Ziegeler
> Adobe Research Switzerland
> cziegeler@apache.org