You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by Jukka Zitting <ju...@gmail.com> on 2006/10/04 02:09:09 UTC

Support concurrent node modifications

Hi,

The issue of concurrent node modifications from different sessions not
being supported has (again) recently been raised. It seems like a
really useful feature that is only prevented by the limitations of the
current NodeState implementation. Are there other reasons why we
couldn't support such modifications?

More fundamentally, I'd be interested in finding out ways to separate
the parent-child relationship handling from the parent NodeState. The
current design is the root cause for us not supporting content models
with unlimited children per node.

BR,

Jukka Zitting

-- 
Yukatan - http://yukatan.fi/ - info@yukatan.fi
Software craftsmanship, JCR consulting, and Java development

Re: Support concurrent node modifications

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 10/4/06, Stefan Guggisberg <st...@gmail.com> wrote:
> i dare to doubt that a wiki calls for flat hierarchies. wiki's are inherently
> structured and this structure could be easily reflected in a hierarchical
> model.

Of course there are many different types of wikis out there, but the
most basic idea of a wiki is an unstructured collection of interlinked
pages. A flat space would intuitively be my first shot at designing a
wiki content model.

Look at Wikipedia for an example. It does have some hierarchical
add-on structure (talk pages, etc.), but the main content model is
very flat.

BR,

Jukka Zitting

-- 
Yukatan - http://yukatan.fi/ - info@yukatan.fi
Software craftsmanship, JCR consulting, and Java development

Re: Support concurrent node modifications

Posted by Stefan Guggisberg <st...@gmail.com>.
On 10/4/06, Stefan Rinner <st...@collettiva.com> wrote:
>
> On Oct 4, 2006, at 9:33 AM, Thomas Mueller wrote:
>
> > Hi,
> >
> > I agree. Of course there are other (may be better) ways to achieve
> > the same
> > without such problems, like using a deeper hierarchy. But the flat
> > hierarchy
> > is an important use case IMO. Many developers are used to this, and
> > it will
> > be hard to educate them to do it in a different way. I think it
> > should be
> > considered to change the implementation to allow fast addition of
> > child
> > nodes to a node. I'm not sure if this change alone will solve all
> > problems
> > (may be not).
> >
> > At the same time, I think the current model of node references
> > should be
> > changed to allow O(1) time when adding or removing a reference to a
> > node. In
> > my opinion, the same mechanism could be used for references and for
> > child
> > nodes.
>
> I definitely second these ideas.
> We are currently implementing a wiki which calls for flat hierarchies
> (yes you can invent some hierarchies but there is no real hierarchy
> in the data so it's more like adding unnecessary complexity).

i dare to doubt that a wiki calls for flat hierarchies. wiki's are inherently
structured and this structure could be easily reflected in a hierarchical
model.

>
> We are also running against the limitations of the current references
> implementation (all references of a node in a blob and therefore
> truncation if you got a LOT of references).
> By the way - is there really no rdbms-based persistence manager out
> there using proper relations instead of blobs for references?

afaik, no. however, feel free to roll your own, and as always, contributions
are very welcome ;)

cheers
stefan

>
>
> - stefan
>

Re: Support concurrent node modifications

Posted by Stefan Rinner <st...@collettiva.com>.
On Oct 4, 2006, at 9:33 AM, Thomas Mueller wrote:

> Hi,
>
> I agree. Of course there are other (may be better) ways to achieve  
> the same
> without such problems, like using a deeper hierarchy. But the flat  
> hierarchy
> is an important use case IMO. Many developers are used to this, and  
> it will
> be hard to educate them to do it in a different way. I think it  
> should be
> considered to change the implementation to allow fast addition of  
> child
> nodes to a node. I'm not sure if this change alone will solve all  
> problems
> (may be not).
>
> At the same time, I think the current model of node references  
> should be
> changed to allow O(1) time when adding or removing a reference to a  
> node. In
> my opinion, the same mechanism could be used for references and for  
> child
> nodes.

I definitely second these ideas.
We are currently implementing a wiki which calls for flat hierarchies  
(yes you can invent some hierarchies but there is no real hierarchy  
in the data so it's more like adding unnecessary complexity).

We are also running against the limitations of the current references  
implementation (all references of a node in a blob and therefore  
truncation if you got a LOT of references).
By the way - is there really no rdbms-based persistence manager out  
there using proper relations instead of blobs for references?


- stefan

Re: Support concurrent node modifications

Posted by Stefan Guggisberg <st...@gmail.com>.
i agree with tobi's statements.

since this is a rather major core change with potential
backward compatibility issues i guess this would be
a candidate for a 2.0 release.

for the time being i created an issue which should significantly
improve the current situation and which should be relatively easy
to implement:
https://issues.apache.org/jira/browse/JCR-584

cheers
stefan

On 10/4/06, Tobias Bocanegra <to...@day.com> wrote:
> > Proposed change:
> > - the pointer from child to parent should stay
> > - list of child nodes per node should be stored as a tree (b-tree or binary
> > tree)
> > - the list of references to a node should be stored as a tree as well
> >
> > The actual b-tree implementation does not need to be in Jackrabbit. When
> > using databases as persistent manager, it is possible to use a regular
> > b-tree index for example. This feature may not supported by all persistent
> > managers.
>
> this not only a problem of the storage in the persistence manager, but
> rather how the items are layed out in memory. if every node state has
> a list of child node entries, the update will always fail in case of
> concurrent modifications. further, such updates MUST fail in the case
> of ordered child nodes. otherwise the application can't rely on the
> proper ordering of the nodes.
>
> what is needed is a (b)tree representation in the memory which can be
> dynamically loaded, if nodes are missing. and that support concurrent
> hierarchy modifications, especially 'adding a childnode' (which should
> be possible to propagate down to the pm).
>
> how the items are stored in the persistence manager is secondary.
> since at the time the data is stored, the 'consolidated' states are
> already available in the shareditemstatemanager.
>
> the references should probably be stored more finegrained. i.e. the
> following operations should be supported by the 'referencemanager' and
> the respective pms:
>   addReference(source, target)
>   modifyReference(source, target) // could be covered by addReference
>   removeReference(source, target)
>   getReferences(target)
>   getReference(source) // which is optional, since this is stored in
> the property
>
> regards, toby
> --
> -----------------------------------------< tobias.bocanegra@day.com >---
> Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001 Basel
> T +41 61 226 98 98, F +41 61 226 98 97
> -----------------------------------------------< http://www.day.com >---
>

Re: Support concurrent node modifications

Posted by Tobias Bocanegra <to...@day.com>.
> Proposed change:
> - the pointer from child to parent should stay
> - list of child nodes per node should be stored as a tree (b-tree or binary
> tree)
> - the list of references to a node should be stored as a tree as well
>
> The actual b-tree implementation does not need to be in Jackrabbit. When
> using databases as persistent manager, it is possible to use a regular
> b-tree index for example. This feature may not supported by all persistent
> managers.

this not only a problem of the storage in the persistence manager, but
rather how the items are layed out in memory. if every node state has
a list of child node entries, the update will always fail in case of
concurrent modifications. further, such updates MUST fail in the case
of ordered child nodes. otherwise the application can't rely on the
proper ordering of the nodes.

what is needed is a (b)tree representation in the memory which can be
dynamically loaded, if nodes are missing. and that support concurrent
hierarchy modifications, especially 'adding a childnode' (which should
be possible to propagate down to the pm).

how the items are stored in the persistence manager is secondary.
since at the time the data is stored, the 'consolidated' states are
already available in the shareditemstatemanager.

the references should probably be stored more finegrained. i.e. the
following operations should be supported by the 'referencemanager' and
the respective pms:
  addReference(source, target)
  modifyReference(source, target) // could be covered by addReference
  removeReference(source, target)
  getReferences(target)
  getReference(source) // which is optional, since this is stored in
the property

regards, toby
-- 
-----------------------------------------< tobias.bocanegra@day.com >---
Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001 Basel
T +41 61 226 98 98, F +41 61 226 98 97
-----------------------------------------------< http://www.day.com >---

Re: Support concurrent node modifications

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

I agree. Of course there are other (may be better) ways to achieve the same
without such problems, like using a deeper hierarchy. But the flat hierarchy
is an important use case IMO. Many developers are used to this, and it will
be hard to educate them to do it in a different way. I think it should be
considered to change the implementation to allow fast addition of child
nodes to a node. I'm not sure if this change alone will solve all problems
(may be not).

At the same time, I think the current model of node references should be
changed to allow O(1) time when adding or removing a reference to a node. In
my opinion, the same mechanism could be used for references and for child
nodes.

Current implementation (AFAIK):
- each node has a pointer to the parent node
- the parent node has a list of child nodes, stored as a simple list (BLOB)
- each referenced node has a list of references stored as a simple list
(BLOB)

Proposed change:
- the pointer from child to parent should stay
- list of child nodes per node should be stored as a tree (b-tree or binary
tree)
- the list of references to a node should be stored as a tree as well

The actual b-tree implementation does not need to be in Jackrabbit. When
using databases as persistent manager, it is possible to use a regular
b-tree index for example. This feature may not supported by all persistent
managers.

Thomas


On 10/4/06, Jukka Zitting <ju...@gmail.com> wrote:
>
> Hi,
>
> The issue of concurrent node modifications from different sessions not
> being supported has (again) recently been raised. It seems like a
> really useful feature that is only prevented by the limitations of the
> current NodeState implementation. Are there other reasons why we
> couldn't support such modifications?
>
> More fundamentally, I'd be interested in finding out ways to separate
> the parent-child relationship handling from the parent NodeState. The
> current design is the root cause for us not supporting content models
> with unlimited children per node.
>
> BR,
>
> Jukka Zitting
>
> --
> Yukatan - http://yukatan.fi/ - info@yukatan.fi
> Software craftsmanship, JCR consulting, and Java development
>

Re: Support concurrent node modifications

Posted by Stefan Guggisberg <st...@gmail.com>.
On 10/4/06, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> The issue of concurrent node modifications from different sessions not
> being supported has (again) recently been raised. It seems like a
> really useful feature that is only prevented by the limitations of the
> current NodeState implementation. Are there other reasons why we
> couldn't support such modifications?

the main problems are maintaining consistency of same-name sibling subscripts
and order of child nodes.

the spec clearly states that adding/removing child node entries causes a
modification of the parent node, i.e. adding a child node requires the parent
to be saved as well. the current behavior of jackrabbit is in accordance with
the spec. however, a more sophisticated approach to determine whether a
node is 'stale' would be to analyze the modifications rather than deciding
based on the 'modifiied' flag alone.

i created a jira issue that addresses the handling of trivial changes:
https://issues.apache.org/jira/browse/JCR-584

cheers
stefan

>
> More fundamentally, I'd be interested in finding out ways to separate
> the parent-child relationship handling from the parent NodeState. The
> current design is the root cause for us not supporting content models
> with unlimited children per node.
>
> BR,
>
> Jukka Zitting
>
> --
> Yukatan - http://yukatan.fi/ - info@yukatan.fi
> Software craftsmanship, JCR consulting, and Java development
>