Posted to dev@jackrabbit.apache.org by Florent Guillaume <fg...@nuxeo.com> on 2006/04/18 19:59:18 UTC
efficient node type indexing
Hi,
I have a node that has lots of unordered child nodes. Some of
these nodes are "real children" in the document management sense; the
others (few in number) are just nodes that hold complex datatypes
but are really part of the main document.
I'd like to access both categories of nodes in an efficient manner:
- get only the nodes for my complex datatypes,
- get the list of "real children" nodes.
I have flexibility in deciding how these nodes are typed. I can have
mixin types that are used as marker interfaces for these two
categories. Or (preferably) I can rely on the supertypes of my node
types to distinguish between the two.
What would you recommend so that my queries are processed
efficiently, using underlying indexes?
Thanks,
Florent
--
Florent Guillaume, Nuxeo (Paris, France) Director of R&D
+33 1 40 33 71 59 http://nuxeo.com fg@nuxeo.com
Re: efficient node type indexing
Posted by Florent Guillaume <fg...@nuxeo.com>.
Hi Peeter,
Peeter Piegaze wrote:
> On 4/19/06, Marcel Reutegger <ma...@gmx.net> wrote:
>> Florent Guillaume wrote:
>>> I have a node that has lots of unordered children nodes. Some of these
>>> nodes are "real children" in the document management sense, the others
>>> (in small number) are just nodes that hold complex datatypes but are
>>> really part of the main document.
>>>
>>> I'd like to access both categories of nodes in an efficient manner:
>>> - get only the nodes for my complex datatypes,
>>> - get the list of "real children" nodes.
>> when you say 'get a list of child nodes' isn't it easier just using the
>> api instead of a query? Node.getNodes() and then have a custom
>> NodeIterator that filters out unnecessary nodes?
>
> Or you could simply push one or both categories of nodes down one
> level by defining an intervening node. Then you would just get *that*
> node and get its children.
Yes, that was the other solution I'd envisioned. I'll fall back to this if
performance is not adequate with the other node organizations.
Thanks for your comments,
Florent
Re: efficient node type indexing
Posted by Peeter Piegaze <pe...@day.com>.
Hi Florent,
On 4/19/06, Marcel Reutegger <ma...@gmx.net> wrote:
> Hi Florent,
>
> Florent Guillaume wrote:
> > Hi,
> >
> > I have a node that has lots of unordered children nodes. Some of these
> > nodes are "real children" in the document management sense, the others
> > (in small number) are just nodes that hold complex datatypes but are
> > really part of the main document.
> >
> > I'd like to access both categories of nodes in an efficient manner:
> > - get only the nodes for my complex datatypes,
> > - get the list of "real children" nodes.
>
> when you say 'get a list of child nodes' isn't it easier just using the
> api instead of a query? Node.getNodes() and then have a custom
> NodeIterator that filters out unnecessary nodes?
Or you could simply push one or both categories of nodes down one
level by defining an intervening node. Then you would just get *that*
node and get its children.
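A minimal sketch of the intervening-node layout, using a tiny in-memory stand-in rather than a real repository. The class `SimpleNode` and the node names "datatypes" and "children" are hypothetical; with the JCR API the equivalent calls would be Node.getNode(relPath) followed by Node.getNodes() on the returned node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IntermediateNodeSketch {
    /** Minimal in-memory stand-in for a repository node (hypothetical). */
    static final class SimpleNode {
        final String name;
        final Map<String, SimpleNode> children = new LinkedHashMap<>();

        SimpleNode(String name) {
            this.name = name;
        }

        SimpleNode addNode(String childName) {
            SimpleNode child = new SimpleNode(childName);
            children.put(childName, child);
            return child;
        }

        SimpleNode getNode(String childName) {
            return children.get(childName);
        }

        List<String> childNames() {
            return new ArrayList<>(children.keySet());
        }
    }

    /** Builds a document whose two categories live under separate intermediate nodes. */
    static SimpleNode buildDocument() {
        SimpleNode doc = new SimpleNode("doc");

        // The few complex-datatype nodes go under one intermediate node.
        SimpleNode datatypes = doc.addNode("datatypes");
        datatypes.addNode("address");
        datatypes.addNode("price");

        // The many "real children" go under another.
        SimpleNode kids = doc.addNode("children");
        for (int i = 0; i < 1000; i++) {
            kids.addNode("child" + i);
        }
        return doc;
    }

    public static void main(String[] args) {
        SimpleNode doc = buildDocument();
        // Each category is reached by name, without scanning the other one.
        System.out.println(doc.getNode("datatypes").childNames());
        System.out.println(doc.getNode("children").childNames().size());
    }
}
```

With this layout neither a query nor a filter is needed: each category is simply the full child list of its own intermediate node.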
Peeter
Re: efficient node type indexing
Posted by Florent Guillaume <fg...@nuxeo.com>.
Marcel Reutegger wrote:
> Florent Guillaume wrote:
>>> using different types for the child nodes is definitively a good
>>> idea, as it helps narrowing down the set of nodes that may match.
>>
>> If I have the (non-mixin) types:
>> [my:bar]
>> ...
>> [my:foo] > my:bar
>> ...
>> [my:gee] > my:bar
>> ...
>> the spec (6.6.3.2) tells me that I can query
>> //element(*, my:bar)
>> and I'll get my:foo and my:gee nodes too. But is this implemented in
>> jackrabbit using efficient indexes, or is there an iteration and
>> comparison going on?
>
> jackrabbit uses an index to resolve the types. it basically expands the
> type hierarchy on parse time and then uses the index to collect the node.
Ah excellent, thanks. That's what I hoped.
Florent
Re: efficient node type indexing
Posted by Marcel Reutegger <ma...@gmx.net>.
Florent Guillaume wrote:
>> using different types for the child nodes is definitively a good idea,
>> as it helps narrowing down the set of nodes that may match.
>
> If I have the (non-mixin) types:
> [my:bar]
> ...
> [my:foo] > my:bar
> ...
> [my:gee] > my:bar
> ...
> the spec (6.6.3.2) tells me that I can query
> //element(*, my:bar)
> and I'll get my:foo and my:gee nodes too. But is this implemented in
> jackrabbit using efficient indexes, or is there an iteration and
> comparison going on?
jackrabbit uses an index to resolve the types. it basically expands the
type hierarchy at parse time and then uses the index to collect the nodes.
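As a rough illustration of the idea described above (expand the type hierarchy first, then look each type up in an index), here is a self-contained sketch. It is not Jackrabbit's actual code: the class `TypeIndexSketch`, the in-memory maps, and the node paths are all made up for the example, and a real index would not be a plain map.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TypeIndexSketch {
    // Hypothetical hierarchy from the thread: supertype -> direct subtypes.
    static final Map<String, List<String>> SUBTYPES = Map.of(
            "my:bar", List.of("my:foo", "my:gee"));

    // Hypothetical inverted index: primary type -> paths of nodes of exactly that type.
    static final Map<String, List<String>> INDEX = Map.of(
            "my:bar", List.of("/doc/b1"),
            "my:foo", List.of("/doc/f1", "/doc/f2"),
            "my:gee", List.of("/doc/g1"),
            "my:other", List.of("/doc/o1"));

    /** Returns the given type plus all of its direct and indirect subtypes. */
    static Set<String> expand(String type) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(type);
        while (!pending.isEmpty()) {
            String current = pending.remove();
            if (result.add(current)) {
                pending.addAll(SUBTYPES.getOrDefault(current, List.of()));
            }
        }
        return result;
    }

    /** Resolves a type query as one index lookup per expanded type; no node is inspected. */
    static Set<String> query(String type) {
        Set<String> hits = new TreeSet<>();
        for (String t : expand(type)) {
            hits.addAll(INDEX.getOrDefault(t, List.of()));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(expand("my:bar")); // [my:bar, my:foo, my:gee]
        System.out.println(query("my:bar"));  // [/doc/b1, /doc/f1, /doc/f2, /doc/g1]
    }
}
```

The cost of the lookup depends on the number of types in the expansion and the number of hits, not on how many unrelated nodes (here /doc/o1) exist.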
regards
marcel
Re: efficient node type indexing
Posted by Florent Guillaume <fg...@nuxeo.com>.
Hi,
Marcel Reutegger wrote:
> Florent Guillaume wrote:
>> I have a node that has lots of unordered children nodes. Some of these
>> nodes are "real children" in the document management sense, the others
>> (in small number) are just nodes that hold complex datatypes but are
>> really part of the main document.
>>
>> I'd like to access both categories of nodes in an efficient manner:
>> - get only the nodes for my complex datatypes,
>> - get the list of "real children" nodes.
>
> when you say 'get a list of child nodes' isn't it easier just using the
> api instead of a query? Node.getNodes() and then have a custom
> NodeIterator that filters out unnecessary nodes?
An iterator that filters while iterating would be ok when most of the nodes
match. But when the nodes I want are the few ones (which may also sit at
the end of the iteration order), every other child still has to be visited,
which is inefficient. That's why I mentioned indexed queries.
>> I have flexibility in deciding how these node are typed. I can have
>> mixin types that are used as marker interface for these two
>> categories. Or (preferably) I can rely on the supertypes for my node
>> types to distinguish between the two.
>>
>> What would you recommend so that my queries are processed efficiently,
>> using underlying indexes?
>
> using different types for the child nodes is definitively a good idea,
> as it helps narrowing down the set of nodes that may match.
If I have the (non-mixin) types:
[my:bar]
...
[my:foo] > my:bar
...
[my:gee] > my:bar
...
the spec (6.6.3.2) tells me that I can query
//element(*, my:bar)
and I'll get my:foo and my:gee nodes too. But is this implemented in
jackrabbit using efficient indexes, or is there an iteration and comparison
going on?
Thanks,
Florent
Re: efficient node type indexing
Posted by Marcel Reutegger <ma...@gmx.net>.
Hi Florent,
Florent Guillaume wrote:
> Hi,
>
> I have a node that has lots of unordered children nodes. Some of these
> nodes are "real children" in the document management sense, the others
> (in small number) are just nodes that hold complex datatypes but are
> really part of the main document.
>
> I'd like to access both categories of nodes in an efficient manner:
> - get only the nodes for my complex datatypes,
> - get the list of "real children" nodes.
when you say 'get a list of child nodes', isn't it easier to just use the
api instead of a query? call Node.getNodes() and wrap the result in a custom
NodeIterator that filters out the unwanted nodes?
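A minimal sketch of such a filtering iterator, written against plain java.util.Iterator so it runs without a repository. The class name `FilteringIterator` and the "dt:" name prefix are assumptions made for the example; a JCR version would wrap the NodeIterator returned by Node.getNodes() and would more likely test Node.isNodeType(...) in the predicate.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

final class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<? super T> accept;
    private T next;
    private boolean hasNext;

    FilteringIterator(Iterator<T> source, Predicate<? super T> accept) {
        this.source = source;
        this.accept = accept;
        advance();
    }

    // Skips forward to the next accepted element, if there is one.
    private void advance() {
        hasNext = false;
        next = null;
        while (source.hasNext()) {
            T candidate = source.next();
            if (accept.test(candidate)) {
                next = candidate;
                hasNext = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public T next() {
        if (!hasNext) {
            throw new NoSuchElementException();
        }
        T result = next;
        advance();
        return result;
    }

    /** Convenience helper: drains the filtered view into a list. */
    static <T> List<T> collect(Iterator<T> source, Predicate<? super T> accept) {
        List<T> out = new ArrayList<>();
        new FilteringIterator<>(source, accept).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical child names; the "dt:" prefix marks complex-datatype nodes.
        List<String> children = List.of("child1", "dt:address", "child2", "dt:price");
        System.out.println(collect(children.iterator(), n -> n.startsWith("dt:")));
    }
}
```

Note that the wrapper still visits every child of the parent, so its cost grows with the total number of children rather than with the number of matches.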
> I have flexibility in deciding how these node are typed. I can have
> mixin types that are used as marker interface for these two categories.
> Or (preferably) I can rely on the supertypes for my node types to
> distinguish between the two.
>
> What would you recommend so that my queries are processed efficiently,
> using underlying indexes?
using different types for the child nodes is definitely a good idea,
as it helps narrow down the set of nodes that may match.
regards
marcel