Posted to dev@jackrabbit.apache.org by Florent Guillaume <fg...@nuxeo.com> on 2006/04/18 19:59:18 UTC

efficient node type indexing

Hi,

I have a node that has lots of unordered child nodes. Some of these
nodes are "real children" in the document management sense; the
others (a small number) are just nodes that hold complex datatypes
but are really part of the main document.

I'd like to access both categories of nodes in an efficient manner:
- get only the nodes for my complex datatypes,
- get the list of "real children" nodes.

I have flexibility in deciding how these nodes are typed. I can have
mixin types that are used as marker interfaces for these two
categories. Or (preferably) I can rely on the supertypes for my node
types to distinguish between the two.

What would you recommend so that my queries are processed  
efficiently, using underlying indexes?

Thanks,
Florent

-- 
Florent Guillaume, Nuxeo (Paris, France)   Director of R&D
+33 1 40 33 71 59   http://nuxeo.com   fg@nuxeo.com




Re: efficient node type indexing

Posted by Florent Guillaume <fg...@nuxeo.com>.
Hi Peeter,

Peeter Piegaze wrote:
> On 4/19/06, Marcel Reutegger <ma...@gmx.net> wrote:
>> Florent Guillaume wrote:
>>> I have a node that has lots of unordered child nodes. Some of these
>>> nodes are "real children" in the document management sense; the others
>>> (a small number) are just nodes that hold complex datatypes but are
>>> really part of the main document.
>>>
>>> I'd like to access both categories of nodes in an efficient manner:
>>> - get only the nodes for my complex datatypes,
>>> - get the list of "real children" nodes.
>> when you say 'get a list of child nodes', isn't it easier to just use the
>> API instead of a query, i.e. Node.getNodes() with a custom
>> NodeIterator that filters out unnecessary nodes?
> 
> Or you could simply push one or both categories of nodes down one
> level by defining an intervening node. Then you would just get *that*
> node and get its children.

Yes, that was the other solution I'd envisioned. I'll fall back to it if
performance turns out to be inadequate with other node organizations.

Thanks for your comments,

Florent


Re: efficient node type indexing

Posted by Peeter Piegaze <pe...@day.com>.
Hi Florent,

On 4/19/06, Marcel Reutegger <ma...@gmx.net> wrote:
> Hi Florent,
>
> Florent Guillaume wrote:
> > Hi,
> >
> > I have a node that has lots of unordered child nodes. Some of these
> > nodes are "real children" in the document management sense; the others
> > (a small number) are just nodes that hold complex datatypes but are
> > really part of the main document.
> >
> > I'd like to access both categories of nodes in an efficient manner:
> > - get only the nodes for my complex datatypes,
> > - get the list of "real children" nodes.
>
> when you say 'get a list of child nodes', isn't it easier to just use the
> API instead of a query, i.e. Node.getNodes() with a custom
> NodeIterator that filters out unnecessary nodes?

Or you could simply push one or both categories of nodes down one
level by defining an intervening node. Then you would just get *that*
node and get its children.
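A minimal sketch of the intervening-node layout, modeled with a tiny in-memory tree in Java rather than the real javax.jcr API so that it runs on its own. The container names "data" and "children" and the types "my:document", "my:complexData" and "my:container" are hypothetical. With both categories pushed down one level, each list is one name lookup followed by getNodes(), and neither list requires looking at the other category's nodes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a JCR node (NOT javax.jcr.Node): a name,
// a primary type name and an insertion-ordered map of named children.
final class TreeNode {
    final String name;
    final String primaryType;
    private final Map<String, TreeNode> children = new LinkedHashMap<>();

    TreeNode(String name, String primaryType) {
        this.name = name;
        this.primaryType = primaryType;
    }

    // Loosely mirrors Node.addNode(relPath, primaryNodeTypeName).
    TreeNode addNode(String childName, String childType) {
        TreeNode child = new TreeNode(childName, childType);
        children.put(childName, child);
        return child;
    }

    // Direct lookup by name, loosely mirroring Node.getNode(relPath).
    TreeNode getNode(String childName) {
        TreeNode child = children.get(childName);
        if (child == null) {
            throw new IllegalArgumentException("no child named " + childName);
        }
        return child;
    }

    // All direct children, loosely mirroring Node.getNodes().
    List<TreeNode> getNodes() {
        return List.copyOf(children.values());
    }
}

class IntermediateNodeDemo {
    public static void main(String[] args) {
        TreeNode doc = new TreeNode("report", "my:document");
        // Hypothetical containers: one per category of child node.
        TreeNode data = doc.addNode("data", "my:container");
        TreeNode kids = doc.addNode("children", "my:container");

        data.addNode("title", "my:complexData");
        data.addNode("authors", "my:complexData");
        for (int i = 0; i < 1000; i++) {
            kids.addNode("doc" + i, "my:document");
        }

        // Each category is reached directly; no filtering over the other one.
        System.out.println(doc.getNode("data").getNodes().size());     // 2
        System.out.println(doc.getNode("children").getNodes().size()); // 1000
    }
}
```

The trade-off is an extra level in every path (report/data/title instead of report/title in this sketch), in exchange for never scanning one category to reach the other.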

Peeter

Re: efficient node type indexing

Posted by Florent Guillaume <fg...@nuxeo.com>.
Marcel Reutegger wrote:
> Florent Guillaume wrote:
>>> using different types for the child nodes is definitely a good 
>>> idea, as it helps narrow down the set of nodes that may match.
>>
>> If I have the (non-mixin) types:
>>   [my:bar]
>>      ...
>>   [my:foo] > my:bar
>>      ...
>>   [my:gee] > my:bar
>>      ...
>> the spec (6.6.3.2) tells me that I can query
>>   //element(*, my:bar)
>> and I'll get my:foo and my:gee nodes too. But is this implemented in 
>> jackrabbit using efficient indexes, or is there an iteration and 
>> comparison going on?
> 
> jackrabbit uses an index to resolve the types. it basically expands the 
> type hierarchy at parse time and then uses the index to collect the nodes.

Ah, excellent, thanks. That's what I'd hoped.

Florent


Re: efficient node type indexing

Posted by Marcel Reutegger <ma...@gmx.net>.
Florent Guillaume wrote:
>> using different types for the child nodes is definitely a good idea, 
>> as it helps narrow down the set of nodes that may match.
> 
> If I have the (non-mixin) types:
>   [my:bar]
>      ...
>   [my:foo] > my:bar
>      ...
>   [my:gee] > my:bar
>      ...
> the spec (6.6.3.2) tells me that I can query
>   //element(*, my:bar)
> and I'll get my:foo and my:gee nodes too. But is this implemented in 
> jackrabbit using efficient indexes, or is there an iteration and 
> comparison going on?

jackrabbit uses an index to resolve the types. it basically expands the 
type hierarchy at parse time and then uses the index to collect the nodes.
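The approach described above can be sketched as a conceptual Java model (hypothetical class and method names, not Jackrabbit's actual classes): a registry records which types extend which supertypes, a query on my:bar is first expanded into the set {my:bar, my:foo, my:gee}, and the result is then the union of one index lookup per type. The point it illustrates is that no per-node type comparison happens at query time.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Conceptual model only (hypothetical names, not Jackrabbit classes):
// a node type registry plus an inverted index from primary type to node paths.
final class TypeIndex {
    // supertype name -> names of types that directly extend it
    private final Map<String, Set<String>> subtypes = new HashMap<>();
    // primary type name -> paths of nodes whose primary type is exactly that name
    private final Map<String, Set<String>> nodesByType = new HashMap<>();

    // Records a declaration such as "[my:foo] > my:bar".
    void declareType(String type, String... supertypes) {
        subtypes.computeIfAbsent(type, k -> new TreeSet<>());
        for (String supertype : supertypes) {
            subtypes.computeIfAbsent(supertype, k -> new TreeSet<>()).add(type);
        }
    }

    // Indexing step: each node is filed under its own primary type only.
    void indexNode(String path, String primaryType) {
        nodesByType.computeIfAbsent(primaryType, k -> new TreeSet<>()).add(path);
    }

    // "Parse time" step: expand a type into itself plus all transitive subtypes.
    Set<String> expand(String type) {
        Set<String> result = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>(List.of(type));
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (result.add(current)) {
                pending.addAll(subtypes.getOrDefault(current, Set.of()));
            }
        }
        return result;
    }

    // Query step for //element(*, type): one index lookup per expanded type,
    // with no iteration over nodes to compare their types.
    Set<String> elementsOfType(String type) {
        Set<String> hits = new TreeSet<>();
        for (String t : expand(type)) {
            hits.addAll(nodesByType.getOrDefault(t, Set.of()));
        }
        return hits;
    }
}

class TypeIndexDemo {
    public static void main(String[] args) {
        TypeIndex index = new TypeIndex();
        index.declareType("my:bar");
        index.declareType("my:foo", "my:bar");
        index.declareType("my:gee", "my:bar");

        index.indexNode("/a", "my:foo");
        index.indexNode("/b", "my:gee");
        index.indexNode("/c", "my:bar");

        System.out.println(index.expand("my:bar"));         // [my:bar, my:foo, my:gee]
        System.out.println(index.elementsOfType("my:bar")); // [/a, /b, /c]
        System.out.println(index.elementsOfType("my:foo")); // [/a]
    }
}
```

The expansion cost depends on the size of the type hierarchy, which is usually small, rather than on the number of nodes in the workspace.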


regards
  marcel

Re: efficient node type indexing

Posted by Florent Guillaume <fg...@nuxeo.com>.
Hi,

Marcel Reutegger wrote:
> Florent Guillaume wrote: 
>> I have a node that has lots of unordered child nodes. Some of these 
>> nodes are "real children" in the document management sense; the others 
>> (a small number) are just nodes that hold complex datatypes but are 
>> really part of the main document.
>>
>> I'd like to access both categories of nodes in an efficient manner:
>> - get only the nodes for my complex datatypes,
>> - get the list of "real children" nodes.
> 
> when you say 'get a list of child nodes', isn't it easier to just use the 
> API instead of a query, i.e. Node.getNodes() with a custom 
> NodeIterator that filters out unnecessary nodes?

An iterator that filters while iterating would be fine when most of the
children match, but when the nodes I want are the few ones (which may
well come at the end of the iteration), it has to walk through all the
other children first, which is inefficient. That's why I mentioned
indexed queries.

>> I have flexibility in deciding how these nodes are typed. I can have 
>> mixin types that are used as marker interfaces for these two 
>> categories. Or (preferably) I can rely on the supertypes for my node 
>> types to distinguish between the two.
>>
>> What would you recommend so that my queries are processed efficiently, 
>> using underlying indexes?
> 
> using different types for the child nodes is definitely a good idea, 
> as it helps narrow down the set of nodes that may match.

If I have the (non-mixin) types:
   [my:bar]
      ...
   [my:foo] > my:bar
      ...
   [my:gee] > my:bar
      ...
the spec (6.6.3.2) tells me that I can query
   //element(*, my:bar)
and I'll get my:foo and my:gee nodes too. But is this implemented in 
jackrabbit using efficient indexes, or is there an iteration and comparison 
going on?

Thanks,

Florent


Re: efficient node type indexing

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi Florent,

Florent Guillaume wrote:
> Hi,
> 
> I have a node that has lots of unordered child nodes. Some of these 
> nodes are "real children" in the document management sense; the others 
> (a small number) are just nodes that hold complex datatypes but are 
> really part of the main document.
> 
> I'd like to access both categories of nodes in an efficient manner:
> - get only the nodes for my complex datatypes,
> - get the list of "real children" nodes.

when you say 'get a list of child nodes', isn't it easier to just use the 
API instead of a query, i.e. Node.getNodes() with a custom 
NodeIterator that filters out unnecessary nodes?
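The suggestion above can be sketched in Java as a generic filtering iterator. To compile without a JCR library it works over a small stand-in record instead of javax.jcr.Node; the record ChildNode, the class FilteringIterator and the type names "my:document" and "my:complexData" are hypothetical. A real version would wrap the NodeIterator returned by Node.getNodes(). The visited counter makes the cost visible: when the wanted nodes are few and sit at the end, every other child is still examined.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical stand-in for javax.jcr.Node: only name and primary type matter here.
record ChildNode(String name, String primaryType) {}

// Wraps any iterator and yields only the elements accepted by the predicate.
final class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<? super T> accept;
    private T next;           // look-ahead element, valid while hasNext is true
    private boolean hasNext;
    private int visited;      // number of source elements examined so far

    FilteringIterator(Iterator<T> source, Predicate<? super T> accept) {
        this.source = source;
        this.accept = accept;
        advance();
    }

    // Pulls from the source until an accepted element is found or it runs out.
    private void advance() {
        hasNext = false;
        while (source.hasNext()) {
            T candidate = source.next();
            visited++;
            if (accept.test(candidate)) {
                next = candidate;
                hasNext = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public T next() {
        if (!hasNext) {
            throw new NoSuchElementException();
        }
        T result = next;
        advance();
        return result;
    }

    int visited() {
        return visited;
    }
}

class FilterDemo {
    public static void main(String[] args) {
        List<ChildNode> children = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            children.add(new ChildNode("doc" + i, "my:document"));
        }
        // The few complex-datatype nodes happen to come last.
        children.add(new ChildNode("title", "my:complexData"));
        children.add(new ChildNode("authors", "my:complexData"));

        FilteringIterator<ChildNode> it = new FilteringIterator<>(
                children.iterator(), n -> n.primaryType().equals("my:complexData"));
        int found = 0;
        while (it.hasNext()) {
            it.next();
            found++;
        }
        System.out.println("found=" + found + " visited=" + it.visited()); // found=2 visited=1002
    }
}
```

This needs no query and keeps the code simple, but its cost grows with the total number of children rather than with the number of matches, which is the objection Florent raises in his reply in this thread.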

> I have flexibility in deciding how these nodes are typed. I can have 
> mixin types that are used as marker interfaces for these two categories. 
> Or (preferably) I can rely on the supertypes for my node types to 
> distinguish between the two.
> 
> What would you recommend so that my queries are processed efficiently, 
> using underlying indexes?

using different types for the child nodes is definitely a good idea, 
as it helps narrow down the set of nodes that may match.

regards
  marcel