Posted to users@jackrabbit.apache.org by Alessandro Bologna <al...@gmail.com> on 2012/01/28 20:35:17 UTC

excluding nodes based on node name or namespace

Hi,

Because of the ever-growing size of some of our workspaces, we needed
to tweak the indexing configuration so that properties are excluded
from indexing based on a condition that specifies only the node name,
the local name or the node namespace. The default implementation does
not allow that easily. We also found that explicit support for the
ancestor-or-self and self axes was useful, so I made some changes to
org.apache.jackrabbit.core.query.lucene.IndexingConfigurationImpl.

With this quick (and maybe not very elegant) patch (coded and tested,
but not very thoroughly, against jackrabbit-core 2.2.10), we can
support indexing with rules like the following:

<configuration xmlns=""
        xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
        xmlns:jcr="http://www.jcp.org/jcr/1.0"
        xmlns:html="http://www.w3.org/1999/xhtml">

<!-- exclude all the properties of nt:unstructured nodes called 'body'
in the default namespace and their children -->
        <index-rule nodeType="nt:unstructured"
                condition="ancestor-or-self::body">
        </index-rule>

<!-- exclude all the properties of nt:unstructured nodes called
'headline' in any namespace and their children -->
        <index-rule nodeType="nt:unstructured"
                condition="ancestor-or-self::*:headline">
        </index-rule>

<!-- exclude all the nt:unstructured child nodes of a node in the
html namespace -->
        <index-rule nodeType="nt:unstructured"
                condition="ancestor::html:*">
        </index-rule>

</configuration>
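
To make the intended meaning of these three conditions concrete, here is a rough standalone Java sketch of how an "axis::nameTest" condition could be evaluated against a node's path. It is not the patch and does not use any Jackrabbit classes: the class NameTestSketch, its methods, and the representation of a node as a list of prefixed names from the root down are all assumptions made for illustration. It also compares namespace prefixes rather than namespace URIs, which a real implementation would have to resolve.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration only, NOT the Jackrabbit implementation.
 * A node is modelled as the list of prefixed names from the root down
 * to the node itself (the last element is the node).
 */
final class NameTestSketch {

    /** Splits "prefix:local" into {prefix, local}; unprefixed names get an empty prefix. */
    static String[] split(String qName) {
        int colon = qName.indexOf(':');
        return colon < 0
                ? new String[] {"", qName}
                : new String[] {qName.substring(0, colon), qName.substring(colon + 1)};
    }

    /** True if nodeName satisfies a name test such as "body", "*:headline" or "html:*". */
    static boolean nameMatches(String nameTest, String nodeName) {
        String[] test = split(nameTest);
        String[] name = split(nodeName);
        boolean prefixOk = test[0].equals("*") || test[0].equals(name[0]);
        boolean localOk = test[1].equals("*") || test[1].equals(name[1]);
        return prefixOk && localOk;
    }

    /** Evaluates a condition of the form "axis::nameTest" against the given path. */
    static boolean matches(String condition, List<String> path) {
        int sep = condition.indexOf("::");
        if (sep < 0 || path.isEmpty()) {
            throw new IllegalArgumentException("expected axis::nameTest and a non-empty path");
        }
        String axis = condition.substring(0, sep);
        String nameTest = condition.substring(sep + 2);
        int last = path.size() - 1;
        int from;
        int to; // inclusive upper bound of the path positions to inspect
        switch (axis) {
            case "self":             from = last; to = last;     break;
            case "ancestor":         from = 0;    to = last - 1; break;
            case "ancestor-or-self": from = 0;    to = last;     break;
            default:
                throw new IllegalArgumentException("unsupported axis: " + axis);
        }
        for (int i = from; i <= to; i++) {
            if (nameMatches(nameTest, path.get(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up paths, purely to exercise the three conditions from the config above.
        System.out.println(matches("ancestor-or-self::body",
                Arrays.asList("content", "body", "p")));
        System.out.println(matches("ancestor-or-self::*:headline",
                Arrays.asList("content", "html:headline")));
        System.out.println(matches("ancestor::html:*",
                Arrays.asList("html:div", "item")));
    }
}
```

In this sketch an unprefixed name test such as "body" only matches names without a prefix, which mirrors the "default namespace" wording in the comments of the configuration above.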

A test on the node name was already implemented, but it also required
a test on a property value. We got away with something like the
following, but it didn't seem very intuitive:
<index-rule nodeType="nt:unstructured"
        condition="ancestor::body/@jcr:primaryType='{http://www.jcp.org/jcr/nt/1.0}unstructured'">
</index-rule>

As a side note, as you can see in the example above, we had to add
the empty namespace declaration (xmlns="") to make the indexing rules
recognize a node or property name in the default namespace (this was
needed even before the patch). I'm not sure whether this is well-known
behavior, but it took us a while to understand why our conditions were
ignored. Adding the namespace declaration fixed the problem.

Hoping it may be useful to some.
Alessandro