You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@jackrabbit.apache.org by Apache Wiki <wi...@apache.org> on 2007/05/15 17:14:51 UTC

[Jackrabbit Wiki] Update of "IndexingConfiguration" by MarcelReutegger

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The following page has been changed by MarcelReutegger:
http://wiki.apache.org/jackrabbit/IndexingConfiguration

New page:
Starting with version 1.4, the default search index implementation in Jackrabbit allows you to control which properties of a node are indexed.

The configuration parameter is called {{{indexingConfiguration}}} and per default is not set. This means all properties of a node are indexed.

If you whish to configure the indexing behaviour you need to add a parameter to the SearchIndex element in your workspace.xml and repository.xml file. The following example uses a configuration file, which is located in the same folder like the workspace.xml file:

{{{
<param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/>
}}}

=== Index rules ===

To optimize the index size you can index only certain properties of a node type.

With the below configuration only properties named Text are indexed for nodes of type nt:unstructured. This also applies to nodes with a type that extends from nt:unstructured.

{{{
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured">
    <property>Text</property>
  </index-rule>
</configuration>
}}}

Please note that you have to declare the namespace prefixes in the configuration element that you are using throughout the XML file!

It is also possible to configure a boost value for the nodes that match the index rule. The default boost value is 1.0. Higher boost values (a reasonable range is 1.0 - 5.0) will yield a higher score value and appear as more relevant.

{{{
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured"
              boost="2.0">
    <property>Text</property>
  </index-rule>
</configuration>
}}}

If you do not whish to boost the complete node but only certain properties you can also provide a boost value for the listed properties:

{{{
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured">
    <property boost="3.0">Title</property>
    <property boost="1.5">Text</property>
  </index-rule>
</configuration>
}}}


You may also add a condition to the index rule and have multiple rules with the same nodeType. The first index rule that matches will apply and all remaining ones are ignored:

{{{
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured"
              boost="2.0"
              condition="@priority = 'high'">
    <property>Text</property>
  </index-rule>
  <index-rule nodeType="nt:unstructured">
    <property>Text</property>
  </index-rule>
</configuration>
}}}

In the above example the first inde rule only applies if the nt:unstructured node has a priority property with a value 'high'. The condition only supports the equals operator and a string literal.

You may also reference properties in the condition that are not on the current node:

{{{
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured"
              boost="2.0"
              condition="ancestor::*@priority = 'high'">
    <property>Text</property>
  </index-rule>
  <index-rule nodeType="nt:unstructured"
              boost="0.5"
              condition="parent::foo@priority = 'low'">
    <property>Text</property>
  </index-rule>
  <index-rule nodeType="nt:unstructured"
              boost="1.5"
              condition="bar/@priority = 'medium'">
    <property>Text</property>
  </index-rule>
  <index-rule nodeType="nt:unstructured">
    <property>Text</property>
  </index-rule>
</configuration>
}}}

The indexing configuration also allows you to specify the type of a node in the condition. Please note however that the type match must be exact. It does not consider sub types of the specified node type.

{{{
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured"
              boost="2.0"
              condition="element(*, nt:unstructured)/@priority = 'high'">
    <property>Text</property>
  </index-rule>
</configuration>
}}}

Per default the configured properties are fulltext indexed if they are of type STRING and included in the node scope index. That is, you can do a {{{jcr:contains(., 'foo')}}} and it will return nodes that have a string property that contains the word foo. This behaviour can be disabled:

{{{
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured">
    <property nodeScopeIndex="false">Text</property>
  </index-rule>
</configuration>
}}}

=== Index Aggregates ===

Sometimes it is useful to include the contents of descendant nodes into a single node to easier search on content that is scattered across multiple nodes.

Jackrabbit allows you to define index aggregates based on relative path patterns and primary node types.

The following example creates an index aggregate on nt:file that includes the content of the jcr:content node:

{{{
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <aggregate primaryType="nt:file">
    <include>jcr:content</include>
  </aggregate>
</configuration>
}}}

You can also restrict the included nodes to a certain type:

{{{
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <aggregate primaryType="nt:file">
    <include primaryType="nt:resource">jcr:content</include>
  </aggregate>
</configuration>
}}}

You may also use the * to match all child nodes:

{{{
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <aggregate primaryType="nt:file">
    <include primaryType="nt:resource">*</include>
  </aggregate>
</configuration>
}}}

If you wish to include nodes up to a certain depth below the current node you can add multiple include elements. E.g. the nt:file node may contain an exploded XML document under jcr:content:

{{{
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <aggregate primaryType="nt:file">
    <include>*</include>
    <include>*/*</include>
    <include>*/*/*</include>
  </aggregate>
</configuration>
}}}

'''Important note''': Both index rules and index aggregates influence how content is indexed in Jackrabbit. If you change the configuration the existing content is not automatically re-indexed according to the new rules. You therefore have to manually re-index the content when you change the configuration!