Posted to users@jackrabbit.apache.org by Ian Boston <ie...@tfd.co.uk> on 2010/10/21 11:24:18 UTC

Controlling Indexing.

Hi,

I have a query using jcr:contains(.,'admin') that matches all nodes where jcr:createdBy = admin.
Under our requirements jcr:createdBy=admin occurs very often and is not a meaningful search field, certainly not as a tokenized search field to be matched by jcr:contains. So I have configured the indexing_configuration with

<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.2.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
               xmlns:sakai="http://www.sakaiproject.org/nakamura/2.0">
  <index-rule nodeType="nt:unstructured">
    <property boost="0.0" nodeScopeIndex="false">jcr:createdBy</property>
  </index-rule>
</configuration>

However, when I search for other things, jcr:contains no longer matches anything, and when I open the Lucene index in Luke I still see the field 3:FULL:createdBy in the index with one value, "admin", and in _:FULLTEXT "admin" has the highest rank. It looks to me like I haven't got the indexing configuration right.

Is it possible to remove jcr:createdBy from the full-text index so it doesn't match in jcr:contains, while leaving all other fields (we don't have a full list of all possibilities) in the index?

Can I do this for all nodes, not just nt:unstructured, or do I need an entry for each nodeType?

Is there a better way of doing this?

Thanks
Ian


Re: Controlling Indexing.

Posted by Ian Boston <ie...@tfd.co.uk>.
Hmm,
weird, I tried that and it didn't work for me, but that could have been one of those moments when I wasn't doing what I thought I was.
Will give it another go.
Thanks
Ian

On 22 Oct 2010, at 17:42, Ray Davis wrote:

> According to this message [1], Jackrabbit chose to interpret "nodeScopeIndex='false'" to mean "do not index ANY properties by default, and especially do not index this particular one."
> 
> However, I just noticed that a workaround is documented ([2], [3]) for Jackrabbit 2.0 and later:
> 
>  <index-rule nodeType="nt:unstructured">
>    <property nodeScopeIndex="false">jcr:createdBy</property>
>    <property isRegexp="true">.*:.*</property>
>  </index-rule>
> 
> In my test, this works to exclude "jcr:createdBy" from a node search while still letting other properties through.
> 
> Best,
> Ray
> 
> [1] http://www.mail-archive.com/dev@jackrabbit.apache.org/msg09642.html
> [2] http://wiki.apache.org/jackrabbit/IndexingConfiguration
> [3] http://dev.day.com/kb/content/wiki/kb/cq5/CQ5SystemAdministration/HowToExcludeNodePropertyFromSearchIndexing.html
> 


Re: Controlling Indexing.

Posted by Ray Davis <ra...@media.berkeley.edu>.
According to this message [1], Jackrabbit chose to interpret 
"nodeScopeIndex='false'" to mean "do not index ANY properties by 
default, and especially do not index this particular one."

However, I just noticed that a workaround is documented ([2], [3]) for 
Jackrabbit 2.0 and later:

   <index-rule nodeType="nt:unstructured">
     <property nodeScopeIndex="false">jcr:createdBy</property>
     <property isRegexp="true">.*:.*</property>
   </index-rule>

In my test, this works to exclude "jcr:createdBy" from a node search 
while still letting other properties through.
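
For reference, a complete indexing configuration file built around this workaround might look like the sketch below. It reuses the DOCTYPE and the jcr/nt namespace declarations from the original message. The comments are illustrative, and the remark about entry order is an assumption based on the order shown in the workaround above, not something verified against the Jackrabbit source.

```xml
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.2.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured">
    <!-- Keep jcr:createdBy out of the node-scope full-text index, so
         jcr:contains(., 'admin') should no longer match on it. -->
    <property nodeScopeIndex="false">jcr:createdBy</property>
    <!-- Catch-all pattern from the workaround, intended to keep all other
         properties indexed. Assumption: the specific entry has to come
         before this regular expression, as in the workaround above. -->
    <property isRegexp="true">.*:.*</property>
  </index-rule>
</configuration>
```

Whether an existing index picks up such a change without being rebuilt is not covered in this thread; if Luke still shows the old terms afterwards, re-indexing the workspace may be needed.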

Best,
Ray

[1] http://www.mail-archive.com/dev@jackrabbit.apache.org/msg09642.html
[2] http://wiki.apache.org/jackrabbit/IndexingConfiguration
[3] http://dev.day.com/kb/content/wiki/kb/cq5/CQ5SystemAdministration/HowToExcludeNodePropertyFromSearchIndexing.html
