Posted to users@jackrabbit.apache.org by Ian Boston <ie...@tfd.co.uk> on 2010/10/21 11:24:18 UTC
Controlling Indexing.
Hi,
I have a query that uses jcr:contains(.,'admin'), and it matches all nodes where jcr:createdBy = admin.
Because of our requirements, jcr:createdBy=admin occurs quite a lot and is not a meaningful search field, certainly not as a tokenized search field to be matched by jcr:contains. So I have configured the indexing_configuration with:
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.2.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
               xmlns:sakai="http://www.sakaiproject.org/nakamura/2.0">
  <index-rule nodeType="nt:unstructured">
    <property boost="0.0" nodeScopeIndex="false">jcr:createdBy</property>
  </index-rule>
</configuration>
However, when I search for other things, jcr:contains now doesn't match anything. When I open the Lucene index in Luke, I still see the field 3:FULL:createdBy in the index with one value, "admin", and in _:FULLTEXT "admin" has the highest rank. It looks to me like I haven't got the indexing configuration right.
Is it possible to remove jcr:createdBy from the full text index so that it doesn't match in jcr:contains, while leaving all other fields (we don't have a full list of all possibilities) in the index?
Can I do this for all nodes, not just nt:unstructured, or do I need an entry for each nodeType?
Is there a better way of doing this?
Thanks
Ian
Re: Controlling Indexing.
Posted by Ian Boston <ie...@tfd.co.uk>.
Hmm,
weird, I tried that and it didn't work for me, but that could have been one of those moments when I wasn't doing what I thought I was.
Will give it another go,
Thanks
Ian
On 22 Oct 2010, at 17:42, Ray Davis wrote:
> According to this message [1], Jackrabbit chose to interpret "nodeScopeIndex='false'" to mean "do not index ANY properties by default, and especially do not index this particular one."
>
> However, I just noticed that a workaround is documented ([2], [3]) for Jackrabbit 2.0 and later:
>
> <index-rule nodeType="nt:unstructured">
>   <property nodeScopeIndex="false">jcr:createdBy</property>
>   <property isRegexp="true">.*:.*</property>
> </index-rule>
>
> In my test, this works to exclude "jcr:createdBy" from a node search while still letting other properties through.
>
> Best,
> Ray
>
> [1] http://www.mail-archive.com/dev@jackrabbit.apache.org/msg09642.html
> [2] http://wiki.apache.org/jackrabbit/IndexingConfiguration
> [3] http://dev.day.com/kb/content/wiki/kb/cq5/CQ5SystemAdministration/HowToExcludeNodePropertyFromSearchIndexing.html
Re: Controlling Indexing.
Posted by Ray Davis <ra...@media.berkeley.edu>.
According to this message [1], Jackrabbit chose to interpret
"nodeScopeIndex='false'" to mean "do not index ANY properties by
default, and especially do not index this particular one."
However, I just noticed that a workaround is documented ([2], [3]) for
Jackrabbit 2.0 and later:
<index-rule nodeType="nt:unstructured">
  <property nodeScopeIndex="false">jcr:createdBy</property>
  <property isRegexp="true">.*:.*</property>
</index-rule>
In my test, this works to exclude "jcr:createdBy" from a node search
while still letting other properties through.
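
For reference, the rule above could be dropped into a complete configuration file roughly as follows. This is only a sketch: the DOCTYPE and namespace declarations are copied from the original message, while the use of nt:base to cover every node type (rather than one rule per nodeType) is an assumption that has not been verified in this thread, and it relies on index rules also applying to subtypes of the named node type.

```xml
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.2.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <!-- Assumption: nt:base is the supertype of all node types, so a single
       rule here is intended to cover all nodes, not just nt:unstructured. -->
  <index-rule nodeType="nt:base">
    <!-- Keep jcr:createdBy out of the node-scope (full text) index,
         so jcr:contains(., 'admin') should no longer match on it. -->
    <property nodeScopeIndex="false">jcr:createdBy</property>
    <!-- Catch-all regular expression: once an index-rule lists explicit
         properties, unlisted ones are skipped, so this line lets every
         other namespaced property through again. -->
    <property isRegexp="true">.*:.*</property>
  </index-rule>
</configuration>
```

Content indexed before the configuration change will presumably keep its old entries until it is re-indexed, so it may be worth rebuilding the index before checking the result in Luke.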
Best,
Ray
[1] http://www.mail-archive.com/dev@jackrabbit.apache.org/msg09642.html
[2] http://wiki.apache.org/jackrabbit/IndexingConfiguration
[3] http://dev.day.com/kb/content/wiki/kb/cq5/CQ5SystemAdministration/HowToExcludeNodePropertyFromSearchIndexing.html