Posted to users@jackrabbit.apache.org by Robert Haycock <Ro...@artificial-solutions.com> on 2012/12/06 14:40:11 UTC

How to make indexing case insensitive

Hi,

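We have created our own custom analyzer and set the analyzer param in workspace.xml:

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LetterTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;

// Older Lucene Analyzer API (tokenStream(String, Reader)).
public class LowerCaseAnalyzer extends Analyzer {

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new LetterTokenizer(reader); // split into runs of letters
        stream = new LowerCaseFilter(stream);             // lower-case every token
        return stream;
    }
}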
When I debug, I can see that it is being hit and that tokens are converted to lowercase.
On a node I have a property called "name" with the value "FLOW". The problem is that when I run a JCR-SQL2 search, the query "WHERE name = 'FLOW'" returns the node but "WHERE name = 'flow'" doesn't. This is not what I expected.
My assumption was that the LowerCaseFilter would convert all text to lower case before it is indexed.
Is my assumption wrong? How should I tell Jackrabbit to index everything in lower case, so that I can convert all my search terms to lower case when I search?

Thanks,

Rob.

RE: How to make indexing case insensitive

Posted by Marcel Reutegger <mr...@adobe.com>.

Hi,

> You can also use jcr:contains(@property, "...."); I am not exactly sure what
> happens in this case (whether there is also a full-text index for the individual
> properties).

Yes, that's correct. Jackrabbit also maintains a full-text-indexed Lucene
field for each property.
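Because each property also has its own full-text field, a full-text search can be scoped to a single property. The sketch below only builds the query strings; the helper class, the nt:base node type and the "name" property are illustrative assumptions. Case in the term should not matter here, assuming the configured analyzer lower-cases (as the LowerCaseAnalyzer above does), because the analyzer is applied to the terms inside CONTAINS()/jcr:contains().

```java
// Hypothetical helper: full-text queries restricted to one property, which
// should hit the per-property full-text field rather than the node-level one.
final class PropertyFulltextQueries {

    // Quote a string literal; an embedded ' is escaped by doubling it.
    static String quote(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    // JCR-SQL2: CONTAINS() on a named property instead of s.* (the whole node).
    static String sql2Contains(String property, String term) {
        return "SELECT * FROM [nt:base] AS s WHERE CONTAINS(s.[" + property + "], "
                + quote(term) + ")";
    }

    // XPath: jcr:contains() on @property instead of "." (the whole node).
    static String xpathContains(String property, String term) {
        return "//element(*, nt:base)[jcr:contains(@" + property + ", " + quote(term) + ")]";
    }

    public static void main(String[] args) {
        System.out.println(sql2Contains("name", "FLOW"));
        System.out.println(xpathContains("name", "FLOW"));
    }
}
```

This covers full-text matching only. For an exact, case-insensitive comparison of the whole value, the upper/lower case functions are the better fit.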

Regards
 Marcel

Re: How to make indexing case insensitive

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 06.12.2012, at 16:43, Robert Haycock <Ro...@artificial-solutions.com> wrote:

> That explains it. I thought the analyzer was also used for indexing, from my Lucene days.

The analyzer is used both for building the full-text index *and* for parsing the terms in the query's jcr:contains() (otherwise things like stemming or lower-casing in the analyzer would not work consistently).
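The symmetry described above can be illustrated with a simplified, stdlib-only emulation. It is not Lucene's LetterTokenizer/LowerCaseFilter, only an approximation of what they do. The same analyze() step runs over the indexed text and over the query terms, so "FLOW" and "flow" produce the same token. A plain property comparison, by contrast, sees the stored string unchanged. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-in for LetterTokenizer + LowerCaseFilter (approximation only).
final class AnalyzerSymmetry {

    // Split on non-letter characters, then lower-case each token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Full-text side: the analyzer runs on both the indexed text and the
    // jcr:contains() terms, so differently-cased inputs become equal tokens.
    static boolean fulltextMatches(String indexedText, String queryText) {
        List<String> queryTerms = analyze(queryText);
        return !queryTerms.isEmpty() && analyze(indexedText).containsAll(queryTerms);
    }

    // Property side: mirrors @name = '...', which compares the stored value as-is.
    static boolean propertyEquals(String storedValue, String literal) {
        return storedValue.equals(literal);
    }

    public static void main(String[] args) {
        System.out.println(analyze("FLOW"));                 // [flow]
        System.out.println(fulltextMatches("FLOW", "flow")); // true
        System.out.println(propertyEquals("FLOW", "flow"));  // false
    }
}
```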

By default, the "full-text index" is aggregated per node: all of its properties (including text extracted from binary properties) contribute to it. Individual properties, however, are indexed separately as fixed (un-analyzed) strings, and this is where @prop = "foo", jcr:like() and the fn:lower-case()/fn:upper-case() functions come into play.
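For the per-property side, a case-insensitive comparison applies fn:lower-case() to the stored value in the query and lower-cases the search term before it goes into the statement. The sketch below builds such XPath statements. The helper names, the nt:base node type and the use of fn:lower-case() inside jcr:like() are assumptions; check that combination against the examples in JCR-638.

```java
import java.util.Locale;

// Hypothetical helper: case-insensitive XPath queries against the
// per-property (un-analyzed) index.
final class CaseInsensitiveXPath {

    // XPath string literal; an embedded ' is escaped by doubling it.
    static String literal(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    // fn:lower-case() folds the stored value; the term is lower-cased here.
    static String equalsIgnoreCase(String property, String value) {
        return "//element(*, nt:base)[fn:lower-case(@" + property + ") = "
                + literal(value.toLowerCase(Locale.ROOT)) + "]";
    }

    // Prefix match with jcr:like(), where '%' is the wildcard. Assumes the
    // prefix contains no '%' or '_' of its own (they are not escaped here).
    static String startsWithIgnoreCase(String property, String prefix) {
        return "//element(*, nt:base)[jcr:like(fn:lower-case(@" + property + "), "
                + literal(prefix.toLowerCase(Locale.ROOT) + "%") + ")]";
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoreCase("name", "FLOW"));
        System.out.println(startsWithIgnoreCase("name", "Fl"));
    }
}
```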

You can also use jcr:contains(@property, "...."); I am not exactly sure what happens in this case (whether there is also a full-text index for the individual properties).

Cheers,
Alex

RE: How to make indexing case insensitive

Posted by Robert Haycock <Ro...@artificial-solutions.com>.

That explains it. I thought the analyzer was also used for indexing, from my Lucene days.

Thanks.

-----Original Message-----
From: Marcel Reutegger [mailto:mreutegg@adobe.com] 
Sent: 06 December 2012 15:06
To: users@jackrabbit.apache.org
Subject: RE: How to make indexing case insensitive

Hi,

The analyzer is only used for full-text query statements within a
CONTAINS() in SQL or jcr:contains() in XPath.

In your case you should rather use the built-in upper- and lower-case functions available in Jackrabbit. See the corresponding JIRA issue for a number of example queries:
https://issues.apache.org/jira/browse/JCR-638
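Applied to the original JCR-SQL2 query, the lower-case function goes around the property and the term is lower-cased before it is put into the statement. This is a minimal sketch that only builds the statement string. The helper class is hypothetical, and nt:base/"name" are taken from the example in the question.

```java
import java.util.Locale;

// Hypothetical helper: a case-insensitive equality query in JCR-SQL2.
final class CaseInsensitiveSql2 {

    // Quote a JCR-SQL2 string literal; an embedded ' is written as ''.
    static String literal(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    // LOWER() folds the stored property value at query time; the search term
    // is lower-cased here so both sides are compared in the same case.
    static String equalsIgnoreCase(String nodeType, String property, String value) {
        return "SELECT * FROM [" + nodeType + "] AS s WHERE LOWER(s.[" + property + "]) = "
                + literal(value.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // With a javax.jcr Session, this string would be passed to
        // session.getWorkspace().getQueryManager().createQuery(stmt, Query.JCR_SQL2).
        System.out.println(equalsIgnoreCase("nt:base", "name", "FLOW"));
    }
}
```

An alternative that avoids hand-escaping is a JCR-SQL2 bind variable (for example $name) set with Query.bindValue(). The sketch sticks to plain strings so it runs without a repository.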

Regards
 Marcel