Posted to java-user@lucene.apache.org by Jamir Shaikh <sh...@gmail.com> on 2011/10/15 02:21:42 UTC
Case insensitive Keyword Analyser
Hi Guys,
Use Case: Field: Name
Data: Jose ,
Jose Sam,
jose,
jose jacob,
jose ,
joseph,
josef ,
S. Jose,
B. jose
etc.
There is a field (Name) that I want to index.
I will be searching this field with a wildcard query,
e.g. jose*
This should return all names starting with jose.
Search: Jose* (should return all names starting with jose, regardless of case)
Solutions tried:
1. Using Standard Analyser.
Problem with Standard Analyser:
If I use Standard Analyser, then in addition to the correct results it also
returns results like S. Jose and B. jose, which do not start with Jose.
2. Using Keyword Analyser.
Problem with Keyword Analyser:
Keyword Analyser is case sensitive, so it misses names like Jose and Jose Sam.
This happens because a search for Jose* is changed to jose* (all lowercase
letters), while the indexed names keep their original case.
So is there any analyser available that takes care of this use case?
What I am looking for is a case-insensitive Keyword Analyser.
Or let me know if there is any other approach to handle this use case.
Thanks,
Jamir
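The two behaviours described above can be reproduced without Lucene. The plain-Java sketch below only imitates the analysers: a word splitter that lowercases stands in for Standard Analyser, and a pass-through stands in for Keyword Analyser. The class and method names are made up for illustration, and the sample names are written without the stray spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class AnalyzerComparison {
    // Sample values from the question (assumption: stray spaces removed).
    static final List<String> NAMES = Arrays.asList(
        "Jose", "Jose Sam", "jose", "jose jacob",
        "joseph", "josef", "S. Jose", "B. jose");

    // Rough stand-in for Standard Analyser: split into words, lowercase each one.
    static List<String> wordTokens(String value) {
        List<String> tokens = new ArrayList<String>();
        for (String word : value.split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) {
                tokens.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Rough stand-in for Keyword Analyser: the whole value is one token, case untouched.
    static List<String> keywordTokens(String value) {
        return Arrays.asList(value);
    }

    // Returns the names that have at least one token starting with the prefix.
    static List<String> search(String prefix, boolean wordLevel) {
        List<String> hits = new ArrayList<String>();
        for (String name : NAMES) {
            List<String> tokens = wordLevel ? wordTokens(name) : keywordTokens(name);
            for (String token : tokens) {
                if (token.startsWith(prefix)) {
                    hits.add(name);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("word-level: " + search("jose", true));
        System.out.println("keyword:    " + search("jose", false));
    }
}
```

With the prefix jose, the word-level search returns all eight names, including S. Jose and B. jose, while the keyword search returns only jose, jose jacob, joseph and josef.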
Re: Case insensitive Keyword Analyser
Posted by Jamir Shaikh <sh...@gmail.com>.
Thanks a ton Anna..
It's working fine...
On Sun, Oct 16, 2011 at 11:51 PM, Anna Hunecke <A....@topdesk.com> wrote:
> Hi Jamir,
>
> you can easily build the analysis you need by passing the output of a
> tokenizer through one or more filters. In your case, I would just write my
> own Analyzer class like this:
>
> class LowerCaseKeywordAnalyzer extends Analyzer {
>
>     @Override
>     public TokenStream tokenStream(String fieldName, Reader reader) {
>         TokenStream tokenStream = new KeywordTokenizer(reader);
>         tokenStream = new LowerCaseFilter(Version.LUCENE_34, tokenStream);
>         return tokenStream;
>     }
>
> }
>
> Best,
> Anna
--
regards,
Jamir...
Re: Case insensitive Keyword Analyser
Posted by Anna Hunecke <A....@topdesk.com>.
Hi Jamir,
you can easily build the analysis you need by passing the output of a tokenizer through one or more filters. In your case, I would just write my own Analyzer class like this:
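import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.util.Version;

// Emits the whole field value as one lowercased token (final, as Lucene 3.1+ expects of Analyzer subclasses).
final class LowerCaseKeywordAnalyzer extends Analyzer {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // KeywordTokenizer does not split the input; LowerCaseFilter then lowercases that single token.
        TokenStream tokenStream = new KeywordTokenizer(reader);
        return new LowerCaseFilter(Version.LUCENE_34, tokenStream);
    }
}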
Best,
Anna
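As a rough illustration of why this analyzer gives the wanted result, the following plain-Java model treats each name as one lowercased term in a sorted term dictionary and answers a prefix search by scanning from the first term that is not smaller than the prefix. It is not Lucene code: the class LowerCaseKeywordModel and its methods are hypothetical, and it assumes the query text is lowercased in the same way as the indexed values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class LowerCaseKeywordModel {
    // Lowercased term -> original values that produced it, kept in sorted order.
    private final TreeMap<String, List<String>> terms =
        new TreeMap<String, List<String>>();

    // Same normalization at index and query time: whole value, lowercased.
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    void add(String name) {
        String term = normalize(name);
        List<String> originals = terms.get(term);
        if (originals == null) {
            originals = new ArrayList<String>();
            terms.put(term, originals);
        }
        originals.add(name);
    }

    // Collects the original names whose lowercased form starts with the prefix.
    List<String> prefixSearch(String userPrefix) {
        String prefix = normalize(userPrefix);
        List<String> hits = new ArrayList<String>();
        // Start at the first term >= prefix and stop at the first non-matching term.
        for (Map.Entry<String, List<String>> entry : terms.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break;
            }
            hits.addAll(entry.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        LowerCaseKeywordModel index = new LowerCaseKeywordModel();
        String[] names = {"Jose", "Jose Sam", "jose", "jose jacob",
                          "joseph", "josef", "S. Jose", "B. jose"};
        for (String name : names) {
            index.add(name);
        }
        System.out.println(index.prefixSearch("Jose"));
    }
}
```

Searching for Jose in this model returns Jose, jose, jose jacob, Jose Sam, josef and joseph, and leaves out S. Jose and B. jose, because their single terms start with s and b.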