Posted to java-user@lucene.apache.org by Jamir Shaikh <sh...@gmail.com> on 2011/10/15 02:21:42 UTC

Case insensitive Keyword Analyser

Hi Guys,

Use case:
  Field: Name
  Data:  Jose,
         Jose Sam,
         jose,
         jose jacob,
         jose,
         joseph,
         josef,
         S. Jose,
         B. jose,
         etc.

I want to index the Name field and search it with a wildcard query,
e.g. jose*, which should return all names starting with jose.

Search: Jose* (should return all names starting with jose, whatever their case)

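Approaches I tried:
1. Using Standard Analyser.

Problem with Standard Analyser:
If I use Standard Analyser, then in addition to the correct results it also returns
names like S. Jose and B. jose, which do not start with Jose. (Standard Analyser
splits a name into separate words, so "Jose" in "S. Jose" becomes a token of its own.)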


2. Using Keyword Analyser.
Problem with Keyword Analyser:
Keyword Analyser is case sensitive, so it misses names like Jose and Jose Sam.
This happens because the query parser changes the search Jose* to jose* (all
lowercase), while Keyword Analyser indexes each name unchanged, e.g. as "Jose Sam".
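To make the two problems concrete, here is a small plain-Java simulation. It is only a sketch and does not call Lucene: the hypothetical `standardLike` method stands in for the Standard Analyser by splitting a name on non-letters and lowercasing each word, and `keyword` stands in for the Keyword Analyser by keeping the whole name as one unchanged token. A prefix query such as jose* is modelled as "some indexed token starts with the prefix".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class AnalyzerProblemsDemo {

    // Hypothetical stand-in for the Standard Analyser: split on non-letters, lowercase each word.
    public static List<String> standardLike(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Hypothetical stand-in for the Keyword Analyser: the whole value is one token, case untouched.
    public static List<String> keyword(String value) {
        return Arrays.asList(value);
    }

    // A prefix query such as jose* matches if any indexed token starts with the prefix.
    public static boolean prefixMatches(List<String> tokens, String prefix) {
        for (String token : tokens) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The search Jose* reaches the index as the lowercased prefix jose.
        String prefix = "jose";
        for (String name : new String[] {"Jose Sam", "jose jacob", "S. Jose"}) {
            System.out.printf("%-10s standard-like=%b keyword=%b%n", name,
                    prefixMatches(standardLike(name), prefix),
                    prefixMatches(keyword(name), prefix));
        }
    }
}
```

Running `main` shows "S. Jose" matching under the standard-like tokens (the unwanted hit from approach 1) and "Jose Sam" being missed by the unchanged keyword token (the miss from approach 2), while "jose jacob" matches under both.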



So is there an analyser available that handles this use case?
What I am looking for is a case-insensitive Keyword Analyser.
Otherwise, please let me know if there is another approach to this use case.


Thanks,
Jamir

Re: Case insensitive Keyword Analyser

Posted by Jamir Shaikh <sh...@gmail.com>.
Thanks a ton, Anna.
It's working fine.

On Sun, Oct 16, 2011 at 11:51 PM, Anna Hunecke <A....@topdesk.com> wrote:

> [snip]


-- 
regards,
Jamir...

Re: Case insensitive Keyword Analyser

Posted by Anna Hunecke <A....@topdesk.com>.
Hi Jamir,

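you can easily build the Analyzer you need by combining a Tokenizer with TokenFilters that process its output. In your case, I would just write my own Analyzer class that uses KeywordTokenizer (the whole field value becomes one token) followed by LowerCaseFilter, like this: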

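import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.util.Version;

// final: with assertions enabled, Lucene 3.x checks that Analyzer
// subclasses (or their tokenStream methods) are final.
final class LowerCaseKeywordAnalyzer extends Analyzer {
	@Override
	public TokenStream tokenStream(String fieldName, Reader reader) {
		// Whole field value as one token, then lowercased: "Jose Sam" -> "jose sam".
		TokenStream tokenStream = new KeywordTokenizer(reader);
		tokenStream = new LowerCaseFilter(Version.LUCENE_34, tokenStream);
		return tokenStream;
	}
}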
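A rough plain-Java simulation of what this analyzer does to the sample names may help; it is a sketch, not Lucene code. Each value is kept as a single token and lowercased, and the prefix is assumed to arrive already lowercased (jose*), as described in the original question for Jose*. The class and method names are made up for illustration, and `toLowerCase(Locale.ROOT)` only approximates LowerCaseFilter, which lowercases character by character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LowerCaseKeywordDemo {

    // Hypothetical stand-in for KeywordTokenizer + LowerCaseFilter:
    // one token per value, lowercased (an approximation of LowerCaseFilter).
    public static String analyze(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Returns the names whose single indexed token starts with the (already lowercased) prefix.
    public static List<String> prefixSearch(String[] names, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            if (analyze(name).startsWith(prefix)) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample names from the original question.
        String[] names = {"Jose", "Jose Sam", "jose", "jose jacob",
                          "joseph", "josef", "S. Jose", "B. jose"};
        // The search Jose* is assumed to reach the index as the prefix jose.
        System.out.println(prefixSearch(names, "jose"));
    }
}
```

With this list, every name except "S. Jose" and "B. jose" is returned, including "joseph" and "josef", since they also start with jose.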

Best,
Anna


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org