Posted to java-user@lucene.apache.org by Chitra <ch...@gmail.com> on 2017/10/06 10:06:55 UTC

Re: Accent insensitive search for greek characters

Hi Koji,

I am not familiar with Greek characters, which is why I am looking for
standard rules to perform accent-insensitive search for Greek.

Does ICUFoldingFilter solve my case? I have already tried it, and it works
fine for accented Greek characters.

However, it is not language-specific: it has internationalization support
for all languages. I am not sure whether it will break the existing behavior
of other languages in my index.


Is there any way to make ICUFoldingFilter language-specific?



Kindly post your suggestions.


-- 
Regards,
Chitra

Re: Accent insensitive search for greek characters

Posted by Chitra <ch...@gmail.com>.
Hi Robert,

Thank you so much for the kind response; it seems to be working fine.

Could you please confirm whether the code below restricts the folding to the
Greek region alone?

UnicodeSet unicodeSet = new UnicodeSet().applyPattern("[:Greek:]");

Normalizer2 base = Normalizer2.getInstance(
    ICUFoldingFilter.class.getResourceAsStream("utr30.nrm"),
    "utr30", Normalizer2.Mode.COMPOSE);

// Folding is applied only to text inside unicodeSet.
Normalizer2 normalizeFilter = new FilteredNormalizer2(base, unicodeSet);
tok = new ICUNormalizer2Filter(tok, normalizeFilter);



Kindly help me to resolve this.


-- 
Regards,
Chitra

Re: Accent insensitive search for greek characters

Posted by Robert Muir <rc...@gmail.com>.
Your Greek transform does not work because it uses "Lower" instead of
case folding.
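
A minimal sketch of the difference, using only JDK classes rather than ICU.
The class and method names are hypothetical, and the upper-then-lower step
is only an approximation of Unicode case folding, not what ICU does
internally. Context-sensitive lowercasing turns a word-final Σ into ς, so a
query typed with σ cannot match, whereas folding maps Σ, σ, and ς to the same
character.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical demo class; not part of Lucene or ICU.
class SigmaFoldingDemo {

    // Decompose, drop nonspacing marks (accents), then recompose.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return Normalizer.normalize(decomposed.replaceAll("\\p{Mn}", ""),
                Normalizer.Form.NFC);
    }

    // Lowercasing is context-sensitive: a word-final capital sigma becomes ς.
    static String lowerThenStrip(String s) {
        return stripAccents(s.toLowerCase(Locale.ROOT));
    }

    // Approximate case folding: uppercase, then lowercase, each code point
    // on its own, so that Σ, σ, and ς all end up as σ.
    static String foldThenStrip(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp ->
                sb.appendCodePoint(Character.toLowerCase(Character.toUpperCase(cp))));
        return stripAccents(sb.toString());
    }

    public static void main(String[] args) {
        String indexed = "πελάτηΣ";
        String query = "πελατησ";

        // Lowercasing keeps ς and σ distinct, so the terms differ.
        System.out.println(lowerThenStrip(indexed).equals(lowerThenStrip(query))); // false

        // Folding maps both spellings to the same term.
        System.out.println(foldThenStrip(indexed).equals(foldThenStrip(query)));   // true
    }
}
```

The same analysis has to run at both index time and query time; otherwise
the folded query term is compared against unfolded index terms.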

If ICUFoldingFilter does what you want, but you need to restrict it to
Greek, then just restrict it to the Greek region. See the
FilteredNormalizer2 and UnicodeSet documentation, and look at how
ICUFoldingFilter is implemented in the source code so that you understand
how to instantiate an equivalent ICUNormalizer2Filter with just the Greek
restriction.
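
To illustrate what "restricting to the Greek region" means, here is a rough
JDK-only stand-in. It is not FilteredNormalizer2 itself: the class and
method names are hypothetical, the filter is applied per code point via
Character.UnicodeScript, and the folding is the same upper-then-lower
approximation rather than the UTR#30 data. It shows one thing that may be
worth testing against a real [:Greek:] set: a combining accent such as
U+0301 has script Inherited, not Greek, so in decomposed input it falls
outside a purely script-based filter.

```java
import java.text.Normalizer;

// Hypothetical demo class; a JDK-only analogy, not the ICU implementation.
class GreekOnlyFoldingDemo {

    // Filter set: code points whose Unicode script is Greek.
    static boolean inGreekSet(int cp) {
        return Character.UnicodeScript.of(cp) == Character.UnicodeScript.GREEK;
    }

    // Fold accents and case for Greek code points; pass everything else through.
    static String foldGreekOnly(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            if (inGreekSet(cp)) {
                String decomposed = Normalizer.normalize(
                        new String(Character.toChars(cp)), Normalizer.Form.NFD);
                decomposed.codePoints()
                        // drop the accents produced by the decomposition
                        .filter(c -> Character.getType(c) != Character.NON_SPACING_MARK)
                        // approximate case folding, so ς and Σ become σ
                        .map(c -> Character.toLowerCase(Character.toUpperCase(c)))
                        .forEach(out::appendCodePoint);
            } else {
                out.appendCodePoint(cp); // outside the filter set: unchanged
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Latin text keeps its case and accent; Greek text is folded.
        System.out.println(foldGreekOnly("Café Πελάτης"));

        // The combining acute accent is not in the Greek script.
        System.out.println(Character.UnicodeScript.of(0x0301));
    }
}
```

If decomposed input is possible in your data, normalizing to NFC before the
restricted filter, or widening the set, would be options to evaluate.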

On Tue, Oct 24, 2017 at 8:16 AM, Chitra <ch...@gmail.com> wrote:
> Hi,
>
> ICUTransformFilter works fine for Greek characters alone, as required,
> but it breaks in one case: σ and ς are both lowercase forms of Σ (sigma).
>
> *Example:*
>
> I indexed the terms πελάτης (indexed as πελατης) and πελάτηΣ (also indexed
> as πελατης). I get the expected results when I search for πελάτηΣ, πελάτης,
> or any combination of uppercase and lowercase Greek characters. But if I
> search for πελατησ, I get no results.
>
> In Greek, σ and ς are both lowercase forms of Σ (sigma), and
> ICUFoldingFilter handles this case.
>
>
> Is the ICU Transliterator rule formed correctly? Please look at the code
> below:
>
>
> tok = new ICUTransformFilter(tok, Transliterator.getInstance(
>     "Greek; Lower; NFD; [:Nonspacing Mark:] Remove; NFC;"));
>
>
>
> Kindly help me to resolve this.
>
>
> Regards,
> Chitra



Re: Accent insensitive search for greek characters

Posted by Chitra <ch...@gmail.com>.
Hi,

ICUTransformFilter works fine for Greek characters alone, as required,
but it breaks in one case: σ and ς are both lowercase forms of Σ (sigma).

*Example:*

I indexed the terms πελάτης (indexed as πελατης) and πελάτηΣ (also indexed
as πελατης). I get the expected results when I search for πελάτηΣ, πελάτης,
or any combination of uppercase and lowercase Greek characters. But if I
search for πελατησ, I get no results.

In Greek, σ and ς are both lowercase forms of Σ (sigma), and
ICUFoldingFilter handles this case.


Is the ICU Transliterator rule formed correctly? Please look at the code
below:


tok = new ICUTransformFilter(tok, Transliterator.getInstance(
    "Greek; Lower; NFD; [:Nonspacing Mark:] Remove; NFC;"));



Kindly help me to resolve this.


Regards,
Chitra

Re: Accent insensitive search for greek characters

Posted by Chitra <ch...@gmail.com>.
Hi all,

         Any help would be greatly appreciated.

-- 
Regards,
Chitra