Posted to java-user@lucene.apache.org by Chitra <ch...@gmail.com> on 2017/09/27 10:54:11 UTC

Accent insensitive search for greek characters

Hi,
In Lucene, I want to search Greek text accent-insensitively, by removing
accent marks or replacing accented characters with their unaccented
equivalents.

For example, we are trying to convert Greek Extended characters
<http://www.unicode.org/charts/PDF/U1F00.pdf> to basic Greek
<http://www.unicode.org/charts/PDF/U0370.pdf> to provide accent-insensitive
search.


Could you suggest the best way to achieve this? Does ICUFoldingFilter solve
my use case?

-- 
Regards,
Chitra
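A minimal JDK-only sketch of the conversion described above (Greek Extended and tonos characters reduced to unaccented basic Greek). It is an illustration, not ICUFoldingFilter: it assumes that decomposing to NFD and dropping nonspacing marks is the folding wanted, the class name GreekAccents is hypothetical, and it lowercases rather than case-folds, so final ς and medial σ stay distinct.

```java
import java.text.Normalizer;
import java.util.Locale;

/**
 * Sketch: accent-insensitive key for Greek text using only the JDK.
 * NFD splits precomposed letters into base letter + combining marks
 * (tonos, dialytika, breathings, perispomeni, ypogegrammeni);
 * the marks are dropped, the rest is recomposed and lowercased.
 */
final class GreekAccents {
    private GreekAccents() {}

    static String strip(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{Mn} is the Unicode general category "Mark, nonspacing"
        String noMarks = decomposed.replaceAll("\\p{Mn}+", "");
        return Normalizer.normalize(noMarks, Normalizer.Form.NFC).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Greek Extended input (polytonic spelling)
        System.out.println(strip("\u1F04\u03BD\u03B8\u03C1\u03C9\u03C0\u03BF\u03C2")); // ανθρωπος
        System.out.println(strip("\u1F08\u03B8\u1FC6\u03BD\u03B1\u03B9"));             // αθηναι
    }
}
```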

Re: Accent insensitive search for greek characters

Posted by Mike Sokolov <ms...@gmail.com>.
These are only used in classical (polytonic) Greek, I think, which probably explains why they are not covered by the simpler filter.

On September 27, 2017 9:48:37 AM EDT, Ahmet Arslan <io...@yahoo.com.INVALID> wrote:
>I may be wrong about ASCIIFoldingFilter. Please go with the
>ICUFoldingFilter.
>Ahmet

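Mike's point can be checked against real data. A JDK-only sketch (class name hypothetical) that counts code points from the Greek Extended block, U+1F00 to U+1FFF, which holds the precomposed polytonic letters; modern monotonic text such as πελάτης normally uses only the basic Greek and Coptic block.

```java
/** Sketch: detect polytonic (Greek Extended) code points in a string. */
final class PolytonicCheck {
    private PolytonicCheck() {}

    static boolean isGreekExtended(int codePoint) {
        return Character.UnicodeBlock.of(codePoint) == Character.UnicodeBlock.GREEK_EXTENDED;
    }

    static long countGreekExtended(String text) {
        return text.codePoints().filter(PolytonicCheck::isGreekExtended).count();
    }

    public static void main(String[] args) {
        String monotonic = "\u03C0\u03B5\u03BB\u03AC\u03C4\u03B7\u03C2";  // πελάτης
        String polytonic = "\u1F04\u03BD\u03B8\u03C1\u03C9\u03C0\u03BF\u03C2"; // ἄνθρωπος
        System.out.println(countGreekExtended(monotonic)); // 0
        System.out.println(countGreekExtended(polytonic)); // 1
    }
}
```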

Re: Accent insensitive search for greek characters

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
 I may be wrong about ASCIIFoldingFilter. Please go with the ICUFoldingFilter.
Ahmet
On Wednesday, September 27, 2017, 3:47:01 PM GMT+3, Chitra <ch...@gmail.com> wrote:

Hi Ahmet,
Thank you so much for the reply.

I have tried it, but ASCIIFoldingFilter does not seem to support Greek accented characters; it only seems to support Latin-like accented characters. Am I missing anything?




Re: Accent insensitive search for greek characters

Posted by Chitra <ch...@gmail.com>.
Hi Robert,
Thank you so much for the kind response; it seems to be working fine.

Could you please confirm whether the code below restricts the folding to
the Greek region alone?

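import com.ibm.icu.text.FilteredNormalizer2;
import com.ibm.icu.text.Normalizer2;
import com.ibm.icu.text.UnicodeSet;
import org.apache.lucene.analysis.icu.ICUFoldingFilter;
import org.apache.lucene.analysis.icu.ICUNormalizer2Filter;

// Fold only Greek-script characters; all other text passes through unchanged.
UnicodeSet unicodeSet = new UnicodeSet().applyPattern("[:Greek:]").freeze();
Normalizer2 base = Normalizer2.getInstance(ICUFoldingFilter.class
    .getResourceAsStream("utr30.nrm"), "utr30", Normalizer2.Mode.COMPOSE);
Normalizer2 normalizeFilter = new FilteredNormalizer2(base, unicodeSet);
// tok: the upstream TokenStream (e.g. the tokenizer), declared earlier
tok = new ICUNormalizer2Filter(tok, normalizeFilter);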



Kindly help me to resolve this.


-- 
Regards,
Chitra
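For comparison, a JDK-only sketch of the same idea as the FilteredNormalizer2 code above: fold only characters whose script is Greek and pass everything else through. It is an analogue, not the ICU implementation; the class name and the fold it applies (drop nonspacing marks, lowercase, ς to σ) are assumptions. Combining marks have the Inherited script, so in this sketch an accent stored as a separate combining character after a Greek letter is left in place; whether a [:Greek:] UnicodeSet filter behaves the same way is worth testing on decomposed input.

```java
import java.text.Normalizer;

/** Sketch: fold Greek-script code points only; leave all other text untouched. */
final class GreekOnlyFolder {
    private GreekOnlyFolder() {}

    static String fold(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.GREEK) {
                // Decompose this single Greek character, keep only its non-mark parts.
                String nfd = Normalizer.normalize(new String(Character.toChars(cp)), Normalizer.Form.NFD);
                nfd.codePoints()
                        .filter(c -> Character.getType(c) != Character.NON_SPACING_MARK)
                        .map(Character::toLowerCase)
                        .map(c -> c == '\u03C2' ? '\u03C3' : c) // final sigma -> sigma
                        .forEach(out::appendCodePoint);
            } else {
                // Latin, digits, spaces, and standalone combining marks (script INHERITED) pass through.
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("Caf\u00e9 \u03A0\u0395\u039B\u0386\u03A4\u0397\u03A3")); // Café πελατησ
    }
}
```

The Latin "é" is untouched while the Greek word is folded, which is the behavior the FilteredNormalizer2 restriction is meant to give.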

Re: Accent insensitive search for greek characters

Posted by Robert Muir <rc...@gmail.com>.
Your Greek transform stuff does not work because you use "Lower"
instead of case folding.

If ICUFoldingFilter works for what you want, but you want to restrict
it to Greek, then just restrict it to the Greek region. See the
FilteredNormalizer2 and UnicodeSet documentation. And look at how
ICUFoldingFilter is implemented in the source code, so you understand how
to instantiate an equivalent ICUNormalizer2Filter with just the Greek
restriction.
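A JDK-only sketch of the difference between lowercasing and case folding for this query (class and method names are hypothetical). lowerAndStrip approximates the "Lower; NFD; [:Nonspacing Mark:] Remove; NFC" steps; foldAndStrip adds the one case-folding mapping relevant here, U+03C2 (ς) to U+03C3 (σ), which is what makes the two spellings meet. Full Unicode case folding does much more than this single replacement.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Sketch: why "Lower" keeps final sigma distinct while case folding does not. */
final class SigmaFolding {
    private SigmaFolding() {}

    /** Approximates "Lower; NFD; [:Nonspacing Mark:] Remove; NFC". */
    static String lowerAndStrip(String s) {
        String nfd = Normalizer.normalize(s.toLowerCase(Locale.ROOT), Normalizer.Form.NFD);
        return Normalizer.normalize(nfd.replaceAll("\\p{Mn}+", ""), Normalizer.Form.NFC);
    }

    /** Same, plus the single case-folding mapping that matters here: final sigma to sigma. */
    static String foldAndStrip(String s) {
        return lowerAndStrip(s).replace('\u03C2', '\u03C3');
    }

    public static void main(String[] args) {
        String indexed = "\u03C0\u03B5\u03BB\u03AC\u03C4\u03B7\u03C2"; // πελάτης
        String query = "\u03C0\u03B5\u03BB\u03B1\u03C4\u03B7\u03C3";   // πελατησ
        System.out.println(lowerAndStrip(indexed).equals(lowerAndStrip(query))); // false
        System.out.println(foldAndStrip(indexed).equals(foldAndStrip(query)));   // true
    }
}
```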

On Tue, Oct 24, 2017 at 8:16 AM, Chitra <ch...@gmail.com> wrote:
> Hi,
> ICUTransformFilter is working fine for Greek characters alone, as
> required, but it breaks in one case, involving sigma.
>
> *Example:*
>
> I indexed the terms πελάτης (indexed as πελατης) and πελάτηΣ (indexed as
> πελατης). I get the expected search results when I search for πελάτηΣ,
> πελάτης, or any combination of uppercase and lowercase Greek characters.
> But if I search for πελατησ, I get no results.
>
> In Greek, σ and ς are both lowercase forms of Σ (sigma), and
> ICUFoldingFilter handles this case.
>
>
> Is the ICU Transliterator rule formed correctly? Please look at the code
> below:
>
>
> tok = new ICUTransformFilter(tok,
>     Transliterator.getInstance("Greek; Lower; NFD; [:Nonspacing Mark:] Remove; NFC;"));
>
>
>
> Kindly help me resolve this.
>
>
> Regards,
> Chitra

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Accent insensitive search for greek characters

Posted by Chitra <ch...@gmail.com>.
Hi,
ICUTransformFilter is working fine for Greek characters alone, as
required, but it breaks in one case, involving sigma.

*Example:*

I indexed the terms πελάτης (indexed as πελατης) and πελάτηΣ (indexed as
πελατης). I get the expected search results when I search for πελάτηΣ,
πελάτης, or any combination of uppercase and lowercase Greek characters.
But if I search for πελατησ, I get no results.

In Greek, σ and ς are both lowercase forms of Σ (sigma), and
ICUFoldingFilter handles this case.


Is the ICU Transliterator rule formed correctly? Please look at the code
below:


tok = new ICUTransformFilter(tok,
    Transliterator.getInstance("Greek; Lower; NFD; [:Nonspacing Mark:] Remove; NFC;"));



Kindly help me resolve this.


Regards,
Chitra

Re: Accent insensitive search for greek characters

Posted by Chitra <ch...@gmail.com>.
Hi all,

         Any help would be greatly appreciated.

-- 
Regards,
Chitra

Re: Accent insensitive search for greek characters

Posted by Chitra <ch...@gmail.com>.
Hi Koji,
I don't have any knowledge of Greek characters, so I am looking for
standard rules to perform accent-insensitive search on Greek text.

Does ICUFoldingFilter solve my case? I have already tried it, and it works
fine for Greek accented characters.

But it is not language-specific: it has internationalization support for
all languages, so I am not sure whether it will break the existing behavior
for other languages in my index.


Is there any way to make ICUFoldingFilter language-specific?



Kindly post your suggestions.


-- 
Regards,
Chitra

Re: Accent insensitive search for greek characters

Posted by Koji Sekiguchi <ko...@rondhuit.com>.
Hi Chitra,

I don't know the language myself, but could you solve the problem at the CharFilter level
rather than the TokenFilter level, by defining your own mappings with MappingCharFilter?

Koji
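Koji's suggestion does not require knowing Greek if the mapping is generated from Unicode data. A JDK-only sketch (class name hypothetical) that lists every letter in the Greek and Coptic and Greek Extended blocks whose form changes after NFD and removal of nonspacing marks, paired with the letter that remains. Each pair could then be registered with Lucene's NormalizeCharMap.Builder.add(match, replacement) and applied through MappingCharFilter; that wiring is not shown here.

```java
import java.text.Normalizer;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: derive "accented Greek letter -> unaccented letter" pairs from Unicode data. */
final class GreekMappingTable {
    private GreekMappingTable() {}

    static Map<String, String> build() {
        Map<String, String> mapping = new TreeMap<>();
        addRange(mapping, 0x0370, 0x03FF); // Greek and Coptic
        addRange(mapping, 0x1F00, 0x1FFF); // Greek Extended (polytonic)
        return mapping;
    }

    private static void addRange(Map<String, String> mapping, int from, int to) {
        for (int cp = from; cp <= to; cp++) {
            if (!Character.isLetter(cp)) {
                continue; // skips unassigned code points, spacing accents, punctuation
            }
            String original = new String(Character.toChars(cp));
            String stripped = Normalizer.normalize(original, Normalizer.Form.NFD)
                    .replaceAll("\\p{Mn}+", "");
            if (!stripped.isEmpty() && !stripped.equals(original)) {
                mapping.put(original, stripped);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> mapping = build();
        System.out.println(mapping.size() + " mappings generated");
        System.out.println("\u03AC -> " + mapping.get("\u03AC")); // ά -> α
        System.out.println("\u1F04 -> " + mapping.get("\u1F04")); // ἄ -> α
    }
}
```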

On 2017/09/27 21:39, Chitra wrote:
> Hi Ahmet,
> Thank you so much for the reply.
>
> I have tried it, but ASCIIFoldingFilter does not seem to support Greek
> accented characters; it only seems to support Latin-like accented
> characters. Am I missing anything?


Re: Accent insensitive search for greek characters

Posted by Chitra <ch...@gmail.com>.
Hi Ahmet,
Thank you so much for the reply.

I have tried it, but ASCIIFoldingFilter does not seem to support Greek
accented characters; it only seems to support Latin-like accented
characters. Am I missing anything?



Chitra




On Wed, Sep 27, 2017 at 5:47 PM, Ahmet Arslan <io...@yahoo.com> wrote:

>
>
> Hi,
>
> Yes ICUFoldingFilter or ASCIIFoldingFilter could be used.
>
> ahmet


-- 
Regards,
Chitra
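A short JDK-only sketch of why ASCII-targeted folding cannot help with Greek (class name hypothetical): stripping marks turns Latin "café" into ASCII "cafe", but Greek "πελάτης" becomes "πελατης", which is still entirely non-ASCII, so there is no ASCII equivalent for a Latin-oriented folding table to map it to.

```java
import java.text.Normalizer;

/** Sketch: accent stripping works for both scripts, but only Latin ends up in ASCII. */
final class AsciiCheck {
    private AsciiCheck() {}

    static String stripMarks(String s) {
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        return Normalizer.normalize(nfd.replaceAll("\\p{Mn}+", ""), Normalizer.Form.NFC);
    }

    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 0x80);
    }

    public static void main(String[] args) {
        for (String word : new String[] {"caf\u00e9", "\u03C0\u03B5\u03BB\u03AC\u03C4\u03B7\u03C2"}) {
            String stripped = stripMarks(word);
            // café -> cafe (ASCII: true); πελάτης -> πελατης (ASCII: false)
            System.out.println(word + " -> " + stripped + " (ASCII: " + isAscii(stripped) + ")");
        }
    }
}
```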

Re: Accent insensitive search for greek characters

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.

Hi,
Yes ICUFoldingFilter or ASCIIFoldingFilter could be used.
ahmet 

 
 