Posted to solr-user@lucene.apache.org by Eyal Naamati <Ey...@exlibrisgroup.com> on 2015/03/29 07:52:28 UTC

Korean script conversion

Hi,

We are starting to index records in Korean. Korean text can be written in two scripts: Han characters (Chinese) and Hangul characters (Korean).
We are looking for a Solr filter or another built-in Solr component that converts between Han and Hangul characters (transliteration).
I know there is the ICUTransformFilterFactory, which can convert between Japanese or Chinese scripts, for example:
<filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/> for Japanese script conversions
So far I couldn't find anything ready-made for Korean scripts, but perhaps someone knows of one?
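
To illustrate the kind of script conversion the Katakana-Hiragana example refers to, here is a minimal Python sketch. It is only a rough approximation and not ICU's implementation: it assumes that shifting each code point in the main Katakana range (U+30A1 to U+30F6) down by 0x60 gives the matching Hiragana character, and it ignores anything outside that range. The function name is made up for this illustration.

```python
# Rough approximation of a Katakana -> Hiragana conversion.
# Assumption: only the main Katakana range U+30A1..U+30F6 is handled;
# ICU's real transform covers more cases than this sketch does.
KATAKANA_START = 0x30A1  # small a
KATAKANA_END = 0x30F6    # small ke
OFFSET = 0x60            # distance between the Katakana and Hiragana blocks


def katakana_to_hiragana(text: str) -> str:
    """Shift Katakana characters to Hiragana; leave other characters as-is."""
    out = []
    for ch in text:
        cp = ord(ch)
        if KATAKANA_START <= cp <= KATAKANA_END:
            out.append(chr(cp - OFFSET))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(katakana_to_hiragana("カタカナ"))  # かたかな
```

The Japanese case works with a fixed offset because both scripts encode the same syllables in parallel order. Hanja and Hangul have no such regular relationship, so a Korean conversion would need a lookup table.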

Thanks!
Eyal Naamati
Alma Developer
Tel: +972-2-6499313
Mobile: +972-547915255
Eyal.Naamati@exlibrisgroup.com
www.exlibrisgroup.com


RE: Korean script conversion

Posted by Eyal Naamati <Ey...@exlibrisgroup.com>.
We only want the Hanja->Hangul conversion. For each Hanja character, there is only one Hangul character that can replace it in a Korean text.
The reverse conversion is not possible.
We want to allow searching in either script and to find matches in both scripts.
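
The one-way normalization described above can be sketched in Python as follows. This is a minimal illustration, not a Solr component: the five-entry table and the function names are hypothetical, and it takes the one-reading-per-character premise stated above as given. A real table would have to come from a full data source (for example, Hangul readings such as those in the Unicode Unihan database), and ambiguous readings would need separate handling.

```python
# Hypothetical, deliberately tiny Hanja -> Hangul reading table.
# Assumption: one Hangul reading per Hanja character, as stated in the
# message above. A real deployment would need a complete table.
HANJA_TO_HANGUL = {
    "韓": "한",
    "國": "국",
    "大": "대",
    "學": "학",
    "校": "교",
}


def normalize_to_hangul(text: str) -> str:
    """Replace each Hanja character found in the table with its Hangul
    reading; leave every other character unchanged."""
    return "".join(HANJA_TO_HANGUL.get(ch, ch) for ch in text)


def matches(indexed_text: str, query: str) -> bool:
    """Substring match after normalizing both sides to Hangul."""
    return normalize_to_hangul(query) in normalize_to_hangul(indexed_text)


if __name__ == "__main__":
    print(normalize_to_hangul("韓國大學校"))  # 한국대학교
    print(matches("韓國大學校", "한국"))      # Hangul query, Hanja record
    print(matches("한국대학교", "韓國"))      # Hanja query, Hangul record
```

Because the same normalization runs on both the indexed text and the query, only the Hanja->Hangul direction is ever needed. In Solr, the same table-driven idea could plausibly be expressed as a character mapping file used by solr.MappingCharFilterFactory in both the index and query analyzers, though that setup is an assumption here and untested.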
Thanks

Eyal Naamati

-----Original Message-----
From: Benson Margulies [mailto:bimargulies@gmail.com] 
Sent: Monday, March 30, 2015 1:58 PM
To: solr-user
Subject: Re: Korean script conversion

Why do you think that this is a good idea? Hanja are used for special purposes; they are not trivially convertible to Hangul due to ambiguity, and it's not at all clear that a typical search user wants to treat them as equivalent.


Re: Korean script conversion

Posted by Benson Margulies <bi...@gmail.com>.
Why do you think that this is a good idea? Hanja are used for special
purposes; they are not trivially convertible to Hangul due to ambiguity, and
it's not at all clear that a typical search user wants to treat them as
equivalent.


RE: Korean script conversion

Posted by Eyal Naamati <Ey...@exlibrisgroup.com>.
Trying again since I don't have an answer yet.
Thanks!

Eyal Naamati
