Posted to solr-user@lucene.apache.org by Floyd Wu <fl...@gmail.com> on 2011/10/20 11:43:35 UTC
Does anybody have experience with Chinese soundex (sounds like) in Solr?
Hi there,
There are many English Soundex implementations that can be used as references, but I
wonder how to implement a Chinese soundex (sounds-like) search, perhaps as a filter.
Any ideas?
Floyd
Re: Does anybody have experience with Chinese soundex (sounds like) in Solr?
Posted by Paul Libbrecht <pa...@hoplahup.net>.
Wouldn't converting the text to a Western (romanized) script and then applying Soundex or Metaphone be the right thing to try?
I thought such conversions were mainstream.
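A minimal Python sketch of Paul's suggestion (romanize first, then apply an English phonetic code) might look like this. It is not a Solr filter. The tiny PINYIN table is a hypothetical stand-in for a full character-to-pinyin dictionary, which in practice would also have to handle characters with several readings, and soundex() is a hand-written American Soundex.

```python
import unicodedata

# Hypothetical, hand-picked character-to-pinyin table (illustration only).
PINYIN = {"张": "zhāng", "章": "zhāng", "王": "wáng", "汪": "wāng", "李": "lǐ"}

# American Soundex digit classes; vowels, y, h and w get no digit.
CODES = {}
for digit, letters in [("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                       ("4", "l"), ("5", "mn"), ("6", "r")]:
    for letter in letters:
        CODES[letter] = digit


def romanize(text):
    """Map each known character to toneless pinyin; unknown ones are skipped."""
    syllables = []
    for ch in text:
        reading = PINYIN.get(ch)
        if reading is None:
            continue
        # NFD splits off the tone diacritic so it can be dropped.
        decomposed = unicodedata.normalize("NFD", reading)
        syllables.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return syllables


def soundex(word):
    """Classic four-character American Soundex code for an ASCII word."""
    word = word.lower()
    digits = []
    prev = CODES.get(word[0], "")
    for c in word[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        if c not in "hw":  # h and w do not separate equal codes; vowels do
            prev = code
    return (word[0].upper() + "".join(digits) + "000")[:4]


print([soundex(s) for s in romanize("张章王汪")])  # ['Z520', 'Z520', 'W520', 'W520']
```

Because Soundex keeps only the first letter and codes consonants, 张 and 章 (both zhang) share Z520 as intended, but so do zhang, zheng and zhong; on short pinyin syllables it is much coarser than on English surnames.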
paul
On 20 Oct 2011, at 12:16, Otis Gospodnetic wrote:
> Hi,
>
> Wow, interesting question. Can soundex even be applied to a language like Chinese, which is tonal and doesn't have individual letters, but whole characters? I'm no expert, but intuitively speaking it sounds hard or maybe even impossible...
>
> Otis
Re: Does anybody have experience with Chinese soundex (sounds like) in Solr?
Posted by Floyd Wu <fl...@gmail.com>.
Hi Ken,
Indeed, what I want to support is phonetic (pinyin or zhuyin) search,
not soundex (sorry, and thanks for correcting me).
Any further ideas?
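For the phonetic (pinyin or zhuyin) search described here, one possible approach, sketched in Python outside of Solr, is to reduce both the stored reading of each document and the user's query to a common toneless pinyin key. The ZHUYIN table, the DOCS data and the plain dict standing in for an index are all hypothetical; a real setup would need a complete syllable table and a character-to-pinyin step at indexing time.

```python
import re
import unicodedata

# Hypothetical zhuyin-to-pinyin table covering only the syllables used below.
ZHUYIN = {"ㄓㄨㄥ": "zhong", "ㄨㄣ": "wen", "ㄏㄢ": "han", "ㄗ": "zi"}
ZHUYIN_TONES = "ˊˇˋ˙"  # tones 2, 3, 4 and neutral; tone 1 is unmarked

# Hypothetical documents stored with a pinyin reading for each word.
DOCS = {1: ("中文", "zhōng wén"), 2: ("汉字", "hàn zì")}


def to_key(syllable):
    """Reduce one zhuyin or pinyin syllable to lowercase toneless pinyin."""
    bare = syllable.strip(ZHUYIN_TONES)
    if bare in ZHUYIN:
        return ZHUYIN[bare]
    s = re.sub(r"[1-5]$", "", syllable.lower())  # numbered tones: zhong1
    s = unicodedata.normalize("NFD", s)           # diacritic tones: zhōng
    return "".join(c for c in s if not unicodedata.combining(c))


def phonetic_key(text):
    return " ".join(to_key(s) for s in text.split())


# A plain dict standing in for a real index: phrase key -> document ids.
INDEX = {}
for doc_id, (_, reading) in DOCS.items():
    INDEX.setdefault(phonetic_key(reading), []).append(doc_id)


def search(query):
    return INDEX.get(phonetic_key(query), [])


print(search("ㄓㄨㄥ ㄨㄣˊ"), search("han4 zi4"))  # [1] [2]
```

With this, a query typed as ㄓㄨㄥ ㄨㄣˊ, zhong1 wen2 or zhōng wén finds document 1. Dropping tones is a design choice; a tone-sensitive key would be stricter.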
Floyd
2011/10/20 Ken Krugler <kk...@transpac.com>:
>> Wow, interesting question. Can soundex even be applied to a language like Chinese, which is tonal and doesn't have individual letters, but whole characters? I'm no expert, but intuitively speaking it sounds hard or maybe even impossible...
>
> The only two cases I can think of are:
>
> - Cases where you have two (or more) characters that are variant forms. Unicode tried to unify all of these, but some still exist. And in GB 18030 there are tons.
>
> - If you wanted to support phonetic (pinyin or zhuyin) search, then you might want to collapse syllables that are commonly confused. But then of course you'd have to be storing the phonetic forms for all of the words.
>
> -- Ken
Re: Does anybody have experience with Chinese soundex (sounds like) in Solr?
Posted by Ken Krugler <kk...@transpac.com>.
> Wow, interesting question. Can soundex even be applied to a language like Chinese, which is tonal and doesn't have individual letters, but whole characters? I'm no expert, but intuitively speaking it sounds hard or maybe even impossible...
The only two cases I can think of are:
- Cases where you have two (or more) characters that are variant forms. Unicode tried to unify all of these, but some still exist. And in GB 18030 there are tons.
- If you wanted to support phonetic (pinyin or zhuyin) search, then you might want to collapse syllables that are commonly confused. But then of course you'd have to be storing the phonetic forms for all of the words.
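Both of Ken's cases can be sketched as normalization steps applied at index and query time. In this Python sketch the VARIANTS pairs are only illustrative, and the confusable pairs (zh/z, ch/c, sh/s, n/l, and -ing/-in, -eng/-en, -ang/-an) are an assumption modeled on the "fuzzy pinyin" options found in Chinese input methods; the right set depends on your users.

```python
# Hypothetical variant-form table; the pairs are only illustrative.
VARIANTS = {"峯": "峰", "裡": "裏"}

# Assumed "fuzzy pinyin" pairs (full form -> collapsed form).
FUZZY_INITIALS = [("zh", "z"), ("ch", "c"), ("sh", "s"), ("n", "l")]
FUZZY_FINALS = [("ing", "in"), ("eng", "en"), ("ang", "an")]


def fold_variants(text):
    """Replace each variant character with its chosen canonical form."""
    return "".join(VARIANTS.get(ch, ch) for ch in text)


def fuzzy_syllable(syllable):
    """Collapse one toneless pinyin syllable to its fuzzy key."""
    s = syllable.lower()
    for full, short in FUZZY_INITIALS:
        if s.startswith(full):
            s = short + s[len(full):]
            break
    for full, short in FUZZY_FINALS:
        if s.endswith(full):
            s = s[: -len(full)] + short
            break
    return s


print(fold_variants("山峯"), [fuzzy_syllable(s) for s in ["zhang", "zan", "ning", "lin"]])
# 山峰 ['zan', 'zan', 'lin', 'lin']
```

Folding zh/z and -ang/-an makes zhang, zang, zhan and zan one key; precision drops accordingly, so the pairs should be chosen deliberately.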
-- Ken
--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr
Re: Does anybody have experience with Chinese soundex (sounds like) in Solr?
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,
Wow, interesting question. Can soundex even be applied to a language like Chinese, which is tonal and doesn't have individual letters, but whole characters? I'm no expert, but intuitively speaking it sounds hard or maybe even impossible...
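Tones do not have to disappear when Chinese is romanized: pinyin marks them with diacritics, which can be turned into tone numbers. This Python sketch (not from the thread) does that, so a system could index a tone-sensitive key (ma3) as well as a toneless one (ma). Writing ü as u: and the neutral tone as 5 are conventions chosen here, not requirements.

```python
import unicodedata

# Combining marks that carry the four pinyin tones after NFD decomposition.
TONE_MARKS = {"\u0304": "1", "\u0301": "2", "\u030c": "3", "\u0300": "4"}
DIAERESIS = "\u0308"  # the two dots of ü


def numbered(syllable):
    """Turn tone-marked pinyin such as 'mǎ' into numbered form such as 'ma3'."""
    tone = "5"  # no mark: neutral tone, written 5 by convention here
    letters = []
    for c in unicodedata.normalize("NFD", syllable):
        if c in TONE_MARKS:
            tone = TONE_MARKS[c]
        elif c == DIAERESIS:
            letters.append(":")  # keep ü distinct as "u:" (an ASCII convention)
        else:
            letters.append(c)
    return "".join(letters) + tone


def toneless(syllable):
    """Drop the tone digit, merging syllables that differ only in tone."""
    return numbered(syllable)[:-1]


print([numbered(s) for s in ["mā", "má", "mǎ", "mà", "ma"]])
# ['ma1', 'ma2', 'ma3', 'ma4', 'ma5']
```

The five syllables mā, má, mǎ, mà and ma become ma1 to ma5; stripping the digit merges them, which is exactly the ambiguity a toneless phonetic search accepts.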
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/