Posted to solr-user@lucene.apache.org by Floyd Wu <fl...@gmail.com> on 2011/10/20 11:43:35 UTC

Does anybody have experience with Chinese soundex (sounds like) in Solr?

Hi there,

There are many English Soundex implementations that can be used as
references, but I wonder how to do Chinese Soundex ("sounds like")
matching, maybe as a filter.

Any ideas?

Floyd

Re: Does anybody have experience with Chinese soundex (sounds like) in Solr?

Posted by Paul Libbrecht <pa...@hoplahup.net>.
Wouldn't converting the text to a Western script (romanization), followed by Soundex or Metaphone, be the right thing to try?

I thought such conversions were mainstream.
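
A minimal sketch of that idea in Python, under stated assumptions: the `PINYIN` table below is a tiny hand-made stand-in for a real character-to-pinyin dictionary, tones are dropped, and `soundex` and `phonetic_key` are illustrative names, not part of Solr or Lucene. It only shows the "romanize, then Soundex" pipeline; English Soundex is a rough fit for pinyin.

```python
# Hypothetical, hand-made table for illustration; a real system would need a
# full character-to-pinyin dictionary. Tones are deliberately dropped.
PINYIN = {"张": "zhang", "章": "zhang", "三": "san", "山": "shan", "李": "li"}

# Standard American Soundex digit classes.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(word):
    """Return the 4-character Soundex code of an ASCII word."""
    word = word.lower()
    if not word:
        return ""
    code = word[0].upper()
    prev = SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate equal codes
            prev = digit
    return (code + "000")[:4]


def phonetic_key(text):
    """Romanize each known character, then Soundex-encode each syllable."""
    parts = []
    for ch in text:
        syllable = PINYIN.get(ch)
        # Characters missing from the table are passed through unchanged.
        parts.append(soundex(syllable) if syllable else ch)
    return " ".join(parts)


print(phonetic_key("张三"), "|", phonetic_key("章山"))
```

With this table, 张三 (zhang san) and 章山 (zhang shan) end up with the same key, because Soundex ignores the `h` in `shan`. Whether that much collapsing is desirable would depend on the application.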

paul



On 20 Oct 2011, at 12:16, Otis Gospodnetic wrote:

> Hi,
> 
> Wow, interesting question.  Can soundex even be applied to a language like Chinese, which is tonal and doesn't have individual letters, but whole characters?  I'm no expert, but intuitively speaking it sounds hard or maybe even impossible...  
> 
> Otis


Re: Does anybody have experience with Chinese soundex (sounds like) in Solr?

Posted by Floyd Wu <fl...@gmail.com>.
Hi Ken,

Indeed, what I want to support is phonetic (pinyin or zhuyin) search,
not Soundex (sorry, and thanks for correcting me).

Any further ideas?

Floyd


2011/10/20 Ken Krugler <kk...@transpac.com>:
>> Wow, interesting question.  Can soundex even be applied to a language like Chinese, which is tonal and doesn't have individual letters, but whole characters?  I'm no expert, but intuitively speaking it sounds hard or maybe even impossible...
>
> The only two cases I can think of are:
>
>  - Cases where you have two (or more) characters that are variant forms. Unicode tried to unify all of these, but some still exist. And in GB 18030 there are tons.
>
>  - If you wanted to support phonetic (pinyin or zhuyin) search, then you might want to collapse syllables that are commonly confused. But then of course you'd have to be storing the phonetic forms for all of the words.
>
> -- Ken

Re: Does anybody have experience with Chinese soundex (sounds like) in Solr?

Posted by Ken Krugler <kk...@transpac.com>.
> Wow, interesting question.  Can soundex even be applied to a language like Chinese, which is tonal and doesn't have individual letters, but whole characters?  I'm no expert, but intuitively speaking it sounds hard or maybe even impossible...  

The only two cases I can think of are:

 - Cases where you have two (or more) characters that are variant forms. Unicode tried to unify all of these, but some still exist. And in GB 18030 there are tons.

 - If you wanted to support phonetic (pinyin or zhuyin) search, then you might want to collapse syllables that are commonly confused. But then of course you'd have to be storing the phonetic forms for all of the words.
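
A rough Python sketch of the second case, assuming the phonetic (pinyin) form of every word is already stored alongside the document. The confusion rules (zh/z, ch/c, sh/s, and final -ng/-n) are illustrative assumptions, not a vetted list, and `collapse`, `build_index`, and `search` are hypothetical helper names, not Solr APIs.

```python
import re

# Illustrative rules only (assumption): fold initials and finals that are
# often treated as confusable. A real list would need tuning per user base.
CONFUSION_RULES = [
    (re.compile(r"^zh"), "z"),
    (re.compile(r"^ch"), "c"),
    (re.compile(r"^sh"), "s"),
    (re.compile(r"ng$"), "n"),
]


def collapse(syllable):
    """Reduce one toneless pinyin syllable to its collapsed form."""
    s = syllable.lower()
    for pattern, replacement in CONFUSION_RULES:
        s = pattern.sub(replacement, s)
    return s


def build_index(docs):
    """Map collapsed phonetic key -> set of document ids.

    `docs` maps a document id to the stored pinyin syllables of its text.
    """
    index = {}
    for doc_id, syllables in docs.items():
        key = " ".join(collapse(s) for s in syllables)
        index.setdefault(key, set()).add(doc_id)
    return index


def search(index, query_syllables):
    """Look up a pinyin query after applying the same collapsing rules."""
    key = " ".join(collapse(s) for s in query_syllables)
    return index.get(key, set())


docs = {"d1": ["zhang", "san"], "d2": ["li", "si"]}
index = build_index(docs)
print(search(index, ["zang", "shan"]))
```

The same collapsing is applied at index time and at query time, so a query typed as "zang shan" can still find a document stored as "zhang san". In Solr this would roughly correspond to a separate phonetic field with matching index and query analysis.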

-- Ken



--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr




Re: Does anybody have experience with Chinese soundex (sounds like) in Solr?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,

Wow, interesting question.  Can soundex even be applied to a language like Chinese, which is tonal and doesn't have individual letters, but whole characters?  I'm no expert, but intuitively speaking it sounds hard or maybe even impossible...  

Otis
----

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

