Posted to solr-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2008/12/29 19:30:06 UTC

Re: Multiple language support

Hi,

The problem is that a single document (in your case, even a single field) is multilingual.  Ideally you would detect the different languages within the field value and apply a different tokenizer/filter chain to each part, so the first part would be analyzed as English and the second part as Chinese.  At search time you would have to determine the language of the query one way or another, and again apply the matching analyzer.  With the right analyzer applied at both index and search time, you could match even this multilingual field.  None of the existing analyzers/tokenizers/filters can handle a single piece of text in multiple languages, so you will have to write a custom analyzer that is smart enough to do that.
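
To make the "different parts of the field" idea a bit more concrete, here is a rough sketch in Python of the splitting step only.  It is not Lucene or Solr code, and the names is_cjk and split_by_script are made up for illustration.  It also assumes that checking for CJK ideographs is a good enough stand-in for real language detection, which is only true for simple cases like your example value.

```python
from itertools import groupby


def is_cjk(ch):
    """Return True if ch is a CJK ideograph (coarse, assumed heuristic).

    Only the CJK Unified Ideographs block and Extension A are checked;
    other CJK-related blocks are deliberately ignored in this sketch.
    """
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def split_by_script(text):
    """Split text into consecutive (label, run) pairs.

    Each run contains either only CJK ideographs ("cjk") or no CJK
    ideographs at all ("non_cjk"). Whitespace-only runs are dropped.
    """
    runs = []
    for cjk, chars in groupby(text, key=is_cjk):
        run = "".join(chars).strip()
        if run:
            runs.append(("cjk" if cjk else "non_cjk", run))
    return runs


# The example value from the original question.
for label, run in split_by_script("ElectrolyticCapacitor - 被对立的电容器以价值220µF"):
    print(label, repr(run))
```

A custom analyzer would then pass each "non_cjk" run through the English chain and each "cjk" run through the CJK chain, and the query would need the same treatment.  Note that digits and units such as "220µF" end up in a "non_cjk" run here, which may or may not be what you want.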

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: "Deshpande, Mukta" <mu...@ptc.com>
> To: solr-user@lucene.apache.org
> Sent: Monday, December 29, 2008 4:52:19 AM
> Subject: Multiple language support 
> 
> Hi All,
> 
> I have a multiple language supporting schema in which there is a separate field 
> for every language.
> 
> I have a field "product_name" to store product name and its description that can 
> be in any user preferred language. 
> This can be stored in fields product_name_EN if user prefers English language, 
> product_name_SCH if user prefers Simplified Chinese language.
> The WhitespaceTokenizerFactory and filter EnglishPorterFilterFactory are applied 
> on product_name_EN.
> The CJKAnalyzer and CJKTokenizer are applied on product_name_SCH.
> 
> e.g. Value can be : ElectrolyticCapacitor - 被对立的电容器以价值220µF
> 
> Now my problem is: In which field do I store the above value?
> product_name_EN OR product_name_SCH OR should it be something else?
> 
> How do I find out which analyzers should be applied to this field?
> 
> Has anyone faced a similar situation before?
> Please help ASAP.
> 
> Thanks,
> ~Mukta