Posted to solr-user@lucene.apache.org by Sujatha Arun <su...@gmail.com> on 2008/12/18 12:25:57 UTC

Multi language search help

Hi,
I am prototyping language search using Solr 1.3. I have 3 fields in the
schema: id, content and language.

I am indexing 3 PDF files; the languages are Foroyo, Chinese and Japanese.

I use xpdf to convert the content of each PDF to text and push the text to
Solr in the content field.

Which analyzer do I need to use for the above?

When I use the default text analyzer and post this content to Solr, I do
not get any results.

Does Solr support stemming for the above languages?

Regards
Sujatha

Re: Multi language search help

Posted by Sujatha Arun <su...@gmail.com>.

Thanks Grant,

The requirement from the user end is to search only in one particular
language, not across languages.

Also, going forward, we will be adding more languages.

So if I have a separate field for each language, we will need to change the
schema every time, and that will not scale very well.

So there are two options: either use dynamic fields or use multicore.
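
To make the two options concrete, here is a rough sketch of how the search
request would differ in each case. It is only an illustration under
assumptions that are not from this thread: the host and port, the
content_<lang> field naming, and cores named after language codes are all
hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust to the actual Solr host, port and webapp path.
BASE = "http://localhost:8983/solr"


def dynamic_field_url(term, language):
    """Single index: search the per-language field, e.g. content_ja.

    The content_<lang> naming is hypothetical and has to match a field or
    dynamicField declaration in schema.xml.
    """
    params = {"q": "content_%s:%s" % (language, term)}
    return "%s/select?%s" % (BASE, urlencode(params))


def multicore_url(term, language):
    """Multicore: one core per language, each with its own schema.

    Core names equal to the language code are an assumption.
    """
    params = {"q": "content:%s" % term}
    return "%s/%s/select?%s" % (BASE, language, urlencode(params))


print(dynamic_field_url("tokyo", "ja"))
print(multicore_url("tokyo", "ja"))
```

One thing to keep in mind with dynamic fields: a dynamicField pattern maps
to a single field type, so giving each language its own analyzer would
still mean one pattern (and one field type) per language in the schema.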

Please advise which is better in terms of scaling and making the best use of
existing resources (available RAM is about 4 GB, shared by several instances
of Solr).

If we use multicore, will performance (search speed, etc.) degrade?

Any pointers will be helpful.

Regards
Sujatha

On 12/19/08, Grant Ingersoll <gs...@apache.org> wrote:
>
>
> On Dec 18, 2008, at 6:25 AM, Sujatha Arun wrote:
>
>> Hi,
>> I am prototyping language search using Solr 1.3. I have 3 fields in the
>> schema: id, content and language.
>>
>> I am indexing 3 PDF files; the languages are Foroyo, Chinese and Japanese.
>>
>> I use xpdf to convert the content of each PDF to text and push the text to
>> Solr in the content field.
>>
>> Which analyzer do I need to use for the above?
>>
>> When I use the default text analyzer and post this content to Solr, I do
>> not get any results.
>>
>> Does Solr support stemming for the above languages?
>>
>
> I'm not familiar with Foroyo, but there should be tokenizers/analysis
> available for Chinese and Japanese.  Are you putting all three languages
> into the same field?  If that is the case, you will need some type of
> language detection piece that can choose the correct analyzer.
>
> How are your users searching?  That is, do you know the language they want
> to search in?  If so, then you can have a field for each language.
>
> -Grant
>
>

Re: Multi language search help

Posted by Grant Ingersoll <gs...@apache.org>.

On Dec 18, 2008, at 6:25 AM, Sujatha Arun wrote:

> Hi,
> I am prototyping language search using Solr 1.3. I have 3 fields in the
> schema: id, content and language.
>
> I am indexing 3 PDF files; the languages are Foroyo, Chinese and Japanese.
>
> I use xpdf to convert the content of each PDF to text and push the text to
> Solr in the content field.
>
> Which analyzer do I need to use for the above?
>
> When I use the default text analyzer and post this content to Solr, I do
> not get any results.
>
> Does Solr support stemming for the above languages?

I'm not familiar with Foroyo, but there should be tokenizers/analysis
available for Chinese and Japanese.  Are you putting all three
languages into the same field?  If that is the case, you will need
some type of language detection piece that can choose the correct
analyzer.
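
As a very rough idea of what such a detection piece could look like, the
sketch below guesses the language from Unicode script ranges alone. This is
a simple heuristic chosen for illustration, not something Solr provides: it
can only tell Japanese (kana present) from Chinese (Han characters without
kana) and lumps everything else together. The function name and the
returned codes are made up for the example, and it is only needed when the
language is not already known at index time.

```python
def guess_language(text):
    """Guess 'ja', 'zh' or 'other' from Unicode script ranges only.

    Heuristic: any Hiragana/Katakana means Japanese; Han ideographs
    without kana are treated as Chinese. Short Japanese text written
    entirely in kanji would be misclassified as Chinese.
    """
    has_kana = False
    has_han = False
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:    # Hiragana and Katakana blocks
            has_kana = True
        elif 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs block
            has_han = True
    if has_kana:
        return "ja"
    if has_han:
        return "zh"
    return "other"


for sample in ("これは日本語です", "这是中文", "hello"):
    print(sample, guess_language(sample))
```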

How are your users searching?  That is, do you know the language they  
want to search in?  If so, then you can have a field for each language.
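
A minimal sketch of the field-per-language idea, assuming the language of
each document is known when it is indexed: the extracted text goes into a
field whose name carries the language code, and the update message is built
in Solr's XML add format. The content_<lang> field names are hypothetical
and would each have to be declared in schema.xml with a suitable analyzer.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc_id, content, language):
    """Build a Solr <add> message that routes content to content_<language>.

    The content_<language> naming is an assumption for this example; the
    matching fields must exist in the schema.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (
        ("id", doc_id),
        ("language", language),
        ("content_" + language, content),
    ):
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(add, encoding="unicode")


print(build_add_xml("doc1", "東京の天気", "ja"))
```

A search limited to Japanese would then query content_ja only, so the query
text is analyzed the same way as the indexed text.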

-Grant