Posted to solr-user@lucene.apache.org by aniljayanti <an...@yahoo.co.in> on 2013/12/03 07:12:32 UTC

Indexing Multiple Languages with solr (Arabic & English)

Hi,

I am using Solr to index and search English text with the "text_general"
field type, and search is working fine. Now I also have Arabic text that
needs to be indexed and searched. Below is my basic config for English. *The
same field contains both English and Arabic text in the database.* Please
guide me on this.

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I saw the config below in the schema.xml file for Arabic.

 
<fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" enablePositionIncrements="true"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
  </analyzer>
</fieldType>

Please suggest how to configure Arabic indexing and searching.

Thanks in Advance,

AnilJayanti




--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-Multiple-Languages-with-solr-Arabic-English-tp4104580.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Indexing Multiple Languages with solr (Arabic & English)

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
It's just a text type. So, just declare another field and, instead of
text_general or text_en, use text_ar. Then use copyField from the source
text field to it.

Go through the tutorial, if you haven't yet. It explains some of these things.
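
As a rough sketch of what that could look like in schema.xml: the field names
"content" and "content_ar" below are made up for illustration, so substitute
your own. It also assumes the text_ar fieldType is already defined in the same
schema.xml.

```xml
<!-- Hypothetical field names: "content" stands in for your existing field. -->
<field name="content" type="text_general" indexed="true" stored="true"/>

<!-- The same text, analyzed with the Arabic chain (text_ar).
     stored="false" here is an assumption: this copy is only searched, not returned. -->
<field name="content_ar" type="text_ar" indexed="true" stored="false"/>

<!-- Copy the raw input of "content" into "content_ar" at index time. -->
<copyField source="content" dest="content_ar"/>
```

copyField copies the original input text before analysis, so each field runs
its own analyzer chain over the same content. Documents need to be reindexed
after a schema change like this.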

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)



Re: Indexing Multiple Languages with solr (Arabic & English)

Posted by aniljayanti <an...@yahoo.co.in>.
Hi,

Thanks for your post.

I do not know how to use the "text_ar" field type for Arabic. What
configuration needs to be added to the schema.xml file? Please guide me.


AnilJayanti




Re: Indexing Multiple Languages with solr (Arabic & English)

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Have you tried using copyField to replicate the content, so that one field is
indexed with the English text type and another, with the same content, with
the Arabic text type? You can then search against both using edismax or
similar.

That's one approach to this. Just because it is in one field in the
database, it does not have to be in one field in Solr. And you don't have
to 'store' both copies if you just want to search them.
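
For the search side, one possible sketch is a request handler in
solrconfig.xml that defaults to edismax and queries both fields. The handler
name "/search_multi" and the field names "content" and "content_ar" are
placeholders for illustration, not anything from your setup.

```xml
<!-- solrconfig.xml: hypothetical handler name and field names. -->
<requestHandler name="/search_multi" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the extended dismax query parser. -->
    <str name="defType">edismax</str>
    <!-- Query both the English-analyzed and the Arabic-analyzed copy. -->
    <str name="qf">content content_ar</str>
  </lst>
</requestHandler>
```

The same defType and qf values can also be passed as request parameters on an
existing handler instead of being set as defaults.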

Regards,
   Alex.


