Posted to solr-user@lucene.apache.org by "Dilip.TS" <di...@starmarksv.com> on 2007/11/12 07:56:15 UTC

I18N with SOLR?

Hello,
   
  Does SOLR support I18N (multiple language support)?
  Thanks in advance.

Regards,
Dilip TS


Re: I18N with SOLR?

Posted by Ed Summers <eh...@pobox.com>.
I'd say yes. Solr supports Unicode, ships with language-specific
analyzers, and allows you to provide your own custom analyzers if you
need them. This allows you to create different <fieldType> definitions
for the languages you want to support. For example, here is a
field type for French text which uses a French stopword list and
French stemming.

    <fieldType
      name="text_french"
      class="solr.TextField" >
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter
          class="solr.FrenchStopFilterFactory"
          ignoreCase="true"
          words="stopwords_french.txt" />
        <filter
          class="solr.FrenchPorterFilterFactory"
          protected="protwords_french.txt" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
      </analyzer>
    </fieldType>
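
If the French-specific factory classes above are not available in your
Solr build, roughly the same chain can be sketched with the generic
solr.StopFilterFactory and solr.SnowballPorterFilterFactory. This is an
untested sketch, not the configuration used in the original setup: the
language="French" setting and the reuse of stopwords_french.txt are
assumptions, and the protected-words list is left out.

    <fieldType
      name="text_french"
      class="solr.TextField" >
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <!-- generic stop filter; the French word list is assumed to exist -->
        <filter
          class="solr.StopFilterFactory"
          ignoreCase="true"
          words="stopwords_french.txt" />
        <!-- Snowball stemmer, with the language chosen by attribute -->
        <filter
          class="solr.SnowballPorterFilterFactory"
          language="French" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
      </analyzer>
    </fieldType>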

Then you can create a <dynamicField> definition that allows you to
index and query your documents using the correct field type:

    <dynamicField
      name="*_french"
      type="text_french"
      indexed="true"
      stored="true"/>

This means that when you index you need to know what language your
data is in so that you know what field names to use in your document
(e.g. title_french). And at search time you need to know what language
you are in so you know which fields to search.  Most user interfaces
are in a single language context so from the query perspective you'll
most likely know the language they want to search in. If you don't
know the language context in either case you could try to guess using
something like org.apache.nutch.analysis.lang.LanguageIdentifier.
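
As a rough illustration of the naming convention, an update message for
a French document might look like the following. The id field and the
field values are hypothetical; only the _french suffix comes from the
dynamicField definition above.

    <add>
      <doc>
        <!-- "id" is assumed to be the schema's unique key field -->
        <field name="id">doc-1</field>
        <!-- matched by the *_french dynamicField, so analyzed as French -->
        <field name="title_french">Les Misérables</field>
      </doc>
    </add>

A search in a French context would then target the same field, for
example with a query such as title_french:Misérables.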

I hope this helps. We used this technique (without the guessing) quite
effectively at the Library of Congress recently for a prototype
application that needed to provide search functionality in 7 different
languages.

//Ed
