Posted to solr-user@lucene.apache.org by srinir <sr...@nextag.com> on 2012/04/14 02:32:32 UTC

dynamic analyzer based on condition

Hi,

I want to pick different analyzers for the same field depending on the
language, which I can determine from a different field. I would have
different fieldTypes defined in my schema.xml, such as text_en, text_de,
text_fr, etc., where I specify which analyzer and filters to use at
index and query time.

    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
      </analyzer>
    </fieldType>

But I would like to define the field dynamically, for example:

if lang=="en"
<field name="description" type="text_en" indexed="true" stored="true"  />
else if lang=="de"
<field name="description" type="text_de" indexed="true" stored="true" />
...


Can I achieve this somehow? If this approach is not possible, I can just
create one field for every language.

Thanks
Srini

--
View this message in context: http://lucene.472066.n3.nabble.com/dynamic-analyzer-based-on-condition-tp3909345p3909345.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: dynamic analyzer based on condition

Posted by Erick Erickson <er...@gmail.com>.
Before you start worrying about memory, do you have any proof at all
that memory is a problem? Are you expecting to have a lot of documents
in your index (as in multiple tens of millions)?

If you try to put multiple languages in a single field, the results will be
problematic for some set of documents/queries, especially if you're mixing
widely disparate languages (think English and Chinese, for instance).

I'd try the field per language option just to see if you need to go to a more
complex solution.

There is no penalty for empty fields in documents, so don't worry about
that.
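
With 10-15 text fields and 5-6 languages, the field-per-language option
could be kept compact with dynamic fields. The following is only a sketch:
the "*_en" / "*_de" / "*_fr" suffix convention is an assumption made for
illustration and does not come from this thread, and it presumes that
text_en, text_de and text_fr are defined as fieldTypes in schema.xml.

```xml
<!-- Hypothetical naming convention: <base field name>_<language code>. -->
<!-- Any field ending in _en is analyzed with the text_en fieldType, etc. -->
<dynamicField name="*_en" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_de" type="text_de" indexed="true" stored="true"/>
<dynamicField name="*_fr" type="text_fr" indexed="true" stored="true"/>
```

Under this convention a German document would send only description_de,
title_de and so on, and simply omit the variants for the other languages.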


Best
Erick

On Sun, Apr 15, 2012 at 3:40 PM, srinir <sr...@nextag.com> wrote:
> Hi Erick,
>
> Thanks a lot for your reply. I have around 10-15 searchable text fields (and
> 5-6 languages). If I create one field per language, will that increase the
> memory occupied by my index? Even though only one field will have a value at
> a time, is there a case where the empty fields in the index will occupy some
> memory? Will that happen if I enable field caching?
>
>
> Thanks
> Srini
>

Re: dynamic analyzer based on condition

Posted by srinir <sr...@nextag.com>.
Hi Erick,

Thanks a lot for your reply. I have around 10-15 searchable text fields (and
5-6 languages). If I create one field per language, will that increase the
memory occupied by my index? Even though only one field will have a value at
a time, is there a case where the empty fields in the index will occupy some
memory? Will that happen if I enable field caching?


Thanks
Srini

--
View this message in context: http://lucene.472066.n3.nabble.com/dynamic-analyzer-based-on-condition-tp3909345p3912605.html

Re: dynamic analyzer based on condition

Posted by Erick Erickson <er...@gmail.com>.
You'll have to create a field per language...

The 3.6 example code has fieldType
definitions for a lot of languages; that might
be a good place to start.
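
As a minimal sketch of a field per language in schema.xml, assuming the
text_en, text_de and text_fr fieldTypes from the original question are
defined. The description_en / description_de / description_fr field names
are hypothetical and chosen only for illustration.

```xml
<!-- One field per language, each bound to its own fieldType. -->
<!-- The description_<lang> names are illustrative, not from the thread. -->
<field name="description_en" type="text_en" indexed="true" stored="true"/>
<field name="description_de" type="text_de" indexed="true" stored="true"/>
<field name="description_fr" type="text_fr" indexed="true" stored="true"/>
```

The indexing client would read the language value and write the text into
the matching field; queries then need to target that same field so the
query-time analyzer matches the index-time one.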

Best
Erick
