Posted to solr-user@lucene.apache.org by srinir <sr...@nextag.com> on 2012/04/14 02:32:32 UTC
dynamic analyzer based on condition
Hi,
I want to pick different analyzers for the same field depending on the
language. I can determine the language from a different field. I would have
different fieldTypes defined in my schema.xml, such as text_en, text_de,
text_fr, etc., where I specify which analyzers and filters to use at
index and query time.
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
  </analyzer>
</fieldType>
But I would like to define the field dynamically, e.g.:
if lang=="en"
<field name="description" type="text_en" indexed="true" stored="true" />
else if lang=="de"
<field name="description" type="text_de" indexed="true" stored="true" />
...
Can I achieve this somehow? If this approach cannot be done, then I can just
create one field for every language.
Thanks
Srini
--
View this message in context: http://lucene.472066.n3.nabble.com/dynamic-analyzer-based-on-condition-tp3909345p3909345.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: dynamic analyzer based on condition
Posted by Erick Erickson <er...@gmail.com>.
Before you start worrying about memory, do you have any proof at all
that memory is a problem? Are you expecting to have a lot of documents
in your index (as in multiple tens of millions)?
If you try to put multiple languages in a single field, the results will be
problematic for some set of documents/queries, especially if you're mixing
widely disparate languages (think English and Chinese, for instance).
I'd try the field per language option just to see if you need to go to a more
complex solution.
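
As a rough sketch of what querying the field-per-language setup could look like, the solrconfig.xml fragment below searches all language variants of one field at once through the edismax parser's qf parameter. The field names description_en, description_de and description_fr and the handler name are assumptions for illustration; they do not come from this thread.

```xml
<!-- Sketch only: handler name and field names are hypothetical. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- edismax lets one query run against several fields -->
    <str name="defType">edismax</str>
    <!-- each field is analyzed with its own language-specific fieldType -->
    <str name="qf">description_en description_de description_fr</str>
  </lst>
</requestHandler>
```

If the query language is known up front, the client could instead pass a narrower qf (for example only description_de) on the request.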
There is no penalty for empty fields in documents, so don't worry about
that.
Best
Erick
On Sun, Apr 15, 2012 at 3:40 PM, srinir <sr...@nextag.com> wrote:
> Hi Erick,
>
> Thanks a lot for your reply. I have around 10-15 searchable text fields (and
> 5-6 languages). If I create one per language will that increase the memory
> occupied by my index. Even though only one field will have a value at a
> time, will there be a case the empty fields in the index will occupy some
> memory ? will that happen if i enable field caching ?
>
>
> Thanks
> Srini
Re: dynamic analyzer based on condition
Posted by srinir <sr...@nextag.com>.
Hi Erick,
Thanks a lot for your reply. I have around 10-15 searchable text fields (and
5-6 languages). If I create one field per language, will that increase the
memory occupied by my index? Even though only one field will have a value at a
time, will the empty fields in the index still occupy some memory? Will that
happen if I enable field caching?
Thanks
Srini
--
View this message in context: http://lucene.472066.n3.nabble.com/dynamic-analyzer-based-on-condition-tp3909345p3912605.html
Re: dynamic analyzer based on condition
Posted by Erick Erickson <er...@gmail.com>.
You'll have to create a field per language...
The 3.6 example code has fieldType
definitions for a lot of languages; that might
be a good place to start.
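
A minimal sketch of the field-per-language layout in schema.xml might look like the following. It assumes fieldTypes named text_en, text_de and text_fr (as in the original question) and a string fieldType are already defined; the suffix-based naming (description_en, title_de, and so on) is an illustrative convention, not something prescribed in this thread.

```xml
<!-- Sketch only: the *_en / *_de / *_fr naming convention is an assumption. -->

<!-- Holds the language code the indexing client already knows -->
<field name="lang" type="string" indexed="true" stored="true"/>

<!-- One dynamicField per language covers every text field for that language,
     e.g. description_en, title_en, ... without declaring each one -->
<dynamicField name="*_en" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_de" type="text_de" indexed="true" stored="true"/>
<dynamicField name="*_fr" type="text_fr" indexed="true" stored="true"/>
```

With this in place, the indexing client would choose the field name from the document's language, sending a German description as description_de rather than description. schema.xml itself has no conditional logic, so that choice has to happen before the document reaches the analyzer.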
Best
Erick