Posted to solr-user@lucene.apache.org by ja...@nokia.com on 2010/11/22 15:58:41 UTC

passing arguments to analyzer/filter at runtime

Hi,

I’m trying to find a solution to search only in a given language.

At index time the language of each string to be tokenized is known, so I would like to write a filter that prefixes each token with its language code.
First question: what is the best way to pass the language argument to the filter?

I’m going to use multivalued fields, and each value I put in such a field may be in a different language.
What is the best way to pass several languages on to the filter?
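
To make the intended index-time transformation concrete, here is a minimal plain-Java sketch of what such a filter would do to each value. It does not use the Lucene/Solr filter API at all. The class name, the "lang|text" marker convention for carrying the language inside each value of the multivalued field, and the "_" separator are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only; this is not a Lucene TokenFilter.
class LanguagePrefixSketch {

    // Hypothetical convention: each field value carries its language
    // code in front of the text, separated by '|', e.g. "de|Neue Strasse".
    static List<String> prefixTokens(String markedValue) {
        int sep = markedValue.indexOf('|');
        if (sep < 0) {
            throw new IllegalArgumentException("missing language marker: " + markedValue);
        }
        String lang = markedValue.substring(0, sep);
        String text = markedValue.substring(sep + 1);

        List<String> tokens = new ArrayList<>();
        // Naive whitespace splitting and lowercasing stand in for a real analyzer.
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(lang + "_" + token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two values of one multivalued field, each in a different language.
        System.out.println(prefixTokens("de|Neue Strasse")); // [de_neue, de_strasse]
        System.out.println(prefixTokens("en|New Street"));   // [en_new, en_street]
    }
}
```

Carrying the language inside the value is only one possible way to get a per-value argument to the analysis step. Whether it fits depends on how the stored value is supposed to look.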

On the search side it gets a bit trickier: here I do not know the exact language of the input query, only several possible ones. So instead of prefixing each token with one language code, I need to prefix each token with every possible language code.
How do I pass parameters to the filter at query time?
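
As a sketch of the query-side expansion described above, the following plain-Java example builds the expanded query string on the client instead of inside a filter. That is a swapped-in technique, not an answer to how a filter receives parameters. The class name, the "_" separator, and the AND/OR combination are assumptions, and query special characters are not escaped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Client-side illustration only; no Solr classes are involved.
class QueryExpansionSketch {

    // Expands each query token into an OR group holding one prefixed
    // variant per candidate language; the groups are joined with AND.
    static String expand(String userQuery, List<String> candidateLanguages) {
        if (candidateLanguages.isEmpty()) {
            throw new IllegalArgumentException("at least one candidate language is required");
        }
        List<String> groups = new ArrayList<>();
        for (String token : userQuery.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<String> variants = new ArrayList<>();
            for (String lang : candidateLanguages) {
                variants.add(lang + "_" + token.toLowerCase(Locale.ROOT));
            }
            groups.add("(" + String.join(" OR ", variants) + ")");
        }
        return String.join(" AND ", groups);
    }

    public static void main(String[] args) {
        // The query language is not known exactly, only the candidates.
        System.out.println(expand("New Street", Arrays.asList("en", "de")));
        // (en_new OR de_new) AND (en_street OR de_street)
    }
}
```

With SolrJ, a string built this way could be handed to SolrQuery.setQuery(String) before calling SolrServer.query(SolrQuery). The field those terms are searched in would then need a query analyzer that leaves the prefixes intact.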

I’m not using the URL variant; I am using the SolrServer.query(SolrQuery) interface.

Jan

RE: passing arguments to analyzer/filter at runtime

Posted by ja...@nokia.com.
Hi,

Yes, this is one of the four options I am going to evaluate. Here is why your suggestion might be problematic:

We have ca. 12 language-sensitive fields and support ca. 200 distinct languages, which gives 12 × 200 = 2400 fields.
Might a multifield/dismax query spanning 2400 fields become problematic?
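
The field count above can be reproduced with a small plain-Java sketch. The placeholder field and language names and the LANG_field naming are hypothetical. The second call rests on an assumption that is not stated in this thread: that the query field list could be limited to the candidate languages of a query rather than spanning all of them.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the field arithmetic only; no Solr classes are involved.
class QueryFieldsSketch {

    // Generates placeholder names such as "field1", "field2", ...
    static List<String> numbered(String prefix, int count) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            names.add(prefix + i);
        }
        return names;
    }

    // One query field per (language, base field) pair, named LANG_field.
    static List<String> queryFields(List<String> baseFields, List<String> languages) {
        List<String> qf = new ArrayList<>();
        for (String lang : languages) {
            for (String base : baseFields) {
                qf.add(lang + "_" + base);
            }
        }
        return qf;
    }

    public static void main(String[] args) {
        List<String> baseFields = numbered("field", 12);
        // All 200 languages: 12 * 200 = 2400 fields.
        System.out.println(queryFields(baseFields, numbered("lang", 200)).size()); // 2400
        // Only three candidate languages: 12 * 3 = 36 fields.
        System.out.println(queryFields(baseFields, numbered("lang", 3)).size());   // 36
    }
}
```

For a dismax query, such a list would be joined with spaces to form the qf parameter.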

We will go for this approach as well, but we are not sure whether it will be the best for roughly 20 GB of raw data with (due to the many languages and names) 100 billion separate tokens.

Is my approach possible?

Jan

-----Original Message-----
From: ext Markus Jelsma [mailto:markus.jelsma@openindex.io] 
Sent: Monday, 22 November 2010 16:10
To: solr-user@lucene.apache.org
Subject: Re: passing arguments to analyzer/filter at runtime

Hi,

I wouldn't use a multiValued field for this because you would then have the
same analyzers (and possibly stemmers) for different languages.

The usual method is to have fieldTypes for each language (en_text, de_text etc) 
and then create specific fields that map to them (en_content, de_content etc).

Since you know the language at index time, you can simply add the content to 
the proper LANG_content field.
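
The routing described here can be sketched in plain Java as follows. The class and method names are hypothetical, and an ordinary map stands in for the document that would be sent to Solr. Only the LANG_content naming pattern comes from the message itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only; a Map stands in for the document sent to Solr.
class FieldRoutingSketch {

    // Builds the field name for a language following the LANG_content pattern.
    static String contentField(String lang) {
        return lang + "_content";
    }

    // Puts each text into the field that belongs to its language.
    static Map<String, String> toDocumentFields(Map<String, String> textByLanguage) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : textByLanguage.entrySet()) {
            fields.put(contentField(entry.getKey()), entry.getValue());
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> texts = new LinkedHashMap<>();
        texts.put("en", "New Street");
        texts.put("de", "Neue Strasse");
        System.out.println(toDocumentFields(texts));
        // {en_content=New Street, de_content=Neue Strasse}
    }
}
```

With SolrJ, each entry of the resulting map would correspond to one SolrInputDocument.addField(name, value) call, and each LANG_content field would be declared in the schema with the matching LANG_text fieldType.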

Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350
