Posted to solr-user@lucene.apache.org by Tobias Dittrich <di...@wave-computer.de> on 2009/03/16 14:13:58 UTC

Query multiple fields using configured analyzer stack

Hi,

How can I query multiple fields in such a way that, for each 
of the fields, the configured analyzer stack (including its 
tokenizer) is applied to the whole query string?

I have several fields in my schema that use ShingleFilter 
and/or WordDelimiterFilter, among other filters. But when I 
search, for example, for "blue tooth" (without the quotes), 
the query is parsed to +name:blue +name:tooth, which is not 
what I expected. A search for "blue-tooth", on the other 
hand, yields the expected query: name:blue-tooth name:blue 
name:tooth name:bluetooth. The only problem is that it 
covers one field instead of many.

This is with the LuceneQParser. DisMax gives almost exactly 
what I want when searching for "blue-tooth", but gives even 
stranger results for "blue tooth".

Is there an existing parser or plugin that can do this? Or 
do I just need to rewrite my config a bit? Any comments are 
welcome.

Thanks in advance
Tobi

P.S.: I asked something similar in an earlier post but in 
the meantime I spent a lot of time thinking about what my 
actual problem is and came up with a different view of things...


Re: Query multiple fields using configured analyzer stack

Posted by Tobias Dittrich <di...@wave-computer.de>.
Hi Steve,

Thanks for your quick response. Quoting the string really is 
not a good idea in this case, and it does not do what I need 
anyway, since the query is converted into a PhraseQuery and 
treated differently.

Thanks also for pointing me to the FieldQParserPlugin. 
However, I can't seem to get it to work properly. I 
registered it as a plugin in my solrconfig.xml like this:

<queryParser name="field" 
class="org.apache.solr.search.FieldQParserPlugin"/>

But when I send a query I get the following results (solrj 
debug output):

rawquerystring -> {!field f=name}blue tooth
querystring -> {!field f=name}blue tooth
parsedquery -> name:blue name:tooth
parsedquery_toString -> name:blue name:tooth

But I'd expect it to be like name:(blue tooth) name:blue 
name:tooth name:bluetooth

Here is what my schema.xml looks like for the name field:

<!-- normal german text -->
<fieldType name="text_de" class="solr.TextField"
           positionIncrementGap="100" omitNorms="true">
   <analyzer type="index">
     [..]
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory" />
     <filter class="solr.WordDelimiterFilterFactory"
             splitOnCaseChange="1" generateWordParts="1"
             generateNumberParts="1" catenateWords="1"
             catenateNumbers="1" catenateAll="1"
             preserveOriginal="1" />
     <filter class="solr.PositionFilterFactory" />
     <filter class="solr.StopFilterFactory"
             words="stopwords_de.txt" ignoreCase="true" />
     <filter class="solr.LowerCaseFilterFactory" />
     <filter class="solr.SnowballPorterFilterFactory"
             language="German" />
     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
   </analyzer>
</fieldType>


<field name="name" type="text_de" indexed="true"
       stored="true" />
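
One possible reading of the debug output above, offered as an assumption rather than a diagnosis confirmed anywhere in this thread: the query analyzer starts with a whitespace tokenizer, so even when the whole string reaches the analyzer, "blue tooth" is already two tokens by the time WordDelimiterFilter sees it, and that filter only splits and catenates within a single token. The Python sketch below imitates this with hypothetical helper names (`whitespace_tokenize`, `toy_word_delimiter`, `analyze`); it is a toy model, not Solr's implementation.

```python
import re

def whitespace_tokenize(text):
    # Stand-in for a whitespace tokenizer: split on runs of whitespace.
    return text.split()

def toy_word_delimiter(token):
    # Hypothetical, heavily simplified word-delimiter step: it works on ONE
    # token, splitting on non-alphanumerics and emitting the original token,
    # its parts, and the catenation of the parts.
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    if len(parts) <= 1:
        return [token]
    return [token] + parts + ["".join(parts)]

def analyze(text):
    # The filter is applied per token, after tokenization has happened.
    out = []
    for token in whitespace_tokenize(text):
        out.extend(toy_word_delimiter(token))
    return out

print(analyze("blue-tooth"))  # ['blue-tooth', 'blue', 'tooth', 'bluetooth']
print(analyze("blue tooth"))  # ['blue', 'tooth']
```

In this model "bluetooth" can only come out of a single input token such as "blue-tooth"; joining two separate tokens would need a different component, for example a shingle-style filter.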

Is there anything else I have to configure?

Thanks
Tobi


Steven A Rowe schrieb:
> Hi Tobi,
> 
> On 3/16/2009 at 9:14 AM, Tobias Dittrich wrote:
>> how can I query multiple fields in such way that for each of
>> the fields the configured analyzer stack with Tokenizer is
>> used for the whole query string?
> 
> Lucene's QueryParser (and AFAIK, Solr's QPs too) first breaks queries on whitespace (except within quoted strings), and then sends the individual words to be analyzed by the appropriate analyzer.
> 
> One way to ensure that an analyzer sees the whole string at once is to enclose the query in quotation marks.  This is not ideal.  Another way (that I've never used): Solr's FieldQParserPlugin will send the entire string for a field to the appropriate analyzer; Chris Hostetter explains here:
> 
> <http://www.lucidimagination.com/search/document/ea7b0b27b1b17b1c/re_replacing_fast_functionality_atsesam_no_shinglefilter_exactmatching>
>  
> PositionFilter was created to make ShingleFilter work better with query parsing, by making the positions of the generated shingles all be the same, which triggers "synonym" handling - any one of the generated shingles will cause a hit if present in a document: 
> 
> <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-7563e3bc4d5f7874c4c0ff824671e9ca62f40524>
> 
> Steve
> 


Re: Query multiple fields using configured analyzer stack

Posted by Walter Underwood <wu...@netflix.com>.
Wow, I was thinking of writing this. I made some comments on a full-text
query parser last week, I think.

Is there an easy way to generate the query tree and then have it rewritten
with the DisMax field mapping?
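
This is not the DisMax rewriting asked about above, but a rough sketch of how the per-field, whole-string analysis discussed in this thread might be spread across several fields by hand. It assumes a Solr version whose standard query parser supports nested queries through the `_query_` pseudo-field and parameter dereferencing in local params (`v=$qq`); the helper name `multi_field_params`, the parameter name `qq`, and the field `description` are hypothetical.

```python
from urllib.parse import urlencode

def multi_field_params(user_query, fields):
    # One nested {!field} clause per field; each clause reads the raw user
    # input from the shared request parameter "qq" (an assumed name).
    clauses = ['_query_:"{!field f=%s v=$qq}"' % f for f in fields]
    return {"q": " OR ".join(clauses), "qq": user_query}

# "name" comes from the thread; "description" is a made-up second field.
params = multi_field_params("blue tooth", ["name", "description"])
print(params["q"])
print(urlencode(params))  # query string to append to a select URL
```

Each clause hands the unsplit input to that field's own query analyzer, so the result still depends on what each analyzer chain does with the full string.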

wunder

On 3/16/09 8:22 AM, "Steven A Rowe" <sa...@syr.edu> wrote:

> Another way (that I've never used): Solr's FieldQParserPlugin will send the
> entire string for a field to the appropriate analyzer; Chris Hostetter
> explains here:
> 
> <http://www.lucidimagination.com/search/document/ea7b0b27b1b17b1c/re_replacing_fast_functionality_atsesam_no_shinglefilter_exactmatching>
>  


RE: Query multiple fields using configured analyzer stack

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Tobi,

On 3/16/2009 at 9:14 AM, Tobias Dittrich wrote:
> how can I query multiple fields in such way that for each of
> the fields the configured analyzer stack with Tokenizer is
> used for the whole query string?

Lucene's QueryParser (and AFAIK, Solr's QPs too) first breaks queries on whitespace (except within quoted strings), and then sends the individual words to be analyzed by the appropriate analyzer.
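
The difference between the two orders of operation can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Lucene code: `toy_analyzer` is a hypothetical stand-in for a field's analyzer (lowercasing plus two-word shingles), and quoted strings are not handled.

```python
def toy_analyzer(text):
    # Hypothetical analyzer: whitespace tokens plus two-word shingles.
    words = text.lower().split()
    shingles = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + shingles

def parse_per_word(query):
    # Split on whitespace first, then analyze each chunk in isolation,
    # so the analyzer never sees two words together.
    tokens = []
    for chunk in query.split():
        tokens.extend(toy_analyzer(chunk))
    return tokens

def parse_whole_string(query):
    # Hand the entire query string to the analyzer in one go.
    return toy_analyzer(query)

print(parse_per_word("blue tooth"))      # ['blue', 'tooth']
print(parse_whole_string("blue tooth"))  # ['blue', 'tooth', 'blue tooth']
```

Only the second variant gives a multi-word filter such as a shingle filter anything to combine.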

One way to ensure that an analyzer sees the whole string at once is to enclose the query in quotation marks.  This is not ideal.  Another way (that I've never used): Solr's FieldQParserPlugin will send the entire string for a field to the appropriate analyzer; Chris Hostetter explains here:

<http://www.lucidimagination.com/search/document/ea7b0b27b1b17b1c/re_replacing_fast_functionality_atsesam_no_shinglefilter_exactmatching>
 
PositionFilter was created to make ShingleFilter work better with query parsing, by making the positions of the generated shingles all be the same, which triggers "synonym" handling - any one of the generated shingles will cause a hit if present in a document: 

<http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-7563e3bc4d5f7874c4c0ff824671e9ca62f40524>
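
As a rough illustration of the position handling described above, the Python sketch below models tokens as (term, position increment) pairs. The function names and the sample token stream are assumptions made for the example; the behavior modeled is that the first token keeps its increment while every later token gets an increment of 0.

```python
def position_filter(tokens):
    # Keep the first token's increment; force all later increments to 0.
    return [(term, inc if i == 0 else 0)
            for i, (term, inc) in enumerate(tokens)]

def absolute_positions(tokens):
    # Turn increments into absolute positions, starting before position 0.
    pos = -1
    out = []
    for term, inc in tokens:
        pos += inc
        out.append((term, pos))
    return out

# Assumed shingle output for "blue tooth": unigrams advance the position,
# and the two-word shingle shares the position of its first word.
shingles = [("blue", 1), ("blue tooth", 0), ("tooth", 1)]

print(absolute_positions(shingles))
# [('blue', 0), ('blue tooth', 0), ('tooth', 1)]
print(absolute_positions(position_filter(shingles)))
# [('blue', 0), ('blue tooth', 0), ('tooth', 0)]
```

With every term at the same position, a query parser can treat them as alternatives, which is the "synonym" handling mentioned above.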

Steve