Posted to solr-user@lucene.apache.org by Tobias Dittrich <di...@wave-computer.de> on 2009/03/16 14:13:58 UTC
Query multiple fields using configured analyzer stack
Hi,
How can I query multiple fields in such a way that, for each of
the fields, the configured analyzer stack with Tokenizer is
used for the whole query string?
I have several fields in my schema that use ShingleFilter
and/or WordDelimiterFilter, among other filters. But when I
search for, say, "blue tooth" (without the quotes), the
query is parsed to +name:blue +name:tooth, which is not what
I expected. A search for "blue-tooth", on the other hand,
yields the expected query: name:blue-tooth name:blue
name:tooth name:bluetooth. However, that query covers only
one field instead of many.
This is when using the LuceneQParser. Using DisMax gives
almost exactly what I want when searching for "blue-tooth",
but gives even stranger results for "blue tooth".
Is there an existing parser or plugin that can do this? Or
do I maybe just have to rewrite my config a bit? Any
comments are welcome.
Thanks in advance
Tobi
P.S.: I asked something similar in an earlier post but in
the meantime I spent a lot of time thinking about what my
actual problem is and came up with a different view of things...
Re: Query multiple fields using configured analyzer stack
Posted by Tobias Dittrich <di...@wave-computer.de>.
Hi Steve,
thanks for your quick response. Quoting the string really is
not a good idea in this case. And it does not do what I need
anyway, since the query is converted into a PhraseQuery and
treated differently.
But thanks for pointing me to the FieldQParserPlugin. However,
I cannot seem to get it to work properly. I registered it as
a plugin in my solrconfig.xml like this:
<queryParser name="field"
class="org.apache.solr.search.FieldQParserPlugin"/>
But when I send a query I get the following results (solrj
debug output):
rawquerystring -> {!field f=name}blue tooth
querystring -> {!field f=name}blue tooth
parsedquery -> name:blue name:tooth
parsedquery_toString -> name:blue name:tooth
But I'd expect something like name:(blue tooth) name:blue
name:tooth name:bluetooth
Here is what my schema.xml looks like for name:
<!-- normal german text -->
<fieldType name="text_de" class="solr.TextField"
           positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    [..]
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" generateWordParts="1"
            generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="1" preserveOriginal="1" />
    <filter class="solr.PositionFilterFactory" />
    <filter class="solr.StopFilterFactory"
            words="stopwords_de.txt" ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory"
            language="German" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>
<field name="name" type="text_de" indexed="true"
       stored="true" />
Is there anything else I have to configure?
Thanks
Tobi
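One plausible reading of the debug output above: WhitespaceTokenizer has already split "blue tooth" into two tokens before WordDelimiterFilter runs, and WordDelimiterFilter only splits and catenates within a single token. The Python sketch below imitates that behaviour. It is a toy model, not Solr code: the function names are hypothetical, the splitting rule handles ASCII only, and the stemming, stopword and lowercase steps are left out.

```python
import re


def whitespace_tokenize(text):
    """Stand-in for WhitespaceTokenizer: split on runs of whitespace."""
    return text.split()


def word_delimiter(token):
    """Crude imitation of WordDelimiterFilter with preserveOriginal and
    catenateWords enabled. It only ever sees ONE token at a time."""
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    out = [token]                    # preserveOriginal
    if len(parts) > 1:
        out.extend(parts)            # generateWordParts
        out.append("".join(parts))   # catenateWords
    return out


def analyze(text):
    """Tokenize first, then run the word-delimiter step on each token."""
    terms = []
    for token in whitespace_tokenize(text):
        for term in word_delimiter(token):
            if term not in terms:    # stands in for RemoveDuplicatesTokenFilter
                terms.append(term)
    return terms


hyphenated = analyze("blue-tooth")
spaced = analyze("blue tooth")
print(hyphenated)  # ['blue-tooth', 'blue', 'tooth', 'bluetooth']
print(spaced)      # ['blue', 'tooth']
```

If this reading is right, sending the whole string to the analyzer is not enough on its own. Something in the chain would also have to join adjacent tokens for "blue tooth" to yield a combined term.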
Steven A Rowe wrote:
> Hi Tobi,
>
> On 3/16/2009 at 9:14 AM, Tobias Dittrich wrote:
>> how can I query multiple fields in such way that for each of
>> the fields the configured analyzer stack with Tokenizer is
>> used for the whole query string?
>
> Lucene's QueryParser (and AFAIK, Solr's QPs too) first breaks queries on whitespace (except quoted strings), and then sends individual words to be analyzed by the appropriate analyzer.
>
> One way to ensure that an analyzer sees the whole string at once is to enclose the query in quotation marks. This is not ideal. Another way (that I've never used): Solr's FieldQParserPlugin will send the entire string for a field to the appropriate analyzer; Chris Hostetter explains here:
>
> <http://www.lucidimagination.com/search/document/ea7b0b27b1b17b1c/re_replacing_fast_functionality_atsesam_no_shinglefilter_exactmatching>
>
> PositionFilter was created to make ShingleFilter work better with query parsing, by making the positions of the generated shingles all be the same, which triggers "synonym" handling - any one of the generated shingles will cause a hit if present in a document:
>
> <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-7563e3bc4d5f7874c4c0ff824671e9ca62f40524>
>
> Steve
>
Re: Query multiple fields using configured analyzer stack
Posted by Walter Underwood <wu...@netflix.com>.
Wow, I was thinking of writing this. I made some comments on a full-text
query parser last week, I think.
Is there an easy way to generate the query tree and then have it rewritten
with the DisMax field mapping?
wunder
On 3/16/09 8:22 AM, "Steven A Rowe" <sa...@syr.edu> wrote:
> Another way (that I've never used): Solr's FieldQParserPlugin will send the
> entire string for a field to the appropriate analyzer; Chris Hostetter
> explains here:
>
> <http://www.lucidimagination.com/search/document/ea7b0b27b1b17b1c/re_replacing_fast_functionality_atsesam_no_shinglefilter_exactmatching>
>
RE: Query multiple fields using configured analyzer stack
Posted by Steven A Rowe <sa...@syr.edu>.
Hi Tobi,
On 3/16/2009 at 9:14 AM, Tobias Dittrich wrote:
> how can I query multiple fields in such way that for each of
> the fields the configured analyzer stack with Tokenizer is
> used for the whole query string?
Lucene's QueryParser (and AFAIK, Solr's QPs too) first breaks queries on whitespace (except within quoted strings), and then sends the individual words to be analyzed by the appropriate analyzer.
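A rough illustration of that point, as a toy model in Python rather than Lucene's actual code. The analyze function is a hypothetical stand-in for a field's analyzer that also emits a crude two-word shingle. When the query is split on whitespace first, the analyzer only ever sees one word per call, so it cannot produce the combined term.

```python
def analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, split on whitespace,
    then add the concatenation of each adjacent pair (a crude shingle)."""
    words = text.lower().split()
    shingles = [a + b for a, b in zip(words, words[1:])]
    return words + shingles


def parse_like_query_parser(query):
    """Split on whitespace first, then analyze each chunk separately."""
    terms = []
    for chunk in query.split():
        terms.extend(analyze(chunk))
    return terms


def parse_whole_string(query):
    """Hand the entire query string to the analyzer in one call."""
    return analyze(query)


per_word = parse_like_query_parser("blue tooth")
whole = parse_whole_string("blue tooth")
print(per_word)  # ['blue', 'tooth']
print(whole)     # ['blue', 'tooth', 'bluetooth']
```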
One way to ensure that an analyzer sees the whole string at once is to enclose the query in quotation marks. This is not ideal. Another way (that I've never used): Solr's FieldQParserPlugin will send the entire string for a field to the appropriate analyzer; Chris Hostetter explains here:
<http://www.lucidimagination.com/search/document/ea7b0b27b1b17b1c/re_replacing_fast_functionality_atsesam_no_shinglefilter_exactmatching>
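To apply this to several fields at once, one option might be to combine one {!field} clause per field. The Python sketch below only builds the query string. It assumes a Solr version whose standard query parser accepts nested queries through the _query_ pseudo-field, which has not been checked against the version used in this thread. The helper names are hypothetical, and "description" is a made-up second field; only "name" appears in the thread.

```python
from urllib.parse import urlencode


def field_clause(field, text):
    """One nested clause that hands the whole text to the field's query analyzer."""
    # Escape backslashes and double quotes, because the clause sits inside "...".
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '_query_:"{!field f=%s}%s"' % (field, escaped)


def multi_field_query(fields, text):
    """OR together one {!field} clause per field."""
    return " OR ".join(field_clause(f, text) for f in fields)


# "description" is a hypothetical field name used only for illustration.
q = multi_field_query(["name", "description"], "blue tooth")
print(q)
# URL-encoded request parameters; debugQuery shows the parsed query in the response.
print(urlencode({"q": q, "debugQuery": "true"}))
```

Whether each clause ends up as a phrase or as a set of alternatives still depends on the positions the field's analyzer assigns to its tokens.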
PositionFilter was created to make ShingleFilter work better with query parsing, by making the positions of the generated shingles all be the same, which triggers "synonym" handling - any one of the generated shingles will cause a hit if present in a document:
<http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-7563e3bc4d5f7874c4c0ff824671e9ca62f40524>
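The effect of putting all shingles at the same position can be sketched in Python. This is a simplified model, not Lucene's code: tokens are (term, position increment) pairs, the function names are hypothetical, and the shingle increments in the example are assumed values for a two-word input.

```python
def flatten_positions(tokens):
    """PositionFilter-like step: the first token keeps its position
    increment, every later token gets increment 0."""
    return [(term, inc if i == 0 else 0) for i, (term, inc) in enumerate(tokens)]


def group_by_position(tokens):
    """Group (term, position_increment) pairs into lists of terms per position."""
    groups = []
    for term, inc in tokens:
        if inc > 0 or not groups:
            groups.append([term])    # token starts a new position
        else:
            groups[-1].append(term)  # token stacks on the current position
    return groups


def query_shape(tokens):
    """One position: any term may match. Several positions: phrase-like query."""
    groups = group_by_position(tokens)
    if len(groups) == 1:
        return ("any-of", groups[0])
    return ("phrase-like", groups)


# Assumed shingle output for "blue tooth": unigrams advance, the shingle stacks.
shingled = [("blue", 1), ("blue tooth", 0), ("tooth", 1)]
before = query_shape(shingled)
after = query_shape(flatten_positions(shingled))
print(before)  # ('phrase-like', [['blue', 'blue tooth'], ['tooth']])
print(after)   # ('any-of', ['blue', 'blue tooth', 'tooth'])
```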
Steve