Posted to solr-user@lucene.apache.org by Mirko <id...@googlemail.com> on 2013/11/21 11:30:59 UTC
Parse eDisMax queries for keywords
Hi,
We would like to implement special handling for queries that contain
certain keywords. Our particular use case:
In the example query "Footitle season 1" we want to detect the keyword
"season", get the number that follows it, and boost (or filter for) documents
that match "1" on the field name="season".
We have two fields in our schema:
<!-- "title" contains titles -->
<field name="title" type="text" indexed="true" stored="true"
multiValued="false"/>
<fieldType name="text" class="solr.TextField" omitNorms="true">
<analyzer >
<charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!-- ... -->
</analyzer>
</fieldType>
<field name="season" type="season_number" indexed="true" stored="false"
multiValued="false"/>
<!-- "season" contains season numbers -->
<fieldType name="season_number" class="solr.TextField" omitNorms="true" >
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern=".*(?:season) *0*([0-9]+).*" replacement="$1"/>
</analyzer>
</fieldType>
Our idea was to use a keyword tokenizer and a regex on the "season" field
to extract the season number from the complete query.
However, we use the ExtendedDisMax query parser in our search handler:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="qf">
title season
</str>
</lst>
</requestHandler>
The problem is that eDisMax tokenizes the query, so that our field
"season" receives the separate tokens ["Footitle", "season", "1"], in no
particular order, instead of the complete query.
How can we pass the complete (untokenized) query to the season field? We
don't understand which tokenizer is used here and why our "season" field
receives tokens instead of the complete query.
Or is there another approach to solve this use case with Solr?
Thanks,
Mirko
Re: Parse eDisMax queries for keywords
Posted by Mirko <id...@googlemail.com>.
Hi Jack,
thanks for your reply. OK, in this case I agree that "enriching" the query
in the application layer is a good idea. We are still a bit puzzled about what
the enriched query should look like. I'll post here once we have found a solution.
If anybody has suggestions, I'd be happy to hear them.
Mirko
2013/11/21 Jack Krupansky <ja...@basetechnology.com>
> The query parser does its own tokenization and parsing before your
> analyzer tokenizer and filters are called, assuring that only one white
> space-delimited token is analyzed at a time.
>
> You're probably best off having an application layer preprocessor for the
> query that "enriches" the query in the manner that you're describing.
>
> Or, simply settle for a "heuristic" approach that may give you 70% of what
> you want using only existing Solr features on the server side.
>
> -- Jack Krupansky
Re: Parse eDisMax queries for keywords
Posted by Jack Krupansky <ja...@basetechnology.com>.
The query parser does its own tokenization and parsing before your analyzer's
tokenizer and filters are called, ensuring that only one
whitespace-delimited token is analyzed at a time.
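To make that concrete, here is a rough simulation in Python rather than Solr itself. It assumes the regex from the posted schema behaves the same way in Python's `re` module as in the PatternReplaceFilter (it uses no Java-specific syntax), and the helper name `season_filter` is made up for illustration.

```python
import re

# Python stand-in for the PatternReplaceFilterFactory pattern from the schema.
# Solr's replacement "$1" corresponds to r"\1" in Python.
SEASON_PATTERN = re.compile(r".*(?:season) *0*([0-9]+).*")


def season_filter(token: str) -> str:
    """Mimic LowerCaseFilter followed by the pattern replacement on one token."""
    return SEASON_PATTERN.sub(r"\1", token.lower())


query = "Footitle season 1"

# What the schema was designed for: the whole query as one keyword token.
whole = season_filter(query)

# What happens in practice: the parser splits on whitespace first,
# so the field's analyzer only ever sees one term at a time.
per_term = [season_filter(term) for term in query.split()]

print(whole)     # 1
print(per_term)  # ['footitle', 'season', '1']
```

Applied to the full string, the pattern collapses the query to the season number. Applied term by term, it never sees "season" and the digit together, so no term is rewritten.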
You're probably best off having an application layer preprocessor for the
query that "enriches" the query in the manner that you're describing.
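One possible shape for such a preprocessor, as a minimal Python sketch and not a tested recipe. The function name `enrich`, the boost value, and the choice to strip the "season N" phrase from the main query are all assumptions made for illustration. It also assumes the "season" field holds the bare number at index time and that "qf" is reduced to "title". The parameters "bq" and "fq" are the standard eDisMax boost-query and filter-query parameters.

```python
import re
from urllib.parse import urlencode

# Hypothetical keyword rule: "season" followed by a number, leading zeros dropped.
SEASON_KEYWORD = re.compile(r"\bseason\s*0*(\d+)\b", re.IGNORECASE)


def enrich(user_query: str, boost: float = 5.0, use_filter: bool = False) -> dict:
    """Turn a raw user query into Solr request parameters.

    If the query contains "season N", the phrase is removed from q and
    expressed as a boost query (bq) or, optionally, a filter query (fq).
    """
    params = {"defType": "edismax", "qf": "title", "q": user_query}
    match = SEASON_KEYWORD.search(user_query)
    if match is None:
        return params  # no keyword found, send the query unchanged

    season = match.group(1)
    # Drop the keyword phrase and collapse the leftover whitespace.
    remaining = " ".join(SEASON_KEYWORD.sub(" ", user_query).split())
    # Assumption: if nothing else is left, fall back to the original query.
    params["q"] = remaining or user_query

    if use_filter:
        params["fq"] = f"season:{season}"
    else:
        params["bq"] = f"season:{season}^{boost}"
    return params


print(enrich("Footitle season 1"))
print(urlencode(enrich("Footitle season 1", use_filter=True)))
```

Boosting keeps documents from other seasons in the result list but ranks them lower. Filtering removes them entirely, which is stricter and only safe if the keyword detection is reliable.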
Or, simply settle for a "heuristic" approach that may give you 70% of what
you want using only existing Solr features on the server side.
-- Jack Krupansky