Posted to solr-user@lucene.apache.org by Mark Ferguson <ma...@gmail.com> on 2009/01/01 01:29:09 UTC

Dismax query parser with different field classes

Hello,

I have a set of documents in which I have different classes of fields that I
would like to search separately. For example, I would like to search the
HTML body and title of a webpage using one set of keywords, and the page
author using another set. I cannot use the dismax parser for this problem
because all keywords will search across all the query fields. However, I
like the dismax query parser because it handles matching and scoring very
nicely.

I read one suggestion on this group which was to make one of the queries a
filter query. So for example, I may use the dismax query parser to search
the body and title of a webpage, then use a filter query for the author.
There are two problems with this approach with regard to what I need:
  1. The filter query does not affect scoring, but I need the scoring to be
influenced by the results of all fields being searched.
  2. A filter query will do a simple AND or OR filter, while I would need
the search to be an OR search with higher scoring for multiple matches
(related to the first problem).
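
For reference, the filter-query approach described above corresponds roughly
to a request like the one built below. This is only a sketch: the field names
(page_title, body_text, author) and the keyword values are made up for
illustration, while defType, q, qf and fq are standard Solr request parameters.

```python
from urllib.parse import urlencode

def build_filter_query_request(keywords, author):
    """Build a query string: dismax scores the keywords, fq restricts by author."""
    params = {
        "defType": "dismax",               # use the dismax query parser for q
        "q": keywords,                     # keywords searched across the qf fields
        "qf": "page_title^2.0 body_text",  # hypothetical fields and boosts
        "fq": f"author:{author}",          # filter only, no effect on the score
    }
    return urlencode(params)

query_string = build_filter_query_request("solr maps", "ferguson")
print(query_string)
```

Because fq only restricts the result set, an author match contributes nothing
to the score, which is the first problem listed above.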

I think what I need is a dismax parser in which the parsed query does not
apply all keywords to all fields, but instead lets you specify
which fields correspond to which sets of keywords. Has anything like this
been tackled before? If not, can someone help point me in the right
direction for how I would build this myself? Thanks very much for your time.

Regards,

Mark Ferguson

Re: Dismax query parser with different field classes

Posted by Chris Hostetter <ho...@fucit.org>.
: I have a small problem with using a boost query, which is that I would like
: documents found in the boost query to be returned even if the main query
: does not include those results. So what I am effectively looking for is an
: OR between the dismax query and the boost query, rather than a required main
: query or'd with the boost query. Does anything currently exist which can
: facilitate this?

I don't think so.

I think you would need to write a custom version of QueryComponent to do 
this.

A lot of cool magic (that is as yet still largely undocumented) got added 
when the QParser and QParserPlugin APIs were added for doing local params 
and even variable substitution using other request params -- but that 
still all happens at a string level.  I can't think of any way to say that 
you want multiple queries generated by different QParsers to be combined in 
a particular way.
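
As a rough illustration of the string-level substitution mentioned above, the
local params syntax lets a query string name its parser and dereference
another request parameter. The sketch below only builds such a request; the
parameter name userq and the field names are hypothetical, and whether a given
Solr version supports this syntax should be checked against its documentation.

```python
from urllib.parse import urlencode

def build_local_params_request(keywords, fields):
    """Build request params whose q uses local params to pick the dismax parser."""
    # {!dismax ...} selects the parser; v=$userq pulls the query text from
    # the separate request parameter named userq (a hypothetical name).
    q = "{!dismax qf='%s' v=$userq}" % " ".join(fields)
    return {"q": q, "userq": keywords}

params = build_local_params_request("map", ["page_title", "body_text"])
print(params["q"])
print(urlencode(params))
```

Note that this only assembles a string for a single parser; it does not by
itself combine queries produced by different parsers.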


-Hoss


Re: Dismax query parser with different field classes

Posted by Mark Ferguson <ma...@gmail.com>.
Hi again,

I have a small problem with using a boost query, which is that I would like
documents found in the boost query to be returned even if the main query
does not include those results. So what I am effectively looking for is an
OR between the dismax query and the boost query, rather than a required main
query or'd with the boost query. Does anything currently exist which can
facilitate this?

For example, currently my parsed query looks something like this, where the
domain, page_title and body_text fields are part of the dismax query, and
user_id is part of the boost query:

+DisjunctionMaxQuery((domain:maps^4.0 page_title:map^2.0 |
body_text:map)~0.01) user_id:12^5.0

Whereas I would like it to look like this:

+(DisjunctionMaxQuery((domain:maps^4.0 page_title:map^2.0 |
body_text:map)~0.01) user_id:12^5.0)

This is because I want all documents with user_id:12 to be returned, even if
there are no keyword matches. I also want all documents with keyword matches
to be returned, even when the user_id doesn't match, so I can't just switch
the query and the boost query.
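
The difference between the two query shapes can be sketched with a toy model.
This is not how Lucene computes scores (it ignores normalization and
coordination factors); the document ids and score values are invented purely
to show which documents each shape returns.

```python
# Toy model: each clause maps matching document ids to a score contribution.
dismax_hits = {"doc1": 1.2, "doc2": 0.7}  # documents matching the keywords
boost_hits = {"doc2": 5.0, "doc3": 5.0}   # documents matching user_id:12

def required_main_with_boost(main, boost):
    """+main boost: only documents matching main are returned; boost adds score."""
    return {doc: score + boost.get(doc, 0.0) for doc, score in main.items()}

def either_clause(main, boost):
    """+(main boost): documents matching either clause are returned; scores add."""
    docs = set(main) | set(boost)
    return {doc: main.get(doc, 0.0) + boost.get(doc, 0.0) for doc in docs}

current = required_main_with_boost(dismax_hits, boost_hits)
desired = either_clause(dismax_hits, boost_hits)
print(sorted(current))  # doc3 is missing: it has no keyword match
print(sorted(desired))  # doc3 is included through the user_id clause alone
```

In both shapes a document matching both clauses scores highest; only the
second shape also returns documents that match user_id:12 alone.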

Any ideas? Thanks for your time.

Mark


On Fri, Jan 2, 2009 at 2:33 PM, Mark Ferguson <ma...@gmail.com> wrote:

> Hello,
>
> It looks like a boost query will accomplish what I am looking for quite
> nicely.
>
> Mark

Re: Dismax query parser with different field classes

Posted by Mark Ferguson <ma...@gmail.com>.
Hello,

It looks like a boost query will accomplish what I am looking for quite
nicely.

Mark

