Posted to solr-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2006/09/21 06:02:38 UTC

Standard vs. DisMaxQueryHandler

Hi,

Is the main difference between the StandardQueryHandler and DisMaxQueryHandler the supported query syntax (and different query parser used in each of them), and the fact that the latter creates DisjunctionMaxQueries, while the former just creates vanilla BooleanQueries?  Are there any other differences?

Thanks,
Otis

Re: Standard vs. DisMaxQueryHandler

Posted by "David Smiley @MITRE.org" <DS...@mitre.org>.

I'm looking at the latest source and I see that the only way I can use
Lucene's DisjunctionMaxQuery is to use the limited query syntax.  These two
are entangled together in SolrPluginUtils.DisjunctionMaxQueryParser.  Am I
looking at the wrong place?  I'd love to be proven wrong.

We want our users to use boolean queries ("and", "or" syntax) and prefix
queries.  We don't want to constrain them to the less capable syntax
hard-coded into DisjunctionMaxQueryParser.  We want terms that aren't bound
to a field to not simply default to one field, but to be applied against
several of them with different boosting.  The standard query handler doesn't
really do that.
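
To make the "several fields with different boosting" idea concrete, here is a minimal Python sketch (not Solr code) that parses a dismax-style field list such as "title^2.0 body" into field/boost pairs.  The function name parse_qf and the field names are made up for illustration; the field^boost notation is the one used by the dismax "qf" parameter mentioned later in this thread.

```python
def parse_qf(qf):
    """Parse a whitespace-separated 'field^boost' list.

    A field without an explicit boost gets 1.0. This is an illustrative
    helper, not part of Solr.
    """
    boosts = {}
    for item in qf.split():
        field, _, boost = item.partition("^")
        boosts[field] = float(boost) if boost else 1.0
    return boosts


fields = parse_qf("title^2.0 description body^0.5")
print(fields)
```

An unfielded term would then be searched in every key of that mapping, each with its own boost, instead of in a single default field.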


ryantxu wrote:
> 
> In 1.3 (trunk, dev build), the query parsing has been extracted into a
> component.  It should be easy to replace just the query parsing component
> and keep the rest of the chain the same.
> 
> I'm not quite following why it is a problem to have two URLs for
> dismax vs. standard queries.  Dismax expects unescaped user queries
> while standard requires valid Lucene query syntax.  Is your problem
> that you want an HTML form to hit a single URL for either case?
> 
> ryan

-- 
View this message in context: http://www.nabble.com/Standard-vs.-DisMaxQueryHandler-tp6421205p16945818.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Standard vs. DisMaxQueryHandler

Posted by Ryan McKinley <ry...@gmail.com>.

In 1.3 (trunk, dev build), the query parsing has been extracted into a
component.  It should be easy to replace just the query parsing component
and keep the rest of the chain the same.

I'm not quite following why it is a problem to have two URLs for
dismax vs. standard queries.  Dismax expects unescaped user queries
while standard requires valid Lucene query syntax.  Is your problem
that you want an HTML form to hit a single URL for either case?

ryan



On Apr 26, 2008, at 1:33 AM, David Smiley @MITRE.org wrote:
>
> I am frustrated that I have to pick between the two because I want both.  The
> way I look at it, there should be a more configurable query handler which
> allows me to use dismax if I want to, and pick a parser for the user's query
> (like the flexible one used by the standard query handler, or the more
> restrictive one found in DisMax Q.H. today).  At the moment, I'm faced with
> telling a user of my search service (another developer of a corporate app
> using my corporate search service) that he has to compose a dis-max manually
> (i.e. use the standard query handler to get the job done) simply because he
> wants to do a prefix query (which isn't supported by DisMax Q.H.).  This is
> for an auto-complete type thing, by the way.  You might argue it's not hard
> -- that's true though it is annoying.  But the bigger issue is that I can't
> encapsulate these internal details into my search service -- where they
> belong IMO.
>
> ~ David Smiley

Re: Standard vs. DisMaxQueryHandler

Posted by "David Smiley @MITRE.org" <DS...@mitre.org>.

I'm already aware of defType but that doesn't really change things.

Either I use defType of DISMAX to get DisjunctionMaxQuery but then I can't
use prefix queries and more complicated boolean queries, OR I use the
standard defType which doesn't use DisjunctionMaxQuery.

I need this feature ASAP so tonight I plan on enhancing the standard handler
to support disjunctionmax if the "qf" boost list is specified.  I plan on
post-processing the parsed query to rewrite it so that references to a field
(a bogus value of some sort) get replaced with a sub-DisjunctionMaxQuery. 
OR, I could take the approach that SolrPluginUtils.DisjunctionMaxQueryParser
takes which is to write it correctly the first time around via the template
design pattern.  I like the former approach because it opens the door to
alternative query parsing mechanisms rather than entangling these separate
concerns together (which is perhaps why this is a problem to this day).

At least that's my plan, having not gotten started.
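
The post-processing approach described above can be sketched on a toy query tree.  This is a Python illustration under stated assumptions, not the Lucene API: the Term, Bool and DisMax classes, the "_dismax_" placeholder field name and the expand function are all hypothetical stand-ins for whatever the real parsed query objects would be.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Term:
    field: str
    text: str          # may be a prefix such as "sol*"
    boost: float = 1.0


@dataclass
class Bool:
    op: str            # "AND" or "OR"
    clauses: List["Node"]


@dataclass
class DisMax:
    clauses: List["Node"]


Node = Union[Term, Bool, DisMax]

PLACEHOLDER = "_dismax_"  # hypothetical marker field name


def expand(node: Node, qf: Dict[str, float]) -> Node:
    """Replace each term on the placeholder field with a DisMax over qf."""
    if isinstance(node, Term):
        if node.field != PLACEHOLDER:
            return node  # explicitly fielded terms are left alone
        return DisMax([Term(f, node.text, b) for f, b in qf.items()])
    if isinstance(node, Bool):
        return Bool(node.op, [expand(c, qf) for c in node.clauses])
    return node


parsed = Bool("AND", [Term(PLACEHOLDER, "sol*"), Term("type", "book")])
print(expand(parsed, {"title": 2.0, "body": 1.0}))
```

Because the rewrite runs after parsing, it does not care which parser produced the tree, which is the separation of concerns argued for above.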

~ David Smiley

hossman wrote:
> 
> 
> : I am frustrated that I have to pick between the two because I want both.  The
> : way I look at it, there should be a more configurable query handler which
> : allows me to use dismax if I want to, and pick a parser for the user's query
> : (like the flexible one used by the standard query handler, or the more
> 
> The message you replied to is 19 months old ... there have been a lot of 
> improvements in this regard.  
> 
> On the trunk today, DisMax is a (mostly empty) subclass of Standard, and 
> the parser you want is selected by the "defType" (default type) 
> parameter ... unfortunately a lot of this isn't very well documented for 
> users yet (but it's not in a release and the kinks are still being 
> worked out, so that's somewhat expected).
> 
> 
> -Hoss
> 
> 
> 

Re: Standard vs. DisMaxQueryHandler

Posted by "David Smiley @MITRE.org" <DS...@mitre.org>.

Now you've got it Hossman.

I don't plan on mucking with the parser syntax.  I look at this feature as a
smarter default field.  Instead of it being one field, it is an array of
them constructed via Dismax with various boosts.

~ David


hossman wrote:
> 
> 
> : Either I use defType of DISMAX to get DisjunctionMaxQuery but then I can't
> : use prefix queries and more complicated boolean queries, OR I use the
> : standard defType which doesn't use DisjunctionMaxQuery.
> 
> I think I misunderstood your complaint ... it sounds like you don't care 
> about (or want) the features of the dismax parser (where the whole point 
> is to provide a limited, user-friendly syntax that's hard to break) ... you 
> want clients to be able to send you query strings containing a rich syntax 
> supporting complex boolean expressions, and prefix queries, AND 
> DisjunctionMaxQueries.
> 
> The only reason the "StandardRequestHandler" doesn't support that is 
> because the underlying Lucene QueryParser doesn't have any support for 
> DisjunctionMaxQueries ... adding it in would require picking a syntax and 
> making some changes to the grammar -- but it could be done.
> 
> : I need this feature ASAP so tonight I plan on enhancing the standard handler
> : to support disjunctionmax if the "qf" boost list is specified.  I plan on
> : post-processing the parsed query to rewrite it so that references to a field
> : (a bogus value of some sort) get replaced with a sub-DisjunctionMaxQuery. 
> 
> that would also work, but if your "q" param is in the standard syntax 
> (with a special marker term that you look for in post processing) you 
> still need some other param to know what to build a dismax query out of 
> with those qf fields.
> 
> 
> -Hoss
> 
> 
> 

Re: Standard vs. DisMaxQueryHandler

Posted by Chris Hostetter <ho...@fucit.org>.

: Either I use defType of DISMAX to get DisjunctionMaxQuery but then I can't
: use prefix queries and more complicated boolean queries, OR I use the
: standard defType which doesn't use DisjunctionMaxQuery.

I think I misunderstood your complaint ... it sounds like you don't care 
about (or want) the features of the dismax parser (where the whole point 
is to provide a limited, user-friendly syntax that's hard to break) ... you 
want clients to be able to send you query strings containing a rich syntax 
supporting complex boolean expressions, and prefix queries, AND 
DisjunctionMaxQueries.

The only reason the "StandardRequestHandler" doesn't support that is 
because the underlying Lucene QueryParser doesn't have any support for 
DisjunctionMaxQueries ... adding it in would require picking a syntax and 
making some changes to the grammar -- but it could be done.
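
For readers wondering why a DisjunctionMaxQuery is wanted rather than a plain BooleanQuery over the same fields, the difference is in how per-field scores are combined.  The Python sketch below is a simplified model, assuming that a boolean OR adds the sub-scores while a disjunction-max takes the best one plus an optional tie-breaker share of the rest; it ignores Lucene's coordination factor and normalization, and the function names are made up.

```python
def boolean_score(sub_scores):
    """Simplified boolean OR: matching clauses add up."""
    return sum(sub_scores)


def dismax_score(sub_scores, tie=0.0):
    """Simplified disjunction-max: best clause wins; others count by `tie`."""
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    return best + tie * (sum(sub_scores) - best)


# Hypothetical scores of one word matched in title, body and keywords.
per_field = [0.9, 0.3, 0.2]
print(boolean_score(per_field))
print(dismax_score(per_field))
print(dismax_score(per_field, tie=0.1))
```

With a tie-breaker of zero, a document matching the word in three fields scores no higher than one matching it equally well in only its best field, which is usually the point of searching one word across several fields.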

: I need this feature ASAP so tonight I plan on enhancing the standard handler
: to support disjunctionmax if the "qf" boost list is specified.  I plan on
: post-processing the parsed query to rewrite it so that references to a field
: (a bogus value of some sort) get replaced with a sub-DisjunctionMaxQuery. 

that would also work, but if your "q" param is in the standard syntax 
(with a special marker term that you look for in post processing) you 
still need some other param to know what to build a dismax query out of 
with those qf fields.


-Hoss

Re: Standard vs. DisMaxQueryHandler

Posted by Chris Hostetter <ho...@fucit.org>.

: I am frustrated that I have to pick between the two because I want both.  The
: way I look at it, there should be a more configurable query handler which
: allows me to use dismax if I want to, and pick a parser for the user's query
: (like the flexible one used by the standard query handler, or the more

The message you replied to is 19 months old ... there have been a lot of 
improvements in this regard.  

On the trunk today, DisMax is a (mostly empty) subclass of Standard, and 
the parser you want is selected by the "defType" (default type) 
parameter ... unfortunately a lot of this isn't very well documented for 
users yet (but it's not in a release and the kinks are still being 
worked out, so that's somewhat expected).


-Hoss

Re: Standard vs. DisMaxQueryHandler

Posted by "David Smiley @MITRE.org" <DS...@mitre.org>.

I am frustrated that I have to pick between the two because I want both.  The
way I look at it, there should be a more configurable query handler which
allows me to use dismax if I want to, and pick a parser for the user's query
(like the flexible one used by the standard query handler, or the more
restrictive one found in DisMax Q.H. today).  At the moment, I'm faced with
telling a user of my search service (another developer of a corporate app
using my corporate search service) that he has to compose a dis-max manually
(i.e. use the standard query handler to get the job done) simply because he
wants to do a prefix query (which isn't supported by DisMax Q.H.).  This is
for an auto-complete type thing, by the way.  You might argue it's not hard
-- that's true though it is annoying.  But the bigger issue is that I can't
encapsulate these internal details into my search service -- where they
belong IMO.

~ David Smiley


hossman_lucene wrote:
> 
> 
> : Is the main difference between the StandardQueryHandler and
> : DisMaxQueryHandler the supported query syntax (and different query
> : parser used in each of them), and the fact that the latter creates
> : DisjunctionMaxQueries, while the former just creates vanilla
> : BooleanQueries?  Are there any other differences?
> 
> the main difference is the query string, yes: Standard expects to get
> "lucene QueryParser" formatted queries, while DisMax expects to get raw
> user input strings ... Standard builds queries (whether they be prefix or
> boolean or wildcard) using the QueryParser as is, while DisMax does a
> "cross product" of the user input across many different fields and builds
> up a very specific query structure -- which can then be augmented with
> additional query clauses like the "bq" boost query and the "bf" boost
> function.
> 
> there's no reason the StandardRequestHandler can't construct DisMaxQueries
> (once QueryParser has some syntax for them) and DisMaxRequestHandler does
> (at the outermost level) generate a BooleanQuery (with a custom
> "minShouldMatch" value set on it) but the main difference is really the
> use case: if you want the client to specify the exact query structure that
> they want, use StandardRequestHandler.  if you want the client to just
> propagate the raw search string typed by the user, without any structure
> or escaping, and get the nice complex DisMax style query across the
> configured fields, the DisMax handler was written to fill that niche.
> 
> (load up the example configs, and take a look at the query toString from
> this url to see what i mean about the complex structure...)
> 
> http://localhost:8983/solr/select/?qt=dismax&q=how+now+brown+cow&debugQuery=1
> 
> 
> 
> 
> -Hoss
> 
> 
> 

Re: Standard vs. DisMaxQueryHandler

Posted by Chris Hostetter <ho...@fucit.org>.

: Is the main difference between the StandardQueryHandler and
: DisMaxQueryHandler the supported query syntax (and different query
: parser used in each of them), and the fact that the latter creates
: DisjunctionMaxQueries, while the former just creates vanilla
: BooleanQueries?  Are there any other differences?

the main difference is the query string, yes: Standard expects to get
"lucene QueryParser" formatted queries, while DisMax expects to get raw
user input strings ... Standard builds queries (whether they be prefix or
boolean or wildcard) using the QueryParser as is, while DisMax does a
"cross product" of the user input across many different fields and builds
up a very specific query structure -- which can then be augmented with
additional query clauses like the "bq" boost query and the "bf" boost
function.
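
As a rough picture of that "cross product", the Python sketch below renders the shape of such a query as a string: one "max of" group per word, each trying the word against every configured field.  It is only an imitation for illustration; the field names, boosts and the function name are assumptions, and real dismax output also carries phrase, "bq" and "bf" clauses that are left out here.

```python
def dismax_structure(user_input, qf):
    """Render a dismax-like query shape as a string (illustration only)."""
    per_word = []
    for word in user_input.split():
        # One group per word: the same word tried against every field.
        alternatives = []
        for field, boost in qf.items():
            clause = f"{field}:{word}"
            if boost != 1.0:
                clause += f"^{boost}"
            alternatives.append(clause)
        per_word.append("(" + " | ".join(alternatives) + ")")
    # The groups are then combined in an outer boolean query.
    return " ".join(per_word)


print(dismax_structure("how now", {"title": 2.0, "body": 1.0}))
# (title:how^2.0 | body:how) (title:now^2.0 | body:now)
```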

there's no reason the StandardRequestHandler can't construct DisMaxQueries
(once QueryParser has some syntax for them) and DisMaxRequestHandler does
(at the outermost level) generate a BooleanQuery (with a custom
"minShouldMatch" value set on it) but the main difference is really the
use case: if you want the client to specify the exact query structure that
they want, use StandardRequestHandler.  if you want the client to just
propagate the raw search string typed by the user, without any structure
or escaping, and get the nice complex DisMax style query across the
configured fields, the DisMax handler was written to fill that niche.
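
To illustrate the escaping burden the standard handler puts on the client, here is a small Python sketch that backslash-escapes characters with special meaning in Lucene query syntax before raw user text is sent as a standard query.  The character set below is an assumption based on the commonly documented list and may differ between Lucene versions; it also does not handle the && and || operators or the words AND, OR and NOT.

```python
# Assumed set of single characters that are special in Lucene query syntax.
SPECIAL_CHARS = set('\\+-!(){}[]^"~*?:')


def escape_for_standard(raw):
    """Backslash-escape special characters so the text is treated literally."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in raw)


print(escape_for_standard("C++ (intro): part 1?"))
print(escape_for_standard("how now brown cow"))
```

With the DisMax handler the client would skip this step and pass the user's text through as typed.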

(load up the example configs, and take a look at the query toString from
this url to see what i mean about the complex structure...)

http://localhost:8983/solr/select/?qt=dismax&q=how+now+brown+cow&debugQuery=1
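
A client would normally build that URL from the raw user input rather than by hand.  A minimal Python sketch, assuming the same local example server and the same three parameters as the URL above (the function name is made up):

```python
from urllib.parse import urlencode


def dismax_debug_url(user_input, base="http://localhost:8983/solr/select/"):
    """Build a dismax request URL with debug output enabled."""
    params = {"qt": "dismax", "q": user_input, "debugQuery": "1"}
    # urlencode takes care of spaces and other characters in the raw input.
    return base + "?" + urlencode(params)


print(dismax_debug_url("how now brown cow"))
# http://localhost:8983/solr/select/?qt=dismax&q=how+now+brown+cow&debugQuery=1
```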




-Hoss