Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2010/07/30 03:55:25 UTC

[Solr Wiki] Update of "DisMaxQParserPlugin" by HossMan

The "DisMaxQParserPlugin" page has been changed by HossMan.
The comment on this change is: new page based almost verbatim on DisMaxRequestHandler page.
http://wiki.apache.org/solr/DisMaxQParserPlugin

--------------------------------------------------

New page:
The DisMaxQParserPlugin is designed to process simple user entered phrases (without heavy syntax) and search for the individual words across several fields using different weighting (boosts) based on the significance of each field.  Additional options let you influence the score based on rules specific to each use case (independent of user input).  The DisMax document has more background on the conceptual origins and behavior.

'''NOTE:''' Post Solr 1.4 there will be an Extended DisMax parser that will improve punctuation handling and relevancy calculations, among other things.  It may be called '''edismax'''.  See JIRA [[http://issues.apache.org/jira/browse/SOLR-1553|SOLR-1553]].

<<TableOfContents>>

== Overview ==

This query parser supports an extremely simplified subset of the Lucene !QueryParser syntax.  Quotes can be used to group phrases, and +/- can be used to denote mandatory and prohibited clauses ... but all other Lucene query parser special characters are escaped to simplify the user experience.  The handler takes responsibility for building a good query from the user's input using !BooleanQueries containing !DisjunctionMaxQueries across fields and boosts you specify.  It also lets you provide additional boosting queries, boosting functions, and filtering queries to artificially affect the outcome of all searches.  These options can all be specified as default parameters for the handler in your solrconfig.xml or overridden in the Solr query URL.

== Query Syntax ==

This is designed to support raw input strings provided by users with no special escaping.  '+' and '-' characters are treated as "mandatory" and "prohibited" modifiers for the subsequent terms.  Text wrapped in balanced quote characters '"' is treated as a phrase; any query containing an odd number of quote characters is evaluated as if there were no quote characters at all.  Wildcards are not supported.
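The rules above can be illustrated with a small Python sketch.  It is not the parser's actual code: the function name {{{split_user_query}}} and the tuple layout are made up for this example, and only the modifier and quote rules described in this section are modelled.

```python
import re

# Hypothetical helper, not part of Solr: splits raw user input into
# (modifier, text, is_phrase) clauses using only the rules described above.
CLAUSE = re.compile(r'([+-]?)(?:"([^"]*)"|([^\s"]+))')

def split_user_query(q):
    if q.count('"') % 2 == 1:
        # Odd number of quotes: treat the input as if it had no quotes at all.
        q = q.replace('"', '')
    clauses = []
    for match in CLAUSE.finditer(q):
        modifier, phrase, word = match.groups()
        if phrase is not None:
            if phrase.strip():
                clauses.append((modifier, phrase, True))
        else:
            clauses.append((modifier, word, False))
    return clauses

balanced = split_user_query('+ipod -case "belkin cable"')
unbalanced = split_user_query('ipod "belkin cable')
```

Here {{{balanced}}} holds one mandatory word, one prohibited word and one optional phrase, while {{{unbalanced}}} holds three plain optional words because the lone quote is ignored.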

== Query Structure ==

For each "word" in the query string, dismax builds a !DisjunctionMaxQuery object for that word across all of the fields in the `qf` param (with the appropriate boost values and a tiebreaker value set from the `tie` param).  These !DisjunctionMaxQuery objects are then put in a !BooleanQuery with the minNumberShouldMatch option set according to the `mm` param.  If any other params are specified, a larger !BooleanQuery is wrapped around the first !BooleanQuery from the `qf` options, and the other params (`bf`, `bq`, `pf`) are added as optional clauses.  The only complex clause comes from the `pf` param, which is a single !DisjunctionMaxQuery containing the whole query 'phrase' against each of the `pf` fields.
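As a rough aid until a picture exists, the nesting described above can be sketched in Python.  This only renders the shape as a string in an informal notation; the function names are invented for the example, the `mm` value is assumed to be already resolved to a number, and marking the inner !BooleanQuery as required with a leading "+" is an assumption of this sketch.

```python
def dismax_clause(text, fields, tie):
    """One DisjunctionMaxQuery: the same text tried against every field."""
    parts = [f"{field}:{text}^{boost}" for field, boost in fields.items()]
    return "(" + " | ".join(parts) + f")~{tie}"

def sketch_query(words, qf, pf, tie, mm):
    """Informal rendering of the structure; hypothetical, not Solr output."""
    # Inner BooleanQuery: one optional DisjunctionMaxQuery per word,
    # with minNumberShouldMatch taken from the (already resolved) mm value.
    inner = "(" + " ".join(dismax_clause(w, qf, tie) for w in words) + f")~{mm}"
    clauses = ["+" + inner]  # assumption: the user's query is the required part
    if pf:
        # The pf clause: the whole input as one phrase against each pf field.
        phrase = '"' + " ".join(words) + '"'
        clauses.append(dismax_clause(phrase, pf, tie))
    return " ".join(clauses)

structure = sketch_query(
    ["belkin", "ipod"],
    qf={"name": 2.0, "text": 1.0},  # hypothetical field names and boosts
    pf={"name": 5.0},
    tie=0.1,
    mm=2,
)
```

Clauses from `bq` and `bf` would sit next to the `pf` clause as further optional clauses of the outer query.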

/!\ :TODO: /!\ Need more detail on the query structure generated based on input ... a picture would be nice.

== Parameters ==

The following parameters are supported, either as regular request params or as local params.

/!\ :TODO: /!\ document which params are multivalue

=== q.alt ===
If specified, this query will be used (and parsed by default using standard query parsing syntax) when the main query string is not specified or blank.  This comes in handy when you need something like a match-all-docs query (don't forget &rows=0 for that one!) in order to get collection-wide faceting counts.

=== qf (Query Fields) ===
List of fields and the "boosts" to associate with each of them when building !DisjunctionMaxQueries from the user's query.  The format supported is {{{fieldOne^2.3 fieldTwo fieldThree^0.4}}}, which indicates that fieldOne has a boost of 2.3, fieldTwo has the default boost, and fieldThree has a boost of 0.4.  Matches in fieldOne are therefore much more significant than matches in fieldTwo, which in turn are more significant than matches in fieldThree.
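To make the format concrete, here is a minimal Python sketch that reads such a string into a field-to-boost mapping.  The name {{{parse_qf}}} is invented for this example, and the default boost is assumed to be 1.0.

```python
def parse_qf(qf, default_boost=1.0):
    """Turn 'fieldOne^2.3 fieldTwo' into {'fieldOne': 2.3, 'fieldTwo': 1.0}.

    Hypothetical helper for illustration; not how Solr parses the param.
    """
    boosts = {}
    for token in qf.split():
        field, caret, boost = token.partition("^")
        # A field with no '^boost' suffix gets the assumed default boost.
        boosts[field] = float(boost) if caret else default_boost
    return boosts

fields = parse_qf("fieldOne^2.3 fieldTwo fieldThree^0.4")
```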

=== mm (Minimum 'Should' Match) ===
When dealing with queries there are 3 types of "clauses" that Lucene knows about: mandatory, prohibited, and 'optional' (aka: "SHOULD").  By default all words or phrases specified in the "q" param are treated as "optional" clauses unless they are preceded by a "+" or a "-".  When dealing with these "optional" clauses, the "mm" option makes it possible to say that a certain minimum number of those clauses must match (mm).  Specifying this minimum number can be done in complex ways, equating to ideas like...

 * At least 2 of the optional clauses must match, regardless of how many clauses there are: "{{{2}}}"
 * At least 75% of the optional clauses must match, rounded down: "{{{75%}}}"
 * If there are fewer than 3 optional clauses, they all must match; if there are 3 or more, then 75% must match, rounded up: "{{{2<-25%}}}"
 * If there are fewer than 3 optional clauses, they all must match; for 3 to 5 clauses, one less than the number of clauses must match; for 6 or more clauses, 80% must match, rounded down:  "{{{2<-1 5<80%}}}"

The full variety of complex expressions supported is explained in detail [[http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html|here]].

The default value is 100% (all clauses must match).
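The four expressions listed above can be worked through with a small Python sketch.  It follows only the rules as described on this page and is not Solr's implementation; the function names are invented, and clamping the result to the number of optional clauses is an assumption of the sketch.

```python
def _simple(n, spec):
    """Resolve a non-conditional spec ('2', '-1', '75%', '-25%') for n clauses."""
    if spec.endswith("%"):
        percent = int(spec[:-1])
        if percent < 0:
            # Negative percentage: this share may be missing (rounded down).
            return n - (n * -percent) // 100
        # Positive percentage: this share must match (rounded down).
        return (n * percent) // 100
    value = int(spec)
    # Negative integer: that many clauses may be missing.
    return n + value if value < 0 else value

def min_should_match(optional_clauses, spec):
    """Hypothetical helper: how many optional clauses must match."""
    n = optional_clauses
    result = n  # at or below the first conditional bound, all must match
    for part in spec.split():
        if "<" in part:
            bound, simple = part.split("<", 1)
            if n <= int(bound):
                break
            result = _simple(n, simple)
        else:
            result = _simple(n, part)
    # Assumption: keep the answer within 0..n so it can always be satisfied.
    return max(0, min(n, result))

required_for_four = min_should_match(4, "2<-1 5<80%")
required_for_six = min_should_match(6, "2<-1 5<80%")
```

With four optional clauses the "{{{2<-1}}}" part applies, so all but one must match; with six clauses the "{{{5<80%}}}" part takes over.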

=== pf (Phrase Fields) ===
Once the list of matching documents has been identified using the "fq" and "qf" params, the "pf" param can be used to "boost" the score of documents in cases where all of the terms in the "q" param appear in close proximity.

The format is the same as the "qf" param: a list of fields and "boosts" to associate with each of them when making phrase queries out of the entire "q" param.

=== ps (Phrase Slop) ===
Amount of slop on phrase queries built for "pf" fields (affects boosting).

=== qs (Query Phrase Slop) ===
Amount of slop on phrase queries explicitly included in the user's query string (in qf fields; affects matching).  <!> [[Solr1.3]]

=== tie (Tie breaker) ===
Float value to use as tiebreaker in !DisjunctionMaxQueries (should be something much less than 1).

When a term from the user's input is tested against multiple fields, more than one field may match and each field will generate a different score based on how common that word is in that field (for each document relative to all other documents).  The "tie" param lets you configure how much the final score of the query will be influenced by the scores of the lower scoring fields compared to the highest scoring field.

A value of "0.0" makes the query a pure "disjunction max query" -- only the maximum scoring sub query contributes to the final score.  A value of "1.0" makes the query a pure "disjunction sum query" where it doesn't matter what the maximum scoring sub query is, the final score is the sum of the sub scores.  Typically a low value (e.g. 0.1) is useful.
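The effect of the tiebreaker can be written as a one-line formula: the best sub score, plus "tie" times the sum of the remaining sub scores.  The Python sketch below applies it to made-up per-field scores; the function name and the numbers are illustrative only.

```python
def dismax_score(field_scores, tie):
    """Combine per-field scores for one term: max + tie * (sum of the rest)."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

scores = [0.8, 0.2, 0.1]                # hypothetical scores from three fields
pure_max = dismax_score(scores, 0.0)    # only the best field counts
pure_sum = dismax_score(scores, 1.0)    # every field counts fully
typical = dismax_score(scores, 0.1)     # lower fields nudge the score slightly
```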

=== bq (Boost Query) ===
A raw query string (in the SolrQuerySyntax) that will be included with the user's query to influence the score.  If this is a !BooleanQuery with a default boost (1.0f) then the individual clauses will be added directly to the main query. Otherwise, the query will be included as is.

/!\ :TODO: /!\  That latter part is deprecated behavior but still works.  It can be problematic so avoid it.

=== bf (Boost Functions) ===
[[FunctionQuery|Functions]] (with optional boosts) that will be included in the user's query to influence the score.  Any function supported natively by Solr can be used, along with a boost value, e.g.: recip(rord(myfield),1,2,3)^1.5

Specifying functions with the "bf" param is just shorthand for using the {{{_val_:"...function..."}}} syntax in a "bq" param.

For example, if you want to [[FunctionQuery|show more recent documents first]], use recip(rord(creationDate),1,1000,1000)
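To see why this favours recent documents: {{{recip(x,m,a,b)}}} computes a/(m*x+b), as documented on the FunctionQuery page.  The Python sketch below evaluates that formula for a few ordinals, assuming that {{{rord(creationDate)}}} yields 1 for the newest document and grows for older ones.

```python
def recip(x, m, a, b):
    """The reciprocal function a / (m*x + b)."""
    return a / (m * x + b)

# Assumption: reverse ordinal 1 is the most recent creationDate.
newest = recip(1, 1, 1000, 1000)
older = recip(1000, 1, 1000, 1000)
oldest = recip(9000, 1, 1000, 1000)
```

The value starts just below 1 for the newest document and falls off smoothly, reaching one half at ordinal 1000 and one tenth at ordinal 9000.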

== Examples ==

/!\ :TODO: /!\ cleanup and expand examples

Search across multiple fields, specifying (via boosts) how important each field is relative to each other
{{{
http://localhost:8983/solr/select/?q=video&defType=dismax&qf=features^20.0+text^0.3
}}}

You can boost results that have a field that matches a specific value...
{{{
http://localhost:8983/solr/select/?q=video&defType=dismax&qf=features^20.0+text^0.3&bq=cat:electronics^5.0
}}}


Using the "mm" param with a value of 2, 1 and 2 word queries require that all of the optional clauses match, while the three word queries shown here allow one missing clause...
{{{
http://localhost:8983/solr/select/?q=belkin+ipod&defType=dismax&mm=2
http://localhost:8983/solr/select/?q=belkin+ipod+gibberish&defType=dismax&mm=2
http://localhost:8983/solr/select/?q=belkin+ipod+apple&defType=dismax&mm=2
}}}
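
When such URLs are built by a program rather than typed by hand, the parameter values need URL encoding (the "^" and ":" characters in particular).  A minimal Python sketch using only the standard library is shown here; the helper name {{{dismax_url}}} is invented, and the base URL is the one used in the examples above.

```python
from urllib.parse import urlencode

def dismax_url(base, q, **params):
    """Hypothetical helper: build a select URL that uses the dismax parser."""
    query = {"q": q, "defType": "dismax"}
    query.update(params)
    # urlencode escapes spaces, '^' and ':' in the parameter values.
    return base + "?" + urlencode(query)

url = dismax_url(
    "http://localhost:8983/solr/select/",
    "video",
    qf="features^20.0 text^0.3",
    bq="cat:electronics^5.0",
)
```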