Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2006/05/21 00:31:46 UTC

[Solr Wiki] Update of "DisMaxRequestHandler" by HossMan

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by HossMan:
http://wiki.apache.org/solr/DisMaxRequestHandler

New page:
Below you will find the initial announcement about the !DisMaxRequestHandler.  

This email, the [http://incubator.apache.org/solr/docs/api/org/apache/solr/request/DisMaxRequestHandler.html javadocs], and the [http://svn.apache.org/viewvc/incubator/solr/trunk/example/solr/conf/solrconfig.xml?view=co example configuration] are the documentation currently available about its use.

/!\ :TODO: /!\ write some more general documentation akin to StandardRequestHandler

{{{

Date: Sat, 20 May 2006 15:17:38 -0700 (PDT)
From: Chris Hostetter 
To: solr-user @ lucene.apache.org
Subject: Two Solr Announcements: CNET Product Search and DisMax


I've got two related announcements to make, which I think are pretty
cool...

The first is that the Search result pages for CNET Shopper.com are now
powered by Solr.  You may be thinking "Didn't he announce that last year?"
... not quite.  CNET's faceted product listing pages for browsing products
by category have been powered by Solr for about a year now, but up
until a few weeks ago, searching for products by keywords was still
powered by a legacy system.  I was working hard to come up with a good
mechanism for building Lucene queries based on user input, that would
allow us to leverage our "domain expertise" about consumer technology
products to ensure that users got the best matches.

Which brings me to my second announcement:  I've just committed a new
SolrRequestHandler called the "DisMaxRequestHandler" into the Solr
subversion repository.

This query handler supports a simplified version of the Lucene QueryParser
syntax.  Quotes can be used to group phrases, and +/- can be used to
denote mandatory and prohibited clauses ... but all other Lucene query
parser special characters are escaped to simplify the user experience.
The handler takes responsibility for building a good query from the user's
input using BooleanQueries containing DisjunctionMaxQueries across fields
and boosts you specify.  It also allows you to provide additional boosting
queries, boosting functions, and filtering queries to artificially affect
the outcome of all searches. These options can all be specified as init
parameters for the handler in your solrconfig.xml or overridden in the
Solr query URL.

The code in this plugin is what is now powering CNET product search.

I've updated the "example" solrconfig.xml to take advantage of it.  You can
take it for a spin right now if you build from scratch using subversion;
otherwise you'll have to wait for the solr-2006-05-21.zip nightly release
due out in a few hours.  Once you've got it, the javadocs for
DisMaxRequestHandler contain the details about all of the options it
supports, and here are a few URLs you can try out using the product data
in the exampledocs directory...

Normal results for the word "video" using the StandardRequestHandler with
the default search field...
  http://localhost:8983/solr/select/?q=video&fl=name+score&qt=standard

The "dismax" handler is configured to search across the text, features,
name, sku, id, manu, and cat fields all with varying boosts designed to
ensure that "better" matches appear first, specifically: documents which
match on the name and cat fields get higher scores...
  http://localhost:8983/solr/select/?q=video&qt=dismax

...note that this instance is also configured with a default field list,
which can be overridden in the URL...
  http://localhost:8983/solr/select/?q=video&qt=dismax&fl=*,score

You can also override which fields are searched on, and how much boost
each field gets...
  http://localhost:8983/solr/select/?q=video&qt=dismax&qf=features^20.0+text^0.3

Another instance of the handler is registered using the qt "instock" and
has slightly different configuration options, notably: a filter for (you
guessed it) inStock:true...
  http://localhost:8983/solr/select/?q=video&qt=dismax&fl=name,score,inStock
  http://localhost:8983/solr/select/?q=video&qt=instock&fl=name,score,inStock

One of the other really cool features in this handler is robust
support for specifying the "BooleanQuery.minimumNumberShouldMatch" you
want to be used based on how many terms are in your user's query.
This allows flexibility for typos and partial matches.  For the
dismax handler, 1 and 2 word queries require that all of the optional
clauses match, but for 3-5 word queries one missing word is allowed...
  http://localhost:8983/solr/select/?q=belkin+ipod&qt=dismax
  http://localhost:8983/solr/select/?q=belkin+ipod+gibberish&qt=dismax
  http://localhost:8983/solr/select/?q=belkin+ipod+apple&qt=dismax

Just like the StandardRequestHandler, it supports the debugQuery
option for viewing the parsed query and the score explanations for each
doc...
  http://localhost:8983/solr/select/?q=belkin+ipod+gibberish&qt=dismax&debugQuery=1
  http://localhost:8983/solr/select/?q=video+card&qt=dismax&debugQuery=1

...That's the overall gist of it.  I hope other people find it useful
out of the box -- and even if it doesn't meet your needs, hopefully it
gives you some good ideas of the types of things that can be done in a
SolrRequestHandler that aren't supported natively with the Lucene
QueryParser.  If you do decide to write your own handler, make sure to
take a look at the new SolrPluginUtils class as well -- it provides
some nice reusable methods that came in handy when writing the
DisMaxRequestHandler.

-Hoss

}}}
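
The minimum-match behaviour described in the announcement (one- and two-word queries need every optional clause to match, three- to five-word queries may miss one) can be sketched in Python. This is only an illustration under stated assumptions: `min_should_match` and `matches` are hypothetical names, not part of Solr, and queries with more than five optional clauses are rejected because the announcement does not say how they are treated.

```python
def min_should_match(optional_clause_count):
    """Return how many optional clauses must match for a query of this size.

    Covers only the cases described in the announcement:
      1-2 clauses -> all of them are required
      3-5 clauses -> one may be missing
    """
    if optional_clause_count < 1:
        raise ValueError("need at least one optional clause")
    if optional_clause_count <= 2:
        return optional_clause_count
    if optional_clause_count <= 5:
        return optional_clause_count - 1
    # Assumption: longer queries are out of scope for this sketch.
    raise ValueError("more than 5 clauses is not covered by this sketch")


def matches(query_terms, doc_terms):
    """True if enough of the query terms occur in the document's terms."""
    required = min_should_match(len(query_terms))
    hits = sum(1 for term in query_terms if term in doc_terms)
    return hits >= required


doc = {"belkin", "ipod", "cable"}
print(matches(["belkin", "ipod", "gibberish"], doc))  # True: 2 of 3 is enough
print(matches(["belkin", "gibberish"], doc))          # False: both are required
```

This mirrors why the "belkin ipod gibberish" example URL can still return results while a two-word query containing a junk term would not.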
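
The simplified query syntax described in the announcement keeps quotes and +/- meaningful and escapes every other query parser special character. A minimal Python sketch of that idea follows. The function name `partial_escape` and the exact character set are assumptions made for illustration; this is not the handler's actual code, and it only deals with single-character specials.

```python
# Single characters that Lucene's query parser treats specially, minus the
# three the announcement says are left alone: the double quote, '+' and '-'.
# The exact set is an assumption for illustration.
ESCAPED_CHARS = set('\\!():^[]{}~*?')


def partial_escape(user_input):
    """Backslash-escape special characters, keeping quotes, '+' and '-'."""
    out = []
    for ch in user_input:
        if ch in ESCAPED_CHARS:
            out.append('\\')
        out.append(ch)
    return ''.join(out)


# Quotes and +/- pass through untouched; field and boost syntax is neutralised.
print(partial_escape('+belkin -"ipod mini"'))
print(partial_escape('name:ipod^2'))
```

Escaping rather than rejecting the input means a user who types a colon or a caret still gets a search over the literal text instead of a syntax error.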
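
The example URLs in the announcement override the searched fields with a `qf` parameter whose entries are separated by `+`, which decodes to a space in a query string. The sketch below builds such a URL with Python's standard library. The helper name `dismax_url` is hypothetical, and the base URL is simply the one used in the example links, so adjust it for a real installation.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumed base URL, copied from the example links in the announcement.
BASE_URL = "http://localhost:8983/solr/select/"


def dismax_url(query, field_boosts=None, handler="dismax"):
    """Build a select URL; field_boosts maps field name -> boost."""
    params = {"q": query, "qt": handler}
    if field_boosts:
        # One field^boost entry per field, separated by spaces.
        params["qf"] = " ".join(f"{f}^{b}" for f, b in field_boosts.items())
    # urlencode percent-encodes '^' and turns spaces into '+'.
    return BASE_URL + "?" + urlencode(params)


url = dismax_url("video", {"features": 20.0, "text": 0.3})
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["qf"])
```

The encoded URL will show `%5E` where the announcement's hand-written links show a bare `^`; both decode to the same `qf` value.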