Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2007/04/17 21:54:23 UTC
[Solr Wiki] Update of "SolrRelevancyFAQ" by YonikSeeley

The following page has been changed by YonikSeeley:
http://wiki.apache.org/solr/SolrRelevancyFAQ

------------------------------------------------------------------------------
  = Solr Relevancy FAQ =
+ 
+ Relevancy is the quality of the results returned from a query, encompassing both which documents are found and their relative ranking (the order in which they are returned to the user).
+ 
  [[TableOfContents]]
- == How can I customize the text analysis for a field ==
- Different fields may require different text analysis for maximum relevancy.
- Every field may have its own fieldType with different customized analysis.
- See [wiki:AnalyzersTokenizersTokenFilters] for more.
  
- == How do I query across multiple fields ==
- (bring in discussion of dismax handler here)
+ [[Anchor(multiFieldQuery)]]
+ == How can I search for "superman" in both the title and subject fields ==
+ The standard request handler uses [http://lucene.apache.org/java/docs/queryparsersyntax.html Lucene QueryParser] syntax for '''q''':
  
+ {{{q=title:superman subject:superman}}}
- == How can I make matches on one field score higher than matches on another ==
- (bring in discussion of dismax handler here)
  
- == How can I boost the score of exact case matches ==
+ Using the [wiki:DisMaxRequestHandler dismax request handler], specify the query fields using the '''qf''' param.
  
- == How can I index the same field more than once ==
+ {{{q=superman&qf=title subject}}}
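The two requests above can be assembled from Python with only the standard library, which also URL-encodes the spaces and colons. This is a sketch under assumptions: the endpoint `http://localhost:8983/solr/select` matches a default local install, and `qt=dismax` assumes the dismax handler is registered under the name "dismax" in solrconfig.xml. Adjust both for your setup.

```python
from urllib.parse import urlencode

# Assumed endpoint of a default local Solr install; adjust for your setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def solr_url(params):
    """Return a request URL with every parameter value URL-encoded."""
    return SOLR_SELECT + "?" + urlencode(params)


# Standard request handler: Lucene QueryParser syntax goes in q.
standard_url = solr_url({"q": "title:superman subject:superman"})

# Dismax handler (assumed to be registered as "dismax"):
# the plain user query goes in q, the fields to search in qf.
dismax_url = solr_url({"qt": "dismax", "q": "superman", "qf": "title subject"})

print(standard_url)
print(dismax_url)
```

`urlencode` encodes spaces as `+`, which is decoded back to a space on the server, so `qf=title+subject` reaches the handler as `title subject`.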
  
- == How can I match parts of words, or sub-words ==
  
- == How can I increase the score when terms appear closer together ==
+ == How can I make "superman" in the title field score higher than in the subject field ==
+ For the standard request handler, "boost" the clause on the title field:
  
+ {{{q=title:superman^2 subject:superman}}}
+ 
+ Using the dismax request handler, one can specify boosts on fields in parameters such as '''qf''':
+ 
+ {{{q=superman&qf=title^2 subject}}}
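If field weights are kept in application code, a small helper can render them into the space-separated '''qf''' format shown above. `build_qf` is a hypothetical helper, not part of Solr; it writes a boost of 1 without a `^` suffix, since an unboosted field already has a weight of 1.

```python
def build_qf(boosts):
    """Render {field: boost} as a dismax qf string, e.g. "title^2 subject".

    Hypothetical helper: a boost of 1 is written without a ^ suffix,
    because an unboosted field already has an implicit boost of 1.
    """
    parts = []
    for field, boost in boosts.items():
        if boost == 1:
            parts.append(field)
        else:
            # %g renders 2 as "2" and 1.5 as "1.5" (no trailing ".0").
            parts.append("%s^%g" % (field, boost))
    return " ".join(parts)


print(build_qf({"title": 2, "subject": 1}))  # title^2 subject
```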
+ 
+ 
+ == Why doesn't my query of "flash" match a field containing "Flash" ==
+ 
+ The fieldType for the field containing "Flash" must have an analyzer that lowercases terms both when indexing and when querying (a single analyzer declared for a fieldType is used for both).  This will cause all searches on that field to be case insensitive.
+ 
+ See AnalyzersTokenizersTokenFilters for more.
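To make the effect concrete, here is a toy Python model of a field analyzer: a whitespace tokenizer followed by a lowercase filter. It is a simplified stand-in for what Solr's tokenizers and filters do, not Solr code. Because the same analysis is applied to the indexed text and to the query, "flash" and "Flash" reduce to the same term.

```python
def analyze(text):
    """Toy analyzer: split on whitespace, then lowercase every token.

    A simplified model of a whitespace tokenizer followed by a
    lowercase filter; not Solr's implementation.
    """
    return [token.lower() for token in text.split()]


# Index time: the field value is analyzed into a set of terms.
indexed_terms = set(analyze("Adobe Flash Player"))

# Query time: the query string goes through the same analyzer.
query_terms = analyze("flash")

matches = all(term in indexed_terms for term in query_terms)
print(matches)  # True
```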
+ 
+ 
+ == How can I make exact-case matches score higher ==
+ Example: a query of "Penguin" should score documents containing "Penguin" higher than docs containing "penguin".
+ 
+ The general strategy is to index the content twice, using different fields
+ with different fieldTypes (and different analyzers associated with those fieldTypes).
+ One analyzer will contain a lowercase filter for case-insensitive matches, and one will preserve case for exact-case matches.
+ 
+ Use [wiki:Self:SchemaXml#copyField copyField] commands in the schema to index a single
+ input field multiple times.
+ 
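+ Once the content is indexed into multiple fields that are analyzed differently,
+ [#multiFieldQuery query across both fields].  A document containing "Penguin" then matches the query in both fields, while one containing only "penguin" matches only in the case-insensitive field, so the exact-case match scores higher.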
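A toy Python model of this two-field setup follows. Each input is indexed once as-is and once lowercased (the job that copyField and the two fieldTypes do in the schema), and a document earns one point for each field its query term matches. The field names `text_exact` and `text` are hypothetical, and the point counting is a deliberate simplification of Lucene's scoring.

```python
def index_doc(text):
    """Index one input into two hypothetical fields, as copyField would."""
    tokens = text.split()
    return {
        "text_exact": set(tokens),             # case preserved
        "text": {t.lower() for t in tokens},   # lowercased
    }


def toy_score(doc, term):
    """One point per field matched: a stand-in for real Lucene scoring."""
    score = 0
    if term in doc["text_exact"]:      # exact-case field, query kept as-is
        score += 1
    if term.lower() in doc["text"]:    # lowercased field, query lowercased too
        score += 1
    return score


docs = {
    "a": index_doc("the Penguin classics"),
    "b": index_doc("a penguin colony"),
}
ranking = sorted(docs, key=lambda d: toy_score(docs[d], "Penguin"), reverse=True)
print(ranking)  # ['a', 'b']
```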
+ 
+ == I'm getting query parse exceptions when making queries ==
+ For the standard request handler, the query must be correctly formatted in Lucene Query Parser syntax with any special characters escaped.
+ 
+ The dismax handler has a more forgiving parser for the '''q''' parameter, useful for directly passing in a user-supplied query string.
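For the standard request handler, user input can be made safe by backslash-escaping the query parser's special characters before placing it in '''q'''. The sketch below uses the character set escaped by Lucene's Java `QueryParser.escape()` method in the Lucene 2.x releases; `escape_query` is a hypothetical Python helper, and later Lucene versions treat additional characters (such as `/`) as special.

```python
# Characters escaped by Lucene's QueryParser.escape() in Lucene 2.x;
# escaping "&" and "|" individually also covers the && and || operators.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&')


def escape_query(text):
    """Backslash-escape special characters so they are matched literally."""
    return "".join("\\" + c if c in SPECIAL_CHARS else c for c in text)


user_input = "Mission: Impossible (1996)"
print(escape_query(user_input))  # Mission\: Impossible \(1996\)
```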
+ 
+ 
+ == How can I make queries of "spiderman" and "spider man" match "Spider-Man" ==
+ [wiki:Self:AnalyzersTokenizersTokenFilters#WordDelimiterFilter WordDelimiterFilter] can be used
+ in the analyzer for the field being queried to match words with intra-word delimiters
+ such as dashes or case changes.
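The idea can be illustrated with a heavily simplified Python model of word-delimiter splitting: break a token at non-alphanumeric characters and at lowercase-to-uppercase transitions, optionally also emit the concatenated parts (roughly what the filter's generateWordParts and catenateWords options control), then lowercase the result as a following lowercase filter would. `split_word` is hypothetical; the real WordDelimiterFilter has more rules and options.

```python
import re


def split_word(token, catenate=True):
    """Simplified word-delimiter split, followed by lowercasing.

    Splits on non-alphanumeric characters and on lowercase-to-uppercase
    changes; if catenate is True, also emits the parts joined together.
    """
    parts = []
    for chunk in re.split(r"[^A-Za-z0-9]+", token):
        # Break "SpiderMan" into "Spider" and "Man" at the case change.
        parts.extend(p for p in re.split(r"(?<=[a-z])(?=[A-Z])", chunk) if p)
    terms = [p.lower() for p in parts]
    if catenate and len(terms) > 1:
        terms.append("".join(terms))
    return terms


# Index time: "Spider-Man" yields the parts plus their concatenation.
indexed = set(split_word("Spider-Man"))
print(sorted(indexed))  # ['man', 'spider', 'spiderman']

# Query time: run each whitespace-separated word through the same filter.
for query in ["spiderman", "spider man"]:
    terms = [t for word in query.split() for t in split_word(word)]
    print(query, all(t in indexed for t in terms))  # True for both
```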
+ 
+ 
+ == How can I search for one term near another term (say, "batman" and "movie") ==
+ A proximity search can be done with a sloppy phrase query.  The closer together the two
+ terms appear in the document, the higher the score will be.  A sloppy phrase query 
+ specifies a maximum "slop", or the number of positions tokens need to be moved to get a match.
+ 
+ This example for the standard request handler will find all documents where "batman" occurs within 100 words of "movie":
+ 
+ {{{q=text:"batman movie"~100}}}
+ 
+ The dismax handler can easily create sloppy phrase queries with the '''pf''' (phrase fields) and '''ps''' (phrase slop) parameters:
+ 
+ {{{q=batman movie&pf=text&ps=100}}}
+ 
+ The dismax handler also allows users to explicitly specify a phrase query with double quotes, and the '''qs''' (query slop) parameter can be used to add slop to any explicit phrase
+ queries:
+ 
+ {{{q="batman movie"&qs=100}}}
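The arithmetic behind slop can be sketched for a two-term phrase. This is a simplified model of Lucene's sloppy phrase matching: each term's position in the document is offset by its position in the phrase, the slop needed is the distance between the offset positions, and a match is weighted by `1 / (distance + 1)`, the formula of Lucene's `DefaultSimilarity.sloppyFreq()`. Longer phrases and repeated terms are not handled, and `phrase_slop` and `first_position` are hypothetical helpers.

```python
def phrase_slop(pos_first, pos_second):
    """Slop needed for the two-term phrase "first second" to match.

    The second term is offset by 1 (its index in the phrase), so
    adjacent, in-order terms need a slop of 0.
    """
    return abs((pos_second - 1) - pos_first)


def sloppy_freq(distance):
    """Weight of one sloppy match, as in Lucene's DefaultSimilarity."""
    return 1.0 / (distance + 1)


def first_position(tokens, term):
    """Position of the first occurrence of term in a token list."""
    return tokens.index(term)


for text in ["batman movie review", "batman returns in a new movie"]:
    tokens = text.split()
    slop = phrase_slop(first_position(tokens, "batman"),
                       first_position(tokens, "movie"))
    print(text, "-> slop", slop, "weight", sloppy_freq(slop))
```

Both documents match `"batman movie"~100`, but the closer pair gets the larger weight (1.0 versus 0.2) and so contributes more to the score. Reversing two adjacent terms ("movie batman") costs a slop of 2: `phrase_slop(1, 0)` is 2.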
+ 
+ 
- == How can I increase the score for certain documents ==
+ == How can I increase the score for specific documents ==
  To increase the scores for certain documents that match a query, regardless of what that query may be, one can use index-time boosts.
  
- boosted (provided that the hit was on a field for which norms were not omitted).  Index-time boosts can be specified per-field also.  An Index-time boost on a value of a multiValued field applies to all values for that field.
+ Index-time boosts can also be specified per field, so that only queries matching on that specific field get the extra boost.  An index-time boost on a value of a multiValued field applies to all values for that field.
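In Solr's XML update format, index-time boosts are given as `boost` attributes on `<doc>` (the whole document) and on `<field>` (a single field). The sketch below builds such an `<add>` message with Python's standard library; `add_message`, the field names, and the boost values are made-up examples, and sending the message to the update handler is not shown.

```python
import xml.etree.ElementTree as ET


def add_message(fields, doc_boost=None, field_boosts=None):
    """Build a Solr XML <add> message carrying optional index-time boosts.

    fields maps field names to values; doc_boost applies to the whole
    document; field_boosts maps individual field names to boosts.
    """
    field_boosts = field_boosts or {}
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if doc_boost is not None:
        doc.set("boost", str(doc_boost))
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
        if name in field_boosts:
            field.set("boost", str(field_boosts[name]))
    return ET.tostring(add, encoding="unicode")


# Hypothetical document: boost the whole document, and its title field further.
message = add_message({"id": "doc1", "title": "Superman Returns"},
                      doc_boost=2.5, field_boosts={"title": 2.0})
print(message)
```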
+ 
  
  == How are documents scored ==
  Basic scoring factors:
@@ -35, +93 @@

    * index-time boost - if a boost was specified for a document at index time, scores for searches that match that document will be boosted.
    * query clause boost - a user may explicitly boost the contribution of one part of a query over another.
  
+ See the [http://lucene.apache.org/java/docs/scoring.html Lucene scoring documentation] for more info.
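A rough numeric illustration of how some of these factors combine, written in Python using the formulas of Lucene's `DefaultSimilarity`: `tf = sqrt(freq)`, `idf = 1 + ln(numDocs / (docFreq + 1))`, and `lengthNorm = 1 / sqrt(numTerms)`, combined as `tf * idf^2 * queryBoost * norm` for each matching term. The coord factor, query normalization, and the lossy one-byte encoding of norms are left out, so the numbers will not match Solr's explain output exactly; `term_score` is a hypothetical helper.

```python
import math


def tf(freq):
    """Term frequency factor: square root of the occurrences in the field."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    """Inverse document frequency: terms found in fewer documents weigh more."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    """Length normalization: matches in shorter fields count for more."""
    return 1.0 / math.sqrt(num_terms)


def term_score(freq, doc_freq, num_docs, num_terms,
               query_boost=1.0, index_boost=1.0):
    """Simplified contribution of one query term to a document's score.

    index_boost stands for the document and field boosts that Lucene
    folds into the norm at index time; coord and queryNorm are omitted.
    """
    norm = index_boost * length_norm(num_terms)
    return tf(freq) * idf(doc_freq, num_docs) ** 2 * query_boost * norm


base = term_score(freq=1, doc_freq=9, num_docs=1000, num_terms=100)
# Four occurrences instead of one: tf doubles, so the score doubles.
more_tf = term_score(freq=4, doc_freq=9, num_docs=1000, num_terms=100)
# An index-time boost of 2 also doubles this term's contribution.
boosted = term_score(freq=1, doc_freq=9, num_docs=1000, num_terms=100,
                     index_boost=2.0)
# The same single match in a 25-term field beats one in a 100-term field.
short_field = term_score(freq=1, doc_freq=9, num_docs=1000, num_terms=25)

print(base, more_tf / base, boosted / base, short_field / base)
```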
+ 
  == Why does document X score higher than document Y ==
+ Add '''debugQuery=on''' to your request, and you will get (fairly dense) scoring information for each document returned.  This can also be accessed via the admin query page, http://localhost:8983/solr/admin/form.jsp
- To debug scoring issues, Lucene "explain" functionality may be accessed by enabling query debugging.
- ...
  
+ == Why doesn't document id:foobar appear in the top 10 results for my relevancy ranked query ==
+ Since '''debugQuery=on''' only gives you scoring "explain" info for the documents
+ returned, the '''explainOther''' parameter can be used to specify other documents
+ you want explain info for.
+ 
+ {{{debugQuery=on&explainOther=id:foobar}}}
+ 
+ You can then compare the explain info of the top matching documents with the explain info for the id:foobar document to determine why the rankings are not what you expect.
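A complete request combining a relevancy query with these debugging parameters might be built as follows. The endpoint URL and the query `superman` are illustrative assumptions; '''debugQuery''' and '''explainOther''' are the parameters described above.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed local endpoint

params = {
    "q": "superman",              # the relevancy-ranked query being debugged
    "debugQuery": "on",           # explain info for the returned documents
    "explainOther": "id:foobar",  # ...and for this document as well
}
url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```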
+ 
- == How can I change the score of a document based on the *value* of a field ==
+ == How can I change the score of a document based on the *value* of a field (say, "popularity") ==
  Use a [http://lucene.apache.org/solr/api/org/apache/solr/search/function/FunctionQuery.html FunctionQuery] as a component of your query.
  
  Solr can parse function queries in the following [http://lucene.apache.org/solr/api/org/apache/solr/search/QueryParsing.html#parseFunction(java.lang.String,%20org.apache.solr.schema.IndexSchema) syntax].
+ 
+ TODO: actual examples...
  
  == How can I boost the score of newer documents ==
    * Do an explicit sort by date (relevancy scores are ignored)