Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2010/02/05 01:07:58 UTC

[Solr Wiki] Trivial Update of "SolrRelevancyFAQ" by MarcSturlese

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrRelevancyFAQ" page has been changed by MarcSturlese.
The comment on this change is: The bq clause was wrong.
http://wiki.apache.org/solr/SolrRelevancyFAQ?action=diff&rev1=16&rev2=17

--------------------------------------------------

  = Solr Relevancy FAQ =
- 
  Relevancy is the quality of results returned from a query, encompassing both which documents are found and their relative ranking (the order in which they are returned to the user).
  
  <<TableOfContents>>
  
  <<Anchor(standard_vs_dismax)>>
+ 
  == Should I use the standard or dismax request handler ==
  The [[StandardRequestHandler|standard]] request handler uses SolrQuerySyntax to specify the query via the '''q''' parameter, and it must be well formed or an error will be returned.  It's good for specifying exact, arbitrarily complex queries.
  
@@ -15, +15 @@

  For servicing user-entered queries, start by using dismax.
  
  <<Anchor(multiFieldQuery)>>
+ 
  == How can I search for "superman" in both the title and subject fields ==
  The standard request handler uses SolrQuerySyntax for '''q''':
  
@@ -24, +25 @@

  
  {{{q=superman&qf=title subject}}}
  
- 
  == How can I make "superman" in the title field score higher than in the subject field ==
  For the standard request handler, "boost" the clause on the title field:
  
@@ -34, +34 @@

  
  {{{q=superman&qf=title^2 subject}}}
  
- 
- 
  == Why are search results returned in the order they are? ==
  If no other sort order is specified, the default is by relevancy score.
  
- 
  == How can I see the relevancy scores for search results ==
- Request that the pseudo-field named "score" be returned by adding it to the '''fl''' (field list) parameter. The "score" will then appear along with the stored fields in returned documents.
+ Request that the pseudo-field named "score" be returned by adding it to the '''fl''' (field list) parameter. The "score" will then appear along with the stored fields in returned documents. {{{q=Justice League&fl=*,score}}}
- {{{q=Justice League&fl=*,score}}}
- 
  
  == Why doesn't my query of "flash" match a field containing "Flash" (with a capital "F") ==
- 
  The fieldType for the field containing "Flash" must have an analyzer that lowercases terms.  This will cause all searches on that field to be case insensitive.
  
  See AnalyzersTokenizersTokenFilters for more.
- 
  
  == How can I make exact-case matches score higher ==
  Example: a query of "Penguin" should score documents containing "Penguin" higher than docs containing "penguin".
  
+ The general strategy is to index the content twice, using different fields with different fieldTypes (and different analyzers associated with those fieldTypes). One analyzer will contain a lowercase filter for case-insensitive matches, and one will preserve case for exact-case matches.
- The general strategy is to index the content twice, using different fields
- with different fieldTypes (and different analyzers associated with those fieldTypes).
- One analyzer will contain a lowercase filter for case-insensitive matches, and one will preserve case for exact-case matches.
  
- Use [[SchemaXml#copyField|copyField]] commands in the schema to index a single
+ Use [[SchemaXml#copyField|copyField]] commands in the schema to index a single input field multiple times.
- input field multiple times.
  
- Once the content is indexed into multiple fields that are analyzed differently, 
+ Once the content is indexed into multiple fields that are analyzed differently,  [[#multiFieldQuery|query across both fields]].
- [[#multiFieldQuery|query across both fields]].
  
  == I'm getting query parse exceptions when making queries ==
  For the standard request handler, the '''q''' parameter must be correctly formatted SolrQuerySyntax, with any special characters escaped.  If this is a user-entered query, [[#standard_vs_dismax|consider using the dismax handler]].
  
- Many other parameters such as '''fq''' and '''facet.query''' must also conform to SolrQuerySyntax
+ Many other parameters such as '''fq''' and '''facet.query''' must also conform to SolrQuerySyntax regardless of which handler is used.
- regardless of which handler is used.
  
  == How can I make queries of "spiderman" and "spider man" match "Spider-Man" ==
+ [[AnalyzersTokenizersTokenFilters#WordDelimiterFilter|WordDelimiterFilter]] can be used in the analyzer for the field being queried to match words with intra-word delimiters such as dashes or case changes.
- [[AnalyzersTokenizersTokenFilters#WordDelimiterFilter|WordDelimiterFilter]] can be used
- in the analyzer for the field being queried to match words with intra-word delimiters
- such as dashes or case changes.
- 
  
  == How can I search for one term near another term (say, "batman" and "movie") ==
+ A proximity search can be done with a sloppy phrase query.  The closer together the two terms appear in the document, the higher the score will be.  A sloppy phrase query  specifies a maximum "slop", or the number of positions tokens need to be moved to get a match.
- A proximity search can be done with a sloppy phrase query.  The closer together the two
- terms appear in the document, the higher the score will be.  A sloppy phrase query 
- specifies a maximum "slop", or the number of positions tokens need to be moved to get a match.
  
  This example for the standard request handler will find all documents where "batman" occurs within 100 words of "movie":
  
@@ -90, +73 @@

  
  {{{q=batman movie&pf=text&ps=100}}}
  
- The dismax handler also allows users to explicitly specify a phrase query with double quotes, and the '''qs'''(query slop) parameter can be used to add slop to any explicit phrase
+ The dismax handler also allows users to explicitly specify a phrase query with double quotes, and the '''qs'''(query slop) parameter can be used to add slop to any explicit phrase queries:
- queries:
  
  {{{q="batman movie"&qs=100}}}
- 
  
  == How can I increase the score for specific documents ==
  === index-time boosts ===
@@ -113, +94 @@

  Solr can parse function queries in the following [[http://lucene.apache.org/solr/api/org/apache/solr/search/QueryParsing.html#parseFunction(java.lang.String,%20org.apache.solr.schema.IndexSchema)|syntax]].
  
  Some examples...
+ 
  {{{
    # simple boosts by popularity
    q=%2Bsupervillians+_val_:"popularity"
@@ -122, +104 @@

    q=%2Bsupervillians+_val_:"scale(popularity,0,100)"
    defType=dismax&qf=text&q=supervillians&bf=sqrt(popularity)
  }}}
- 
  == How are documents scored ==
  Basic scoring factors:
+ 
-   * tf stands for term frequency - the more times a search term appears in a document, the higher the score
+  * tf stands for term frequency - the more times a search term appears in a document, the higher the score
-   * idf stands for inverse document frequency - matches on rarer terms count more than matches on common terms 
+  * idf stands for inverse document frequency - matches on rarer terms count more than matches on common terms
-   * coord is the coordination factor - if there are multiple terms in a query, the more terms that match, the higher the score
+  * coord is the coordination factor - if there are multiple terms in a query, the more terms that match, the higher the score
-   * lengthNorm - matches on a smaller field score higher than matches on a larger field
+  * lengthNorm - matches on a smaller field score higher than matches on a larger field
-   * index-time boost - if a boost was specified for a document at index time, scores for searches that match that document will be boosted.
+  * index-time boost - if a boost was specified for a document at index time, scores for searches that match that document will be boosted.
-   * query clause boost - a user may explicitly boost the contribution of one part of a query over another.
+  * query clause boost - a user may explicitly boost the contribution of one part of a query over another.
  
  See the [[http://lucene.apache.org/java/2_4_0/scoring.html|Lucene scoring documentation]] for more info.
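
As a rough illustration of the factors listed above, here is a minimal Python sketch. It assumes formulas in the style of Lucene's classic default similarity (square-root tf, logarithmic idf, inverse-square-root lengthNorm); the function names and sample numbers are hypothetical, and a real index may use a different similarity or lossy norm encoding.

```python
import math

# Assumed formulas, in the style of Lucene's classic default similarity.
# All function names here are illustrative, not part of any Solr API.

def tf(term_freq):
    """More occurrences of a term raise the score, with diminishing returns."""
    return math.sqrt(term_freq)

def idf(doc_freq, num_docs):
    """Rarer terms (low doc_freq) count more than common ones."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def length_norm(num_terms):
    """A match in a short field counts more than one in a long field."""
    return 1.0 / math.sqrt(num_terms)

def coord(matched_terms, query_terms):
    """Fraction of the query terms that the document matched."""
    return matched_terms / query_terms

# A term appearing 3 times versus once: the tf factor differs by sqrt(3).
tf_ratio = tf(3) / tf(1)
```

Under these assumptions, tripling the term frequency multiplies the tf factor by about 1.73 rather than 3, which is why repeated terms help but do not dominate.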
  
- 
  == Why does id:archangel come before id:hawkgirl when querying for "wings" ==
- Add '''debugQuery=on''' to your request, and you will get (fairly dense) detailed scoring information for each document returned. 
+ Add '''debugQuery=on''' to your request, and you will get (fairly dense) detailed scoring information for each document returned.
  
  {{{q=wings&indent=on&debugQuery=on}}}
  
  This extra information will appear in the "explain" section of the "debug" section in the response.
+ 
  {{{
  <response>
  <result>[...]</result>
@@ -164, +146 @@

  </str>
  [...]
  }}}
- 
  In this specific example, we see that the main scoring difference between the two documents is the '''tf''' (term frequency) factor.  The text field for the '''id:archangel''' document contains the term '''wings''' 3 times ({{{termFreq(text:wings)=3}}}) while the '''id:hawkgirl''' document contains it only once.
  
  Debug info is expensive to generate, and should only be used for debugging problems with specific queries.
@@ -172, +153 @@

  Debug info can also be selected from the admin query page, http://localhost:8983/solr/admin/form.jsp
  
  == Why doesn't document id:juggernaut appear in the top 10 results for my query ==
+ Since '''debugQuery=on''' only gives you scoring "explain" info for the documents returned, the '''explainOther''' parameter can be used to specify other documents you want detailed scoring info for.
- Since '''debugQuery=on''' only gives you scoring "explain" info for the documents
- returned, the '''explainOther''' parameter can be used to specify other documents
- you want detailed scoring info for.
  
  {{{q=supervillians&debugQuery=on&explainOther=id:juggernaut}}}
  
  Now you should be able to examine the scoring explain info of the top matching documents, compare it to the explain info for documents matching id:juggernaut, and determine why the rankings are not as you expect.
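
A client could assemble such a request along these lines. This is only a sketch: the base URL is taken from the other examples on this page and may differ in your installation, and the helper name explain_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint, copied from the examples on this page; adjust as needed.
BASE_URL = "http://localhost:8983/solr/select"

def explain_url(query, other_doc_query=None):
    """Build a request URL that asks Solr for scoring explanations.

    other_doc_query, if given, is passed as explainOther so that documents
    outside the returned page are explained too.
    """
    params = {"q": query, "indent": "on", "debugQuery": "on"}
    if other_doc_query is not None:
        params["explainOther"] = other_doc_query
    # urlencode percent-escapes reserved characters such as ':'.
    return BASE_URL + "?" + urlencode(params)

url = explain_url("supervillians", "id:juggernaut")
```

Because debug info is expensive to generate, a helper like this is best kept out of production request paths.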
  
- 
  == How can I boost the score of newer documents ==
-   * Do an explicit sort by date (relevancy scores are ignored)
+  * Do an explicit sort by date (relevancy scores are ignored)
-   * Use an index-time boost that is larger for newer documents
+  * Use an index-time boost that is larger for newer documents
-   * Use a FunctionQuery to influence the score based on a date field.
+  * Use a FunctionQuery to influence the score based on a date field.
-     * In Solr 1.3, use something of the form recip(rord(myfield),1,1000,1000)
+   * In Solr 1.3, use something of the form recip(rord(myfield),1,1000,1000)
-     * In Solr 1.4, use something of the form recip(ms(NOW,mydatefield),3.16e-11,1,1)
+   * In Solr 1.4, use something of the form recip(ms(NOW,mydatefield),3.16e-11,1,1)
+  http://lucene.apache.org/solr/api/org/apache/solr/search/function/ReciprocalFloatFunction.html http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
-   http://lucene.apache.org/solr/api/org/apache/solr/search/function/ReciprocalFloatFunction.html
- 
-   http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
  
  A full example of a query for "ipod" with the score boosted higher the newer the product is:
+ 
  {{{
  http://localhost:8983/solr/select?q={!boost b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}ipod
  }}}
- 
  One can simplify the implementation by decomposing the query into multiple arguments:
+ 
  {{{
  http://localhost:8983/solr/select?q={!boost b=$dateboost v=$qq}&dateboost=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)&qq=ipod
  }}}
- 
  Now the main "q" argument as well as the "dateboost" argument may be specified as defaults in a search handler in solrconfig.xml, and clients would only need to pass "qq", the user query.
  
  To boost another query type such as a dismax query, the value of the boost query is a full sub-query and hence can use the {!querytype} syntax. Alternately, the defType param can be used in the boost local params to set the default type to dismax.  The other dismax parameters may be set as top level parameters.
+ 
  {{{
  http://localhost:8983/solr/select?q={!boost b=$dateboost v=$qq defType=dismax}&dateboost=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)&qf=text&pf=text&qq=ipod
  }}}
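
To see what the date boost in these examples does numerically, the following Python sketch evaluates the same expression outside Solr. It assumes recip(x,m,a,b) computes a/(m*x+b), as described in the ReciprocalFloatFunction documentation linked above; the helper names and the year-length constant are illustrative.

```python
# Milliseconds in an average year; note that 3.16e-11 is roughly 1 / MS_PER_YEAR.
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000

def recip(x, m, a, b):
    """Assumed definition of Solr's recip function: a / (m*x + b)."""
    return a / (m * x + b)

def date_boost(age_ms):
    """Boost for a document whose date field lies age_ms in the past,
    mirroring recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)."""
    return recip(age_ms, 3.16e-11, 1, 1)

brand_new = date_boost(0)                   # 1.0
one_year_old = date_boost(MS_PER_YEAR)      # roughly 0.5
ten_years_old = date_boost(10 * MS_PER_YEAR)
```

With these constants a brand-new document keeps its full score, a one-year-old document is multiplied by about one half, and older documents decay smoothly toward zero without ever being excluded.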
- 
  == How do I give a very low boost to documents that match my query ==
  In general the problem is that a "low" boost is still a boost: it can only improve the score of documents that match. One way to fake a "negative boost" is to give a high boost to everything that does *not* match. For example:
  
-        bq=(*:* -field_a:54^10000)
+  . bq=(*:* -field_a:54)^10000
  
  TODO: If "bq" supports pure negative queries then you can simplify that to bq=-field_a:54^10000
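
A small Python sketch of how a client might generate such a clause and attach it to a dismax request. The helper name negative_boost is hypothetical, and the q and qf values are placeholders borrowed from other examples on this page.

```python
from urllib.parse import urlencode

def negative_boost(field, value, weight=10000):
    """Return a bq clause that boosts every document NOT matching field:value.

    The boost applies to the whole parenthesized query, so the ^weight
    goes outside the parentheses.
    """
    return "(*:* -{}:{})^{}".format(field, value, weight)

# Placeholder request parameters; only bq is the point of this example.
params = {
    "defType": "dismax",
    "qf": "text",
    "q": "ipod",
    "bq": negative_boost("field_a", 54),
}
query_string = urlencode(params)  # percent-encodes '(', '*', ':', '^', etc.
```

Letting urlencode handle the escaping avoids hand-encoding characters such as ^ and *, which are easy to get wrong in a URL.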
  
  == TODO ==
  /!\ :TODO: /!\
+ 
   * shorter fields score higher
   * filter vs query clause
   * when should index-time boosts be used