Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2007/04/17 21:54:23 UTC
[Solr Wiki] Update of "SolrRelevancyFAQ" by YonikSeeley
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by YonikSeeley:
http://wiki.apache.org/solr/SolrRelevancyFAQ
------------------------------------------------------------------------------
= Solr Relevancy FAQ =
+
+ Relevancy is the quality of the results returned from a query, encompassing both which documents are found and their relative ranking (the order in which they are returned to the user).
+
[[TableOfContents]]
- == How can I customize the text analysis for a field ==
- Different fields may require different text analysis for maximum relevancy.
- Every field may have its own fieldType with different customized analysis.
- See [wiki:AnalyzersTokenizersTokenFilters] for more.
- == How do I query across multiple fields ==
- (bring in discussion of dismax handler here)
+ [[Anchor(multiFieldQuery)]]
+ == How can I search for "superman" in both the title and subject fields ==
+ The standard request handler uses [http://lucene.apache.org/java/docs/queryparsersyntax.html Lucene QueryParser] syntax for '''q''':
+ {{{q=title:superman subject:superman}}}
- == How can I make matches on one field score higher than matches on another ==
- (bring in discussion of dismax handler here)
- == How can I boost the score of exact case matches ==
+ Using the [wiki:DisMaxRequestHandler dismax request handler], specify the query fields using the '''qf''' param.
- == How can I index the same field more than once ==
+ {{{q=superman&qf=title subject}}}
- == How can I match parts of words, or sub-words ==
- == How can I increase the score when terms appear closer together ==
+ == How can I make "superman" in the title field score higher than in the subject field ==
+ For the standard request handler, "boost" the clause on the title field:
+ {{{q=title:superman^2 subject:superman}}}
+
+ Using the dismax request handler, one can specify boosts on fields in parameters such as '''qf''':
+
+ {{{q=superman&qf=title^2 subject}}}
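
As a rough illustration of the '''qf''' boost syntax above, the following Python sketch builds a URL-encoded parameter string for a dismax request. The helper name `dismax_params` is hypothetical, and selecting the handler with `qt=dismax` is an assumption about how the handler is registered in your configuration; only '''q''' and '''qf''' come from this page.

```python
from urllib.parse import urlencode


def dismax_params(user_query, field_boosts, handler="dismax"):
    """Build a URL-encoded parameter string for a dismax query.

    field_boosts maps a field name to a boost, or to None for no boost.
    The qt=<handler> parameter is an assumption about how the dismax
    handler is selected; adjust it to match your own configuration.
    """
    # qf is a space-separated field list; "^boost" is added only when given.
    qf = " ".join(
        name if boost is None else f"{name}^{boost}"
        for name, boost in field_boosts.items()
    )
    return urlencode({"qt": handler, "q": user_query, "qf": qf})


# Mirrors the example above: title matches count twice as much as subject.
print(dismax_params("superman", {"title": 2, "subject": None}))
```

Encoding the parameters programmatically avoids hand-escaping the space and the `^` character when the request is sent over HTTP.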
+
+
+ == Why doesn't my query of "flash" match a field containing "Flash" ==
+
+ The fieldType for the field containing "Flash" must have an analyzer that lowercases terms. This makes all searches on that field case-insensitive.
+
+ See AnalyzersTokenizersTokenFilters for more.
+
+
+ == How can I make exact-case matches score higher ==
+ Example: a query of "Penguin" should score documents containing "Penguin" higher than docs containing "penguin".
+
+ The general strategy is to index the content twice, using different fields
+ with different fieldTypes (and different analyzers associated with those fieldTypes).
+ One analyzer will contain a lowercase filter for case-insensitive matches, and one will preserve case for exact-case matches.
+
+ Use [wiki:Self:SchemaXml#copyField copyField] commands in the schema to index a single
+ input field multiple times.
+
+ Once the content is indexed into multiple fields that are analyzed differently,
+ [#multiFieldQuery query across both fields].
+
+ == I'm getting query parse exceptions when making queries ==
+ For the standard request handler, the query must be correctly formatted in Lucene Query Parser syntax with any special characters escaped.
+
+ The dismax handler has a more forgiving parser for the '''q''' parameter, useful for directly passing in a user-supplied query string.
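
One possible way to avoid parse exceptions with the standard request handler is to backslash-escape special characters in user-supplied text before placing it in '''q'''. The sketch below is a minimal, assumed implementation: the function name `escape_lucene` is hypothetical, and the character list is taken from the Lucene query parser syntax documentation as I understand it, so check it against the Lucene version you run.

```python
# Characters treated as syntax by the Lucene query parser (assumed list;
# verify against the query parser documentation for your Lucene version).
LUCENE_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&')


def escape_lucene(text):
    """Prefix every special character in text with a backslash."""
    return "".join(
        "\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch
        for ch in text
    )


# The hyphen and parentheses are escaped; letters, digits and spaces are not.
print(escape_lucene("Spider-Man (2002)"))
```

Whitespace is left untouched, so a multi-word input still parses as several clauses; escaping only prevents the special characters from being interpreted as query operators.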
+
+
+ == How can I make queries of "spiderman" and "spider man" match "Spider-Man" ==
+ [wiki:Self:AnalyzersTokenizersTokenFilters#WordDelimiterFilter WordDelimiterFilter] can be used
+ in the analyzer for the field being queried to match words with intra-word delimiters
+ such as dashes or case changes.
+
+
+ == How can I search for one term near another term (say, "batman" and "movie") ==
+ A proximity search can be done with a sloppy phrase query. The closer together the two
+ terms appear in the document, the higher the score will be. A sloppy phrase query
+ specifies a maximum "slop": the number of positions by which tokens may be moved to produce a match.
+
+ This example for the standard request handler will find all documents where "batman" occurs within 100 words of "movie":
+
+ {{{q=text:"batman movie"~100}}}
+
+ The dismax handler can easily create sloppy phrase queries with the '''pf''' (phrase fields) and '''ps''' (phrase slop) parameters:
+
+ {{{q=batman movie&pf=text&ps=100}}}
+
+ The dismax handler also allows users to explicitly specify a phrase query with double quotes, and the '''qs''' (query slop) parameter can be used to add slop to any explicit phrase
+ queries:
+
+ {{{q="batman movie"&qs=100}}}
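
The proximity examples above can be generated programmatically. The following Python sketch is illustrative only: both helper names are hypothetical, and it simply reproduces the '''q''', '''pf''' and '''ps''' values shown on this page in URL-encoded form.

```python
from urllib.parse import urlencode


def standard_proximity_q(field, terms, slop):
    """Return a sloppy phrase query string for the standard handler."""
    phrase = " ".join(terms)
    # Quoted phrase followed by ~slop, e.g. text:"batman movie"~100
    return f'{field}:"{phrase}"~{slop}'


def dismax_proximity_params(user_query, phrase_field, slop):
    """Return URL-encoded dismax parameters using pf and ps."""
    return urlencode({"q": user_query, "pf": phrase_field, "ps": slop})


print(standard_proximity_q("text", ["batman", "movie"], 100))
print(dismax_proximity_params("batman movie", "text", 100))
```

With the standard handler the slop is part of the query string itself, while with dismax it is a separate parameter, so the user's raw input can be passed through unchanged.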
+
+
- == How can I increase the score for certain documents ==
+ == How can I increase the score for specific documents ==
To increase the scores for certain documents that match a query, regardless of what that query may be, one can use index-time boosts.
- boosted (provided that the hit was on a field for which norms were not omitted). Index-time boosts can be specified per-field also. An Index-time boost on a value of a multiValued field applies to all values for that field.
+ Index-time boosts can also be specified per field, so only queries matching on that specific field will get the extra boost. An index-time boost on a value of a multiValued field applies to all values for that field.
+
== How are documents scored ==
Basic scoring factors:
@@ -35, +93 @@
* index-time boost - if a boost was specified for a document at index time, scores for searches that match that document will be boosted.
* query clause boost - a user may explicitly boost the contribution of one part of a query over another.
+ See the [http://lucene.apache.org/java/docs/scoring.html Lucene scoring documentation] for more info.
+
== Why does document X score higher than document Y ==
+ Add '''debugQuery=on''' to your request, and you will get (fairly dense) scoring information for each document returned. This can also be accessed via the admin query page, http://localhost:8983/solr/admin/form.jsp
- To debug scoring issues, Lucene "explain" functionality may be accessed by enabling query debugging.
- ...
+ == Why doesn't document id:foobar appear in the top 10 results for my relevancy ranked query ==
+ Since '''debugQuery=on''' only gives you scoring "explain" info for the documents
+ returned, the '''explainOther''' parameter can be used to specify other documents
+ you want explain info for.
+
+ {{{debugQuery=on&explainOther=id:foobar}}}
+
+ You can then compare the explain info of the top matching documents with the explain info for the id:foobar document to determine why the rankings are not what you expect.
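
A request that carries both debugging parameters could be assembled as in the Python sketch below. The helper name `explain_url` is hypothetical, and the base URL `http://localhost:8983/solr/select` is an assumption derived from the admin URL mentioned earlier on this page; substitute the select URL of your own installation.

```python
from urllib.parse import urlencode

# Assumed select URL for a local Solr instance; adjust for your setup.
SELECT_URL = "http://localhost:8983/solr/select"


def explain_url(query, other_doc_query, base_url=SELECT_URL):
    """Build a URL that requests explain info for the returned documents
    and for the extra documents matched by other_doc_query."""
    params = {
        "q": query,
        "debugQuery": "on",
        "explainOther": other_doc_query,
    }
    return base_url + "?" + urlencode(params)


print(explain_url("superman", "id:foobar"))
```

The colon in `id:foobar` is percent-encoded in the generated URL, which is why the printed value differs slightly from the raw parameter shown above.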
+
- == How can I change the score of a document based on the *value* of a field ==
+ == How can I change the score of a document based on the *value* of a field (say, "popularity") ==
Use a [http://lucene.apache.org/solr/api/org/apache/solr/search/function/FunctionQuery.html FunctionQuery] as a component of your query.
Solr can parse function queries in the following [http://lucene.apache.org/solr/api/org/apache/solr/search/QueryParsing.html#parseFunction(java.lang.String,%20org.apache.solr.schema.IndexSchema) syntax].
+
+ TODO: actual examples...
== How can I boost the score of newer documents ==
* Do an explicit sort by date (relevancy scores are ignored)