Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2008/04/01 00:55:15 UTC
RE: Highlight - get terms used by lucene
: Solr returns the max score and the score per document.
: This means that the best hit always is 100% which is not always what you
: want because the article itself could still be quite irrelevant...
Solr doesn't give you a percentage, and there's no reason to divide a
doc's score by maxScore to get a percentage -- any more than there would be
with the Oracle function as described. The Oracle docs don't say that you
can divide a score of 23 by a max score of 100 to determine it's a 23%
match, just that scores will always be less than 100 ... in fact the doc
you linked to specifically says you can't compare scores, so a score of 23
for one query doesn't mean the same thing as a score of 23 from another
query (which is also true for Lucene scores BTW, Lucene just doesn't
promise you any particular max score because there are so many more
interesting and complex query types in Lucene that make determining such
a max impossible).
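To make the point about percentages concrete, here is a minimal sketch in
plain Java (no Lucene or Solr on the classpath). The class name, the
percent() helper and the raw scores are all made up for illustration; the
only thing it shows is that dividing by the best score in a result set
makes the top hit "100%" no matter how weak the match actually was.

```java
class MaxScoreNormalization {

    // Naive "percentage": a doc's score divided by the best score
    // in the same result set. Hypothetical helper, not a Solr API.
    static float percent(float score, float maxScore) {
        return 100.0f * score / maxScore;
    }

    public static void main(String[] args) {
        // Invented raw scores for two unrelated queries.
        float[] strongQuery = {8.0f, 4.0f};
        float[] weakQuery = {0.5f, 0.25f};

        // Both top hits come out as 100.0, although the raw scores
        // (8.0 vs 0.5) are very different and not comparable anyway.
        System.out.println(percent(strongQuery[0], strongQuery[0]));
        System.out.println(percent(weakQuery[0], weakQuery[0]));
    }
}
```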
My main point was: rather than letting Solr score the results one way, and
then trying to come up with your own variation on that score externally
(which is error prone given that your scoring variation might result in a
different ordering and change which results appear per "page"), let
Solr compute the score for you.
If you aren't happy with the way Solr computes the score, and you want a
simpler score calculation like what Oracle provides (that will only work
for simple Term queries), write a custom Similarity instance that does what
you want...
http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/Similarity.html
http://wiki.apache.org/solr/SolrPlugins
Off the cuff I think you'd get what Oracle describes by:
- omitting norms on all fields in your schema.xml
- making Similarity.queryNorm(float) always return 3
- making Similarity.tf(float) always return its input
- not using query boosts
...all bets are off though if you use multi term queries (or phrase
queries, or fuzzy queries, etc.) but you can play with the other methods
in Similarity if you have a particular idea how you'd like those scored if
you do use them.
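As a rough illustration of what those settings do to a single term query,
here is a sketch in plain Java. It does not extend Lucene's Similarity; it
only mimics the arithmetic so it can run on its own, and in Solr you would
instead override the same-named methods in a Similarity subclass. The
names tf() and queryNorm() follow the methods mentioned above. The idf()
formula and the way the factors are multiplied together are my
recollection of how the default similarity and term scoring worked in
Lucene of that era, so treat both as assumptions and check them against
the javadocs linked above.

```java
class SimplifiedTermScore {

    // Stand-in for Similarity.tf(float): return the raw term frequency.
    static float tf(float freq) {
        return freq;
    }

    // Stand-in for Similarity.queryNorm(float): ignore the input,
    // always return the constant 3.
    static float queryNorm(float sumOfSquaredWeights) {
        return 3.0f;
    }

    // Assumed default idf: log(numDocs / (docFreq + 1)) + 1.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Assumed score of one term query against one document:
    // tf * (idf * boost * queryNorm) * idf * norm.
    static float score(float freq, int docFreq, int numDocs) {
        float idf = idf(docFreq, numDocs);
        float boost = 1.0f; // no query boosts
        float norm = 1.0f;  // norms omitted, so no length normalization
        float queryWeight = idf * boost;
        float qNorm = queryNorm(queryWeight * queryWeight);
        return tf(freq) * queryWeight * qNorm * idf * norm;
    }

    public static void main(String[] args) {
        // Term occurs twice in the doc; 9 of 10 docs contain it,
        // so the assumed idf is exactly 1 and the score is 3 * 2.
        System.out.println(score(2.0f, 9, 10)); // prints 6.0
    }
}
```

Note that under these assumptions idf is applied twice (once in the query
weight, once in the per-document value), so for rarer terms the result
will not line up exactly with a formula that uses idf only once.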
-Hoss