Posted to java-user@lucene.apache.org by du...@web.de on 2006/11/10 12:48:53 UTC

result explanations / how to get the current document id inside a similarity subclass

Hello folks,

We want to work with explanations of document scores inside result lists.
In this context we are interested in the scores of the individual query
terms, for each document in the result list:

Query:
"termA termB"

Result:
doc1  => overall score 2.3
doc2  => overall score 1.6

We would love to have explanations like this:
overall score doc1 (2.3) = score termA (1.2) + score termB (1.1)
overall score doc2 (1.6) = score termA (1.1) + score termB (0.5)
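
A minimal Java sketch of that target format, for illustration only.
ScoreBreakdown is a hypothetical helper, not a Lucene class, and it assumes
the overall score is a plain sum of the per-term scores, as in the example
above (a real scorer may apply further factors on top of the sum).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of Lucene): per-term scores of one document. */
final class ScoreBreakdown {
    private final String docName;
    // LinkedHashMap keeps the terms in the order they were added.
    private final Map<String, Double> termScores = new LinkedHashMap<>();

    ScoreBreakdown(String docName) {
        this.docName = docName;
    }

    /** Adds (or accumulates) the score contributed by one query term. */
    ScoreBreakdown add(String term, double score) {
        termScores.merge(term, score, Double::sum);
        return this;
    }

    /** Assumption: the overall score is the plain sum of the term scores. */
    double overall() {
        double sum = 0.0;
        for (double s : termScores.values()) {
            sum += s;
        }
        return sum;
    }

    /** Renders one line in the "overall score doc (x) = score term (y) + ..." format. */
    String format() {
        StringBuilder sb = new StringBuilder(
                String.format(Locale.ROOT, "overall score %s (%.1f) =", docName, overall()));
        String sep = " ";
        for (Map.Entry<String, Double> e : termScores.entrySet()) {
            sb.append(sep).append(
                    String.format(Locale.ROOT, "score %s (%.1f)", e.getKey(), e.getValue()));
            sep = " + ";
        }
        return sb.toString();
    }
}
```

For doc1, new ScoreBreakdown("doc1").add("termA", 1.2).add("termB", 1.1).format()
produces the first line of the example above.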

In the past we worked with the Searcher.explain(..) method, which is fine for
explaining the results for a small number of documents. But since this method
takes as much time as a whole search (as stated in the API docs), it is of
course not feasible for whole result lists.

Nevertheless, all values should be available during the calculation of the overall
score, which is done inside the Similarity class. Thus, collecting them should
add nearly no runtime overhead; it's mainly a question of memory.

We have looked inside Similarity, and everything is available except the current
document id. So we have the term score values, but we don't know which documents
they belong to. And this is our question:
Does anybody know how to get the current document number/id inside a subclass
implementation of Similarity?


Thanks in advance!

Chris




--
______________________________________________________________________

Christian Reuschling, Dipl.-Ing.(BA)
Software Engineer

Knowledge Management Department
German Research Center for Artificial Intelligence DFKI GmbH
Erwin-Schrödinger-Straße 57, D-67663 Kaiserslautern, Germany

Phone: +49.631.205-3441
mailto:reuschling@dfki.de  http://www.dfki.uni-kl.de/~reuschling/
______________________________________________________________________

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: result explanations / how to get the current document id inside a similarity subclass

Posted by Chris Hostetter <ho...@fucit.org>.

: Nevertheless, all values should be available during the calculation of the overall
: score, which is done inside the Similarity class. Thus, collecting them should
: add nearly no runtime overhead; it's mainly a question of memory.

Similarity instances don't calculate any scores -- Similarity provides the
utilities necessary for the various Scorer classes to compute their
scores using common functions.  The "overall score" of a query depends
on the query type ... in your examples it seems that you are dealing with
BooleanQueries containing TermQueries, but I could write a new Query class
that never even used the Similarity class if I wanted to.

Your best bet for getting the information you want would be to
subclass BooleanQuery/BooleanScorer2 and add your logic there.
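
To illustrate the general idea (this is not actual Lucene code), here is a
rough Java sketch of a combining scorer that remembers what each clause
contributed per document. ClauseScorer and RecordingSumScorer are
hypothetical stand-ins, not Lucene types, and the plain sum is an assumption;
a real BooleanScorer2 subclass would have to hook into Lucene's own Scorer
API instead.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not Lucene API): sums clause scores and records each part. */
final class RecordingSumScorer {

    /** Hypothetical stand-in for the scorer of one query clause, e.g. one term. */
    interface ClauseScorer {
        String label();        // e.g. the term text
        float score(int doc);  // 0 if the clause does not match the document
    }

    private final List<ClauseScorer> clauses;
    // doc id -> (clause label -> contribution); grows with every scored document.
    private final Map<Integer, Map<String, Float>> recorded = new LinkedHashMap<>();

    RecordingSumScorer(List<ClauseScorer> clauses) {
        this.clauses = clauses;
    }

    /** Scores one document; the doc id is known here, so the parts can be stored. */
    float score(int doc) {
        Map<String, Float> parts = new LinkedHashMap<>();
        float sum = 0f;
        for (ClauseScorer clause : clauses) {
            float s = clause.score(doc);
            if (s != 0f) {
                parts.put(clause.label(), s);
                sum += s;
            }
        }
        recorded.put(doc, parts);
        return sum;
    }

    /** Returns the recorded contributions for a document, or an empty map. */
    Map<String, Float> partsFor(int doc) {
        return recorded.getOrDefault(doc, Collections.emptyMap());
    }
}
```

The point of doing it at this level is that the combining scorer knows both
the current document and the individual clause scores. The recorded map holds
an entry for every document scored, which is the memory cost mentioned in the
question.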




-Hoss

