Posted to java-user@lucene.apache.org by Patrick Diviacco <pa...@gmail.com> on 2011/03/26 15:57:49 UTC

get the cosine similarity measure as output results ?

I'm running several queries and getting a score for each document, but I
have been told that these scores are not comparable across queries.

For example, if a document gets a score of 8.234234 for query A, I cannot
compare that score with a document score of 3.342432 for query B.

However, I need a score that is comparable across queries, specifically the
cosine similarity, as the similarity measure between my query document and
the documents in the collection.
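
For reference, the standard textbook definition of cosine similarity between a query vector q and a document vector d is given below. The symbols q_i and d_i (the weight of term i in each vector) are notation assumed here, not something Lucene reports directly. With non-negative term weights the value lies between 0 and 1 regardless of which query is used, which is what would make it comparable across queries.

```latex
\cos(q, d) = \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert}
           = \frac{\sum_i q_i d_i}{\sqrt{\sum_i q_i^2}\,\sqrt{\sum_i d_i^2}}
```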

Could you give me some tips on this?
Thanks

Re: get the cosine similarity measure as output results ?

Posted by Patrick Diviacco <pa...@gmail.com>.
Update:

I actually don't understand why, if the scores are essentially the cosine
similarity between the query and the docs, they are not comparable across
queries.

Doesn't cosine similarity describe the divergence between vectors? If I
have vectors A and B (my queries) and vector C (a doc), can't I tell which
of A and B is more similar to C by looking at the scores? In other words,
can't I compare the scores?
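
To make the A/B/C comparison concrete, here is a minimal sketch of plain cosine similarity over raw term-frequency vectors. It is not Lucene's scoring formula and uses no Lucene API: the class name, method names, sample texts, and the naive whitespace tokenization are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: plain cosine similarity
// over raw term-frequency vectors.
final class CosineSimilaritySketch {

    // Builds a term-frequency vector from text (assumption: lowercase,
    // whitespace-separated tokens, no stemming or stop-word removal).
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sumOfSquares = 0.0;
        for (int count : v.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Dot product divided by the product of the two vector lengths.
    // Returns 0.0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    public static void main(String[] args) {
        // A and B play the role of the two queries, C the document.
        Map<String, Integer> a = termFrequencies("apache lucene search");
        Map<String, Integer> b = termFrequencies("cooking pasta recipes");
        Map<String, Integer> c = termFrequencies("lucene search engine library");

        System.out.println("cos(A, C) = " + cosine(a, c));
        System.out.println("cos(B, C) = " + cosine(b, c));
    }
}
```

Because both vectors are length-normalized, each value falls in the range 0 to 1 for non-negative weights, so the two results can be compared directly. Whether the score a search library returns has this property depends on how that library normalizes its scores.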

I'm asking because I've been told that I cannot compare the score from one
query with the score from another.

Furthermore, my queries are themselves docs from the collection: I compare
one doc from the collection against all the other docs.

I'd appreciate some more information about this.
Thanks


