Posted to java-user@lucene.apache.org by Eugene <ec...@gmail.com> on 2006/03/02 18:56:37 UTC

Help interpreting explanation

Hi All,

I'm not sure how to interpret the result of the toString method of 
Explanation.  I'm trying to see the values of each component of the 
Default Similarity formula for a particular query and a doc.  Given 
below is a sample of my Explanation output. Many thanks if anyone could 
help explain some of the values or direct me to a place that does so.

Explanation = 0.683103 = product of:
   1.7077575 = sum of:
     0.184242 = weight(Contents:x in 78), product of:
       0.13565542 = queryWeight(Contents:x), product of:
         2.509232 = idf(docFreq=85)
         0.054062527 = queryNorm
       1.3581617 = fieldWeight(Contents:x in 78), product of:
         1.7320508 = tf(termFreq(Contents:x)=3)
         2.509232 = idf(docFreq=85)
         0.3125 = fieldNorm(field=Contents, doc=78)
     0.184242 = weight(Contents:x in 78), product of:
       0.13565542 = queryWeight(Contents:x), product of:
         2.509232 = idf(docFreq=85)
         0.054062527 = queryNorm
       1.3581617 = fieldWeight(Contents:x in 78), product of:
         1.7320508 = tf(termFreq(Contents:x)=3)
         2.509232 = idf(docFreq=85)
         0.3125 = fieldNorm(field=Contents, doc=78)
     0.26218253 = weight(Contents:y in 78), product of:
       0.16182467 = queryWeight(Contents:y), product of:
         2.9932873 = idf(docFreq=52)
         0.054062527 = queryNorm
       1.6201642 = fieldWeight(Contents:y in 78), product of:
         1.7320508 = tf(termFreq(Contents:y)=3)
         2.9932873 = idf(docFreq=52)
         0.3125 = fieldNorm(field=Contents, doc=78)

--
Eugene

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Help interpreting explanation

Posted by Chris Hostetter <ho...@fucit.org>.

: on using Lucene but info for the internal workings of Lucene is hard to
: come by.

As with many open-source code bases: the code is the documentation.

: 1) I'm using the default QueryParser to parse and return a query so it's
: a Boolean-OR query. So does this mean it uses the DisjunctionSumScorer
: or something?

I honestly don't understand all the ways the different Scorers are used
for BooleanQueries.  The thing to keep in mind is that they are all
optimizations that get chosen based on whether some/all clauses are
required, whether any clauses are prohibited, etc...  If you understand
what the basic BooleanScorer does, then you understand what all of the
other Scorers do -- they just go about it in slightly different ways.

: 2) Just wondering looking at the API for BooleanQuery i saw this: "Using
: setMinimumNumberShouldMatch will force the use of BooleanWeight2,
: regardless of wether setUseScorer14(true) has been called."
: What is the method setUseScorer14 about?

Hmmm... I guess it's not really documented, is it?  setUseScorer14(true) is
just a way to force the old Lucene 1.4.x style Boolean scoring.  I don't
really know why that code was left in, or why you might want to use it.



-Hoss



Re: Help interpreting explanation

Posted by Eugene <ec...@gmail.com>.

Thanks, Chris, for your clear explanations. It seems there is a lot of info
on using Lucene but info for the internal workings of Lucene is hard to
come by.

I got some more questions which I'll ask in-line.

Chris Hostetter wrote:
> : Since i'm using a boolean OR query i figured it must be related to the
> : BooleanScorer (though there's a more complicated BooleanScorer2 which
> : I'm not sure when it's used).
> 
> There's actually three possible scorers used: ConjunctionScorer can be
> used if all of the clauses are required.  Most of the behavior is driven
> based on whether or not BooleanQuery.setUseScorer14(true) -- by default it
> is false, which means BooleanScorer2 is used.

1) I'm using the default QueryParser to parse and return a query so it's 
a Boolean-OR query. So does this mean it uses the DisjunctionSumScorer 
or something?

2) Just wondering looking at the API for BooleanQuery i saw this: "Using 
setMinimumNumberShouldMatch will force the use of BooleanWeight2, 
regardless of wether setUseScorer14(true) has been called."
What is the method setUseScorer14 about?

--
Eugene


Re: Help interpreting explanation

Posted by Chris Hostetter <ho...@fucit.org>.

: Since i'm using a boolean OR query i figured it must be related to the
: BooleanScorer (though there's a more complicated BooleanScorer2 which
: I'm not sure when it's used).

There are actually three possible scorers used: ConjunctionScorer can be
used if all of the clauses are required.  Most of the behavior is driven
by whether or not BooleanQuery.setUseScorer14(true) has been called -- by
default it is false, which means BooleanScorer2 is used.

: But, I would appreciate if someone could point me to the method where
: the searcher iterates over all query terms and outputs the score. I grep
:   both the Searcher classes and the BooleanScorer classes but can't find it.

The searcher doesn't really iterate over query terms; it knows about one
and only one Weight.  It asks that Weight instance to give it a Scorer
for the current index, and then it asks that Scorer to iterate over the
documents and tell it which ones match (using the score(HitCollector)
method).  Internally, the Scorer iterates over the matching documents
using the next(), doc() and score() methods.

When you are searching for a single Term, the Scorer involved is
TermScorer; when you are searching for many terms, the Scorer involved is
(usually) a BooleanScorer2.   BooleanScorers are complicated because they
are juggling a lot of things at once, such as keeping track of which of the
Scorers they contain has the lowest "next" doc, but if you look at
TermWeight and TermScorer you'll get a pretty good idea of how the various
similarity methods are used.
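
The iteration described above can be sketched in plain Java.  Everything in
this sketch (MiniScorer, ArrayScorer, OrScorer, ScorerLoopDemo) is a
hypothetical stand-in, not Lucene's Scorer, TermScorer or BooleanScorer2.  It
only illustrates the next()/doc()/score() loop and the "lowest next doc"
bookkeeping, and it assumes an OR scorer simply sums the scores of its
matching sub-scorers (no coord factor).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not Lucene classes.
interface MiniScorer {
    boolean next();   // advance to the next matching doc; false when exhausted
    int doc();        // id of the current doc
    float score();    // score of the current doc
}

// Plays the role of a single-term scorer: walks a sorted array of doc ids.
final class ArrayScorer implements MiniScorer {
    private final int[] docs;
    private final float[] scores;
    private int pos = -1;

    ArrayScorer(int[] docs, float[] scores) {
        this.docs = docs;
        this.scores = scores;
    }

    public boolean next() { return ++pos < docs.length; }
    public int doc() { return docs[pos]; }
    public float score() { return scores[pos]; }
}

// Plays the role of an OR scorer: always moves to the lowest "next" doc among
// its sub-scorers and sums the scores of every sub-scorer positioned on it.
final class OrScorer implements MiniScorer {
    private final List<MiniScorer> live = new ArrayList<>();
    private int currentDoc = -1;
    private float currentScore;

    OrScorer(List<MiniScorer> subs) {
        // Position each sub-scorer on its first doc; drop empty ones.
        for (MiniScorer s : subs) {
            if (s.next()) live.add(s);
        }
    }

    public boolean next() {
        // Advance every sub-scorer that sat on the doc we just reported.
        for (Iterator<MiniScorer> it = live.iterator(); it.hasNext();) {
            MiniScorer s = it.next();
            if (s.doc() == currentDoc && !s.next()) it.remove();
        }
        if (live.isEmpty()) return false;

        // The next match is the lowest doc id any sub-scorer is positioned on.
        int min = Integer.MAX_VALUE;
        for (MiniScorer s : live) min = Math.min(min, s.doc());

        currentDoc = min;
        currentScore = 0.0f;
        for (MiniScorer s : live) {
            if (s.doc() == min) currentScore += s.score();
        }
        return true;
    }

    public int doc() { return currentDoc; }
    public float score() { return currentScore; }
}

final class ScorerLoopDemo {
    // The caller's side of the loop: ask for the next doc until exhausted.
    static List<String> collect(MiniScorer scorer) {
        List<String> hits = new ArrayList<>();
        while (scorer.next()) {
            hits.add(scorer.doc() + ":" + scorer.score());
        }
        return hits;
    }

    public static void main(String[] args) {
        List<MiniScorer> subs = new ArrayList<>();
        subs.add(new ArrayScorer(new int[] {1, 3, 5}, new float[] {0.5f, 0.5f, 0.5f}));
        subs.add(new ArrayScorer(new int[] {3, 4}, new float[] {0.25f, 0.25f}));
        System.out.println(collect(new OrScorer(subs)));
    }
}
```

Doc 3 is the only document matched by both sub-scorers in this toy data, so
it is the only one whose score is a sum of two parts.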

: Also, I would like to know whether will the sloppyFreq "kick in" if I'm
: just using a Boolean OR query or is this only for phrase queries? And
: how do I disable this so that it'll always be 1.0 without overwriting
: the method?

As it says in the javadocs, "amount of a sloppy phrase match..." ... so only
for phrase queries (or phrase-like queries, i.e. SpanNear)

In general, the only way to change the implementation of a Similarity
method used for all queries is to write your own Similarity class, and use
the Similarity.setDefault or Searcher.setSimilarity methods to "register"
it.

If you really only want one type of query (or one instance of a query object)
to get a different similarity, you can override the getSimilarity() method
in the Query class in question, and use the SimilarityDelegator to wrap
the default, and only change the methods you want to change.
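
The wrap-and-override idea can be sketched without Lucene on the classpath.
The types below (Scoring, DefaultScoring, ScoringDelegator, FlatSloppyFreq)
are hypothetical stand-ins and not Lucene's Similarity, DefaultSimilarity or
SimilarityDelegator; the formulas in DefaultScoring are assumptions chosen
only to give the example something to delegate to.

```java
// Hypothetical stand-ins for illustration only; these are not Lucene classes,
// and Lucene's real Similarity has more methods than the two shown here.
interface Scoring {
    float tf(float freq);
    float sloppyFreq(int distance);
}

// Assumed default behaviour for the sketch: sqrt tf, 1/(distance+1) sloppyFreq.
final class DefaultScoring implements Scoring {
    public float tf(float freq) { return (float) Math.sqrt(freq); }
    public float sloppyFreq(int distance) { return 1.0f / (distance + 1); }
}

// Forwards every call to the wrapped instance, so a subclass only has to
// override the methods it actually wants to change.
class ScoringDelegator implements Scoring {
    private final Scoring delegate;

    ScoringDelegator(Scoring delegate) { this.delegate = delegate; }

    public float tf(float freq) { return delegate.tf(freq); }
    public float sloppyFreq(int distance) { return delegate.sloppyFreq(distance); }
}

// Changes exactly one method: sloppyFreq is always 1.0, everything else is
// whatever the wrapped default does.
final class FlatSloppyFreq extends ScoringDelegator {
    FlatSloppyFreq(Scoring delegate) { super(delegate); }

    @Override
    public float sloppyFreq(int distance) { return 1.0f; }

    public static void main(String[] args) {
        Scoring custom = new FlatSloppyFreq(new DefaultScoring());
        System.out.println("sloppyFreq(3) = " + custom.sloppyFreq(3));
        System.out.println("tf(4)         = " + custom.tf(4));
    }
}
```

In Lucene itself the equivalent object would then be registered with
Similarity.setDefault or Searcher.setSimilarity, or returned from the
overridden getSimilarity() of the Query in question, as described above.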




-Hoss



Re: Help interpreting explanation

Posted by Eugene <ec...@gmail.com>.

Hi,

Since i'm using a boolean OR query i figured it must be related to the 
BooleanScorer (though there's a more complicated BooleanScorer2 which 
I'm not sure when it's used).

Looking at the BooleanScorer code it's probably a little over my head as 
I'm still a beginner to Lucene.

But, I would appreciate it if someone could point me to the method where
the searcher iterates over all query terms and outputs the score. I grepped
both the Searcher classes and the BooleanScorer classes but can't find it.

Also, I would like to know whether sloppyFreq will "kick in" if I'm
just using a Boolean OR query or is this only for phrase queries? And
how do I disable this so that it'll always be 1.0 without overriding
the method?

Thanks for all the help so far.


Re: Help interpreting explanation

Posted by Chris Hostetter <ho...@fucit.org>.

: cosine similarity and need some help.  Can anyone tell me in which file
: are the methods of the DefaultSimilarity methods called?

Most of the Similarity methods are called by the various Scorers.   A good
IDE will tell you where they are called (or you could just grep the
source, that's what I do)

: For example, looking at the tf method i see that it takes in a float for
: freq instead of int. So i'm curious to see how this method is invoked.

I commented on this recently (and no one contested my explanation)...

http://www.nabble.com/Similarity-Usage%3A-tf%28int%29-vs-tf%28float%29-p2981283.html


-Hoss



Re: Help interpreting explanation

Posted by Eugene <ec...@gmail.com>.

Thanks for posting the "more like this" code.  I just began coding my
cosine similarity and need some help.  Can anyone tell me in which file
the DefaultSimilarity methods are called?

For example, looking at the tf method i see that it takes in a float for 
freq instead of int. So i'm curious to see how this method is invoked.

Thanks.

--
Eugene

Re: Help interpreting explanation

Posted by Eric Jain <Er...@isb-sib.ch>.

Eugene wrote:
> Any good links on extending the similarity class? A lot of posts
> discuss David Spencer's "More Like This" but I can't find this anywhere.

The "More Like This" code can be found here:

http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/similarity/


Re: Help interpreting explanation

Posted by Eugene <ec...@gmail.com>.

I was wondering if anyone has any idea how I can start to implement my
own similarity. I want to use the cosine similarity measure instead. I was
looking through past forum posts and saw that quite a few people
have also discussed this, but no real method of doing it was mentioned.

Any good links on extending the similarity class? A lot of posts
discuss David Spencer's "More Like This" but I can't find this anywhere.

Thanks.


Re: Help interpreting explanation

Posted by Chris Hostetter <ho...@fucit.org>.

: I was looking at the new 1.9 api and can't seem to find this expert mode
: of searching.

Yonik's referring to all of the methods in the Searcher class that have
"Expert" in their (javadoc) description.

: http://lucene.apache.org/java/docs/api/org/apache/lucene/search/IndexSearcher.html#search(org.apache.lucene.search.Weight,%20org.apache.lucene.search.Filter,%20org.apache.lucene.search.HitCollector)

...that method isn't labeled "expert" but it also uses raw scores
(HitCollectors have always received the raw scores)


-Hoss



Re: Help interpreting explanation

Posted by Eugene <ec...@gmail.com>.

I was looking at the new 1.9 api and can't seem to find this expert mode 
of searching.
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/IndexSearcher.html#search(org.apache.lucene.search.Weight,%20org.apache.lucene.search.Filter,%20org.apache.lucene.search.HitCollector)

Can you tell me where I can find it?

thanks.

--
Eugene

Re: Help interpreting explanation

Posted by Yonik Seeley <ys...@gmail.com>.

On 3/3/06, Eugene <ec...@gmail.com> wrote:
> Just one more question: Any way in which i can disable this normalization?

We disabled this normalization in Lucene 1.9 for the "expert"
level search methods on IndexSearcher.  Use the search methods that
don't return Hits.

-Yonik


Re: Help interpreting explanation

Posted by Eugene <ec...@gmail.com>.

OK, I figured out the normalization; it was actually covered in an earlier
post here:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200601.mbox/%3C88c6a6720601270001k760740b4h3606b5e7fdb904c3@mail.gmail.com%3E

Just one more question: Any way in which i can disable this normalization?

Thanks for all the help so far.

--
Eugene


Re: Help interpreting explanation

Posted by Eugene <ec...@gmail.com>.

Hi,

You mentioned:
"The Hits class normalizes scores by dividing all scores by the highest
score, if that highest score is above 1.0."

Can you explain which highest score we are talking about? I think there's
only one score for a query and doc, right?

Thanks

Re: Help interpreting explanation

Posted by Yonik Seeley <ys...@gmail.com>.

On 3/3/06, Eugene <ec...@gmail.com> wrote:
> Hi Yonik,
>
> Thanks a lot, I think i understand how explanation works better now.
>
> But, there's something weird I noticed. I've a query like:
> "problem formulation each possible x probability p x y find x p x y
> maximized how compute p x y"
>
> The weird thing is that literals like "problem", "formulation" and other
> words don't show up in explanation only "p" "x" and "y" do show up. And
> I get returned a hit score of 1.0 when the explanation output is 1.3260187:
>
> Explanation = 1.3260187 = product of:
>    2.410943 = sum of:
> .....
>
> So, basically 2 simple questions:
>
> 1) How do I make all the literals in my query show up in explanation?

Only the literals that match that particular document will show up in
the explain for that document.  So the explain that you showed before
either belonged to a document that only matched "x" and "y" from all
the terms in your query, or you have an analyzer problem that is
causing more terms not to match (try using the same analyzer to query
that you used to index the document)

> 2) How does Lucene convert an Explanation score of 1.3260187 to 1.0?

The Hits class normalizes scores by dividing all scores by the highest
score, if that highest score is above 1.0.
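
The rule just stated can be written out as a small Java sketch.  This is not
the Hits source; the class and method names are made up for illustration, and
it assumes "highest score" means the largest raw score among all hits
returned for the query.

```java
// Hypothetical illustration of the normalization rule; not Lucene code.
final class HitsNormalizationSketch {

    // Divides every raw score by the highest one, but only when that highest
    // score is above 1.0; otherwise the raw scores are returned unchanged.
    static float[] normalize(float[] rawScores) {
        float max = 0.0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        float scale = (max > 1.0f) ? 1.0f / max : 1.0f;

        float[] normalized = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            normalized[i] = rawScores[i] * scale;
        }
        return normalized;
    }

    public static void main(String[] args) {
        // The top raw score from the question, plus a made-up second hit.
        float[] out = normalize(new float[] {1.3260187f, 0.5f});
        System.out.println(out[0] + " " + out[1]);
    }
}
```

Under that assumption, a top raw score of 1.3260187 comes back as roughly
1.0, and every other hit for the same query is scaled down by the same
factor.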

-Yonik


Re: Help interpreting explanation

Posted by Eugene <ec...@gmail.com>.

Hi Yonik,

Thanks a lot, I think i understand how explanation works better now.

But, there's something weird I noticed. I've a query like:
"problem formulation each possible x probability p x y find x p x y 
maximized how compute p x y"

The weird thing is that literals like "problem", "formulation" and other
words don't show up in the explanation; only "p", "x" and "y" do. And
I get back a hit score of 1.0 when the explanation output is 1.3260187:

Explanation = 1.3260187 = product of:
   2.410943 = sum of:
.....

So, basically 2 simple questions:

1) How do I make all the literals in my query show up in explanation?

2) How does Lucene convert an Explanation score of 1.3260187 to 1.0?

Thanks.

--
Eugene


Re: Help interpreting explanation

Posted by Yonik Seeley <ys...@gmail.com>.

On 3/2/06, Eugene Ezekiel <ec...@gmail.com> wrote:
> Thanks Yonik for the reply. I got just a couple more questions,
>
> 1) Why does the explanation print so many times?

Because it was a compound query with multiple parts to it.  It's one explanation
with multiple parts.

From the explain output, I would guess the original query was something like
x x y or Contents:x Contents:x Contents:y

> 2) Since my query is made up of multiple terms how do I know what term "x"
> is referring to?

It's actually a literal "x".

For example, in my index, if I search for
solr search lucene when the default field is text, then I get the
following explain:

1.1132671 = sum of:
  0.27831677 = weight(text:solr in 84), product of:
    0.57735026 = queryWeight(text:solr), product of:
      3.85647 = idf(docFreq=4)
      0.14970951 = queryNorm
    0.48205876 = fieldWeight(text:solr in 84), product of:
      1.0 = tf(termFreq(text:solr)=1)
      3.85647 = idf(docFreq=4)
      0.125 = fieldNorm(field=text, doc=84)
  0.55663353 = weight(text:search in 84), product of:
    0.57735026 = queryWeight(text:search), product of:
      3.85647 = idf(docFreq=4)
      0.14970951 = queryNorm
    0.9641175 = fieldWeight(text:search in 84), product of:
      2.0 = tf(termFreq(text:search)=4)
      3.85647 = idf(docFreq=4)
      0.125 = fieldNorm(field=text, doc=84)
  0.27831677 = weight(text:lucen in 84), product of:
    0.57735026 = queryWeight(text:lucen), product of:
      3.85647 = idf(docFreq=4)
      0.14970951 = queryNorm
    0.48205876 = fieldWeight(text:lucen in 84), product of:
      1.0 = tf(termFreq(text:lucen)=1)
      3.85647 = idf(docFreq=4)
      0.125 = fieldNorm(field=text, doc=84)
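
The queryNorm in this output can be reproduced by hand.  The Java sketch
below is not Lucene code and its names are made up; it assumes no boosts, so
that each term's raw query weight is just its idf, and it uses
queryNorm = 1/sqrt(sum of squared raw weights).

```java
// Hypothetical illustration; not Lucene code.
final class QueryNormSketch {

    // queryNorm = 1 / sqrt(sum of the squared raw query weights).
    static float queryNorm(float[] rawWeights) {
        double sumOfSquares = 0.0;
        for (float w : rawWeights) {
            sumOfSquares += (double) w * w;
        }
        return (float) (1.0 / Math.sqrt(sumOfSquares));
    }

    public static void main(String[] args) {
        // Three terms (solr, search, lucen), each with idf 3.85647 and,
        // by assumption, no boost.
        float idf = 3.85647f;
        float norm = queryNorm(new float[] {idf, idf, idf});
        System.out.println("queryNorm   = " + norm);
        System.out.println("queryWeight = " + (idf * norm));
    }
}
```

With three equal idf values the result should come out close to the
0.14970951 shown above, and idf * queryNorm close to 0.57735026, which is
1/sqrt(3).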

-Yonik


Re: Help interpreting explanation

Posted by Eugene Ezekiel <ec...@gmail.com>.

Thanks, Yonik, for the reply. I've got just a couple more questions:

1) Why does the explanation print so many times?

2) Since my query is made up of multiple terms how do I know what term "x"
is referring to?





--
Regards,
Eugene

Re: Help interpreting explanation

Posted by Yonik Seeley <ys...@gmail.com>.

I think Lucene in Action does a good job of it.
There is also a formula given in the javadoc for DefaultSimilarity
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html

See my comments below (inline)

On 3/2/06, Eugene <ec...@gmail.com> wrote:
> Hi All,
>
> I'm not sure how to interpret the result of the toString method of
> Explanation.  I'm trying to see the values of each component of the
> Default Similarity formula for a particular query and a doc.  Given
> below is a sample of my Explanation output. Many thanks if anyone could
> help explain some of the values or direct me to a place that does so.
>
> Explanation = 0.683103 = product of:
>    1.7077575 = sum of:
>      0.184242 = weight(Contents:x in 78), product of:
>        0.13565542 = queryWeight(Contents:x), product of:
the queryWeight is query-specific... it will have the same value
for all documents matching the query.
>          2.509232 = idf(docFreq=85)
inverse document frequency... term "x" appears in 85 documents.
>          0.054062527 = queryNorm
queryNorm is a normalization factor... 1/sqrt(sum of all query weights squared)

If you had a boost, it would also be multiplied into the queryWeight
at this point.
>        1.3581617 = fieldWeight(Contents:x in 78), product of:
fieldWeight components are document specific.
>          1.7320508 = tf(termFreq(Contents:x)=3)
"x" appears 3 times in the field for this document
>          2.509232 = idf(docFreq=85)
same as the previous idf factor - 85 documents contain "x"
>          0.3125 = fieldNorm(field=Contents, doc=78)
the norm is calculated at index time... it's the length normalization
factor (1/sqrt(num tokens in this field)) multiplied by any boost on the
field or document.

>      0.184242 = weight(Contents:x in 78), product of:
>        0.13565542 = queryWeight(Contents:x), product of:
>          2.509232 = idf(docFreq=85)
>          0.054062527 = queryNorm
>        1.3581617 = fieldWeight(Contents:x in 78), product of:
>          1.7320508 = tf(termFreq(Contents:x)=3)
>          2.509232 = idf(docFreq=85)
>          0.3125 = fieldNorm(field=Contents, doc=78)
>      0.26218253 = weight(Contents:y in 78), product of:
>        0.16182467 = queryWeight(Contents:y), product of:
>          2.9932873 = idf(docFreq=52)
>          0.054062527 = queryNorm
>        1.6201642 = fieldWeight(Contents:y in 78), product of:
>          1.7320508 = tf(termFreq(Contents:y)=3)
>          2.9932873 = idf(docFreq=52)
>          0.3125 = fieldNorm(field=Contents, doc=78)
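
The annotated numbers above can be checked by hand.  The following Java
sketch is not Lucene code; the class and method names are made up for
illustration, and it assumes tf is the square root of the term frequency,
which is consistent with the 1.7320508 printed for termFreq=3.

```java
// Hypothetical helper, not part of Lucene: recomputes one term clause of the
// explanation from the values printed in it.
final class ExplainArithmetic {

    // Assumption: tf = sqrt(termFreq), matching tf(termFreq=3) = 1.7320508.
    static float tf(float termFreq) {
        return (float) Math.sqrt(termFreq);
    }

    // queryWeight = idf * queryNorm (a boost, if any, would multiply in here).
    static float queryWeight(float idf, float queryNorm) {
        return idf * queryNorm;
    }

    // fieldWeight = tf * idf * fieldNorm.
    static float fieldWeight(float termFreq, float idf, float fieldNorm) {
        return tf(termFreq) * idf * fieldNorm;
    }

    // weight of one term clause = queryWeight * fieldWeight.
    static float weight(float termFreq, float idf, float queryNorm, float fieldNorm) {
        return queryWeight(idf, queryNorm) * fieldWeight(termFreq, idf, fieldNorm);
    }

    public static void main(String[] args) {
        float queryNorm = 0.054062527f;  // copied from the explanation
        float fieldNorm = 0.3125f;       // copied from the explanation
        System.out.println("Contents:x weight = "
                + weight(3, 2.509232f, queryNorm, fieldNorm));
        System.out.println("Contents:y weight = "
                + weight(3, 2.9932873f, queryNorm, fieldNorm));
    }
}
```

The two printed values should land close to the 0.184242 and 0.26218253
shown in the explanation, allowing for float rounding.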


-Yonik
