You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Rishabh Bajpai <r_...@lycos.com> on 2003/01/23 11:05:29 UTC

Interpreting the score asociated with the Term? |

Hi All,

I am using Lucene as a Search Engine for my work. I am new to this, so forgive me if I am asking a cliched question!

I need to understand how the SCORE for the search TERMs is calculated for Lucene, so that indexing can be appropriately be designed to return the most relevant results, when searched. 

On the official FAQ page of the Lucene site, a formula is listed as 
score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) * coord_q_d
where:
  score_d   : score for document d
  sum_t     : sum for all terms t
  tf_q      : the square root of the frequency of t in the query
  tf_d      : the square root of the frequency of t in d
  idf_t     : log(numDocs/docFreq_t+1) + 1.0
  numDocs   : number of documents in index
  docFreq_t : number of documents containing t
  norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
  norm_d_t  : square root of number of tokens in d in the same field as t
  boost_t   : the user-specified boost for term t
  coord_q_d : number of terms in both query and document / number of terms in query

I didnot find the formula too helpful in figuring out what exactly the score is trying to calculate. 

I want to know of a logic that can be used for translating this score into something that can be used for determining which Terms are more relevant for a given Search Request. 

One way would be to just assume that - higher the score, more relveant is the search. But is this assumption really valid? Or are there any possible caveats to this?

-Rishabh



_____________________________________________________________
Get 25MB, POP3, Spam Filtering with LYCOS MAIL PLUS for $19.95/year.
http://login.mail.lycos.com/brandPage.shtml?pageId=plus&ref=lmtplus

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

Re: Interpreting the score asociated with the Term? |

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Maybe we'll do this in the future, but what I provided is really basic,
not Lucene specific info.  If somebody else describes it in detail I'll
gladly put it up on the site.

Otis

--- Terry Steichen <te...@net-frame.com> wrote:
> Otis,
> 
> I think the effort you made in your previous message (to describe the
> basic
> relevance measures in simple, non-algorithmic terms) is very
> important.  If
> you think that list is reasonably comprehensive (that is, it captures
> most
> of what relevance means), I'd urge you to insert this into the
> documentation.  I think it is very valuable.
> 
> Regards,
> 
> Terry
> 
> ----- Original Message -----
> From: "Otis Gospodnetic" <ot...@yahoo.com>
> To: "Lucene Users List" <lu...@jakarta.apache.org>
> Sent: Thursday, January 23, 2003 12:02 PM
> Subject: Re: Interpreting the score asociated with the Term? |
> 
> 
> > Yes, I believe so.
> >
> > --- Terry Steichen <te...@net-frame.com> wrote:
> > > Otis,
> > >
> > > Didn't somebody (Doug?) also mention that a keyword in a shorter
> > > document is
> > > deemed more significant than in a longer one (because, I guess,
> it
> > > represents a larger percentage of the document)?
> > >
> > > Regards,
> > >
> > > Terry
> > > ----- Original Message -----
> > > From: "Otis Gospodnetic" <ot...@yahoo.com>
> > > To: "Lucene Users List" <lu...@jakarta.apache.org>;
> > > <r_...@lycos.com>
> > > Sent: Thursday, January 23, 2003 10:58 AM
> > > Subject: Re: Interpreting the score asociated with the Term? |
> > >
> > >
> > > > Here is a simplified explanation of some basic stuff.
> > > >
> > > > 1. the more frequent the term (in a collection) the lower its
> > > weight
> > > > (significance).  Makes sense - very popular words don't
> distinguish
> > > one
> > > > document from the other much, because they are present in so
> many
> > > docs.
> > > >
> > > > 2. the more frequent a word in a single document, the higher
> the
> > > > documents 'value' when the query contains that word.  So the
> score
> > > goes
> > > > up for frequent words in a document, esp. if they are not
> frequent
> > > in
> > > > other documents in the collection.
> > > >
> > > > 3. there is a boost factor which allow you to boost certain
> terms
> > > at
> > > > query time (e.g. you value matches in title field more than the
> > > body
> > > > field?  boost title field queries)
> > > >
> > > > 4. normalization factor, I believe, normalizes things so that
> > > longer
> > > > documents don't have advantage over shorter ones.
> > > >
> > > > There is more to this....but I am already not 100% about all of
> the
> > > > above, so I'll stop here :)
> > > >
> > > > Also note that you can boost fields at index time (you'll have
> to
> > > use
> > > > the nightly build for that instead of the 1.2 release to get
> this,
> > > I
> > > > believe).
> > > >
> > > > Otis
> > > >
> > > >
> > > > --- Rishabh Bajpai <r_...@lycos.com> wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > I am using Lucene as a Search Engine for my work. I am new to
> > > this,
> > > > > so forgive me if I am asking a cliched question!
> > > > >
> > > > > I need to understand how the SCORE for the search TERMs is
> > > calculated
> > > > > for Lucene, so that indexing can be appropriately be designed
> to
> > > > > return the most relevant results, when searched.
> > > > >
> > > > > On the official FAQ page of the Lucene site, a formula is
> listed
> > > as
> > > > > score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t /
> norm_d_t *
> > > > > boost_t) * coord_q_d
> > > > > where:
> > > > >   score_d   : score for document d
> > > > >   sum_t     : sum for all terms t
> > > > >   tf_q      : the square root of the frequency of t in the
> query
> > > > >   tf_d      : the square root of the frequency of t in d
> > > > >   idf_t     : log(numDocs/docFreq_t+1) + 1.0
> > > > >   numDocs   : number of documents in index
> > > > >   docFreq_t : number of documents containing t
> > > > >   norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
> > > > >   norm_d_t  : square root of number of tokens in d in the
> same
> > > field
> > > > > as t
> > > > >   boost_t   : the user-specified boost for term t
> > > > >   coord_q_d : number of terms in both query and document /
> number
> > > of
> > > > > terms in query
> > > > >
> > > > > I didnot find the formula too helpful in figuring out what
> > > exactly
> > > > > the score is trying to calculate.
> > > > >
> > > > > I want to know of a logic that can be used for translating
> this
> > > score
> > > > > into something that can be used for determining which Terms
> are
> > > more
> > > > > relevant for a given Search Request.
> > > > >
> > > > > One way would be to just assume that - higher the score, more
> > > > > relveant is the search. But is this assumption really valid?
> Or
> > > are
> > > > > there any possible caveats to this?
> > > > >
> > > > > -Rishabh
> > > > >
> > > > >
> > > > >
> > > > > _____________________________________________________________
> > > > > Get 25MB, POP3, Spam Filtering with LYCOS MAIL PLUS for
> > > $19.95/year.
> > > > >
> > >
> http://login.mail.lycos.com/brandPage.shtml?pageId=plus&ref=lmtplus
> > > > >
> > > > > --
> > > > > To unsubscribe, e-mail:
> > > > > <ma...@jakarta.apache.org>
> > > > > For additional commands, e-mail:
> > > > > <ma...@jakarta.apache.org>
> > > > >
> > > >
> > > >
> > > > __________________________________________________
> > > > Do you Yahoo!?
> > > > Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> > > > http://mailplus.yahoo.com
> > > >
> > > > --
> > > > To unsubscribe, e-mail:
> > > <ma...@jakarta.apache.org>
> > > > For additional commands, e-mail:
> > > <ma...@jakarta.apache.org>
> > > >
> > > >
> > >
> > >
> > > --
> > > To unsubscribe, e-mail:
> > > <ma...@jakarta.apache.org>
> > > For additional commands, e-mail:
> > > <ma...@jakarta.apache.org>
> > >
> >
> >
> > __________________________________________________
> > Do you Yahoo!?
> > Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> > http://mailplus.yahoo.com
> >
> > --
> > To unsubscribe, e-mail:
> <ma...@jakarta.apache.org>
> > For additional commands, e-mail:
> <ma...@jakarta.apache.org>
> >
> >
> 
> 
> --
> To unsubscribe, e-mail:  
> <ma...@jakarta.apache.org>
> For additional commands, e-mail:
> <ma...@jakarta.apache.org>
> 


__________________________________________________
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

Re: Interpreting the score asociated with the Term? |

Posted by Terry Steichen <te...@net-frame.com>.

Otis,

I think the effort you made in your previous message (to describe the basic
relevance measures in simple, non-algorithmic terms) is very important.  If
you think that list is reasonably comprehensive (that is, it captures most
of what relevance means), I'd urge you to insert this into the
documentation.  I think it is very valuable.

Regards,

Terry

----- Original Message -----
From: "Otis Gospodnetic" <ot...@yahoo.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Thursday, January 23, 2003 12:02 PM
Subject: Re: Interpreting the score asociated with the Term? |


> Yes, I believe so.
>
> --- Terry Steichen <te...@net-frame.com> wrote:
> > Otis,
> >
> > Didn't somebody (Doug?) also mention that a keyword in a shorter
> > document is
> > deemed more significant than in a longer one (because, I guess, it
> > represents a larger percentage of the document)?
> >
> > Regards,
> >
> > Terry
> > ----- Original Message -----
> > From: "Otis Gospodnetic" <ot...@yahoo.com>
> > To: "Lucene Users List" <lu...@jakarta.apache.org>;
> > <r_...@lycos.com>
> > Sent: Thursday, January 23, 2003 10:58 AM
> > Subject: Re: Interpreting the score asociated with the Term? |
> >
> >
> > > Here is a simplified explanation of some basic stuff.
> > >
> > > 1. the more frequent the term (in a collection) the lower its
> > weight
> > > (significance).  Makes sense - very popular words don't distinguish
> > one
> > > document from the other much, because they are present in so many
> > docs.
> > >
> > > 2. the more frequent a word in a single document, the higher the
> > > documents 'value' when the query contains that word.  So the score
> > goes
> > > up for frequent words in a document, esp. if they are not frequent
> > in
> > > other documents in the collection.
> > >
> > > 3. there is a boost factor which allow you to boost certain terms
> > at
> > > query time (e.g. you value matches in title field more than the
> > body
> > > field?  boost title field queries)
> > >
> > > 4. normalization factor, I believe, normalizes things so that
> > longer
> > > documents don't have advantage over shorter ones.
> > >
> > > There is more to this....but I am already not 100% about all of the
> > > above, so I'll stop here :)
> > >
> > > Also note that you can boost fields at index time (you'll have to
> > use
> > > the nightly build for that instead of the 1.2 release to get this,
> > I
> > > believe).
> > >
> > > Otis
> > >
> > >
> > > --- Rishabh Bajpai <r_...@lycos.com> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > I am using Lucene as a Search Engine for my work. I am new to
> > this,
> > > > so forgive me if I am asking a cliched question!
> > > >
> > > > I need to understand how the SCORE for the search TERMs is
> > calculated
> > > > for Lucene, so that indexing can be appropriately be designed to
> > > > return the most relevant results, when searched.
> > > >
> > > > On the official FAQ page of the Lucene site, a formula is listed
> > as
> > > > score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t *
> > > > boost_t) * coord_q_d
> > > > where:
> > > >   score_d   : score for document d
> > > >   sum_t     : sum for all terms t
> > > >   tf_q      : the square root of the frequency of t in the query
> > > >   tf_d      : the square root of the frequency of t in d
> > > >   idf_t     : log(numDocs/docFreq_t+1) + 1.0
> > > >   numDocs   : number of documents in index
> > > >   docFreq_t : number of documents containing t
> > > >   norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
> > > >   norm_d_t  : square root of number of tokens in d in the same
> > field
> > > > as t
> > > >   boost_t   : the user-specified boost for term t
> > > >   coord_q_d : number of terms in both query and document / number
> > of
> > > > terms in query
> > > >
> > > > I didnot find the formula too helpful in figuring out what
> > exactly
> > > > the score is trying to calculate.
> > > >
> > > > I want to know of a logic that can be used for translating this
> > score
> > > > into something that can be used for determining which Terms are
> > more
> > > > relevant for a given Search Request.
> > > >
> > > > One way would be to just assume that - higher the score, more
> > > > relveant is the search. But is this assumption really valid? Or
> > are
> > > > there any possible caveats to this?
> > > >
> > > > -Rishabh
> > > >
> > > >
> > > >
> > > > _____________________________________________________________
> > > > Get 25MB, POP3, Spam Filtering with LYCOS MAIL PLUS for
> > $19.95/year.
> > > >
> > http://login.mail.lycos.com/brandPage.shtml?pageId=plus&ref=lmtplus
> > > >
> > > > --
> > > > To unsubscribe, e-mail:
> > > > <ma...@jakarta.apache.org>
> > > > For additional commands, e-mail:
> > > > <ma...@jakarta.apache.org>
> > > >
> > >
> > >
> > > __________________________________________________
> > > Do you Yahoo!?
> > > Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> > > http://mailplus.yahoo.com
> > >
> > > --
> > > To unsubscribe, e-mail:
> > <ma...@jakarta.apache.org>
> > > For additional commands, e-mail:
> > <ma...@jakarta.apache.org>
> > >
> > >
> >
> >
> > --
> > To unsubscribe, e-mail:
> > <ma...@jakarta.apache.org>
> > For additional commands, e-mail:
> > <ma...@jakarta.apache.org>
> >
>
>
> __________________________________________________
> Do you Yahoo!?
> Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> http://mailplus.yahoo.com
>
> --
> To unsubscribe, e-mail:
<ma...@jakarta.apache.org>
> For additional commands, e-mail:
<ma...@jakarta.apache.org>
>
>


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

Re: Interpreting the score asociated with the Term? |

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Yes, I believe so.

--- Terry Steichen <te...@net-frame.com> wrote:
> Otis,
> 
> Didn't somebody (Doug?) also mention that a keyword in a shorter
> document is
> deemed more significant than in a longer one (because, I guess, it
> represents a larger percentage of the document)?
> 
> Regards,
> 
> Terry
> ----- Original Message -----
> From: "Otis Gospodnetic" <ot...@yahoo.com>
> To: "Lucene Users List" <lu...@jakarta.apache.org>;
> <r_...@lycos.com>
> Sent: Thursday, January 23, 2003 10:58 AM
> Subject: Re: Interpreting the score asociated with the Term? |
> 
> 
> > Here is a simplified explanation of some basic stuff.
> >
> > 1. the more frequent the term (in a collection) the lower its
> weight
> > (significance).  Makes sense - very popular words don't distinguish
> one
> > document from the other much, because they are present in so many
> docs.
> >
> > 2. the more frequent a word in a single document, the higher the
> > documents 'value' when the query contains that word.  So the score
> goes
> > up for frequent words in a document, esp. if they are not frequent
> in
> > other documents in the collection.
> >
> > 3. there is a boost factor which allow you to boost certain terms
> at
> > query time (e.g. you value matches in title field more than the
> body
> > field?  boost title field queries)
> >
> > 4. normalization factor, I believe, normalizes things so that
> longer
> > documents don't have advantage over shorter ones.
> >
> > There is more to this....but I am already not 100% about all of the
> > above, so I'll stop here :)
> >
> > Also note that you can boost fields at index time (you'll have to
> use
> > the nightly build for that instead of the 1.2 release to get this,
> I
> > believe).
> >
> > Otis
> >
> >
> > --- Rishabh Bajpai <r_...@lycos.com> wrote:
> > >
> > > Hi All,
> > >
> > > I am using Lucene as a Search Engine for my work. I am new to
> this,
> > > so forgive me if I am asking a cliched question!
> > >
> > > I need to understand how the SCORE for the search TERMs is
> calculated
> > > for Lucene, so that indexing can be appropriately be designed to
> > > return the most relevant results, when searched.
> > >
> > > On the official FAQ page of the Lucene site, a formula is listed
> as
> > > score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t *
> > > boost_t) * coord_q_d
> > > where:
> > >   score_d   : score for document d
> > >   sum_t     : sum for all terms t
> > >   tf_q      : the square root of the frequency of t in the query
> > >   tf_d      : the square root of the frequency of t in d
> > >   idf_t     : log(numDocs/docFreq_t+1) + 1.0
> > >   numDocs   : number of documents in index
> > >   docFreq_t : number of documents containing t
> > >   norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
> > >   norm_d_t  : square root of number of tokens in d in the same
> field
> > > as t
> > >   boost_t   : the user-specified boost for term t
> > >   coord_q_d : number of terms in both query and document / number
> of
> > > terms in query
> > >
> > > I didnot find the formula too helpful in figuring out what
> exactly
> > > the score is trying to calculate.
> > >
> > > I want to know of a logic that can be used for translating this
> score
> > > into something that can be used for determining which Terms are
> more
> > > relevant for a given Search Request.
> > >
> > > One way would be to just assume that - higher the score, more
> > > relveant is the search. But is this assumption really valid? Or
> are
> > > there any possible caveats to this?
> > >
> > > -Rishabh
> > >
> > >
> > >
> > > _____________________________________________________________
> > > Get 25MB, POP3, Spam Filtering with LYCOS MAIL PLUS for
> $19.95/year.
> > >
> http://login.mail.lycos.com/brandPage.shtml?pageId=plus&ref=lmtplus
> > >
> > > --
> > > To unsubscribe, e-mail:
> > > <ma...@jakarta.apache.org>
> > > For additional commands, e-mail:
> > > <ma...@jakarta.apache.org>
> > >
> >
> >
> > __________________________________________________
> > Do you Yahoo!?
> > Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> > http://mailplus.yahoo.com
> >
> > --
> > To unsubscribe, e-mail:
> <ma...@jakarta.apache.org>
> > For additional commands, e-mail:
> <ma...@jakarta.apache.org>
> >
> >
> 
> 
> --
> To unsubscribe, e-mail:  
> <ma...@jakarta.apache.org>
> For additional commands, e-mail:
> <ma...@jakarta.apache.org>
> 


__________________________________________________
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

Re: Interpreting the score asociated with the Term? |

Posted by Terry Steichen <te...@net-frame.com>.

Otis,

Didn't somebody (Doug?) also mention that a keyword in a shorter document is
deemed more significant than in a longer one (because, I guess, it
represents a larger percentage of the document)?

Regards,

Terry
----- Original Message -----
From: "Otis Gospodnetic" <ot...@yahoo.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>;
<r_...@lycos.com>
Sent: Thursday, January 23, 2003 10:58 AM
Subject: Re: Interpreting the score asociated with the Term? |


> Here is a simplified explanation of some basic stuff.
>
> 1. the more frequent the term (in a collection) the lower its weight
> (significance).  Makes sense - very popular words don't distinguish one
> document from the other much, because they are present in so many docs.
>
> 2. the more frequent a word in a single document, the higher the
> documents 'value' when the query contains that word.  So the score goes
> up for frequent words in a document, esp. if they are not frequent in
> other documents in the collection.
>
> 3. there is a boost factor which allow you to boost certain terms at
> query time (e.g. you value matches in title field more than the body
> field?  boost title field queries)
>
> 4. normalization factor, I believe, normalizes things so that longer
> documents don't have advantage over shorter ones.
>
> There is more to this....but I am already not 100% about all of the
> above, so I'll stop here :)
>
> Also note that you can boost fields at index time (you'll have to use
> the nightly build for that instead of the 1.2 release to get this, I
> believe).
>
> Otis
>
>
> --- Rishabh Bajpai <r_...@lycos.com> wrote:
> >
> > Hi All,
> >
> > I am using Lucene as a Search Engine for my work. I am new to this,
> > so forgive me if I am asking a cliched question!
> >
> > I need to understand how the SCORE for the search TERMs is calculated
> > for Lucene, so that indexing can be appropriately be designed to
> > return the most relevant results, when searched.
> >
> > On the official FAQ page of the Lucene site, a formula is listed as
> > score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t *
> > boost_t) * coord_q_d
> > where:
> >   score_d   : score for document d
> >   sum_t     : sum for all terms t
> >   tf_q      : the square root of the frequency of t in the query
> >   tf_d      : the square root of the frequency of t in d
> >   idf_t     : log(numDocs/docFreq_t+1) + 1.0
> >   numDocs   : number of documents in index
> >   docFreq_t : number of documents containing t
> >   norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
> >   norm_d_t  : square root of number of tokens in d in the same field
> > as t
> >   boost_t   : the user-specified boost for term t
> >   coord_q_d : number of terms in both query and document / number of
> > terms in query
> >
> > I didnot find the formula too helpful in figuring out what exactly
> > the score is trying to calculate.
> >
> > I want to know of a logic that can be used for translating this score
> > into something that can be used for determining which Terms are more
> > relevant for a given Search Request.
> >
> > One way would be to just assume that - higher the score, more
> > relveant is the search. But is this assumption really valid? Or are
> > there any possible caveats to this?
> >
> > -Rishabh
> >
> >
> >
> > _____________________________________________________________
> > Get 25MB, POP3, Spam Filtering with LYCOS MAIL PLUS for $19.95/year.
> > http://login.mail.lycos.com/brandPage.shtml?pageId=plus&ref=lmtplus
> >
> > --
> > To unsubscribe, e-mail:
> > <ma...@jakarta.apache.org>
> > For additional commands, e-mail:
> > <ma...@jakarta.apache.org>
> >
>
>
> __________________________________________________
> Do you Yahoo!?
> Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> http://mailplus.yahoo.com
>
> --
> To unsubscribe, e-mail:
<ma...@jakarta.apache.org>
> For additional commands, e-mail:
<ma...@jakarta.apache.org>
>
>


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

Re: Interpreting the score asociated with the Term? |

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Here is a simplified explanation of some basic stuff.

1. the more frequent the term (in a collection) the lower its weight
(significance).  Makes sense - very popular words don't distinguish one
document from the other much, because they are present in so many docs.

2. the more frequent a word in a single document, the higher the
documents 'value' when the query contains that word.  So the score goes
up for frequent words in a document, esp. if they are not frequent in
other documents in the collection.

3. there is a boost factor which allow you to boost certain terms at
query time (e.g. you value matches in title field more than the body
field?  boost title field queries)

4. normalization factor, I believe, normalizes things so that longer
documents don't have advantage over shorter ones.

There is more to this....but I am already not 100% about all of the
above, so I'll stop here :)

Also note that you can boost fields at index time (you'll have to use
the nightly build for that instead of the 1.2 release to get this, I
believe).

Otis

--- Rishabh Bajpai <r_...@lycos.com> wrote:
> 
> Hi All,
> 
> I am using Lucene as a Search Engine for my work. I am new to this,
> so forgive me if I am asking a cliched question!
> 
> I need to understand how the SCORE for the search TERMs is calculated
> for Lucene, so that indexing can be appropriately be designed to
> return the most relevant results, when searched. 
> 
> On the official FAQ page of the Lucene site, a formula is listed as 
> score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t *
> boost_t) * coord_q_d
> where:
>   score_d   : score for document d
>   sum_t     : sum for all terms t
>   tf_q      : the square root of the frequency of t in the query
>   tf_d      : the square root of the frequency of t in d
>   idf_t     : log(numDocs/docFreq_t+1) + 1.0
>   numDocs   : number of documents in index
>   docFreq_t : number of documents containing t
>   norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
>   norm_d_t  : square root of number of tokens in d in the same field
> as t
>   boost_t   : the user-specified boost for term t
>   coord_q_d : number of terms in both query and document / number of
> terms in query
> 
> I didnot find the formula too helpful in figuring out what exactly
> the score is trying to calculate. 
> 
> I want to know of a logic that can be used for translating this score
> into something that can be used for determining which Terms are more
> relevant for a given Search Request. 
> 
> One way would be to just assume that - higher the score, more
> relveant is the search. But is this assumption really valid? Or are
> there any possible caveats to this?
> 
> -Rishabh
> 
> 
> 
> _____________________________________________________________
> Get 25MB, POP3, Spam Filtering with LYCOS MAIL PLUS for $19.95/year.
> http://login.mail.lycos.com/brandPage.shtml?pageId=plus&ref=lmtplus
> 
> --
> To unsubscribe, e-mail:  
> <ma...@jakarta.apache.org>
> For additional commands, e-mail:
> <ma...@jakarta.apache.org>
> 

__________________________________________________
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>