Posted to dev@lucene.apache.org by Shailesh Kochhar <sh...@gmail.com> on 2006/02/18 02:22:56 UTC

Implementing new scoring algorithms in lucene

Hi,

I'm interested in implementing a few new scoring algorithms in Lucene
and I was wondering if anyone had attempted this in the past and how
successful they had been. If there are any resources that someone
could point me to, that would be great; Googling and searching the
mailing-list archives didn't turn up anything.

After looking over the current implementation of tf-idf scoring, I
concluded that the weighting and scoring framework is mostly
implemented in the TermQuery and TermScorer classes. I am thinking of
extending these classes and replacing a few others to implement the
new algorithm. Am I heading in the right direction? Does it make sense
to try to extend these classes, or should I try building a parallel
hierarchy to do this?

Thank you for your time,
  - Shailesh

Re: Implementing new scoring algorithms in lucene

Posted by Paul Elschot <pa...@xs4all.nl>.
On Tuesday 21 February 2006 05:34, Shailesh Kochhar wrote:
...
> 
> I have a question about the sumOfSquaredWeights method. As I
> understand it, it computes the square of the idf for a given term, which
> is used to normalize the weight of individual terms in the query.
>
> In implementing a different scoring algorithm, the query normalization
> I use is different and the sumOfSquaredWeights method isn't needed.
> However, it is called from a number of different places, which
> makes it hard to remove. I could easily implement the calculation of
> the query normalization factor here, but the name of the method would
> be very misleading.
>
> Is there something I'm missing about this method, or is it a good
> candidate for renaming to something broader? I feel that the entire

What's in a name? It is one of the methods called at normalisation time,
so there is nothing wrong with using it for your own normalisation.
In case you need another method signature, you'll need to
extend Weight, but even then a new method might well be called
from sumOfSquaredWeights.
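
To make this concrete, here is a minimal, self-contained sketch. It
does not use Lucene's real classes: WeightSketch is a simplified
stand-in for Weight, and L1Weight, normalisationContribution and
normaliseAll are hypothetical names invented for the illustration.
The L1-style normalisation is only an example of "some other scheme",
not a recommendation.

```java
import java.util.Arrays;
import java.util.List;

public class CustomNormSketch {

    /** Simplified stand-in for the normalisation part of a Weight (assumption). */
    interface WeightSketch {
        float sumOfSquaredWeights();   // called first, once per weight
        void normalize(float norm);    // called second, with the combined factor
        float getValue();              // normalised weight used while scoring
    }

    /** Hypothetical weight that normalises by the sum of absolute raw weights. */
    static final class L1Weight implements WeightSketch {
        private final float rawWeight;
        private float value;

        L1Weight(float rawWeight) {
            this.rawWeight = rawWeight;
        }

        /** New, better-named method that holds the actual computation. */
        float normalisationContribution() {
            return Math.abs(rawWeight);
        }

        /** The existing hook keeps its name and simply delegates. */
        @Override
        public float sumOfSquaredWeights() {
            return normalisationContribution();
        }

        @Override
        public void normalize(float norm) {
            value = rawWeight * norm;
        }

        @Override
        public float getValue() {
            return value;
        }
    }

    /** Hypothetical driver: collect contributions, then push the factor back. */
    static void normaliseAll(List<? extends WeightSketch> weights) {
        float sum = 0f;
        for (WeightSketch w : weights) {
            sum += w.sumOfSquaredWeights();
        }
        // For this scheme the factor is 1/sum rather than 1/sqrt(sum).
        float norm = (sum == 0f) ? 1f : 1f / sum;
        for (WeightSketch w : weights) {
            w.normalize(norm);
        }
    }

    public static void main(String[] args) {
        List<L1Weight> weights = Arrays.asList(new L1Weight(1f), new L1Weight(3f));
        normaliseAll(weights);
        for (L1Weight w : weights) {
            System.out.println(w.getValue());
        }
    }
}
```

In this sketch the step that turns the summed contributions into a
factor also has to change; where that step lives in a real
implementation is outside what the sketch models.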

> scoring framework has too many components knit tightly together, which
> makes swapping in a new algorithm quite difficult. Ideally one should
> only have to extend the Similarity, Query and Scorer classes.

It's possible to implement another way of scoring. To keep the efficiency
of Lucene, you might want to stick to the way TermScorer works.

Regards,
Paul Elschot
 
 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Implementing new scoring algorithms in lucene

Posted by Shailesh Kochhar <sh...@gmail.com>.
On 2/18/06, Paul Elschot <pa...@xs4all.nl> wrote:
> On Saturday 18 February 2006 02:22, Shailesh Kochhar wrote:
> > Hi,
> >
> > I'm interested in implementing a few new scoring algorithms in Lucene
> > and I was wondering if anyone had attempted this in the past and how
> > successful they had been. If there are any resources that someone
> > could point me to, that would be great; Googling and searching the
> > mailing-list archives didn't turn up anything.
> >
> > After looking over the current implementation of tf-idf scoring, I
> > concluded that the weighting and scoring framework is mostly
> > implemented in the TermQuery and TermScorer classes. I am thinking of
> > extending these classes and replacing a few others to implement the
> > new algorithm. Am I heading in the right direction? Does it make sense
> > to try to extend these classes, or should I try building a parallel
> > hierarchy to do this?
>
> At the moment I only have time to answer with links:
>
> http://issues.apache.org/jira/browse/LUCENE-293
> http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200410.mbox/<200410172050.24372.paul.elschot%40xs4all.nl>
> http://www.loc.gov/standards/sru/cql/
> http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/surround/

I have a question about the sumOfSquaredWeights method. As I
understand it, it computes the square of the idf for a given term, which
is used to normalize the weight of individual terms in the query.
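
A minimal sketch of that reading, for illustration only. The classes
below are self-contained stand-ins, not Lucene's own: QueryNormSketch,
TermWeightSketch and normalizedValues are hypothetical names, the
per-term boost is an assumption, and the 1/sqrt(sum) factor reflects
my understanding of the default tf-idf normalization rather than a
confirmed reading of the source.

```java
public class QueryNormSketch {

    /** Stand-in for a per-term weight with a two-phase normalization protocol. */
    static final class TermWeightSketch {
        private final float idf;
        private final float boost;
        private float queryWeight;
        private float value;

        TermWeightSketch(float idf, float boost) {
            this.idf = idf;
            this.boost = boost;
        }

        /** Phase 1: report this term's squared, unnormalized query weight. */
        float sumOfSquaredWeights() {
            queryWeight = idf * boost;
            return queryWeight * queryWeight;
        }

        /** Phase 2: apply the query-wide factor and fold idf in once more. */
        void normalize(float queryNorm) {
            queryWeight *= queryNorm;
            value = queryWeight * idf;
        }

        float getValue() {
            return value;
        }
    }

    /** Assumed default: one over the square root of the summed squares. */
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    /** Runs both phases over a set of terms (boost fixed at 1) and returns the values. */
    static float[] normalizedValues(float[] idfs) {
        TermWeightSketch[] weights = new TermWeightSketch[idfs.length];
        float sum = 0f;
        for (int i = 0; i < idfs.length; i++) {
            weights[i] = new TermWeightSketch(idfs[i], 1.0f);
            sum += weights[i].sumOfSquaredWeights();
        }
        float norm = queryNorm(sum);
        float[] values = new float[idfs.length];
        for (int i = 0; i < idfs.length; i++) {
            weights[i].normalize(norm);
            values[i] = weights[i].getValue();
        }
        return values;
    }

    public static void main(String[] args) {
        for (float v : normalizedValues(new float[] {3f, 4f})) {
            System.out.println(v);
        }
    }
}
```

The point of the sketch is that the method only contributes one term
to a sum; the normalization factor itself is computed elsewhere and
handed back through normalize.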

In implementing a different scoring algorithm, the query normalization
I use is different and the sumOfSquaredWeights method isn't needed.
However, it is called from a number of different places, which
makes it hard to remove. I could easily implement the calculation of
the query normalization factor here, but the name of the method would
be very misleading.

Is there something I'm missing about this method, or is it a good
candidate for renaming to something broader? I feel that the entire
scoring framework has too many components knit tightly together, which
makes swapping in a new algorithm quite difficult. Ideally one should
only have to extend the Similarity, Query and Scorer classes.

Thoughts and comments?

  - Shailesh

Re: Implementing new scoring algorithms in lucene

Posted by Paul Elschot <pa...@xs4all.nl>.
On Saturday 18 February 2006 02:22, Shailesh Kochhar wrote:
> Hi,
> 
> I'm interested in implementing a few new scoring algorithms in Lucene
> and I was wondering if anyone had attempted this in the past and how
> successful they had been. If there are any resources that someone
> could point me to, that would be great; Googling and searching the
> mailing-list archives didn't turn up anything.
>
> After looking over the current implementation of tf-idf scoring, I
> concluded that the weighting and scoring framework is mostly
> implemented in the TermQuery and TermScorer classes. I am thinking of
> extending these classes and replacing a few others to implement the
> new algorithm. Am I heading in the right direction? Does it make sense
> to try to extend these classes, or should I try building a parallel
> hierarchy to do this?

At the moment I only have time to answer with links:

http://issues.apache.org/jira/browse/LUCENE-293
http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200410.mbox/<200410172050.24372.paul.elschot%40xs4all.nl>
http://www.loc.gov/standards/sru/cql/
http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/surround/

Regards,
Paul Elschot
