Posted to user@nutch.apache.org by Santi Gori <ha...@gmail.com> on 2005/06/08 21:04:38 UTC

Re: ranking algorithm

On 6/8/05, Santi Gori <ha...@gmail.com> wrote:
> Hello. I would like to know how nutch's ranking algorithm works.
> Thanks
>

Re: ranking algorithm

Posted by Santi Gori <ha...@gmail.com>.
On 6/8/05, Santi Gori <ha...@gmail.com> wrote:
> On 6/8/05, Santi Gori <ha...@gmail.com> wrote:
> > Hello. I would like to know how nutch's ranking algorithm works.
> > Thanks
> >
>

RE: ranking algorithm

Posted by Chirag Chaman <de...@filangy.com>.
It's a modification of the Google PigeonRank:
http://www.google.com/technology/pigeonrank.html

.
.
.
.
.
.
.
.
.
.
.
.
.
.
Okay, serious now:

Here's an overview of Nutch scoring:

1. At query time: a TF-IDF (Term Frequency - Inverse Document Frequency)
score is computed for each matching document.

2. Link analysis boost. This is done by computing PageRank as a separate
analysis step. However, this step takes too long and is not actively
supported in newer versions, so as an alternative you can use the "poor
man's PageRank", i.e. the following link boost: sqrt(inlinks)
(controlled via properties files; see nutch-default.xml).

3. Field boosts. If the query matches specific fields, e.g. the title or
anchor text, the final score is boosted by a factor. I would encourage you
to look at the explain page to see how each factor contributes.
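
To make item 1 concrete, here is a minimal Python sketch of TF-IDF ranking.
It assumes whitespace-tokenized documents and borrows the tf = sqrt(freq)
and idf = 1 + ln(numDocs / (docFreq + 1)) forms of Lucene's classic
DefaultSimilarity. NutchSimilarity may override some of these, and Lucene's
full formula also squares idf and multiplies in length norms and boosts, so
treat this as an illustration, not Nutch's exact score. The function names
are hypothetical.

```python
import math
from collections import Counter


def tf(freq):
    # Term-frequency factor in the style of Lucene's classic DefaultSimilarity: sqrt(freq)
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    # Inverse-document-frequency factor, also Lucene-classic style: 1 + ln(N / (df + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def tfidf_score(query_terms, doc_tokens, corpus):
    """Simplified TF-IDF: sum of tf * idf over the query terms.

    Leaves out length norms, coordination, query normalization and boosts,
    all of which a real Lucene/Nutch score also includes.
    """
    counts = Counter(doc_tokens)
    num_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        freq = counts[term]
        if freq == 0:
            continue  # a term absent from this document contributes nothing
        doc_freq = sum(1 for doc in corpus if term in doc)
        score += tf(freq) * idf(doc_freq, num_docs)
    return score


corpus = [
    "nutch is an open source web search engine".split(),
    "lucene provides the index and search library".split(),
    "nutch crawls the web and builds a lucene index".split(),
]
query = ["nutch", "crawls"]
ranking = sorted(range(len(corpus)),
                 key=lambda i: tfidf_score(query, corpus[i], corpus),
                 reverse=True)
print(ranking)  # [2, 0, 1]
```

The rarer term "crawls" gets a larger idf than "nutch", which appears in two
of the three documents, so the document containing both ranks first.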
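
For item 2, the sketch below contrasts a plain power-iteration PageRank with
the sqrt(inlinks) "poor man's" boost on a hypothetical four-page link graph.
It assumes "inlinks" means distinct other pages linking in and uses the
textbook damping factor of 0.85. It does not model how Nutch folds the boost
into the final score or which nutch-default.xml properties control it.

```python
import math


def inlink_counts(links):
    # Count distinct *other* pages linking to each page (self-links ignored).
    counts = {page: 0 for page in links}
    for src, targets in links.items():
        for dst in set(targets):
            if dst != src:
                counts[dst] = counts.get(dst, 0) + 1
    return counts


def sqrt_inlink_boost(links):
    # "Poor man's PageRank": boost = sqrt(inlinks). A page with no inlinks
    # gets 0 here; how the boost is combined with the text score is not modeled.
    return {page: math.sqrt(n) for page, n in inlink_counts(links).items()}


def pagerank(links, damping=0.85, iterations=50):
    # Plain power-iteration PageRank, run as a separate offline step.
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            outlinks = links.get(page, [])
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for dst in outlinks:
                    new_rank[dst] += share
            else:
                # Dangling page: spread its rank evenly over every page.
                for dst in pages:
                    new_rank[dst] += damping * rank[page] / n
        rank = new_rank
    return rank


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
boost = sqrt_inlink_boost(links)
pr = pagerank(links)
print(max(boost, key=boost.get))  # c
print(max(pr, key=pr.get))        # c
```

Both measures put page c first here, but sqrt(inlinks) treats every inlink
equally, while PageRank weights a link by the rank of the page it comes
from, which is why it needs the costly iterative pass.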
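
Item 3 can be sketched the same way. The field names and boost values below
are hypothetical, and each matched query term simply counts as 1 before
boosting; in Nutch the per-field scores come from the Lucene similarity, and
the explain page shows the actual breakdown.

```python
# Hypothetical boost values chosen for illustration only; the real per-field
# boosts are configurable in Nutch and may differ between versions.
FIELD_BOOSTS = {"content": 1.0, "title": 1.5, "anchor": 2.0}


def field_boosted_score(query_terms, doc_fields, boosts=FIELD_BOOSTS):
    """Sum over fields of (field boost) x (number of query terms matched in that field).

    A stand-in for per-field relevance: each matching term counts 1 before boosting.
    Query terms are assumed to be lowercase already.
    """
    total = 0.0
    for field, text in doc_fields.items():
        tokens = set(text.lower().split())
        matches = sum(1 for term in query_terms if term in tokens)
        total += boosts.get(field, 1.0) * matches
    return total


body_only = {"content": "how the nutch crawler fetches pages", "title": "crawler notes"}
in_title = {"content": "how the nutch crawler fetches pages", "title": "nutch crawler"}
print(field_boosted_score(["nutch"], body_only))  # 1.0
print(field_boosted_score(["nutch"], in_title))   # 2.5
```

A match in the title outscores the same match in the body alone, which is
the effect item 3 describes.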

Here are a couple of links that might help: 

JavaDoc:
http://lucene.apache.org/nutch/apidocs/net/nutch/indexer/NutchSimilarity.html

Erik's presentation:
http://www.cooug.org/java/presentations/september2004/NEJUG.pdf
(look at the scoring slide)



-----Original Message-----
From: Santi Gori [mailto:hayqdarlotodo@gmail.com] 
Sent: Wednesday, June 08, 2005 3:05 PM
To: nutch-user@incubator.apache.org
Subject: Re: ranking algorithm

On 6/8/05, Santi Gori <ha...@gmail.com> wrote:
> Hello. I would like to know how nutch's ranking algorithm works.
> Thanks
>