Posted to dev@nutch.apache.org by Michael Ji <fj...@yahoo.com> on 2005/09/04 01:16:32 UTC

page rank in Nutch

hi,

Lucene has a basic scoring algorithm based on tf, idf
and field boost values.
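
A simplified sketch of classic Lucene tf-idf scoring, modeled on the
DefaultSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(numDocs /
(docFreq + 1)), lengthNorm = 1 / sqrt(numTerms)). It leaves out coord,
queryNorm and the lossy one-byte norm encoding, and the class and method
names are made up for illustration. It also shows that in classic Lucene
index-time document and field boosts are multiplied into the stored field
norm, so a boost surfaces inside the fieldNorm factor of a score
explanation rather than as a separately named factor.

```java
/**
 * Simplified sketch of classic Lucene tf-idf scoring (modeled on
 * DefaultSimilarity). Hypothetical class; coord, queryNorm and the
 * one-byte norm encoding are deliberately left out.
 */
public class TfIdfSketch {

    // tf: square root of how often the term occurs in the field
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf: 1 + ln(numDocs / (docFreq + 1)); rarer terms weigh more
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // lengthNorm: matches in shorter fields count for more
    static double lengthNorm(int numTermsInField) {
        return 1.0 / Math.sqrt(numTermsInField);
    }

    // Index-time boosts (document boost x field boost) are multiplied into
    // the stored field norm instead of being kept as a separate factor.
    static double fieldNorm(double docBoost, double fieldBoost, int numTermsInField) {
        return docBoost * fieldBoost * lengthNorm(numTermsInField);
    }

    // One query term's contribution to one document's score (query boost = 1)
    static double termScore(int freq, int docFreq, int numDocs, double fieldNorm) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * fieldNorm;
    }

    public static void main(String[] args) {
        // Same term statistics; the second document carries a doc boost of 2
        double plain = termScore(4, 9, 1000, fieldNorm(1.0, 1.0, 100));
        double boosted = termScore(4, 9, 1000, fieldNorm(2.0, 1.0, 100));
        System.out.printf("plain=%.4f boosted=%.4f%n", plain, boosted);
    }
}
```

The boosted document scores exactly twice as high, even though no factor
named "boost" appears in the per-term formula.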

And Nutch adopts the page rank concept through its
unique link analysis in the DistributedAnalysisTool
class.
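
For reference, a textbook PageRank computed by power iteration over a small
in-memory link graph. This is the general idea only, not the
DistributedAnalysisTool code; the class name, the damping factor of 0.85
and the even redistribution of rank from pages without outlinks are
assumptions of this sketch.

```java
import java.util.Arrays;

/**
 * Textbook PageRank by power iteration (hypothetical sketch, not Nutch's
 * DistributedAnalysisTool).
 */
public class PageRankSketch {

    /**
     * @param outLinks   outLinks[i] lists the pages that page i links to
     * @param damping    probability of following a link (commonly 0.85)
     * @param iterations number of power-iteration rounds
     */
    static double[] pageRank(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            // Random-jump share every page receives
            Arrays.fill(next, (1.0 - damping) / n);
            double dangling = 0.0;
            for (int i = 0; i < n; i++) {
                if (outLinks[i].length == 0) {
                    dangling += rank[i]; // page with no outlinks
                } else {
                    // Split this page's rank evenly across its outlinks
                    double share = damping * rank[i] / outLinks[i].length;
                    for (int target : outLinks[i]) {
                        next[target] += share;
                    }
                }
            }
            // Spread the rank of dangling pages evenly so the total stays 1
            for (int i = 0; i < n; i++) {
                next[i] += damping * dangling / n;
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Pages 0 and 1 both link to page 2; page 2 links back to page 0
        int[][] links = { {2}, {2}, {0} };
        System.out.println(Arrays.toString(pageRank(links, 0.85, 100)));
    }
}
```

Every round touches every link, which hints at why a single-machine
version gets expensive as the page count grows.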

But when I look at the "score in detail" of Nutch's
search results, I don't see a factor called "link
analysis" or anything like that.

Where can I see this factor, or is it already combined
into the score value shown on the score detail page?

thanks,

Michael Ji,

Re: page rank in Nutch

Posted by Ken Krugler <kk...@transpac.com>.
>Lucene has a basic scoring algorithm based on tf, idf
>and field boost values.
>
>And Nutch adopts the page rank concept through its
>unique link analysis in the DistributedAnalysisTool
>class.

Actually, I don't think most people run this. I believe it starts to
have performance issues when page counts get large, which is one of
the reasons for the mapred (MapReduce) work Doug/Mike are doing in a
branch.

Typically the extent of "link analysis" is the number of inbound
links to a page, which is recalculated every time the WebDB is
updated after a crawl.
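
A minimal sketch of counting inbound links from a list of (source, target)
link records. This is not the WebDB code: the Link record and countInlinks
method are hypothetical, and counting each source page once and skipping
self-links are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class InlinkCountSketch {

    // Hypothetical link record: one hyperlink from a source page to a target
    record Link(String fromUrl, String toUrl) {}

    // Count distinct source pages linking to each target, ignoring self-links
    static Map<String, Integer> countInlinks(List<Link> links) {
        Map<String, Set<String>> sources = new HashMap<>();
        for (Link link : links) {
            if (link.fromUrl().equals(link.toUrl())) {
                continue; // a page linking to itself adds nothing
            }
            sources.computeIfAbsent(link.toUrl(), url -> new HashSet<>())
                   .add(link.fromUrl());
        }
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : sources.entrySet()) {
            counts.put(entry.getKey(), entry.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Link> links = List.of(
            new Link("http://a.example/", "http://c.example/"),
            new Link("http://b.example/", "http://c.example/"),
            new Link("http://a.example/", "http://c.example/"), // duplicate
            new Link("http://c.example/", "http://a.example/"),
            new Link("http://c.example/", "http://c.example/")); // self-link
        System.out.println(countInlinks(links));
    }
}
```

Because the count only needs a pass over the link records, it is cheap to
redo on every WebDB update, unlike an iterative page rank computation.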

>But when I look at the "score in detail" of Nutch's
>search results, I don't see a factor called "link
>analysis" or anything like that.
>
>Where can I see this factor, or is it already combined
>into the score value shown on the score detail page?

See my previous post on how inbound link counts are used to boost a 
Lucene document (web page).
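
A hypothetical sketch of turning an inbound link count into a document
boost. The log(e + n) damping is an assumption chosen for illustration,
not Nutch's actual formula; it yields 1.0 for a page with no inbound
links and grows slowly after that. Since a document boost is multiplied
into the norms of all of that document's fields, it scales the whole text
score (ignoring norm encoding).

```java
public class LinkBoostSketch {

    // Hypothetical damping: log(e + n) is 1.0 for n = 0 and grows slowly,
    // so heavily linked pages cannot swamp text relevance.
    static float linkBoost(int inboundLinks) {
        return (float) Math.log(Math.E + inboundLinks);
    }

    // A document boost scales every term contribution for that document,
    // so the net effect is the text score times the boost.
    static double boostedScore(double textScore, int inboundLinks) {
        return textScore * linkBoost(inboundLinks);
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 10, 100, 1000}) {
            System.out.printf("inlinks=%4d  boost=%.3f%n", n, linkBoost(n));
        }
    }
}
```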

-- Ken
-- 
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200