You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@nutch.apache.org by Thomas Anderson <t....@gmail.com> on 2011/10/14 15:03:16 UTC

How does LinkRank converge?

I read wiki (http://wiki.apache.org/nutch/NewScoring#LinkRank) stating
the process of LinkRank is iterative and scores tend to converge after
iteration. However, from the the source I discover it seems that the
job always reads from the same input path and produce to the same
output path. For instance,

runCounter() reads intput from nodes and returns the number of nodes
runInitializer() reads from nodes and initializes inLinkScore

then iteration (default is 10)
runInverted() reads from nodes, where inLinkScore is initialized,
outlinks, and loops; then produces output to
linkrank-<random>/inverted
runAnalysis() reads from nodes (inLinkScore is inited), and inverted
path (in previous step); then produces output to
linkrank-<random>/nodes

This seems to me with the same process to calculate the scores, the
result of LinkRank will always be the same at each iteration. So I
can't understand very well how scores would converge. What place would
be the key point to spot at? Or any doc that may explain this more
detail?

Thanks.

Re: How does LinkRank converge?

Posted by Markus Jelsma <ma...@openindex.io>.
We could attempt to patch it to terminate when most scores values converege 
less than a specified threshold. 

> Also to add to what Markus has said.  A true PageRank type calculation
> would run until it converges.  LinkRank being iterative, runs a given
> number of loops, by default 10.  This tends to converge for many, but
> not all, link sets.
> 
> Dennis
> 
> On 10/14/2011 08:26 AM, Markus Jelsma wrote:
> > On Friday 14 October 2011 15:03:16 Thomas Anderson wrote:
> >> I read wiki (http://wiki.apache.org/nutch/NewScoring#LinkRank) stating
> >> the process of LinkRank is iterative and scores tend to converge after
> >> iteration. However, from the the source I discover it seems that the
> >> job always reads from the same input path and produce to the same
> >> output path. For instance,
> >> 
> >> runCounter() reads intput from nodes and returns the number of nodes
> >> runInitializer() reads from nodes and initializes inLinkScore
> >> 
> >> then iteration (default is 10)
> >> runInverted() reads from nodes, where inLinkScore is initialized,
> >> outlinks, and loops; then produces output to
> >> linkrank-<random>/inverted
> >> runAnalysis() reads from nodes (inLinkScore is inited), and inverted
> >> path (in previous step); then produces output to
> >> linkrank-<random>/nodes
> > 
> > The score for X and Y after the first iteration are (1 - damping) +
> > (damping * sum(inlinkScore)). Suppose X also links to Y, then
> > sum(inlinkScore) for Y will change as X has a new value after the first
> > iteration.
> > 
> > This is convergence as the delta's between iterations will flatten out
> > after each iteration.
> > 
> >> This seems to me with the same process to calculate the scores, the
> >> result of LinkRank will always be the same at each iteration. So I
> >> can't understand very well how scores would converge. What place would
> >> be the key point to spot at? Or any doc that may explain this more
> >> detail?
> >> 
> >> Thanks.

Re: How does LinkRank converge?

Posted by Dennis Kubes <ku...@apache.org>.
Also to add to what Markus has said.  A true PageRank type calculation 
would run until it converges.  LinkRank being iterative, runs a given 
number of loops, by default 10.  This tends to converge for many, but 
not all, link sets.

Dennis

On 10/14/2011 08:26 AM, Markus Jelsma wrote:
>
> On Friday 14 October 2011 15:03:16 Thomas Anderson wrote:
>> I read wiki (http://wiki.apache.org/nutch/NewScoring#LinkRank) stating
>> the process of LinkRank is iterative and scores tend to converge after
>> iteration. However, from the the source I discover it seems that the
>> job always reads from the same input path and produce to the same
>> output path. For instance,
>>
>> runCounter() reads intput from nodes and returns the number of nodes
>> runInitializer() reads from nodes and initializes inLinkScore
>>
>> then iteration (default is 10)
>> runInverted() reads from nodes, where inLinkScore is initialized,
>> outlinks, and loops; then produces output to
>> linkrank-<random>/inverted
>> runAnalysis() reads from nodes (inLinkScore is inited), and inverted
>> path (in previous step); then produces output to
>> linkrank-<random>/nodes
> The score for X and Y after the first iteration are (1 - damping) + (damping *
> sum(inlinkScore)). Suppose X also links to Y, then sum(inlinkScore) for Y will
> change as X has a new value after the first iteration.
>
> This is convergence as the delta's between iterations will flatten out after
> each iteration.
>
>> This seems to me with the same process to calculate the scores, the
>> result of LinkRank will always be the same at each iteration. So I
>> can't understand very well how scores would converge. What place would
>> be the key point to spot at? Or any doc that may explain this more
>> detail?
>>
>> Thanks.

Re: How does LinkRank converge?

Posted by Markus Jelsma <ma...@openindex.io>.

On Friday 14 October 2011 15:03:16 Thomas Anderson wrote:
> I read wiki (http://wiki.apache.org/nutch/NewScoring#LinkRank) stating
> the process of LinkRank is iterative and scores tend to converge after
> iteration. However, from the the source I discover it seems that the
> job always reads from the same input path and produce to the same
> output path. For instance,
> 
> runCounter() reads intput from nodes and returns the number of nodes
> runInitializer() reads from nodes and initializes inLinkScore
> 
> then iteration (default is 10)
> runInverted() reads from nodes, where inLinkScore is initialized,
> outlinks, and loops; then produces output to
> linkrank-<random>/inverted
> runAnalysis() reads from nodes (inLinkScore is inited), and inverted
> path (in previous step); then produces output to
> linkrank-<random>/nodes

The score for X and Y after the first iteration are (1 - damping) + (damping * 
sum(inlinkScore)). Suppose X also links to Y, then sum(inlinkScore) for Y will 
change as X has a new value after the first iteration.

This is convergence as the delta's between iterations will flatten out after 
each iteration.

> 
> This seems to me with the same process to calculate the scores, the
> result of LinkRank will always be the same at each iteration. So I
> can't understand very well how scores would converge. What place would
> be the key point to spot at? Or any doc that may explain this more
> detail?
> 
> Thanks.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350