You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@nutch.apache.org by Thomas Anderson <t....@gmail.com> on 2011/10/14 15:03:16 UTC
How does LinkRank converge?
I read wiki (http://wiki.apache.org/nutch/NewScoring#LinkRank) stating
the process of LinkRank is iterative and scores tend to converge after
iteration. However, from the the source I discover it seems that the
job always reads from the same input path and produce to the same
output path. For instance,
runCounter() reads intput from nodes and returns the number of nodes
runInitializer() reads from nodes and initializes inLinkScore
then iteration (default is 10)
runInverted() reads from nodes, where inLinkScore is initialized,
outlinks, and loops; then produces output to
linkrank-<random>/inverted
runAnalysis() reads from nodes (inLinkScore is inited), and inverted
path (in previous step); then produces output to
linkrank-<random>/nodes
This seems to me with the same process to calculate the scores, the
result of LinkRank will always be the same at each iteration. So I
can't understand very well how scores would converge. What place would
be the key point to spot at? Or any doc that may explain this more
detail?
Thanks.
Re: How does LinkRank converge?
Posted by Markus Jelsma <ma...@openindex.io>.
We could attempt to patch it to terminate when most scores values converege
less than a specified threshold.
> Also to add to what Markus has said. A true PageRank type calculation
> would run until it converges. LinkRank being iterative, runs a given
> number of loops, by default 10. This tends to converge for many, but
> not all, link sets.
>
> Dennis
>
> On 10/14/2011 08:26 AM, Markus Jelsma wrote:
> > On Friday 14 October 2011 15:03:16 Thomas Anderson wrote:
> >> I read wiki (http://wiki.apache.org/nutch/NewScoring#LinkRank) stating
> >> the process of LinkRank is iterative and scores tend to converge after
> >> iteration. However, from the the source I discover it seems that the
> >> job always reads from the same input path and produce to the same
> >> output path. For instance,
> >>
> >> runCounter() reads intput from nodes and returns the number of nodes
> >> runInitializer() reads from nodes and initializes inLinkScore
> >>
> >> then iteration (default is 10)
> >> runInverted() reads from nodes, where inLinkScore is initialized,
> >> outlinks, and loops; then produces output to
> >> linkrank-<random>/inverted
> >> runAnalysis() reads from nodes (inLinkScore is inited), and inverted
> >> path (in previous step); then produces output to
> >> linkrank-<random>/nodes
> >
> > The score for X and Y after the first iteration are (1 - damping) +
> > (damping * sum(inlinkScore)). Suppose X also links to Y, then
> > sum(inlinkScore) for Y will change as X has a new value after the first
> > iteration.
> >
> > This is convergence as the delta's between iterations will flatten out
> > after each iteration.
> >
> >> This seems to me with the same process to calculate the scores, the
> >> result of LinkRank will always be the same at each iteration. So I
> >> can't understand very well how scores would converge. What place would
> >> be the key point to spot at? Or any doc that may explain this more
> >> detail?
> >>
> >> Thanks.
Re: How does LinkRank converge?
Posted by Dennis Kubes <ku...@apache.org>.
Also to add to what Markus has said. A true PageRank type calculation
would run until it converges. LinkRank being iterative, runs a given
number of loops, by default 10. This tends to converge for many, but
not all, link sets.
Dennis
On 10/14/2011 08:26 AM, Markus Jelsma wrote:
>
> On Friday 14 October 2011 15:03:16 Thomas Anderson wrote:
>> I read wiki (http://wiki.apache.org/nutch/NewScoring#LinkRank) stating
>> the process of LinkRank is iterative and scores tend to converge after
>> iteration. However, from the the source I discover it seems that the
>> job always reads from the same input path and produce to the same
>> output path. For instance,
>>
>> runCounter() reads intput from nodes and returns the number of nodes
>> runInitializer() reads from nodes and initializes inLinkScore
>>
>> then iteration (default is 10)
>> runInverted() reads from nodes, where inLinkScore is initialized,
>> outlinks, and loops; then produces output to
>> linkrank-<random>/inverted
>> runAnalysis() reads from nodes (inLinkScore is inited), and inverted
>> path (in previous step); then produces output to
>> linkrank-<random>/nodes
> The score for X and Y after the first iteration are (1 - damping) + (damping *
> sum(inlinkScore)). Suppose X also links to Y, then sum(inlinkScore) for Y will
> change as X has a new value after the first iteration.
>
> This is convergence as the delta's between iterations will flatten out after
> each iteration.
>
>> This seems to me with the same process to calculate the scores, the
>> result of LinkRank will always be the same at each iteration. So I
>> can't understand very well how scores would converge. What place would
>> be the key point to spot at? Or any doc that may explain this more
>> detail?
>>
>> Thanks.
Re: How does LinkRank converge?
Posted by Markus Jelsma <ma...@openindex.io>.
On Friday 14 October 2011 15:03:16 Thomas Anderson wrote:
> I read wiki (http://wiki.apache.org/nutch/NewScoring#LinkRank) stating
> the process of LinkRank is iterative and scores tend to converge after
> iteration. However, from the the source I discover it seems that the
> job always reads from the same input path and produce to the same
> output path. For instance,
>
> runCounter() reads intput from nodes and returns the number of nodes
> runInitializer() reads from nodes and initializes inLinkScore
>
> then iteration (default is 10)
> runInverted() reads from nodes, where inLinkScore is initialized,
> outlinks, and loops; then produces output to
> linkrank-<random>/inverted
> runAnalysis() reads from nodes (inLinkScore is inited), and inverted
> path (in previous step); then produces output to
> linkrank-<random>/nodes
The score for X and Y after the first iteration are (1 - damping) + (damping *
sum(inlinkScore)). Suppose X also links to Y, then sum(inlinkScore) for Y will
change as X has a new value after the first iteration.
This is convergence as the delta's between iterations will flatten out after
each iteration.
>
> This seems to me with the same process to calculate the scores, the
> result of LinkRank will always be the same at each iteration. So I
> can't understand very well how scores would converge. What place would
> be the key point to spot at? Or any doc that may explain this more
> detail?
>
> Thanks.
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350