Posted to user@nutch.apache.org by Arthur Tre-Hardy <at...@gmail.com> on 2016/04/18 14:44:23 UTC
WebGraph LinkRank Strange initialization for the sum of the score of
incoming links.
Hi!
While testing WebGraph and LinkRank, I cannot figure out how
totalInlinkScore is initialized (see totalInlinkScore in
LinkRank.Analyzer$reduce).
If I understand correctly, the link score is computed as follows:
  Counter: counts the number of nodes.
  Initializer: initializes every score to 1 (by default).
  Repeat for a fixed number of iterations:
    Inverter: inverts outlinks into inlinks and computes the score
      transmitted by each link.
    Analyzer: updates the score of each node by summing the scores of
      its inlinks (in a variable called totalInlinkScore) and applying the
      damping factor.
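
The steps above can be sketched roughly as follows. This is a toy in-memory model of the description in this message, not Nutch's code: the name link_rank, the flag start_sum_at_one_over_n, the even split of a node's score across its outlinks, and the update formula (1 - damping) + damping * sum are all assumptions made for illustration.

```python
def link_rank(outlinks, iterations=10, damping=0.85,
              start_sum_at_one_over_n=False):
    """Toy model of Counter / Initializer / Inverter / Analyzer.

    `outlinks` maps each node to the list of nodes it links to.
    All names here are hypothetical; this is not the Nutch implementation.
    """
    # Counter: count the nodes (sources and targets).
    nodes = set(outlinks)
    for targets in outlinks.values():
        nodes.update(targets)
    n = len(nodes)

    # Initializer: every node starts with score 1.
    scores = {node: 1.0 for node in nodes}

    # Starting value of the inlink sum: 1/N (the behaviour asked about)
    # or 0 (what a textbook PageRank would use).
    start = 1.0 / n if start_sum_at_one_over_n else 0.0

    for _ in range(iterations):
        # Inverter: turn outlinks into inlinks; each link carries an equal
        # share of its source's score (assumed splitting rule).
        total_inlink_score = {node: start for node in nodes}
        for src, targets in outlinks.items():
            for dst in targets:
                total_inlink_score[dst] += scores[src] / len(targets)

        # Analyzer: apply the damping factor to the summed inlink score.
        scores = {
            node: (1 - damping) + damping * total_inlink_score[node]
            for node in nodes
        }
    return scores, n
```

On a two-node cycle such as {"a": ["b"], "b": ["a"]}, starting the sum at zero leaves every score at 1, whereas starting it at 1/N raises each score to 1.425 after a single iteration, so under these assumptions the choice of starting value is visible in the result.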
My question is: during Analyzer, why is the sum of inlink scores
initialized to 1/N, where N is the result of Counter (see totalInlinkScore in
LinkRank.Analyzer$reduce)? Tell me if I am wrong, but in a standard PageRank
this sum would be initialized to zero. I know of only two variants of PageRank:
  Normalized PageRank, in which every vertex's score is set to 1/N during
    Initializer and the update formula becomes
    (1 - damping)/N + damping * totalInlinkScore.
  Non-normalized PageRank, which is close to what Nutch is doing right
    now but does not justify initializing the inlink score to 1/N. If I
    understand correctly, the advantage of non-normalized PageRank is that the
    total number of nodes is removed from the PageRank formula, so there is no
    need to run Counter.
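
The two variants can be compared side by side with a small sketch. It assumes both variants start the inlink sum at zero and split a node's score evenly over its outlinks; the helper iterate and the example graph are hypothetical and only illustrate the formulas given above.

```python
def iterate(outlinks, start, teleport, damping=0.85, iterations=20):
    """Run `iterations` PageRank-style updates on a small graph.

    start    -- initial score of every node (1/N or 1)
    teleport -- constant term of the update ((1 - damping)/N or 1 - damping)
    """
    nodes = sorted(set(outlinks) | {t for ts in outlinks.values() for t in ts})
    scores = {node: start for node in nodes}
    for _ in range(iterations):
        # The inlink sum starts at zero in both variants.
        total = {node: 0.0 for node in nodes}
        for src, targets in outlinks.items():
            for dst in targets:
                total[dst] += scores[src] / len(targets)
        scores = {node: teleport + damping * total[node] for node in nodes}
    return scores


# Hypothetical three-node graph in which every node has at least one outlink.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
n = 3
d = 0.85

# Normalized: scores start at 1/N, constant term is (1 - d)/N.
normalized = iterate(graph, start=1.0 / n, teleport=(1 - d) / n, damping=d)

# Non-normalized: scores start at 1, constant term is (1 - d); N is not used.
non_normalized = iterate(graph, start=1.0, teleport=1 - d, damping=d)
```

Because the update is linear, each non-normalized score is N times the corresponding normalized score at every iteration, which is why the non-normalized form has no need for the node count as long as the sum starts at zero.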
In both cases I do not see why totalInlinkScore would be initialized to
something other than zero.
Thank you for any help.
------
Arthur