Posted to user@nutch.apache.org by Arthur Tre-Hardy <at...@gmail.com> on 2016/04/18 14:44:23 UTC

WebGraph LinkRank Strange initialization for the sum of the score of incoming links.

Hi!

While testing WebGraph and LinkRank, I cannot figure out how
totalInlinkScore is initialized (see totalInlinkScore in
LinkRank.Analyzer$reduce).
If I understand correctly, the LinkRank score is computed as follows:

        Counter: counts the number of nodes.

        Initializer: initializes each score to 1 (by default).

Repeat for a fixed number of iterations:

        Inverter: inverts outlinks into inlinks and computes the score
transmitted by each link.

        Analyzer: updates the score of each node by summing the scores of
its inlinks (in a variable called totalInlinkScore) and applying the damping
factor.
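
To make the steps above concrete, here is a minimal Python sketch of the loop as I understand it. It is not Nutch's code: the function name link_rank, the parameter initial_sum, and the toy graph are hypothetical, and I am assuming that each link transmits the source score divided by its number of outlinks and that the update is (1 - damping) + damping * totalInlinkScore. The initial_sum parameter stands for the starting value of totalInlinkScore, so 0 and 1/N can be compared.

```python
from collections import defaultdict


def link_rank(outlinks, iterations=10, damping=0.85, initial_sum=0.0):
    """Sketch of the Counter / Initializer / Inverter / Analyzer loop.

    outlinks maps each node to the list of nodes it links to.
    initial_sum is the starting value of the inlink-score sum (hypothetical knob).
    """
    # Counter: count the number of nodes.
    nodes = set(outlinks)
    for targets in outlinks.values():
        nodes.update(targets)
    num_nodes = len(nodes)

    # Initializer: every node starts with a score of 1.
    scores = {node: 1.0 for node in nodes}

    for _ in range(iterations):
        # Inverter: turn outlinks into inlinks, attaching the score each link
        # transmits (assumed here: source score / number of outlinks).
        inlink_scores = defaultdict(list)
        for source, targets in outlinks.items():
            for target in targets:
                inlink_scores[target].append(scores[source] / len(targets))

        # Analyzer: sum the inlink scores and apply the damping factor.
        new_scores = {}
        for node in nodes:
            total_inlink_score = initial_sum
            for transmitted in inlink_scores.get(node, []):
                total_inlink_score += transmitted
            new_scores[node] = (1.0 - damping) + damping * total_inlink_score
        scores = new_scores

    return num_nodes, scores


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
n, from_zero = link_rank(graph, iterations=1, initial_sum=0.0)
_, from_one_over_n = link_rank(graph, iterations=1, initial_sum=1.0 / n)
for node in sorted(from_zero):
    print(node, from_zero[node], from_one_over_n[node])
```

In this sketch, after one iteration, starting the sum at 1/N simply adds damping/N to every node's score compared with starting at zero. On this toy graph, where every node has at least one outlink, the scores sum to N at each iteration when the sum starts at zero, but not when it starts at 1/N.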

My question is: during Analyzer, why is the sum of inlink scores
initialized to 1/N, where N is the result of Counter (see totalInlinkScore in
LinkRank.Analyzer$reduce)? Tell me if I am wrong, but in a normal PageRank
this sum would be initialized to zero. I know of only two variants of PageRank:

    Normalized PageRank, in which every vertex score is set to
1/N during Initializer and the update formula becomes (1-damping)/N
+ damping*totalInlinkScore.
    Non-normalized PageRank, which is close to what Nutch is doing right
now but does not justify initializing the inlink score to 1/N. If I
understand correctly, the advantage of non-normalized PageRank is that the total
number of nodes is removed from the PageRank formula, so there is no need to run
Counter.
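
The two variants can be compared with a small Python sketch. Again this is only an illustration under my own assumptions, not Nutch's implementation: the function pagerank, its normalized flag, and the toy graph are hypothetical, and the non-normalized update is taken to be (1 - damping) + damping * totalInlinkScore. In both variants the sum starts at zero.

```python
def pagerank(outlinks, normalized, damping=0.85, iterations=10):
    """Run either PageRank variant with the inlink sum starting at zero."""
    nodes = set(outlinks)
    for targets in outlinks.values():
        nodes.update(targets)
    n = len(nodes)  # only the normalized variant actually needs N

    if normalized:
        initial_score = 1.0 / n
        base = (1.0 - damping) / n
    else:
        initial_score = 1.0
        base = 1.0 - damping

    scores = dict.fromkeys(nodes, initial_score)
    for _ in range(iterations):
        # The sum of inlink scores starts at zero for every node.
        total_inlink_score = dict.fromkeys(nodes, 0.0)
        for source, targets in outlinks.items():
            for target in targets:
                total_inlink_score[target] += scores[source] / len(targets)
        scores = {
            node: base + damping * total_inlink_score[node] for node in nodes
        }
    return scores


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
norm = pagerank(graph, normalized=True)
non_norm = pagerank(graph, normalized=False)
for node in sorted(norm):
    print(node, norm[node], non_norm[node], non_norm[node] / norm[node])
```

With a zero starting sum, the non-normalized scores are the normalized ones multiplied by N at every iteration, so the two variants rank nodes identically and only the second one can skip Counter.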

In both cases I do not see why totalInlinkScore would be initialized to
something other than zero.

Thank you for any help.

------

Arthur