Posted to dev@nutch.apache.org by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/11/24 09:36:35 UTC

[jira] [Created] (NUTCH-1672) Inlinks are added twice in DbUpdateReducer

Nguyen Manh Tien created NUTCH-1672:
---------------------------------------

             Summary: Inlinks are added twice in DbUpdateReducer
                 Key: NUTCH-1672
                 URL: https://issues.apache.org/jira/browse/NUTCH-1672
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 2.2.1
            Reporter: Nguyen Manh Tien
            Priority: Minor
         Attachments: NUTCH-1672.patch

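The first for loop is redundant: the second loop already calls page.putToInlinks() for every inlink while computing the smallest distance, so each inlink is added twice.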

for (ScoreDatum inlink : inlinkedScoreData) {
  page.putToInlinks(new Utf8(inlink.getUrl()), new Utf8(inlink.getAnchor()));
}
...
for (ScoreDatum inlink : inlinkedScoreData) {
  int inlinkDist = inlink.getDistance();
  if (inlinkDist < smallestDist) {
    smallestDist = inlinkDist;
  }
  page.putToInlinks(new Utf8(inlink.getUrl()), new Utf8(inlink.getAnchor()));
}
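A minimal sketch of the loop with the first pass dropped, assuming the fix simply keeps the second loop (the attached NUTCH-1672.patch is not reproduced here). ScoreDatum and Page below are hypothetical stand-ins for Nutch's ScoreDatum and WebPage, using String instead of Avro's Utf8 and assuming inlinks is a map keyed by URL. Page counts putToInlinks calls so you can see that each inlink is recorded exactly once.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InlinkMerge {

    // Hypothetical stand-in for Nutch's ScoreDatum: only the fields the loop reads.
    record ScoreDatum(String url, String anchor, int distance) {}

    // Hypothetical stand-in for WebPage (assumption: inlinks is a map keyed by URL).
    // It also counts putToInlinks calls so redundant puts would be visible.
    static final class Page {
        final Map<String, String> inlinks = new LinkedHashMap<>();
        int putCalls = 0;

        void putToInlinks(String url, String anchor) {
            putCalls++;
            inlinks.put(url, anchor);
        }
    }

    // Single pass: records each inlink once and returns the smallest distance
    // seen, or startDist if no inlink is closer (startDist is a hypothetical parameter).
    static int mergeInlinks(Page page, List<ScoreDatum> inlinkedScoreData, int startDist) {
        int smallestDist = startDist;
        for (ScoreDatum inlink : inlinkedScoreData) {
            int inlinkDist = inlink.distance();
            if (inlinkDist < smallestDist) {
                smallestDist = inlinkDist;
            }
            page.putToInlinks(inlink.url(), inlink.anchor());
        }
        return smallestDist;
    }

    public static void main(String[] args) {
        List<ScoreDatum> data = List.of(
                new ScoreDatum("http://a.example/", "A", 3),
                new ScoreDatum("http://b.example/", "B", 1));
        Page page = new Page();
        int smallest = mergeInlinks(page, data, Integer.MAX_VALUE);
        // Prints: smallest distance, number of put calls, number of stored inlinks
        System.out.println(smallest + " " + page.putCalls + " " + page.inlinks.size());
    }
}
```

Running main prints `1 2 2`. With the original two loops, the same input would make four putToInlinks calls. Under the map assumption, the second round only overwrites the same keys, so the cost is wasted work rather than extra entries.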



--
This message was sent by Atlassian JIRA
(v6.1#6144)