Posted to commits@nutch.apache.org by le...@apache.org on 2014/01/13 14:21:26 UTC

svn commit: r1557707 - in /nutch/branches/2.x: CHANGES.txt src/java/org/apache/nutch/crawl/DbUpdateReducer.java

Author: lewismc
Date: Mon Jan 13 13:21:26 2014
New Revision: 1557707

URL: http://svn.apache.org/r1557707
Log:
NUTCH-1672 Inlinks are added twice in DbUpdateReducer

Modified:
    nutch/branches/2.x/CHANGES.txt
    nutch/branches/2.x/src/java/org/apache/nutch/crawl/DbUpdateReducer.java

Modified: nutch/branches/2.x/CHANGES.txt
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/CHANGES.txt?rev=1557707&r1=1557706&r2=1557707&view=diff
==============================================================================
--- nutch/branches/2.x/CHANGES.txt (original)
+++ nutch/branches/2.x/CHANGES.txt Mon Jan 13 13:21:26 2014
@@ -2,6 +2,8 @@ Nutch Change Log
 
 Current Development
 
+* NUTCH-1672 Inlinks are added twice in DbUpdateReducer (Tien Nguyen Manh via lewismc)
+
 * NUTCH-1667 Updatedb always ignore batchId (Tien Nguyen Manh via lewismc)
 
 * NUTCH-1695 NutchDocument.toString() (markus via lewismc)

Modified: nutch/branches/2.x/src/java/org/apache/nutch/crawl/DbUpdateReducer.java
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/crawl/DbUpdateReducer.java?rev=1557707&r1=1557706&r2=1557707&view=diff
==============================================================================
--- nutch/branches/2.x/src/java/org/apache/nutch/crawl/DbUpdateReducer.java (original)
+++ nutch/branches/2.x/src/java/org/apache/nutch/crawl/DbUpdateReducer.java Mon Jan 13 13:21:26 2014
@@ -159,10 +159,7 @@ extends GoraReducer<UrlWithScore, NutchW
     if (page.getInlinks() != null) {
       page.getInlinks().clear();
     }
-    for (ScoreDatum inlink : inlinkedScoreData) {
-      page.putToInlinks(new Utf8(inlink.getUrl()), new Utf8(inlink.getAnchor()));
-    }
-
+    
     // Distance calculation.
     // Retrieve smallest distance from all inlinks distances
     // Calculate new distance for current page: smallest inlink distance plus 1.
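
The distance rule described in the trailing comments above (smallest inlink distance, plus 1) can be sketched in isolation as follows. This is a minimal illustration and not the code from DbUpdateReducer: the class DistanceSketch, the InlinkDatum type, and the newDistance method are hypothetical names introduced here, and using Integer.MAX_VALUE to mean "no known distance" is an assumption.

```java
import java.util.Arrays;
import java.util.List;

public class DistanceSketch {

    /** Hypothetical stand-in for an inlink's score datum; only url and distance are modelled. */
    static final class InlinkDatum {
        final String url;
        final int distance;

        InlinkDatum(String url, int distance) {
            this.url = url;
            this.distance = distance;
        }
    }

    /**
     * Returns the smallest inlink distance plus 1.
     * Assumption: Integer.MAX_VALUE means "no distance known", and it is
     * returned unchanged when there are no inlinks, which avoids overflow.
     */
    static int newDistance(List<InlinkDatum> inlinks) {
        int smallest = Integer.MAX_VALUE;
        for (InlinkDatum inlink : inlinks) {
            if (inlink.distance < smallest) {
                smallest = inlink.distance;
            }
        }
        return smallest == Integer.MAX_VALUE ? Integer.MAX_VALUE : smallest + 1;
    }

    public static void main(String[] args) {
        List<InlinkDatum> inlinks = Arrays.asList(
                new InlinkDatum("http://a.example/", 3),
                new InlinkDatum("http://b.example/", 1));
        // The closest inlink is at distance 1, so the page is at distance 2.
        System.out.println(newDistance(inlinks)); // prints 2
    }
}
```

The calculation depends only on the minimum over the inlink distances, so a duplicated inlink entry would not change its result.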