Posted to commits@nutch.apache.org by fe...@apache.org on 2012/02/15 10:38:59 UTC

svn commit: r1244416 - in /nutch/branches/nutchgora: CHANGES.txt src/java/org/apache/nutch/crawl/GeneratorReducer.java

Author: ferdy
Date: Wed Feb 15 09:38:58 2012
New Revision: 1244416

URL: http://svn.apache.org/viewvc?rev=1244416&view=rev
Log:
NUTCH-1279 Check if limit has been reached in GeneratorReducer must be the first check performance-wise.

Modified:
    nutch/branches/nutchgora/CHANGES.txt
    nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java

Modified: nutch/branches/nutchgora/CHANGES.txt
URL: http://svn.apache.org/viewvc/nutch/branches/nutchgora/CHANGES.txt?rev=1244416&r1=1244415&r2=1244416&view=diff
==============================================================================
--- nutch/branches/nutchgora/CHANGES.txt (original)
+++ nutch/branches/nutchgora/CHANGES.txt Wed Feb 15 09:38:58 2012
@@ -2,6 +2,8 @@ Nutch Change Log
 
 Release nutchgora - Current Development
 
+* NUTCH-1279 Check if limit has been reached in GeneraterReducer must be the first check performance-wise. (ferdy)
+
 * NUTCH-1255 Change ivy.xml of all plugins to remove "nutch.root" property (ferdy)
 
 * NUTCH-1189 add commented out default settings to gora.properties file (lewismc, Ferdy)

Modified: nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java
URL: http://svn.apache.org/viewvc/nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java?rev=1244416&r1=1244415&r2=1244416&view=diff
==============================================================================
--- nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java (original)
+++ nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java Wed Feb 15 09:38:58 2012
@@ -50,6 +50,9 @@ extends GoraReducer<SelectorEntry, WebPa
   protected void reduce(SelectorEntry key, Iterable<WebPage> values,
       Context context) throws IOException, InterruptedException {
     for (WebPage page : values) {
+      if (count >= limit) {
+        return;
+      }
       if (maxCount > 0) {
         String hostordomain;
         if (byDomain) {
@@ -68,9 +71,6 @@ extends GoraReducer<SelectorEntry, WebPa
         }
         hostCountMap.put(hostordomain, hostCount + 1);
       }
-      if (count >= limit) {
-        return;
-      }
 
       Mark.GENERATE_MARK.putMark(page, batchId);
       context.write(TableUtil.reverseUrl(key.url), page);
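
The reordering in the hunks above can be illustrated with a small standalone sketch. It is not the Nutch class: LimitFirstSketch, its constructor, the hostLookups counter, and the choice to skip an entry once its host has reached maxCount are all assumptions made for illustration (the diff does not show that part of the reducer). Only the names limit, count, maxCount and hostCountMap are taken from the diff. The point is that once count has reached limit, the loop returns before doing any per-host bookkeeping.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the reducer loop; not the actual GeneratorReducer.
public class LimitFirstSketch {
    private final long limit;     // maximum number of entries to select overall
    private final long maxCount;  // maximum entries per host; <= 0 disables the check
    private long count = 0;       // entries selected so far
    private final Map<String, Integer> hostCountMap = new HashMap<>();
    int hostLookups = 0;          // instrumentation only, to show the work that is avoided

    LimitFirstSketch(long limit, long maxCount) {
        this.limit = limit;
        this.maxCount = maxCount;
    }

    // Returns how many of the given hosts were selected in this call.
    int reduce(List<String> hosts) {
        int selected = 0;
        for (String host : hosts) {
            // Cheapest check first: stop before touching the per-host map.
            if (count >= limit) {
                return selected;
            }
            if (maxCount > 0) {
                hostLookups++;
                int hostCount = hostCountMap.getOrDefault(host, 0);
                if (hostCount >= maxCount) {
                    continue; // assumption: entries over the per-host cap are skipped
                }
                hostCountMap.put(host, hostCount + 1);
            }
            count++;
            selected++;
        }
        return selected;
    }

    public static void main(String[] args) {
        LimitFirstSketch sketch = new LimitFirstSketch(2, 5);
        int selected = sketch.reduce(
            Arrays.asList("a.example", "b.example", "c.example", "d.example"));
        System.out.println("selected=" + selected + " hostLookups=" + sketch.hostLookups);
    }
}
```

With the limit check placed after the per-host block, as in the removed lines, the same input would perform one extra host lookup and map update before returning.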