Posted to commits@nutch.apache.org by fe...@apache.org on 2012/02/15 10:38:59 UTC
svn commit: r1244416 - in /nutch/branches/nutchgora: CHANGES.txt
src/java/org/apache/nutch/crawl/GeneratorReducer.java
Author: ferdy
Date: Wed Feb 15 09:38:58 2012
New Revision: 1244416
URL: http://svn.apache.org/viewvc?rev=1244416&view=rev
Log:
NUTCH-1279 For performance, the check whether the limit has been reached must be the first check in GeneratorReducer.
Modified:
nutch/branches/nutchgora/CHANGES.txt
nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java
Modified: nutch/branches/nutchgora/CHANGES.txt
URL: http://svn.apache.org/viewvc/nutch/branches/nutchgora/CHANGES.txt?rev=1244416&r1=1244415&r2=1244416&view=diff
==============================================================================
--- nutch/branches/nutchgora/CHANGES.txt (original)
+++ nutch/branches/nutchgora/CHANGES.txt Wed Feb 15 09:38:58 2012
@@ -2,6 +2,8 @@ Nutch Change Log
 
 Release nutchgora - Current Development
 
+* NUTCH-1279 Check if limit has been reached in GeneraterReducer must be the first check performance-wise. (ferdy)
+
 * NUTCH-1255 Change ivy.xml of all plugins to remove "nutch.root" property (ferdy)
 
 * NUTCH-1189 add commented out default settings to gora.properties file (lewismc, Ferdy)
Modified: nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java
URL: http://svn.apache.org/viewvc/nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java?rev=1244416&r1=1244415&r2=1244416&view=diff
==============================================================================
--- nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java (original)
+++ nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java Wed Feb 15 09:38:58 2012
@@ -50,6 +50,9 @@ extends GoraReducer<SelectorEntry, WebPa
   protected void reduce(SelectorEntry key, Iterable<WebPage> values,
       Context context) throws IOException, InterruptedException {
     for (WebPage page : values) {
+      if (count >= limit) {
+        return;
+      }
       if (maxCount > 0) {
         String hostordomain;
         if (byDomain) {
@@ -68,9 +71,6 @@ extends GoraReducer<SelectorEntry, WebPa
         }
         hostCountMap.put(hostordomain, hostCount + 1);
       }
-      if (count >= limit) {
-        return;
-      }
       Mark.GENERATE_MARK.putMark(page, batchId);
       context.write(TableUtil.reverseUrl(key.url), page);
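
The effect of moving the limit check to the top of the loop can be illustrated with a small standalone sketch. This is not the Nutch code: the class LimitFirstSketch, its hostLookups counter, the run helper, and the plain String host argument are all hypothetical stand-ins, and the parts of reduce() that the diff elides (the per-host cap check and the increment of count) are assumptions. It only shows that, once the limit is reached, checking it first skips the per-host bookkeeping on every later call while emitting the same entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the reducer; not the actual GeneratorReducer.
public class LimitFirstSketch {
  private final long limit;         // overall cap on emitted entries
  private final long maxCount;      // per-host cap; 0 or less disables it
  private final boolean limitFirst; // true models the order after r1244416
  private final Map<String, Integer> hostCountMap = new HashMap<>();
  private long count = 0;           // assumed to be incremented per emitted entry

  int hostLookups = 0;              // counts per-host bookkeeping, for illustration only
  final List<String> emitted = new ArrayList<>();

  LimitFirstSketch(long limit, long maxCount, boolean limitFirst) {
    this.limit = limit;
    this.maxCount = maxCount;
    this.limitFirst = limitFirst;
  }

  // Models one reduce() call that receives a single entry for the given host.
  void reduce(String host) {
    if (limitFirst && count >= limit) {
      return; // new order: nothing else runs once the limit is reached
    }
    if (maxCount > 0) {
      hostLookups++;
      int hostCount = hostCountMap.getOrDefault(host, 0);
      if (hostCount >= maxCount) {
        return; // assumption: entries over the per-host cap are skipped
      }
      hostCountMap.put(host, hostCount + 1);
    }
    if (!limitFirst && count >= limit) {
      return; // old order: the per-host work above has already been done
    }
    emitted.add(host);
    count++;
  }

  // Feeds five entries from distinct hosts through a reducer with limit 2.
  static LimitFirstSketch run(boolean limitFirst) {
    LimitFirstSketch reducer = new LimitFirstSketch(2, 5, limitFirst);
    for (String host : new String[] {"a.org", "b.org", "c.org", "d.org", "e.org"}) {
      reducer.reduce(host);
    }
    return reducer;
  }

  public static void main(String[] args) {
    LimitFirstSketch before = run(false);
    LimitFirstSketch after = run(true);
    System.out.println("limit check last:  emitted=" + before.emitted
        + " hostLookups=" + before.hostLookups);
    System.out.println("limit check first: emitted=" + after.emitted
        + " hostLookups=" + after.hostLookups);
  }
}
```

In this sketch both orderings emit the same two entries, but the old order performs the per-host lookup for all five inputs while the new order performs it only for the two that are emitted. The old order also increments a host's count for an entry that is then dropped by the limit check, which the new order avoids.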