Posted to commits@nutch.apache.org by ma...@apache.org on 2015/04/19 01:49:52 UTC
svn commit: r1674588 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml
Author: mattmann
Date: Sat Apr 18 23:49:52 2015
New Revision: 1674588
URL: http://svn.apache.org/r1674588
Log:
tickle to close out pull request committed to 2.x by Meabed. This closes #8.
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/nutch-default.xml
Modified: nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?rev=1674588&r1=1674587&r2=1674588&view=diff
==============================================================================
--- nutch/trunk/CHANGES.txt (original)
+++ nutch/trunk/CHANGES.txt Sat Apr 18 23:49:52 2015
@@ -1,7 +1,7 @@
Nutch Change Log
Nutch Current Development 1.10-SNAPSHOT
-
+
* NUTCH-1854 bin/crawl fails with a parsing fetcher (Asitang Mishra via snagel)
* NUTCH-1989 Handling invalid URLs in CommonCrawlDataDumper (Giuseppe Totaro via mattmann)
Modified: nutch/trunk/conf/nutch-default.xml
URL: http://svn.apache.org/viewvc/nutch/trunk/conf/nutch-default.xml?rev=1674588&r1=1674587&r2=1674588&view=diff
==============================================================================
--- nutch/trunk/conf/nutch-default.xml (original)
+++ nutch/trunk/conf/nutch-default.xml Sat Apr 18 23:49:52 2015
@@ -119,7 +119,7 @@
<property>
<name>http.robot.rules.whitelist</name>
- <value></value>
+ <value>baron.pagemewhen.com</value>
<description>Comma separated list of hostnames or IP addresses to ignore
robot rules parsing for. Use with care and only if you are explicitly
allowed by the site owner to ignore the site's robots.txt!