Posted to commits@nutch.apache.org by sn...@apache.org on 2016/08/22 21:51:03 UTC

[5/5] nutch git commit: NUTCH-2300 Fetcher to optionally save robots.txt Merge branch 'SaveRobotsTxt' of https://github.com/sebastian-nagel/nutch, this closes #141

NUTCH-2300 Fetcher to optionally save robots.txt
Merge branch 'SaveRobotsTxt' of https://github.com/sebastian-nagel/nutch, this closes #141


Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Commit: http://git-wip-us.apache.org/repos/asf/nutch/commit/3fca1a59
Tree: http://git-wip-us.apache.org/repos/asf/nutch/tree/3fca1a59
Diff: http://git-wip-us.apache.org/repos/asf/nutch/diff/3fca1a59

Branch: refs/heads/master
Commit: 3fca1a5902a151867733806fc0511f18ab0b4e6f
Parents: d37b7ce f3af9a5
Author: Sebastian Nagel <sn...@apache.org>
Authored: Mon Aug 22 23:50:16 2016 +0200
Committer: Sebastian Nagel <sn...@apache.org>
Committed: Mon Aug 22 23:50:16 2016 +0200

----------------------------------------------------------------------
 conf/nutch-default.xml                          |  10 ++
 .../org/apache/nutch/fetcher/FetcherThread.java |  29 +++-
 .../org/apache/nutch/parse/ParseSegment.java    |  11 +-
 .../org/apache/nutch/protocol/Protocol.java     |  20 ++-
 .../apache/nutch/protocol/RobotRulesParser.java | 174 +++++++++++++++----
 .../nutch/protocol/http/api/HttpBase.java       |  29 ++--
 .../protocol/http/api/HttpRobotRulesParser.java |  52 +++++-
 .../org/apache/nutch/protocol/file/File.java    |  13 +-
 .../java/org/apache/nutch/protocol/ftp/Ftp.java |   9 +-
 .../nutch/protocol/ftp/FtpRobotRulesParser.java |  17 +-
 10 files changed, 286 insertions(+), 78 deletions(-)
----------------------------------------------------------------------