Posted to commits@nutch.apache.org by sn...@apache.org on 2016/08/22 21:51:03 UTC
[5/5] nutch git commit: NUTCH-2300 Fetcher to optionally save robots.txt
Merge branch 'SaveRobotsTxt' of https://github.com/sebastian-nagel/nutch, this closes #141
NUTCH-2300 Fetcher to optionally save robots.txt
Merge branch 'SaveRobotsTxt' of https://github.com/sebastian-nagel/nutch, this closes #141
Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Commit: http://git-wip-us.apache.org/repos/asf/nutch/commit/3fca1a59
Tree: http://git-wip-us.apache.org/repos/asf/nutch/tree/3fca1a59
Diff: http://git-wip-us.apache.org/repos/asf/nutch/diff/3fca1a59
Branch: refs/heads/master
Commit: 3fca1a5902a151867733806fc0511f18ab0b4e6f
Parents: d37b7ce f3af9a5
Author: Sebastian Nagel <sn...@apache.org>
Authored: Mon Aug 22 23:50:16 2016 +0200
Committer: Sebastian Nagel <sn...@apache.org>
Committed: Mon Aug 22 23:50:16 2016 +0200
----------------------------------------------------------------------
 conf/nutch-default.xml                          |  10 ++
 .../org/apache/nutch/fetcher/FetcherThread.java |  29 +++-
 .../org/apache/nutch/parse/ParseSegment.java    |  11 +-
 .../org/apache/nutch/protocol/Protocol.java     |  20 ++-
 .../apache/nutch/protocol/RobotRulesParser.java | 174 +++++++++++++++----
 .../nutch/protocol/http/api/HttpBase.java       |  29 ++--
 .../protocol/http/api/HttpRobotRulesParser.java |  52 +++++-
 .../org/apache/nutch/protocol/file/File.java    |  13 +-
 .../java/org/apache/nutch/protocol/ftp/Ftp.java |   9 +-
 .../nutch/protocol/ftp/FtpRobotRulesParser.java |  17 +-
 10 files changed, 286 insertions(+), 78 deletions(-)
----------------------------------------------------------------------