Posted to commits@nutch.apache.org by sn...@apache.org on 2019/01/18 15:26:24 UTC
[nutch] branch master updated: NUTCH-2680 Documentation: https
supported by multiple protocol plugins not only httpclient Improve
description of property plugin.includes: - https is supported by default -
no need to enable the stub plugin nutch-extensionpoints
This is an automated email from the ASF dual-hosted git repository.
snagel pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git
The following commit(s) were added to refs/heads/master by this push:
new 9ae7a80 NUTCH-2680 Documentation: https supported by multiple protocol plugins not only httpclient Improve description of property plugin.includes: - https is supported by default - no need to enable the stub plugin nutch-extensionpoints
new 0c18f6c Merge pull request #426 from sebastian-nagel/NUTCH-2680
9ae7a80 is described below
commit 9ae7a8049c246aa638328605a8ce0922e48dddf6
Author: Sebastian Nagel <sn...@apache.org>
AuthorDate: Mon Jan 7 12:16:10 2019 +0100
NUTCH-2680 Documentation: https supported by multiple protocol plugins not only httpclient
Improve description of property plugin.includes:
- https is supported by default
- no need to enable the stub plugin nutch-extensionpoints
---
conf/nutch-default.xml | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/conf/nutch-default.xml b/conf/nutch-default.xml
index 00cb845..913f901 100644
--- a/conf/nutch-default.xml
+++ b/conf/nutch-default.xml
@@ -1360,11 +1360,11 @@
<value>protocol-http|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
<description>Regular expression naming plugin directory names to
include. Any plugin not matching this expression is excluded.
- In any case you need at least include the nutch-extensionpoints plugin. By
- default Nutch includes crawling just HTML and plain text via HTTP,
- and basic indexing and search plugins. In order to use HTTPS please enable
- protocol-httpclient, but be aware of possible intermittent problems with the
- underlying commons-httpclient library. Set parsefilter-naivebayes for classification based focused crawler.
+ By default Nutch includes plugins to crawl HTML and various other
+ document formats via HTTP/HTTPS and indexing the crawled content
+ into Solr. More plugins are available to support more indexing
+ backends, to fetch ftp:// and file:// URLs, for focused crawling,
+ and many other use cases.
</description>
</property>
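
For illustration only (not part of this commit): the new description says more plugins are available, for example to fetch file:// URLs. A site-specific override of plugin.includes might look like the sketch below. It assumes the override is placed in conf/nutch-site.xml, whose values take precedence over conf/nutch-default.xml, and that the protocol-file plugin is the one wanted. The value is the default shown in the diff above with only the protocol group changed.

```xml
<?xml version="1.0"?>
<!-- Hypothetical conf/nutch-site.xml fragment; an assumption for
     illustration, not content from commit 9ae7a80. -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Default value from nutch-default.xml, with protocol-http widened
         to protocol-(http|file) so that file:// URLs can also be fetched. -->
    <value>protocol-(http|file)|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```

Because the value is a regular expression matched against plugin directory names, any plugin whose name does not match stays excluded. URL filter rules may also need adjusting before file:// URLs are accepted; that depends on the local filter configuration.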