Posted to dev@nutch.apache.org by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2015/03/20 10:05:38 UTC
[jira] [Commented] (NUTCH-1971) The crawldb.url.filters property is not present in any configuration file
[ https://issues.apache.org/jira/browse/NUTCH-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371018#comment-14371018 ]
Sebastian Nagel commented on NUTCH-1971:
----------------------------------------
+1. Since NUTCH-1786, crawldb.url.filters and crawldb.url.normalizers can be set via configuration properties. Before that, they were purely temporary properties used to pass settings from command-line arguments to the map/reduce jobs. Yes, it's possible to set (and overwrite!) these properties via command-line options ({{nutch updatedb ... [-filter] [-normalize]}}). The fact that they can be overwritten should perhaps be noted, preferably in the command-line help rather than in nutch-default.xml.
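The "set (and overwrite!)" behaviour can be illustrated with a small sketch. It assumes that the configured value acts as the default, that the -filter / -normalize flags can only switch an option on, and that the effective value is then written back to the job configuration. The class and method names are hypothetical, java.util.Properties stands in for Hadoop's Configuration, and this is not the actual Nutch source.

```java
import java.util.Properties;

/**
 * Hypothetical sketch (not Nutch's code): configuration supplies the default
 * for crawldb.url.filters / crawldb.url.normalizers, and the -filter /
 * -normalize flags of "nutch updatedb" overwrite it for the job.
 * java.util.Properties stands in for Hadoop's Configuration.
 */
public class UpdateDbFlagsSketch {

    static final String URL_FILTERING = "crawldb.url.filters";
    static final String URL_NORMALIZING = "crawldb.url.normalizers";

    /** Returns a job configuration with command-line flags applied on top of conf. */
    static Properties applyFlags(Properties conf, String[] args) {
        Properties job = new Properties();
        job.putAll(conf);
        // Configured values act as defaults; a missing key means false.
        boolean filter = Boolean.parseBoolean(conf.getProperty(URL_FILTERING, "false"));
        boolean normalize = Boolean.parseBoolean(conf.getProperty(URL_NORMALIZING, "false"));
        for (String arg : args) {
            if (arg.equals("-filter")) {
                filter = true;      // flag overwrites the configured value
            } else if (arg.equals("-normalize")) {
                normalize = true;
            }
        }
        // Write the effective values back so the map/reduce tasks see them.
        job.setProperty(URL_FILTERING, Boolean.toString(filter));
        job.setProperty(URL_NORMALIZING, Boolean.toString(normalize));
        return job;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(URL_FILTERING, "false"); // e.g. set in nutch-site.xml
        Properties job = applyFlags(conf, new String[] {"-filter"});
        System.out.println(job.getProperty(URL_FILTERING));   // true
        System.out.println(job.getProperty(URL_NORMALIZING)); // false
    }
}
```

Under these assumptions, a value of false in the configuration file is silently replaced by true when -filter is given on the command line. That is the kind of overwriting the comment suggests documenting in the command-line help.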
> The crawldb.url.filters property is not present in any configuration file
> -------------------------------------------------------------------------
>
> Key: NUTCH-1971
> URL: https://issues.apache.org/jira/browse/NUTCH-1971
> Project: Nutch
> Issue Type: Improvement
> Components: crawldb
> Affects Versions: 1.9
> Reporter: Luis Lopez
> Labels: configuration, crawldb, nutch-default.xml
>
> In CrawlDbFilter.java there is a line that reads a boolean determining whether the URL filters are applied:
> public static final String URL_FILTERING = "crawldb.url.filters";
> However, that property is not present in nutch-default.xml. Currently, the only way to set this value is via the -filter parameter on the command line.
> The same applies to:
> public static final String URL_NORMALIZING = "crawldb.url.normalizers";
> public static final String URL_NORMALIZING_SCOPE = "crawldb.url.normalizers.scope";
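The following sketch shows what happens when none of the three properties is defined in any configuration file: the code reading them falls back to its built-in defaults. The defaults used here (false, false and the scope "crawldb") are assumptions for illustration. The class name is hypothetical, and java.util.Properties again stands in for Hadoop's Configuration, whose getBoolean(name, default) and get(name, default) play the same role.

```java
import java.util.Properties;

/**
 * Hypothetical sketch of reading the three crawldb properties named in the
 * issue. Defaults (false, false, "crawldb") are assumptions; Properties
 * stands in for Hadoop's Configuration.
 */
public class CrawlDbFilterSettings {
    public static final String URL_FILTERING = "crawldb.url.filters";
    public static final String URL_NORMALIZING = "crawldb.url.normalizers";
    public static final String URL_NORMALIZING_SCOPE = "crawldb.url.normalizers.scope";

    final boolean urlFiltering;
    final boolean urlNormalizing;
    final String scope; // only relevant when normalizing is enabled

    CrawlDbFilterSettings(Properties conf) {
        // A key absent from every configuration file silently yields the default.
        urlFiltering = Boolean.parseBoolean(conf.getProperty(URL_FILTERING, "false"));
        urlNormalizing = Boolean.parseBoolean(conf.getProperty(URL_NORMALIZING, "false"));
        scope = conf.getProperty(URL_NORMALIZING_SCOPE, "crawldb");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(URL_NORMALIZING, "true");
        CrawlDbFilterSettings s = new CrawlDbFilterSettings(conf);
        System.out.println(s.urlFiltering + " " + s.urlNormalizing + " " + s.scope);
        // prints "false true crawldb"
    }
}
```

Because a missing key is indistinguishable from an explicitly disabled one, documenting these properties in nutch-default.xml would make the defaults visible to users instead of leaving them implicit in the code.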
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)