Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/11/18 23:29:11 UTC

[jira] [Commented] (NUTCH-2069) Ignore external links based on domain

    [ https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012193#comment-15012193 ] 

Markus Jelsma commented on NUTCH-2069:
--------------------------------------

Hi J - I agree with the mode! Have it defaulted so that it never breaks older instances and doesn't allow excluding both. Your follow-up patch is probably spot on; have you got one? It can still come in 1.11!
M.

> Ignore external links based on domain
> -------------------------------------
>
>                 Key: NUTCH-2069
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2069
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher, parser
>    Affects Versions: 1.10
>            Reporter: Julien Nioche
>         Attachments: NUTCH-2069.patch
>
>
> We currently have `db.ignore.external.links`, which is a nice way of restricting the crawl based on the hostname. This adds a new parameter, `db.ignore.external.links.domain`, to do the same based on the domain.
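
To illustrate the difference between restricting by hostname and restricting by domain, here is a minimal Java sketch. It is not the code from NUTCH-2069.patch: the class `LinkScopeSketch` and all of its methods are hypothetical names, and the domain is approximated naively as the last two labels of the host, which gives wrong answers for suffixes such as co.uk. A real implementation would presumably consult a public-suffix list.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical illustration only; not part of Nutch.
final class LinkScopeSketch {

    /** Lower-cased host of a URL, or "" if no host can be parsed. */
    static String host(String url) {
        try {
            String h = URI.create(url).getHost();
            return h == null ? "" : h.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return "";
        }
    }

    /** ASSUMPTION: the "domain" is simply the last two labels of the host. */
    static String naiveDomain(String url) {
        String h = host(url);
        String[] labels = h.split("\\.");
        if (labels.length <= 2) {
            return h;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    /** Host-based check: any change of hostname makes the link external. */
    static boolean isExternalByHost(String fromUrl, String toUrl) {
        return !host(fromUrl).equals(host(toUrl));
    }

    /** Domain-based check: links between subdomains stay internal. */
    static boolean isExternalByDomain(String fromUrl, String toUrl) {
        return !naiveDomain(fromUrl).equals(naiveDomain(toUrl));
    }

    public static void main(String[] args) {
        String from = "http://www.example.com/index.html";
        String to = "http://blog.example.com/post";
        System.out.println(isExternalByHost(from, to));   // true
        System.out.println(isExternalByDomain(from, to)); // false
    }
}
```

Under these assumptions, a link from www.example.com to blog.example.com is dropped by the host-based check but kept by the domain-based one.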



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)