Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/07/01 10:38:06 UTC

[jira] [Commented] (NUTCH-1730) Scoring-depth optionally not to increment depth for external hosts

    [ https://issues.apache.org/jira/browse/NUTCH-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609740#comment-14609740 ] 

Markus Jelsma commented on NUTCH-1730:
--------------------------------------

Hello Sebastian!

* thanks! The unit tests are not affected, as both have the same typo
* of course!
* yes, -1 disables it completely and 0 is not a sensible depth either

The use case is that you want to crawl many different hosts and not have them restricted by the depth of the initial seed, which was on another host. You are indeed right about links to deep pages on external hosts, so this approach is flawed. Depth must always be controlled from the domain root!
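
To make the behaviour under discussion concrete, here is a minimal sketch in Java. It is not the scoring-depth plugin's code: DepthSketch, hostOf, childDepth and the incrementExternal flag are hypothetical names chosen for illustration, and comparing hosts via java.net.URI is an assumption about how "external" would be decided.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical sketch only; none of these names come from Nutch.
final class DepthSketch {

    // Lower-cased host of a URL, or "" if the URL has no host part.
    static String hostOf(String url) {
        String host = URI.create(url).getHost();
        return host == null ? "" : host.toLowerCase(Locale.ROOT);
    }

    // Depth assigned to an outlink found on a page at parentDepth.
    // incrementExternal == true : every link adds 1 (the behaviour the issue describes).
    // incrementExternal == false: assumed option, links to another host keep parentDepth.
    static int childDepth(String fromUrl, String toUrl,
                          int parentDepth, boolean incrementExternal) {
        boolean external = !hostOf(fromUrl).equals(hostOf(toUrl));
        if (external && !incrementExternal) {
            return parentDepth;
        }
        return parentDepth + 1;
    }

    public static void main(String[] args) {
        String seed = "http://a.example/";
        String deepExternal = "http://b.example/x/y/z.html";
        // Same-host link: depth is incremented.
        System.out.println(childDepth(seed, "http://a.example/page", 1, false));
        // External link with the option on: depth is carried over unchanged.
        System.out.println(childDepth(seed, deepExternal, 1, false));
    }
}
```

In this sketch the external page simply inherits the depth of the page that linked to it, no matter how far it sits from its own domain root. That appears to be the flaw acknowledged above.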

> Scoring-depth optionally not to increment depth for external hosts
> ------------------------------------------------------------------
>
>                 Key: NUTCH-1730
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1730
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.7
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.11
>
>         Attachments: NUTCH-1730-trunk.patch, NUTCH-1730.patch
>
>
> Currently, the plugin always increments depth, even when coming from or going to external hosts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)