Posted to dev@nutch.apache.org by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2013/01/12 21:02:12 UTC

[jira] [Resolved] (NUTCH-813) Repetitive crawl 403 status page

     [ https://issues.apache.org/jira/browse/NUTCH-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel resolved NUTCH-813.
-----------------------------------

    Resolution: Duplicate

The described problem is identical to that of NUTCH-578. The provided patch (which calls setPageGoneSchedule when the retry counter reaches db.fetch.retry.max) is included in all patches of NUTCH-578.
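
For readers unfamiliar with the fix, the idea can be sketched roughly as follows. This is a simplified standalone model in Python, not Nutch code: PageRecord, on_fetch_failure, the status strings, the retry limit of 3, and the interval-doubling back-off are all assumptions made for illustration. Only the names setPageGoneSchedule and db.fetch.retry.max come from the issue itself.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class PageRecord:
    """Hypothetical stand-in for a crawl database entry (not a Nutch class)."""
    status: str = "unfetched"
    retries: int = 0
    fetch_interval: int = SECONDS_PER_DAY  # seconds between fetch attempts
    next_fetch_time: int = 0


def set_page_gone_schedule(page, now, max_interval):
    """Mark the page as gone and push its next fetch further into the future.

    The doubling back-off here is an assumed policy for this sketch only.
    """
    page.status = "gone"
    page.fetch_interval = min(page.fetch_interval * 2, max_interval)
    page.next_fetch_time = now + page.fetch_interval
    return page


def on_fetch_failure(page, now, retry_max=3, max_interval=90 * SECONDS_PER_DAY):
    """Handle a failed fetch such as a 403 response.

    retry_max plays the role of db.fetch.retry.max; the value 3 is illustrative.
    """
    page.retries += 1
    if page.retries < retry_max:
        # Still under the limit: retry on the normal schedule.
        page.status = "unfetched"
        page.next_fetch_time = now + page.fetch_interval
    else:
        # Limit reached: stop the daily retries and apply the "gone" schedule.
        set_page_gone_schedule(page, now, max_interval)
    return page
```

Without the else branch, a page that keeps returning 403 stays on the default schedule indefinitely, which matches the behavior described in the report below.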
                
> Repetitive crawl 403 status page
> --------------------------------
>
>                 Key: NUTCH-813
>                 URL: https://issues.apache.org/jira/browse/NUTCH-813
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.1
>            Reporter: Nguyen Manh Tien
>            Priority: Minor
>             Fix For: 1.7
>
>         Attachments: ASF.LICENSE.NOT.GRANTED--Patch
>
>
> When we crawl a page that returns a 403 status, it is recrawled every day under the default schedule,
> even when we restrict retries with the parameter db.fetch.retry.max.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira