Posted to dev@nutch.apache.org by "behnam nikbakht (Created) (JIRA)" <ji...@apache.org> on 2012/03/07 07:46:14 UTC

[jira] [Created] (NUTCH-1303) Fetcher to skip queues for URLS getting repeated exceptions, based on percentage

Fetcher to skip queues for URLS getting repeated exceptions, based on percentage
--------------------------------------------------------------------------------

                 Key: NUTCH-1303
                 URL: https://issues.apache.org/jira/browse/NUTCH-1303
             Project: Nutch
          Issue Type: Improvement
          Components: fetcher
    Affects Versions: 1.4
            Reporter: behnam nikbakht


As described in https://issues.apache.org/jira/browse/NUTCH-769, skipping queues that hit a high number of exceptions is a good solution, but it is not easy to choose a value for fetcher.max.exceptions.per.queue when the queues differ in size.
I suggest defining a ratio instead of an absolute value, so that a queue is cleared once its ratio of exceptions to requests exceeds that threshold.
Also, limiting exceptions alone is not sufficient to protect the fetcher. The fetcher.throughput.threshold.pages value ensures that a worthwhile fetch throughput is maintained against slow hosts, but it clears all queues, not just the slow one. I suggest that this threshold, like fetcher.max.exceptions.per.queue, be enforced per queue rather than across all of them.
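
The ratio idea could be sketched roughly as below. This is only an illustration, not the contents of any patch and not the actual Nutch Fetcher code: the class QueueHealth, its methods, and both constructor parameters (maxExceptionRatio, minRequests) are hypothetical names. The minRequests guard is an extra assumption, added so that a queue is not cleared after a single failed request.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-queue bookkeeping; not a class from the Nutch code base. */
class QueueHealth {

    /** Request and exception counters for one fetch queue (for example, one host). */
    static final class Stats {
        int requests;
        int exceptions;
    }

    private final Map<String, Stats> statsByQueue = new HashMap<>();
    private final double maxExceptionRatio; // assumed setting, e.g. 0.5
    private final int minRequests;          // assumed guard against tiny samples

    QueueHealth(double maxExceptionRatio, int minRequests) {
        this.maxExceptionRatio = maxExceptionRatio;
        this.minRequests = minRequests;
    }

    /** Records one finished request for the given queue and whether it failed. */
    void record(String queueId, boolean failed) {
        Stats s = statsByQueue.computeIfAbsent(queueId, k -> new Stats());
        s.requests++;
        if (failed) {
            s.exceptions++;
        }
    }

    /** Returns true when this queue, and only this queue, should be cleared. */
    boolean shouldSkip(String queueId) {
        Stats s = statsByQueue.get(queueId);
        if (s == null || s.requests < minRequests) {
            return false; // unknown queue or not enough requests to judge
        }
        double ratio = (double) s.exceptions / s.requests;
        return ratio > maxExceptionRatio;
    }
}
```

Because the decision is made per queue id, a large queue and a small queue are judged by the same ratio, and a failing host does not cause the other queues to be cleared.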

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (NUTCH-1303) Fetcher to skip queues for URLS getting repeated exceptions, based on percentage

Posted by "behnam nikbakht (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

behnam nikbakht updated NUTCH-1303:
-----------------------------------

    Attachment: NUTCH-1303.patch
    
> Fetcher to skip queues for URLS getting repeated exceptions, based on percentage
> --------------------------------------------------------------------------------
>
>                 Key: NUTCH-1303
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1303
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.4
>            Reporter: behnam nikbakht
>              Labels: fetch
>         Attachments: NUTCH-1303.patch
