Posted to dev@nutch.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2010/08/09 16:10:15 UTC

[jira] Created: (NUTCH-876) Remove remaining robots/IP blocking code in lib-http

Remove remaining robots/IP blocking code in lib-http
----------------------------------------------------

                 Key: NUTCH-876
                 URL: https://issues.apache.org/jira/browse/NUTCH-876
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 2.0
            Reporter: Andrzej Bialecki 
            Assignee: Andrzej Bialecki 


Remnants of the (very old) blocking code are still present in lib-http/.../HttpBase.java. This code was used with the OldFetcher to enforce politeness limits. The new trunk no longer has OldFetcher, so this code is dead. Furthermore, there is an actual bug here: FetcherJob forgets to set Protocol.CHECK_BLOCKING and Protocol.CHECK_ROBOTS to false, and the defaults in lib-http are true, so the old checks remain enabled.
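
To illustrate why a forgotten setter matters when the default is true, here is a minimal sketch. It is not Nutch code: it uses java.util.Properties in place of the real configuration object, and the class name, helper methods, and property key strings are all hypothetical stand-ins for Protocol.CHECK_BLOCKING and Protocol.CHECK_ROBOTS.

```java
import java.util.Properties;

/** Sketch of the default-value pitfall described above; not the actual lib-http code. */
final class BlockingDefaultsSketch {
    // Hypothetical keys standing in for Protocol.CHECK_BLOCKING and Protocol.CHECK_ROBOTS.
    static final String CHECK_BLOCKING = "sketch.check.blocking";
    static final String CHECK_ROBOTS = "sketch.check.robots";

    /** Reads a boolean flag, falling back to defaultValue when the key is unset. */
    static boolean getBoolean(Properties conf, String key, boolean defaultValue) {
        String raw = conf.getProperty(key);
        return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
    }

    /** Mimics a protocol plugin whose defaults are true, as the report says of lib-http. */
    static boolean legacyChecksEnabled(Properties conf) {
        return getBoolean(conf, CHECK_BLOCKING, true)
            || getBoolean(conf, CHECK_ROBOTS, true);
    }

    /** What a caller has to do explicitly to opt out of the legacy checks. */
    static Properties disableLegacyChecks(Properties conf) {
        conf.setProperty(CHECK_BLOCKING, "false");
        conf.setProperty(CHECK_ROBOTS, "false");
        return conf;
    }

    public static void main(String[] args) {
        // Caller forgot to set the flags: the defaults win and the checks stay on.
        System.out.println("flags unset: " + legacyChecksEnabled(new Properties()));
        // Caller sets both flags to false: the checks are off.
        System.out.println("flags set to false: "
            + legacyChecksEnabled(disableLegacyChecks(new Properties())));
    }
}
```

Removing the dead code, as proposed here, avoids relying on every caller to remember the opt-out.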

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (NUTCH-876) Remove remaining robots/IP blocking code in lib-http

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated NUTCH-876:
------------------------------------

    Attachment: NUTCH-876.patch

Patch to fix the issue. If there are no objections, I'll commit this shortly.




[jira] Resolved: (NUTCH-876) Remove remaining robots/IP blocking code in lib-http

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  resolved NUTCH-876.
-------------------------------------

    Fix Version/s: 2.0
       Resolution: Fixed

Committed in rev. 984337.

