Posted to dev@nutch.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2005/11/07 19:16:20 UTC

[jira] Resolved: (NUTCH-124) protocol-httpclient does not follow redirects when fetching robots.txt

     [ http://issues.apache.org/jira/browse/NUTCH-124?page=all ]
     
Doug Cutting resolved NUTCH-124:
--------------------------------

    Fix Version: 0.8-dev
     Resolution: Fixed

I have fixed this in the mapred branch.

> protocol-httpclient does not follow redirects when fetching robots.txt
> ----------------------------------------------------------------------
>
>          Key: NUTCH-124
>          URL: http://issues.apache.org/jira/browse/NUTCH-124
>      Project: Nutch
>         Type: Bug
>   Components: fetcher
>     Versions: 0.8-dev, 0.7.2-dev
>     Reporter: Doug Cutting
>      Fix For: 0.8-dev

>
> If a site's robots.txt redirects, protocol-httpclient does not correctly fetch the robots.txt and effectively ignores it for the site.  See http://www.webmasterworld.com/forum11/3008.htm.
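A minimal sketch in Java (Nutch's own language) of the behaviour the issue asks for. It follows a robots.txt redirect chain, but gives up after a bounded number of hops or when a URL repeats, so a redirect loop cannot stall the fetcher. The class name `RobotsRedirects`, the `resolve` method, and the lookup function that stands in for real HTTP responses are all hypothetical. This is not the code committed for NUTCH-124.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper for illustration; not Nutch's actual NUTCH-124 fix.
final class RobotsRedirects {

    /**
     * Follows redirects starting at robotsUrl.
     * locationOf returns the Location header for a URL, or null if that URL
     * is not a redirect (a real fetcher would issue an HTTP request here).
     * Returns the final URL, or empty if the chain loops or needs more than
     * maxRedirects hops.
     */
    static Optional<String> resolve(String robotsUrl,
                                    Function<String, String> locationOf,
                                    int maxRedirects) {
        Set<String> seen = new HashSet<>();
        String current = robotsUrl;
        for (int hops = 0; hops <= maxRedirects; hops++) {
            if (!seen.add(current)) {
                return Optional.empty(); // URL repeated: redirect loop
            }
            String location = locationOf.apply(current);
            if (location == null) {
                return Optional.of(current); // not a redirect: read robots.txt from here
            }
            // Location may be relative, so resolve it against the current URL.
            current = URI.create(current).resolve(location).toString();
        }
        return Optional.empty(); // more than maxRedirects redirects
    }

    public static void main(String[] args) {
        // Fake redirect table standing in for live HTTP responses.
        Map<String, String> redirects = Map.of(
            "http://example.com/robots.txt", "http://www.example.com/robots.txt",
            "http://loop.example/robots.txt", "/robots.txt");
        // prints Optional[http://www.example.com/robots.txt]
        System.out.println(resolve("http://example.com/robots.txt", redirects::get, 5));
        // prints Optional.empty (the redirect points back to itself)
        System.out.println(resolve("http://loop.example/robots.txt", redirects::get, 5));
    }
}
```

Returning an empty result rather than throwing lets the caller choose a policy for unresolvable robots.txt chains, for example treating the site as unfetchable instead of silently ignoring its rules.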



Re: [Nutch-dev] [jira] Resolved: (NUTCH-124) protocol-httpclient does not follow redirects when fetching robots.txt

Posted by Doug Cutting <cu...@nutch.org>.
Massimo Miccoli wrote:
> There's a problem with that solution. For some sites, protocol-httpclient
> now generates a SEVERE "Narrowly avoided an infinite loop in execute"
> error, so the fetcher exits and only some pages are fetched before the
> SEVERE message.
> I don't know a solution; for now I've switched back to protocol-http.

Can you provide more details?

Thanks,

Doug

Re: [Nutch-dev] [jira] Resolved: (NUTCH-124) protocol-httpclient does not follow redirects when fetching robots.txt

Posted by Massimo Miccoli <mm...@iltrovatore.it>.
There's a problem with that solution. For some sites, protocol-httpclient
now generates a SEVERE "Narrowly avoided an infinite loop in execute"
error, so the fetcher exits and only some pages are fetched before the
SEVERE message.
I don't know a solution; for now I've switched back to protocol-http.