Posted to dev@nutch.apache.org by Massimo Miccoli <mm...@iltrovatore.it> on 2005/11/09 14:17:47 UTC

Re: [Nutch-dev] [jira] Resolved: (NUTCH-124) protocol-httpclient does not follow redirects when fetching robots.txt

There's a problem with that solution. For some sites, protocol-httpclient now
generates a SEVERE "Narrowly avoided an infinite loop in execute" message.
The fetcher then exits, so only the pages fetched before the SEVERE message
are retrieved.
I don't know of a solution; for now I have switched back to protocol-http.



Doug Cutting (JIRA) wrote:

>     [ http://issues.apache.org/jira/browse/NUTCH-124?page=all ]
>     
>Doug Cutting resolved NUTCH-124:
>--------------------------------
>
>    Fix Version: 0.8-dev
>     Resolution: Fixed
>
>I have fixed this in the mapred branch.
>
>  
>
>>protocol-httpclient does not follow redirects when fetching robots.txt
>>----------------------------------------------------------------------
>>
>>         Key: NUTCH-124
>>         URL: http://issues.apache.org/jira/browse/NUTCH-124
>>     Project: Nutch
>>        Type: Bug
>>  Components: fetcher
>>    Versions: 0.8-dev, 0.7.2-dev
>>    Reporter: Doug Cutting
>>     Fix For: 0.8-dev
>>    
>>
>
>  
>
>>If a site's robots.txt redirects, protocol-httpclient does not correctly fetch the robots.txt and effectively ignores it for the site.  See http://www.webmasterworld.com/forum11/3008.htm.
>>    
>>
>
>  
>

Re: [Nutch-dev] [jira] Resolved: (NUTCH-124) protocol-httpclient does not follow redirects when fetching robots.txt

Posted by Doug Cutting <cu...@nutch.org>.
Massimo Miccoli wrote:
> There's a problem with that solution. For some sites, protocol-httpclient now
> generates a SEVERE "Narrowly avoided an infinite loop in execute" message.
> The fetcher then exits, so only the pages fetched before the SEVERE message
> are retrieved.
> I don't know of a solution; for now I have switched back to protocol-http.

Can you provide more details?

Thanks,

Doug