Posted to dev@nutch.apache.org by Massimo Miccoli <mm...@iltrovatore.it> on 2005/11/09 14:17:47 UTC
Re: [Nutch-dev] [jira] Resolved: (NUTCH-124) protocol-httpclient
does not follow redirects when fetching robots.txt
There's a problem with that solution. For some sites, protocol-httpclient now
generates a SEVERE "Narrowly avoided an infinite loop in execute" error.
The fetcher then exits, and only the pages fetched before the SEVERE message
are retrieved.
I don't know a solution; for now I have switched back to protocol-http.
Doug Cutting (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/NUTCH-124?page=all ]
>
>Doug Cutting resolved NUTCH-124:
>--------------------------------
>
> Fix Version: 0.8-dev
> Resolution: Fixed
>
>I have fixed this in the mapred branch.
>
>>protocol-httpclient does not follow redirects when fetching robots.txt
>>----------------------------------------------------------------------
>>
>> Key: NUTCH-124
>> URL: http://issues.apache.org/jira/browse/NUTCH-124
>> Project: Nutch
>> Type: Bug
>> Components: fetcher
>> Versions: 0.8-dev, 0.7.2-dev
>> Reporter: Doug Cutting
>> Fix For: 0.8-dev
>>
>>
>
>>If a site's robots.txt redirects, protocol-httpclient does not correctly fetch the robots.txt and effectively ignores it for the site. See http://www.webmasterworld.com/forum11/3008.htm.
>
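For readers following the thread: NUTCH-124 asks that a robots.txt request answered with a redirect be followed to the final document. The SEVERE "infinite loop" report suggests why any redirect follower also needs a stopping rule. Below is a minimal Java sketch of that idea under stated assumptions. It is not the code committed to the mapred branch, nor HttpClient's own redirect logic. The class RobotsRedirectSketch, the limit of 5 redirects, and the in-memory redirect table that stands in for real HTTP 3xx responses are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Nutch or HttpClient code: follow the redirects of a
// robots.txt URL with a hop limit and loop detection. The in-memory map stands
// in for HTTP 3xx responses (source URL -> Location target); no network I/O.
public class RobotsRedirectSketch {

    // Assumed limit on redirects followed; real clients choose their own value.
    static final int MAX_REDIRECTS = 5;

    // Outcome: finalUrl is set on success, error is set on failure.
    static final class Result {
        final String finalUrl;
        final List<String> chain;
        final String error;

        Result(String finalUrl, List<String> chain, String error) {
            this.finalUrl = finalUrl;
            this.chain = chain;
            this.error = error;
        }
    }

    static Result follow(String start, Map<String, String> redirects) {
        List<String> chain = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String current = start;
        while (true) {
            // A URL seen twice means the chain can never terminate.
            if (!seen.add(current)) {
                return new Result(null, chain, "redirect loop at " + current);
            }
            chain.add(current);
            String next = redirects.get(current);
            if (next == null) {
                // Not a redirect: this is the robots.txt that should be parsed.
                return new Result(current, chain, null);
            }
            // Following this hop would make chain.size() redirects in total.
            if (chain.size() > MAX_REDIRECTS) {
                return new Result(null, chain, "too many redirects");
            }
            current = next;
        }
    }

    public static void main(String[] args) {
        Map<String, String> redirects = new HashMap<>();
        redirects.put("http://example.com/robots.txt", "http://www.example.com/robots.txt");
        redirects.put("http://a.example/robots.txt", "http://b.example/robots.txt");
        redirects.put("http://b.example/robots.txt", "http://a.example/robots.txt");

        // prints http://www.example.com/robots.txt
        System.out.println(follow("http://example.com/robots.txt", redirects).finalUrl);
        // prints redirect loop at http://a.example/robots.txt
        System.out.println(follow("http://a.example/robots.txt", redirects).error);
    }
}
```

The visited-URL set reports a cycle such as A to B to A as soon as a URL repeats, instead of waiting for the hop limit. The hop limit still caps long chains of distinct URLs. A fetcher using a follower like this could treat either failure as "no usable robots.txt" for that host rather than aborting the whole fetch.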
Posted by Doug Cutting <cu...@nutch.org>.
Massimo Miccoli wrote:
> There's a problem with that solution. For some sites, protocol-httpclient now
> generates a SEVERE "Narrowly avoided an infinite loop in execute" error.
> The fetcher then exits, and only the pages fetched before the SEVERE message
> are retrieved.
> I don't know a solution; for now I have switched back to protocol-http.
Can you provide more details?
Thanks,
Doug