Posted to dev@nutch.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2005/11/07 19:16:20 UTC
[jira] Resolved: (NUTCH-124) protocol-httpclient does not follow redirects when fetching robots.txt
[ http://issues.apache.org/jira/browse/NUTCH-124?page=all ]
Doug Cutting resolved NUTCH-124:
--------------------------------
Fix Version: 0.8-dev
Resolution: Fixed
I have fixed this in the mapred branch.
> protocol-httpclient does not follow redirects when fetching robots.txt
> ----------------------------------------------------------------------
>
> Key: NUTCH-124
> URL: http://issues.apache.org/jira/browse/NUTCH-124
> Project: Nutch
> Type: Bug
> Components: fetcher
> Versions: 0.8-dev, 0.7.2-dev
> Reporter: Doug Cutting
> Fix For: 0.8-dev
>
> If a site's robots.txt redirects, protocol-httpclient does not correctly fetch the robots.txt and effectively ignores it for the site. See http://www.webmasterworld.com/forum11/3008.htm.
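The behavior the issue asks for can be sketched as follows. This is a minimal illustration in Python, not Nutch's Java code and not the fix committed to the mapred branch. The `fetch` callable, the `fetch_robots` name, and the `max_redirects` default are assumptions made for the example; `fetch` stands in for whatever HTTP client is in use. The hop limit and the visited set are one way to keep a misbehaving site from redirecting forever.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical fetch signature: url -> (status code, Location header or None, body).
# It is a stand-in for a real HTTP client, not a Nutch or HttpClient API.
Fetch = Callable[[str], Tuple[int, Optional[str], str]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_robots(base_url: str, fetch: Fetch, max_redirects: int = 5) -> Optional[str]:
    """Return the robots.txt body for base_url, following redirects.

    Returns None when nothing usable is found: a non-200 final response,
    a redirect without a Location header, a redirect cycle, or more than
    max_redirects hops.
    """
    url = urljoin(base_url, "/robots.txt")
    seen = set()
    for _ in range(max_redirects + 1):
        if url in seen:
            # The same URL came up twice, so the site is redirecting in a loop.
            return None
        seen.add(url)
        status, location, body = fetch(url)
        if status == 200:
            return body
        if status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
            continue
        return None
    # Hop limit reached without a 200 response.
    return None
```

What a crawler should do when this returns None (treat the site as unrestricted, or skip it) is a policy decision the sketch leaves open.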
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
Re: [Nutch-dev] [jira] Resolved: (NUTCH-124) protocol-httpclient does not follow redirects when fetching robots.txt
Posted by Doug Cutting <cu...@nutch.org>.
Massimo Miccoli wrote:
> There's a problem with that solution. For some sites, protocol-httpclient
> now generates a SEVERE "Narrowly avoided an infinite loop in execute"
> message, so the fetcher exits, and only some pages are fetched before the
> SEVERE message.
> I don't know of a solution; for now I have switched back to protocol-http.
Can you provide more details?
Thanks,
Doug
Re: [Nutch-dev] [jira] Resolved: (NUTCH-124) protocol-httpclient does not follow redirects when fetching robots.txt
Posted by Massimo Miccoli <mm...@iltrovatore.it>.
There's a problem with that solution. For some sites, protocol-httpclient
now generates a SEVERE "Narrowly avoided an infinite loop in execute"
message, so the fetcher exits, and only some pages are fetched before the
SEVERE message.
I don't know of a solution; for now I have switched back to protocol-http.