Posted to dev@nutch.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/07/05 16:17:00 UTC

[jira] [Commented] (NUTCH-2398) Fetcher saving redirected robots.txt under redirect target URL

    [ https://issues.apache.org/jira/browse/NUTCH-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075021#comment-16075021 ] 

ASF GitHub Bot commented on NUTCH-2398:
---------------------------------------

sebastian-nagel opened a new pull request #199: NUTCH-2398: Save content of redirected robots.txt under redirect target URL
URL: https://github.com/apache/nutch/pull/199
 
 
   Do not use the original URL (http://example.com/robots.txt) to store both
   the redirect response (HTTP 301) and the response of the redirect target.
   
   See also commoncrawl/nutch#4.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Fetcher saving redirected robots.txt under redirect target URL
> --------------------------------------------------------------
>
>                 Key: NUTCH-2398
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2398
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.13
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.14
>
>
> NUTCH-2300 lets the Fetcher optionally store the robots.txt response (content and HTTP status). If the '.../robots.txt' is redirected, the redirected content is also stored, but with the redirect source URL as key. It should use the redirect target URL instead; otherwise, one of the two responses is overwritten in the segment's map file.
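To make the overwrite concrete, here is a minimal Java sketch. It is not Nutch code: it assumes the segment's map file can be modelled as a plain map from URL to stored response, and the class `RobotsRedirectDemo`, its methods, and the redirect target `https://example.com/robots.txt` are hypothetical. Writing both responses under the source URL leaves a single record, while keying the redirected content by the target URL keeps both.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration of NUTCH-2398 (not Nutch's API).
 * A LinkedHashMap stands in for the segment's map file: one record per URL key.
 */
public class RobotsRedirectDemo {

    /** Behaviour described in the issue: both responses use the requested (source) URL as key. */
    public static Map<String, String> storeUnderSourceUrl(String sourceUrl, String targetUrl,
                                                          String targetContent) {
        Map<String, String> segment = new LinkedHashMap<>();
        segment.put(sourceUrl, "301 -> " + targetUrl);  // the redirect response
        segment.put(sourceUrl, "200 " + targetContent); // same key: replaces the redirect record
        return segment;
    }

    /** Behaviour proposed in the issue: the redirected content is keyed by the redirect target URL. */
    public static Map<String, String> storeUnderTargetUrl(String sourceUrl, String targetUrl,
                                                          String targetContent) {
        Map<String, String> segment = new LinkedHashMap<>();
        segment.put(sourceUrl, "301 -> " + targetUrl);  // the redirect response, under the source URL
        segment.put(targetUrl, "200 " + targetContent); // the fetched content, under the target URL
        return segment;
    }

    public static void main(String[] args) {
        String source = "http://example.com/robots.txt";
        String target = "https://example.com/robots.txt"; // hypothetical redirect target
        String robots = "User-agent: *";

        // prints "source-URL keys: [http://example.com/robots.txt]"
        System.out.println("source-URL keys: "
                + storeUnderSourceUrl(source, target, robots).keySet());
        // prints "target-URL keys: [http://example.com/robots.txt, https://example.com/robots.txt]"
        System.out.println("target-URL keys: "
                + storeUnderTargetUrl(source, target, robots).keySet());
    }
}
```

In this model the choice of key is the whole fix: the redirect record stays under the source URL and the redirected robots.txt content goes under the target URL, so neither write replaces the other.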



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)