Posted to dev@nutch.apache.org by "Tejas Patil (JIRA)" <ji...@apache.org> on 2013/05/03 22:38:15 UTC

[jira] [Comment Edited] (NUTCH-1039) Fetcher fails for pages without content-length header

    [ https://issues.apache.org/jira/browse/NUTCH-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648787#comment-13648787 ] 

Tejas Patil edited comment on NUTCH-1039 at 5/3/13 8:36 PM:
------------------------------------------------------------

I feel that this item won't make any progress unless we get a real URL with which it can be reproduced (that would show whether it really got fixed by some check-in). Marking it as "cannot reproduce" for now. If anyone runs into it, please re-open it so that we can work on it.
                
      was (Author: tejasp):
    I feel that thin item wont have any progress unless we get some real url wherein this gets reproduced (that will indicate if it really got fixed or not due to some checkin). Marking it as "cannot reproduce" for now. If anyone faces it, please re-open it so that we can work on it.
                  
> Fetcher fails for pages without content-length header
> -----------------------------------------------------
>
>                 Key: NUTCH-1039
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1039
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.4
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.7
>
>
> Fetcher fails:
> 2011-07-11 14:45:34,764 ERROR http.Http - org.apache.nutch.protocol.http.api.HttpException: bad content length:
> 2011-07-11 14:45:34,765 ERROR http.Http - at org.apache.nutch.protocol.http.HttpResponse.readPlainContent(HttpResponse.java:218)
> 2011-07-11 14:45:34,765 ERROR http.Http - at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:158)
> 2011-07-11 14:45:34,765 ERROR http.Http - at org.apache.nutch.protocol.http.Http.getResponse(Http.java:64)
> 2011-07-11 14:45:34,765 ERROR http.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:138)
> 2011-07-11 14:45:34,765 ERROR http.Http - at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:79)
> Both the fetcher and the indexing filter checker fail sometimes. I'm unsure whether this is something in Nutch or whether the remote server only returns content-length intermittently.
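
As a rough illustration of the failure mode described above, here is a minimal sketch of a lenient way to handle a missing or blank Content-Length value: treat it as "unknown" and read until the stream ends, subject to a size cap. This is not the code in Nutch's HttpResponse; the class LenientContentLength, the methods parseContentLength and readBody, and the maxBytes parameter are hypothetical names made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch.
class LenientContentLength {

    // Returns the declared length, or -1 when the header is
    // absent, blank, negative, or not a number.
    static long parseContentLength(String headerValue) {
        if (headerValue == null) {
            return -1;
        }
        String trimmed = headerValue.trim();
        if (trimmed.isEmpty()) {
            return -1;
        }
        try {
            long n = Long.parseLong(trimmed);
            return n < 0 ? -1 : n;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Reads the body. With a known length, stops after that many bytes;
    // otherwise reads to end of stream. Never returns more than maxBytes.
    static byte[] readBody(InputStream in, String contentLengthHeader, int maxBytes)
            throws IOException {
        long declared = parseContentLength(contentLengthHeader);
        long remaining = declared >= 0 ? Math.min(declared, maxBytes) : maxBytes;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n == -1) {
                break; // server closed the connection: end of body
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        // No Content-Length header at all: fall back to reading until EOF.
        byte[] body = readBody(new ByteArrayInputStream(page), null, 65536);
        System.out.println("read " + body.length + " bytes without a content-length header");
    }
}
```

Whether a malformed value should be tolerated like this or reported as an error is a design choice; the sketch only shows the tolerant option.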
