Posted to dev@nutch.apache.org by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/10 21:22:00 UTC

[jira] [Commented] (NUTCH-2562) protocol-http fails to read large chunked HTTP responses

    [ https://issues.apache.org/jira/browse/NUTCH-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471136#comment-16471136 ] 

Sebastian Nagel commented on NUTCH-2562:
----------------------------------------

Confirmed and reproduced. The code presumably continued reading the remaining chunks in order to reach the optional trailing headers. But you're right: it is better to stop and skip the trailing headers (if any) together with the remaining content.
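
For illustration, here is a minimal sketch of the "stop and truncate" behavior discussed above. It is not the actual HttpResponse code from protocol-http: the class ChunkedReadSketch and the methods readChunked and readLine are hypothetical names, and the maxContent parameter merely stands in for the value returned by http.getMaxContent(), assumed here to be a non-negative byte count. The point is that the chunk-size line is parsed only after the previous chunk has been consumed completely, and that reading stops as soon as the limit is reached.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the Nutch implementation.
class ChunkedReadSketch {

  /** Reads one line as ASCII, dropping the CR/LF terminator. */
  static String readLine(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = in.read()) != -1 && c != '\n') {
      if (c != '\r') {
        sb.append((char) c);
      }
    }
    return sb.toString();
  }

  /**
   * Reads a chunked body, keeping at most maxContent bytes.
   * maxContent is an assumed stand-in for http.getMaxContent().
   */
  static byte[] readChunked(InputStream in, int maxContent) throws IOException {
    ByteArrayOutputStream content = new ByteArrayOutputStream();
    while (true) {
      // Chunk-size line: hex length, optionally followed by ";extensions".
      String line = readLine(in);
      int semi = line.indexOf(';');
      if (semi >= 0) {
        line = line.substring(0, semi);
      }
      int chunkLen;
      try {
        chunkLen = Integer.parseInt(line.trim(), 16);
      } catch (NumberFormatException e) {
        throw new IOException("bad chunk length: " + line);
      }
      if (chunkLen == 0) {
        break; // last chunk; trailing headers (if any) are not read here
      }

      int remaining = chunkLen;
      while (remaining > 0) {
        int room = maxContent - content.size();
        if (room <= 0) {
          // Limit reached in the middle of a chunk: stop and return the
          // truncated content instead of parsing another chunk-size line.
          return content.toByteArray();
        }
        byte[] buf = new byte[Math.min(remaining, Math.min(room, 4096))];
        int n = in.read(buf);
        if (n == -1) {
          throw new IOException("unexpected end of chunked body");
        }
        content.write(buf, 0, n);
        remaining -= n;
      }
      readLine(in); // consume the CRLF that terminates the chunk data
    }
    return content.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] raw = "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
        .getBytes(StandardCharsets.US_ASCII);
    byte[] body = readChunked(new ByteArrayInputStream(raw), 8);
    System.out.println(new String(body, StandardCharsets.US_ASCII));
  }
}
```

Returning from inside the inner loop leaves the rest of the stream unread, which matches the suggestion to skip the trailing headers along with the remaining content; the connection would then have to be closed rather than reused.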

> protocol-http fails to read large chunked HTTP responses
> --------------------------------------------------------
>
>                 Key: NUTCH-2562
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2562
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Gerard Bouchar
>            Priority: Major
>
> While reading chunked content, if the content size becomes larger than http.getMaxContent(), instead of just stopping and truncating the content, it tries to read a new chunk before having read the previous one completely, resulting in a 'bad chunk length' error.
>
> See: https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L440-L442



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)