Posted to dev@nutch.apache.org by "Omkar Reddy (JIRA)" <ji...@apache.org> on 2018/05/06 09:19:00 UTC

[jira] [Commented] (NUTCH-2575) protocol-http does not respect the maximum content-size

    [ https://issues.apache.org/jira/browse/NUTCH-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16465037#comment-16465037 ] 

Omkar Reddy commented on NUTCH-2575:
------------------------------------

Hi [~gbouchar], I see the issue. While reading each chunk, we count the bytes read in the variable "chunkBytesRead", but that count is never added to "contentBytesRead" once the chunk has been read.

A simple solution is to do "contentBytesRead += chunkBytesRead" at the end of every chunk. This should fix it. I will send a PR for this. Thanks.
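
To illustrate the idea, here is a minimal, self-contained sketch of a chunked-body reader that keeps a running total. It is not the actual HttpResponse code: the class name, "readChunked", "readLine" and the "maxContent" parameter are hypothetical and only stand in for the real method and configuration value. Only "contentBytesRead" and "chunkBytesRead" are taken from the discussion above. Trailer headers and malformed chunk sizes are not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Simplified illustration only; not the Nutch implementation.
class ChunkedReadSketch {

  // Reads one line terminated by LF (an optional CR is dropped).
  static String readLine(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = in.read()) != -1 && c != '\n') {
      if (c != '\r') {
        sb.append((char) c);
      }
    }
    return sb.toString();
  }

  // Reads a chunked body, keeping at most maxContent bytes
  // (a negative maxContent means "no limit" in this sketch).
  static byte[] readChunked(InputStream in, int maxContent) throws IOException {
    ByteArrayOutputStream content = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    int contentBytesRead = 0; // running total across all chunks

    while (true) {
      String sizeLine = readLine(in).trim();
      int semi = sizeLine.indexOf(';'); // drop chunk extensions, if any
      if (semi >= 0) {
        sizeLine = sizeLine.substring(0, semi).trim();
      }
      if (sizeLine.isEmpty()) {
        break; // stream ended before the terminating chunk
      }
      int chunkLen = Integer.parseInt(sizeLine, 16);
      if (chunkLen == 0) {
        break; // last chunk
      }
      // Truncate the chunk if it would push the total past the limit.
      if (maxContent >= 0 && contentBytesRead + chunkLen > maxContent) {
        chunkLen = maxContent - contentBytesRead;
      }

      int chunkBytesRead = 0; // bytes read from the current chunk only
      while (chunkBytesRead < chunkLen) {
        int toRead = Math.min(chunkLen - chunkBytesRead, buffer.length);
        int n = in.read(buffer, 0, toRead);
        if (n == -1) {
          break;
        }
        content.write(buffer, 0, n);
        chunkBytesRead += n;
      }

      // The proposed fix: add the chunk's bytes to the running total.
      contentBytesRead += chunkBytesRead;

      if (maxContent >= 0 && contentBytesRead >= maxContent) {
        break; // limit reached, stop reading further chunks
      }
      if (chunkBytesRead < chunkLen) {
        break; // stream ended in the middle of a chunk
      }
      readLine(in); // consume the CRLF that follows the chunk data
    }
    return content.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] body = "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
        .getBytes(StandardCharsets.US_ASCII);
    byte[] out = readChunked(new ByteArrayInputStream(body), 8);
    System.out.println(new String(out, StandardCharsets.US_ASCII));
  }
}
```

Without the "contentBytesRead += chunkBytesRead" line, the total would stay at zero and the limit check would only ever trigger for a single chunk larger than the limit, which matches the behaviour described in the issue.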

> protocol-http does not respect the maximum content-size
> -------------------------------------------------------
>
>                 Key: NUTCH-2575
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2575
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Gerard Bouchar
>            Priority: Critical
>
> There is a bug in HttpResponse::readChunkedContent that prevents it from stopping once the content read exceeds the maximum allowed size.
> There [is a variable contentBytesRead|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L404] that is used to check how much content has been read, but it is never updated, so it always stays at its initial value of zero, and [the size check|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L440-L442] always returns false (unless a single chunk is larger than the maximum allowed content size).
> This allows any server to cause out-of-memory errors on our side.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)