Posted to dev@nutch.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/10 21:02:00 UTC

[jira] [Commented] (NUTCH-2575) protocol-http does not respect the maximum content-size for chunked responses

    [ https://issues.apache.org/jira/browse/NUTCH-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471105#comment-16471105 ] 

ASF GitHub Bot commented on NUTCH-2575:
---------------------------------------

sebastian-nagel closed pull request #327: NUTCH-2575 Storing total number of bytes read after every chunk
URL: https://github.com/apache/nutch/pull/327
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java b/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
index c87c11125..591b94298 100644
--- a/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
+++ b/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
@@ -464,6 +464,7 @@ private void readChunkedContent(PushbackInputStream in, StringBuffer line)
         chunkBytesRead += len;
       }
 
+      contentBytesRead += chunkBytesRead;
       readLine(in, line, false);
 
     }
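To show the patched loop in context, here is a minimal, self-contained sketch of a chunked-body reader that enforces a maximum content size. It is not Nutch's HttpResponse code. The class ChunkedLimitSketch, its readLine helper, and the 4096-byte buffer are hypothetical. Stopping as soon as the limit is reached is also an assumption of this sketch, not documented Nutch behaviour. The statement that corresponds to the patch is the contentBytesRead += chunkBytesRead update after each chunk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch, not Nutch code: chunked decoding with a size limit. */
public class ChunkedLimitSketch {

  // Reads one LF- or CRLF-terminated line as ASCII; returns null at EOF.
  static String readLine(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = in.read()) != -1) {
      if (c == '\n') return sb.toString();
      if (c != '\r') sb.append((char) c);
    }
    return sb.length() > 0 ? sb.toString() : null;
  }

  /** Decodes a chunked body, keeping at most maxContent bytes (maxContent < 0 means no limit). */
  static byte[] readChunked(InputStream in, int maxContent) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096]; // hypothetical buffer size
    int contentBytesRead = 0;    // running total across ALL chunks
    while (true) {
      String header = readLine(in);
      if (header == null) throw new IOException("EOF before chunk header");
      int semi = header.indexOf(';'); // ignore chunk extensions
      String lenStr = (semi < 0 ? header : header.substring(0, semi)).trim();
      int chunkLen = Integer.parseInt(lenStr, 16);
      if (chunkLen == 0) break; // last chunk
      if (maxContent >= 0 && contentBytesRead + chunkLen > maxContent) {
        chunkLen = maxContent - contentBytesRead; // clamp to the remaining budget
      }
      int chunkBytesRead = 0;
      while (chunkBytesRead < chunkLen) {
        int len = in.read(buf, 0, Math.min(buf.length, chunkLen - chunkBytesRead));
        if (len == -1) throw new IOException("EOF inside chunk");
        out.write(buf, 0, len);
        chunkBytesRead += len;
      }
      contentBytesRead += chunkBytesRead; // the update the patch adds
      // Assumption of this sketch: stop once the limit is hit, since the rest
      // of a clamped chunk is still unread and must not be parsed as a header.
      if (maxContent >= 0 && contentBytesRead >= maxContent) break;
      readLine(in); // consume the CRLF after the chunk data
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    // Three 4-byte chunks followed by the terminating 0-chunk.
    byte[] body = "4\r\nabcd\r\n4\r\nefgh\r\n4\r\nijkl\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
    byte[] full = readChunked(new ByteArrayInputStream(body), -1);
    byte[] capped = readChunked(new ByteArrayInputStream(body), 6);
    System.out.println(new String(full, StandardCharsets.US_ASCII));   // abcdefghijkl
    System.out.println(new String(capped, StandardCharsets.US_ASCII)); // abcdef
  }
}
```

With a limit of 6 bytes, the first chunk is kept whole, the second is clamped to 2 bytes, and reading stops. Without the running total, the second chunk would pass the check unclamped.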


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> protocol-http does not respect the maximum content-size for chunked responses
> -----------------------------------------------------------------------------
>
>                 Key: NUTCH-2575
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2575
>             Project: Nutch
>          Issue Type: Sub-task
>    Affects Versions: 1.14
>            Reporter: Gerard Bouchar
>            Priority: Critical
>
> There is a bug in HttpResponse::readChunkedContent that prevents it from stopping once the content read exceeds the maximum allowed size.
> There [is a variable contentBytesRead|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L404] that is used to check how much content has been read, but it is never updated, so it always stays 0, and [the size check|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L440-L442] always evaluates to false (unless a single chunk is larger than the maximum allowed content size).
> This allows any server to cause out-of-memory errors on our side.
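The effect described above can be illustrated with a small simulation that replays a sequence of chunk lengths through the same kind of size check, with and without updating the running total. It is a hypothetical sketch: ChunkLimitSimulation, bytesKept, the 8 KiB chunk size and the 64 KiB limit are illustrative assumptions, and breaking out once the budget is exhausted is a simplification rather than Nutch's exact control flow.

```java
import java.util.Arrays;

/** Hypothetical simulation of the size check, not Nutch code. */
public class ChunkLimitSimulation {

  // Returns how many bytes would be kept in memory for the given chunk lengths.
  static long bytesKept(int[] chunkLens, int maxContent, boolean updateTotal) {
    int contentBytesRead = 0; // the counter the size check relies on
    long kept = 0;            // bytes actually buffered
    for (int chunkLen : chunkLens) {
      if (maxContent >= 0 && contentBytesRead + chunkLen > maxContent) {
        chunkLen = maxContent - contentBytesRead; // clamp to the remaining budget
      }
      if (chunkLen <= 0) break; // budget exhausted (simplification for this sketch)
      kept += chunkLen;
      if (updateTotal) contentBytesRead += chunkLen; // the fix from PR #327
    }
    return kept;
  }

  public static void main(String[] args) {
    int[] chunks = new int[1000];
    Arrays.fill(chunks, 8192);  // 1000 chunks of 8 KiB each
    int maxContent = 65536;     // 64 KiB limit
    System.out.println(bytesKept(chunks, maxContent, false)); // 8192000: limit ignored
    System.out.println(bytesKept(chunks, maxContent, true));  // 65536: limit enforced
  }
}
```

Because each 8 KiB chunk is individually below the limit, the check never fires while the counter stays at 0, so memory use grows with the number of chunks the server chooses to send.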



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)