You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2018/11/08 11:10:00 UTC
[jira] [Commented] (HADOOP-15911) Over-eager allocation in
ByteBufferUtil.fallbackRead
[ https://issues.apache.org/jira/browse/HADOOP-15911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679606#comment-16679606 ]
Steve Loughran commented on HADOOP-15911:
-----------------------------------------
Patches should be supplied as a .patch file, then hit "patch-submit". Jenkins likes tests for this.
w.r.t S3 download, whose library? S3A doesn't do byte buffers, AFAIK
> Over-eager allocation in ByteBufferUtil.fallbackRead
> ----------------------------------------------------
>
> Key: HADOOP-15911
> URL: https://issues.apache.org/jira/browse/HADOOP-15911
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Reporter: Vanco Buca
> Priority: Major
>
> The heap-memory path of ByteBufferUtil.fallbackRead ([see master branch code here|[https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ByteBufferUtil.java#L95])] massively overallocates memory when the underlying input stream returns data in smaller chunks. This happens on a regular basis when using the S3 input stream as input.
> The behavior is an O(N^2)-ish. In a recent debug session, we were trying to read 6MB, but getting 16K at a time. The code would:
> * allocate 16M, use the first 16K
> * allocate 16M - 16K, use the first 16K of that
> * allocate 16M - 32K, use the first 16K of that
> * (etc)
> The patch is simple. Here's the text version of the patch:
> {code}
> @@ -88,10 +88,17 @@ public final class ByteBufferUtil {
> buffer.flip();
> } else {
> buffer.clear();
> - int nRead = stream.read(buffer.array(),
> - buffer.arrayOffset(), maxLength);
> - if (nRead >= 0) {
> - buffer.limit(nRead);
> + int totalRead = 0;
> + while (totalRead < maxLength) {
> + final int nRead = stream.read(buffer.array(),
> + buffer.arrayOffset() + totalRead, maxLength - totalRead);
> + if (nRead <= 0) {
> + break;
> + }
> + totalRead += nRead;
> + }
> + if (totalRead >= 0) {
> + buffer.limit(totalRead);
> success = true;
> }
> }
> {code}
> so, essentially, do the same thing that the code in the direct memory path is doing
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org