Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2018/11/08 11:10:00 UTC

[jira] [Commented] (HADOOP-15911) Over-eager allocation in ByteBufferUtil.fallbackRead

    [ https://issues.apache.org/jira/browse/HADOOP-15911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679606#comment-16679606 ] 

Steve Loughran commented on HADOOP-15911:
-----------------------------------------

Patches should be supplied as a .patch file, then hit "patch-submit". Jenkins likes tests for this.

w.r.t. the S3 download: whose library? S3A doesn't do byte buffers, AFAIK.

> Over-eager allocation in ByteBufferUtil.fallbackRead
> ----------------------------------------------------
>
>                 Key: HADOOP-15911
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15911
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>            Reporter: Vanco Buca
>            Priority: Major
>
> The heap-memory path of ByteBufferUtil.fallbackRead ([see master branch code here|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ByteBufferUtil.java#L95]) massively overallocates memory when the underlying input stream returns data in smaller chunks. This happens on a regular basis when using the S3 input stream as input.
> The behavior is O(N^2)-ish. In a recent debug session, we were trying to read 6MB but were getting 16K at a time. The code would:
>  * allocate 16M, use the first 16K
>  * allocate 16M - 16K, use the first 16K of that
>  * allocate 16M - 32K, use the first 16K of that
>  * (etc)
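> Taking the numbers above at face value (16M allocations shrinking by 16K per read, 6M total to read), the transient allocation works out to roughly:
> {code}
> reads     ~= 6M / 16K = 384
> allocated ~= sum over i of (16M - i * 16K), for i = 0..383
>           ~= 384 * 16M - 16K * (383 * 384 / 2)
>           ~= 6.1G - 1.2G ~= 5G
> {code}
> i.e. on the order of 5G of garbage generated to read 6M.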
> The patch is simple. Here's the text version of the patch:
> {code}
> @@ -88,10 +88,20 @@ public final class ByteBufferUtil {
>          buffer.flip();
>        } else {
>          buffer.clear();
> -        int nRead = stream.read(buffer.array(),
> -          buffer.arrayOffset(), maxLength);
> -        if (nRead >= 0) {
> -          buffer.limit(nRead);
> +        int totalRead = 0;
> +        int nRead = 0;
> +        while (totalRead < maxLength) {
> +          nRead = stream.read(buffer.array(),
> +            buffer.arrayOffset() + totalRead, maxLength - totalRead);
> +          if (nRead <= 0) {
> +            break;
> +          }
> +          totalRead += nRead;
> +        }
> +        // preserve the EOF semantics of the old code: fail only when
> +        // the stream was already exhausted before the first read
> +        if (totalRead > 0 || nRead >= 0) {
> +          buffer.limit(totalRead);
>            success = true;
>          }
>        }
> {code}
> So, essentially: do the same thing that the code in the direct-memory path is already doing.
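For illustration, the read loop in the patch can be exercised as a standalone sketch. The class and method names below are hypothetical, not Hadoop APIs, and it deliberately keeps the old code's EOF semantics: a stream that is already exhausted is reported as a failure (-1) rather than a zero-byte success.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical standalone sketch of the patched heap-memory path:
// read into one pre-sized buffer in a loop instead of re-allocating
// a fresh buffer for every short read.
public class FallbackReadSketch {

  /**
   * Reads up to maxLength bytes from the stream into buf at offset.
   * Returns the number of bytes read, or -1 if the stream was
   * already at EOF before the first read (matching the old single-read
   * behavior of ByteBufferUtil.fallbackRead).
   */
  static int readFully(InputStream stream, byte[] buf, int offset, int maxLength)
      throws IOException {
    int totalRead = 0;
    while (totalRead < maxLength) {
      int nRead = stream.read(buf, offset + totalRead, maxLength - totalRead);
      if (nRead < 0) {
        // EOF: distinguish "nothing at all" from a short final read.
        return totalRead == 0 ? -1 : totalRead;
      }
      if (nRead == 0) {
        break; // defensive: avoid spinning on a zero-length read
      }
      totalRead += nRead;
    }
    return totalRead;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[100];
    byte[] buf = new byte[64];
    InputStream in = new ByteArrayInputStream(data);

    System.out.println(readFully(in, buf, 0, 64)); // 64: buffer filled
    System.out.println(readFully(in, buf, 0, 64)); // 36: short final read, not -1
    System.out.println(readFully(in, buf, 0, 64)); // -1: already at EOF
  }
}
```

The point of the loop is that the one large allocation is made once by the caller and then filled incrementally, so a stream that hands back 16K chunks no longer triggers a fresh multi-megabyte allocation per chunk.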



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org