Posted to issues@tez.apache.org by "Siddharth Seth (JIRA)" <ji...@apache.org> on 2017/04/07 06:14:41 UTC

[jira] [Commented] (TEZ-3673) Allocate smaller buffers in UnorderedPartitionedKVWriter

    [ https://issues.apache.org/jira/browse/TEZ-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960344#comment-15960344 ] 

Siddharth Seth commented on TEZ-3673:
-------------------------------------

I don't think a new configuration is required just to say "use 32M buffers". Buffer size control should already be possible via tez.runtime.unordered.output.max-per-buffer.size-bytes?

Instead, I think a configuration is required that indicates when a spill should happen. Something like:
<=0 -> Spill each buffer individually
0-100 -> Trigger point, as a percentage of the entire buffer, that will cause a spill (rounded up to a per-buffer boundary). For example, 75% of 10 buffers would mean spilling after 8 buffers.
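
A rough sketch of how such a trigger could be computed, only to illustrate the rounding described above. The class and method names (SpillTrigger, buffersBeforeSpill) are hypothetical and are not part of Tez. The sketch assumes that a value of exactly 0 falls under the "spill each buffer individually" rule and that values above 100 are clamped to 100.

```java
// Hypothetical helper, not part of Tez: maps a spill-percentage setting
// to the number of filled buffers that should trigger a spill.
final class SpillTrigger {
    private SpillTrigger() {}

    /**
     * @param spillPercent proposed setting; <= 0 means spill each buffer individually
     * @param totalBuffers number of equally sized buffers making up the whole buffer
     * @return number of filled buffers after which a spill is triggered
     */
    static int buffersBeforeSpill(int spillPercent, int totalBuffers) {
        if (totalBuffers <= 0) {
            throw new IllegalArgumentException("totalBuffers must be positive");
        }
        if (spillPercent <= 0) {
            return 1; // spill each buffer individually
        }
        int pct = Math.min(spillPercent, 100); // assumption: clamp values above 100
        // Integer ceiling of totalBuffers * pct / 100, i.e. round up to a buffer boundary.
        int buffers = (int) (((long) totalBuffers * pct + 99) / 100);
        return Math.max(1, Math.min(buffers, totalBuffers));
    }

    public static void main(String[] args) {
        System.out.println(buffersBeforeSpill(75, 10));  // 75% of 10 buffers -> 8
        System.out.println(buffersBeforeSpill(0, 10));   // spill every buffer -> 1
        System.out.println(buffersBeforeSpill(100, 10)); // spill only when all are full -> 10
    }
}
```

With this shape, the final-merge-avoidance case would simply pass a value <= 0, while the final-merge-enabled case would pass a larger percentage.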

With final merge avoidance, we would spill after each buffer.
With final merge enabled, we would spill less frequently.

cc [~rajesh.balamohan] - any thoughts on this from a performance standpoint?

> Allocate smaller buffers in UnorderedPartitionedKVWriter
> --------------------------------------------------------
>
>                 Key: TEZ-3673
>                 URL: https://issues.apache.org/jira/browse/TEZ-3673
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Harish Jaiprakash
>            Assignee: Harish Jaiprakash
>         Attachments: TEZ-3673.01.patch
>
>
> UnorderedPartitionedKVWriter allocates in bigger chunks. It may or may not get filled up. In PipelinedSorter, we start off with 32MB chunks. But UnorderedPartitionedKVWriter can be worse as it allocates bigger blocks. Need to revisit this allocation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)