Posted to issues@tez.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2017/04/05 05:06:41 UTC

[jira] [Comment Edited] (TEZ-3680) Optimizations to UnorderedPartitionedKVWriter

    [ https://issues.apache.org/jira/browse/TEZ-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956305#comment-15956305 ] 

Rajesh Balamohan edited comment on TEZ-3680 at 4/5/17 5:06 AM:
---------------------------------------------------------------

The changes cover counter updates and increasing the number of threads in the thread pool.
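The Java sketch below illustrates the counter change. It buffers increments in a plain field and publishes them to a shared counter only every N records. `BatchedCounter` and `flushEvery` are hypothetical names, and `AtomicLong` stands in for the real Tez counter. This is a sketch under those assumptions, not the code in TEZ-3680.1.patch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not Tez API): batches counter increments so the hot
// path touches the shared counter only once every `flushEvery` records.
// Intended to be owned by a single writer thread; it is not thread-safe.
final class BatchedCounter {
  private final AtomicLong shared; // stand-in for a shared Tez counter
  private final int flushEvery;
  private long pending = 0;        // increments not yet published
  private int sinceFlush = 0;      // increment() calls since the last flush
  private long flushCount = 0;     // number of times the shared counter was updated

  BatchedCounter(AtomicLong shared, int flushEvery) {
    this.shared = shared;
    this.flushEvery = flushEvery;
  }

  /** Per-record call: only local arithmetic unless the batch is full. */
  void increment(long delta) {
    pending += delta;
    if (++sinceFlush >= flushEvery) {
      flush();
    }
  }

  /** Publishes buffered increments; the writer should also call this from close(). */
  void flush() {
    if (pending != 0) {
      shared.addAndGet(pending);
      flushCount++;
    }
    pending = 0;
    sinceFlush = 0;
  }

  long flushCount() {
    return flushCount;
  }
}
```

Suppose `flushEvery` is 1000 and the writer makes 10,500 increments of 1. The loop updates the shared counter 10 times, and the final `flush()` in close() adds one more update, instead of 10,500 updates in total. The trade-off is that the published value can lag by up to `flushEvery - 1` increments until the next flush. Progress notifications could be throttled the same way.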

Merging all partitions in close() is currently very expensive, and the cost depends on the number of partitions and spills (e.g., with 20 spills and 1009 partitions, the spill files end up being reopened many times). On top of that, the data has to be read, decompressed, appended, and recompressed into the final file. I haven't considered optimizing this.
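The reopening cost can be illustrated with a Java sketch that opens each spill file once and keeps the handles open. It then copies partition segments in partition-major order. The sketch makes several assumptions:

- It uses a hypothetical per-spill index (`SpillIndex`, holding an offset and a length per partition).
- It assumes uncompressed data, so it skips the decompress/recompress step.
- It does not claim to mirror mergeAll() in Tez.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Tez code): merge per-partition segments from several
// spill files into one output, opening each spill file exactly once.
final class SpillMerger {

  /** Hypothetical per-spill index: partition p lives at offsets[p] with lengths[p] bytes. */
  static final class SpillIndex {
    final Path file;
    final long[] offsets;
    final int[] lengths;

    SpillIndex(Path file, long[] offsets, int[] lengths) {
      this.file = file;
      this.offsets = offsets;
      this.lengths = lengths;
    }
  }

  /** Writes partition 0 of every spill, then partition 1, ...; returns the number of files opened. */
  static int mergeAll(List<SpillIndex> spills, int numPartitions, OutputStream out)
      throws IOException {
    List<RandomAccessFile> handles = new ArrayList<>();
    try {
      // Open every spill once up front instead of once per (partition, spill) pair.
      for (SpillIndex s : spills) {
        handles.add(new RandomAccessFile(s.file.toFile(), "r"));
      }
      byte[] buf = new byte[8192];
      for (int p = 0; p < numPartitions; p++) {
        for (int i = 0; i < spills.size(); i++) {
          RandomAccessFile raf = handles.get(i);
          SpillIndex s = spills.get(i);
          raf.seek(s.offsets[p]); // jump to this partition's segment in spill i
          int remaining = s.lengths[p];
          while (remaining > 0) {
            int n = raf.read(buf, 0, Math.min(buf.length, remaining));
            if (n < 0) {
              throw new EOFException("Truncated spill file: " + s.file);
            }
            out.write(buf, 0, n); // raw copy; assumes uncompressed spills
            remaining -= n;
          }
        }
      }
      return handles.size();
    } finally {
      for (RandomAccessFile raf : handles) {
        raf.close();
      }
    }
  }
}
```

Compare this with a loop that reopens the spill file inside the per-partition loop. That loop performs spills × partitions opens, which is 20 × 1009 = 20,180 for the example above, versus 20 here. Keeping all handles open costs one file descriptor per spill, which may matter when the spill count is large.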


> Optimizations to UnorderedPartitionedKVWriter
> ---------------------------------------------
>
>                 Key: TEZ-3680
>                 URL: https://issues.apache.org/jira/browse/TEZ-3680
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>         Attachments: profiler.png, TEZ-3680.1.patch
>
>
> 1. Consider increasing the number of threads in the spill executor. The buffer size can be configured via {{TEZ_RUNTIME_UNORDERED_OUTPUT_MAX_PER_BUFFER_SIZE_BYTES}}; with smaller buffers, spills can become frequent, yet the spill executor currently runs in single-threaded mode.
> 2. During profiling, operations such as incrementing counters and notifying progress showed up. This may not be common in regular Tez jobs, but it can happen in processes like LLAP (Hive-based). I will attach the profiler snapshot showing this. It would be good to update counters and notify progress less frequently.
> 3. Optimize mergeAll().
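Item 1 above can be sketched by handing full buffers to a fixed-size thread pool instead of a single-threaded executor. In this Java sketch, `ParallelSpiller` and `numThreads` are hypothetical names. The per-partition record counts are also hypothetical, standing in for a real spill index. This is not Tez's spill executor, and it implies no existing config key for the thread count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch (not Tez code): run spills on a fixed-size pool while
// keeping spill numbering deterministic for the final merge.
final class ParallelSpiller implements AutoCloseable {

  /** Result of one spill: its sequence number and per-partition record counts. */
  static final class SpillResult {
    final int spillNumber;
    final int[] recordsPerPartition;

    SpillResult(int spillNumber, int[] recordsPerPartition) {
      this.spillNumber = spillNumber;
      this.recordsPerPartition = recordsPerPartition;
    }
  }

  private final ExecutorService pool;
  private final int numPartitions;
  private final List<Future<SpillResult>> pending = new ArrayList<>();
  private int nextSpill = 0;

  ParallelSpiller(int numThreads, int numPartitions) {
    this.pool = Executors.newFixedThreadPool(numThreads);
    this.numPartitions = numPartitions;
  }

  /** Submits a full buffer (partition id per record); the spill number is fixed at submit time. */
  void spill(int[] partitionOfRecord) {
    final int spillNumber = nextSpill++;
    // Copy so the task never sees a buffer the writer goes on to reuse.
    final int[] buffer = partitionOfRecord.clone();
    pending.add(pool.submit(() -> {
      int[] counts = new int[numPartitions];
      for (int p : buffer) {
        counts[p]++;
      }
      return new SpillResult(spillNumber, counts);
    }));
  }

  /** Waits for all spills; results come back in submission order even if tasks finished out of order. */
  List<SpillResult> awaitAll() throws InterruptedException, ExecutionException {
    List<SpillResult> results = new ArrayList<>();
    for (Future<SpillResult> f : pending) {
      results.add(f.get());
    }
    pending.clear();
    return results;
  }

  @Override
  public void close() {
    pool.shutdown();
  }
}
```

The main design point is that the spill number is assigned at submit time. Results, and therefore the order in which a final merge reads the spills, stay deterministic even when tasks finish out of order. A real writer would also need to bound the number of in-flight buffers. Extra threads likely help only while the writer has spare buffers to keep filling.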


