Posted to issues@tez.apache.org by "Rajesh Balamohan (Jira)" <ji...@apache.org> on 2020/07/30 07:58:00 UTC

[jira] [Commented] (TEZ-4211) Optimise MergeManager final merge

    [ https://issues.apache.org/jira/browse/TEZ-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167729#comment-17167729 ] 

Rajesh Balamohan commented on TEZ-4211:
---------------------------------------

Attaching a WIP patch.

> Optimise MergeManager final merge
> ---------------------------------
>
>                 Key: TEZ-4211
>                 URL: https://issues.apache.org/jira/browse/TEZ-4211
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Priority: Major
>         Attachments: TEZ-4211.wip.patch
>
>
> There are cases where all of the data is held in memory and no disk segments are present in MergeManager. Currently, MergeManager spills the in-memory segments to disk before proceeding.
>  
> [https://github.com/apache/tez/blob/master/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/MergeManager.java#L1184]
>  
> {code:java}
> if (numMemDiskSegments > 0 && ioSortFactor > onDiskMapOutputs.size()) {
> ...
> ..
> TezMerger.writeFile(rIter, writer, progressable, TezRuntimeConfiguration.TEZ_RUNTIME_RECORDS_BEFORE_PROGRESS_DEFAULT);
> ...
> ..
>  {code}
> This can be optimised so that nothing is spilled to disk when only in-memory segments are present.
> Snippet from the logs of one of the apps (Q78):
> {noformat}
>  [ShuffleAndMergeRunner {Map_1} ()] org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: finalMerge with #inMemoryOutputs=4112, size=839646500 and #onDiskOutputs=0, size=0
>  [ShuffleAndMergeRunner {Map_1} ()] org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: finalMerge with #inMemoryOutputs=4112, size=859378362 and #onDiskOutputs=0, size=0
>  [ShuffleAndMergeRunner {Map_1} ()] org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: finalMerge with #inMemoryOutputs=4112, size=856145179 and #onDiskOutputs=0, size=0
>  [ShuffleAndMergeRunner {Map_1} ()] org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: finalMerge with #inMemoryOutputs=4112, size=849878734 and #onDiskOutputs=0, size=0
>  [ShuffleAndMergeRunner {Map_1} ()] org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: finalMerge with #inMemoryOutputs=4112, size=842666749 and #onDiskOutputs=0, size=0
>  [ShuffleAndMergeRunner {Map_1} ()] org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: finalMerge with #inMemoryOutputs=4112, size=839533127 and #onDiskOutputs=0, size=0
>  [ShuffleAndMergeRunner {Map_1} ()] org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: finalMerge with #inMemoryOutputs=4112, size=860448335 and #onDiskOutputs=0, size=0
>  [ShuffleAndMergeRunner {Map_1} ()] org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: finalMerge with #inMemoryOutputs=4112, size=844468505 and #onDiskOutputs=0, size=0
>  [ShuffleAndMergeRunner {Map_1} ()] org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: finalMerge with #inMemoryOutputs=4112, size=850099810 and #onDiskOutputs=0, size=0
>  [ShuffleAndMergeRunner {Map_1} ()] org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: finalMerge with #inMemoryOutputs=4112, size=849206236 and #onDiskOutputs=0, size=0
>  [ShuffleAndMergeRunner {Map_1} ()] org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: finalMerge with #inMemoryOutputs=4112, size=840238680 and #onDiskOutputs=0, size=0
> {noformat}
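
The condition quoted in the description can be illustrated with a small standalone sketch. This is not the Tez code and not the attached patch: the class and method names are hypothetical, the ioSortFactor value of 100 is an assumed setting, and the sketch assumes every in-memory output counts towards numMemDiskSegments. It also ignores any memory-budget considerations that the real finalMerge may have to respect. It only shows that the quoted check evaluates to true when there are zero on-disk outputs, and how one possible extra guard would change that.

```java
// Hypothetical sketch: names are illustrative and do not come from the
// Tez source tree or from TEZ-4211.wip.patch.
final class FinalMergeSketch {

    private FinalMergeSketch() {}

    /** Mirrors the condition quoted in the issue description. */
    static boolean spillsWithCurrentCheck(int numMemDiskSegments,
                                          int ioSortFactor,
                                          int onDiskOutputs) {
        return numMemDiskSegments > 0 && ioSortFactor > onDiskOutputs;
    }

    /**
     * Assumed variant: spill in-memory segments only when at least one
     * on-disk output exists. The actual patch may take a different approach.
     */
    static boolean spillsWithProposedCheck(int numMemDiskSegments,
                                           int ioSortFactor,
                                           int onDiskOutputs) {
        return onDiskOutputs > 0
            && spillsWithCurrentCheck(numMemDiskSegments, ioSortFactor, onDiskOutputs);
    }

    public static void main(String[] args) {
        int inMemoryOutputs = 4112; // taken from the log lines above
        int onDiskOutputs = 0;      // taken from the log lines above
        int ioSortFactor = 100;     // assumed value, not stated in the issue

        System.out.println("current check spills:  "
            + spillsWithCurrentCheck(inMemoryOutputs, ioSortFactor, onDiskOutputs));
        System.out.println("proposed check spills: "
            + spillsWithProposedCheck(inMemoryOutputs, ioSortFactor, onDiskOutputs));
    }
}
```

With the numbers from the log (4112 in-memory outputs, 0 on-disk outputs), the quoted check selects the spill path, while the guarded variant does not. When on-disk outputs exist, both checks behave the same.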



--
This message was sent by Atlassian Jira
(v8.3.4#803005)