Posted to issues@tez.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2016/07/08 22:25:11 UTC

[jira] [Commented] (TEZ-3332) Parallelize closing of outputs

    [ https://issues.apache.org/jira/browse/TEZ-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368598#comment-15368598 ] 

Rohini Palaniswamy commented on TEZ-3332:
-----------------------------------------

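The example below runs on tiny data, so it finished quickly. Even so, the log shows the five outputs (scope-525, scope-554, scope-512, scope-490 and scope-541) being flushed, sorted and spilled strictly one after another. With larger data, parallelizing the closes can provide a considerable speedup.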

{code}
2016-07-07 21:39:23,392 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:23,392 [INFO] [TezChild] |dflt.DefaultSorter|: scope-525: Sorting & Spilling map output. bufstart = 0, bufend = 4091674, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67104732(268418928), length = 4129/16777216
2016-07-07 21:39:23,419 [INFO] [TezChild] |compress.CodecPool|: Got brand-new compressor [.lzo_deflate]
2016-07-07 21:39:23,452 [INFO] [TezChild] |mapReduceLayer.PigCombiner$Combine|: Aliases being processed per job phase (AliasName[line,offset]): null
2016-07-07 21:39:23,860 [INFO] [TezChild] |dflt.DefaultSorter|: scope-525: Finished spill 0
2016-07-07 21:39:23,894 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:23,894 [INFO] [TezChild] |dflt.DefaultSorter|: scope-554: Sorting & Spilling map output. bufstart = 0, bufend = 493566, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67102792(268411168), length = 6069/16777216
2016-07-07 21:39:24,127 [INFO] [TezChild] |dflt.DefaultSorter|: scope-554: Finished spill 0
2016-07-07 21:39:24,130 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:24,130 [INFO] [TezChild] |dflt.DefaultSorter|: scope-512: Sorting & Spilling map output. bufstart = 0, bufend = 769, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67108856(268435424), length = 5/16777216
2016-07-07 21:39:24,148 [INFO] [TezChild] |dflt.DefaultSorter|: scope-512: Finished spill 0
2016-07-07 21:39:24,151 [INFO] [TezChild] |shuffle.ShuffleUtils|: EmptyPartition bitsetSize=18, numOutputs=20, emptyPartitions=18, compressedSize=11
2016-07-07 21:39:24,152 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:24,152 [INFO] [TezChild] |dflt.DefaultSorter|: scope-490: Sorting & Spilling map output. bufstart = 0, bufend = 5539516, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67107376(268429504), length = 1485/16777216
2016-07-07 21:39:24,361 [INFO] [TezChild] |dflt.DefaultSorter|: scope-490: Finished spill 0
2016-07-07 21:39:24,363 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:24,363 [INFO] [TezChild] |dflt.DefaultSorter|: scope-541: Sorting & Spilling map output. bufstart = 0, bufend = 12169, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67108736(268434944), length = 125/16777216
2016-07-07 21:39:24,662 [INFO] [TezChild] |dflt.DefaultSorter|: scope-541: Finished spill 0
{code}
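A minimal Java sketch of the idea follows. It assumes a hypothetical `ClosableOutput` interface whose `close()` does the flush, sort, spill and combine work seen in the log. This is not Tez's actual output API, and it is not the change made for this issue. The sketch contrasts the current serial loop with submitting every close to a fixed-size `ExecutorService` and waiting on `invokeAll`. The `fake` outputs and their sleep times are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCloseSketch {

    /** Hypothetical stand-in for a task output; NOT the Tez LogicalOutput API. */
    interface ClosableOutput {
        /** Flush, sort, spill and combine; returns a short status string. */
        String close() throws Exception;
    }

    /** Current behaviour described in the issue: each output is closed in turn. */
    static List<String> closeSerially(List<ClosableOutput> outputs) throws Exception {
        List<String> results = new ArrayList<>();
        for (ClosableOutput out : outputs) {
            results.add(out.close()); // the next output waits for this one to finish
        }
        return results;
    }

    /** Proposed behaviour: close all outputs concurrently on a bounded pool. */
    static List<String> closeInParallel(List<ClosableOutput> outputs, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (ClosableOutput out : outputs) {
                tasks.add(out::close);
            }
            // invokeAll blocks until every task has completed; futures keep input order.
            List<Future<String>> futures = pool.invokeAll(tasks);
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // a failed close() surfaces as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    /** Hypothetical fake output that simulates close() work with a sleep. */
    static ClosableOutput fake(String name, long millis) {
        return () -> {
            Thread.sleep(millis);
            return name + ": closed";
        };
    }

    public static void main(String[] args) throws Exception {
        List<ClosableOutput> outputs =
                Arrays.asList(fake("out-a", 200), fake("out-b", 200), fake("out-c", 200));

        long start = System.nanoTime();
        closeSerially(outputs);
        long serialMs = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        List<String> results = closeInParallel(outputs, outputs.size());
        long parallelMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(results);
        System.out.println("serial: " + serialMs + " ms, parallel: " + parallelMs + " ms");
    }
}
```

With three simulated 200 ms closes, the serial loop takes at least 600 ms and the parallel version should take about 200 ms. In practice the slowest output bounds the parallel time. Concurrent closes also compete for CPU and local disk, so real gains depend on how the work is spread across outputs.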

> Parallelize closing of outputs
> ------------------------------
>
>                 Key: TEZ-3332
>                 URL: https://issues.apache.org/jira/browse/TEZ-3332
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>
> Currently outputs are closed serially, so when there are multiple outputs it can take a long time to finish sorting and running the combiner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)