Posted to issues@tez.apache.org by "Rajesh Balamohan (Jira)" <ji...@apache.org> on 2020/07/29 04:00:01 UTC

[jira] [Comment Edited] (TEZ-4208) Pipelinesorter uses single SortSpan after spill

    [ https://issues.apache.org/jira/browse/TEZ-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166857#comment-17166857 ] 

Rajesh Balamohan edited comment on TEZ-4208 at 7/29/20, 3:59 AM:
-----------------------------------------------------------------

Q67 runtime with/without patch in internal cluster @ 10 TB scale:
|| ||Without Patch||With Patch||
|Job Runtime (in seconds)|1961.63 s|1656.14 s|
|TaskCounter_Map_1_OUTPUT_Reducer_2:|x|x |
|OUTPUT_BYTES_PHYSICAL: |457771151796|311823523913|
|OUTPUT_RECORDS:|20169930972|20169930972|
|SHUFFLE_CHUNK_COUNT:|37776|5193|
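
For readers who want the relative change rather than raw counters, the figures in the table above work out to roughly 16% lower runtime, 32% fewer physical output bytes, and 86% fewer shuffle chunks. A minimal sketch of that arithmetic follows; the class and method names are illustrative only and are not part of Tez.

```java
import java.util.Locale;

// Illustrative helper, not part of Tez: computes relative reductions
// for the counters reported in the table above.
class CounterDelta {
    // Relative reduction from `before` to `after`, as a percentage.
    static double percentReduction(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Values copied verbatim from the table (without patch, with patch).
        System.out.printf(Locale.ROOT, "Job runtime:           %.1f%%%n",
                percentReduction(1961.63, 1656.14));
        System.out.printf(Locale.ROOT, "OUTPUT_BYTES_PHYSICAL: %.1f%%%n",
                percentReduction(457771151796L, 311823523913L));
        System.out.printf(Locale.ROOT, "SHUFFLE_CHUNK_COUNT:   %.1f%%%n",
                percentReduction(37776, 5193));
    }
}
```

OUTPUT_RECORDS is identical in both runs, so the same amount of data is being compared.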


was (Author: rajesh.balamohan):
Q67 runtime with/without patch in internal cluster @ 10 TB scale:
|| ||Without Patch||With Patch||
|Job Runtime (in seconds)|1961.63 s|1656.14 s|
|TaskCounter_Map_1_OUTPUT_Reducer_2:| | |
|OUTPUT_BYTES_PHYSICAL: |457771151796|311823523913|
|OUTPUT_RECORDS:|20169930972|20169930972|
|SHUFFLE_CHUNK_COUNT:|37776|5193|

> Pipelinesorter uses single SortSpan after spill
> -----------------------------------------------
>
>                 Key: TEZ-4208
>                 URL: https://issues.apache.org/jira/browse/TEZ-4208
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Priority: Major
>         Attachments: TEZ-4208.1.patch, q67_sorter.log
>
>
> Although it may have created multiple spans, Tez always uses only the first span after a spill. Other spans may well be bigger than the first one because of progressive space allocation. Fixing this would help reduce the number of spills (depending on the job) and lower the load on index cache entries, since fewer files have to be opened.
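
The effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in TEZ-4208.1.patch or the actual sorter: the class, the method names, and the span capacities are all hypothetical, and the model assumes one spill each time the reused span fills up.

```java
// Toy model, not Tez code: compares reusing the first span with reusing
// the largest span after a spill. All names and sizes are hypothetical.
class SpanChoice {
    // Behaviour described in the issue: always fall back to the first span.
    static int firstSpanIndex(long[] capacities) {
        return 0;
    }

    // Alternative: pick the span with the most capacity.
    static int largestSpanIndex(long[] capacities) {
        int best = 0;
        for (int i = 1; i < capacities.length; i++) {
            if (capacities[i] > capacities[best]) {
                best = i;
            }
        }
        return best;
    }

    // Assumed model: one spill each time the reused span fills (ceiling division).
    static long spillsFor(long totalBytes, long spanCapacity) {
        return (totalBytes + spanCapacity - 1) / spanCapacity;
    }

    public static void main(String[] args) {
        // Hypothetical capacities (in MB) that grow with progressive allocation.
        long[] capacities = {16, 32, 64};
        long totalMb = 1024;

        long first = capacities[firstSpanIndex(capacities)];
        long largest = capacities[largestSpanIndex(capacities)];

        System.out.println("Spills reusing first span:   " + spillsFor(totalMb, first));
        System.out.println("Spills reusing largest span: " + spillsFor(totalMb, largest));
    }
}
```

In this model, fewer spills also means fewer spill files, which is the source of the reduced index cache load mentioned in the description.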


