Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2014/09/22 09:43:33 UTC
[jira] [Issue Comment Deleted] (TEZ-978) Enhance auto parallelism tuning for queries having empty outputs or data skewness
[ https://issues.apache.org/jira/browse/TEZ-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bikas Saha updated TEZ-978:
---------------------------
Comment: was deleted
(was: looks good. committing this.)
> Enhance auto parallelism tuning for queries having empty outputs or data skewness
> ---------------------------------------------------------------------------------
>
> Key: TEZ-978
> URL: https://issues.apache.org/jira/browse/TEZ-978
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-978-v1.patch, TEZ-978-v2.patch, TEZ-978.3.patch, TEZ-978.4.patch, TEZ-978.4.wip.patch, TEZ-978.5.patch, TEZ-978.6.patch
>
>
> Running tpcds (query-92) with auto-tuning enabled ("tez.am.shuffle-vertex-manager.enable.auto-parallel") degraded performance compared to the original run.
> The query has many empty outputs, and the tasks producing them tend to complete much faster than the others. Tez computes the parallelism from the information available at that point (in which most of the output is empty) and sets the number of reducers to "1". When the other tasks complete, the single reducer has to do the heavy lifting, which causes the performance degradation.
> Map 1: 2/181 Map 5: 16/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
> Map 1: 2/181 Map 5: 22/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
> Map 1: 2/181 Map 5: 25/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
> Map 1: 2/181 Map 5: 30/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
> Map 1: 2/181 Map 5: 35/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
> Map 1: 2/181 Map 5: 36/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
> Map 1: 2/181 Map 5: 39/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
> Map 1: 3/181 Map 5: 43/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
> Map 1: 5/181 Map 5: 46/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1 <=== ShuffleVertexManager changing parallelism
> Map 1: 5/181 Map 5: 63/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 7/181 Map 5: 72/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 7/181 Map 5: 83/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 8/181 Map 5: 95/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 8/181 Map 5: 104/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 9/181 Map 5: 116/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 12/181 Map 5: 123/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 13/181 Map 5: 127/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 16/181 Map 5: 127/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 17/181 Map 5: 128/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 18/181 Map 5: 131/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 19/181 Map 5: 131/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 25/181 Map 5: 132/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 33/181 Map 5: 132/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 42/181 Map 5: 134/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1 <=== ShuffleVertexManager changing parallelism
> Map 1: 51/181 Map 5: 135/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/1 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 58/181 Map 5: 136/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/1 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 63/181 Map 5: 136/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/1 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> Map 1: 70/181 Map 5: 136/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/1 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
> The suggestion is to:
> 1. Include empty output information when computing auto-parallelism.
> 2. Have a configurable value for the minimum average output from the source (e.g., a minimum of 1 MB of output from each source). If the average task output size does not meet this criterion (which means all the completed tasks are small tasks), we can defer the computation of auto-parallelism until other tasks have completed.
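The two suggestions above could be sketched roughly as follows. This is an illustration only, not the ShuffleVertexManager code or any of the attached patches: the function name, its parameters, and the extrapolation rule (average completed output multiplied by the total number of source tasks) are all assumptions made for the example. Empty outputs are counted as zero-byte entries, and the decision is deferred while the average completed output is below the configurable minimum.

```python
import math
from typing import Optional, Sequence

MB = 1024 * 1024


def decide_parallelism(
    completed_output_bytes: Sequence[int],
    total_source_tasks: int,
    current_parallelism: int,
    desired_bytes_per_reducer: int,
    min_avg_output_bytes: int = 1 * MB,
) -> Optional[int]:
    """Return a new reducer count, or None to defer the decision.

    Hypothetical helper: all names and the extrapolation rule are
    assumptions for illustration, not Tez APIs.
    """
    if not completed_output_bytes:
        return None  # nothing has completed yet, so there is nothing to estimate from

    all_done = len(completed_output_bytes) >= total_source_tasks
    # Empty outputs are included as zeros, so they pull the average down.
    avg = sum(completed_output_bytes) / len(completed_output_bytes)

    # Suggestion 2: if only small (or empty) tasks have finished so far,
    # wait for more tasks instead of shrinking the reducers prematurely.
    if avg < min_avg_output_bytes and not all_done:
        return None

    # Extrapolate the total output from the completed tasks (assumed rule).
    expected_total = avg * total_source_tasks
    reducers = max(1, math.ceil(expected_total / desired_bytes_per_reducer))

    # Auto-parallelism only reduces the reducer count, never increases it.
    return min(current_parallelism, reducers)
```

With inputs resembling the log above (only empty outputs completed early on), the sketch returns None rather than collapsing the reducer vertex to a single task; once every source task has finished, it falls back to whatever the observed sizes imply.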
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)