Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2014/09/22 09:43:33 UTC

[jira] [Issue Comment Deleted] (TEZ-978) Enhance auto parallelism tuning for queries having empty outputs or data skewness

     [ https://issues.apache.org/jira/browse/TEZ-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikas Saha updated TEZ-978:
---------------------------
    Comment: was deleted

(was: looks good. committing this.)

> Enhance auto parallelism tuning for queries having empty outputs or data skewness
> ---------------------------------------------------------------------------------
>
>                 Key: TEZ-978
>                 URL: https://issues.apache.org/jira/browse/TEZ-978
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-978-v1.patch, TEZ-978-v2.patch, TEZ-978.3.patch, TEZ-978.4.patch, TEZ-978.4.wip.patch, TEZ-978.5.patch, TEZ-978.6.patch
>
>
> Running tpcds (query-92) with the auto-tuning option "tez.am.shuffle-vertex-manager.enable.auto-parallel" degraded performance compared to the original run.
> The query produces lots of empty outputs, and the tasks producing them tend to complete much faster than the others.  Tez computes the parallelism from the information available at that point (wherein most of the observed output is empty) and sets the number of reducers to "1".  When the remaining tasks complete, the single reducer has to do all the heavy lifting, and this causes the performance degradation (see the sketch after the progress log below).
> Map 1: 2/181    Map 5: 16/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/166
> Map 1: 2/181    Map 5: 22/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/166
> Map 1: 2/181    Map 5: 25/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/166
> Map 1: 2/181    Map 5: 30/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/166
> Map 1: 2/181    Map 5: 35/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/166
> Map 1: 2/181    Map 5: 36/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/166
> Map 1: 2/181    Map 5: 39/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/166
> Map 1: 3/181    Map 5: 43/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/166
> Map 1: 5/181    Map 5: 46/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1   <=== ShuffleVertexManager changing parallelism 
> Map 1: 5/181    Map 5: 63/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 7/181    Map 5: 72/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 7/181    Map 5: 83/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 8/181    Map 5: 95/179   Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 8/181    Map 5: 104/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 9/181    Map 5: 116/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 12/181   Map 5: 123/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 13/181   Map 5: 127/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 16/181   Map 5: 127/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 17/181   Map 5: 128/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 18/181   Map 5: 131/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 19/181   Map 5: 131/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 25/181   Map 5: 132/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 33/181   Map 5: 132/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 42/181   Map 5: 134/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/109        Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1	 <=== ShuffleVertexManager changing parallelism 
> Map 1: 51/181   Map 5: 135/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/1  Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 58/181   Map 5: 136/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/1  Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 63/181   Map 5: 136/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/1  Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
> Map 1: 70/181   Map 5: 136/179  Map 7: 1/1      Map 8: 1/1      Reducer 2: 0/1  Reducer 3: 0/137        Reducer 4: 0/1  Reducer 6: 0/1
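> As a concrete illustration of the mechanism described above, here is a rough Java sketch of a size-based parallelism estimate and how it degenerates to a single reducer. The method and parameter names are purely illustrative and do not mirror the actual ShuffleVertexManager API:
> {code:java}
> // Illustrative sketch only: names and logic are simplified, not the real
> // ShuffleVertexManager implementation. It shows why extrapolating total
> // shuffle data from the first few completed source tasks can collapse the
> // reducer count to 1 when those early finishers produce empty output.
> static int estimateReducerParallelism(long[] completedSourceOutputBytes,
>                                       int totalSourceTasks,
>                                       int currentParallelism,
>                                       long desiredBytesPerReducer) {
>   long observed = 0;
>   for (long bytes : completedSourceOutputBytes) {
>     observed += bytes;
>   }
>   // Extrapolate total shuffle data from the completed fraction of source tasks.
>   long expectedTotal = (observed / completedSourceOutputBytes.length) * totalSourceTasks;
>   // If the early finishers were mostly empty, expectedTotal is ~0 and this becomes 1.
>   int estimated = (int) Math.max(1L, expectedTotal / desiredBytesPerReducer);
>   return Math.min(estimated, currentParallelism);
> }
> {code}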
> The suggestion is to:
> 1. Include empty-output information when computing auto-parallelism.
> 2. Add a configurable minimum for the average output from the source (e.g. a minimum of 1 MB of output from each source task).  If the average completed-task output size does not meet this threshold (meaning all the tasks completed so far were small), defer the auto-parallelism computation until more tasks have completed (a rough sketch follows below).
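> A rough sketch of what the deferral check in suggestion 2 could look like. The threshold constant and method name here are hypothetical, not an existing Tez configuration or API:
> {code:java}
> // Hypothetical sketch of suggestion #2; constant and method are illustrative.
> static final long MIN_AVG_SOURCE_OUTPUT_BYTES = 1L * 1024 * 1024; // e.g. 1 MB per source task
>
> static boolean shouldDeferAutoParallelism(long totalCompletedOutputBytes,
>                                           int numCompletedSourceTasks) {
>   if (numCompletedSourceTasks == 0) {
>     return true; // nothing observed yet
>   }
>   long avgBytes = totalCompletedOutputBytes / numCompletedSourceTasks;
>   // All completed tasks so far were small/empty: wait for more completions
>   // before recomputing the reducer parallelism.
>   return avgBytes < MIN_AVG_SOURCE_OUTPUT_BYTES;
> }
> {code}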



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)