Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/02/06 03:45:26 UTC

[GitHub] [hive] okumin commented on a change in pull request #1744: HIVE-24485: Make the slow-start behavior tunable

okumin commented on a change in pull request #1744:
URL: https://github.com/apache/hive/pull/1744#discussion_r571351354



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java
##########
@@ -81,6 +82,12 @@ public static ReduceWork createReduceWork(
     boolean isAutoReduceParallelism =
         context.conf.getBoolVar(HiveConf.ConfVars.TEZ_AUTO_REDUCER_PARALLELISM);
 
+    float slowStartMaxSrcFraction = context.conf.getFloat(
+        ShuffleVertexManager.TEZ_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION,
+        ShuffleVertexManager.TEZ_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION_DEFAULT);
+    float slowStartMinSrcFraction = context.conf.getFloat(
+        ShuffleVertexManager.TEZ_SHUFFLE_VERTEX_MANAGER_MIN_SRC_FRACTION,
+        ShuffleVertexManager.TEZ_SHUFFLE_VERTEX_MANAGER_MIN_SRC_FRACTION_DEFAULT);

Review comment:
   I wonder if we should add new parameters and use them instead of the ones defined in ShuffleVertexManager.
   Looking through this class, it would be consistent to introduce new parameters and use `TEZ_SHUFFLE_VERTEX_MANAGER_{MIN, MAX}_SRC_FRACTION` only in `DagUtils.java`.
   However, Hive on Tez can also be affected by `TEZ_SHUFFLE_VERTEX_MANAGER_{MIN, MAX}_SRC_FRACTION` in other cases, so I thought it might be confusing to have two ways to tweak slow-start behavior: one for auto-reduce parallelism and another for all other cases.
   https://github.com/apache/tez/blob/73bcabd2bca2536bf4f3673443a8dcdaaf79a4eb/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java#L2893-L2897
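For context, a rough sketch of what the two fractions control: downstream (reducer) tasks are held back until the `min` fraction of source tasks has completed, then released gradually until the `max` fraction is reached, at which point all of them are scheduled. The code below is a simplified linear model written only for illustration; `SlowStartSketch` and `tasksToSchedule` are hypothetical names, it is not Tez's actual `ShuffleVertexManager` logic, and the 0.25/0.75 defaults are assumed to match Tez's `_DEFAULT` constants.

```java
/**
 * Hypothetical, simplified model of Tez slow-start scheduling.
 * Not the real ShuffleVertexManager implementation.
 */
public class SlowStartSketch {

    // Assumed to mirror Tez's defaults for
    // tez.shuffle-vertex-manager.{min,max}-src-fraction.
    static final float DEFAULT_MIN_SRC_FRACTION = 0.25f;
    static final float DEFAULT_MAX_SRC_FRACTION = 0.75f;

    /**
     * Returns how many of totalTasks downstream tasks would be released,
     * given the fraction of source tasks that have completed.
     */
    static int tasksToSchedule(int totalTasks, float completedFraction,
                               float minSrcFraction, float maxSrcFraction) {
        if (minSrcFraction < 0f || maxSrcFraction > 1f || minSrcFraction > maxSrcFraction) {
            throw new IllegalArgumentException("require 0 <= min <= max <= 1");
        }
        if (completedFraction < minSrcFraction) {
            return 0; // too early: no downstream tasks yet
        }
        if (completedFraction >= maxSrcFraction) {
            return totalTasks; // past max: release everything
        }
        // Linear ramp between min and max (max > min is guaranteed here).
        float progress = (completedFraction - minSrcFraction)
                / (maxSrcFraction - minSrcFraction);
        return (int) Math.floor(progress * totalTasks);
    }

    public static void main(String[] args) {
        float[] completed = {0.1f, 0.5f, 0.8f};
        for (float c : completed) {
            int n = tasksToSchedule(100, c, DEFAULT_MIN_SRC_FRACTION, DEFAULT_MAX_SRC_FRACTION);
            System.out.println("completed=" + c + " -> scheduled=" + n);
        }
    }
}
```

With 100 downstream tasks and the default fractions, nothing is released at 10% source completion, about half at 50%, and all of them from 75% onward. This is why the knobs matter for auto-reduce parallelism, which needs enough source output before it can decide the final parallelism.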
   
   My ideas are:
   - keep using `TEZ_SHUFFLE_VERTEX_MANAGER_{MIN, MAX}_SRC_FRACTION` for all cases
   - add params for auto parallelism only, like `TEZ_AUTO_REDUCER_PARALLELISM_{MIN, MAX}_SRC_FRACTION`, and use them only when auto parallelism is enabled
   - add params to configure slow-start behavior in general, like `TEZ_SLOW_START_{MIN, MAX}_SRC_FRACTION`, and use them for all cases, meaning Hive on Tez would ignore any `TEZ_SHUFFLE_VERTEX_MANAGER_{MIN, MAX}_SRC_FRACTION` values set by the user
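To make the second idea a bit more concrete, here is a minimal sketch of how auto-parallelism-only overrides could fall back to the Tez keys when unset. The `hive.tez.auto.reducer.parallelism.{min,max}.src.fraction` key names are invented for illustration and are not existing HiveConf variables. A plain `Map` stands in for `HiveConf`/`Configuration`, and the Tez key strings are assumed to be the values behind the `ShuffleVertexManager` constants.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of option 2: auto-parallelism overrides with Tez fallback. */
public class SlowStartConfSketch {

    // Assumed values of ShuffleVertexManager's key constants and defaults.
    static final String TEZ_MIN_KEY = "tez.shuffle-vertex-manager.min-src-fraction";
    static final String TEZ_MAX_KEY = "tez.shuffle-vertex-manager.max-src-fraction";
    static final float TEZ_MIN_DEFAULT = 0.25f;
    static final float TEZ_MAX_DEFAULT = 0.75f;

    // Hypothetical Hive keys, invented for this illustration only.
    static final String HIVE_AUTO_MIN_KEY = "hive.tez.auto.reducer.parallelism.min.src.fraction";
    static final String HIVE_AUTO_MAX_KEY = "hive.tez.auto.reducer.parallelism.max.src.fraction";

    static float getFloat(Map<String, String> conf, String key, float dflt) {
        String v = conf.get(key);
        return v == null ? dflt : Float.parseFloat(v.trim());
    }

    /** Returns {min, max} slow-start fractions for a reduce vertex. */
    static float[] resolve(Map<String, String> conf, boolean autoReduce) {
        // Start from whatever the user configured for Tez (or Tez's defaults).
        float min = getFloat(conf, TEZ_MIN_KEY, TEZ_MIN_DEFAULT);
        float max = getFloat(conf, TEZ_MAX_KEY, TEZ_MAX_DEFAULT);
        if (autoReduce) {
            // Only auto-parallelism vertices consult the Hive-specific keys,
            // inheriting the Tez values when those keys are unset.
            min = getFloat(conf, HIVE_AUTO_MIN_KEY, min);
            max = getFloat(conf, HIVE_AUTO_MAX_KEY, max);
        }
        return new float[] {min, max};
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(TEZ_MIN_KEY, "0.5");
        conf.put(HIVE_AUTO_MAX_KEY, "0.9");
        float[] plain = resolve(conf, false);
        float[] auto = resolve(conf, true);
        System.out.println("plain: " + plain[0] + " " + plain[1]);
        System.out.println("auto:  " + auto[0] + " " + auto[1]);
    }
}
```

With the fallback, a user who only sets the Tez keys still has them honored everywhere. The "two knobs" confusion then only affects users who explicitly opt into the Hive-specific override.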
