You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/02/02 20:43:04 UTC

[GitHub] [hive] belugabehr opened a new pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

belugabehr opened a new pull request #1939:
URL: https://github.com/apache/hive/pull/1939


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] belugabehr commented on pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #1939:
URL: https://github.com/apache/hive/pull/1939#issuecomment-788341945


   @abstractdog @pgaref Hey, thanks so much for your time.  I know it's not the most exciting work ever to review, but I hope that this will slowly improve the project and provide good examples for others to follow.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
abstractdog commented on pull request #1939:
URL: https://github.com/apache/hive/pull/1939#issuecomment-784969170


   > > @pgaref Are you OK with your 'LGTM' or would you like @abstractdog to weigh in?
   > 
   > Hey @belugabehr -- I would like @abstractdog to check this out so lets give it a couple of days more. I will take another look if he is out of spare cycles :)
   
   basically, it looks good to me, how have you tested this?
   btw, I'm running some basic tests on cluster to ensure correct behavior


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] belugabehr commented on pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #1939:
URL: https://github.com/apache/hive/pull/1939#issuecomment-774804590


   @pgaref  OK.  I got the current implementation paired down quite a bit.  Please review.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] belugabehr commented on a change in pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
belugabehr commented on a change in pull request #1939:
URL: https://github.com/apache/hive/pull/1939#discussion_r569594294



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java
##########
@@ -395,20 +379,38 @@ int getInitialSize() {
     } while (!deltaRemaining.compareAndSet(oldVal, oldVal + delta));
     int toStart = oldVal + delta;
     if (toStart <= 0) return createDummyFuture();
-    LOG.info("Resizing the pool; adding " + toStart + " sessions");
-
-    // 2) If we need to create some extra sessions, we'd do it just like startup does.
-    int threadCount = Math.max(1, Math.min(toStart,
-        HiveConf.getIntVar(initConf, ConfVars.HIVE_SERVER2_TEZ_SESSION_MAX_INIT_THREADS)));
-    List<ListenableFutureTask<Boolean>> threadTasks = new ArrayList<>(threadCount);
-    // This is an async method, so always launch threads, even for a single task.
-    for (int i = 0; i < threadCount; ++i) {
-      ListenableFutureTask<Boolean> task = ListenableFutureTask.create(
-          new CreateSessionsRunnable(deltaRemaining));
-      new Thread(task, "Tez pool resize " + i).start();
-      threadTasks.add(task);
+    LOG.info("Resizing the pool; adding {} sessions", toStart);
+
+    final int threadCount =
+        Math.min(toStart, HiveConf.getIntVar(initConf, ConfVars.HIVE_SERVER2_TEZ_SESSION_MAX_INIT_THREADS));

Review comment:
       I would advise against that.  The first one initialized the pool, the second one is to increase the existing pool.  I think the naming is OK.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java
##########
@@ -395,20 +379,38 @@ int getInitialSize() {
     } while (!deltaRemaining.compareAndSet(oldVal, oldVal + delta));
     int toStart = oldVal + delta;
     if (toStart <= 0) return createDummyFuture();
-    LOG.info("Resizing the pool; adding " + toStart + " sessions");
-
-    // 2) If we need to create some extra sessions, we'd do it just like startup does.
-    int threadCount = Math.max(1, Math.min(toStart,
-        HiveConf.getIntVar(initConf, ConfVars.HIVE_SERVER2_TEZ_SESSION_MAX_INIT_THREADS)));
-    List<ListenableFutureTask<Boolean>> threadTasks = new ArrayList<>(threadCount);
-    // This is an async method, so always launch threads, even for a single task.
-    for (int i = 0; i < threadCount; ++i) {
-      ListenableFutureTask<Boolean> task = ListenableFutureTask.create(
-          new CreateSessionsRunnable(deltaRemaining));
-      new Thread(task, "Tez pool resize " + i).start();
-      threadTasks.add(task);
+    LOG.info("Resizing the pool; adding {} sessions", toStart);
+
+    final int threadCount =
+        Math.min(toStart, HiveConf.getIntVar(initConf, ConfVars.HIVE_SERVER2_TEZ_SESSION_MAX_INIT_THREADS));
+
+    return createSessions(toStart, threadCount);
+  }
+
+  private ListenableFuture<List<Boolean>> createSessions(int sessionCount, int maxParallel) {

Review comment:
       Sure




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] belugabehr merged pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
belugabehr merged pull request #1939:
URL: https://github.com/apache/hive/pull/1939


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pgaref commented on a change in pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #1939:
URL: https://github.com/apache/hive/pull/1939#discussion_r569583989



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java
##########
@@ -395,20 +379,38 @@ int getInitialSize() {
     } while (!deltaRemaining.compareAndSet(oldVal, oldVal + delta));
     int toStart = oldVal + delta;
     if (toStart <= 0) return createDummyFuture();
-    LOG.info("Resizing the pool; adding " + toStart + " sessions");
-
-    // 2) If we need to create some extra sessions, we'd do it just like startup does.
-    int threadCount = Math.max(1, Math.min(toStart,
-        HiveConf.getIntVar(initConf, ConfVars.HIVE_SERVER2_TEZ_SESSION_MAX_INIT_THREADS)));
-    List<ListenableFutureTask<Boolean>> threadTasks = new ArrayList<>(threadCount);
-    // This is an async method, so always launch threads, even for a single task.
-    for (int i = 0; i < threadCount; ++i) {
-      ListenableFutureTask<Boolean> task = ListenableFutureTask.create(
-          new CreateSessionsRunnable(deltaRemaining));
-      new Thread(task, "Tez pool resize " + i).start();
-      threadTasks.add(task);
+    LOG.info("Resizing the pool; adding {} sessions", toStart);
+
+    final int threadCount =
+        Math.min(toStart, HiveConf.getIntVar(initConf, ConfVars.HIVE_SERVER2_TEZ_SESSION_MAX_INIT_THREADS));
+
+    return createSessions(toStart, threadCount);
+  }
+
+  private ListenableFuture<List<Boolean>> createSessions(int sessionCount, int maxParallel) {

Review comment:
       Shall we use same vars here? Add method doc?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java
##########
@@ -395,20 +379,38 @@ int getInitialSize() {
     } while (!deltaRemaining.compareAndSet(oldVal, oldVal + delta));
     int toStart = oldVal + delta;
     if (toStart <= 0) return createDummyFuture();
-    LOG.info("Resizing the pool; adding " + toStart + " sessions");
-
-    // 2) If we need to create some extra sessions, we'd do it just like startup does.
-    int threadCount = Math.max(1, Math.min(toStart,
-        HiveConf.getIntVar(initConf, ConfVars.HIVE_SERVER2_TEZ_SESSION_MAX_INIT_THREADS)));
-    List<ListenableFutureTask<Boolean>> threadTasks = new ArrayList<>(threadCount);
-    // This is an async method, so always launch threads, even for a single task.
-    for (int i = 0; i < threadCount; ++i) {
-      ListenableFutureTask<Boolean> task = ListenableFutureTask.create(
-          new CreateSessionsRunnable(deltaRemaining));
-      new Thread(task, "Tez pool resize " + i).start();
-      threadTasks.add(task);
+    LOG.info("Resizing the pool; adding {} sessions", toStart);
+
+    final int threadCount =
+        Math.min(toStart, HiveConf.getIntVar(initConf, ConfVars.HIVE_SERVER2_TEZ_SESSION_MAX_INIT_THREADS));

Review comment:
       Shall we rename toStart -> initialSize for consistency?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java
##########
@@ -395,20 +379,38 @@ int getInitialSize() {
     } while (!deltaRemaining.compareAndSet(oldVal, oldVal + delta));
     int toStart = oldVal + delta;
     if (toStart <= 0) return createDummyFuture();
-    LOG.info("Resizing the pool; adding " + toStart + " sessions");
-
-    // 2) If we need to create some extra sessions, we'd do it just like startup does.
-    int threadCount = Math.max(1, Math.min(toStart,
-        HiveConf.getIntVar(initConf, ConfVars.HIVE_SERVER2_TEZ_SESSION_MAX_INIT_THREADS)));
-    List<ListenableFutureTask<Boolean>> threadTasks = new ArrayList<>(threadCount);
-    // This is an async method, so always launch threads, even for a single task.
-    for (int i = 0; i < threadCount; ++i) {
-      ListenableFutureTask<Boolean> task = ListenableFutureTask.create(
-          new CreateSessionsRunnable(deltaRemaining));
-      new Thread(task, "Tez pool resize " + i).start();
-      threadTasks.add(task);
+    LOG.info("Resizing the pool; adding {} sessions", toStart);
+
+    final int threadCount =
+        Math.min(toStart, HiveConf.getIntVar(initConf, ConfVars.HIVE_SERVER2_TEZ_SESSION_MAX_INIT_THREADS));
+
+    return createSessions(toStart, threadCount);
+  }
+
+  private ListenableFuture<List<Boolean>> createSessions(int sessionCount, int maxParallel) {
+    Preconditions.checkArgument(sessionCount > 0);
+    Preconditions.checkArgument(maxParallel > 0);

Review comment:
       Second Precondition check is redundant




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pgaref edited a comment on pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
pgaref edited a comment on pull request #1939:
URL: https://github.com/apache/hive/pull/1939#issuecomment-776680783


   > @pgaref OK. I got the current implementation paired down quite a bit. Please review.
   
   Thanks @belugabehr changes LGTM.
   @abstractdog  can you also take a look please to these TezSession related changes? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
abstractdog commented on pull request #1939:
URL: https://github.com/apache/hive/pull/1939#issuecomment-787701232


   > @abstractdog Just the existing unit tests. I have some confidence in the unit testing because they did indeed fail on a couple of mistakes I made along the way.
   
   yeah, I haven't found issues, LGTM, I think it's safe to commit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pgaref commented on pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #1939:
URL: https://github.com/apache/hive/pull/1939#issuecomment-788782335


   Thanks for the PR @belugabehr and the time to polish this! :) 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] belugabehr commented on pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #1939:
URL: https://github.com/apache/hive/pull/1939#issuecomment-778217200


   @abstractdog If you have any cycles, can you please review this PR please? Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pgaref commented on pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #1939:
URL: https://github.com/apache/hive/pull/1939#issuecomment-772672176


   There are also some test timeouts -- are they related to this TezSessionPool change? cc @abstractdog 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] belugabehr commented on pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #1939:
URL: https://github.com/apache/hive/pull/1939#issuecomment-786000322


   @abstractdog Just the existing unit tests.  I have some confidence in the unit testing because they did indeed fail on a couple of mistakes I made along the way.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] belugabehr commented on pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #1939:
URL: https://github.com/apache/hive/pull/1939#issuecomment-772673304


   @pgaref Ya, tests failing were from this change.  I made a typo.  Will address your comments.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pgaref commented on pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #1939:
URL: https://github.com/apache/hive/pull/1939#issuecomment-776680783


   > @pgaref OK. I got the current implementation paired down quite a bit. Please review.
   
   Thanks @belugabehr changes LGTM.
   @abstractdog  can you also take a look please? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] belugabehr commented on pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #1939:
URL: https://github.com/apache/hive/pull/1939#issuecomment-772547163


   @pgaref FYI


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] belugabehr commented on pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #1939:
URL: https://github.com/apache/hive/pull/1939#issuecomment-784616061


   @pgaref Are you OK with your 'LGTM' or would you like @abstractdog to weigh in?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pgaref commented on pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #1939:
URL: https://github.com/apache/hive/pull/1939#issuecomment-784947400


   > @pgaref Are you OK with your 'LGTM' or would you like @abstractdog to weigh in?
   
   Hey @belugabehr -- I would like @abstractdog to check this out so lets give it a couple of days more. I will take another look if he is out of spare cycles :) 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] belugabehr commented on a change in pull request #1939: HIVE-24723: Use ExecutorService in TezSessionPool

Posted by GitBox <gi...@apache.org>.
belugabehr commented on a change in pull request #1939:
URL: https://github.com/apache/hive/pull/1939#discussion_r569663760



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java
##########
@@ -395,20 +379,38 @@ int getInitialSize() {
     } while (!deltaRemaining.compareAndSet(oldVal, oldVal + delta));
     int toStart = oldVal + delta;
     if (toStart <= 0) return createDummyFuture();
-    LOG.info("Resizing the pool; adding " + toStart + " sessions");
-
-    // 2) If we need to create some extra sessions, we'd do it just like startup does.
-    int threadCount = Math.max(1, Math.min(toStart,
-        HiveConf.getIntVar(initConf, ConfVars.HIVE_SERVER2_TEZ_SESSION_MAX_INIT_THREADS)));
-    List<ListenableFutureTask<Boolean>> threadTasks = new ArrayList<>(threadCount);
-    // This is an async method, so always launch threads, even for a single task.
-    for (int i = 0; i < threadCount; ++i) {
-      ListenableFutureTask<Boolean> task = ListenableFutureTask.create(
-          new CreateSessionsRunnable(deltaRemaining));
-      new Thread(task, "Tez pool resize " + i).start();
-      threadTasks.add(task);
+    LOG.info("Resizing the pool; adding {} sessions", toStart);
+
+    final int threadCount =
+        Math.min(toStart, HiveConf.getIntVar(initConf, ConfVars.HIVE_SERVER2_TEZ_SESSION_MAX_INIT_THREADS));
+
+    return createSessions(toStart, threadCount);
+  }
+
+  private ListenableFuture<List<Boolean>> createSessions(int sessionCount, int maxParallel) {
+    Preconditions.checkArgument(sessionCount > 0);
+    Preconditions.checkArgument(maxParallel > 0);

Review comment:
       This was previous checked in the calling methods.  I have moved it here to remove redundancy.  Not an expensive operation and it's not called all that often, so no worries even if it is doubled somewhere.  Better safe than sorry as the code changes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org