Posted to dev@zeppelin.apache.org by GitBox <gi...@apache.org> on 2021/10/14 12:45:41 UTC

[GitHub] [zeppelin] zjffdu opened a new pull request #4255: [ZEPPELIN-5560] spark yarn app end with failed status in yarn-cluster mode

zjffdu opened a new pull request #4255:
URL: https://github.com/apache/zeppelin/pull/4255


   
   ### What is this PR for?
   
   The root cause is that `RemoteInterpreterServer` calls `System.exit` to force a shutdown of the Spark driver in yarn-cluster mode, which makes the YARN application end with a failed status.
   This PR disables `zeppelin.interpreter.forceShutdown` in Spark's yarn-cluster mode.
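
As a rough illustration of the behavior described above, the sketch below shows how a shutdown path might consult `zeppelin.interpreter.forceShutdown` before deciding to call `System.exit`. The class `ForceShutdownSketch`, its method names, and the default value of `true` are assumptions made for illustration; this is not the actual `RemoteInterpreterServer` code.

```java
import java.util.Properties;

// Hypothetical sketch; not the actual RemoteInterpreterServer implementation.
class ForceShutdownSketch {
    static final String FORCE_SHUTDOWN_KEY = "zeppelin.interpreter.forceShutdown";

    // Assumed default of "true": force shutdown unless the property disables it.
    static boolean isForceShutdown(Properties props) {
        return Boolean.parseBoolean(props.getProperty(FORCE_SHUTDOWN_KEY, "true"));
    }

    // Returns a description of the action instead of really exiting,
    // so the sketch is safe to run.
    static String shutdownAction(Properties props) {
        return isForceShutdown(props)
            ? "System.exit"        // would terminate the JVM abruptly
            : "return normally";   // would let the driver process finish on its own
    }

    public static void main(String[] args) {
        Properties yarnCluster = new Properties();
        yarnCluster.setProperty(FORCE_SHUTDOWN_KEY, "false");
        System.out.println(shutdownAction(new Properties()));
        System.out.println(shutdownAction(yarnCluster));
    }
}
```

With the property set to `false`, the sketch never reaches the `System.exit` branch, which is the effect this PR aims for in yarn-cluster mode.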
   
   ### What type of PR is it?
   [ Improvement ]
   
   ### Todos
   * [ ] - Task
   
   ### What is the Jira issue?
   * https://issues.apache.org/jira/browse/ZEPPELIN-5560
   
   ### How should this be tested?
   * CI
   
   ### Screenshots (if appropriate)
   
   ### Questions:
   * Do the license files need an update? No
   * Are there breaking changes for older versions? No
   * Does this need documentation? No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@zeppelin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [zeppelin] zjffdu commented on pull request #4255: [ZEPPELIN-5560] spark yarn app end with failed status in yarn-cluster mode

Posted by GitBox <gi...@apache.org>.
zjffdu commented on pull request #4255:
URL: https://github.com/apache/zeppelin/pull/4255#issuecomment-946337110


   Will merge if there are no more comments.





[GitHub] [zeppelin] zjffdu commented on pull request #4255: [ZEPPELIN-5560] spark yarn app end with failed status in yarn-cluster mode

Posted by GitBox <gi...@apache.org>.
zjffdu commented on pull request #4255:
URL: https://github.com/apache/zeppelin/pull/4255#issuecomment-945450998


   @Reamer Could you help review this?





[GitHub] [zeppelin] zjffdu commented on a change in pull request #4255: [ZEPPELIN-5560] spark yarn app end with failed status in yarn-cluster mode

Posted by GitBox <gi...@apache.org>.
zjffdu commented on a change in pull request #4255:
URL: https://github.com/apache/zeppelin/pull/4255#discussion_r730682888



##########
File path: zeppelin-zengine/src/main/java/org/apache/zeppelin/interpreter/launcher/SparkInterpreterLauncher.java
##########
@@ -98,9 +98,14 @@ public SparkInterpreterLauncher(ZeppelinConfiguration zConf, RecoveryStorage rec
       sparkProperties.setProperty("spark.pyspark.python", condaEnvName + "/bin/python");
     }
 
-    if (isYarnMode() && getDeployMode().equals("cluster")) {
+    if (isYarnCluster()) {
       env.put("ZEPPELIN_SPARK_YARN_CLUSTER", "true");
       sparkProperties.setProperty("spark.yarn.submit.waitAppCompletion", "false");
+      // Need to set `zeppelin.interpreter.forceShutdown` in interpreter properties directly
+      // instead of updating sparkProperties.
+      // Because `zeppelin.interpreter.forceShutdown` is initialized in RemoteInterpreterServer
+      // before SparkInterpreter is created.
+      context.getProperties().put("zeppelin.interpreter.forceShutdown", "false");

Review comment:
       It is passed to `RemoteInterpreterServer` when the interpreter is created:
   https://github.com/apache/zeppelin/blob/master/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterServer.java#L356
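
The ordering concern in the diff comment can be sketched as follows: a server-side flag that is read once, when an interpreter is first created, only sees values that are already present in the interpreter properties at that point. `PropertyOrderingSketch`, `ServerSketch`, and `launcherProperties` are hypothetical names, and the default of `true` is an assumption; only the property key comes from the diff.

```java
import java.util.Properties;

// Hypothetical sketch of the ordering issue; not Zeppelin's actual classes.
class PropertyOrderingSketch {
    static final String KEY = "zeppelin.interpreter.forceShutdown";

    // Stand-in for the server: captures the flag once, at first interpreter creation.
    static class ServerSketch {
        private Boolean forceShutdown; // null until the first interpreter is created

        void createInterpreter(Properties interpreterProps) {
            if (forceShutdown == null) {
                forceShutdown = Boolean.parseBoolean(interpreterProps.getProperty(KEY, "true"));
            }
        }

        // Only meaningful after createInterpreter has been called at least once.
        boolean isForceShutdown() {
            return forceShutdown;
        }
    }

    // Stand-in for the launcher: writes the flag into the interpreter properties
    // themselves, so it is visible when the server creates the interpreter.
    static Properties launcherProperties(boolean yarnCluster) {
        Properties props = new Properties();
        if (yarnCluster) {
            props.setProperty(KEY, "false");
        }
        return props;
    }

    public static void main(String[] args) {
        ServerSketch server = new ServerSketch();
        server.createInterpreter(launcherProperties(true));
        System.out.println(server.isForceShutdown());
    }
}
```

A value that only became visible after the first `createInterpreter` call would be ignored in this sketch, which mirrors why the diff writes to the interpreter properties directly rather than to `sparkProperties`.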







[GitHub] [zeppelin] jongyoul commented on a change in pull request #4255: [ZEPPELIN-5560] spark yarn app end with failed status in yarn-cluster mode

Posted by GitBox <gi...@apache.org>.
jongyoul commented on a change in pull request #4255:
URL: https://github.com/apache/zeppelin/pull/4255#discussion_r730654425



##########
File path: zeppelin-zengine/src/main/java/org/apache/zeppelin/interpreter/launcher/SparkInterpreterLauncher.java
##########
@@ -98,9 +98,14 @@ public SparkInterpreterLauncher(ZeppelinConfiguration zConf, RecoveryStorage rec
       sparkProperties.setProperty("spark.pyspark.python", condaEnvName + "/bin/python");
     }
 
-    if (isYarnMode() && getDeployMode().equals("cluster")) {
+    if (isYarnCluster()) {
       env.put("ZEPPELIN_SPARK_YARN_CLUSTER", "true");
       sparkProperties.setProperty("spark.yarn.submit.waitAppCompletion", "false");
+      // Need to set `zeppelin.interpreter.forceShutdown` in interpreter properties directly
+      // instead of updating sparkProperties.
+      // Because `zeppelin.interpreter.forceShutdown` is initialized in RemoteInterpreterServer
+      // before SparkInterpreter is created.
+      context.getProperties().put("zeppelin.interpreter.forceShutdown", "false");

Review comment:
       I'm just curious: does this setting help send the proper status code to YARN?







[GitHub] [zeppelin] asfgit closed pull request #4255: [ZEPPELIN-5560] spark yarn app end with failed status in yarn-cluster mode

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #4255:
URL: https://github.com/apache/zeppelin/pull/4255


   

