Posted to dev@zeppelin.apache.org by GitBox <gi...@apache.org> on 2020/03/24 03:20:38 UTC

[GitHub] [zeppelin] zjffdu opened a new pull request #3696: [ZEPPELIN-4692]. zeppelin pyspark doesn't print java output

zjffdu opened a new pull request #3696: [ZEPPELIN-4692]. zeppelin pyspark doesn't print java output
URL: https://github.com/apache/zeppelin/pull/3696
 
 
   ### What is this PR for?
   
   The root cause of this issue is that we didn't redirect Java output to the interpreter output. This PR fixes it by redirecting Java output before interpreting Python code in both PySparkInterpreter and IPySparkInterpreter. A unit test is also added to verify this behavior.
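
The redirect described above can be illustrated with a minimal, standalone sketch. It is not the PR's code: `RedirectSketch` and `runWithRedirect` are hypothetical names, and a plain `OutputStream` stands in for the interpreter output. The `finally` block that restores the original streams is an assumption about how such a redirect is usually completed, since the quoted diff hunk is cut off before that point.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.function.Supplier;

class RedirectSketch {

  // Hypothetical helper: runs `body` with System.out and System.err pointed at `sink`,
  // then restores the original streams even if `body` throws.
  static <T> T runWithRedirect(OutputStream sink, Supplier<T> body) {
    PrintStream originalStdout = System.out;
    PrintStream originalStderr = System.err;
    PrintStream redirected = new PrintStream(sink, true); // true = autoflush
    try {
      System.setOut(redirected);
      System.setErr(redirected);
      return body.get();
    } finally {
      redirected.flush();
      System.setOut(originalStdout);
      System.setErr(originalStderr);
    }
  }

  public static void main(String[] args) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    String result = runWithRedirect(sink, () -> {
      // Stands in for Java code invoked from pyspark.
      System.out.println("printed from java code");
      return "SUCCESS";
    });
    // The original stdout is active again at this point.
    System.out.println(result + ": captured [" + sink.toString().trim() + "]");
  }
}
```

Note that `System.setOut` and `System.setErr` are JVM-wide, so output from any other thread that prints during the call is captured as well.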
   
   
   ### What type of PR is it?
   [Bug Fix]
   
   ### Todos
   * [ ] - Task
   
   ### What is the Jira issue?
   * https://issues.apache.org/jira/browse/ZEPPELIN-4692
   
   ### How should this be tested?
   * A unit test is added; it was also tested manually.
   
   
   ### Screenshots (if appropriate)
   
   ![image](https://user-images.githubusercontent.com/164491/77384871-78249300-6dc1-11ea-9cdd-98d17a2ebbf6.png)
   
   
   ### Questions:
   * Do the license files need an update? No
   * Are there breaking changes for older versions? No
   * Does this need documentation? No
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zeppelin] alexott commented on a change in pull request #3696: [ZEPPELIN-4692]. zeppelin pyspark doesn't print java output

Posted by GitBox <gi...@apache.org>.
alexott commented on a change in pull request #3696: [ZEPPELIN-4692]. zeppelin pyspark doesn't print java output
URL: https://github.com/apache/zeppelin/pull/3696#discussion_r399209903
 
 

 ##########
 File path: spark/interpreter/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java
 ##########
 @@ -125,8 +127,18 @@ protected ZeppelinContext createZeppelinContext() {
   @Override
   public InterpreterResult interpret(String st, InterpreterContext context)
       throws InterpreterException {
-    Utils.printDeprecateMessage(sparkInterpreter.getSparkVersion(), context, properties);
-    return super.interpret(st, context);
+    // Redirect java stdout/stderr to interpreter output, because pyspark may call java code.
+    PrintStream originalStdout = System.out;
+    PrintStream originalStderr = System.err;
+    try {
+      System.setOut(new PrintStream(context.out));
+      System.setErr(new PrintStream(context.out));
+      Utils.printDeprecateMessage(sparkInterpreter.getSparkVersion(), context, properties);
+      return super.interpret(st, context);
 
 Review comment:
   But it's not called anywhere in the context of the PySparkInterpreter. In the whole source code I see only a similar function in PythonInterpreter, but it's called inside it...


[GitHub] [zeppelin] alexott commented on a change in pull request #3696: [ZEPPELIN-4692]. zeppelin pyspark doesn't print java output

Posted by GitBox <gi...@apache.org>.
alexott commented on a change in pull request #3696: [ZEPPELIN-4692]. zeppelin pyspark doesn't print java output
URL: https://github.com/apache/zeppelin/pull/3696#discussion_r398582974
 
 

 ##########
 File path: spark/interpreter/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java
 ##########
 
 Review comment:
   Shouldn't we set the pool, job group, etc., like in the IPython interpreter?


[GitHub] [zeppelin] zjffdu commented on a change in pull request #3696: [ZEPPELIN-4692]. zeppelin pyspark doesn't print java output

Posted by GitBox <gi...@apache.org>.
zjffdu commented on a change in pull request #3696: [ZEPPELIN-4692]. zeppelin pyspark doesn't print java output
URL: https://github.com/apache/zeppelin/pull/3696#discussion_r399736561
 
 

 ##########
 File path: spark/interpreter/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java
 ##########
 
 Review comment:
   preCall is a hook that is called before running user code, and PySparkInterpreter extends PythonInterpreter. https://github.com/apache/zeppelin/blob/master/python/src/main/java/org/apache/zeppelin/python/PythonInterpreter.java#L347
   https://github.com/apache/zeppelin/blob/master/python/src/main/java/org/apache/zeppelin/python/PythonInterpreter.java#L384
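
The hook arrangement discussed in this comment can be sketched in isolation. This is one reading of the thread, not Zeppelin's actual code: `BaseInterpreter` and `SparkLikeInterpreter` are hypothetical stand-ins for PythonInterpreter and PySparkInterpreter, and the event strings are placeholders. It shows why a subclass's `interpret` that delegates to `super.interpret` still runs the subclass's own `preCall` override.

```java
import java.util.ArrayList;
import java.util.List;

class HookSketch {

  // Hypothetical base class: interpret() calls the preCall hook, then runs the code.
  static class BaseInterpreter {
    protected final List<String> events = new ArrayList<>();

    // Hook invoked before user code runs; subclasses may override it.
    protected void preCall() {
      events.add("base preCall");
    }

    public String interpret(String code) {
      preCall(); // dynamic dispatch picks the subclass override, if any
      events.add("run: " + code);
      return "SUCCESS";
    }
  }

  // Hypothetical subclass: overrides the hook and wraps interpret().
  static class SparkLikeInterpreter extends BaseInterpreter {
    @Override
    protected void preCall() {
      events.add("subclass preCall (hypothetical: set job group)");
    }

    @Override
    public String interpret(String code) {
      events.add("subclass setup");
      return super.interpret(code);
    }
  }

  public static void main(String[] args) {
    SparkLikeInterpreter interpreter = new SparkLikeInterpreter();
    interpreter.interpret("print(1)");
    interpreter.events.forEach(System.out::println);
  }
}
```

Because the call to `preCall()` lives in the base class, searching only the subclass for a call site turns up nothing, which matches the confusion raised earlier in the review.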


[GitHub] [zeppelin] zjffdu commented on issue #3696: [ZEPPELIN-4692]. zeppelin pyspark doesn't print java output

Posted by GitBox <gi...@apache.org>.
zjffdu commented on issue #3696: [ZEPPELIN-4692]. zeppelin pyspark doesn't print java output
URL: https://github.com/apache/zeppelin/pull/3696#issuecomment-605644016
 
 
   Thanks for the review, @alexott. Will merge if there are no more comments.


[GitHub] [zeppelin] zjffdu commented on a change in pull request #3696: [ZEPPELIN-4692]. zeppelin pyspark doesn't print java output

Posted by GitBox <gi...@apache.org>.
zjffdu commented on a change in pull request #3696: [ZEPPELIN-4692]. zeppelin pyspark doesn't print java output
URL: https://github.com/apache/zeppelin/pull/3696#discussion_r399064011
 
 

 ##########
 File path: spark/interpreter/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java
 ##########
 
 Review comment:
   They are here https://github.com/apache/zeppelin/blob/master/spark/interpreter/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java#L133



[GitHub] [zeppelin] asfgit closed pull request #3696: [ZEPPELIN-4692]. zeppelin pyspark doesn't print java output

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3696: [ZEPPELIN-4692]. zeppelin pyspark doesn't print java output
URL: https://github.com/apache/zeppelin/pull/3696
 
 
   


[GitHub] [zeppelin] alexott commented on a change in pull request #3696: [ZEPPELIN-4692]. zeppelin pyspark doesn't print java output

Posted by GitBox <gi...@apache.org>.
alexott commented on a change in pull request #3696: [ZEPPELIN-4692]. zeppelin pyspark doesn't print java output
URL: https://github.com/apache/zeppelin/pull/3696#discussion_r399767848
 
 

 ##########
 File path: spark/interpreter/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java
 ##########
 
 Review comment:
   Thank you Jeff! I somehow missed the `@Override` :-(
