Posted to dev@toree.apache.org by "poplav (JIRA)" <ji...@apache.org> on 2016/06/01 22:01:59 UTC

[jira] [Comment Edited] (TOREE-318) PySpark Interpreter Prints Are Results

    [ https://issues.apache.org/jira/browse/TOREE-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304687#comment-15304687 ] 

poplav edited comment on TOREE-318 at 6/1/16 10:01 PM:
-------------------------------------------------------

* In [ExecuteRequestHandler | https://github.com/apache/incubator-toree/blob/master/kernel/src/main/scala/org/apache/toree/kernel/protocol/v5/handler/ExecuteRequestHandler.scala#L94] we respond with an execute result whenever the ExecuteResult from the specific interpreter [.hasContent | https://github.com/apache/incubator-toree/blob/master/protocol/src/main/scala/org/apache/toree/kernel/protocol/v5/content/ExecuteResult.scala#L28].
* In Scala Spark, the ExecuteResult for a print statement does not have content, but in PySpark it does.  In the [toree pyspark_runner |https://github.com/apache/incubator-toree/blob/master/pyspark-interpreter/src/main/resources/PySpark/pyspark_runner.py#L136], when we compile and evaluate the code we always respond with the system output.  The pyspark_runner logic is similar to what [zeppelin_pyspark | https://github.com/apache/incubator-zeppelin/blob/master/spark/src/main/resources/python/zeppelin_pyspark.py] has.
* These are in contrast to [IPython's executor | https://github.com/ipython/ipython/blob/3c266c45cf8f7933cef9f7a5ccaab8f36f2e9a95/IPython/core/magics/execution.py#L1157], which checks whether the code is an expression and records the output only if it is.  The IPython executor uses Python's AST, which can tell whether the parsed code is an expression.  Looking at the [AST abstract grammar | https://docs.python.org/2/library/ast.html#abstract-grammar], Print is a statement, not an expression, so our toree pyspark runner should not be returning the output there (which would bring it in line with Scala Spark, based on logging the ExecuteRequestHandler that gets invoked by all interpreters).
* Need to figure out how to port over IPython's expression-checking logic, as it has some additional [setup | https://github.com/ipython/ipython/blob/3c266c45cf8f7933cef9f7a5ccaab8f36f2e9a95/IPython/core/magics/execution.py#L1148].  The kernel tester should work well for this, as we can assert on the specific messages produced by an execution.
* [Edit] Tried porting over the AST parse logic from IPython, but it would only work in a Python 2 environment.  In Python 2 print is a statement; in Python 3 it is an expression like any other function call, so the distinction can't be made.  The workaround I am trying is to get the parse tree from the code and, if the last node/code segment of the parse tree is an expression, modify it to be assigned to a variable and execute the modified parse tree.  If that variable is None, the print output is only caught in the output stream; if the variable is not None, that value is returned as an execute result.  This involved adding a sendOutput to BrokerState to flush the output stream that gets passed in, and on sendOutput matching the code_id to the output stream to use.
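
The workaround in the [Edit] bullet can be sketched roughly as follows. This is a minimal Python 3 illustration, not the actual pyspark_runner.py change: the names run_cell and RESULT_NAME are hypothetical, and the BrokerState/sendOutput and code_id plumbing is left out entirely.

```python
import ast
import contextlib
import io

# Hypothetical variable name used to capture the trailing expression's value.
RESULT_NAME = "__last_expr_value__"


def run_cell(source, namespace):
    """Run `source` in `namespace`.

    Returns (captured_stdout, value), where value is the result of the last
    top-level expression, or None if there was none or it evaluated to None.
    """
    tree = ast.parse(source, mode="exec")
    namespace.pop(RESULT_NAME, None)

    # If the last top-level node is a bare expression, rewrite it into an
    # assignment so its value can be inspected after execution.
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        last = tree.body[-1]
        assign = ast.Assign(
            targets=[ast.Name(id=RESULT_NAME, ctx=ast.Store())],
            value=last.value,
        )
        tree.body[-1] = ast.copy_location(assign, last)
        ast.fix_missing_locations(tree)

    # Anything printed while the code runs goes to the stream buffer.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(tree, "<cell>", "exec"), namespace)

    return buffer.getvalue(), namespace.pop(RESULT_NAME, None)
```

With this sketch, run_cell("print('hi')", {}) yields the text on the stream side and None as the value, because print() returns None, while run_cell("1 + 1", {}) yields an empty stream and the value 2. An expression that legitimately evaluates to None is treated the same as a statement, which matches how the interactive Python prompt stays silent for None.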



> PySpark Interpreter Prints Are Results
> --------------------------------------
>
>                 Key: TOREE-318
>                 URL: https://issues.apache.org/jira/browse/TOREE-318
>             Project: TOREE
>          Issue Type: Bug
>            Reporter: Corey A Stubbs
>             Fix For: 0.1.0
>
>
> When running any code which outputs a print statement, the statement is sent back to the notebook as an execute result (see http://imgur.com/F0yO1nU). I have only tested this in PySpark, so I assume this could be broken across all interpreters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)