You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@zeppelin.apache.org by agoodm <gi...@git.apache.org> on 2016/09/19 23:19:48 UTC

[GitHub] zeppelin pull request #1439: [ZEPPELIN-1423] Allow users to specify pre/post...

GitHub user agoodm opened a pull request:

    https://github.com/apache/zeppelin/pull/1439

    [ZEPPELIN-1423] Allow users to specify pre/post-execute callbacks for interpreters

    ### What is this PR for?
    One feature built-in to Jupyter's ipykernel that is currently missing in Zeppelin are pre/post-execute "callbacks". This PR allows for the possibility of entering code that is set to be executed by the interpreter before  the code entered into the paragraph (pre-execute) and after (post-execute). In addition to saving some users the effort of writing code that they want executed in every paragraph, this will help pave the way for more advanced features coming soon, like automatically displaying matplotlib figures without an explicit call to a `show()` function. See the JIRA issue for more details on this.
    
    The core implementation for this is primarily contained in a new class called `InterpreterCallbackRegistry` which heavily mimics the Angular Display system (eg, `AngularObjectRegistry`). While this interface alone should suffice for Interpreter maintainers on the Java/JVM side, I also thought it might be useful to expose the system directly to the users in their notebooks. Hence, an implementation that works with the spark and pyspark interpreters (via the `ZeppelinContext` class) is also included here.
    
    ### What type of PR is it?
    New Feature
    
    ### Todos
    * [ ] - Add unit tests
    * [ ] - Add new documentation
    * [ ] - Find a better way to check for errors in pre-execute as this can brick the REPL (see more info below)
    
    ### What is the Jira issue?
    [ZEPPELIN-1423](https://issues.apache.org/jira/browse/ZEPPELIN-1423)
    
    ### How should this be tested?
    In a new note, add the following lines of code to a paragraph:
    ```python
    %pyspark
    z.registerCallback("post_exec", "print 'This code should be executed before the parapgraph code!'")
    z.registerCallback("pre_exec", "print 'This code should be executed after the paragraph code!'")
    ```
    
    Then run any other paragraph in the note containing some other code, eg
    ```python
    %pyspark
    print "This code should be entered into the paragraph by the user!"
    ```
    
    The output should be:
    ```
    This code should be executed after the paragraph code!
    This code should be entered into the paragraph by the user!
    This code should be executed before the parapgraph code!
    ```
    
    You should also test out the other two methods (`getCallback()` and `unregisterCallback()`) specified in `ZeppelinContext.java`.
    
    One final caveat that should be mentioned: If there are errors in the code you specify for a pre-execute event, it will render the interpreter useless since the current implementation prepends the the code specified in `pre_exec` directly to the paragraph entered code before calling `interpret()`. The current workaround for this would be to either restart the interpreter group or call `unregisterCallback()` via a different REPL within the interpreter group (eg, `z.unregisterCallback("pyspark", "pre_exec")` from the spark interpreter). I would appreciate if anyone here would be willing to share any better approaches here.
    
    ### Questions:
    * Does the licenses files need update? No
    * Is there breaking changes for older versions? No
    * Does this needs documentation? Yes


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/agoodm/zeppelin ZEPPELIN-1423

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1439.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1439
    
----
commit 2c4756d4d6861ba6192e52f0ca1c4c23b21a8070
Author: Alex Goodman <ag...@users.noreply.github.com>
Date:   2016-09-13T20:32:36Z

    First attempt at adding pre/post-execute callbacks
    
    Added support for pre/post-execute callback hooks to interpreters

commit e4e4ef3d8b4e3da1937d43892e88f030a7bd42b7
Author: Alex Goodman <ag...@users.noreply.github.com>
Date:   2016-09-19T22:48:34Z

    Added support for user defined callbacks in spark/pyspark interpreters

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin pull request #1439: [ZEPPELIN-1423] Allow users to specify pre/post...

Posted by agoodm <gi...@git.apache.org>.

Github user agoodm closed the pull request at:

    https://github.com/apache/zeppelin/pull/1439


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1439: [ZEPPELIN-1423] Allow users to specify pre/post-execut...

Posted by Leemoonsoo <gi...@git.apache.org>.

Github user Leemoonsoo commented on the issue:

    https://github.com/apache/zeppelin/pull/1439
  
    @agoodm Thanks for the contribution.
    
    While documentation says "Interpreter Execution Callbacks" is experimental (which i agree), i think `@Experimental` annotation should be used for API instead of `@ZeppelinApi`.
    
    The implementation keeps callback registry in ZeppelinServer side. But i'm curios while it's volatile and per interpreter, I wonder if callback registry can be placed inside of Interpreter process, that might reduce complexity (no thrift communication) and increases flexibility (able to register actual function, not only the strings). What do you think? Could you elaborate these two choices?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1439: [ZEPPELIN-1423] Allow users to specify pre/post-execut...

Posted by agoodm <gi...@git.apache.org>.

Github user agoodm commented on the issue:

https://github.com/apache/zeppelin/pull/1439

@Leemoonsoo Thanks for the feedback.

Indeed I think your first point on the Experimental annotation made sense, so I have changed it.

As for your second point, I am not really sure how to best go about it myself. When I first created the JIRA issue, I came up with a quick implementation that used kept separate registries for each `Interpreter` instance. It was certainly simpler, but the main problem that I ran into was that I couldn't figure out how to make sure those changes in the registry entered by the user were broadcasted to the ZeppelinServer side. I studied the implementation of `AngularObjectRegistry` in order to understand how this could be done in Zeppelin, and that is ultimately what led me to this current implementation which uses thrift. I am still in the process of learning the core Zeppelin codebase, so if you had any specific ideas, I would appreciate it.

Also, can you also be specific on how this change would make it possible to pass actual references to functions instead of code strings (hence why I used put "callbacks" in quotes in the original PR)? I don't consider this to be too important but am just curious.

Finally, can you provide some insight for some of the odd CI build failures I have been experiencing with more recent commits?

Thanks!

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1439: [ZEPPELIN-1423] Allow users to specify pre/post-execut...

Posted by doanduyhai <gi...@git.apache.org>.

Github user doanduyhai commented on the issue:

    https://github.com/apache/zeppelin/pull/1439
  
    Yes it's very clear. Indeed a bunch of internal features (Angular registry, Callback registry ...) are exposed to end-users only via Spark interpreter. I guess that each interpreter implementor have to find a way to expose these features using their own DSL


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1439: [ZEPPELIN-1423] Allow users to specify pre/post-execut...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/zeppelin/pull/1439
  
    That's great - would you like to update the rest of the file to make the style consistent?
    
    I understand the default value - I'm only pointing out that there are many users and use cases out there that do not use "pyspark" as name?
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1439: [ZEPPELIN-1423] Allow users to specify pre/post-execut...

Posted by agoodm <gi...@git.apache.org>.

Github user agoodm commented on the issue:

    https://github.com/apache/zeppelin/pull/1439
  
    Ok, TODO's are now done. @AhyoungRyu @Leemoonsoo @bzz Please review, I myself am unsure if some of the errors I am seeing in some of the Travis CI builds are related to these changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1439: [ZEPPELIN-1423] Allow users to specify pre/post-execut...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/zeppelin/pull/1439
  
    right, I was referring when the interpreter group is not the default, or there is multiple etc..


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1439: [ZEPPELIN-1423] Allow users to specify pre/post-execut...

Posted by agoodm <gi...@git.apache.org>.

Github user agoodm commented on the issue:

    https://github.com/apache/zeppelin/pull/1439
  
    @felixcheung Ok, I have changed the default behavior to now automatically detect the REPL name from the paragraph input text. The Travis CI build isn't running after I rebased my commits to resolve a merge conflict.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1439: [ZEPPELIN-1423] Allow users to specify pre/post-execut...

Posted by agoodm <gi...@git.apache.org>.

Github user agoodm commented on the issue:

https://github.com/apache/zeppelin/pull/1439

@doanduyhai To be precise, I designed the callback registry system with two different types of use cases. The first one is for the different interpreter maintainers to implement in their individual subclasses of `Interpreter`, which would be used for use cases that are specific for better integrating certain features / libraries of that interpreter with zeppelin. Right now a prime example of this is matplotlib integration with python / pyspark as I have discussed in more detail in the JIRA issue.

The second set of use cases are those in which the API is directly exposed to notebook users. Right now this is indeed only fully available in spark / pyspark via the `ZeppelinContext` class. The reason for this is that getting this sort of interface to work with every interpreter would require some sort of language specific API that can be used to expose the JVM (as py4j does with python), hence one reason I labeled it as experimental in the docs. It's a similar limitation to the `angularBind()` and `angularWatch()` API. Keep in mind though that the registry can still be applied for other interpreters if the replName is manually specified, eg:

```scala
%spark
z.registerCallback("post_exec", "ls -l", "sh")
```

Does this help clarify things?

[GitHub] zeppelin pull request #1439: [ZEPPELIN-1423] Allow users to specify pre/post...

Posted by agoodm <gi...@git.apache.org>.

GitHub user agoodm reopened a pull request:

    https://github.com/apache/zeppelin/pull/1439

    [ZEPPELIN-1423] Allow users to specify pre/post-execute callbacks for interpreters

    ### What is this PR for?
    One feature built-in to Jupyter's ipykernel that is currently missing in Zeppelin are pre/post-execute "callbacks". This PR allows for the possibility of entering code that is set to be executed by the interpreter before  the code entered into the paragraph (pre-execute) and after (post-execute). In addition to saving some users the effort of writing code that they want executed in every paragraph, this will help pave the way for more advanced features coming soon, like automatically displaying matplotlib figures without an explicit call to a `show()` function. See the JIRA issue for more details on this.
    
    The core implementation for this is primarily contained in a new class called `InterpreterCallbackRegistry` which heavily mimics the Angular Display system (eg, `AngularObjectRegistry`). While this interface alone should suffice for Interpreter maintainers on the Java/JVM side, I also thought it might be useful to expose the system directly to the users in their notebooks. Hence, an implementation that works with the spark and pyspark interpreters (via the `ZeppelinContext` class) is also included here.
    
    ### What type of PR is it?
    New Feature
    
    ### Todos
    * [x] - Add unit tests
    * [x] - Add new documentation
    
    (Maybe) - Find a better way to check for errors in pre-execute as this can brick the REPL (see more info below)
    
    ### What is the Jira issue?
    [ZEPPELIN-1423](https://issues.apache.org/jira/browse/ZEPPELIN-1423)
    
    ### How should this be tested?
    In a new note, add the following lines of code to a paragraph:
    ```python
    %pyspark
    z.registerCallback("post_exec", "print 'This code should be executed before the parapgraph code!'")
    z.registerCallback("pre_exec", "print 'This code should be executed after the paragraph code!'")
    ```
    
    Then run any other paragraph in the note containing some other code, eg
    ```python
    %pyspark
    print "This code should be entered into the paragraph by the user!"
    ```
    
    The output should be:
    ```
    This code should be executed before the paragraph code!
    This code should be entered into the paragraph by the user!
    This code should be executed after the paragraph code!
    ```
    
    You should also test out the other two methods (`getCallback()` and `unregisterCallback()`) specified in `ZeppelinContext.java`.
    
    One final caveat that should be mentioned: If there are errors in the code you specify for a pre-execute event, it will render the interpreter useless since the current implementation prepends the the code specified in `pre_exec` directly to the paragraph entered code before calling `interpret()`. The current workaround for this would be to either restart the interpreter group or call `unregisterCallback()` via a different REPL within the interpreter group (eg, `z.unregisterCallback("pre_exec", "pyspark")` from the spark interpreter). I would appreciate if anyone here would be willing to share any better approaches here.
    
    ### Questions:
    * Does the licenses files need update? No
    * Is there breaking changes for older versions? No
    * Does this needs documentation? Yes


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/agoodm/zeppelin ZEPPELIN-1423

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1439.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1439
    
----
commit eb26886986ec067a6104fa5de6f8bd99deb2ee1b
Author: Alex Goodman <ag...@users.noreply.github.com>
Date:   2016-09-13T20:32:36Z

    First attempt at adding pre/post-execute callbacks
    
    Added support for pre/post-execute callback hooks to interpreters

commit f50eccaa56889f0feda4a48a2222226df6596d53
Author: Alex Goodman <ag...@users.noreply.github.com>
Date:   2016-09-19T22:48:34Z

    Added support for user defined callbacks in spark/pyspark interpreters

commit 1594dfbcb2df805705143df913e9a5eb63c6be18
Author: Alex Goodman <ag...@users.noreply.github.com>
Date:   2016-09-20T18:14:47Z

    Deleted unused files

commit e1ba95b4e8a8c46ac8f685d64ff1cd59ea9c71c1
Author: Alex Goodman <ag...@users.noreply.github.com>
Date:   2016-09-20T18:16:28Z

    Regenerated thrift files using v0.9.2

commit b8265fea798c92a411324a2eb3a73f245b8e50ee
Author: Alex Goodman <ag...@users.noreply.github.com>
Date:   2016-09-20T20:13:33Z

    Added unit tests for InterpreterCallbackRegistry

commit 3e7c8520786190891e70762828406f5ff7b431b0
Author: Alex Goodman <ag...@users.noreply.github.com>
Date:   2016-09-20T21:02:19Z

    Cleaned up ZeppelinContext callback registry API

commit 5822ecd85ec19c156ee7722fcab90dac675e94dd
Author: Alex Goodman <ag...@users.noreply.github.com>
Date:   2016-09-20T21:11:25Z

    Added documentation for callback registry system

commit ffc02f7f1e29d7e635ee7562d493fd305a45479a
Author: Alex Goodman <ag...@users.noreply.github.com>
Date:   2016-09-23T20:49:02Z

    Get default replName for callback registry from paragraph

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1439: [ZEPPELIN-1423] Allow users to specify pre/post-execut...

Posted by agoodm <gi...@git.apache.org>.

Github user agoodm commented on the issue:

    https://github.com/apache/zeppelin/pull/1439
  
    Thank you very much for the detailed explanation. To be precise, I made the `cmd` argument that gets registered a String solely because `Interpreter.interpret()` accepts a String representation of the code rather than actual callable objects. I think actually being able to do that (eg, use some API like ` z.registerCallback("post_exec", callback_func)` rather than `z.registerCallback("post_exec", "callback_func()") `would depend on the language that's being interpreted and what sort of libraries are available to connect it to the JVM (eg py4j with python). 
    
    Nevertheless I think you have made a very compelling point. The source of my problem with my first implementation of the registry system using the `Interpreter` class was that I only considered attempting an implementation of pre/post execute from a Paragraph execution perspective. This is why I thought that I needed to broadcast changes in the registry to the `ZeppelinServer`, as I processed the registry from within the `Paragraph` class rather than from the `RemoteInterpreterServer` class. As a result of your explanation I think I understand everything a lot better now, so I'll go ahead and work on an implementation tomorrow which works exclusively through the interpreter process.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1439: [ZEPPELIN-1423] Allow users to specify pre/post-execut...

Posted by agoodm <gi...@git.apache.org>.

Github user agoodm commented on the issue:

    https://github.com/apache/zeppelin/pull/1439
  
    Going to close this for now since I have made another implementation on a separate branch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin pull request #1439: [ZEPPELIN-1423] Allow users to specify pre/post...

Posted by agoodm <gi...@git.apache.org>.

Github user agoodm closed the pull request at:

    https://github.com/apache/zeppelin/pull/1439


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1439: [ZEPPELIN-1423] Allow users to specify pre/post-execut...

Posted by Leemoonsoo <gi...@git.apache.org>.

Github user Leemoonsoo commented on the issue:

https://github.com/apache/zeppelin/pull/1439

When `callback registry` places in `ZeppelinServer` process and want to let user register callback routine from the `interpreter` by exposing user `callback registry` api, then callback routine need to be serialized from `Interpreter` to send to the `ZeppelinServer`. That's why i think you send string representation of code, which is callback routine, through thrift message.

If callback registry places inside of `Interpreter` instance, `Interpreter` implementation can directly access `callback registry`. So callback routine does not need to be serialized. That means not only string representation of code, but also some java object (runnable object) can be registered into the registry. And then [RemoteInterpreterServer](https://github.com/apache/zeppelin/blob/master/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterServer.java#L390)] can run callback routines before and after it invokes `Interpreter.interpret()` method.

So ZeppelinServer doesn't really need to know callback registry, i think. Or are there reasons and use cases that changes of callback registry need to be broadcasted to ZeppelinServer?

[GitHub] zeppelin issue #1439: [ZEPPELIN-1423] Allow users to specify pre/post-execut...

Posted by doanduyhai <gi...@git.apache.org>.

Github user doanduyhai commented on the issue:

    https://github.com/apache/zeppelin/pull/1439
  
    Question, how interpreters other than Spark can leverage this new feature ? I mean you added a new `registerCallback()` function for ZeppelinContext object but this context is not exposed for **shell** or **angular** interpreter for example. How can people use the feature in this case ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1439: [ZEPPELIN-1423] Allow users to specify pre/post-execut...

Posted by agoodm <gi...@git.apache.org>.

Github user agoodm commented on the issue:

    https://github.com/apache/zeppelin/pull/1439
  
    @felixcheung Sure thing, I can do that in a bit.
    
    As for the issue of the default value for `replName`, are you simply referring to the use case for different interpreters, or some variants of pyspark itself (eg, `%<something>.pyspark`)? If it's the latter, then I do agree that there should be a more elegant way to pick the default value in case the spark interpreter group is not set as the default, so any good ideas on how to go about this are appreciated. If you are just talking about how this is implemented on the scala (%spark) side though, I implemented a similar convenience method in the original `ZeppelinContext` class, but instead assumes "spark" is the default `replName`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---