You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by HeartSaVioR <gi...@git.apache.org> on 2015/12/28 23:19:09 UTC

[GitHub] incubator-zeppelin pull request: ZEPPELIN-539 RemoteInterpreter He...

GitHub user HeartSaVioR opened a pull request:

    https://github.com/apache/incubator-zeppelin/pull/576

    ZEPPELIN-539 RemoteInterpreter Heartbeat (WIP)

    ### What is this PR for?
    
    To help users to determine remote interpreter is not able to respond.
    This is just WIP, and "How to help users" could be improved with discussions.
    
    ### What type of PR is it?
    
    Feature (Improve?)
    
    ### Todos
    * [ ] - Rebase when #574 is merged to master
      * since shutdowning heartbeat threads requires reference count to be zero
      * without #574 some threads could be alive although remote interpreter is closed
    * [ ] - Discuss proper values for sending heartbeat interval, checking timeout interval
    * [ ] - Discuss how to let users know when remote interpreter is timed out
    * [ ] - Discuss possible way to restore remote interpreter back to normal
    
    ### Is there a relevant Jira issue?
    
    https://issues.apache.org/jira/browse/ZEPPELIN-539
    
    ### How should this be tested?
    
    1. run a spark paragraph to ensure spark remote interpreter process is run
    2. kill -9 to spark remote interpreter process
    3. run paragraph again (it may show broken pipe, or connection refused after #575)
    4. wait 30 secs (or remote interpreter connection timeout value) to let RemoteInterpreterProcess classifies process to be timed out
    5. run paragraph again (it shows org.apache.zeppelin.interpreter.InterpreterProcessHeartbeatFailedException to users)
    
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Does the licenses files need update? (No)
    * Is there breaking changes for older versions? (No)
    * Does this needs documentation? (Maybe no)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HeartSaVioR/incubator-zeppelin ZEPPELIN-539-WIP-v1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-zeppelin/pull/576.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #576
    
----
commit b5a75bd4d4bf2293469d4210c9035575029a8d85
Author: Jungtaek Lim <ka...@gmail.com>
Date:   2015-12-28T22:14:46Z

    ZEPPELIN-539 RemoteInterpreter Heartbeat
    
    * introduce "ping" function to thrift
    * every remote interpreter processes will have two additional threads
      * send "ping" to check that remote interpreter process is able to respond
      * check last heartbeat timestamp and determine it's timed out
    * introduce InterpreterProcessHeartbeatFailedException
      * thrown when remote interpreter process is determined to timed out

commit d27152ee1400930401a62bba4ce96948e4ebae16
Author: Jungtaek Lim <ka...@gmail.com>
Date:   2015-12-28T22:16:52Z

    ZEPPELIN-539 Add missing license header

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: ZEPPELIN-539 RemoteInterpreter He...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/576#issuecomment-167719456
  
    Actually this is adoption of https://github.com/apache/storm/pull/286 from Apache Storm.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: ZEPPELIN-539 RemoteInterpreter He...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/576#issuecomment-167668813
  
    I couldn't create an unit test for this since it requires remote interpreter process to be stopped or killed.
    Please share the idea if you have one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: ZEPPELIN-539 RemoteInterpreter He...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/576#issuecomment-167670085
  
    > Discuss proper values for sending heartbeat interval, checking timeout interval
    
    Currently, RemoteInterpreterProcess sends heartbeat every 1 sec, and also checks timeout every 1 sec.
    Please let me know if you think it is too often.
    
    > Discuss how to let users know when remote interpreter is timed out
    
    Currently, Zeppelin can notice users via showing InterpreterProcessHeartbeatFailedException when executing any client-related works.
    Please share your ideas to improve this feature.
    
    > Discuss possible way to restore remote interpreter back to normal
    
    At first, I thought it would be good to force-kill and re-run remote interpreter process automatically when timed out. But remote interpreter process is stateful so users may not want it.
    #480 could be a good way to let users restart interpreter by hand if problem occurs.
    Please share your ideas to improve this feature.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: ZEPPELIN-539 RemoteInterpreter He...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/576#issuecomment-167691292
  
    We can even provide way to validate Client instance with new "ping".
    
    1. Implement validateObject() to ClientFactory.
    ```
      @Override
      public boolean validateObject(PooledObject<Client> p) {
        final Client client = p.getObject();
    
        try {
          return client.ping().equals("pong");
        } catch (Exception e) {
          return false;
        }
      }
    ```
    
    2. Build a strategy to exclude invalid objects via providing GenericObjectPoolConfig to GenericObjectPool.
    
    https://commons.apache.org/proper/commons-pool/apidocs/org/apache/commons/pool2/impl/GenericObjectPoolConfig.html
    
    We can even set max idle, min idle, max total which helps to control total thrift connections per remote interpreter process.
    
    If we think it would be better to have, I'll address to this PR, or another PR. (when we don't want to merge this PR)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---