Posted to dev@zeppelin.apache.org by karuppayya <gi...@git.apache.org> on 2017/02/12 15:06:46 UTC

[GitHub] zeppelin pull request #2011: ZEPPELIN-2102: Restart interpreter automaticall...

GitHub user karuppayya opened a pull request:

    https://github.com/apache/zeppelin/pull/2011

    ZEPPELIN-2102: Restart interpreter automatically

    ### What is this PR for?
    The SparkContext is shut down in the following cases:
    1. Calling `sc.stop` explicitly
    In this case, the user has to manually restart the interpreter to submit the next Spark job.
    2. An OOM error from Spark
    Restarting the interpreter will not help; the only recovery is restarting the Zeppelin server.
    I have not enumerated all the cases in which the SparkContext can go down; there might be others.
    This PR restarts the interpreter automatically.
    
    ### What type of PR is it?
    Improvement
    
    ### Todos NA
    
    ### What is the Jira issue?
    ZEPPELIN-2102
    
    ### How should this be tested?
    1. Run `sc.stop`, then run another paragraph
    2. Cause the Spark driver to crash due to OOM, then run a subsequent Spark paragraph
    
    ### Screenshots (if appropriate) NA
    
    ### Questions:
    * Do the license files need to be updated? No
    * Are there breaking changes for older versions? No
    * Does this need documentation? No
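
    The behavior this PR proposes can be illustrated with a minimal sketch. The classes below are stand-ins invented for illustration, not Zeppelin's real interpreter API: before running a paragraph, the interpreter checks whether its SparkContext is still alive and, if not, restarts itself instead of failing.

```python
# Illustrative sketch only: FakeSparkContext and AutoRestartingInterpreter
# are hypothetical stand-ins, not Zeppelin's actual classes.

class FakeSparkContext:
    """Stand-in for a SparkContext that can be stopped."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


class AutoRestartingInterpreter:
    """Stand-in interpreter that restarts when its context is dead."""
    def __init__(self):
        self.restart_count = 0
        self.sc = FakeSparkContext()

    def restart(self):
        # Recreate the context, as a restarted interpreter process would.
        self.restart_count += 1
        self.sc = FakeSparkContext()

    def run_paragraph(self, code):
        if self.sc.stopped:      # context died (sc.stop, OOM, ...)
            self.restart()       # restart transparently before running
        return f"ran: {code}"


interp = AutoRestartingInterpreter()
interp.sc.stop()                 # case 1: user calls sc.stop explicitly
result = interp.run_paragraph("sc.parallelize(range(10)).count()")
```

    With this sketch the paragraph after `sc.stop` still runs, at the cost of one restart; without the check it would fail until the user restarted the interpreter manually.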


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/karuppayya/incubator-zeppelin ZEPPELIN-2102

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/2011.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2011
    
----
commit 752a42368601f1b531654c6e9fab139b8b5b430f
Author: karuppayya <ka...@qubole.com>
Date:   2017-02-12T14:38:29Z

    Restart interpreter automatically

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2011: ZEPPELIN-2102: Restart interpreter automatically

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2011
  
    For case 1, why not create a new SparkContext? `sc.stop` only causes the Spark app to shut down; the remote interpreter process should still be alive.
    Overall, I don't think restarting the `SparkContext` implicitly for the user is a proper solution. It could confuse users: creating a new `SparkContext` means all the historical state is lost, so they would have to rerun all their paragraphs. A more proper solution I can think of is to send a warning message to the front end telling the user that the SparkContext is dead for some unknown reason, and that they need to either create a new one or restart the interpreter.
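
    The alternative suggested here can be sketched as follows. The stand-in classes are invented for illustration, not Zeppelin's real API: instead of silently recreating the context, the paragraph run surfaces an error so the user can decide what to do.

```python
# Illustrative sketch only: these names are hypothetical stand-ins,
# not Zeppelin's actual interpreter API.

class FakeSparkContext:
    """Stand-in for a SparkContext that can be stopped."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


def run_paragraph(sc, code):
    if sc.stopped:
        # Don't restart implicitly: historical state (cached RDDs,
        # defined variables) is lost either way, so let the user choose.
        return ("ERROR", "SparkContext is dead; create a new one "
                         "or restart the interpreter.")
    return ("SUCCESS", f"ran: {code}")


sc = FakeSparkContext()
sc.stop()
status, message = run_paragraph(sc, "sc.range(10).count()")
```

    The front end would then render `message` on the failed paragraph rather than throwing an opaque connection error.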


---

[GitHub] zeppelin issue #2011: ZEPPELIN-2102: Restart interpreter automatically

Posted by hero0926 <gi...@git.apache.org>.
Github user hero0926 commented on the issue:

    https://github.com/apache/zeppelin/pull/2011
  
    We're hitting the same problem as karuppayya. When we try to unpersist in PySpark + Zeppelin, the memory isn't released, so we resort to `sc.stop`, and then the Zeppelin interpreter dies... I suspect the OOM error can't be avoided entirely, but is there any way to use PySpark without this problem?
    
    @karuppayya, how about uncaching and creating a new SparkContext, or nulling out the SparkContext, as needed? I think recreating the SparkContext in code is an alternative way to escape this problem.


---

[GitHub] zeppelin issue #2011: ZEPPELIN-2102: Restart interpreter automatically

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2011
  
    I don't think Zeppelin should do extra things for the Spark interpreter; this would cause confusion for users. The interpreter should do the general thing, and anything specific should be handled by users. You might want to kill the remote process after `sc.stop`, but that doesn't mean the same thing for other users. They may want to keep the process alive to do Scala or Python work.


---

[GitHub] zeppelin pull request #2011: ZEPPELIN-2102: Restart interpreter automaticall...

Posted by karuppayya <gi...@git.apache.org>.
Github user karuppayya closed the pull request at:

    https://github.com/apache/zeppelin/pull/2011


---

[GitHub] zeppelin issue #2011: ZEPPELIN-2102: Restart interpreter automatically

Posted by karuppayya <gi...@git.apache.org>.
Github user karuppayya commented on the issue:

    https://github.com/apache/zeppelin/pull/2011
  
    @zjffdu Yes, the remote process will still be up, consuming as much memory as is configured for the driver. In a multi-user environment, we might want to release those resources as soon as the user is done with their application. If a user runs `sc.stop` once they are done, we can kill the remote process to make room for other processes.
    +1 for showing an appropriate message on the UI.
    Would adding a Zeppelin-specific interpreter setting to gate this behavior be helpful?
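
    Such a gate could look like an interpreter property along these lines. The property name below is invented purely for illustration; Zeppelin did not define this setting at the time of this discussion.

```
# Hypothetical interpreter property (name invented for illustration):
# when true, kill/restart the interpreter process once sc.stop is called.
zeppelin.spark.autoRestartOnContextStop = true
```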


---

[GitHub] zeppelin issue #2011: ZEPPELIN-2102: Restart interpreter automatically

Posted by karuppayya <gi...@git.apache.org>.
Github user karuppayya commented on the issue:

    https://github.com/apache/zeppelin/pull/2011
  
    @zjffdu Thanks for your feedback.
    The change is not specific to the Spark interpreter.
    It is generic, so any other interpreter can also initiate a restart.
    I was targeting freeing up resources once the Spark application completes.
    I was not aware of the use case of using the Spark interpreter as a Scala or Python REPL after the SparkContext is stopped; in that case this change is not helpful.
    



---

[GitHub] zeppelin issue #2011: ZEPPELIN-2102: Restart interpreter automatically

Posted by karuppayya <gi...@git.apache.org>.
Github user karuppayya commented on the issue:

    https://github.com/apache/zeppelin/pull/2011
  
    @felixcheung I am not able to reproduce this scenario now; restart works fine (I will update the description).
    When Spark goes OOM, subsequent paragraph runs throw a connection refused exception. This can be avoided by restarting the interpreter.
    



---