You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by prabhjyotsingh <gi...@git.apache.org> on 2017/02/18 03:38:23 UTC

[GitHub] zeppelin pull request #2034: [ZEPPELIN-2133] All interpreters sometimes thro...

GitHub user prabhjyotsingh opened a pull request:

    https://github.com/apache/zeppelin/pull/2034

    [ZEPPELIN-2133] All interpreters sometimes throw random Connection refuse errors

    ### What is this PR for?
    When shiro authentication is enabled, then at times on restarting an interpreter doesn't kill(s) the remote interpreter process. 
    
    ### What type of PR is it?
    [Bug Fix]
    
    ### What is the Jira issue?
    * [ZEPPELIN-2133](https://issues.apache.org/jira/browse/ZEPPELIN-2133)
    
    ### How should this be tested?
     - enable shiro authentication
     - run atleast 2/3 different interpreter, wait for paragraph to complete
     - not restart all of these interpreter
     - repeat step 2
     - from command line do (in this example I'm testing if livy was killed or not) `ps aux | grep interpreter | grep livy` and observe there would be 2x processes running.
    
    ### Screenshots (if appropriate)
    Before 
    ![process-test-before](https://cloud.githubusercontent.com/assets/674497/23089994/1fb0064c-f5b9-11e6-80fd-84b08e4e8663.gif)
    
    After
    ![process-test-after](https://cloud.githubusercontent.com/assets/674497/23090016/ac31b552-f5b9-11e6-8f11-8a66aef08192.gif)
    
    
    
    ### Questions:
    * Does the licenses files need update? N/A
    * Is there breaking changes for older versions? N/A
    * Does this needs documentation? N/A


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/prabhjyotsingh/zeppelin ZEPPELIN-2133

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/2034.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2034
    
----
commit c1b711034269054013bbcce3460dc7e427d3b9b9
Author: Prabhjyot Singh <pr...@gmail.com>
Date:   2017-02-18T03:26:53Z

    ZEPPELIN-2133: merge logic for close InterpreterGroup

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by prabhjyotsingh <gi...@git.apache.org>.
Github user prabhjyotsingh commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    Sure I see your point. Just one more case that I wanted to discuss; say in an enterprise world where not all users are allowed to access settings page, or let me put it this way there is a specific group say "admin-grp" that only has access to this page. Now when this user hits restart, expectations will be that it should restart for all users.
    Is this a valid expectation?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by prabhjyotsingh <gi...@git.apache.org>.
Github user prabhjyotsingh commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    Before even implemeting a solution I thought to discuss as to what I think the behaviour should be.
    
     \                        |	Global              | per note                | per user                     | per user + Impersonate
    ---|----|---|---|----
    Restart form Setting page | All processes terminate | All processes terminate |         ?                    | Terminate this user's process
    Edit + Save               | All processes terminate | All processes terminate | All process terminate        | All processes terminate
    Restart from note         | All processes terminate | All processes terminate | derefrence only current user | Terminate this user's process
    
    
    Let me know you thoughts what do you think.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by prabhjyotsingh <gi...@git.apache.org>.
Github user prabhjyotsingh commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    Yes, agreed, documentation is one part, but, other part being how should Zeppelin or `RemoteInterpreterProcess` handle connection is more important. 
    
    So, in the same example as describe above where livy's default mode is `per user`, and here "admin-grp" goes and changes some setting say mode from "yarn-cluster" to "yarn-client" (a hypothetical example) saves setting and restart. Now in current implementation, except for this user others that connected to livy's server before this user "admin" are still submitting to "yarn-cluster", which to me is wrong; because the source of truth should have been Zeppelin's interpreter setting page, new the users that were connected to this interpreter before this change are behaving different, and new users will behave different.
    
    So, IMHO until we have individual interpreter setting for individual user (or we support the scenario described above https://github.com/apache/zeppelin/pull/2034#issuecomment-280888650); whenever restart or edit and saved is clicked, all processes should terminate.
    
    Let me know what you think, and any other thought that I might be missing ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by prabhjyotsingh <gi...@git.apache.org>.
Github user prabhjyotsingh commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    Yes, I understand this will violate that [ZEPPELIN-1770](https://issues.apache.org/jira/browse/ZEPPELIN-1770). But right now I see following problem/challenges, and that is why I would like to call following functionality broken.
    
     - Editing an interpreter causes inconsistent config being used among users; Time-of-check Time-of-use (TOCTOU)
     - Deleting an interpreter, deletes the interpreter closes the connection, but doesn't terminates the process.
     -  Support for scenario described above https://github.com/apache/zeppelin/pull/2034#issuecomment-280888650. 
    IMO "admin-grp" user(s) need not manually check for related running processes and kill those; if they choose to restart or edit an Interpreter's config.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    Yeah, that seems a valid scenario. And looks like currently there's no document for these behavior, I think we need to add doc for it as well.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by prabhjyotsingh <gi...@git.apache.org>.
Github user prabhjyotsingh commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    @zjffdu have made some changes to honour ZEPPELIN-1770, but this will be only when user restarts from notebook. But if a restart, edit or delete is made to an interpreter from interpreter setting page, it will kill all the process of the respective interpreter.
    
    Let me know you thought.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by prabhjyotsingh <gi...@git.apache.org>.
Github user prabhjyotsingh commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    @FireArrow  Yes, agreed, I too believe that there are few minimum expectation as an Admin, and as a end user that Zeppelin should work in desired way.
    
    So, if we all can come to a common conclusion, I can implement those and write a doc for the as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin pull request #2034: [ZEPPELIN-2133] All interpreters sometimes thro...

Posted by prabhjyotsingh <gi...@git.apache.org>.
Github user prabhjyotsingh closed the pull request at:

    https://github.com/apache/zeppelin/pull/2034


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by prabhjyotsingh <gi...@git.apache.org>.
Github user prabhjyotsingh commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    Did a force push to re-trigger CI.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by prabhjyotsingh <gi...@git.apache.org>.
Github user prabhjyotsingh commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    @jongyoul @Leemoonsoo  @cloverhearts Please help review this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by prabhjyotsingh <gi...@git.apache.org>.
Github user prabhjyotsingh commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    @cloverhearts  I see your point, let me try to improvise, also I'm open for suggestion. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by FireArrow <gi...@git.apache.org>.
Github user FireArrow commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    I think restart from settings in the table above should have the same behavior as edit+save. The settings page is generally considered an administrative function, and as long as there is a central page for the settings I think the only reasonable behavior would be to apply it to all users, impersonate or not.
    
    That being said, I agree that terminating is a good fix for now. In our environment we actually have a script that kills of all lingering sessions on zeppelin restart, because they don't terminate on their own (and I'm pretty sure it's related to what is discussed here)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    Agree with @cloverhearts, first we need to identify what's wrong in the current logic for the long term solution. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by FireArrow <gi...@git.apache.org>.
Github user FireArrow commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    There should probably be a jira to decide the effect of restarting in different locations.
    For example:
    As an admin I expect the restart button in Interpreter to force a restart of all users interprete's because I clearly have made config changes that everyone must have
    As a user I expect the restart button in a note to restart my interpreter, for that note only if isolation is in place.
    
    But this is just how I expect it to work


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by cloverhearts <gi...@git.apache.org>.
Github user cloverhearts commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    
    > Leemoonsoo 3 hours ago Member
    I think void close(String sessionId) supposed not necessarily terminate process while an interpreter process can have multiple sessions.
    
    I also agree with @Leemoonsoo's opinion.
    The "close () method" should only serve to close the session.
    "Close () method" To force a process termination call,
    It is very effective in this issue, but it has the potential to become a problem in the future.
    (Especially about user management functions or roles for sessions)
    
    **In my personal opinion, we need to identify this as an increase and a decrease in the erroneous reference.**
    In the past we actually had a separate code to forcefully terminate the "Remote Interpreter".
    
    Ref Code (destroy) : https://github.com/apache/zeppelin/blob/branch-0.6/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/InterpreterGroup.java#L204
    remove PR : https://github.com/apache/zeppelin/commit/c6c67d7b112eceb36846e03207cb9c37519f1730
    
    Actually this is very similar to the current code.
    Actually, this problem is almost solved by recovering to the point of the code of the corresponding PR or adding the content of the problem.
    However, in my opinion, the essence of this problem is not a "zombie process"
    I think the **session management is wrong.**
    If you do not know what to do with the destroy () method, you can simply restore it.
    But, in my humble opinion, the deletion of the "destory () method" is not the wrong way.
    Of course I also need more specific research on this part.
    
    what do you think?
    
    Sorry. @prabhjyotsingh 
    I think you presented a good solution.
    But I think I have to approach this problem carefully.
    Please tell us your opinion at any time.
    Thank you. Have a good day.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    @prabhjyotsingh 
    Do you mind turn on "Build pushes" option? Then CI build status will correctly displayed in this page.
    ![image](https://cloud.githubusercontent.com/assets/1540981/23090986/9ee1e2b6-f5ee-11e6-85a3-f0d21b467ae0.png)
    
    You can find the option at https://travis-ci.org/prabhjyotsingh/zeppelin/settings



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by prabhjyotsingh <gi...@git.apache.org>.
Github user prabhjyotsingh commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    I've pushed few commits now in which restart/edit/delete of any interpreter from any where will close/terminate all the process of that interpreter.
    
    Let me know your thoughts on this one.
    
    @FireArrow would you like to test it out ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    I think we don't need to differentiate Global/Per Note/Per User and/Per User + Impersonate. It should just close current session. And let `RemoteInterpreterProcess` to determinate whether to shut down the process. So that means for Global, there should be only one session, the process will be terminated if restart is invoked. And for `Per user`, it should only close current session, and will terminate the process if there's no session associated with the process (we can even make the process wait there for a while in case new session will be launched soon, but this is a another story). If the behavior is not what we expect, then there must be something wrong with `RemoteInterpreterProcess`. Currently we don't have a very clear and dedicated component for session management, `RemoteInterpreterProcess` and `InterpreterGroup` seems kind of session manager.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    Thanks @prabhjyotsingh, Is it possible to keep the behavior consistent between notebook and interpreter setting page ? Because I am afraid it would confuse users when we introduce different behavior between those 2 places. And personally I still don't feel restarting the process is the proper approach. BTW, should we start a thread in user and dev mail list to discuss about this ? Because I think it would affect all users. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2034: [ZEPPELIN-2133] All interpreters sometimes throw rando...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2034
  
    @prabhjyotsingh Would this violate the behavior of [ZEPPELIN-1770](https://issues.apache.org/jira/browse/ZEPPELIN-1770) ? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---