Posted to issues@livy.apache.org by "Bikas Saha (Jira)" <ji...@apache.org> on 2019/12/30 16:28:00 UTC

[jira] [Issue Comment Deleted] (LIVY-718) Support multi-active high availability in Livy

     [ https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikas Saha updated LIVY-718:
----------------------------
    Comment: was deleted

(was: The Livy server may hold in-memory state, but could that state be re-created from the state in the Spark driver with an initial sync operation?

If not, then what additional metadata could be stored in the Spark driver to make it happen?

The ideal situation would be (keeping in mind Meisam's observations):
 # Any Livy client can hit any Livy server and continue from where it was. The first time a Livy server is hit for a session, it may take some time to hydrate the state if that was not already done in the background.
 ## Note that this can happen even without any Livy server failure, for example when a load balancer is running in front of the Livy servers and sticky sessions are not working or there is too much hot-spotting.
 # A Livy server can (with some extra sync operation if needed) service any session from that session's Spark driver. The only information it needs is how to connect to the Spark driver. That could be stored in a reliable state store (e.g. even in a YARN application tag for YARN clusters).

If we can achieve the above then the system could be much simpler to operate and work with.
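
To make the idea concrete, here is a minimal sketch of the lazy hydration described above. It is not Livy code: `StateStore`, `Server`, `fake_sync`, and the driver address are hypothetical names made up for illustration, the state store is a plain in-memory dict standing in for a reliable store, and the sync with the Spark driver is faked with a lookup table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SessionState:
    """In-memory view of one session, rebuilt from its Spark driver."""
    session_id: int
    driver_address: str
    statements: List[str]


class StateStore:
    """Hypothetical stand-in for a reliable store.

    It holds only the minimum needed: session id -> driver address.
    """

    def __init__(self) -> None:
        self._addresses: Dict[int, str] = {}

    def register(self, session_id: int, driver_address: str) -> None:
        self._addresses[session_id] = driver_address

    def lookup(self, session_id: int) -> str:
        return self._addresses[session_id]


# Assumed shape of the sync operation: (session id, driver address) -> state.
SyncFn = Callable[[int, str], SessionState]


class Server:
    """Hypothetical server that hydrates a session on first access."""

    def __init__(self, name: str, store: StateStore, sync_from_driver: SyncFn) -> None:
        self.name = name
        self._store = store
        self._sync_from_driver = sync_from_driver
        self._sessions: Dict[int, SessionState] = {}
        self.hydrations = 0  # how many times this server had to sync

    def get_session(self, session_id: int) -> SessionState:
        state = self._sessions.get(session_id)
        if state is None:
            # First hit on this server: find the driver, then sync from it.
            address = self._store.lookup(session_id)
            state = self._sync_from_driver(session_id, address)
            self._sessions[session_id] = state
            self.hydrations += 1
        return state


# Fake "drivers" keyed by address; a real sync would be a call to the driver.
FAKE_DRIVERS: Dict[str, List[str]] = {
    "driver-host:10000": ["1 + 1", "spark.range(10).count()"],
}


def fake_sync(session_id: int, address: str) -> SessionState:
    return SessionState(session_id, address, list(FAKE_DRIVERS[address]))


store = StateStore()
store.register(42, "driver-host:10000")

server_a = Server("livy-a", store, fake_sync)
server_b = Server("livy-b", store, fake_sync)

first = server_a.get_session(42)   # hydrated on server A
second = server_b.get_session(42)  # same session, hydrated on server B
again = server_b.get_session(42)   # served from server B's memory
```

The sketch ignores the harder part: once two servers have hydrated the same session, their in-memory copies can drift apart, so a real design would need either a re-sync on access or the driver as the single source of truth.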

IIRC JDBC had a REST and an RPC mode. The RPC mode might not be HA without a fat client, but perhaps the REST mode could be. Does Hive JDBC support HA on the Hive Thrift server? If so, the Hive JDBC client may already support server-side transitions. If not, then we may have the caveat that HA won't work for such connections. I am not super familiar with the JDBC client.

 )

> Support multi-active high availability in Livy
> ----------------------------------------------
>
>                 Key: LIVY-718
>                 URL: https://issues.apache.org/jira/browse/LIVY-718
>             Project: Livy
>          Issue Type: Epic
>          Components: RSC, Server
>            Reporter: Yiheng Wang
>            Priority: Major
>
> In this JIRA we want to discuss how to implement multi-active high availability in Livy.
> Currently, Livy only supports single-node recovery. This is not sufficient in some production environments. In our scenario, the Livy server serves many notebook and JDBC services. We want to make the Livy service more fault-tolerant and scalable.
> There are already some proposals in the community for high availability, but they are either incomplete or cover only active-standby high availability. So we propose a multi-active high availability design to achieve the following goals:
> # One or more servers will serve the client requests at the same time.
> # Sessions are allocated among different servers.
> # When one node crashes, the affected sessions will be moved to other active services.
> Here's our design document, please review and comment:
> https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)