
Posted to dev@zeppelin.apache.org by liuxun <ne...@163.com> on 2018/07/17 08:46:35 UTC

Zeppelin distributed architecture design

Hi,

Our company has installed and deployed many Zeppelin instances for data analysis. The single-server version of Zeppelin could not meet our application scenarios, so we transformed Zeppelin into a clustered service that supports distributed deployment and provides a unified entry point, high availability, and high server resource utilization. The email attachment is the entire design document; I would be very happy to contribute our modified code back to the community.


This is the JIRA issue I submitted to the community:

https://issues.apache.org/jira/browse/ZEPPELIN-3471


Since the design document exceeds the mail attachment size limit, I am sending links to it instead:
https://issues.apache.org/jira/secure/attachment/12931896/Zeppelin%20distributed%20architecture%20design.pdf
https://issues.apache.org/jira/secure/attachment/12931895/zepplin%20Cluster%20Sequence%20Diagram.png


liuxun

Re: Zeppelin distributed architecture design

Posted by vincent gromakowski <vi...@gmail.com>.

Good job! It looks very interesting.

2018-07-18 7:30 GMT+02:00 liuxun <ne...@163.com>:

> Hi, Ruslan Dautkhanov,
>
> Thank you very much for your question. Following your advice, I added 3
> schematics to illustrate:
> 1. Distributed Zeppelin deployment architecture diagram.
> 2. Distributed Zeppelin server fault tolerance diagram.
> 3. Distributed Zeppelin server & interpreter (intp) process fault tolerance diagram.
>
>
> The email attachment exceeded the size limit, so I reorganized the
> document and uploaded it to Google Docs:
> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit?usp=sharing
>
>
> On July 18, 2018, at 1:03 PM, liuxun <ne...@163.com> wrote:
>
> Hi, Ruslan Dautkhanov,
>
> Thank you very much for your question. Following your advice, I added 3
> schematics to illustrate:
> 1. Zeppelin cluster architecture diagram.
> 2. Distributed Zeppelin server fault tolerance diagram.
> 3. Distributed Zeppelin server & interpreter (intp) process fault tolerance diagram.
>
> Later, I will merge the schematics into the system design document.
>
> <Zeppelin system architecture diagram00.png>
>
>
> <Distributed zeppelin Server fault tolerance diagram 1.png>
>
>
>
> <Distributed zeppelin Server fault tolerance diagram 2.png>
>
>
>
> On July 18, 2018, at 1:16 AM, Ruslan Dautkhanov <da...@gmail.com> wrote:
>
> Nice.
>
> Thanks for sharing.
>
> Can you explain how users are routed to a particular Zeppelin server
> instance? I've seen nginx on top of them, but I don't think the document
> covers the details. If one Zeppelin server goes down or becomes unhealthy,
> is nginx supposed to detect that (if so, how?) and reroute users to a
> surviving instance?
>
> Thanks,
> Ruslan Dautkhanov
>
>
> On Tue, Jul 17, 2018 at 2:46 AM liuxun <ne...@163.com> wrote:
>
> Hi,
>
> Our company has installed and deployed many Zeppelin instances for data
> analysis. The single-server version of Zeppelin could not meet our
> application scenarios, so we transformed Zeppelin into a clustered service
> that supports distributed deployment and provides a unified entry point,
> high availability, and high server resource utilization. The email
> attachment is the entire design document; I would be very happy to
> contribute our modified code back to the community.
>
>
> This is the JIRA issue I submitted to the community:
>
> https://issues.apache.org/jira/browse/ZEPPELIN-3471
>
>
> Since the design document exceeds the mail attachment size limit, I am
> sending links to it instead:
>
> https://issues.apache.org/jira/secure/attachment/12931896/Zeppelin%20distributed%20architecture%20design.pdf
>
> https://issues.apache.org/jira/secure/attachment/12931895/zepplin%20Cluster%20Sequence%20Diagram.png
>
>
> liuxun
>
>
>
>

Re: Zeppelin distributed architecture design

Posted by liuxun <ne...@163.com>.

Hi,

I have submitted the first module of the Zeppelin cluster upgrade; please help me review the code. Thank you!
https://github.com/apache/zeppelin/pull/3156

I updated the Atomix algorithm library module in the system design document; please follow the link below to read it.
https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#heading=h.qbcgqhd0wwh8



> On August 11, 2018, at 10:36 AM, liuxun <ne...@163.com> wrote:
>
> Hi,
>
> After 2 weeks of development, I have finished upgrading from Copycat to the Atomix algorithm library.
> The workload was larger than expected because I had to resolve netty package conflicts. The Atomix-based version is now running on our internal company clusters.
>
> Atomix uses the 4.1.27-Final version of the netty JAR package.
> If you put this newer netty package directly in ./zeppelin/lib or the ./zeppelin/interpreter path, it conflicts with the netty version used by Spark and causes the spark-interpreter to fail.
> The two netty versions therefore need to be isolated: in both zeppelin-server and the interpreter process, a custom classloader loads the Atomix netty JAR separately from the netty package on the classpath.
>
> I updated the Atomix algorithm library module in the system design document; please follow the link below to read it.
>
> Atomix Raft algorithm library
> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#heading=h.qbcgqhd0wwh8
>
> I will submit the new code as a pull request; please help me merge it. Thank you.
> 
> Thanks,
> Xun Liu
> 
>> On July 24, 2018, at 12:57 PM, liuxun <neliuxun@163.com> wrote:
>>
>> @Jongyoul Lee:
>> Thank you for your attention.
>>
>> Indeed, as you said, the `Copycat` project has been closed and migrated to https://github.com/atomix/atomix.
>>
>> I also considered this issue during development.
>> The main reason was that `Copycat` was sufficient to implement Raft at the time, and I did not think far enough ahead.
>>
>> Today, I took a look at the Atomix documentation, https://atomix.io/docs/latest/user-manual/,
>> which describes many features, such as broadcasting messages in the cluster and detecting cluster events.
>> From the perspective of Zeppelin's long-term development, it is better to use Atomix.
>> So I will switch the Raft library to Atomix, which is not a difficult change.
>> 
>> Struggle for zeppelin!!! :-)
>> 
>> 
>>> On July 24, 2018, at 9:35 AM, Jongyoul Lee <jongyoul@gmail.com> wrote:
>>>
>>> First of all, thank you for your effort and contribution.
>>>
>>> I read it carefully today, and personally, I think it's a very nice feature and
>>> idea.
>>>
>>> Let's discuss it and improve it more concretely. I also left comments on the
>>> doc.
>>>
>>> And I have a simple question.
>>>
>>> `Copycat`, which you used to implement it, has been deprecated by its owner [1] and
>>> moved under https://github.com/atomix/atomix/. I'm worried about that. Do you
>>> have any reason to use this library? It is even a SNAPSHOT version.
>>>
>>> Regards,
>>> JL
>>>
>>> [1]: https://github.com/atomix/copycat
>>>
>>> On Sat, Jul 21, 2018 at 2:07 AM, liuxun <neliuxun@163.com> wrote:
>>> 
>>>> Hi,
>>>>
>>>> To show more intuitively how the distributed Zeppelin cluster is used
>>>> in practice, I updated the design document: starting on page 16, I added
>>>> 2 GIF animations recorded from the Zeppelin cluster we are running now.
>>>> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#
>>>>
>>>> The distributed, clustered Zeppelin is already in use at our company, and the
>>>> recordings are all real.
>>>> The first GIF shows how to create a cluster of three Zeppelin servers:
>>>> 1. Add 234, 235, and 236 to the zeppelin.cluster.addr property in
>>>>    zeppelin-site.xml to create a cluster.
>>>> 2. Start these 3 servers at the same time.
>>>> 3. Open the web pages of these 3 servers and prepare for the notebook
>>>>    operations.
>>>>
>>>>
>>>> The second GIF shows how an interpreter process is created in the cluster:
>>>> 1. Create a notebook on host234 and execute it. This creates an
>>>>    interpreter process on a server in the cluster that has free resources.
>>>> 2. Continue editing this notebook on host235 and execute it. The results
>>>>    come back immediately, without waiting for an interpreter process to
>>>>    be created.
>>>> 3. Likewise, continue editing this notebook on host236 and execute it.
>>>>    Again the results come back immediately.
>>>> 4. The same notebook reuses the interpreter process that was created first,
>>>>    so you get the execution result immediately on any server.
>>>> 5. Looking at the background server processes, you will find that host234,
>>>>    host235, and host236 use the same interpreter process for the same notebook.
>>>>
>>>> Originally, I also wanted to record what happens when an interpreter
>>>> process fails and the cluster re-creates it on an idle server, but I am
>>>> too tired now.
>>>> I will record it later when there is time.
>>>> 
>>>> 
>>>>> On July 19, 2018, at 7:36 AM, Ruslan Dautkhanov <dautkhanov@gmail.com> wrote:
>>>>>
>>>>> Thank you, liuxun.
>>>>>
>>>>> I left a couple of comments in that Google document.
>>>>>
>>>>> --
>>>>> Ruslan Dautkhanov
>>>>>
>>>>>
>>>>> On Tue, Jul 17, 2018 at 11:30 PM liuxun <neliuxun@163.com> wrote:
>>>>> Hi, Ruslan Dautkhanov,
>>>>>
>>>>> Thank you very much for your question. Following your advice, I added
>>>>> 3 schematics to illustrate:
>>>>> 1. Distributed Zeppelin deployment architecture diagram.
>>>>> 2. Distributed Zeppelin server fault tolerance diagram.
>>>>> 3. Distributed Zeppelin server & interpreter (intp) process fault tolerance diagram.
>>>>>
>>>>>
>>>>> The email attachment exceeded the size limit, so I reorganized the
>>>>> document and uploaded it to Google Docs:
>>>>> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit?usp=sharing
>>>>> 
>>>>> 
>>>>>> On July 18, 2018, at 1:03 PM, liuxun <neliuxun@163.com> wrote:
>>>>>>
>>>>>> Hi, Ruslan Dautkhanov,
>>>>>>
>>>>>> Thank you very much for your question. Following your advice, I
>>>>>> added 3 schematics to illustrate:
>>>>>> 1. Zeppelin cluster architecture diagram.
>>>>>> 2. Distributed Zeppelin server fault tolerance diagram.
>>>>>> 3. Distributed Zeppelin server & interpreter (intp) process fault tolerance diagram.
>>>>>>
>>>>>> Later, I will merge the schematics into the system design document.
>>>>>> 
>>>>>> <Zeppelin system architecture diagram00.png>
>>>>>> 
>>>>>> 
>>>>>> <Distributed zeppelin Server fault tolerance diagram 1.png>
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> <Distributed zeppelin Server fault tolerance diagram 2.png>
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On July 18, 2018, at 1:16 AM, Ruslan Dautkhanov <dautkhanov@gmail.com> wrote:
>>>>>>>
>>>>>>> Nice.
>>>>>>>
>>>>>>> Thanks for sharing.
>>>>>>>
>>>>>>> Can you explain how users are routed to a particular Zeppelin server
>>>>>>> instance? I've seen nginx on top of them, but I don't think the
>>>>>>> document covers the details. If one Zeppelin server goes down or
>>>>>>> becomes unhealthy, is nginx supposed to detect that (if so, how?) and
>>>>>>> reroute users to a surviving instance?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Ruslan Dautkhanov
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Jul 17, 2018 at 2:46 AM liuxun <neliuxun@163.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Our company has installed and deployed many Zeppelin instances for
>>>>>>>> data analysis. The single-server version of Zeppelin could not meet
>>>>>>>> our application scenarios, so we transformed Zeppelin into a
>>>>>>>> clustered service that supports distributed deployment and provides
>>>>>>>> a unified entry point, high availability, and high server resource
>>>>>>>> utilization. The email attachment is the entire design document; I
>>>>>>>> would be very happy to contribute our modified code back to the
>>>>>>>> community.
>>>>>>>>
>>>>>>>>
>>>>>>>> This is the JIRA issue I submitted to the community:
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-3471
>>>>>>>>
>>>>>>>>
>>>>>>>> Since the design document exceeds the mail attachment size limit, I
>>>>>>>> am sending links to it instead:
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/secure/attachment/12931896/Zeppelin%20distributed%20architecture%20design.pdf
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/secure/attachment/12931895/zepplin%20Cluster%20Sequence%20Diagram.png
>>>>>>>> 
>>>>>>>> 
>>>>>>>> liuxun
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> -- 
>>> 이종열, Jongyoul Lee, 李宗烈
>>> http://madeng.net <http://madeng.net/>
>> 
>

Re: Zeppelin distributed architecture design

Posted by liuxun <ne...@163.com>.

Hi,

After 2 weeks of development, I have finished upgrading from Copycat to the Atomix algorithm library.
The workload was larger than expected because I had to resolve netty package conflicts. The Atomix-based version is now running on our internal company clusters.

Atomix uses the 4.1.27-Final version of the netty JAR package.
If you put this newer netty package directly in ./zeppelin/lib or the ./zeppelin/interpreter path, it conflicts with the netty version used by Spark and causes the spark-interpreter to fail.
The two netty versions therefore need to be isolated: in both zeppelin-server and the interpreter process, a custom classloader loads the Atomix netty JAR separately from the netty package on the classpath.
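
To illustrate the isolation idea, here is a minimal sketch of a child-first classloader in Java. It is not the code from the pull request: the class name IsolatedClassLoader and the package prefixes passed to it are assumptions made for this example. Classes whose names start with one of the given prefixes are looked up in the loader's own JARs first, and everything else is delegated to the parent as usual.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Child-first class loader sketch (hypothetical name, not Zeppelin's class).
 * Classes under the given package prefixes are loaded from this loader's own
 * JARs before the parent is consulted; all other classes use normal delegation.
 */
final class IsolatedClassLoader extends URLClassLoader {
    private final String[] isolatedPrefixes;

    IsolatedClassLoader(URL[] jars, ClassLoader parent, String... isolatedPrefixes) {
        super(jars, parent);
        this.isolatedPrefixes = isolatedPrefixes;
    }

    /** True if the class name falls under one of the isolated package prefixes. */
    boolean isIsolated(String className) {
        for (String prefix : isolatedPrefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && isIsolated(name)) {
                try {
                    c = findClass(name); // look in our own JARs first
                } catch (ClassNotFoundException e) {
                    // not in the isolated JARs; fall through to the parent
                }
            }
            if (c == null) {
                return super.loadClass(name, resolve); // normal parent-first lookup
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // No JARs are configured here, so every lookup ends up at the parent.
        // In a real setup the URL array would point at the Atomix and netty JARs.
        ClassLoader parent = IsolatedClassLoader.class.getClassLoader();
        try (IsolatedClassLoader loader =
                 new IsolatedClassLoader(new URL[0], parent, "io.netty.", "io.atomix.")) {
            System.out.println(loader.loadClass("java.lang.String") == String.class);
        }
    }
}
```

With this approach, code loaded through the isolated loader sees the newer netty, while Spark keeps resolving the netty version already on the application classpath.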

I updated the Atomix algorithm library module in the system design document; please follow the link below to read it.

Atomix Raft algorithm library
https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#heading=h.qbcgqhd0wwh8

I will submit the new code as a pull request; please help me merge it. Thank you.

Thanks,
Xun Liu

> On July 24, 2018, at 12:57 PM, liuxun <ne...@163.com> wrote:
>
> @Jongyoul Lee:
> Thank you for your attention.
>
> Indeed, as you said, the `Copycat` project has been closed and migrated to https://github.com/atomix/atomix.
>
> I also considered this issue during development.
> The main reason was that `Copycat` was sufficient to implement Raft at the time, and I did not think far enough ahead.
>
> Today, I took a look at the Atomix documentation, https://atomix.io/docs/latest/user-manual/,
> which describes many features, such as broadcasting messages in the cluster and detecting cluster events.
> From the perspective of Zeppelin's long-term development, it is better to use Atomix.
> So I will switch the Raft library to Atomix, which is not a difficult change.
> 
> Struggle for zeppelin!!! :-)
> 
> 
>> On July 24, 2018, at 9:35 AM, Jongyoul Lee <jongyoul@gmail.com> wrote:
>>
>> First of all, thank you for your effort and contribution.
>>
>> I read it carefully today, and personally, I think it's a very nice feature and
>> idea.
>>
>> Let's discuss it and improve it more concretely. I also left comments on the
>> doc.
>>
>> And I have a simple question.
>>
>> `Copycat`, which you used to implement it, has been deprecated by its owner [1] and
>> moved under https://github.com/atomix/atomix/. I'm worried about that. Do you
>> have any reason to use this library? It is even a SNAPSHOT version.
>>
>> Regards,
>> JL
>>
>> [1]: https://github.com/atomix/copycat
>>
>> On Sat, Jul 21, 2018 at 2:07 AM, liuxun <neliuxun@163.com> wrote:
>> 
>>> Hi,
>>>
>>> To show more intuitively how the distributed Zeppelin cluster is used
>>> in practice, I updated the design document: starting on page 16, I added
>>> 2 GIF animations recorded from the Zeppelin cluster we are running now.
>>> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#
>>>
>>> The distributed, clustered Zeppelin is already in use at our company, and the
>>> recordings are all real.
>>> The first GIF shows how to create a cluster of three Zeppelin servers:
>>> 1. Add 234, 235, and 236 to the zeppelin.cluster.addr property in
>>>    zeppelin-site.xml to create a cluster.
>>> 2. Start these 3 servers at the same time.
>>> 3. Open the web pages of these 3 servers and prepare for the notebook
>>>    operations.
>>> 
>>> 
>>> The second recorded screens GIF shows the following
>>> Create an interpreter process in the cluster
>>> Create a notebook on host234 and execute it, This action will create an
>>> interpreter process in the server with free resources in the cluster.
>>> You can then continue editing this notebook on host235 and execute it, You
>>> can return results immediately without waiting for the time to create an
>>> interpreter process.
>>> Again, you can continue to edit this notebook on host236. And execute it,
>>> you can return results immediately without waiting for the time to create
>>> the interpreter process
>>> The same notebook will reuse the first created interpreter process, so you
>>> can get the execution result immediately on any server.
>>> By looking at the background server process, you will find that host234,
>>> host235, and host236 use the same interpreter process for the same notebook.
>>> 
>>> Originally, I wanted to record the interpreter process exception. The
>>> cluster re-created the screenshot of the interpreter process in the idle
>>> server, but I am too tired now.
>>> There is time to record later.
>>> 
>>> 
>>>> On Jul 19, 2018, at 7:36 AM, Ruslan Dautkhanov <da...@gmail.com> wrote:
>>>> 
>>>> Thank you luxun,
>>>> 
>>>> I left a couple of comments in that google document.
>>>> 
>>>> --
>>>> Ruslan Dautkhanov
>>>> 
>>>> 
>>>> On Tue, Jul 17, 2018 at 11:30 PM liuxun <neliuxun@163.com <mailto:
>>> neliuxun@163.com>> wrote:
>>>> hi，Ruslan Dautkhanov
>>>> 
>>>> Thank you very much for your question. according to your advice, I added
>>> 3 schematics to illustrate.
>>>> 1. Distributed Zeppelin Deployment architecture diagram.
>>>> 2. Distributed zeppelin Server fault tolerance diagram.
>>>> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
>>>> 
>>>> 
>>>> The email attachment exceeded the size limit, so I reorganized the
>>> document and updated it with Google Docs.
>>>> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeu
>>> VDKCRRBm-Qa3Bw/edit?usp=sharing <https://docs.google.com/document/d/
>>> 1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit?usp=sharing>
>>>> 
>>>> 
>>>>> On Jul 18, 2018, at 1:03 PM, liuxun <neliuxun@163.com> wrote:
>>>>> 
>>>>> hi，Ruslan Dautkhanov
>>>>> 
>>>>> Thank you very much for your question. according to your advice, I
>>> added 3 schematics to illustrate.
>>>>> 1. Zeppelin Cluster architecture diagram.
>>>>> 2. Distributed zeppelin Server fault tolerance diagram.
>>>>> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
>>>>> 
>>>>> Later, I will merge the schematic into the system design document.
>>>>> 
>>>>> <Zeppelin system architecture diagram00.png>
>>>>> 
>>>>> 
>>>>> <Distributed zeppelin Server fault tolerance diagram 1.png>
>>>>> 
>>>>> 
>>>>> 
>>>>> <Distributed zeppelin Server fault tolerance diagram 2.png>
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jul 18, 2018, at 1:16 AM, Ruslan Dautkhanov <dautkhanov@gmail.com> wrote:
>>>>>> 
>>>>>> Nice.
>>>>>> 
>>>>>> Thanks for sharing.
>>>>>> 
>>>>>> Can you explain how are users routed into a particular zeppelin server
>>>>>> instance? I've seen nginx on top of them, but I don't think the
>>> document
>>>>>> covers details? If one zeppelin server goes down or unhealthy, is nginx
>>>>>> supposed to detect (if so, how?) that and reroute users to a survived
>>>>>> instance?
>>>>>> 
>>>>>> Thanks,
>>>>>> Ruslan Dautkhanov
>>>>>> 
>>>>>> 
>>>>>> On Tue, Jul 17, 2018 at 2:46 AM liuxun <neliuxun@163.com <mailto:
>>> neliuxun@163.com>> wrote:
>>>>>> 
>>>>>>> hi:
>>>>>>> 
>>>>>>> Our company installed and deployed a lot of zeppelin for data
>>> analysis.
>>>>>>> The single server version of zeppelin could not meet our application
>>>>>>> scenarios, so we transformed zeppelin into a clustered service that
>>>>>>> supports distributed deployment, Have a unified entrance, high
>>>>>>> availability, and High server resource usage.  the email attachment
>>> is the
>>>>>>> entire design document, I am very happy to feedback our modified code
>>> back
>>>>>>> to the community.
>>>>>>> 
>>>>>>> 
>>>>>>> this is the JIRA I submitted in the community,
>>>>>>> 
>>>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-3471 <
>>> https://issues.apache.org/jira/browse/ZEPPELIN-3471>
>>>>>>> 
>>>>>>> 
>>>>>>> Since the design document size exceeds the mail attachment size
>>> limit, the
>>>>>>> document link address has to be sent.
>>>>>>> 
>>>>>>> https://issues.apache.org/jira/secure/attachment/12931896/Zeppelin%
>>> 20distributed%20architecture%20design.pdf <https://issues.apache.org/
>>> jira/secure/attachment/12931896/Zeppelin%20distributed%20architecture%
>>> 20design.pdf>
>>>>>>> 
>>>>>>> https://issues.apache.org/jira/secure/attachment/
>>> 12931895/zepplin%20Cluster%20Sequence%20Diagram.png <
>>> https://issues.apache.org/jira/secure/attachment/
>>> 12931895/zepplin%20Cluster%20Sequence%20Diagram.png>
>>>>>>> 
>>>>>>> 
>>>>>>> liuxun
>>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>> -- 
>> 이종열, Jongyoul Lee, 李宗烈
>> http://madeng.net <http://madeng.net/>
>

Re: Zeppelin distributed architecture design

Posted by Jongyoul Lee <jo...@gmail.com>.

Thank you.

I fully agree with you that we need a framework to support a distributed
version. IMHO, we cannot afford to develop our own. I'll dig into atomix as
well.



On Tue, Jul 24, 2018 at 1:57 PM, liuxun <ne...@163.com> wrote:

> @Jongyoul Lee：
> Thank you for your attention.
>
> Indeed, as you said, the `Copycat` project has been closed and has been
> migrated to `https://github.com/atomix/atomix`.
>
> I also considered this issue during development.
> The main reason was that it was enough to realize Raft using `Copycat` at
> the time, and it was not considered too long.
>
> Today, I took a look at the documentation of atomix,
> https://atomix.io/docs/latest/user-manual/ ,
> which has a lot of features, such as broadcasting messages in the cluster,
> detecting cluster events... ,
> From the perspective of zeppelin's long-term development, it is better to
> use atomix.
> So, I will switch the Raft protocol algorithm library to atomix, which is
> not difficult to modify.
>
> Struggle for zeppelin!!! :-)
>
>
> On Jul 24, 2018, at 9:35 AM, Jongyoul Lee <jo...@gmail.com> wrote:
>
> First of all, thank you for your effort and contribution.
>
> I read it carefully today, and personally, it's a very nice feature and
> idea.
>
> Let's discuss it and improve more concretely. I also left comments on the
> doc.
>
> And I have a simple question.
>
> `Copycat`, which you used to implement it, is deprecated by owner[1] and
> moved under https://github.com/atomix/atomix/. I'm afraid of it. Do you
> have any reason to use this library? It's even SNAPSHOT version.
>
> Regards,
> JL
>
> [1]: https://github.com/atomix/copycat
>
> On Sat, Jul 21, 2018 at 2:07 AM, liuxun <ne...@163.com> wrote:
>
> HI：
>
> In order to more intuitively express the actual use of distributed
> zeppelin clusters.
> I updated this design document, starting with the 16th page of the
> document, adding 2 GIF animations showing the operation record screen of
> the zeppelin cluster we are using now.
> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeu
> VDKCRRBm-Qa3Bw/edit# <https://docs.google.com/document/d/
> 1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#>
>
> Distributed clustered zeppelin is already in use at our company, and the
> recorded screens are all real.
> The first recorded screens GIF shows the following
> Create a cluster of three zeppelin servers
> Add 234, 235, 236 to the zeppelin.cluster.addr attribute in
> zeppelin-site.xml to create a cluster
> Start these 3 servers at the same time
> Open the web pages of these 3 servers and prepare for the notebook
> operation.
>
>
> The second recorded screens GIF shows the following
> Create an interpreter process in the cluster
> Create a notebook on host234 and execute it, This action will create an
> interpreter process in the server with free resources in the cluster.
> You can then continue editing this notebook on host235 and execute it, You
> can return results immediately without waiting for the time to create an
> interpreter process.
> Again, you can continue to edit this notebook on host236. And execute it,
> you can return results immediately without waiting for the time to create
> the interpreter process
> The same notebook will reuse the first created interpreter process, so you
> can get the execution result immediately on any server.
> By looking at the background server process, you will find that host234,
> host235, and host236 use the same interpreter process for the same
> notebook.
>
> Originally, I wanted to record the interpreter process exception. The
> cluster re-created the screenshot of the interpreter process in the idle
> server, but I am too tired now.
> There is time to record later.
>
>
> On Jul 19, 2018, at 7:36 AM, Ruslan Dautkhanov <da...@gmail.com> wrote:
>
> Thank you luxun,
>
> I left a couple of comments in that google document.
>
> --
> Ruslan Dautkhanov
>
>
> On Tue, Jul 17, 2018 at 11:30 PM liuxun <neliuxun@163.com <mailto:
>
> neliuxun@163.com>> wrote:
>
> hi，Ruslan Dautkhanov
>
> Thank you very much for your question. according to your advice, I added
>
> 3 schematics to illustrate.
>
> 1. Distributed Zeppelin Deployment architecture diagram.
> 2. Distributed zeppelin Server fault tolerance diagram.
> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
>
>
> The email attachment exceeded the size limit, so I reorganized the
>
> document and updated it with Google Docs.
>
> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeu
>
> VDKCRRBm-Qa3Bw/edit?usp=sharing <https://docs.google.com/document/d/
> 1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit?usp=sharing>
>
>
>
> On Jul 18, 2018, at 1:03 PM, liuxun <neliuxun@163.com> wrote:
>
>
> hi，Ruslan Dautkhanov
>
> Thank you very much for your question. according to your advice, I
>
> added 3 schematics to illustrate.
>
> 1. Zeppelin Cluster architecture diagram.
> 2. Distributed zeppelin Server fault tolerance diagram.
> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
>
> Later, I will merge the schematic into the system design document.
>
> <Zeppelin system architecture diagram00.png>
>
>
> <Distributed zeppelin Server fault tolerance diagram 1.png>
>
>
>
> <Distributed zeppelin Server fault tolerance diagram 2.png>
>
>
>
> On Jul 18, 2018, at 1:16 AM, Ruslan Dautkhanov <dautkhanov@gmail.com> wrote:
>
>
> Nice.
>
> Thanks for sharing.
>
> Can you explain how are users routed into a particular zeppelin server
> instance? I've seen nginx on top of them, but I don't think the
>
> document
>
> covers details? If one zeppelin server goes down or unhealthy, is nginx
> supposed to detect (if so, how?) that and reroute users to a survived
> instance?
>
> Thanks,
> Ruslan Dautkhanov
>
>
> On Tue, Jul 17, 2018 at 2:46 AM liuxun <neliuxun@163.com <mailto:
>
> neliuxun@163.com>> wrote:
>
>
> hi:
>
> Our company installed and deployed a lot of zeppelin for data
>
> analysis.
>
> The single server version of zeppelin could not meet our application
> scenarios, so we transformed zeppelin into a clustered service that
> supports distributed deployment, Have a unified entrance, high
> availability, and High server resource usage.  the email attachment
>
> is the
>
> entire design document, I am very happy to feedback our modified code
>
> back
>
> to the community.
>
>
> this is the JIRA I submitted in the community,
>
> https://issues.apache.org/jira/browse/ZEPPELIN-3471 <
>
> https://issues.apache.org/jira/browse/ZEPPELIN-3471>
>
>
>
> Since the design document size exceeds the mail attachment size
>
> limit, the
>
> document link address has to be sent.
>
> https://issues.apache.org/jira/secure/attachment/12931896/Zeppelin%
>
> 20distributed%20architecture%20design.pdf <https://issues.apache.org/
> jira/secure/attachment/12931896/Zeppelin%20distributed%20architecture%
> 20design.pdf>
>
>
> https://issues.apache.org/jira/secure/attachment/
>
> 12931895/zepplin%20Cluster%20Sequence%20Diagram.png <
> https://issues.apache.org/jira/secure/attachment/
> 12931895/zepplin%20Cluster%20Sequence%20Diagram.png>
>
>
>
> liuxun
>
>
>
>
>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>
>
>


-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net

Re: Zeppelin distributed architecture design

Posted by liuxun <ne...@163.com>.

Hi,

After two weeks of development, I have finished migrating the Raft library from copycat to atomix.
The extra workload came from resolving netty package conflicts. The atomix-based version is now running on our internal company clusters.

atomix uses version 4.1.27.Final of the netty JAR.
If this newer netty JAR is placed directly in ./zeppelin/lib or ./zeppelin/interpreter, it conflicts with the netty version used by Spark and causes the spark-interpreter to fail.
So in zeppelin-server and the interpreter process, the atomix JAR and its netty JAR are loaded through a custom classloader, which isolates them from the netty package on the classpath.
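
For readers unfamiliar with this kind of isolation, here is a minimal sketch of a child-first classloader. It is an illustration only, not the code from the pull request: the class name ChildFirstClassLoader, the isIsolated helper, and the choice of the io.netty. and io.atomix. package prefixes are assumptions made for this example.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first classloader (illustration only).
 * Classes whose names start with one of the isolated prefixes are looked up
 * in this loader's own JAR URLs first; everything else follows the normal
 * parent-first delegation.
 */
class ChildFirstClassLoader extends URLClassLoader {
    private final String[] isolatedPrefixes;

    ChildFirstClassLoader(URL[] urls, ClassLoader parent, String... isolatedPrefixes) {
        super(urls, parent);
        this.isolatedPrefixes = isolatedPrefixes.clone();
    }

    /** Returns true if the class name belongs to a package we want to isolate. */
    public boolean isIsolated(String className) {
        for (String prefix : isolatedPrefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && isIsolated(name)) {
                try {
                    // Child-first: search our own JARs before asking the parent.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in our JARs; fall back to parent-first delegation below.
                }
            }
            if (c == null) {
                return super.loadClass(name, resolve);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Under these assumptions, the loader would be constructed with the URLs of the atomix and netty 4.1.27.Final JARs and the prefixes "io.netty." and "io.atomix.", so that cluster code sees the newer netty while Spark keeps resolving its own netty through the regular classpath.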

I updated the atomix section of the system design document; please follow the link below to read it.

Atomix Raft algorithm library
https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#heading=h.qbcgqhd0wwh8

I will submit the new code as a pull request. Please help me review and merge it, thank you.

Thanks,
Xun Liu

> On Jul 24, 2018, at 12:57 PM, liuxun <ne...@163.com> wrote:
> 
> @Jongyoul Lee：
> Thank you for your attention.
> 
> Indeed, as you said, the `Copycat` project has been closed and migrated to `https://github.com/atomix/atomix`.
> 
> I also considered this issue during development.
> The main reason was that it was enough to realize Raft using `Copycat` at the time, and it was not considered too long.
> 
> Today, I took a look at the documentation of atomix, https://atomix.io/docs/latest/user-manual/ <https://atomix.io/docs/latest/user-manual/> , 
> which has a lot of features, such as broadcasting messages in the cluster, detecting cluster events... ,
> From the perspective of zeppelin's long-term development, it is better to use atomix.
> So, I will switch the Raft protocol algorithm library to atomix, which is not difficult to modify.
> 
> Struggle for zeppelin!!! :-)
> 
> 
>> On Jul 24, 2018, at 9:35 AM, Jongyoul Lee <jongyoul@gmail.com> wrote:
>> 
>> First of all, thank you for your effort and contribution.
>> 
>> I read it carefully today, and personally, it's a very nice feature and
>> idea.
>> 
>> Let's discuss it and improve more concretely. I also left comments on the
>> doc.
>> 
>> And I have a simple question.
>> 
>> `Copycat`, which you used to implement it, is deprecated by owner[1] and
>> moved under https://github.com/atomix/atomix/ <https://github.com/atomix/atomix/>. I'm afraid of it. Do you
>> have any reason to use this library? It's even SNAPSHOT version.
>> 
>> Regards,
>> JL
>> 
>> [1]: https://github.com/atomix/copycat <https://github.com/atomix/copycat>
>> 
>> On Sat, Jul 21, 2018 at 2:07 AM, liuxun <neliuxun@163.com <ma...@163.com>> wrote:
>> 
>>> HI：
>>> 
>>> In order to more intuitively express the actual use of distributed
>>> zeppelin clusters.
>>> I updated this design document, starting with the 16th page of the
>>> document, adding 2 GIF animations showing the operation record screen of
>>> the zeppelin cluster we are using now.
>>> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeu <https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeu>
>>> VDKCRRBm-Qa3Bw/edit# <https://docs.google.com/document/d/
>>> 1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#>
>>> 
>>> Distributed clustered zeppelin is already in use at our company, and the
>>> recorded screens are all real.
>>> The first recorded screens GIF shows the following
>>> Create a cluster of three zeppelin servers
>>> Add 234, 235, 236 to the zeppelin.cluster.addr attribute in
>>> zeppelin-site.xml to create a cluster
>>> Start these 3 servers at the same time
>>> Open the web pages of these 3 servers and prepare for the notebook
>>> operation.
>>> 
>>> 
>>> The second recorded screens GIF shows the following
>>> Create an interpreter process in the cluster
>>> Create a notebook on host234 and execute it, This action will create an
>>> interpreter process in the server with free resources in the cluster.
>>> You can then continue editing this notebook on host235 and execute it, You
>>> can return results immediately without waiting for the time to create an
>>> interpreter process.
>>> Again, you can continue to edit this notebook on host236. And execute it,
>>> you can return results immediately without waiting for the time to create
>>> the interpreter process
>>> The same notebook will reuse the first created interpreter process, so you
>>> can get the execution result immediately on any server.
>>> By looking at the background server process, you will find that host234,
>>> host235, and host236 use the same interpreter process for the same notebook.
>>> 
>>> Originally, I wanted to record the interpreter process exception. The
>>> cluster re-created the screenshot of the interpreter process in the idle
>>> server, but I am too tired now.
>>> There is time to record later.
>>> 
>>> 
>>>> On Jul 19, 2018, at 7:36 AM, Ruslan Dautkhanov <da...@gmail.com> wrote:
>>>> 
>>>> Thank you luxun,
>>>> 
>>>> I left a couple of comments in that google document.
>>>> 
>>>> --
>>>> Ruslan Dautkhanov
>>>> 
>>>> 
>>>> On Tue, Jul 17, 2018 at 11:30 PM liuxun <neliuxun@163.com <mailto:
>>> neliuxun@163.com>> wrote:
>>>> hi，Ruslan Dautkhanov
>>>> 
>>>> Thank you very much for your question. according to your advice, I added
>>> 3 schematics to illustrate.
>>>> 1. Distributed Zeppelin Deployment architecture diagram.
>>>> 2. Distributed zeppelin Server fault tolerance diagram.
>>>> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
>>>> 
>>>> 
>>>> The email attachment exceeded the size limit, so I reorganized the
>>> document and updated it with Google Docs.
>>>> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeu
>>> VDKCRRBm-Qa3Bw/edit?usp=sharing <https://docs.google.com/document/d/
>>> 1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit?usp=sharing>
>>>> 
>>>> 
>>>>> On Jul 18, 2018, at 1:03 PM, liuxun <neliuxun@163.com> wrote:
>>>>> 
>>>>> hi，Ruslan Dautkhanov
>>>>> 
>>>>> Thank you very much for your question. according to your advice, I
>>> added 3 schematics to illustrate.
>>>>> 1. Zeppelin Cluster architecture diagram.
>>>>> 2. Distributed zeppelin Server fault tolerance diagram.
>>>>> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
>>>>> 
>>>>> Later, I will merge the schematic into the system design document.
>>>>> 
>>>>> <Zeppelin system architecture diagram00.png>
>>>>> 
>>>>> 
>>>>> <Distributed zeppelin Server fault tolerance diagram 1.png>
>>>>> 
>>>>> 
>>>>> 
>>>>> <Distributed zeppelin Server fault tolerance diagram 2.png>
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jul 18, 2018, at 1:16 AM, Ruslan Dautkhanov <dautkhanov@gmail.com> wrote:
>>>>>> 
>>>>>> Nice.
>>>>>> 
>>>>>> Thanks for sharing.
>>>>>> 
>>>>>> Can you explain how are users routed into a particular zeppelin server
>>>>>> instance? I've seen nginx on top of them, but I don't think the
>>> document
>>>>>> covers details? If one zeppelin server goes down or unhealthy, is nginx
>>>>>> supposed to detect (if so, how?) that and reroute users to a survived
>>>>>> instance?
>>>>>> 
>>>>>> Thanks,
>>>>>> Ruslan Dautkhanov
>>>>>> 
>>>>>> 
>>>>>> On Tue, Jul 17, 2018 at 2:46 AM liuxun <neliuxun@163.com <mailto:
>>> neliuxun@163.com>> wrote:
>>>>>> 
>>>>>>> hi:
>>>>>>> 
>>>>>>> Our company installed and deployed a lot of zeppelin for data
>>> analysis.
>>>>>>> The single server version of zeppelin could not meet our application
>>>>>>> scenarios, so we transformed zeppelin into a clustered service that
>>>>>>> supports distributed deployment, Have a unified entrance, high
>>>>>>> availability, and High server resource usage.  the email attachment
>>> is the
>>>>>>> entire design document, I am very happy to feedback our modified code
>>> back
>>>>>>> to the community.
>>>>>>> 
>>>>>>> 
>>>>>>> this is the JIRA I submitted in the community,
>>>>>>> 
>>>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-3471 <
>>> https://issues.apache.org/jira/browse/ZEPPELIN-3471>
>>>>>>> 
>>>>>>> 
>>>>>>> Since the design document size exceeds the mail attachment size
>>> limit, the
>>>>>>> document link address has to be sent.
>>>>>>> 
>>>>>>> https://issues.apache.org/jira/secure/attachment/12931896/Zeppelin%
>>> 20distributed%20architecture%20design.pdf <https://issues.apache.org/
>>> jira/secure/attachment/12931896/Zeppelin%20distributed%20architecture%
>>> 20design.pdf>
>>>>>>> 
>>>>>>> https://issues.apache.org/jira/secure/attachment/
>>> 12931895/zepplin%20Cluster%20Sequence%20Diagram.png <
>>> https://issues.apache.org/jira/secure/attachment/
>>> 12931895/zepplin%20Cluster%20Sequence%20Diagram.png>
>>>>>>> 
>>>>>>> 
>>>>>>> liuxun
>>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>> -- 
>> 이종열, Jongyoul Lee, 李宗烈
>> http://madeng.net <http://madeng.net/>
>

Re: Zeppelin distributed architecture design

Posted by liuxun <ne...@163.com>.

@Jongyoul Lee:
Thank you for your attention.

Indeed, as you said, the `Copycat` project has been closed and migrated to `https://github.com/atomix/atomix`.

I also considered this issue during development.
The main reason I used `Copycat` was that it was sufficient for implementing Raft at the time, and I did not think much further ahead.

Today I looked through the atomix documentation, https://atomix.io/docs/latest/user-manual/,
which describes many features, such as broadcasting messages in the cluster and detecting cluster events.
From the perspective of zeppelin's long-term development, atomix is the better choice.
So I will switch the Raft library to atomix; the change should not be difficult.

Struggle for zeppelin!!! :-)


> On Jul 24, 2018, at 9:35 AM, Jongyoul Lee <jo...@gmail.com> wrote:
> 
> First of all, thank you for your effort and contribution.
> 
> I read it carefully today, and personally, it's a very nice feature and
> idea.
> 
> Let's discuss it and improve more concretely. I also left comments on the
> doc.
> 
> And I have a simple question.
> 
> `Copycat`, which you used to implement it, is deprecated by owner[1] and
> moved under https://github.com/atomix/atomix/. I'm afraid of it. Do you
> have any reason to use this library? It's even SNAPSHOT version.
> 
> Regards,
> JL
> 
> [1]: https://github.com/atomix/copycat
> 
> On Sat, Jul 21, 2018 at 2:07 AM, liuxun <ne...@163.com> wrote:
> 
>> HI：
>> 
>> In order to more intuitively express the actual use of distributed
>> zeppelin clusters.
>> I updated this design document, starting with the 16th page of the
>> document, adding 2 GIF animations showing the operation record screen of
>> the zeppelin cluster we are using now.
>> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeu
>> VDKCRRBm-Qa3Bw/edit# <https://docs.google.com/document/d/
>> 1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#>
>> 
>> Distributed clustered zeppelin is already in use at our company, and the
>> recorded screens are all real.
>> The first recorded screens GIF shows the following
>> Create a cluster of three zeppelin servers
>> Add 234, 235, 236 to the zeppelin.cluster.addr attribute in
>> zeppelin-site.xml to create a cluster
>> Start these 3 servers at the same time
>> Open the web pages of these 3 servers and prepare for the notebook
>> operation.
>> 
>> 
>> The second recorded screens GIF shows the following
>> Create an interpreter process in the cluster
>> Create a notebook on host234 and execute it, This action will create an
>> interpreter process in the server with free resources in the cluster.
>> You can then continue editing this notebook on host235 and execute it, You
>> can return results immediately without waiting for the time to create an
>> interpreter process.
>> Again, you can continue to edit this notebook on host236. And execute it,
>> you can return results immediately without waiting for the time to create
>> the interpreter process
>> The same notebook will reuse the first created interpreter process, so you
>> can get the execution result immediately on any server.
>> By looking at the background server process, you will find that host234,
>> host235, and host235 use the same interpreter process for the same notebook.
>> 
>> Originally, I wanted to record the interpreter process exception. The
>> cluster re-created the screenshot of the interpreter process in the idle
>> server, but I am too tired now.
>> There is time to record later.
>> 
>> 
>>> 在 2018年7月19日，上午7:36，Ruslan Dautkhanov <da...@gmail.com> 写道：
>>> 
>>> Thank you luxun,
>>> 
>>> I left a couple of comments in that google document.
>>> 
>>> --
>>> Ruslan Dautkhanov
>>> 
>>> 
>>> On Tue, Jul 17, 2018 at 11:30 PM liuxun <neliuxun@163.com <mailto:
>> neliuxun@163.com>> wrote:
>>> hi，Ruslan Dautkhanov
>>> 
>>> Thank you very much for your question. according to your advice, I added
>> 3 schematics to illustrate.
>>> 1. Distributed Zeppelin Deployment architecture diagram.
>>> 2. Distributed zeppelin Server fault tolerance diagram.
>>> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
>>> 
>>> 
>>> The email attachment exceeded the size limit, so I reorganized the
>> document and updated it with Google Docs.
>>> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeu
>> VDKCRRBm-Qa3Bw/edit?usp=sharing <https://docs.google.com/document/d/
>> 1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit?usp=sharing>
>>> 
>>> 
>>>> 在 2018年7月18日，下午1:03，liuxun <neliuxun@163.com <ma...@163.com>>
>> 写道：
>>>> 
>>>> hi，Ruslan Dautkhanov
>>>> 
>>>> Thank you very much for your question. according to your advice, I
>> added 3 schematics to illustrate.
>>>> 1. Zeppelin Cluster architecture diagram.
>>>> 2. Distributed zeppelin Server fault tolerance diagram.
>>>> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
>>>> 
>>>> Later, I will merge the schematic into the system design document.
>>>> 
>>>> <Zeppelin system architecture diagram00.png>
>>>> 
>>>> 
>>>> <Distributed zeppelin Server fault tolerance diagram 1.png>
>>>> 
>>>> 
>>>> 
>>>> <Distributed zeppelin Server fault tolerance diagram 2.png>
>>>> 
>>>> 
>>>> 
>>>>> 在 2018年7月18日，上午1:16，Ruslan Dautkhanov <dautkhanov@gmail.com <mailto:
>> dautkhanov@gmail.com>> 写道：
>>>>> 
>>>>> Nice.
>>>>> 
>>>>> Thanks for sharing.
>>>>> 
>>>>> Can you explain how are users routed into a particular zeppelin server
>>>>> instance? I've seen nginx on top of them, but I don't think the
>> document
>>>>> covers details? If one zeppelin server goes down or unhealthy, is nginx
>>>>> supposed to detect (if so, how?) that and reroute users to a survived
>>>>> instance?
>>>>> 
>>>>> Thanks,
>>>>> Ruslan Dautkhanov
>>>>> 
>>>>> 
>>>>> On Tue, Jul 17, 2018 at 2:46 AM liuxun <neliuxun@163.com <mailto:
>> neliuxun@163.com>> wrote:
>>>>> 
>>>>>> hi:
>>>>>> 
>>>>>> Our company installed and deployed a lot of zeppelin for data
>> analysis.
>>>>>> The single server version of zeppelin could not meet our application
>>>>>> scenarios, so we transformed zeppelin into a clustered service that
>>>>>> supports distributed deployment, Have a unified entrance, high
>>>>>> availability, and High server resource usage.  the email attachment
>> is the
>>>>>> entire design document, I am very happy to feedback our modified code
>> back
>>>>>> to the community.
>>>>>> 
>>>>>> 
>>>>>> this is the JIRA I submitted in the community,
>>>>>> 
>>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-3471 <
>> https://issues.apache.org/jira/browse/ZEPPELIN-3471>
>>>>>> 
>>>>>> 
>>>>>> Since the design document size exceeds the mail attachment size
>> limit, the
>>>>>> document link address has to be sent.
>>>>>> 
>>>>>> https://issues.apache.org/jira/secure/attachment/12931896/Zeppelin%
>> 20distributed%20architecture%20design.pdf <https://issues.apache.org/
>> jira/secure/attachment/12931896/Zeppelin%20distributed%20architecture%
>> 20design.pdf>
>>>>>> 
>>>>>> https://issues.apache.org/jira/secure/attachment/
>> 12931895/zepplin%20Cluster%20Sequence%20Diagram.png <
>> https://issues.apache.org/jira/secure/attachment/
>> 12931895/zepplin%20Cluster%20Sequence%20Diagram.png>
>>>>>> 
>>>>>> 
>>>>>> liuxun
>>>>>> 
>>>> 
>>> 
>> 
>> 
> 
> 
> -- 
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net

Re: Zeppelin distributed architecture design

Posted by Jongyoul Lee <jo...@gmail.com>.

First of all, thank you for your effort and contribution.

I read it carefully today, and personally I think it's a very nice feature and
idea.

Let's discuss it and improve it more concretely. I also left comments on the
doc.

And I have a simple question.

`Copycat`, which you used for the implementation, has been deprecated by its
owner [1] and moved under https://github.com/atomix/atomix/. That worries me.
Is there a reason you chose this library? It is even a SNAPSHOT version.

Regards,
JL

[1]: https://github.com/atomix/copycat



-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net

Re: Zeppelin distributed architecture design

Posted by liuxun <ne...@163.com>.

Hi,

To show more intuitively how the distributed Zeppelin cluster is actually used, I have updated the design document.
Starting on page 16, I added 2 GIF animations: screen recordings of the Zeppelin cluster we are running now.
https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#

The distributed, clustered Zeppelin is already in use at our company, and the screen recordings are all real.

The first GIF shows how to create a cluster of three Zeppelin servers:
1. Add 234, 235, and 236 to the zeppelin.cluster.addr property in zeppelin-site.xml to create the cluster.
2. Start these 3 servers at the same time.
3. Open the web pages of these 3 servers and get ready to work with notebooks.
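
To make the configuration step a little more concrete, here is a minimal Python sketch (an illustration only, not code from the design document) of how a server might read a cluster address list and work out how many members a Raft group needs in agreement. The comma-separated host:port format, the port 6000, and the helper names parse_cluster_addr and raft_quorum are all assumptions; the real format of zeppelin.cluster.addr is whatever the design document defines.

```python
def parse_cluster_addr(value):
    """Split a comma-separated "host:port" list into (host, port) tuples."""
    members = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and stray spaces
        host, _, port = item.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {item!r}")
        members.append((host, int(port)))
    return members


def raft_quorum(member_count):
    """Raft needs a strict majority of members to elect a leader or commit."""
    return member_count // 2 + 1


# Hypothetical value: host names follow the thread, the port is made up.
members = parse_cluster_addr("host234:6000, host235:6000, host236:6000")
quorum = raft_quorum(len(members))
tolerated = len(members) - quorum  # members that may fail while a majority remains
print(members, quorum, tolerated)
```

Under these assumptions a three-server cluster has a quorum of two, so the Raft group can still make progress with one server down.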

The second GIF shows how an interpreter process is created in the cluster:
1. Create a notebook on host234 and execute it. This creates an interpreter process on a server in the cluster that has free resources.
2. Continue editing this notebook on host235 and execute it. The results are returned immediately, without waiting for an interpreter process to be created.
3. Likewise, continue editing this notebook on host236 and execute it. Again, the results are returned immediately, without waiting for an interpreter process to be created.
The same notebook reuses the interpreter process that was created first, so you get the execution result immediately on any server.
Looking at the background server processes, you will find that host234, host235, and host236 use the same interpreter process for the same notebook.
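
The reuse behaviour described above can be sketched in a few lines of Python. This is only a toy model under my own assumptions, not the code from our patch: the Cluster class, the "free" score per server, and the note id are all hypothetical, and the real cluster shares this metadata between servers rather than keeping it in one object.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # Hypothetical free-resource score per server (higher means more idle).
    free: dict
    # Shared metadata: note id -> host that runs the note's interpreter process.
    interpreters: dict = field(default_factory=dict)

    def run_note(self, note_id, via_server):
        """Return (interpreter_host, created) for a note run through via_server.

        via_server is the web server the user is connected to. It does not
        influence where the interpreter process lives.
        """
        host = self.interpreters.get(note_id)
        if host is not None:
            # An interpreter process already exists for this note: reuse it.
            return host, False
        # First execution: place the process on the server with the most free resources.
        host = max(self.free, key=self.free.get)
        self.interpreters[note_id] = host
        return host, True


if __name__ == "__main__":
    cluster = Cluster(free={"host234": 0.2, "host235": 0.7, "host236": 0.5})
    for server in ("host234", "host235", "host236"):
        print(server, cluster.run_note("note-1", server))
```

Only the first execution pays the cost of creating the interpreter process. Later executions from any server look the process up by note id and reuse it.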

Originally, I also wanted to record an interpreter process failure, showing the cluster re-creating the interpreter process on an idle server, but I am too tired now.
I will record it later when there is time.

> On Jul 19, 2018, at 7:36 AM, Ruslan Dautkhanov <da...@gmail.com> wrote:
> 
> Thank you luxun,
> 
> I left a couple of comments in that google document. 
> 
> -- 
> Ruslan Dautkhanov
> 
> 
> On Tue, Jul 17, 2018 at 11:30 PM liuxun <neliuxun@163.com <ma...@163.com>> wrote:
> hi，Ruslan Dautkhanov
> 
> Thank you very much for your question. according to your advice, I added 3 schematics to illustrate.
> 1. Distributed Zeppelin Deployment architecture diagram.
> 2. Distributed zeppelin Server fault tolerance diagram.
> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
> 
> 
> The email attachment exceeded the size limit, so I reorganized the document and updated it with Google Docs.
> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit?usp=sharing <https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit?usp=sharing>
> 
> 
>> On Jul 18, 2018, at 1:03 PM, liuxun <neliuxun@163.com <ma...@163.com>> wrote:
>> 
>> hi，Ruslan Dautkhanov
>> 
>> Thank you very much for your question. according to your advice, I added 3 schematics to illustrate.
>> 1. Zeppelin Cluster architecture diagram.
>> 2. Distributed zeppelin Server fault tolerance diagram.
>> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
>> 
>> Later, I will merge the schematic into the system design document.
>> 
>> <Zeppelin system architecture diagram00.png>
>> 
>> 
>> <Distributed zeppelin Server fault tolerance diagram 1.png>
>> 
>> 
>> 
>> <Distributed zeppelin Server fault tolerance diagram 2.png>
>> 
>> 
>> 
>>> On Jul 18, 2018, at 1:16 AM, Ruslan Dautkhanov <dautkhanov@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> Nice.
>>> 
>>> Thanks for sharing.
>>> 
>>> Can you explain how are users routed into a particular zeppelin server
>>> instance? I've seen nginx on top of them, but I don't think the document
>>> covers details? If one zeppelin server goes down or unhealthy, is nginx
>>> supposed to detect (if so, how?) that and reroute users to a survived
>>> instance?
>>> 
>>> Thanks,
>>> Ruslan Dautkhanov
>>> 
>>> 
>>> On Tue, Jul 17, 2018 at 2:46 AM liuxun <neliuxun@163.com <ma...@163.com>> wrote:
>>> 
>>>> hi:
>>>> 
>>>> Our company installed and deployed a lot of zeppelin for data analysis.
>>>> The single server version of zeppelin could not meet our application
>>>> scenarios, so we transformed zeppelin into a clustered service that
>>>> supports distributed deployment, Have a unified entrance, high
>>>> availability, and High server resource usage.  the email attachment is the
>>>> entire design document, I am very happy to feedback our modified code back
>>>> to the community.
>>>> 
>>>> 
>>>> this is the JIRA I submitted in the community,
>>>> 
>>>> https://issues.apache.org/jira/browse/ZEPPELIN-3471 <https://issues.apache.org/jira/browse/ZEPPELIN-3471>
>>>> 
>>>> 
>>>> Since the design document size exceeds the mail attachment size limit, the
>>>> document link address has to be sent.
>>>> 
>>>> https://issues.apache.org/jira/secure/attachment/12931896/Zeppelin%20distributed%20architecture%20design.pdf <https://issues.apache.org/jira/secure/attachment/12931896/Zeppelin%20distributed%20architecture%20design.pdf>
>>>> 
>>>> https://issues.apache.org/jira/secure/attachment/12931895/zepplin%20Cluster%20Sequence%20Diagram.png <https://issues.apache.org/jira/secure/attachment/12931895/zepplin%20Cluster%20Sequence%20Diagram.png>
>>>> 
>>>> 
>>>> liuxun
>>>> 
>> 
>

Re: Zeppelin distributed architecture design

Posted by Ruslan Dautkhanov <da...@gmail.com>.

Thank you liuxun,

I left a couple of comments in that google document.

-- 
Ruslan Dautkhanov


On Tue, Jul 17, 2018 at 11:30 PM liuxun <ne...@163.com> wrote:

> hi，Ruslan Dautkhanov
>
> Thank you very much for your question. according to your advice, I added 3
> schematics to illustrate.
> 1. Distributed Zeppelin Deployment architecture diagram.
> 2. Distributed zeppelin Server fault tolerance diagram.
> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
>
>
> The email attachment exceeded the size limit, so I reorganized the
> document and updated it with Google Docs.
>
> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit?usp=sharing
>
>
> On Jul 18, 2018, at 1:03 PM, liuxun <ne...@163.com> wrote:
>
> hi，Ruslan Dautkhanov
>
> Thank you very much for your question. according to your advice, I added 3
> schematics to illustrate.
> 1. Zeppelin Cluster architecture diagram.
> 2. Distributed zeppelin Server fault tolerance diagram.
> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
>
> Later, I will merge the schematic into the system design document.
>
> <Zeppelin system architecture diagram00.png>
>
>
> <Distributed zeppelin Server fault tolerance diagram 1.png>
>
>
>
> <Distributed zeppelin Server fault tolerance diagram 2.png>
>
>
>
> On Jul 18, 2018, at 1:16 AM, Ruslan Dautkhanov <da...@gmail.com> wrote:
>
> Nice.
>
> Thanks for sharing.
>
> Can you explain how are users routed into a particular zeppelin server
> instance? I've seen nginx on top of them, but I don't think the document
> covers details? If one zeppelin server goes down or unhealthy, is nginx
> supposed to detect (if so, how?) that and reroute users to a survived
> instance?
>
> Thanks,
> Ruslan Dautkhanov
>
>
> On Tue, Jul 17, 2018 at 2:46 AM liuxun <ne...@163.com> wrote:
>
> hi:
>
> Our company installed and deployed a lot of zeppelin for data analysis.
> The single server version of zeppelin could not meet our application
> scenarios, so we transformed zeppelin into a clustered service that
> supports distributed deployment, Have a unified entrance, high
> availability, and High server resource usage.  the email attachment is the
> entire design document, I am very happy to feedback our modified code back
> to the community.
>
>
> this is the JIRA I submitted in the community,
>
> https://issues.apache.org/jira/browse/ZEPPELIN-3471
>
>
> Since the design document size exceeds the mail attachment size limit, the
> document link address has to be sent.
>
>
> https://issues.apache.org/jira/secure/attachment/12931896/Zeppelin%20distributed%20architecture%20design.pdf
>
>
> https://issues.apache.org/jira/secure/attachment/12931895/zepplin%20Cluster%20Sequence%20Diagram.png
>
>
> liuxun
>
>
>
>

Re: Zeppelin distributed architecture design

Posted by liuxun <ne...@163.com>.

Hi Ruslan Dautkhanov,

Thank you very much for your question. Following your advice, I added 3 schematics to illustrate:
1. Distributed Zeppelin Deployment architecture diagram.
2. Distributed zeppelin Server fault tolerance diagram.
3. Distributed zeppelin Server & intp process fault tolerance diagram.


The email attachments exceeded the size limit, so I reorganized the document and moved it to Google Docs.
https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit?usp=sharing <https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit?usp=sharing>


> On Jul 18, 2018, at 1:03 PM, liuxun <ne...@163.com> wrote:
> 
> hi，Ruslan Dautkhanov
> 
> Thank you very much for your question. according to your advice, I added 3 schematics to illustrate.
> 1. Zeppelin Cluster architecture diagram.
> 2. Distributed zeppelin Server fault tolerance diagram.
> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
> 
> Later, I will merge the schematic into the system design document.
> 
> <Zeppelin system architecture diagram00.png>
> 
> 
> <Distributed zeppelin Server fault tolerance diagram 1.png>
> 
> 
> 
> <Distributed zeppelin Server fault tolerance diagram 2.png>
> 
> 
> 
>> On Jul 18, 2018, at 1:16 AM, Ruslan Dautkhanov <dautkhanov@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Nice.
>> 
>> Thanks for sharing.
>> 
>> Can you explain how are users routed into a particular zeppelin server
>> instance? I've seen nginx on top of them, but I don't think the document
>> covers details? If one zeppelin server goes down or unhealthy, is nginx
>> supposed to detect (if so, how?) that and reroute users to a survived
>> instance?
>> 
>> Thanks,
>> Ruslan Dautkhanov
>> 
>> 
>> On Tue, Jul 17, 2018 at 2:46 AM liuxun <neliuxun@163.com <ma...@163.com>> wrote:
>> 
>>> hi:
>>> 
>>> Our company installed and deployed a lot of zeppelin for data analysis.
>>> The single server version of zeppelin could not meet our application
>>> scenarios, so we transformed zeppelin into a clustered service that
>>> supports distributed deployment, Have a unified entrance, high
>>> availability, and High server resource usage.  the email attachment is the
>>> entire design document, I am very happy to feedback our modified code back
>>> to the community.
>>> 
>>> 
>>> this is the JIRA I submitted in the community,
>>> 
>>> https://issues.apache.org/jira/browse/ZEPPELIN-3471 <https://issues.apache.org/jira/browse/ZEPPELIN-3471>
>>> 
>>> 
>>> Since the design document size exceeds the mail attachment size limit, the
>>> document link address has to be sent.
>>> 
>>> https://issues.apache.org/jira/secure/attachment/12931896/Zeppelin%20distributed%20architecture%20design.pdf
>>> 
>>> https://issues.apache.org/jira/secure/attachment/12931895/zepplin%20Cluster%20Sequence%20Diagram.png
>>> 
>>> 
>>> liuxun
>>> 
>

Re: Zeppelin distributed architecture design

Posted by liuxun <ne...@163.com>.

Hi Ruslan Dautkhanov,

Thank you very much for your question. Following your advice, I added 3 schematics to illustrate:
1. Zeppelin Cluster architecture diagram.
2. Distributed zeppelin Server fault tolerance diagram.
3. Distributed zeppelin Server & intp process fault tolerance diagram.

Later, I will merge the schematics into the system design document.

> On Jul 18, 2018, at 1:16 AM, Ruslan Dautkhanov <da...@gmail.com> wrote:
> 
> Nice.
> 
> Thanks for sharing.
> 
> Can you explain how are users routed into a particular zeppelin server
> instance? I've seen nginx on top of them, but I don't think the document
> covers details? If one zeppelin server goes down or unhealthy, is nginx
> supposed to detect (if so, how?) that and reroute users to a survived
> instance?
> 
> Thanks,
> Ruslan Dautkhanov
> 
> 
> On Tue, Jul 17, 2018 at 2:46 AM liuxun <ne...@163.com> wrote:
> 
>> hi:
>> 
>> Our company installed and deployed a lot of zeppelin for data analysis.
>> The single server version of zeppelin could not meet our application
>> scenarios, so we transformed zeppelin into a clustered service that
>> supports distributed deployment, Have a unified entrance, high
>> availability, and High server resource usage.  the email attachment is the
>> entire design document, I am very happy to feedback our modified code back
>> to the community.
>> 
>> 
>> this is the JIRA I submitted in the community,
>> 
>> https://issues.apache.org/jira/browse/ZEPPELIN-3471
>> 
>> 
>> Since the design document size exceeds the mail attachment size limit, the
>> document link address has to be sent.
>> 
>> https://issues.apache.org/jira/secure/attachment/12931896/Zeppelin%20distributed%20architecture%20design.pdf
>> 
>> https://issues.apache.org/jira/secure/attachment/12931895/zepplin%20Cluster%20Sequence%20Diagram.png
>> 
>> 
>> liuxun
>>

Re: Zeppelin distributed architecture design

Posted by Ruslan Dautkhanov <da...@gmail.com>.

Nice.

Thanks for sharing.

Can you explain how users are routed to a particular Zeppelin server
instance? I've seen nginx on top of them, but I don't think the document
covers the details. If one Zeppelin server goes down or becomes unhealthy,
is nginx supposed to detect that (if so, how?) and reroute users to a
surviving instance?

Thanks,
Ruslan Dautkhanov


On Tue, Jul 17, 2018 at 2:46 AM liuxun <ne...@163.com> wrote:

> hi:
>
> Our company installed and deployed a lot of zeppelin for data analysis.
> The single server version of zeppelin could not meet our application
> scenarios, so we transformed zeppelin into a clustered service that
> supports distributed deployment, Have a unified entrance, high
> availability, and High server resource usage.  the email attachment is the
> entire design document, I am very happy to feedback our modified code back
> to the community.
>
>
> this is the JIRA I submitted in the community,
>
> https://issues.apache.org/jira/browse/ZEPPELIN-3471
>
>
> Since the design document size exceeds the mail attachment size limit, the
> document link address has to be sent.
>
> https://issues.apache.org/jira/secure/attachment/12931896/Zeppelin%20distributed%20architecture%20design.pdf
>
> https://issues.apache.org/jira/secure/attachment/12931895/zepplin%20Cluster%20Sequence%20Diagram.png
>
>
> liuxun
>
