Posted to users@zeppelin.apache.org by Chenyang Zhang <Ch...@c3.ai> on 2022/06/27 17:08:28 UTC

Inquiry about how spark session is shared in apache zeppelin

Hi,

This is Chenyang. I am working on a project using PySpark, and I am blocked because I want to share data between different Spark applications. The situation is that we have a running Java server which handles incoming requests with a thread pool, and each thread has a corresponding Python process. We want to use pandas on Spark, but have it so that any of the Python processes can access the same data in Spark. For example, in one Python process we create a SparkSession, read some data, and modify the data using the pandas-on-Spark API, and then we want to access that data from a different Python process. Someone from the Spark community pointed me to Apache Zeppelin because it implements logic to share one SparkSession. How did you achieve that? Is there any documentation or reference I can refer to? Thanks so much for your help.
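[Editor's note: the isolation described above can be demonstrated with a minimal stdlib sketch. No Spark is involved; the module-level dictionary is a hypothetical stand-in for an in-process SparkSession. State created in one Python process is invisible to a sibling process, which is why each per-thread worker ends up with its own isolated session unless sharing is engineered explicitly.]

```python
# Minimal sketch of the problem: in-process state does not cross process
# boundaries. SESSION stands in for a SparkSession plus its cached data.
from multiprocessing import Process, Queue

SESSION = {}  # hypothetical stand-in for an in-process SparkSession


def worker_a():
    # This worker "reads some data" into its own copy of SESSION.
    SESSION["table"] = [1, 2, 3]


def worker_b(out):
    # A sibling process: it never sees worker_a's modification.
    out.put(SESSION.get("table"))


if __name__ == "__main__":
    a = Process(target=worker_a)
    a.start()
    a.join()
    q = Queue()
    b = Process(target=worker_b, args=(q,))
    b.start()
    b.join()
    print(q.get())  # None: worker_a's state never reached worker_b
```

The same boundary applies to SparkSessions: calling `SparkSession.builder.getOrCreate()` in two separate Python processes yields two independent drivers, so in-memory DataFrames in one are not visible to the other.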

Best regards,
Chenyang



Chenyang Zhang
Software Engineering Intern, Platform
Redwood City, California

© 2022 C3.ai. Confidential Information.

Re: Inquiry about how spark session is shared in apache zeppelin

Posted by Chenyang Zhang <Ch...@c3.ai>.
Thanks for the response. I am wondering how you achieved that, because I want to implement such a mechanism in my project. Is there any source code I could refer to?

From: Jeff Zhang <zj...@gmail.com>
Date: Monday, June 27, 2022 at 7:30 PM
To: users <us...@zeppelin.apache.org>
Subject: Re: Inquiry about how spark session is shared in apache zeppelin
You can use per-note scoped mode, so that there are multiple Python processes that share the same Spark session.
Check this doc for more details: https://zeppelin.apache.org/docs/0.10.1/usage/interpreter/interpreter_binding_mode.html


On Tue, Jun 28, 2022 at 1:08 AM Chenyang Zhang <Ch...@c3.ai> wrote:
Hi,

This is Chenyang. I am working on a project using PySpark, and I am blocked because I want to share data between different Spark applications. The situation is that we have a running Java server which handles incoming requests with a thread pool, and each thread has a corresponding Python process. We want to use pandas on Spark, but have it so that any of the Python processes can access the same data in Spark. For example, in one Python process we create a SparkSession, read some data, and modify the data using the pandas-on-Spark API, and then we want to access that data from a different Python process. Someone from the Spark community pointed me to Apache Zeppelin because it implements logic to share one SparkSession. How did you achieve that? Is there any documentation or reference I can refer to? Thanks so much for your help.

Best regards,
Chenyang




--
Best Regards

Jeff Zhang

Re: Inquiry about how spark session is shared in apache zeppelin

Posted by Jeff Zhang <zj...@gmail.com>.
You can use per-note scoped mode, so that there are multiple Python
processes that share the same Spark session.
Check this doc for more details:
https://zeppelin.apache.org/docs/0.10.1/usage/interpreter/interpreter_binding_mode.html
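[Editor's note: a rough stdlib analogue of that topology (hypothetical names, not Zeppelin source code): one long-lived process owns the single "session", and sibling processes connect to it over a socket, much as multiple per-note Python processes in scoped mode route through one shared Spark interpreter process and hence one SparkSession.]

```python
# Hypothetical sketch of the scoped-mode idea using only the stdlib: one
# owner process holds the shared state (standing in for the one SparkSession
# inside the Spark interpreter JVM); other processes connect to it and all
# observe the same data.
from multiprocessing import Process, Queue
from multiprocessing.managers import BaseManager
import time


class SharedSession:
    """Stand-in for the single shared SparkSession."""

    def __init__(self):
        self._tables = {}

    def put(self, name, rows):
        self._tables[name] = rows

    def get(self, name):
        return self._tables.get(name)


class SessionManager(BaseManager):
    pass


ADDRESS = ("127.0.0.1", 50937)  # arbitrary local port for the sketch
AUTHKEY = b"demo"


def serve():
    session = SharedSession()  # the one shared instance lives here
    SessionManager.register("session", callable=lambda: session)
    SessionManager(address=ADDRESS, authkey=AUTHKEY).get_server().serve_forever()


def writer():
    SessionManager.register("session")
    mgr = SessionManager(address=ADDRESS, authkey=AUTHKEY)
    mgr.connect()
    mgr.session().put("users", [("alice", 1), ("bob", 2)])


def reader(out):
    SessionManager.register("session")
    mgr = SessionManager(address=ADDRESS, authkey=AUTHKEY)
    mgr.connect()
    out.put(mgr.session().get("users"))


if __name__ == "__main__":
    server = Process(target=serve, daemon=True)
    server.start()
    time.sleep(1.0)  # crude wait for the server to bind
    w = Process(target=writer)
    w.start()
    w.join()
    q = Queue()
    r = Process(target=reader, args=(q,))
    r.start()
    r.join()
    print(q.get())  # the rows written by the other process
```

In Zeppelin itself the sharing happens at the interpreter-process level rather than over a hand-rolled manager: scoped mode gives each note its own interpreter session while they can run inside one interpreter process, so the PySpark processes talk to the same Spark driver. The sketch only mirrors that topology.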


On Tue, Jun 28, 2022 at 1:08 AM Chenyang Zhang <Ch...@c3.ai> wrote:

> Hi,
>
>
>
> This is Chenyang. I am working on a project using PySpark, and I am blocked
> because I want to share data between different Spark applications. The
> situation is that we have a running Java server which handles incoming
> requests with a thread pool, and each thread has a corresponding Python
> process. We want to use pandas on Spark, but have it so that any of the
> Python processes can access the same data in Spark. For example, in one
> Python process we create a SparkSession, read some data, and modify the
> data using the pandas-on-Spark API, and then we want to access that data
> from a different Python process. Someone from the Spark community pointed
> me to Apache Zeppelin because it implements logic to share one
> SparkSession. How did you achieve that? Is there any documentation or
> reference I can refer to? Thanks so much for your help.
>
>
>
> Best regards,
>
> Chenyang
>
>
>
>
>
>

-- 
Best Regards

Jeff Zhang