Posted to users@zeppelin.apache.org by Joshua Conlin <co...@gmail.com> on 2020/07/21 14:58:43 UTC

Monitoring a Notebook in Spark UI

Hello,

I'm looking for documentation to better understand pyspark/scala notebook
execution in Spark.  I typically see applications with very long runtimes;
is there always a Spark application running for a notebook or Zeppelin
session?  The applications that are not actively running anything in
Zeppelin typically show very low resource utilization.  Are these Spark
applications tied to the Zeppelin user's session?

Also, how can I find out more about Hive, PySpark, and Scala interpreter
concurrency?  How many users/notebooks/paragraphs can execute these
interpreters concurrently, and how is this tunable?

Any insight you can provide would be appreciated.

Thanks,

Josh

Re: Monitoring a Notebook in Spark UI

Posted by Jeff Zhang <zj...@gmail.com>.
Hi Stéphane,

I mean running Spark SQL jobs concurrently via %spark.sql, just by
setting zeppelin.spark.concurrentSQL to true.

See the details here
http://zeppelin.apache.org/docs/0.9.0-preview1/interpreter/spark.html#sparksql
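
For example, a minimal sketch assuming the default %spark.sql
interpreter (the table names and the max value below are placeholders;
check the linked doc for your version):

In the Spark interpreter settings:

    zeppelin.spark.concurrentSQL      true
    zeppelin.spark.concurrentSQL.max  10    # upper bound on parallel SQL jobs

With that set, these two paragraphs can run at the same time inside one
Spark app:

    %spark.sql
    -- long-running query; table_a is a placeholder
    select count(*) from table_a

    %spark.sql
    -- submitted while the paragraph above is still running
    select count(*) from table_b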


-- 
Best Regards

Jeff Zhang

RE: Monitoring a Notebook in Spark UI

Posted by st...@orange.com.
Hi Jeff,

> You can also run multiple spark sql jobs concurrently in one spark app

Can you please elaborate on this? What I see (with Zeppelin 0.8) is that
with a shared interpreter, each job runs one after another. When going to
one interpreter per user, many users can run a job at the same time, but
each user can still run only one job at a time. How is it possible to run
multiple SQL jobs concurrently in one Spark app?

Thanks,

Stéphane



Re: Monitoring a Notebook in Spark UI

Posted by Jeff Zhang <zj...@gmail.com>.
Regarding how many Spark apps there are: it depends on the interpreter
binding mode; you can refer to this document.
http://zeppelin.apache.org/docs/0.9.0-preview1/usage/interpreter/interpreter_binding_mode.html
Internally, each Spark app runs a Scala shell to execute Scala code and a
Python shell to execute PySpark code.
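
For example, a minimal sketch assuming the default shared binding, where
%spark and %spark.pyspark paragraphs talk to the same SparkContext:

    %spark
    // Scala shell inside the Spark app; spark and sc are predefined
    val df = spark.range(5)
    println(df.count() + " " + sc.applicationId)

    %spark.pyspark
    # Python shell inside the same Spark app; prints the same application id
    print(sc.applicationId)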

Regarding interpreter concurrency, it depends on how you define it: you can
run one Spark app per user or per note, which is controlled by the
interpreter binding mode referred to above. You can also run multiple Spark
SQL jobs concurrently in one Spark app.
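
Roughly, and only as a sketch (see the linked document for the
authoritative description of each mode):

    shared    -> one interpreter process, one Spark app for all notes/users
    scoped    -> separate interpreter sessions (separate variables) per user
                 or per note, but for Spark still one shared SparkContext
    isolated  -> a separate interpreter process, i.e. a separate Spark app,
                 per user or per note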


-- 
Best Regards

Jeff Zhang