Posted to dev@zeppelin.apache.org by Pranav Kumar Agarwal <pr...@gmail.com> on 2015/07/30 13:58:37 UTC

Re: why zeppelin SparkInterpreter use FIFOScheduler

Hi Moon,

How about tracking a dedicated SparkContext per notebook in Spark's 
remote interpreter? This would allow multiple users to run their spark 
paragraphs in parallel, while within a notebook only one paragraph is 
executed at a time.
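
A minimal sketch of the idea (Scala 2.12+; NoteSchedulers and
submitParagraph are illustrative names, not Zeppelin's actual
remote-interpreter API): one single-thread executor per notebook keeps
a note's paragraphs sequential while letting different notes run in
parallel.

    import java.util.concurrent.{ConcurrentHashMap, ExecutorService, Executors}

    // Illustrative only: not Zeppelin's actual API.
    object NoteSchedulers {
      private val pools = new ConcurrentHashMap[String, ExecutorService]()

      // One single-thread executor per noteId: paragraphs of the same
      // note queue up FIFO behind each other.
      private def poolFor(noteId: String): ExecutorService =
        pools.computeIfAbsent(noteId, _ => Executors.newSingleThreadExecutor())

      // Paragraphs from different notebooks land on different executors,
      // so they run concurrently.
      def submitParagraph(noteId: String)(body: => Unit): Unit = {
        poolFor(noteId).submit(new Runnable { def run(): Unit = body })
        ()
      }
    }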

Regards,
-Pranav.


On 15/07/15 7:15 pm, moon soo Lee wrote:
> Hi,
>
> Thanks for asking question.
>
> The reason is simply that it is running code statements. The 
> statements can have order and dependencies. Imagine I have two paragraphs:
>
> %spark
> val a = 1
>
> %spark
> print(a)
>
> If they do not run one by one, they may run in 
> random order, and the output will differ from run to run: either '1' or 
> a 'value a not found' error.
>
> This is the reason why. But if there is a nice idea to handle this 
> problem, I agree that using a parallel scheduler would help a lot.
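>
> A tiny sketch of that ordering guarantee, using plain JDK executors
> rather than Zeppelin's own classes: a single-thread executor runs
> tasks strictly in submission order (the FIFO behaviour), while a
> two-thread pool leaves the interleaving undefined.
>
>     import java.util.concurrent.Executors
>
>     val fifo = Executors.newSingleThreadExecutor()
>     // Submission order is preserved: the definition always runs first.
>     fifo.submit(new Runnable { def run(): Unit = println("val a = 1") })
>     fifo.submit(new Runnable { def run(): Unit = println("print(a)") })
>     fifo.shutdown()
>
>     val pool = Executors.newFixedThreadPool(2)
>     // With two threads, the same two tasks may run in either order.
>     pool.submit(new Runnable { def run(): Unit = println("val a = 1") })
>     pool.submit(new Runnable { def run(): Unit = println("print(a)") })
>     pool.shutdown()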
>
> Thanks,
> moon
> On Tue, Jul 14, 2015 at 7:59 PM linxi zeng 
> <linxizeng0615@gmail.com> wrote:
>
>     Anyone who has the same question as me? Or is this not a question?
>
>     2015-07-14 11:47 GMT+08:00 linxi zeng <linxizeng0615@gmail.com>:
>
>         hi, Moon:
>            I notice that the getScheduler function in
>         SparkInterpreter.java returns a FIFOScheduler, which makes the
>         spark interpreter run spark jobs one by one. It's not a good
>         experience when a couple of users work on zeppelin at
>         the same time, because they have to wait for each other.
>         At the same time, SparkSqlInterpreter can choose which
>         scheduler to use via "zeppelin.spark.concurrentSQL".
>         My question is: what considerations was this decision
>         based on?
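>
>         (A hedged sketch of the switch being described; the types
>         below are illustrative stand-ins, not Zeppelin's actual
>         scheduler classes:)
>
>             sealed trait Scheduler
>             case object FIFOScheduler extends Scheduler
>             case object ParallelScheduler extends Scheduler
>
>             // The sql interpreter picks its scheduler from the
>             // zeppelin.spark.concurrentSQL property.
>             def schedulerFor(concurrentSQL: Boolean): Scheduler =
>               if (concurrentSQL) ParallelScheduler else FIFOScheduler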
>
>


Re: why zeppelin SparkInterpreter use FIFOScheduler

Posted by moon soo Lee <mo...@apache.org>.
Hi Pranav,

I think we need to think about the Scala compiler and the SparkContext
separately. If the Scala compiler is dedicated to a notebook, running
paragraphs in different notebooks in parallel will not be a problem, even if
the SparkContext is not dedicated to a notebook. (SparkContext is already
thread safe and has a fair scheduler inside.)

So, I think a dedicated Scala compiler per notebook, with a shared
SparkContext (we can still use the fair scheduler), would help.
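
A minimal sketch of that setup, assuming one shared SparkContext with
per-notebook fair-scheduler pools. The runForNote helper and the
"local[*]" master are illustrative; setLocalProperty and the FAIR
scheduler mode are standard Spark.

    import org.apache.spark.{SparkConf, SparkContext}

    // One SparkContext shared by all notebooks, running Spark's
    // fair scheduler instead of the default FIFO job scheduling.
    val conf = new SparkConf()
      .setAppName("zeppelin-shared")
      .setMaster("local[*]")  // assumption: stand-in for the real master
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    // Tag each notebook's jobs with their own pool, so concurrent
    // notebooks share executors fairly instead of queueing behind
    // one another.
    def runForNote(noteId: String)(job: => Unit): Unit = {
      sc.setLocalProperty("spark.scheduler.pool", noteId)
      try job
      finally sc.setLocalProperty("spark.scheduler.pool", null)
    }

Because SparkContext is thread safe, two notebooks can call runForNote
from different threads and their jobs interleave under the fair
scheduler, while each notebook's dedicated Scala compiler keeps its own
paragraphs sequential.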

Thanks,
moon

On Thu, Jul 30, 2015 at 8:53 PM Pranav Kumar Agarwal <pr...@gmail.com>
wrote:

> Hi Moon,
>
> How about tracking a dedicated SparkContext per notebook in Spark's
> remote interpreter? This would allow multiple users to run their spark
> paragraphs in parallel, while within a notebook only one paragraph is
> executed at a time.
>
> Regards,
> -Pranav.