Posted to dev@zeppelin.apache.org by Pranav Kumar Agarwal <pr...@gmail.com> on 2015/07/30 13:58:37 UTC
Re: why zeppelin SparkInterpreter use FIFOScheduler
Hi Moon,
How about tracking a dedicated SparkContext per notebook in Spark's
remote interpreter? This would allow multiple users to run their Spark
paragraphs in parallel. Also, within a notebook only one paragraph is
executed at a time.
Regards,
-Pranav.
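Pranav's model (paragraphs serialized within a notebook, notebooks parallel to each other) can be sketched roughly as follows; the class and method names here are illustrative, not Zeppelin's actual code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one single-thread executor per notebook ID, so
// paragraphs in the same notebook run FIFO while different notebooks
// run in parallel.
public class NotebookSchedulers {
    private final Map<String, ExecutorService> schedulers = new ConcurrentHashMap<>();

    public Future<?> submitParagraph(String noteId, Runnable paragraph) {
        return schedulers
                .computeIfAbsent(noteId, id -> Executors.newSingleThreadExecutor())
                .submit(paragraph);
    }

    public void shutdown() {
        schedulers.values().forEach(ExecutorService::shutdown);
    }

    public static void main(String[] args) throws Exception {
        NotebookSchedulers sched = new NotebookSchedulers();
        StringBuffer log = new StringBuffer();
        // Two paragraphs in note "A" keep their submission order;
        // note "B" is free to run concurrently with them.
        sched.submitParagraph("A", () -> log.append("a1 "));
        Future<?> a2 = sched.submitParagraph("A", () -> log.append("a2 "));
        Future<?> b1 = sched.submitParagraph("B", () -> log.append("b1 "));
        a2.get();
        b1.get();
        sched.shutdown();
        System.out.println(log.indexOf("a1 ") < log.indexOf("a2 "));  // always true
    }
}
```

Within note "A" the ordering guarantee comes from the single-thread executor; "b1" may land anywhere in the log relative to the "A" paragraphs.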
On 15/07/15 7:15 pm, moon soo Lee wrote:
> Hi,
>
> Thanks for asking question.
>
> The reason is simply that it is running code statements. The
> statements can have order and dependencies. Imagine I have two paragraphs:
>
> %spark
> val a = 1
>
> %spark
> print(a)
>
> If they're not run one by one, they may run in random
> order and the output will differ from run to run: either '1' or
> 'val a cannot be found'.
>
> This is the reason why. But if there is a nice idea to handle this
> problem, I agree that using a parallel scheduler would help a lot.
>
> Thanks,
> moon
> On July 14, 2015 (Tue) at 7:59 PM, linxi zeng
> <linxizeng0615@gmail.com> wrote:
>
> Does anyone have the same question as me? Or is this not a question?
>
> 2015-07-14 11:47 GMT+08:00 linxi zeng <linxizeng0615@gmail.com>:
>
> hi, Moon:
> I notice that the getScheduler function in
> SparkInterpreter.java returns a FIFOScheduler, which makes the
> Spark interpreter run Spark jobs one by one. It's not a good
> experience when a couple of users work on Zeppelin at
> the same time, because they have to wait for each other.
> At the same time, SparkSqlInterpreter can choose which
> scheduler to use via "zeppelin.spark.concurrentSQL".
> My question is: what considerations was this
> decision based on?
>
>
Re: why zeppelin SparkInterpreter use FIFOScheduler
Posted by moon soo Lee <mo...@apache.org>.
Hi Pranav,
I think we need to consider the Scala compiler and SparkContext separately.
If the Scala compiler is dedicated to a notebook, running paragraphs in
different notebooks in parallel will not be a problem (even if the
SparkContext is not dedicated to a notebook; SparkContext is already
thread-safe and has a fair scheduler inside).
So I think a dedicated Scala compiler per notebook, with a shared
SparkContext (we can still use the fair scheduler), would help.
Thanks,
moon
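The shared-SparkContext approach moon suggests maps onto Spark's standard fair-scheduler setup: set spark.scheduler.mode=FAIR on the SparkConf, define pools in fairscheduler.xml, and tag each notebook's jobs with sc.setLocalProperty("spark.scheduler.pool", <poolName>). A minimal config sketch (the pool name is illustrative):

```xml
<!-- fairscheduler.xml: a pool that one notebook's jobs could be routed into -->
<allocations>
  <pool name="notebook-pool">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```

Jobs submitted without the pool-local property fall into Spark's default pool; per-notebook isolation would come from assigning each notebook its own pool name.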
On Thu, Jul 30, 2015 at 8:53 PM Pranav Kumar Agarwal <pr...@gmail.com>
wrote:
> Hi Moon,
>
> How about tracking a dedicated SparkContext per notebook in Spark's
> remote interpreter? This would allow multiple users to run their Spark
> paragraphs in parallel. Also, within a notebook only one paragraph is
> executed at a time.
>
> Regards,
> -Pranav.
>
>
> On 15/07/15 7:15 pm, moon soo Lee wrote:
> > Hi,
> >
> > Thanks for asking question.
> >
> > The reason is simply that it is running code statements. The
> > statements can have order and dependencies. Imagine I have two paragraphs:
> >
> > %spark
> > val a = 1
> >
> > %spark
> > print(a)
> >
> > If they're not run one by one, they may run in random
> > order and the output will differ from run to run: either '1' or
> > 'val a cannot be found'.
> >
> > This is the reason why. But if there is a nice idea to handle this
> > problem, I agree that using a parallel scheduler would help a lot.
> >
> > Thanks,
> > moon
> > On July 14, 2015 (Tue) at 7:59 PM, linxi zeng
> > <linxizeng0615@gmail.com> wrote:
> >
> > Does anyone have the same question as me? Or is this not a
> > question?
> >
> > 2015-07-14 11:47 GMT+08:00 linxi zeng <linxizeng0615@gmail.com>:
> >
> > hi, Moon:
> > I notice that the getScheduler function in
> > SparkInterpreter.java returns a FIFOScheduler, which makes the
> > Spark interpreter run Spark jobs one by one. It's not a good
> > experience when a couple of users work on Zeppelin at
> > the same time, because they have to wait for each other.
> > At the same time, SparkSqlInterpreter can choose which
> > scheduler to use via "zeppelin.spark.concurrentSQL".
> > My question is: what considerations was this
> > decision based on?
> >
> >
>
>