Posted to users@zeppelin.apache.org by mu...@googlemail.com on 2017/04/05 20:35:50 UTC

Other paragraphs do not wait for %sh paragraphs to finish.

I often have notebooks whose 1st paragraph is a %sh paragraph that scps a
file from another server, followed by a number of spark or sparksql
paragraphs.

If I click run-all paragraphs at the top of the notebook, the 1st %sh
paragraph kicks off as expected, but the 2nd (%spark) paragraph also starts
at the same time. The others go into pending state and then start once the
spark one has completed.

Is this a bug? Or am I doing something wrong?

Thanks

Re: Other paragraphs do not wait for %sh paragraphs to finish.

Posted by Ruslan Dautkhanov <da...@gmail.com>.
Filed https://issues.apache.org/jira/browse/ZEPPELIN-2368

We had users asking about the same thing. It forced them to run paragraphs
manually, one by one.




-- 
Ruslan Dautkhanov

In reply to moon soo Lee <mo...@apache.org>, Wed, Apr 5, 2017 at 4:57 PM.

Re: Other paragraphs do not wait for %sh paragraphs to finish.

Posted by Jeff Zhang <zj...@gmail.com>.
"Depends on previous paragraph" could be the default behavior if no deps are
specified. Specifying dependencies explicitly could improve performance;
e.g., in the spark tutorial note, the 3 sql paragraphs could run at the same
time, independently of each other.
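
To illustrate the idea, here is a minimal Python sketch (not Zeppelin code).
It assumes a hypothetical list of (paragraph id, deps) pairs in notebook
order, where deps is None when nothing was specified. The names resolve_deps
and earliest_wave, and the paragraph ids, are made up for this example.

```python
def resolve_deps(paragraphs):
    """Fill in the default rule: no deps given -> depend on the previous paragraph.

    `paragraphs` is a list of (paragraph_id, deps) in notebook order;
    deps is None when the author specified nothing (hypothetical format).
    """
    resolved = {}
    previous = None
    for pid, deps in paragraphs:
        if deps is None:
            resolved[pid] = set() if previous is None else {previous}
        else:
            resolved[pid] = set(deps)
        previous = pid
    return resolved


def earliest_wave(resolved):
    """Earliest wave in which each paragraph could start.

    Assumes every dependency points to an earlier paragraph.
    """
    waves = {}
    for pid, deps in resolved.items():
        waves[pid] = 1 + max((waves[d] for d in deps), default=-1)
    return waves


implicit = [("load", None), ("sql1", None), ("sql2", None), ("sql3", None)]
explicit = [("load", None), ("sql1", ["load"]), ("sql2", ["load"]), ("sql3", ["load"])]

print(earliest_wave(resolve_deps(implicit)))  # each paragraph waits for the one before
print(earliest_wave(resolve_deps(explicit)))  # the three sql paragraphs share a wave
```

With the default rule alone the four paragraphs land in four consecutive
waves; with explicit deps the three sql paragraphs all land in wave 1.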



In reply to Ruslan Dautkhanov <da...@gmail.com>, Fri, Apr 7, 2017 at 1:09 AM.

Re: Other paragraphs do not wait for %sh paragraphs to finish.

Posted by Ruslan Dautkhanov <da...@gmail.com>.
Apart from introducing a full-blown dependency graph (DAG), a simpler
solution might be to introduce a paragraph-level property "depends on
previous paragraph" (boolean), so that in a run-all-paragraphs run this
particular paragraph wouldn't be scheduled until the previous one is
complete (without errors).

It would be a compromise between a completely sequential run and having a
way to define a DAG.
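
A rough Python sketch of how such a flag could be interpreted (an
illustration only, not Zeppelin code). It assumes a hypothetical list of
(paragraph id, depends_on_previous) pairs in notebook order; build_chains,
run_chain and the paragraph ids are names invented for this example.

```python
def build_chains(paragraphs):
    """Group paragraphs into chains that must run in order.

    A paragraph with depends_on_previous=True joins the chain of the
    paragraph before it; any other paragraph starts a new chain that can
    be submitted right away.
    """
    chains = []
    for pid, depends_on_previous in paragraphs:
        if depends_on_previous and chains:
            chains[-1].append(pid)
        else:
            chains.append([pid])
    return chains


def run_chain(chain, run):
    """Run one chain in order and stop at the first paragraph that raises."""
    finished = []
    for pid in chain:
        try:
            run(pid)
        except Exception:
            break  # later paragraphs in this chain are never scheduled
        finished.append(pid)
    return finished


note = [("fetch", False), ("load", True), ("report", True), ("unrelated", False)]
print(build_chains(note))
```

Here "load" and "report" wait for "fetch", while "unrelated" forms its own
chain and does not have to wait for anything.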



-- 
Ruslan Dautkhanov

In reply to Jeff Zhang <zj...@gmail.com>, Thu, Apr 6, 2017 at 1:32 AM.

Re: Other paragraphs do not wait for %sh paragraphs to finish.

Posted by Jeff Zhang <zj...@gmail.com>.
That's correct. It needs a way to define dependencies between paragraphs,
e.g. %spark(deps=p1), so that we can build a DAG for the whole pipeline.
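
As a sketch of what building a DAG would buy us, the Python below (3.9+,
standard library graphlib) turns a hypothetical deps mapping into batches
of paragraphs that could run at the same time. The deps= syntax above is
only a proposal, and the mapping and the name execution_batches are
assumptions made for this example, not Zeppelin APIs.

```python
from graphlib import TopologicalSorter


def execution_batches(deps):
    """Return batches of paragraph ids; each batch depends only on earlier ones.

    `deps` maps a paragraph id to the set of ids it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose deps are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


# p2 and p3 both depend on p1; p4 needs both of them.
deps = {"p1": set(), "p2": {"p1"}, "p3": {"p1"}, "p4": {"p2", "p3"}}
print(execution_batches(deps))
```

A real scheduler would call done() as each paragraph finishes rather than
batch by batch, so that a slow paragraph doesn't hold back unrelated ones.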





In reply to Rick Moritz <ra...@gmail.com>, Thu, Apr 6, 2017 at 3:28 PM.

Re: Other paragraphs do not wait for %sh paragraphs to finish.

Posted by Rick Moritz <ra...@gmail.com>.
This actually calls for a dependency definition for the paragraphs within a
notebook, so the scheduler can decide which tasks to run simultaneously.
I suggest a simple counter of dependency levels, which by default increases
with every new paragraph and can be decremented to allow paragraphs to run
simultaneously. Run-all then submits each level to that level's target
interpreters, awaits termination of all results, and then starts the next
level's paragraphs.
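
A small Python sketch of the level idea (an illustration, not Zeppelin
code). It assumes a hypothetical mapping from paragraph id to dependency
level and a caller-supplied run function; a thread pool stands in for the
interpreters, and run_all_by_level is a name made up for this example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_all_by_level(paragraph_levels, run):
    """Run paragraphs level by level.

    `paragraph_levels` maps a paragraph id to its dependency level.
    `run` executes one paragraph and returns its result.
    Returns the batches in execution order and the collected results.
    """
    levels = defaultdict(list)
    for pid, level in paragraph_levels.items():
        levels[level].append(pid)

    order, results = [], {}
    with ThreadPoolExecutor() as pool:
        for level in sorted(levels):
            batch = levels[level]
            # Consuming pool.map blocks until the whole level has finished,
            # so the next level never starts early.
            for pid, result in zip(batch, pool.map(run, batch)):
                results[pid] = result
            order.append(batch)
    return order, results


# p2 and p3 share level 1, so they may run simultaneously.
order, results = run_all_by_level(
    {"p1": 0, "p2": 1, "p3": 1, "p4": 2},
    lambda pid: pid + " done",
)
print(order)
```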


In reply to moon soo Lee <mo...@apache.org>, Thu, Apr 6, 2017 at 12:57 AM.

Re: Other paragraphs do not wait for %sh paragraphs to finish.

Posted by moon soo Lee <mo...@apache.org>.
Hi,

That's expected behavior at the moment. The reason is that each interpreter
has its own scheduler (either FIFO or Parallel), and run-all just submits
every paragraph to its target interpreter's scheduler.
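
To make that concrete, here is a toy Python model of the behavior (not the
actual Zeppelin scheduler code). It assumes every interpreter uses a FIFO
scheduler and reports each paragraph's state right after run-all has
submitted everything; the run_all function and paragraph ids are invented
for this example.

```python
from collections import defaultdict, deque


def run_all(paragraphs):
    """Submit every paragraph to its own interpreter's FIFO queue.

    `paragraphs` is a list of (paragraph_id, interpreter) in notebook order.
    Returns each paragraph's state right after submission.
    """
    queues = defaultdict(deque)  # one independent queue per interpreter
    for pid, interpreter in paragraphs:
        queues[interpreter].append(pid)

    status = {}
    for queue in queues.values():
        for position, pid in enumerate(queue):
            # Each FIFO scheduler runs the head of its own queue and knows
            # nothing about the queues of other interpreters.
            status[pid] = "RUNNING" if position == 0 else "PENDING"
    return status


note = [("p1", "sh"), ("p2", "spark"), ("p3", "spark"), ("p4", "spark")]
print(run_all(note))
# {'p1': 'RUNNING', 'p2': 'RUNNING', 'p3': 'PENDING', 'p4': 'PENDING'}
```

The %sh paragraph and the first %spark paragraph are each at the head of a
different queue, so both start at once, while the remaining %spark
paragraphs wait behind the first one.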

I think we can add a feature such as run-all-sequentially.
Would you mind filing a JIRA issue?

Thanks,
moon

In reply to mu...@googlemail.com, Thu, Apr 6, 2017 at 5:35 AM.