Posted to users@zeppelin.apache.org by Belousov Maksim Eduardovich <m....@tinkoff.ru> on 2017/09/28 15:28:58 UTC

Implementing run all paragraphs sequentially

Hello, users!
At the moment our analysts often mix interpreters in their notes.
For example, they prepare data using %jdbc and then use it in %pyspark. They also often use scheduling to produce regular reports, so they have to do something like `time.sleep()` to wait for the data from %jdbc. This doesn't guarantee the result and doesn't look good.

You can find early attempts to implement sequential running of all paragraphs in [1].
We are really interested in getting issue [2] implemented and are ready to work on it.
It seems like a good idea to discuss the requirements first.
My idea is to introduce a note setting that defines which type of run to use (parallel or sequential) and to leave "Run all" as the only button that runs all the cells in the note. This makes sequential or parallel running a `note option` rather than a `run option`.
The option would be controlled by a nearby button, as shown:
[https://lh6.googleusercontent.com/jwnb7xfb0fPbFg1CWPoMSqovu7ecSMv4pJfuP4zdKVZbyAUDwzAT2GJ5EiemXVYrqMW73yklemTpjXNyLRJABpTCoHi6us2ZI_AxWKHwZpBEA7MjpMP0-7Nk8saaJQfIF4yBMPfS]


For new notes the default state would be "Run sequential all"; for old notes, "Run parallel for interpreters".
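
A minimal sketch of how such a note-level setting with different defaults for new and old notes could behave. The `runMode` key and the helper names are hypothetical and are not part of Zeppelin's actual note format; the only assumption taken from the proposal is that notes created before the setting existed keep running in parallel.

```python
from enum import Enum


class RunMode(Enum):
    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"


def new_note_config():
    """Config for a freshly created note: sequential by default (hypothetical key)."""
    return {"runMode": RunMode.SEQUENTIAL.value}


def effective_run_mode(note_config):
    """Old notes have no 'runMode' key, so they fall back to parallel."""
    return RunMode(note_config.get("runMode", RunMode.PARALLEL.value))


old_note = {}                  # saved before the setting existed
new_note = new_note_config()   # created after the setting was introduced
print(effective_run_mode(old_note).value)
print(effective_run_mode(new_note).value)
```

Storing the mode on the note, rather than passing it with each run, is what makes it a note option: a scheduled run and a manual "Run all" would both read the same value.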
We are glad to hear any thoughts.
Thank you.

[1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
[2] https://issues.apache.org/jira/browse/ZEPPELIN-2368



Maksim Belousov

RE: Implementing run all paragraphs sequentially

Posted by Polyakov Valeriy <v....@tinkoff.ru>.

I suppose there is a fairly simple solution to the problem. We can use a flag on a paragraph that means “this paragraph should be run in parallel with the previous one”. Such logic would allow mixed sequential-parallel running. It does not implement full DAG capabilities, but it is easy to understand and to use.
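
As a rough illustration of the flag idea, here is a minimal sketch that groups an ordered list of paragraphs into stages. Stages would run one after another, and the paragraphs inside one stage could run concurrently. The `Paragraph` class, the `parallel_with_previous` field and the paragraph names are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    name: str
    parallel_with_previous: bool = False  # hypothetical per-paragraph flag


def group_into_stages(paragraphs):
    """Return a list of stages; each stage is a list of paragraph names."""
    stages = []
    for p in paragraphs:
        if p.parallel_with_previous and stages:
            # Join the stage that the previous paragraph belongs to.
            stages[-1].append(p.name)
        else:
            # Start a new stage that waits for the previous one to finish.
            stages.append([p.name])
    return stages


note = [
    Paragraph("load"),
    Paragraph("query_a"),
    Paragraph("query_b", parallel_with_previous=True),
    Paragraph("report"),
]
print(group_into_stages(note))
```

With no flags set, every paragraph lands in its own stage, which is plain sequential running.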

Valeriy Polyakov

From: Sotnichenko Sergey [mailto:s.sotnichenko@tinkoff.ru]
Sent: Friday, September 29, 2017 2:45 PM
To: users@zeppelin.apache.org
Subject: RE: Implementing run all paragraphs sequentially

To be honest, it would be very complicated to build a DAG with names like ‘20170929-143857_1744629322’. Imagine we have 20 paragraphs with such names.

Sergey Sotnichenko

From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: Friday, September 29, 2017 2:35 PM
To: users@zeppelin.apache.org
Subject: Re: Implementing run all paragraphs sequentially

'p1' and 'p2' are paragraph ids. As for readability, we could allow users to set a paragraph name, but that is another story and could be a later improvement.

Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Fri, Sep 29, 2017 at 7:30 PM:
Interesting idea. But by ‘p1’, ‘p2’, etc. did you literally mean that, or were you using that as shorthand for the id of the paragraph?
If the former, then what happens if someone inserts, deletes or reorders paragraphs? But if the latter, then the paragraph ids wouldn’t be very easy for someone to read when following the dependency relationships…

From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: 29 September 2017 11:58
To: users@zeppelin.apache.org
Subject: EXT: Re: Implementing run all paragraphs sequentially

I don't think two note settings (parallel/sequential) are sufficient for paragraph scheduling. Take the Spark tutorial note as an example: we should run the paragraph that loads the bank data first, and then all the SQL paragraphs could run in parallel. So the key is how we define the dependency relationships between paragraphs. The paragraphs of a note could form a DAG (directed acyclic graph). Sequential running is just one special kind of DAG (a linked list).

I believe we discussed this before in the community. My proposal is to add an attribute to the interpreter indicator of each paragraph, so that the user can specify the paragraph's dependencies (if the user doesn't specify any, the default dependency is the paragraph ahead of it). Take the Spark tutorial note as an example again. We have 3 paragraphs: the first one loads the bank data, and the second and third query the data. So paragraphs 2 and 3 can run in parallel but must run after paragraph 1. We then need to specify their dependency in the interpreter indicator part. Of course, the user doesn't need to specify dependencies to run all the paragraphs sequentially, because the default dependency is the paragraph ahead of it.

Paragraph 1.

%spark
// code to load bank data

Paragraph 2.

%spark.sql(deps=p1)
// query the bank data

Paragraph 3.
%spark.sql(deps=p1)
// query the bank data
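
The three-paragraph example above can be turned into an execution plan roughly as follows. This is a minimal sketch, not Zeppelin code: it assumes the proposed `(deps=...)` attribute sits on the first line of the paragraph text, that several ids would be comma-separated, and that a paragraph without the attribute depends on the one before it. The function names are hypothetical.

```python
import re

# Matches e.g. "%spark.sql(deps=p1)" or "%spark.sql(deps=p1,p2)".
DEPS_RE = re.compile(r"^%[\w.]+\(deps=([\w,\s-]*)\)")


def parse_deps(text, previous_id):
    """Return the ids this paragraph depends on."""
    lines = text.strip().splitlines()
    match = DEPS_RE.match(lines[0]) if lines else None
    if match:
        return [d.strip() for d in match.group(1).split(",") if d.strip()]
    # Default from the proposal: depend on the paragraph ahead of this one.
    return [previous_id] if previous_id is not None else []


def schedule(paragraphs):
    """paragraphs: ordered list of (id, text). Returns waves of ids;
    all ids in one wave have their dependencies satisfied by earlier waves."""
    deps = {}
    previous = None
    for pid, text in paragraphs:
        deps[pid] = set(parse_deps(text, previous))
        previous = pid
    waves, done = [], set()
    while len(done) < len(deps):
        ready = [p for p in deps if p not in done and deps[p] <= done]
        if not ready:
            raise ValueError("cycle or unknown dependency")
        waves.append(ready)
        done.update(ready)
    return waves


note = [
    ("p1", "%spark\n// code to load bank data"),
    ("p2", "%spark.sql(deps=p1)\n// query the bank data"),
    ("p3", "%spark.sql(deps=p1)\n// query the bank data"),
]
print(schedule(note))
```

For a note with no `deps` attributes at all, each wave contains exactly one paragraph, which reproduces the sequential (linked list) case.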

afancy <gr...@gmail.com> wrote on Fri, Sep 29, 2017 at 5:35 PM:
+1

I think this is one of the most important features. I don't know why this requirement has been skipped.

/afancy

RE: Implementing run all paragraphs sequentially

Posted by Sotnichenko Sergey <s....@tinkoff.ru>.

Colleagues!
How many paragraphs does a typical note have? 5? 10?
For 5-10 paragraphs, the “this paragraph should be run in parallel with the previous one” option solves 98% of the issues. It is simple to implement, intuitive, and simple to use.
In comparison, a fully linked DAG is not so intuitive and is sometimes even frustrating, especially when names like ‘20170929-143857_1744629322’ are involved.

Sergey Sotnichenko


RE: Implementing run all paragraphs sequentially

Posted by Polyakov Valeriy <v....@tinkoff.ru>.

This can cover most typical parallel use cases. Other cases could be transformed into this type at the cost of some increase in total running time.

Building full-fledged DAG dependencies would be much more complicated and looks like the functionality of a visual data transformation platform (e.g. industrial ETL tools), where you can see the connections between steps. It’s really hard to support this using just text references.
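
To make the "some increase in total running time" concrete, here is a small worked sketch comparing a full DAG run with the staged form of the same note. The paragraph names, durations and dependencies are invented for illustration, and the sketch assumes enough free interpreter capacity that every ready paragraph starts immediately.

```python
def dag_makespan(durations, deps):
    """Total time when each paragraph starts as soon as its deps finish."""
    finish = {}

    def finish_time(p):
        if p not in finish:
            start = max((finish_time(d) for d in deps.get(p, [])), default=0)
            finish[p] = start + durations[p]
        return finish[p]

    return max(finish_time(p) for p in durations)


def staged_makespan(durations, stages):
    """Total time when stages run one after another and each stage
    lasts as long as its slowest paragraph."""
    return sum(max(durations[p] for p in stage) for stage in stages)


# Hypothetical note: a -> b -> c is a chain, d is independent and slow.
durations = {"a": 1, "b": 1, "c": 1, "d": 3}   # minutes, invented
deps = {"b": ["a"], "c": ["b"]}
stages = [["a", "d"], ["b"], ["c"]]            # d marked "parallel with previous"

print(dag_makespan(durations, deps))      # 3
print(staged_makespan(durations, stages)) # 5
```

In the staged form `b` has to wait for `d` even though it does not depend on it, which is where the extra two minutes come from.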

Valeriy Polyakov

Re: Implementing run all paragraphs sequentially

Posted by Jeff Zhang <zj...@gmail.com>.

>>> I suppose there is a fairly simple solution to the problem. We can use
flag on paragraph which means “this paragraph should be run in parallel
with previous”. Such a logic could help to create sequential-parallel
running. It does not implement full-DAG capabilities, but it’s easy to
understand and to use.

This can cover some cases, but I don't think it can cover all of them.


Re: Implementing run all paragraphs sequentially

Posted by Jeff Zhang <zj...@gmail.com>.

Yes, it may look a little complicated, but I think that is due to how we
name paragraphs, not to this approach. IMHO, without specifying the
dependency relationships between paragraphs, it is almost impossible to
schedule paragraphs correctly.




RE: Implementing run all paragraphs sequentially

Posted by Sotnichenko Sergey <s....@tinkoff.ru>.

To be honest, it would be very complicated to build a DAG with names like ‘20170929-143857_1744629322’. Imagine we have 20 paragraphs with such names.

Sergey Sotnichenko

Re: Implementing run all paragraphs sequentially

Posted by Jeff Zhang <zj...@gmail.com>.

'p1' and 'p2' are paragraph ids. As for readability, we could allow users
to set a paragraph name, but that is another story and could be a later
improvement.
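
If user-defined paragraph names were added later, dependencies written with names could be translated to paragraph ids before scheduling. The following is only a sketch of that lookup step: the function, the names and all ids except the one quoted in this thread are hypothetical.

```python
def resolve_deps(deps_by_name, name_to_id):
    """Translate {name: [dependency names]} into {id: [dependency ids]}."""
    resolved = {}
    for name, deps in deps_by_name.items():
        missing = [n for n in [name, *deps] if n not in name_to_id]
        if missing:
            raise KeyError(f"unknown paragraph name(s): {missing}")
        resolved[name_to_id[name]] = [name_to_id[d] for d in deps]
    return resolved


# Names chosen by the user; ids stay whatever the notebook generated.
name_to_id = {
    "load_bank": "20170929-143857_1744629322",
    "query_age": "hypothetical-id-2",
    "query_job": "hypothetical-id-3",
}
deps_by_name = {
    "load_bank": [],
    "query_age": ["load_bank"],
    "query_job": ["load_bank"],
}
print(resolve_deps(deps_by_name, name_to_id))
```

Because the lookup goes through names rather than positions, inserting, deleting or reordering paragraphs would not change what a dependency refers to.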



Partridge, Lucas (GE Aviation) <Lu...@ge.com>于2017年9月29日周五
下午7:30写道：

> Interesting idea.  But by ‘p1’, ‘p2’, etc did you literally mean that; or
> were you using that as shorthand for the id of the paragraph?
>
> If the former then what happens if someone inserts, deletes or reorders
> paragraphs? But if the latter then the paragraph ids wouldn’t be very easy
> for someone to read and follow the dependency relationships…
>
>
>
> *From:* Jeff Zhang [mailto:zjffdu@gmail.com]
> *Sent:* 29 September 2017 11:58
> *To:* users@zeppelin.apache.org
> *Subject:* EXT: Re: Implementing run all paragraphs sequentially
>
>
>
>
>
> I don't think 2 note setting (parallel/sequential) is sufficient for
> paragraph scheduling (take the spark tutorial note as an example, we should
> run the loading bank data paragraph first and then could run all the sql
> paragraph parallelly).  So the key is how we define the dependency
> relationship between paragraphs.  Paragraphs of note could build a DAG
> (directed acyclic graph). Sequential running is just one special kind of
> DAG (a linked list).
>
>
>
> I believe we discuss it before in community.  My proposal is that we could
> add attribute to the interpreter indicator of each paragraph, so that user
> can specify the paragraph's dependency (If user don't specify it, the
> default dependency is the paragraph ahead of it).  Still take the spark
> tutorial note as an example. We have 3 paragraphes, the first one will load
> bank data, and the second, third paragraph will query the data. So
> paragraph 2,3 can run parallelly but must run after paragraph 1. Then we
> need to specify their dependency in the interpreter indicator part.  Of
> course, user don't need to specify dependencies if the want to run all the
> paragraphes sequentially, because the default dependencies is the paragraph
> ahead of it.
>
>
>
> Paragraph 1.
>
>
>
> %spark
>
> // code to load bank data
>
>
>
> Paragraph 2.
>
>
>
> %spark.sql(deps=p1)
>
> // query the bank data
>
>
>
> Paragraph 3.
>
> %spark.sql(deps=p1)
>
> // query the bank data
>
>
>
>
>
>
>
>
>
> On Fri, Sep 29, 2017 at 5:35 PM, afancy <gr...@gmail.com> wrote:
>
> +1
>
> I think this is one of the most important features. don't know why this
> requirement has been skipped.
>
>
>
> /afancy
>
>
>
> On Thu, Sep 28, 2017 at 5:28 PM, Belousov Maksim Eduardovich <
> m.belousov@tinkoff.ru> wrote:
>
> Hello, users!
>
> At the moment our analysts often use mixes of interpreters in their notes.
>
> For example, they prepare data using %jdbc and then use it in %pyspark.
> Besides, they often use scheduling to make some regular reporting. And they
> should do something like `time.sleep()` to wait for the data from %jdbc. It
> doesn`t guarantee the result and doesn`t look cool.
>
>
>
> You can find early attempts to implement sequential running of all
> paragraphs in [1].
>
> We are really interested in implementation of the issue [2] and are ready
> to solve it.
>
> It seems a good idea to discuss any requirements.
>
> My idea is to introduce note setting that defines the type of running to
> use (parallel or sequential) and leave "Run all" to be the only button
> running all the cells in the note. This will make sequential or parallel
> running the `note option` but not `run option`.
>
> Option will be controlled by nearby button as shown
>
> [image:
> https://lh6.googleusercontent.com/jwnb7xfb0fPbFg1CWPoMSqovu7ecSMv4pJfuP4zdKVZbyAUDwzAT2GJ5EiemXVYrqMW73yklemTpjXNyLRJABpTCoHi6us2ZI_AxWKHwZpBEA7MjpMP0-7Nk8saaJQfIF4yBMPfS]
>
>
>
>
>
> For new notes the default state would be "Run sequential all", for old -
> "Run parallel for interpreters"
>
> We are glad to hear any thoughts.
>
> Thank you.
>
>
>
> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>
> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>
>
>
>
>
>
> *Maksim Belousov*
>
>
>
>
>
>

Implementing run all paragraphs sequentially

Posted by "Partridge, Lucas (GE Aviation)" <Lu...@ge.com>.

Interesting idea.  But by ‘p1’, ‘p2’, etc did you literally mean that; or were you using that as shorthand for the id of the paragraph?
If the former then what happens if someone inserts, deletes or reorders paragraphs? But if the latter then the paragraph ids wouldn’t be very easy for someone to read and follow the dependency relationships…

From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: 29 September 2017 11:58
To: users@zeppelin.apache.org
Subject: EXT: Re: Implementing run all paragraphs sequentially

I don't think 2 note setting (parallel/sequential) is sufficient for paragraph scheduling (take the spark tutorial note as an example, we should run the loading bank data paragraph first and then could run all the sql paragraph parallelly).  So the key is how we define the dependency relationship between paragraphs.  Paragraphs of note could build a DAG (directed acyclic graph). Sequential running is just one special kind of DAG (a linked list).

I believe we discuss it before in community.  My proposal is that we could add attribute to the interpreter indicator of each paragraph, so that user can specify the paragraph's dependency (If user don't specify it, the default dependency is the paragraph ahead of it).  Still take the spark tutorial note as an example. We have 3 paragraphes, the first one will load bank data, and the second, third paragraph will query the data. So paragraph 2,3 can run parallelly but must run after paragraph 1. Then we need to specify their dependency in the interpreter indicator part.  Of course, user don't need to specify dependencies if the want to run all the paragraphes sequentially, because the default dependencies is the paragraph ahead of it.

Paragraph 1.

%spark
// code to load bank data

Paragraph 2.

%spark.sql(deps=p1)
// query the bank data

Paragraph 3.
%spark.sql(deps=p1)
// query the bank data

On Fri, Sep 29, 2017 at 5:35 PM, afancy <gr...@gmail.com> wrote:
+1

I think this is one of the most important features. don't know why this requirement has been skipped.

/afancy

On Thu, Sep 28, 2017 at 5:28 PM, Belousov Maksim Eduardovich <m....@tinkoff.ru>> wrote:
Hello, users!
At the moment our analysts often use mixes of interpreters in their notes.
For example, they prepare data using %jdbc and then use it in %pyspark. Besides, they often use scheduling to make some regular reporting. And they should do something like `time.sleep()` to wait for the data from %jdbc. It doesn`t guarantee the result and doesn`t look cool.

You can find early attempts to implement sequential running of all paragraphs in [1].
We are really interested in implementation of the issue [2] and are ready to solve it.
It seems a good idea to discuss any requirements.
My idea is to introduce note setting that defines the type of running to use (parallel or sequential) and leave "Run all" to be the only button running all the cells in the note. This will make sequential or parallel running the `note option` but not `run option`.
Option will be controlled by nearby button as shown
[https://lh6.googleusercontent.com/jwnb7xfb0fPbFg1CWPoMSqovu7ecSMv4pJfuP4zdKVZbyAUDwzAT2GJ5EiemXVYrqMW73yklemTpjXNyLRJABpTCoHi6us2ZI_AxWKHwZpBEA7MjpMP0-7Nk8saaJQfIF4yBMPfS]

For new notes the default state would be "Run sequential all", for old - "Run parallel for interpreters"
We are glad to hear any thoughts.
Thank you.

[1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
[2] https://issues.apache.org/jira/browse/ZEPPELIN-2368

Maksim Belousov

Re: Implementing run all paragraphs sequentially

Posted by Jeff Zhang <zj...@gmail.com>.

I don't think two note settings (parallel/sequential) are sufficient for
paragraph scheduling. Take the Spark tutorial note as an example: we should
run the paragraph that loads the bank data first, and then we could run all
the SQL paragraphs in parallel. So the key is how we define the dependency
relationships between paragraphs. The paragraphs of a note could form a DAG
(directed acyclic graph). Sequential running is just one special kind of
DAG (a linked list).

I believe we discussed this before in the community. My proposal is that we
add an attribute to the interpreter indicator of each paragraph, so that the
user can specify the paragraph's dependencies (if the user doesn't specify
any, the default dependency is the paragraph ahead of it). Take the Spark
tutorial note as an example again. We have 3 paragraphs: the first one loads
the bank data, and the second and third query the data. So paragraphs 2 and
3 can run in parallel but must run after paragraph 1. We then need to
specify their dependency in the interpreter indicator part. Of course, the
user doesn't need to specify dependencies to run all the paragraphs
sequentially, because the default dependency is the paragraph ahead of it.

Paragraph 1.

%spark
// code to load bank data

Paragraph 2.

%spark.sql(deps=p1)
// query the bank data

Paragraph 3.

%spark.sql(deps=p1)
// query the bank data
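
To make the proposal above concrete, here is a minimal Python sketch of how
such dependencies could be turned into a run order. It is only an
illustration, not Zeppelin code: the function names `build_deps` and
`execution_waves` and the input shape (a list of paragraph id / deps pairs)
are assumptions made for this example. A paragraph with no deps given falls
back to the paragraph ahead of it, as described in the proposal.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def build_deps(
    paragraphs: Sequence[Tuple[str, Optional[List[str]]]]
) -> Dict[str, List[str]]:
    """Map each paragraph id to its dependencies (hypothetical helper).

    `paragraphs` is in note order. A deps value of None means "not
    specified", which defaults to the paragraph ahead of it.
    """
    deps: Dict[str, List[str]] = {}
    previous: Optional[str] = None
    for paragraph_id, explicit in paragraphs:
        if explicit is None:
            deps[paragraph_id] = [] if previous is None else [previous]
        else:
            deps[paragraph_id] = list(explicit)
        previous = paragraph_id
    return deps


def execution_waves(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Group paragraphs into waves; paragraphs in one wave may run in parallel."""
    remaining = {pid: set(d) for pid, d in deps.items()}
    done: set = set()
    waves: List[List[str]] = []
    while remaining:
        # A paragraph is ready once all of its dependencies have finished.
        ready = [pid for pid, d in remaining.items() if d <= done]
        if not ready:
            raise ValueError("cyclic or unknown dependency")
        waves.append(ready)
        done.update(ready)
        for pid in ready:
            del remaining[pid]
    return waves


# The three-paragraph example: p2 and p3 both depend on p1.
note = [("p1", None), ("p2", ["p1"]), ("p3", ["p1"])]
print(execution_waves(build_deps(note)))
```

With no deps specified at all, every wave contains a single paragraph, which
is the sequential "linked list" case.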

On Fri, Sep 29, 2017 at 5:35 PM, afancy <gr...@gmail.com> wrote:

> +1
>
> I think this is one of the most important features. don't know why this
> requirement has been skipped.
>
> /afancy
>
> On Thu, Sep 28, 2017 at 5:28 PM, Belousov Maksim Eduardovich <
> m.belousov@tinkoff.ru> wrote:
>
>> Hello, users!
>>
>> At the moment our analysts often use mixes of interpreters in their notes.
>>
>> For example, they prepare data using %jdbc and then use it in %pyspark.
>> Besides, they often use scheduling to make some regular reporting. And they
>> should do something like `time.sleep()` to wait for the data from %jdbc. It
>> doesn`t guarantee the result and doesn`t look cool.
>>
>>
>>
>> You can find early attempts to implement sequential running of all
>> paragraphs in [1].
>>
>> We are really interested in implementation of the issue [2] and are ready
>> to solve it.
>>
>> It seems a good idea to discuss any requirements.
>>
>> My idea is to introduce note setting that defines the type of running to
>> use (parallel or sequential) and leave "Run all" to be the only button
>> running all the cells in the note. This will make sequential or parallel
>> running the `note option` but not `run option`.
>>
>> Option will be controlled by nearby button as shown
>>
>> [image:
>> https://lh6.googleusercontent.com/jwnb7xfb0fPbFg1CWPoMSqovu7ecSMv4pJfuP4zdKVZbyAUDwzAT2GJ5EiemXVYrqMW73yklemTpjXNyLRJABpTCoHi6us2ZI_AxWKHwZpBEA7MjpMP0-7Nk8saaJQfIF4yBMPfS]
>>
>>
>>
>>
>>
>> For new notes the default state would be "Run sequential all", for old -
>> "Run parallel for interpreters"
>>
>> We are glad to hear any thoughts.
>>
>> Thank you.
>>
>>
>>
>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>>
>> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>>
>>
>>
>>
>>
>>
>>
>>
>> *Maksim Belousov *
>>
>>
>>
>
>

Re: Implementing run all paragraphs sequentially

Posted by afancy <gr...@gmail.com>.

+1

I think this is one of the most important features. I don't know why this
requirement has been skipped.

/afancy

On Thu, Sep 28, 2017 at 5:28 PM, Belousov Maksim Eduardovich <
m.belousov@tinkoff.ru> wrote:

> Hello, users!
>
> At the moment our analysts often use mixes of interpreters in their notes.
>
> For example, they prepare data using %jdbc and then use it in %pyspark.
> Besides, they often use scheduling to make some regular reporting. And they
> should do something like `time.sleep()` to wait for the data from %jdbc. It
> doesn`t guarantee the result and doesn`t look cool.
>
>
>
> You can find early attempts to implement sequential running of all
> paragraphs in [1].
>
> We are really interested in implementation of the issue [2] and are ready
> to solve it.
>
> It seems a good idea to discuss any requirements.
>
> My idea is to introduce note setting that defines the type of running to
> use (parallel or sequential) and leave "Run all" to be the only button
> running all the cells in the note. This will make sequential or parallel
> running the `note option` but not `run option`.
>
> Option will be controlled by nearby button as shown
>
> [image:
> https://lh6.googleusercontent.com/jwnb7xfb0fPbFg1CWPoMSqovu7ecSMv4pJfuP4zdKVZbyAUDwzAT2GJ5EiemXVYrqMW73yklemTpjXNyLRJABpTCoHi6us2ZI_AxWKHwZpBEA7MjpMP0-7Nk8saaJQfIF4yBMPfS]
>
>
>
>
>
> For new notes the default state would be "Run sequential all", for old -
> "Run parallel for interpreters"
>
> We are glad to hear any thoughts.
>
> Thank you.
>
>
>
> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>
> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>
>
>
>
>
>
>
>
> *Maksim Belousov *
>
>
>

Re: Implementing run all paragraphs sequentially

Posted by da...@ontrenet.com.

We've been needing this feature as well. The way it currently works is very frustrating.

On Fri, Sep 29, 2017 at 12:04 AM -0400, "moon soo Lee" <mo...@apache.org> wrote:

This is going to be really useful!
Curious why you prefer 'note option' instead of 'run option'?
Could you compare their pros and cons?

Thanks,
moon

On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hf...@twitter.com> wrote:

+1, our internal users at Twitter also often request this

From: Belousov Maksim Eduardovich <m....@tinkoff.ru>
Sent: Thursday, September 28, 2017 8:28:58 AM
To: users@zeppelin.apache.org
Subject: Implementing run all paragraphs sequentially

Hello, users!
At the moment our analysts often use mixes of interpreters in their notes.
For example, they prepare data using %jdbc and then use it in %pyspark. Besides, they often use scheduling to make some regular reporting. And they should do something like `time.sleep()` to wait for the data from %jdbc. It doesn`t guarantee the result and doesn`t look cool.

You can find early attempts to implement sequential running of all paragraphs in [1].
We are really interested in implementation of the issue [2] and are ready to solve it.
It seems a good idea to discuss any requirements.
My idea is to introduce note setting that defines the type of running to use (parallel or sequential) and leave "Run all" to be the only button running all the cells in the note. This will make sequential or parallel running the `note option` but not `run option`.
Option will be controlled by nearby button as shown

For new notes the default state would be "Run sequential all", for old - "Run parallel for interpreters"
We are glad to hear any thoughts.
Thank you.

[1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
[2] https://issues.apache.org/jira/browse/ZEPPELIN-2368

Maksim Belousov

RE: Implementing run all paragraphs sequentially

Posted by Polyakov Valeriy <v....@tinkoff.ru>.

Option (3) does not solve the problem of manageable running order which is really useful in some cases (sequential running of different interpreters and parallel running of same interpreters in one note).

Valeriy Polyakov

From: Herval Freire [mailto:hfreire@twitter.com]
Sent: Monday, October 02, 2017 6:41 PM
To: users@zeppelin.apache.org; users@zeppelin.apache.org
Subject: Re: Implementing run all paragraphs sequentially

Why do you need rules and graphs and any of that to support running everything sequentially or everything in parallel?

3) add a "run mode" to the note. If it's "sequential", run the paragraphs one at a time, in the order they're defined. If parallel, run using current scheme (as many at the same time as the threadpool permits)

Simpler and covers all cases, imo

________________________________
From: Polyakov Valeriy <v....@tinkoff.ru>
Sent: Monday, October 2, 2017 8:24:35 AM
To: users@zeppelin.apache.org
Subject: RE: Implementing run all paragraphs sequentially

Let me try to summarize the discussion. Evidently, the current behavior of running notes does not meet actual requirements. The most important thing we need is the ability to run sequentially. However, at the same time we want to keep the functionality of parallel running. We discussed that the most suitable way to model paragraph dependencies is a DAG (directed acyclic graph). Therefore this kind of dependency should surely be defined in the note, and the running order should not depend on how we launch it (button / scheduler / API). So our objectives are to implement a "dependency definition engine" and to use it in the "run engine". What are the options?

1)      Explicit dependency definition.

We could take as a rule that each paragraph waits for the end of execution of ALL previous paragraphs. Then we add a paragraph option "Wait for ..." where we can choose the paragraph whose completion we wait for before starting execution. When the option is set, we start execution immediately after the selected paragraph finishes. This pattern allows us to implement a fully parallel DAG running order. What are the disadvantages? They all come down to the same thing: dependency management is not easy for users to understand (and the functionality is probably redundant, in my personal view). First, we would have to use the strange format of paragraph IDs, which in addition are hidden. We could come up with visible and readable paragraph ID aliases, but then duplicate control becomes necessary. Second, some scenarios require changing existing dependencies (e.g. if you need to add a new paragraph between one paragraph and a dependent group, you have to change the "Wait for ..." option for each paragraph in the group).

2)      Implicit dependency definition.

We could take as a rule that each paragraph waits for the end of execution of ALL previous paragraphs. Then we add a paragraph option "Run in parallel with previous", which allows us to create groups of paragraphs that run in parallel. The result is sequential running of paragraph groups: group by group, with the paragraphs inside each group running in parallel. This approach is much more understandable for users, but the obvious drawback compared with "Explicit definition" is that the dependency graph and the level of parallelism are not as good.
I am not sure which option, (1) or (2), is the right one to implement at the moment. I hope to hear from the product visionaries which way to choose and to get approval to start the implementation.
Thank you!

Valeriy Polyakov
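
As an illustration of option (2) above, the "Run in parallel with previous"
flag can be read as a simple grouping rule over the paragraphs in note
order. The following Python sketch is hypothetical: the function name
`group_paragraphs` and the representation of a paragraph as an
(id, flag) pair are assumptions for this example, not part of Zeppelin.

```python
from typing import List, Sequence, Tuple


def group_paragraphs(paragraphs: Sequence[Tuple[str, bool]]) -> List[List[str]]:
    """Split paragraphs into groups that run one after another.

    Each item is (paragraph_id, parallel_with_previous). A paragraph whose
    flag is set joins the group of the paragraph before it; otherwise it
    starts a new group. Paragraphs inside one group may run in parallel.
    """
    groups: List[List[str]] = []
    for paragraph_id, parallel_with_previous in paragraphs:
        if parallel_with_previous and groups:
            groups[-1].append(paragraph_id)
        else:
            # The flag has no effect on the very first paragraph.
            groups.append([paragraph_id])
    return groups


note = [("load", False), ("query1", False), ("query2", True), ("report", False)]
print(group_paragraphs(note))
```

Inserting a new paragraph between "load" and the query group needs no
changes to the existing flags, which is the maintenance advantage claimed
for this option over explicit "Wait for ..." references.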

From: Michael Segel [mailto:msegel_hadoop@hotmail.com]
Sent: Saturday, September 30, 2017 4:22 PM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: Re: Implementing run all paragraphs sequentially

Sorry to jump in...

If you want to run paragraphs in parallel, you are going to want to have some sort of dependency graph.  Think of a common set up where you need to set up common functions and imports. (setup of %spark.dep)

A good example is if your notebook is a bunch of unit tests and you need to build the common tear down / set up methods to be used by the other paragraphs.

If you're going to do that, you'll need to build out a metadata structure where you can set up your dependencies as well as add things like labels beyond the ids (which only need to be unique to the given notebook).

Just my $0.02

On Sep 29, 2017, at 1:30 PM, moon soo Lee <mo...@apache.org>> wrote:

Current behavior is as parallel as possible.
The Run notebook button currently submits all paragraphs in a notebook to each interpreter's own scheduler (FIFO, Parallel) at once, and each interpreter's scheduler then runs its paragraphs.

I think we can provide a "sequential" run button for easier use, which submits one paragraph and waits for it to finish before submitting the next.

And I think a sequential run button doesn't prevent us from having a more complex / flexible DAG in the future?

Thanks,
moon
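
The difference between the two behaviors described above can be sketched in
a few lines of Python. This is a simplified model, not Zeppelin's scheduler:
a single `ThreadPoolExecutor` stands in for the per-interpreter schedulers,
paragraphs are modeled as plain callables, and the function names are
assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence


def run_all_parallel(paragraphs: Sequence[Callable[[], Any]],
                     pool: ThreadPoolExecutor) -> List[Any]:
    """Submit every paragraph at once, then collect the results."""
    futures = [pool.submit(paragraph) for paragraph in paragraphs]
    return [future.result() for future in futures]


def run_all_sequential(paragraphs: Sequence[Callable[[], Any]],
                       pool: ThreadPoolExecutor) -> List[Any]:
    """Submit one paragraph and wait for it to finish before the next."""
    results = []
    for paragraph in paragraphs:
        results.append(pool.submit(paragraph).result())
    return results


log: List[str] = []


def make_paragraph(name: str) -> Callable[[], str]:
    def paragraph() -> str:
        log.append(name)  # record the order in which paragraphs actually ran
        return name
    return paragraph


with ThreadPoolExecutor(max_workers=4) as pool:
    run_all_sequential([make_paragraph(n) for n in ("jdbc", "pyspark")], pool)
print(log)
```

In the sequential variant the %pyspark step cannot start before the %jdbc
step has finished, so no `time.sleep()` workaround is needed; in the
parallel variant the start order across paragraphs is not guaranteed.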

On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mo...@gmail.com>> wrote:
What is the current behavior?

On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hf...@twitter.com>> wrote:
At least in our case, the notebooks that we need to run sequentially are expected to *always* run sequentially - thus it makes more sense to be a note option than a per-run mode

H

_____________________________
From: moon soo Lee <mo...@apache.org>>
Sent: Thursday, September 28, 2017 9:03 PM
Subject: Re: Implementing run all paragraphs sequentially
To: <us...@zeppelin.apache.org>>
This is going to be really useful!

Curious why you prefer 'note option' instead of 'run option'?
Could you compare their pros and cons?

Thanks,
moon

On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hf...@twitter.com>> wrote:
+1, our internal users at Twitter also often request this

________________________________
From: Belousov Maksim Eduardovich <m....@tinkoff.ru>>
Sent: Thursday, September 28, 2017 8:28:58 AM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: Implementing run all paragraphs sequentially

Hello, users!

At the moment our analysts often use mixes of interpreters in their notes.
For example, they prepare data using %jdbc and then use it in %pyspark. Besides, they often use scheduling to make some regular reporting. And they should do something like `time.sleep()` to wait for the data from %jdbc. It doesn`t guarantee the result and doesn`t look cool.

You can find early attempts to implement sequential running of all paragraphs in [1].
We are really interested in implementation of the issue [2] and are ready to solve it.

It seems a good idea to discuss any requirements.
My idea is to introduce note setting that defines the type of running to use (parallel or sequential) and leave "Run all" to be the only button running all the cells in the note. This will make sequential or parallel running the `note option` but not `run option`.
Option will be controlled by nearby button as shown

For new notes the default state would be "Run sequential all", for old - "Run parallel for interpreters"

We are glad to hear any thoughts.
Thank you.

[1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
[2] https://issues.apache.org/jira/browse/ZEPPELIN-2368

Maksim Belousov

Re: Implementing run all paragraphs sequentially

Posted by Jeff Zhang <zj...@gmail.com>.

What behavior do you see? If it doesn't work for you, please create a
ticket and describe the details.


On Wed, Feb 21, 2018 at 6:04 PM, afancy <gr...@gmail.com> wrote:

> Hi,
>
> May I ask if this feature is available? I just pulled from the master branch,
> but I haven't seen this implementation.
>
> Thanks
> /afancy
>
> On Sat, Oct 7, 2017 at 2:57 AM, Jianfeng (Jeff) Zhang <
> jzhang@hortonworks.com> wrote:
>
>>
>> Since almost everyone agrees on running serially by default, we could
>> implement it first. As for the parallel mode, we could leave it for the
>> future, although personally I prefer to define a DAG for the note.
>>
>>
>> Best Regards,
>> Jeff Zhang
>>
>>
>> From: Michael Segel <ms...@hotmail.com>
>> Reply-To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
>> Date: Friday, October 6, 2017 at 10:08 PM
>> To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
>> Subject: Re: Implementing run all paragraphs sequentially
>>
>> Guys…
>>
>> 1) You’re posting this to the user list… Isn’t this a dev question?
>>
>> 2) +1 on the run serial… but doesn’t that already exist with the “run all
>> paragraphs” button already?
>>
>> 3) -1 on a ‘run all in parallel’ button.  (It's like putting lipstick on a
>> pig.)
>>
>> Are you really going to run all of the paragraphs in parallel?  You’re
>> not going to have a paragraph that is used to set things up? Import
>> external libraries?  Define classes/functions for future paragraphs to use?
>>
>> IMHO I would much rather see a DAG where each paragraph can set its
>> dependency… (this isn’t the right term. I’m trying to think back to how it
>> was described in NeXTStep Objective-C code.)
>> Then you could set your parallel button to run in parallel, but if your
>> paragraph is dependent on another, it's blocked from executing until its
>> predecessor completes.
>>
>> But that’s just my $0.02
>>
>> On Oct 6, 2017, at 2:25 AM, Polyakov Valeriy <v....@tinkoff.ru>
>> wrote:
>>
>> Thank you all for sharing the problem. Naman Mishra has started the
>> implementation of serial run in [1], so I propose to come back to the
>> discussion of the next step (both Parallel and Serial run buttons) after
>> [1] is resolved.
>>
>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>>
>>
>> *Valeriy Polyakov*
>>
>> *From:* Jeff Zhang [mailto:zjffdu@gmail.com <zj...@gmail.com>]
>> *Sent:* Friday, October 06, 2017 10:14 AM
>> *To:* users@zeppelin.apache.org
>> *Subject:* Re: Implementing run all paragraphs sequentially
>>
>>
>> +1 for serial run by default.  Let's leave others in future.
>>
>> On Fri, Oct 6, 2017 at 7:48 AM, Mohit Jaggi <mo...@gmail.com> wrote:
>>
>> +1 for serial run by default.
>>
>> On Oct 5, 2017, at 3:36 PM, moon soo Lee <mo...@apache.org> wrote:
>>
>> I'd like us to also consider simplicity of use.
>>
>> We can have two different modes, or two different run buttons, for Serial
>> or Parallel run. This gives the flexibility of choosing between two
>> schedulers as a benefit, but to make users understand the difference
>> between the two run buttons, there must be really good UI treatment.
>>
>> I see there is high user demand for running notebooks sequentially. And I
>> think there are 3 action items in this discussion thread.
>>
>> 1. Change the current run all button behavior from Parallel to Serial.
>> 2. Provide both Parallel and Serial run buttons with really good UI
>> treatment.
>> 3. Provide a DAG.
>>
>> I think 1) does not stop 2) and 3) in the future. 2) also does not stop
>> 3) in the future.
>>
>> So, why don't we try 1) first and keep discussing and polishing ideas about
>> 2) and 3)?
>>
>>
>> Thanks,
>> moon
>>
>> On Mon, Oct 2, 2017 at 10:22 AM Michael Segel <ms...@hotmail.com>
>> wrote:
>>
>> Whoa!
>> Seems I walked in to something.
>>
>> Herval,
>>
>> What do you suggest?  A simple switch that runs everything in serial, or
>> everything in parallel?
>> That would be a very bad idea.
>>
>> I gave you an example of a class of solutions where you don’t want that
>> behavior.
>> E.g Unit testing where you have one setup and then run several unit tests
>> in parallel.
>>
>> If that’s not enough for you… how about if you want to test
>> producer/consumer problems?
>>
>> Or if you want to define classes in one paragraph but then call on them
>> in later paragraphs. If everything runs in parallel from the start of time
>> 0, you can’t do this.
>>
>>
>> So, if you want to do it right the first time… you need to establish a
>> way to control the dependency of paragraphs. This isn’t rocket science.
>> And frankly not that complex.
>>
>> BTW, this is the user list not the dev list…
>>
>> Just saying…  ;-)
>>
>>
>>
>> On Oct 2, 2017, at 11:24 AM, Herval Freire <hf...@twitter.com> wrote:
>>
>>  "nice to have" isn't a very strong requirement. I strongly suggest you
>> really, really think about this before you start pounding an overengineered
>> solution to a non-issue :-)
>>
>> h
>>
>> On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <ms...@hotmail.com>
>> wrote:
>>
>> Yes…
>>  You have bunch of unit tests you can run in parallel where you only need
>> one constructor and one cleanup.
>>
>> I would strongly suggest that you really, really think about this long
>> and hard before you start to pound code.
>> It's going to be harder to back out and fix than if you take the time to
>> think through the problem and not make a dumb mistake.
>>
>>
>> On Oct 2, 2017, at 11:02 AM, Herval Freire <hf...@twitter.com> wrote:
>>
>> Did anyone request such a case ("running some in parallel and some in
>> sequence")? I haven't seen any requests for this in the wild (nor on this
>> thread), other than theoretical "what if" - which is totally fine, when it
>> doesn't introduce a lot of unnecessary complexity for little to no gain
>> (which seems to be the case here)
>>
>> h
>>
>> On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <ms...@hotmail.com>
>> wrote:
>>
>> Because that simplicity doesn’t work.
>>
>> You will want to run some things serial and some things in parallel.
>>
>> Which is why you will need a dependency graph.
>>
>>
>> On Oct 2, 2017, at 10:40 AM, Herval Freire <hf...@twitter.com> wrote:
>>
>> Why do you need rules and graphs and any of that to support running
>> everything sequentially or everything in parallel?
>>
>> 3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs
>> one at a time, in the order they’re defined. If parallel, run using current
>> scheme (as many at the same time as the threadpool permits)
>>
>> Simpler and covers all cases, imo
>>
>> ------------------------------
>> *From:* Polyakov Valeriy <v....@tinkoff.ru>
>> *Sent:* Monday, October 2, 2017 8:24:35 AM
>> *To:* users@zeppelin.apache.org
>> *Subject:* RE: Implementing run all paragraphs sequentially
>>
>> Let me try to summarize the discussion. Evidently, current behavior of
>> running notes does not meet actual requirements. The most important thing
>> that we need is the ability of sequential running. However, at the same
>> time we want to keep functionality of parallel running. We discussed that
>> the most suitable solution of building paragraphs` dependencies is a DAG
>> (directed acyclic graph). Therefore, surely, this kind of dependencies
>> should be defined in note and the running order should not depend on how we
>> launch it (button / scheduler / API). In this way, our objectives are to
>> implement “dependency definition engine” and to use it in “run engine”.
>> What are the options?
>> 1)      Explicit dependency definition.
>> We could take for a rule that each paragraph should wait for the end of
>> execution of ALL previous paragraphs. Then we add paragraph option “Wait
>> for …” where we can choose paragraph for which we are waiting for to start
>> execution. In case where the option is set, we start execution immediately
>> after the end of execution of selected paragraph. This pattern allows us to
>> implement full-parallel DAG running order. What are the disadvantages? All
>> of them are about the same – not easy understanding of the dependency
>> management process from the perspective of users (and probably redundancy
>> of the functionality – my personal view). At first, we should use strange
>> format of paragraph IDs, which in addition is hidden. We could come up with
>> visible and handsome paragraph ID aliases, but then it appears necessity of
>> duplication control. The second thing is in some kind of scenarios where we
>> should change existing dependencies (e.g. you need to add new paragraph
>> between one and dependent group – you have to change option “Wait for …”
>> for each paragraph in group).
>> 2)      Implicit dependency definition.
>>
>> We could take for a rule that each paragraph should wait for the end of
>> execution of ALL previous paragraphs. Then we add paragraph option “Run in
>> parallel with previous” which allows us to create paragraph groups to run
>> in parallel. It turns out that we have the way of sequential running of
>> paragraph groups – group by group in which paragraphs run in parallel. This
>> approach is much more understandable for the users, but the obvious defect
>> in comparison with “Explicit definition” is the fact that dependency graph
>> and level of parallelism are not so cool.
>> I am not sure which option (1) or (2) is correct to implement at the
>> moment. I hope to hear from product visionaries which way to choose and to
>> get approval for the start of implementation.
>> Thank you!
>>
>>
>>
>>
>> *Valeriy Polyakov*
>>
>> *From:* Michael Segel [mailto:msegel_hadoop@hotmail.com
>> <ms...@hotmail.com>]
>> *Sent:* Saturday, September 30, 2017 4:22 PM
>> *To:* users@zeppelin.apache.org
>> *Subject:* Re: Implementing run all paragraphs sequentially
>>
>> Sorry to jump in…
>>
>> If you want to run paragraphs in parallel, you are going to want to have
>> some sort of dependency graph.  Think of a common set up where you need to
>> set up common functions and imports. (setup of %spark.dep)
>>
>> A good example is if your notebook is a bunch of unit tests and you need
>> to build the common tear down / set up methods to be used by the other
>> paragraphs.
>>
>> If you’re going to do that, you’ll need to build out a metadata structure
>> where you can set up your dependencies  as well as add things like labels
>> beyond the ids (which only need to be unique to the given notebook. )
>>
>> Just my $0.02
>>
>>
>> On Sep 29, 2017, at 1:30 PM, moon soo Lee <mo...@apache.org> wrote:
>>
>> The current behavior is as parallel as possible.
>> The "Run notebook" button currently submits all paragraphs in a notebook to
>> each interpreter's own scheduler (FIFO, Parallel) at once, and each
>> interpreter's scheduler then runs its paragraphs.
>>
>> I think we can provide a "sequential" run button for easier use, which
>> submits one paragraph and waits for it to finish before submitting the next.
>>
>> And I think a sequential run button doesn't stop us from having a more
>> complex / flexible DAG in the future?
>>
>> Thanks,
>> moon
>>
>> On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mo...@gmail.com>
>> wrote:
>>
>> What is the current behavior?
>>
>> On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hf...@twitter.com>
>> wrote:
>>
>> At least in our case, the notebooks that we need to run sequentially are
>> expected to *always* run sequentially - thus it makes more sense to be a
>> note option than a per-run mode
>>
>> H
>>
>>
>> _____________________________
>> From: moon soo Lee <mo...@apache.org>
>> Sent: Thursday, September 28, 2017 9:03 PM
>> Subject: Re: Implementing run all paragraphs sequentially
>> To: <us...@zeppelin.apache.org>
>> This is going to be really useful!
>>
>> Curious why you prefer 'note option' instead of 'run option'?
>> Could you compare their pros and cons?
>>
>> Thanks,
>> moon
>>
>> On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hf...@twitter.com>
>> wrote:
>>
>> +1, our internal users at Twitter also often request this
>>
>> ------------------------------
>> *From:* Belousov Maksim Eduardovich <m....@tinkoff.ru>
>> *Sent:* Thursday, September 28, 2017 8:28:58 AM
>> *To:* users@zeppelin.apache.org
>> *Subject:* Implementing run all paragraphs sequentially
>>
>> Hello, users!
>>
>> At the moment our analysts often use mixes of interpreters in their notes.
>> For example, they prepare data using %jdbc and then use it in %pyspark.
>> Besides, they often use scheduling to produce regular reports. And they
>> have to do something like `time.sleep()` to wait for the data from %jdbc. It
>> doesn't guarantee the result and doesn't look cool.
>>
>> You can find early attempts to implement sequential running of all
>> paragraphs in [1].
>> We are really interested in implementation of the issue [2] and are ready
>> to solve it.
>>
>> It seems a good idea to discuss any requirements.
>> My idea is to introduce a note setting that defines the type of run to
>> use (parallel or sequential) and leave "Run all" as the only button
>> that runs all the cells in the note. This makes sequential or parallel
>> running a `note option` rather than a `run option`.
>> The option would be controlled by a nearby button, as shown:
>>
>> <~WRD000.jpg>
>>
>>
>>
>> For new notes the default state would be "Run sequential all", for old -
>> "Run parallel for interpreters"
>>
>> We are glad to hear any thoughts.
>> Thank you.
>>
>>
>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>>
>>
>>
>>
>> *Maksim Belousov*
>>
>>
>>
>

Re: Implementing run all paragraphs sequentially

Posted by afancy <gr...@gmail.com>.

Hi,

May I ask if this feature is available? I just pulled from the master branch,
but I haven't seen this implementation.

Thanks
/afancy

On Sat, Oct 7, 2017 at 2:57 AM, Jianfeng (Jeff) Zhang <
jzhang@hortonworks.com> wrote:

>
> Since almost everyone agrees on running serially by default, we could
> implement that first. As for the parallel mode, we could leave it for the
> future, although personally I prefer to define a DAG for the note.
>
>
> Best Regards,
> Jeff Zhang
>
>
> From: Michael Segel <ms...@hotmail.com>
> Reply-To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
> Date: Friday, October 6, 2017 at 10:08 PM
> To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
> Subject: Re: Implementing run all paragraphs sequentially
>
> Guys…
>
> 1) You’re posting this to the user list… Isn’t this a dev question?
>
> 2) +1 on the run serial… but doesn’t that already exist with the “run all
> paragraphs” button already?
>
> 3) -1 on a ‘run all in parallel’ button.  (It’s like putting lipstick on a
> pig.)
>
> Are you really going to run all of the paragraphs in parallel?  You’re not
> going to have a paragraph that is used to set things up? Import external
> libraries?  Define classes/functions for future paragraphs to use?
>
> IMHO I would much rather see a DAG where each paragraph can set its
> dependency… (this isn’t the right term. I’m trying to think back to how it
> was described in NeXTStep Objective-C code.)
> Then you could set your parallel button to run in parallel, but if your
> paragraph is dependent on another, it’s blocked from executing until its
> predecessor completes.
>
> But that’s just my $0.02
>
> On Oct 6, 2017, at 2:25 AM, Polyakov Valeriy <v....@tinkoff.ru>
> wrote:
>
> Thank you all for sharing the problem. Naman Mishra has started the
> implementation of serial run in [1], so I propose to come back to the
> discussion of the next step (both Parallel and Serial run buttons) after [1]
> is resolved.
>
> [1] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>
>
> *Valeriy Polyakov*
>
> *From:* Jeff Zhang [mailto:zjffdu@gmail.com <zj...@gmail.com>]
> *Sent:* Friday, October 06, 2017 10:14 AM
> *To:* users@zeppelin.apache.org
> *Subject:* Re: Implementing run all paragraphs sequentially
>
>
> +1 for serial run by default.  Let's leave others in future.
>
> On Fri, Oct 6, 2017 at 7:48 AM, Mohit Jaggi <mo...@gmail.com> wrote:
>
> +1 for serial run by default.
>
> Sent from my iPhone
>
>
> On Oct 5, 2017, at 3:36 PM, moon soo Lee <mo...@apache.org> wrote:
>
> I'd like us to also consider simplicity of use.
>
> We can have two different modes, or two different run buttons, for Serial
> or Parallel run. The benefit is the flexibility of choosing between two
> schedulers, but to make users understand the difference between the two run
> buttons, there must be really good UI treatment.
>
> I see there is high user demand for running notebooks sequentially. And I
> think there are 3 action items in this discussion thread.
>
> 1. Change the current "Run all" button behavior from Parallel to Serial
> 2. Provide both Parallel and Serial run buttons with really good UI
> treatment.
> 3. Provide a DAG
>
> I think 1) does not stop 2) and 3) in the future. 2) also does not stop 3)
> in the future.
>
> So, why don't we try 1) first and keep discussing and polishing the ideas
> for 2) and 3)?
>
>
> Thanks,
> moon
>
> On Mon, Oct 2, 2017 at 10:22 AM Michael Segel <ms...@hotmail.com>
> wrote:
>
> Whoa!
> Seems I walked in to something.
>
> Herval,
>
> What do you suggest?  A simple switch that runs everything in serial, or
> everything in parallel?
> That would be a very bad idea.
>
> I gave you an example of a class of solutions where you don’t want that
> behavior.
> E.g Unit testing where you have one setup and then run several unit tests
> in parallel.
>
> If that’s not enough for you… how about if you want to test
> producer/consumer problems?
>
> Or if you want to define classes in one paragraph but then call on them in
> later paragraphs. If everything runs in parallel from the start of time 0,
> you can’t do this.
>
>
> So, if you want to do it right the first time… you need to establish a way
> to control the dependency of paragraphs. This isn’t rocket science.
> And frankly not that complex.
>
> BTW, this is the user list not the dev list…
>
> Just saying…  ;-)
>
>
>
> On Oct 2, 2017, at 11:24 AM, Herval Freire <hf...@twitter.com> wrote:
>
>  "nice to have" isn't a very strong requirement. I strongly suggest you
> really, really think about this before you start pounding an overengineered
> solution to a non-issue :-)
>
> h
>
> On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <ms...@hotmail.com>
> wrote:
>
> Yes…
>  You have a bunch of unit tests you can run in parallel where you only need
> one constructor and one cleanup.
>
> I would strongly suggest that you really, really think about this long and
> hard before you start to pound code.
> It’s going to be harder to back out and fix than if you take the time to
> think through the problem and not make a dumb mistake.
>
>
> On Oct 2, 2017, at 11:02 AM, Herval Freire <hf...@twitter.com> wrote:
>
> Did anyone request such a case ("running some in parallel and some in
> sequence")? I haven't seen any requests for this in the wild (nor on this
> thread), other than theoretical "what if" - which is totally fine, when it
> doesn't introduce a lot of unnecessary complexity for little to no gain
> (which seems to be the case here)
>
> h
>
> On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <ms...@hotmail.com>
> wrote:
>
> Because that simplicity doesn’t work.
>
> You will want to run some things serial and some things in parallel.
>
> Which is why you will need a dependency graph.
>
>
> On Oct 2, 2017, at 10:40 AM, Herval Freire <hf...@twitter.com> wrote:
>
> Why do you need rules and graphs and any of that to support running
> everything sequentially or everything in parallel?
>
> 3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs
> one at a time, in the order they’re defined. If parallel, run using current
> scheme (as many at the same time as the threadpool permits)
>
> Simpler and covers all cases, imo
>
> ------------------------------
> *From:* Polyakov Valeriy <v....@tinkoff.ru>
> *Sent:* Monday, October 2, 2017 8:24:35 AM
> *To:* users@zeppelin.apache.org
> *Subject:* RE: Implementing run all paragraphs sequentially
>
> Let me try to summarize the discussion. Evidently, the current behavior of
> running notes does not meet actual requirements. The most important thing
> we need is the ability to run sequentially. At the same time, we want to
> keep the ability to run in parallel. We discussed that the most suitable
> way to describe dependencies between paragraphs is a DAG (directed acyclic
> graph). These dependencies should therefore be defined in the note, and the
> running order should not depend on how we launch it (button / scheduler /
> API). So our objectives are to implement a “dependency definition engine”
> and to use it in the “run engine”.
> What are the options?
> 1)      Explicit dependency definition.
> We could take as a rule that each paragraph waits for the end of
> execution of ALL previous paragraphs. Then we add a paragraph option “Wait
> for …” where we can choose the paragraph to wait for before starting
> execution. When the option is set, the paragraph starts immediately after
> the selected paragraph finishes. This pattern allows us to implement a
> fully parallel DAG running order. What are the disadvantages? They all come
> down to the same thing: dependency management is not easy for users to
> understand (and the functionality is probably redundant – my personal
> view). First, we would have to use the strange format of paragraph IDs,
> which in addition is hidden. We could come up with visible, readable
> paragraph ID aliases, but then we would need to check them for duplicates.
> Second, in some scenarios existing dependencies have to be changed (e.g. to
> add a new paragraph between a paragraph and the group that depends on it,
> you have to change the option “Wait for …” for each paragraph in the
> group).
> 2)      Implicit dependency definition.
>
> We could take as a rule that each paragraph waits for the end of
> execution of ALL previous paragraphs. Then we add a paragraph option “Run in
> parallel with previous”, which lets us create groups of paragraphs that run
> in parallel. The result is sequential running of paragraph groups: the
> groups run one after another, and the paragraphs inside a group run in
> parallel. This approach is much easier for users to understand, but its
> obvious drawback compared with “Explicit definition” is that the dependency
> graph is less expressive and the level of parallelism is lower.
> I am not sure which option, (1) or (2), is the right one to implement at
> the moment. I hope to hear from product visionaries which way to choose and
> to get approval for the start of implementation.
> Thank you!
>
>
>
>
> *Valeriy Polyakov*
>
> *From:* Michael Segel [mailto:msegel_hadoop@hotmail.com
> <ms...@hotmail.com>]
> *Sent:* Saturday, September 30, 2017 4:22 PM
> *To:* users@zeppelin.apache.org
> *Subject:* Re: Implementing run all paragraphs sequentially
>
> Sorry to jump in…
>
> If you want to run paragraphs in parallel, you are going to want to have
> some sort of dependency graph.  Think of a common set up where you need to
> set up common functions and imports. (setup of %spark.dep)
>
> A good example is if your notebook is a bunch of unit tests and you need
> to build the common tear down / set up methods to be used by the other
> paragraphs.
>
> If you’re going to do that, you’ll need to build out a metadata structure
> where you can set up your dependencies  as well as add things like labels
> beyond the ids (which only need to be unique to the given notebook. )
>
> Just my $0.02
>
>
> On Sep 29, 2017, at 1:30 PM, moon soo Lee <mo...@apache.org> wrote:
>
> The current behavior is as parallel as possible.
> The "Run notebook" button currently submits all paragraphs in a notebook to
> each interpreter's own scheduler (FIFO, Parallel) at once, and each
> interpreter's scheduler then runs its paragraphs.
>
> I think we can provide a "sequential" run button for easier use, which
> submits one paragraph and waits for it to finish before submitting the next.
>
> And I think a sequential run button doesn't stop us from having a more
> complex / flexible DAG in the future?
>
> Thanks,
> moon
>
> On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mo...@gmail.com> wrote:
>
> What is the current behavior?
>
> On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hf...@twitter.com>
> wrote:
>
> At least in our case, the notebooks that we need to run sequentially are
> expected to *always* run sequentially - thus it makes more sense to be a
> note option than a per-run mode
>
> H
>
>
> _____________________________
> From: moon soo Lee <mo...@apache.org>
> Sent: Thursday, September 28, 2017 9:03 PM
> Subject: Re: Implementing run all paragraphs sequentially
> To: <us...@zeppelin.apache.org>
> This is going to be really useful!
>
> Curious why you prefer 'note option' instead of 'run option'?
> Could you compare their pros and cons?
>
> Thanks,
> moon
>
> On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hf...@twitter.com> wrote:
>
> +1, our internal users at Twitter also often request this
>
> ------------------------------
> *From:* Belousov Maksim Eduardovich <m....@tinkoff.ru>
> *Sent:* Thursday, September 28, 2017 8:28:58 AM
> *To:* users@zeppelin.apache.org
> *Subject:* Implementing run all paragraphs sequentially
>
> Hello, users!
>
> At the moment our analysts often use mixes of interpreters in their notes.
> For example, they prepare data using %jdbc and then use it in %pyspark.
> Besides, they often use scheduling to produce regular reports. And they
> have to do something like `time.sleep()` to wait for the data from %jdbc. It
> doesn't guarantee the result and doesn't look cool.
>
> You can find early attempts to implement sequential running of all
> paragraphs in [1].
> We are really interested in implementation of the issue [2] and are ready
> to solve it.
>
> It seems a good idea to discuss any requirements.
> My idea is to introduce a note setting that defines the type of run to
> use (parallel or sequential) and leave "Run all" as the only button
> that runs all the cells in the note. This makes sequential or parallel
> running a `note option` rather than a `run option`.
> The option would be controlled by a nearby button, as shown:
>
> <~WRD000.jpg>
>
>
>
> For new notes the default state would be "Run sequential all", for old -
> "Run parallel for interpreters"
>
> We are glad to hear any thoughts.
> Thank you.
>
>
> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>
>
>
>
> *Maksim Belousov*
>
>
>

Re: Implementing run all paragraphs sequentially

Posted by Jeff Zhang <zj...@gmail.com>.

Hi all,

Here's a PR that changes the "Run all" behavior to run all paragraphs
sequentially.
https://github.com/apache/zeppelin/pull/2627

Any comments on this PR are welcome.
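
For readers following the thread, the difference between the old and the new "Run all" behavior can be sketched roughly as below. This is only an illustration in Python, not the code in the PR: the names `make_paragraph`, `run_all_parallel` and `run_all_sequential` are hypothetical, and each paragraph is modelled as a plain callable rather than a real Zeppelin paragraph.

```python
from concurrent.futures import ThreadPoolExecutor


def make_paragraph(name, log):
    """Hypothetical stand-in for a paragraph: it only records that it ran."""
    def run():
        log.append(name)
        return name
    return run


def run_all_parallel(paragraphs, max_workers=4):
    """Old behavior (roughly): submit every paragraph at once.

    Paragraphs may execute concurrently, so the order in which they
    actually run relative to each other is not guaranteed.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(p) for p in paragraphs]
        # Results are collected in submission order, not execution order.
        return [f.result() for f in futures]


def run_all_sequential(paragraphs):
    """Discussed behavior: run one paragraph, wait, then run the next.

    If a paragraph raises, the exception propagates and the remaining
    paragraphs are never started.
    """
    results = []
    for paragraph in paragraphs:
        results.append(paragraph())
    return results


log = []
note = [
    make_paragraph("%jdbc prepare data", log),
    make_paragraph("%pyspark use data", log),
]
run_all_sequential(note)
print(log)
```

With the sequential variant, a %pyspark paragraph that reads a table only starts after the %jdbc paragraph that writes it has finished, so the `time.sleep()` workaround mentioned at the start of the thread should no longer be needed.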



On Sun, Oct 8, 2017 at 9:10 PM, David Howell <da...@zipmoney.com.au> wrote:

> This should be implemented as a DAG that is defined sequentially by
> default; additional paragraphs should be appended to the DAG. Reordering
> paragraphs should reorder the DAG.
>
>
>
> Implementing it as a DAG will make adding future functionality easier.
>
> Later you can add the functionality to rearrange paragraph dependencies
> within the DAG, perhaps by creating a special %dag interpreter. If you
> define a %dag interpreter and forget to add some paragraphs to the DAG
> definition, they should either run sequentially by default (probably hard
> to get right, given the ambiguity about where the missing paragraphs could
> go) or raise an error that not all paragraphs have been
> dependency-linked (easier to implement). The output of a %dag paragraph
> should be a visual dependency graph. The syntax for the %dag paragraph
> should follow existing conventions, such as using an arrow to indicate
> upstream to downstream, e.g. -> or >>
>
> paragraph1 -> paragraph2
>
>
>
> And allow some diamond dependencies e.g.:
>
> paragraph1 >> paragraph2
>
> paragraph1 >> paragraph3
>
> paragraph2 >> paragraph4
>
> paragraph3 >> paragraph4
>
>
>
> Dave
>

RE: Implementing run all paragraphs sequentially

Posted by David Howell <da...@zipmoney.com.au>.

This should be implemented as a DAG that is defined sequentially by default; additional paragraphs should be appended to the DAG. Reordering paragraphs should reorder the DAG.

Implementing it as a DAG will make adding future functionality easier.
Later you can add the functionality to rearrange paragraph dependencies within the DAG, perhaps by creating a special %dag interpreter. If you define a %dag interpreter and forget to add some paragraphs to the DAG definition, they should either run sequentially by default (probably hard to get right, given the ambiguity about where the missing paragraphs could go) or raise an error that not all paragraphs have been dependency-linked (easier to implement). The output of a %dag paragraph should be a visual dependency graph. The syntax for the %dag paragraph should follow existing conventions, such as using an arrow to indicate upstream to downstream, e.g. -> or >>
paragraph1 -> paragraph2

And allow some diamond dependencies e.g.:
paragraph1 >> paragraph2
paragraph1 >> paragraph3
paragraph2 >> paragraph4
paragraph3 >> paragraph4
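
To make the proposed syntax concrete, here is a rough sketch of how such edge lines could be parsed and turned into a run order. It is written in Python (3.9+ for the standard `graphlib` module) purely as an illustration; the function names `parse_dag`, `unlinked` and `run_levels` are made up for this example, and nothing in this thread says a %dag interpreter exists in Zeppelin.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# One edge per line: "upstream -> downstream" or "upstream >> downstream".
EDGE = re.compile(r"^\s*(\w+)\s*(?:->|>>)\s*(\w+)\s*$")


def parse_dag(text):
    """Return {paragraph: set of upstream paragraphs} from edge lines."""
    deps = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = EDGE.match(line)
        if match is None:
            raise ValueError(f"cannot parse edge: {line!r}")
        upstream, downstream = match.groups()
        deps.setdefault(upstream, set())
        deps.setdefault(downstream, set()).add(upstream)
    return deps


def unlinked(deps, all_paragraphs):
    """Paragraphs of the note that the DAG definition never mentions."""
    return [p for p in all_paragraphs if p not in deps]


def run_levels(deps):
    """Group paragraphs into levels; each level could run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        levels.append(ready)
        sorter.done(*ready)
    return levels


diamond = """
paragraph1 >> paragraph2
paragraph1 >> paragraph3
paragraph2 >> paragraph4
paragraph3 >> paragraph4
"""
for level in run_levels(parse_dag(diamond)):
    print(level)
```

For the diamond above, paragraph2 and paragraph3 end up in the same level, so they could run in parallel once paragraph1 has finished. The `unlinked` helper corresponds to the "easier to implement" choice: report paragraphs missing from the definition instead of guessing where they belong.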

Dave

From: Jianfeng (Jeff) Zhang<ma...@hortonworks.com>
Sent: Saturday, 7 October 2017 11:57 AM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: Re: Implementing run all paragraphs sequentially


Since almost everyone agrees on running serially by default, we could implement that first. As for the parallel mode, we could leave it for the future, although personally I prefer defining a DAG for the note.


Best Regard,
Jeff Zhang


From: Michael Segel <ms...@hotmail.com>>
Reply-To: "users@zeppelin.apache.org<ma...@zeppelin.apache.org>" <us...@zeppelin.apache.org>>
Date: Friday, October 6, 2017 at 10:08 PM
To: "users@zeppelin.apache.org<ma...@zeppelin.apache.org>" <us...@zeppelin.apache.org>>
Subject: Re: Implementing run all paragraphs sequentially

Guys…

1) You’re posting this to the user list… Isn’t this a dev question?

2) +1 on the run serial… but doesn’t that already exist with the “run all paragraphs” button already?

3) -1 on a ‘run all in parallel’ button.  (It’s like putting lipstick on a pig.)

Are you really going to run all of the paragraphs in parallel?  You’re not going to have a paragraph that is used to set things up? Import external libraries?  Define classes/functions for future paragraphs to use?

IMHO I would much rather see a DAG where each paragraph can set their dependency… (this isn’t the right term. I’m trying to think back to how it was described in NeXTStep objective-c code.)
Then you could set your parallel button to run in parallel but if your paragraph is dependent on another, it’s blocked from executing until its predecessor completes.

But that’s just my $0.02

On Oct 6, 2017, at 2:25 AM, Polyakov Valeriy <v....@tinkoff.ru>> wrote:

Thank you all for sharing the problem. Naman Mishra has started the implementation of serial run in [1], so I propose to come back to the discussion of the next step (both Parallel and Serial run buttons) after [1] is resolved.

[1] https://issues.apache.org/jira/browse/ZEPPELIN-2368


Valeriy Polyakov

From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: Friday, October 06, 2017 10:14 AM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: Re: Implementing run all paragraphs sequentially


+1 for serial run by default.  Let's leave others in future.

On Fri, Oct 6, 2017 at 7:48 AM Mohit Jaggi <mo...@gmail.com>> wrote:
+1 for serial run by default.

Sent from my iPhone

On Oct 5, 2017, at 3:36 PM, moon soo Lee <mo...@apache.org>> wrote:
I'd like us to also consider simplicity of use.

We can have two different modes, or two different run buttons for Serial or Parallel run. This gives flexibility of choosing two different scheduler as a benefit, but to make user understand difference between two run button, there must be really good UI treatment.

I see there're high user demands for run notebook sequentially. And i think there're 3 action items in this discussion threads.

1. Change Parallel -> Serial the current run all button behavior
2. Provide both Parallel and Serial run buttons with really good UI treatment.
3. Provides DAG

I think 1) does not stop 2) and 3) in the future. 2) also does not stop 3) in the future.

So, why don't we try 1) first and keep discuss and polish idea about 2) and 3)?


Thanks,
moon

On Mon, Oct 2, 2017 at 10:22 AM Michael Segel <ms...@hotmail.com>> wrote:
Whoa!
Seems I walked in to something.

Herval,

What do you suggest?  A simple switch that runs everything in serial, or everything in parallel?
That would be a very bad idea.

I gave you an example of a class of solutions where you don’t want that behavior.
E.g Unit testing where you have one setup and then run several unit tests in parallel.

If that’s not enough for you… how about if you want to test producer/consumer problems?

Or if you want to define classes in one paragraph but then call on them in later paragraphs. If everything runs in parallel from the start of time 0, you can’t do this.


So, if you want to do it right the first time… you need to establish a way to control the dependency of paragraphs. This isn’t rocket science.
And frankly not that complex.

BTW, this is the user list not the dev list…

Just saying…  ;-)


On Oct 2, 2017, at 11:24 AM, Herval Freire <hf...@twitter.com>> wrote:

 "nice to have" isn't a very strong requirement. I strongly suggest you really, really think about this before you start pounding an overengineered solution to a non-issue :-)

h

On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <ms...@hotmail.com>> wrote:
Yes…
 You have bunch of unit tests you can run in parallel where you only need one constructor and one cleanup.

I would strongly suggest that you really, really think about this long and hard before you start to pound code.
It's going to be harder to back out and fix than if you take the time to think thru the problem and not make a dumb mistake.

On Oct 2, 2017, at 11:02 AM, Herval Freire <hf...@twitter.com>> wrote:

Did anyone request such a case ("running some in parallel and some in sequence")? I haven't seen any requests for this in the wild (nor on this thread), other than theoretical "what if" - which is totally fine, when it doesn't introduce a lot of unnecessary complexity for little to no gain (which seems to be the case here)

h

On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <ms...@hotmail.com>> wrote:
Because that simplicity doesn’t work.

You will want to run some things serial and some things in parallel.

Which is why you will need a dependency graph.

On Oct 2, 2017, at 10:40 AM, Herval Freire <hf...@twitter.com>> wrote:

Why do you need rules and graphs and any of that to support running everything sequentially or everything in parallel?

3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs one at a time, in the order they’re defined. If parallel, run using current scheme (as many at the same time as the threadpool permits)

Simpler and covers all cases, imo

________________________________
From: Polyakov Valeriy <v....@tinkoff.ru>>
Sent: Monday, October 2, 2017 8:24:35 AM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: RE: Implementing run all paragraphs sequentially

Let me try to summarize the discussion. Evidently, current behavior of running notes does not meet actual requirements. The most important thing that we need is the ability of sequential running. However, at the same time we want to keep functionality of parallel running. We discussed that the most suitable solution of building paragraphs' dependencies is a DAG (directed acyclic graph). Therefore, surely, this kind of dependencies should be defined in note and the running order should not depend on how we launch it (button / scheduler / API). In this way, our objectives are to implement “dependency definition engine” and to use it in “run engine”. What are the options?
1)      Explicit dependency definition.
We could take for a rule that each paragraph should wait for the end of execution of ALL previous paragraphs. Then we add paragraph option “Wait for …” where we can choose paragraph for which we are waiting for to start execution. In case where the option is set, we start execution immediately after the end of execution of selected paragraph. This pattern allows us to implement full-parallel DAG running order. What are the disadvantages? All of them are about the same - not easy understanding of the dependency management process from the perspective of users (and probably redundancy of the functionality - my personal view). At first, we should use strange format of paragraph IDs, which in addition is hidden. We could come up with visible and handsome paragraph ID aliases, but then it appears necessity of duplication control. The second thing is in some kind of scenarios where we should change existing dependencies (e.g. you need to add new paragraph between one and dependent group - you have to change option “Wait for …” for each paragraph in group).
2)      Implicit dependency definition.

We could take for a rule that each paragraph should wait for the end of execution of ALL previous paragraphs. Then we add paragraph option “Run in parallel with previous” which allows us to create paragraph groups to run in parallel. It turns out that we have the way of sequential running of paragraph groups - group by group in which paragraphs run in parallel. This approach is much more understandable for the users, but the obvious defect in comparison with “Explicit definition” is the fact that dependency graph and level of parallelism are not so cool.
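
To make option (2) concrete, here is a small sketch of how the "Run in parallel with previous" flag could be folded into run groups. It is plain Python for illustration only; the function name group_paragraphs and the (id, flag) tuple layout are assumptions made for this example, not an existing Zeppelin structure.

```python
def group_paragraphs(paragraphs):
    """Fold (paragraph_id, parallel_with_previous) pairs into ordered run groups.

    Groups run one after another; paragraphs inside one group may run in parallel.
    """
    groups = []
    for paragraph_id, parallel_with_previous in paragraphs:
        if parallel_with_previous and groups:
            # Joins the group of the paragraph directly above it.
            groups[-1].append(paragraph_id)
        else:
            # Default rule: wait for ALL previous paragraphs, so start a new group.
            groups.append([paragraph_id])
    return groups


note = [
    ("setup", False),
    ("test_a", False),
    ("test_b", True),   # runs together with test_a
    ("report", False),  # waits for both tests
]
print(group_paragraphs(note))
```

A flag set on the very first paragraph has nothing to attach to, so the sketch simply treats it as the start of the first group.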

I am not sure which option (1) or (2) is correct to implement at the moment. I hope to hear from product visionaries which way to choose and to get approval for the start of implementation.
Thank you!



Valeriy Polyakov

From: Michael Segel [mailto:msegel_hadoop@hotmail.com]
Sent: Saturday, September 30, 2017 4:22 PM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: Re: Implementing run all paragraphs sequentially

Sorry to jump in…

If you want to run paragraphs in parallel, you are going to want to have some sort of dependency graph.  Think of a common set up where you need to set up common functions and imports. (setup of %spark.dep)

A good example is if your notebook is a bunch of unit tests and you need to build the common tear down / set up methods to be used by the other paragraphs.

If you’re going to do that, you’ll need to build out a metadata structure where you can set up your dependencies as well as add things like labels beyond the ids (which only need to be unique to the given notebook.)

Just my $0.02

On Sep 29, 2017, at 1:30 PM, moon soo Lee <mo...@apache.org>> wrote:

Current behavior is as parallel as possible.
Run notebook button currently submits all paragraphs in a notebook into each interpreter's own scheduler (FIFO, Parallel) at once. And each individual scheduler of interpreter runs the paragraphs.

I think we can provide "sequential" run button for easier use, which submits paragraph one and waits for finish before submit next paragraphs.
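
The difference between the two submission styles can be sketched as follows. This is not Zeppelin's scheduler code, only an illustration with Python's standard concurrent.futures; paragraphs are modelled as plain callables, and the names run_all_at_once and run_sequentially are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all_at_once(paragraphs, max_workers=4):
    """Submit every paragraph immediately; completion order is not guaranteed."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(paragraph) for paragraph in paragraphs]
        return [future.result() for future in futures]


def run_sequentially(paragraphs):
    """Submit one paragraph, wait for it to finish, then submit the next."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        for paragraph in paragraphs:
            # result() blocks until the paragraph is done and re-raises its
            # exception, so a failure stops the remaining paragraphs.
            results.append(pool.submit(paragraph).result())
    return results
```

In the sequential variant a later paragraph (for example %pyspark) can rely on an earlier one (for example %jdbc) having finished, without any time.sleep() workaround.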

And I think sequential run button doesn't stop having more complex / flexible DAG in the future?

Thanks,
moon

On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mo...@gmail.com>> wrote:
What is the current behavior?

On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hf...@twitter.com>> wrote:
At least in our case, the notebooks that we need to run sequentially are expected to *always* run sequentially - thus it makes more sense to be a note option than a per-run mode

H

_____________________________
From: moon soo Lee <mo...@apache.org>>
Sent: Thursday, September 28, 2017 9:03 PM
Subject: Re: Implementing run all paragraphs sequentially
To: <us...@zeppelin.apache.org>>
This is going to be really useful!

Curious why do you prefer 'note option' instead of 'run option'?
Could you compare their pros and cons?

Thanks,
moon

On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hf...@twitter.com>> wrote:
+1, our internal users at Twitter also often request this

________________________________
From: Belousov Maksim Eduardovich <m....@tinkoff.ru>>
Sent: Thursday, September 28, 2017 8:28:58 AM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: Implementing run all paragraphs sequentially

Hello, users!

At the moment our analysts often use mixes of interpreters in their notes.
For example, they prepare data using %jdbc and then use it in %pyspark. Besides, they often use scheduling to make some regular reporting. And they should do something like `time.sleep()` to wait for the data from %jdbc. It doesn`t guarantee the result and doesn`t look cool.

You can find early attempts to implement sequential running of all paragraphs in [1].
We are really interested in implementation of the issue [2] and are ready to solve it.

It seems a good idea to discuss any requirements.
My idea is to introduce note setting that defines the type of running to use (parallel or sequential) and leave "Run all" to be the only button running all the cells in the note. This will make sequential or parallel running the `note option` but not `run option`.
Option will be controlled by nearby button as shown

<~WRD000.jpg>



For new notes the default state would be "Run sequential all", for old - "Run parallel for interpreters"

We are glad to hear any thoughts.
Thank you.


[1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
[2] https://issues.apache.org/jira/browse/ZEPPELIN-2368



Maksim Belousov

Re: Implementing run all paragraphs sequentially

Posted by "Jianfeng (Jeff) Zhang" <jz...@hortonworks.com>.

Since almost everyone agrees on running serially by default, we could implement that first. As for the parallel mode, we could leave it for the future, although personally I prefer defining a DAG for the note.

Best Regard,
Jeff Zhang

From: Michael Segel <ms...@hotmail.com>>
Reply-To: "users@zeppelin.apache.org<ma...@zeppelin.apache.org>" <us...@zeppelin.apache.org>>
Date: Friday, October 6, 2017 at 10:08 PM
To: "users@zeppelin.apache.org<ma...@zeppelin.apache.org>" <us...@zeppelin.apache.org>>
Subject: Re: Implementing run all paragraphs sequentially

Guys…

1) You’re posting this to the user list… Isn’t this a dev question?

2) +1 on the run serial… but doesn’t that already exist with the “run all paragraphs” button already?

3) -1 on a ‘run all in parallel’ button.  (It’s like putting lipstick on a pig.)

Are you really going to run all of the paragraphs in parallel?  You’re not going to have a paragraph that is used to set things up? Import external libraries?  Define classes/functions for future paragraphs to use?

IMHO I would much rather see a DAG where each paragraph can set their dependency… (this isn’t the right term. I’m trying to think back to how it was described in NeXTStep objective-c code.)
Then you could set your parallel button to run in parallel but if your paragraph is dependent on another, it’s blocked from executing until its predecessor completes.

But that’s just my $0.02

On Oct 6, 2017, at 2:25 AM, Polyakov Valeriy <v....@tinkoff.ru>> wrote:

Thank you all for sharing the problem. Naman Mishra has started the implementation of serial run in [1], so I propose to come back to the discussion of the next step (both Parallel and Serial run buttons) after [1] is resolved.

[1] https://issues.apache.org/jira/browse/ZEPPELIN-2368

Valeriy Polyakov

From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: Friday, October 06, 2017 10:14 AM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: Re: Implementing run all paragraphs sequentially

+1 for serial run by default.  Let's leave others in future.

On Fri, Oct 6, 2017 at 7:48 AM Mohit Jaggi <mo...@gmail.com>> wrote:
+1 for serial run by default.

Sent from my iPhone

On Oct 5, 2017, at 3:36 PM, moon soo Lee <mo...@apache.org>> wrote:
I'd like us to also consider simplicity of use.

We can have two different modes, or two different run buttons for Serial or Parallel run. This gives flexibility of choosing two different scheduler as a benefit, but to make user understand difference between two run button, there must be really good UI treatment.

I see there're high user demands for run notebook sequentially. And i think there're 3 action items in this discussion threads.

1. Change Parallel -> Serial the current run all button behavior
2. Provide both Parallel and Serial run buttons with really good UI treatment.
3. Provides DAG

I think 1) does not stop 2) and 3) in the future. 2) also does not stop 3) in the future.

So, why don't we try 1) first and keep discuss and polish idea about 2) and 3)?

Thanks,
moon

On Mon, Oct 2, 2017 at 10:22 AM Michael Segel <ms...@hotmail.com>> wrote:
Whoa!
Seems I walked in to something.

Herval,

What do you suggest?  A simple switch that runs everything in serial, or everything in parallel?
That would be a very bad idea.

I gave you an example of a class of solutions where you don’t want that behavior.
E.g Unit testing where you have one setup and then run several unit tests in parallel.

If that’s not enough for you… how about if you want to test producer/consumer problems?

Or if you want to define classes in one paragraph but then call on them in later paragraphs. If everything runs in parallel from the start of time 0, you can’t do this.

So, if you want to do it right the first time… you need to establish a way to control the dependency of paragraphs. This isn’t rocket science.
And frankly not that complex.

BTW, this is the user list not the dev list…

Just saying…  ;-)

On Oct 2, 2017, at 11:24 AM, Herval Freire <hf...@twitter.com>> wrote:

 "nice to have" isn't a very strong requirement. I strongly suggest you really, really think about this before you start pounding an overengineered solution to a non-issue :-)

h

On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <ms...@hotmail.com>> wrote:
Yes…
 You have bunch of unit tests you can run in parallel where you only need one constructor and one cleanup.

I would strongly suggest that you really, really think about this long and hard before you start to pound code.
It's going to be harder to back out and fix than if you take the time to think thru the problem and not make a dumb mistake.

On Oct 2, 2017, at 11:02 AM, Herval Freire <hf...@twitter.com>> wrote:

Did anyone request such a case ("running some in parallel and some in sequence")? I haven't seen any requests for this in the wild (nor on this thread), other than theoretical "what if" - which is totally fine, when it doesn't introduce a lot of unnecessary complexity for little to no gain (which seems to be the case here)

h

On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <ms...@hotmail.com>> wrote:
Because that simplicity doesn’t work.

You will want to run some things serial and some things in parallel.

Which is why you will need a dependency graph.

On Oct 2, 2017, at 10:40 AM, Herval Freire <hf...@twitter.com>> wrote:

Why do you need rules and graphs and any of that to support running everything sequentially or everything in parallel?

3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs one at a time, in the order they’re defined. If parallel, run using current scheme (as many at the same time as the threadpool permits)

Simpler and covers all cases, imo

________________________________
From: Polyakov Valeriy <v....@tinkoff.ru>>
Sent: Monday, October 2, 2017 8:24:35 AM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: RE: Implementing run all paragraphs sequentially

Let me try to summarize the discussion. Evidently, current behavior of running notes does not meet actual requirements. The most important thing that we need is the ability of sequential running. However, at the same time we want to keep functionality of parallel running. We discussed that the most suitable solution of building paragraphs` dependencies is a DAG (directed acyclic graph). Therefore, surely, this kind of dependencies should be defined in note and the running order should not depend on how we launch it (button / scheduler / API). In this way, our objectives are to implement “dependency definition engine” and to use it in “run engine”. What are the options?
1)      Explicit dependency definition.
We could take for a rule that each paragraph should wait for the end of execution of ALL previous paragraphs. Then we add paragraph option “Wait for …” where we can choose paragraph for which we are waiting for to start execution. In case where the option is set, we start execution immediately after the end of execution of selected paragraph. This pattern allows us to implement full-parallel DAG running order. What are the disadvantages? All of them are about the same – not easy understanding of the dependency management process from the perspective of users (and probably redundancy of the functionality – my personal view). At first, we should use strange format of paragraph IDs, which in addition is hidden. We could come up with visible and handsome paragraph ID aliases, but then it appears necessity of duplication control. The second thing is in some kind of scenarios where we should change existing dependencies (e.g. you need to add new paragraph between one and dependent group – you have to change option “Wait for …” for each paragraph in group).
2)      Implicit dependency definition.

We could take for a rule that each paragraph should wait for the end of execution of ALL previous paragraphs. Then we add paragraph option “Run in parallel with previous” which allows us to create paragraph groups to run in parallel. It turns out that we have the way of sequential running of paragraph groups – group by group in which paragraphs run in parallel. This approach is much more understandable for the users, but the obvious defect in comparison with “Explicit definition” is the fact that dependency graph and level of parallelism are not so cool.

I am not sure which option (1) or (2) is correct to implement at the moment. I hope to hear from product visionaries which way to choose and to get approval for the start of implementation.
Thank you!

Valeriy Polyakov

From: Michael Segel [mailto:msegel_hadoop@hotmail.com]
Sent: Saturday, September 30, 2017 4:22 PM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: Re: Implementing run all paragraphs sequentially

Sorry to jump in…

If you want to run paragraphs in parallel, you are going to want to have some sort of dependency graph.  Think of a common set up where you need to set up common functions and imports. (setup of %spark.dep)

A good example is if your notebook is a bunch of unit tests and you need to build the common tear down / set up methods to be used by the other paragraphs.

If you’re going to do that, you’ll need to build out a metadata structure where you can set up your dependencies  as well as add things like labels beyond the ids (which only need to be unique to the given notebook. )

Just my $0.02

On Sep 29, 2017, at 1:30 PM, moon soo Lee <mo...@apache.org>> wrote:

Current behavior is as parallel as possible.
Run notebook button currently submits all paragraphs in a notebook into each interpreter's own scheduler (FIFO, Parallel) at once. And each individual scheduler of interpreter runs the paragraphs.

I think we can provide "sequential" run button for easier use, which submits paragraph one and waits for finish before submit next paragraphs.

And I think sequential run button doesn't stop having more complex / flexible DAG in the future?

Thanks,
moon

On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mo...@gmail.com>> wrote:
What is the current behavior?

On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hf...@twitter.com>> wrote:
At least in our case, the notebooks that we need to run sequentially are expected to *always* run sequentially - thus it makes more sense to be a note option than a per-run mode

H

_____________________________
From: moon soo Lee <mo...@apache.org>>
Sent: Thursday, September 28, 2017 9:03 PM
Subject: Re: Implementing run all paragraphs sequentially
To: <us...@zeppelin.apache.org>>
This is going to be really useful!

Curious why do you prefer 'note option' instead of 'run option'?
Could you compare their pros and cons?

Thanks,
moon

On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hf...@twitter.com>> wrote:
+1, our internal users at Twitter also often request this

________________________________
From: Belousov Maksim Eduardovich <m....@tinkoff.ru>>
Sent: Thursday, September 28, 2017 8:28:58 AM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: Implementing run all paragraphs sequentially

Hello, users!

At the moment our analysts often use mixes of interpreters in their notes.
For example, they prepare data using %jdbc and then use it in %pyspark. Besides, they often use scheduling to make some regular reporting. And they should do something like `time.sleep()` to wait for the data from %jdbc. It doesn`t guarantee the result and doesn`t look cool.

You can find early attempts to implement sequential running of all paragraphs in [1].
We are really interested in implementation of the issue [2] and are ready to solve it.

It seems a good idea to discuss any requirements.
My idea is to introduce note setting that defines the type of running to use (parallel or sequential) and leave "Run all" to be the only button running all the cells in the note. This will make sequential or parallel running the `note option` but not `run option`.
Option will be controlled by nearby button as shown

<~WRD000.jpg>

For new notes the default state would be "Run sequential all", for old - "Run parallel for interpreters"

We are glad to hear any thoughts.
Thank you.

[1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
[2] https://issues.apache.org/jira/browse/ZEPPELIN-2368

Maksim Belousov

Re: Implementing run all paragraphs sequentially

Posted by Michael Segel <ms...@hotmail.com>.

Guys…

1) You’re posting this to the user list… Isn’t this a dev question?

2) +1 on the run serial… but doesn’t that already exist with the “run all paragraphs” button already?

3) -1 on a ‘run all in parallel’ button.  (It’s like putting lipstick on a pig.)

Are you really going to run all of the paragraphs in parallel?  You’re not going to have a paragraph that is used to set things up? Import external libraries?  Define classes/functions for future paragraphs to use?

IMHO I would much rather see a DAG where each paragraph can set their dependency… (this isn’t the right term. I’m trying to think back to how it was described in NeXTStep objective-c code.)
Then you could set your parallel button to run in parallel but if your paragraph is dependent on another, it’s blocked from executing until its predecessor completes.

But that’s just my $0.02

On Oct 6, 2017, at 2:25 AM, Polyakov Valeriy <v....@tinkoff.ru>> wrote:

Thank you all for sharing the problem. Naman Mishra has started the implementation of serial run in [1], so I propose to come back to the discussion of the next step (both Parallel and Serial run buttons) after [1] is resolved.

[1] https://issues.apache.org/jira/browse/ZEPPELIN-2368


Valeriy Polyakov

From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: Friday, October 06, 2017 10:14 AM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: Re: Implementing run all paragraphs sequentially


+1 for serial run by default.  Let's leave others in future.

On Fri, Oct 6, 2017 at 7:48 AM Mohit Jaggi <mo...@gmail.com>> wrote:
+1 for serial run by default.

Sent from my iPhone

On Oct 5, 2017, at 3:36 PM, moon soo Lee <mo...@apache.org>> wrote:
I'd like us to also consider simplicity of use.

We can have two different modes, or two different run buttons for Serial or Parallel run. This gives flexibility of choosing two different scheduler as a benefit, but to make user understand difference between two run button, there must be really good UI treatment.

I see there're high user demands for run notebook sequentially. And i think there're 3 action items in this discussion threads.

1. Change Parallel -> Serial the current run all button behavior
2. Provide both Parallel and Serial run buttons with really good UI treatment.
3. Provides DAG

I think 1) does not stop 2) and 3) in the future. 2) also does not stop 3) in the future.

So, why don't we try 1) first and keep discuss and polish idea about 2) and 3)?


Thanks,
moon

On Mon, Oct 2, 2017 at 10:22 AM Michael Segel <ms...@hotmail.com>> wrote:
Whoa!
Seems I walked in to something.

Herval,

What do you suggest?  A simple switch that runs everything in serial, or everything in parallel?
That would be a very bad idea.

I gave you an example of a class of solutions where you don’t want that behavior.
E.g Unit testing where you have one setup and then run several unit tests in parallel.

If that’s not enough for you… how about if you want to test producer/consumer problems?

Or if you want to define classes in one paragraph but then call on them in later paragraphs. If everything runs in parallel from the start of time 0, you can’t do this.


So, if you want to do it right the first time… you need to establish a way to control the dependency of paragraphs. This isn’t rocket science.
And frankly not that complex.

BTW, this is the user list not the dev list…

Just saying…  ;-)


On Oct 2, 2017, at 11:24 AM, Herval Freire <hf...@twitter.com>> wrote:

 "nice to have" isn't a very strong requirement. I strongly suggest you really, really think about this before you start pounding an overengineered solution to a non-issue :-)

h

On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <ms...@hotmail.com>> wrote:
Yes…
 You have bunch of unit tests you can run in parallel where you only need one constructor and one cleanup.

I would strongly suggest that you really, really think about this long and hard before you start to pound code.
It's going to be harder to back out and fix than if you take the time to think thru the problem and not make a dumb mistake.

On Oct 2, 2017, at 11:02 AM, Herval Freire <hf...@twitter.com>> wrote:

Did anyone request such a case ("running some in parallel and some in sequence")? I haven't seen any requests for this in the wild (nor on this thread), other than theoretical "what if" - which is totally fine, when it doesn't introduce a lot of unnecessary complexity for little to no gain (which seems to be the case here)

h

On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <ms...@hotmail.com>> wrote:
Because that simplicity doesn’t work.

You will want to run some things serial and some things in parallel.

Which is why you will need a dependency graph.

On Oct 2, 2017, at 10:40 AM, Herval Freire <hf...@twitter.com>> wrote:

Why do you need rules and graphs and any of that to support running everything sequentially or everything in parallel?

3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs one at a time, in the order they’re defined. If parallel, run using current scheme (as many at the same time as the threadpool permits)

Simpler and covers all cases, imo

________________________________
From: Polyakov Valeriy <v....@tinkoff.ru>>
Sent: Monday, October 2, 2017 8:24:35 AM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: RE: Implementing run all paragraphs sequentially

Let me try to summarize the discussion. Evidently, the current behavior of running notes does not meet actual requirements. The most important thing we need is the ability to run paragraphs sequentially. At the same time, we want to keep the ability to run them in parallel. We discussed that the most suitable way to describe dependencies between paragraphs is a DAG (directed acyclic graph). These dependencies should therefore be defined in the note, and the running order should not depend on how we launch it (button / scheduler / API). So our objectives are to implement a “dependency definition engine” and to use it in the “run engine”. What are the options?
1)      Explicit dependency definition.
We could take as a rule that each paragraph waits for ALL previous paragraphs to finish. Then we add a paragraph option “Wait for …” that selects the paragraph whose completion we wait for before starting. When the option is set, the paragraph starts immediately after the selected paragraph finishes. This pattern allows us to implement a fully parallel DAG running order. What are the disadvantages? They all come down to the same thing: the dependency management process is not easy for users to understand (and the functionality is probably redundant, in my personal view). First, we would have to use the strange format of paragraph IDs, which in addition is hidden. We could come up with visible, nicer paragraph ID aliases, but then we would need to check them for duplicates. Second, in some scenarios we have to change existing dependencies (e.g. to add a new paragraph between one paragraph and a group that depends on it, you have to change the “Wait for …” option for each paragraph in the group).
2)      Implicit dependency definition.

We could take as a rule that each paragraph waits for ALL previous paragraphs to finish. Then we add a paragraph option “Run in parallel with previous”, which lets us create groups of paragraphs that run in parallel. The result is sequential running of paragraph groups: group by group, with the paragraphs inside each group running in parallel. This approach is much easier for users to understand, but the obvious drawback compared with “Explicit definition” is that the dependency graph and the level of parallelism are more limited.
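
A minimal sketch of the two options, for illustration only. This is not Zeppelin code: the function names, the (id, flag) tuples and the wait_for mapping are hypothetical, and the sketch only computes a possible start order without running anything.

```python
def implicit_groups(paragraphs):
    """Option 2: 'paragraphs' is a list of (paragraph_id, parallel_with_previous)
    tuples in note order. Groups run one after another; members of a group
    may run in parallel."""
    groups = []
    for paragraph_id, parallel_with_previous in paragraphs:
        if parallel_with_previous and groups:
            groups[-1].append(paragraph_id)   # join the previous paragraph's group
        else:
            groups.append([paragraph_id])     # start a new sequential step
    return groups


def explicit_levels(order, wait_for):
    """Option 1: 'order' lists paragraph ids in note order; 'wait_for' maps a
    paragraph id to the single earlier paragraph it waits for. A paragraph
    without an entry waits for ALL previous paragraphs. Returns a wave number
    per paragraph; paragraphs with the same number do not depend on each other."""
    level = {}
    seen = []
    for paragraph_id in order:
        if paragraph_id in wait_for:
            target = wait_for[paragraph_id]
            if target not in level:
                # Only earlier paragraphs are allowed, which keeps the graph acyclic.
                raise ValueError(f"{paragraph_id} waits for unknown or later paragraph {target}")
            deps = [target]
        else:
            deps = list(seen)
        level[paragraph_id] = 1 + max((level[d] for d in deps), default=-1)
        seen.append(paragraph_id)
    return level


if __name__ == "__main__":
    print(implicit_groups([("setup", False), ("test_a", False),
                           ("test_b", True), ("report", False)]))
    print(explicit_levels(["setup", "test_a", "test_b", "report"],
                          {"test_b": "setup"}))
```

In both calls the hypothetical paragraphs test_a and test_b end up in the same step, after setup and before report. The difference is in how the user states it: a flag on test_b in option 2, a reference to setup's ID in option 1.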

I am not sure which option, (1) or (2), is the right one to implement at the moment. I hope to hear from the product visionaries which way to choose and to get approval to start the implementation.
Thank you!



Valeriy Polyakov

From: Michael Segel [mailto:msegel_hadoop@hotmail.com]
Sent: Saturday, September 30, 2017 4:22 PM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: Re: Implementing run all paragraphs sequentially

Sorry to jump in…

If you want to run paragraphs in parallel, you are going to want to have some sort of dependency graph.  Think of a common set up where you need to set up common functions and imports. (setup of %spark.dep)

A good example is if your notebook is a bunch of unit tests and you need to build the common tear down / set up methods to be used by the other paragraphs.

If you’re going to do that, you’ll need to build out a metadata structure where you can set up your dependencies  as well as add things like labels beyond the ids (which only need to be unique to the given notebook. )

Just my $0.02

On Sep 29, 2017, at 1:30 PM, moon soo Lee <mo...@apache.org>> wrote:

Current behavior is as parallel as possible.
The Run notebook button currently submits all paragraphs in a notebook to each interpreter's own scheduler (FIFO, Parallel) at once, and each interpreter's scheduler then runs its paragraphs.

I think we can provide a "sequential" run button for easier use, which submits one paragraph and waits for it to finish before submitting the next.
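
A rough sketch of the difference between the two submission styles, using a plain Python thread pool as a stand-in. This is an assumption-laden illustration, not Zeppelin's scheduler code; make_note, run_all_at_once and run_sequentially are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_note(log):
    """Two toy paragraphs: a slow 'jdbc' load and a fast 'pyspark' reader."""
    def jdbc():
        time.sleep(0.2)              # pretend the query takes a while
        log.append("jdbc done")

    def pyspark():
        log.append("pyspark reads")  # wants the jdbc data to exist already

    return [jdbc, pyspark]


def run_all_at_once(paragraphs, pool):
    """Submit everything immediately, then wait for all of it."""
    futures = [pool.submit(p) for p in paragraphs]
    for future in futures:
        future.result()


def run_sequentially(paragraphs, pool):
    """Submit one paragraph and wait for it before submitting the next."""
    for p in paragraphs:
        pool.submit(p).result()


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        log = []
        run_all_at_once(make_note(log), pool)
        print("at once:     ", log)
        log = []
        run_sequentially(make_note(log), pool)
        print("sequentially:", log)
```

With everything submitted at once, the fast paragraph will usually finish before the slow one it depends on; the sequential loop always preserves note order.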

And I think a sequential run button doesn't prevent us from having a more complex / flexible DAG in the future?

Thanks,
moon

On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mo...@gmail.com>> wrote:
What is the current behavior?

On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hf...@twitter.com>> wrote:
At least in our case, the notebooks that we need to run sequentially are expected to *always* run sequentially - thus it makes more sense to be a note option than a per-run mode

H

_____________________________
From: moon soo Lee <mo...@apache.org>>
Sent: Thursday, September 28, 2017 9:03 PM
Subject: Re: Implementing run all paragraphs sequentially
To: <us...@zeppelin.apache.org>>
This is going to be really useful!

Curious why you prefer a 'note option' over a 'run option'?
Could you compare their pros and cons?

Thanks,
moon

On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hf...@twitter.com>> wrote:
+1, our internal users at Twitter also often request this

________________________________
From: Belousov Maksim Eduardovich <m....@tinkoff.ru>>
Sent: Thursday, September 28, 2017 8:28:58 AM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: Implementing run all paragraphs sequentially

Hello, users!

At the moment our analysts often use mixes of interpreters in their notes.
For example, they prepare data using %jdbc and then use it in %pyspark. They also often use scheduling for regular reporting, so they have to do something like `time.sleep()` to wait for the data from %jdbc. That doesn't guarantee the result and doesn't look cool.
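
To show why the sleep workaround is fragile, here is a small self-contained sketch. It is an assumption for illustration: the background thread stands in for the %jdbc paragraph, and start_load, read_after_sleep and read_after_completion are hypothetical names.

```python
import threading
import time


def start_load(data, load_seconds):
    """Stand-in for the %jdbc paragraph: fills 'data' after a delay."""
    def load():
        time.sleep(load_seconds)
        data["rows"] = 3

    worker = threading.Thread(target=load)
    worker.start()
    return worker


def read_after_sleep(load_seconds, sleep_seconds):
    """The workaround: guess how long the load takes and sleep that long."""
    data = {}
    worker = start_load(data, load_seconds)
    time.sleep(sleep_seconds)
    rows = data.get("rows")   # None if the guess was too short
    worker.join()
    return rows


def read_after_completion(load_seconds):
    """Sequential behavior: read only after the load has actually finished."""
    data = {}
    worker = start_load(data, load_seconds)
    worker.join()
    return data.get("rows")
```

The sleep version only works while the guessed delay stays longer than the real load time; waiting for completion does not depend on a guess.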

You can find early attempts to implement sequential running of all paragraphs in [1].
We are really interested in implementation of the issue [2] and are ready to solve it.

It seems a good idea to discuss any requirements.
My idea is to introduce a note setting that defines the type of running to use (parallel or sequential) and to leave "Run all" as the only button that runs all the cells in the note. This will make sequential or parallel running a `note option` rather than a `run option`.
The option will be controlled by a nearby button, as shown:

<~WRD000.jpg>



For new notes the default state would be "Run sequential all"; for old notes, "Run parallel for interpreters".
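
A minimal sketch of how such a default could be resolved, assuming a hypothetical "runMode" key in the note's JSON config. The key name and the note layout are assumptions for illustration, not Zeppelin's actual note format.

```python
import json

RUN_MODES = ("sequential", "parallel")


def new_note():
    """New notes are created with the sequential mode set explicitly."""
    return {"paragraphs": [], "config": {"runMode": "sequential"}}


def run_mode_of(note):
    """Old notes have no 'runMode' key, so they fall back to parallel."""
    mode = note.get("config", {}).get("runMode", "parallel")
    if mode not in RUN_MODES:
        raise ValueError(f"unknown run mode: {mode!r}")
    return mode


def run_mode_of_json(text):
    """Same lookup for a note stored as a JSON string."""
    return run_mode_of(json.loads(text))
```

Because the mode is stored with the note, the button, the scheduler and the API would all resolve the same value.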

We are glad to hear any thoughts.
Thank you.


[1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
[2] https://issues.apache.org/jira/browse/ZEPPELIN-2368



Maksim Belousov

RE: Implementing run all paragraphs sequentially

Posted by Polyakov Valeriy <v....@tinkoff.ru>.

Thank you all for sharing the problem. Naman Mishra has started the implementation of serial run in [1], so I propose that we come back to the discussion of the next step (both Parallel and Serial run buttons) after [1] is resolved.

[1] https://issues.apache.org/jira/browse/ZEPPELIN-2368

Valeriy Polyakov

From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: Friday, October 06, 2017 10:14 AM
To: users@zeppelin.apache.org
Subject: Re: Implementing run all paragraphs sequentially

+1 for serial run by default.  Let's leave the others for the future.

On Fri, Oct 6, 2017 at 7:48 AM Mohit Jaggi <mo...@gmail.com>> wrote:
+1 for serial run by default.

Sent from my iPhone

On Oct 5, 2017, at 3:36 PM, moon soo Lee <mo...@apache.org>> wrote:
I'd like us to also consider simplicity of use.

We can have two different modes, or two different run buttons, for Serial and Parallel runs. The benefit is the flexibility of choosing between two different schedulers, but for users to understand the difference between the two run buttons there must be really good UI treatment.

I see there is high user demand for running notebooks sequentially, and I think there are 3 action items in this discussion thread.

1. Change the current "Run all" button behavior from Parallel to Serial.
2. Provide both Parallel and Serial run buttons with really good UI treatment.
3. Provide a DAG.

I think 1) does not stop 2) and 3) in the future. 2) also does not stop 3) in the future.

So, why don't we try 1) first and keep discussing and polishing ideas about 2) and 3)?

Thanks,
moon

On Mon, Oct 2, 2017 at 10:22 AM Michael Segel <ms...@hotmail.com>> wrote:
Whoa!
Seems I walked in to something.

Herval,

What do you suggest?  A simple switch that runs everything in serial, or everything in parallel?
That would be a very bad idea.

I gave you an example of a class of solutions where you don’t want that behavior.
E.g Unit testing where you have one setup and then run several unit tests in parallel.

If that’s not enough for you… how about if you want to test producer/consumer problems?

Or if you want to define classes in one paragraph but then call on them in later paragraphs. If everything runs in parallel from the start of time 0, you can’t do this.

So, if you want to do it right the first time… you need to establish a way to control the dependency of paragraphs. This isn’t rocket science.
And frankly not that complex.

BTW, this is the user list not the dev list…

Just saying…  ;-)

On Oct 2, 2017, at 11:24 AM, Herval Freire <hf...@twitter.com>> wrote:

 "nice to have" isn't a very strong requirement. I strongly suggest you really, really think about this before you start pounding an overengineered solution to a non-issue :-)

h

On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <ms...@hotmail.com>> wrote:
Yes…
 You have a bunch of unit tests you can run in parallel where you only need one constructor and one cleanup.

I would strongly suggest that you really, really think about this long and hard before you start to pound code.
It's going to be harder to back out and fix than if you take the time to think through the problem and not make a dumb mistake.

On Oct 2, 2017, at 11:02 AM, Herval Freire <hf...@twitter.com>> wrote:

Did anyone request such a case ("running some in parallel and some in sequence")? I haven't seen any requests for this in the wild (nor on this thread), other than theoretical "what if" - which is totally fine, when it doesn't introduce a lot of unnecessary complexity for little to no gain (which seems to be the case here)

h

On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <ms...@hotmail.com>> wrote:
Because that simplicity doesn’t work.

You will want to run some things serial and some things in parallel.

Which is why you will need a dependency graph.

On Oct 2, 2017, at 10:40 AM, Herval Freire <hf...@twitter.com>> wrote:

Why do you need rules and graphs and any of that to support running everything sequentially or everything in parallel?

3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs one at a time, in the order they’re defined. If parallel, run using current scheme (as many at the same time as the threadpool permits)

Simpler and covers all cases, imo

________________________________
From: Polyakov Valeriy <v....@tinkoff.ru>>
Sent: Monday, October 2, 2017 8:24:35 AM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: RE: Implementing run all paragraphs sequentially

Let me try to summarize the discussion. Evidently, the current behavior of running notes does not meet actual requirements. The most important thing we need is the ability to run paragraphs sequentially. At the same time, we want to keep the ability to run them in parallel. We discussed that the most suitable way to describe dependencies between paragraphs is a DAG (directed acyclic graph). These dependencies should therefore be defined in the note, and the running order should not depend on how we launch it (button / scheduler / API). So our objectives are to implement a “dependency definition engine” and to use it in the “run engine”. What are the options?
1)      Explicit dependency definition.
We could take as a rule that each paragraph waits for ALL previous paragraphs to finish. Then we add a paragraph option “Wait for …” that selects the paragraph whose completion we wait for before starting. When the option is set, the paragraph starts immediately after the selected paragraph finishes. This pattern allows us to implement a fully parallel DAG running order. What are the disadvantages? They all come down to the same thing: the dependency management process is not easy for users to understand (and the functionality is probably redundant, in my personal view). First, we would have to use the strange format of paragraph IDs, which in addition is hidden. We could come up with visible, nicer paragraph ID aliases, but then we would need to check them for duplicates. Second, in some scenarios we have to change existing dependencies (e.g. to add a new paragraph between one paragraph and a group that depends on it, you have to change the “Wait for …” option for each paragraph in the group).
2)      Implicit dependency definition.

We could take as a rule that each paragraph waits for ALL previous paragraphs to finish. Then we add a paragraph option “Run in parallel with previous”, which lets us create groups of paragraphs that run in parallel. The result is sequential running of paragraph groups: group by group, with the paragraphs inside each group running in parallel. This approach is much easier for users to understand, but the obvious drawback compared with “Explicit definition” is that the dependency graph and the level of parallelism are more limited.
I am not sure which option, (1) or (2), is the right one to implement at the moment. I hope to hear from the product visionaries which way to choose and to get approval to start the implementation.
Thank you!

Valeriy Polyakov

From: Michael Segel [mailto:msegel_hadoop@hotmail.com]
Sent: Saturday, September 30, 2017 4:22 PM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: Re: Implementing run all paragraphs sequentially

Sorry to jump in…

If you want to run paragraphs in parallel, you are going to want to have some sort of dependency graph.  Think of a common set up where you need to set up common functions and imports. (setup of %spark.dep)

A good example is if your notebook is a bunch of unit tests and you need to build the common tear down / set up methods to be used by the other paragraphs.

If you’re going to do that, you’ll need to build out a metadata structure where you can set up your dependencies  as well as add things like labels beyond the ids (which only need to be unique to the given notebook. )

Just my $0.02

On Sep 29, 2017, at 1:30 PM, moon soo Lee <mo...@apache.org>> wrote:

Current behavior is as parallel as possible.
The Run notebook button currently submits all paragraphs in a notebook to each interpreter's own scheduler (FIFO, Parallel) at once, and each interpreter's scheduler then runs its paragraphs.

I think we can provide a "sequential" run button for easier use, which submits one paragraph and waits for it to finish before submitting the next.

And I think a sequential run button doesn't prevent us from having a more complex / flexible DAG in the future?

Thanks,
moon

On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mo...@gmail.com>> wrote:
What is the current behavior?

On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hf...@twitter.com>> wrote:
At least in our case, the notebooks that we need to run sequentially are expected to *always* run sequentially - thus it makes more sense to be a note option than a per-run mode

H

_____________________________
From: moon soo Lee <mo...@apache.org>>
Sent: Thursday, September 28, 2017 9:03 PM
Subject: Re: Implementing run all paragraphs sequentially
To: <us...@zeppelin.apache.org>>
This is going to be really useful!

Curious why you prefer a 'note option' over a 'run option'?
Could you compare their pros and cons?

Thanks,
moon

On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hf...@twitter.com>> wrote:
+1, our internal users at Twitter also often request this

________________________________
From: Belousov Maksim Eduardovich <m....@tinkoff.ru>>
Sent: Thursday, September 28, 2017 8:28:58 AM
To: users@zeppelin.apache.org<ma...@zeppelin.apache.org>
Subject: Implementing run all paragraphs sequentially

Hello, users!

At the moment our analysts often use mixes of interpreters in their notes.
For example, they prepare data using %jdbc and then use it in %pyspark. They also often use scheduling for regular reporting, so they have to do something like `time.sleep()` to wait for the data from %jdbc. That doesn't guarantee the result and doesn't look cool.

You can find early attempts to implement sequential running of all paragraphs in [1].
We are really interested in implementation of the issue [2] and are ready to solve it.

It seems a good idea to discuss any requirements.
My idea is to introduce a note setting that defines the type of running to use (parallel or sequential) and to leave "Run all" as the only button that runs all the cells in the note. This will make sequential or parallel running a `note option` rather than a `run option`.
The option will be controlled by a nearby button, as shown:

<~WRD000.jpg>

For new notes the default state would be "Run sequential all"; for old notes, "Run parallel for interpreters".

We are glad to hear any thoughts.
Thank you.

[1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
[2] https://issues.apache.org/jira/browse/ZEPPELIN-2368

Maksim Belousov

Re: Implementing run all paragraphs sequentially

Posted by Jeff Zhang <zj...@gmail.com>.

+1 for serial run by default.  Let's leave the others for the future.

On Fri, Oct 6, 2017 at 7:48 AM Mohit Jaggi <mo...@gmail.com> wrote:

> +1 for serial run by default.
>
> Sent from my iPhone
>
> On Oct 5, 2017, at 3:36 PM, moon soo Lee <mo...@apache.org> wrote:
>
> I'd like us to also consider simplicity of use.
>
> We can have two different modes, or two different run buttons for Serial
> or Parallel run. This gives flexibility of choosing two different scheduler
> as a benefit, but to make user understand difference between two run
> button, there must be really good UI treatment.
>
> I see there're high user demands for run notebook sequentially. And i
> think there're 3 action items in this discussion threads.
>
> 1. Change Parallel -> Serial the current run all button behavior
> 2. Provide both Parallel and Serial run buttons with really good UI
> treatment.
> 3. Provides DAG
>
> I think 1) does not stop 2) and 3) in the future. 2) also does not stop 3)
> in the future.
>
> So, why don't we try 1) first and keep discuss and polish idea about 2)
> and 3)?
>
>
> Thanks,
> moon
>
> On Mon, Oct 2, 2017 at 10:22 AM Michael Segel <ms...@hotmail.com>
> wrote:
>
>> Whoa!
>> Seems I walked in to something.
>>
>> Herval,
>>
>> What do you suggest?  A simple switch that runs everything in serial, or
>> everything in parallel?
>> That would be a very bad idea.
>>
>> I gave you an example of a class of solutions where you don’t want that
>> behavior.
>> E.g Unit testing where you have one setup and then run several unit tests
>> in parallel.
>>
>> If that’s not enough for you… how about if you want to test
>> producer/consumer problems?
>>
>> Or if you want to define classes in one paragraph but then call on them
>> in later paragraphs. If everything runs in parallel from the start of time
>> 0, you can’t do this.
>>
>>
>> So, if you want to do it right the first time… you need to establish a
>> way to control the dependency of paragraphs. This isn’t rocket science.
>> And frankly not that complex.
>>
>> BTW, this is the user list not the dev list…
>>
>> Just saying…  ;-)
>>
>>
>> On Oct 2, 2017, at 11:24 AM, Herval Freire <hf...@twitter.com> wrote:
>>
>>  "nice to have" isn't a very strong requirement. I strongly suggest you
>> really, really think about this before you start pounding an overengineered
>> solution to a non-issue :-)
>>
>> h
>>
>> On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <ms...@hotmail.com>
>> wrote:
>>
>>> Yes…
>>>  You have a bunch of unit tests you can run in parallel where you only
>>> need one constructor and one cleanup.
>>>
>>> I would strongly suggest that you really, really think about this long
>>> and hard before you start to pound code.
>>> It's going to be harder to back out and fix than if you take the time to
>>> think through the problem and not make a dumb mistake.
>>>
>>> On Oct 2, 2017, at 11:02 AM, Herval Freire <hf...@twitter.com> wrote:
>>>
>>> Did anyone request such a case ("running some in parallel and some in
>>> sequence")? I haven't seen any requests for this in the wild (nor on this
>>> thread), other than theoretical "what if" - which is totally fine, when it
>>> doesn't introduce a lot of unnecessary complexity for little to no gain
>>> (which seems to be the case here)
>>>
>>> h
>>>
>>> On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <msegel_hadoop@hotmail.com
>>> > wrote:
>>>
>>>> Because that simplicity doesn’t work.
>>>>
>>>> You will want to run some things serial and some things in parallel.
>>>>
>>>> Which is why you will need a dependency graph.
>>>>
>>>> On Oct 2, 2017, at 10:40 AM, Herval Freire <hf...@twitter.com> wrote:
>>>>
>>>> Why do you need rules and graphs and any of that to support running
>>>> everything sequentially or everything in parallel?
>>>>
>>>> 3) add a “run mode” to the note. If it’s “sequential”, run the
>>>> paragraphs one at a time, in the order they’re defined. If parallel, run
>>>> using current scheme (as many at the same time as the threadpool permits)
>>>>
>>>> Simpler and covers all cases, imo
>>>>
>>>> ------------------------------
>>>> *From:* Polyakov Valeriy <v....@tinkoff.ru>
>>>> *Sent:* Monday, October 2, 2017 8:24:35 AM
>>>> *To:* users@zeppelin.apache.org
>>>> *Subject:* RE: Implementing run all paragraphs sequentially
>>>>
>>>> Let me try to summarize the discussion. Evidently, current behavior of
>>>> running notes does not meet actual requirements. The most important thing
>>>> that we need is the ability of sequential running. However, at the same
>>>> time we want to keep functionality of parallel running. We discussed that
>>>> the most suitable solution of building paragraphs` dependencies is a DAG
>>>> (directed acyclic graph). Therefore, surely, this kind of dependencies
>>>> should be defined in note and the running order should not depend on how we
>>>> launch it (button / scheduler / API). In this way, our objectives are to
>>>> implement “dependency definition engine” and to use it in “run engine”.
>>>> What are the options?
>>>> 1)      Explicit dependency definition.
>>>> We could take for a rule that each paragraph should wait for the end of
>>>> execution of ALL previous paragraphs. Then we add paragraph option “Wait
>>>> for …” where we can choose paragraph for which we are waiting for to start
>>>> execution. In case where the option is set, we start execution immediately
>>>> after the end of execution of selected paragraph. This pattern allows us to
>>>> implement full-parallel DAG running order. What are the disadvantages? All
>>>> of them are about the same – not easy understanding of the dependency
>>>> management process from the perspective of users (and probably redundancy
>>>> of the functionality – my personal view). At first, we should use strange
>>>> format of paragraph IDs, which in addition is hidden. We could come up with
>>>> visible and handsome paragraph ID aliases, but then it appears necessity of
>>>> duplication control. The second thing is in some kind of scenarios where we
>>>> should change existing dependencies (e.g. you need to add new paragraph
>>>> between one and dependent group – you have to change option “Wait for …”
>>>> for each paragraph in group).
>>>> 2)      Implicit dependency definition.
>>>>
>>>> We could take for a rule that each paragraph should wait for the end of
>>>> execution of ALL previous paragraphs. Then we add paragraph option “Run in
>>>> parallel with previous” which allows us to create paragraph groups to run
>>>> in parallel. It turns out that we have the way of sequential running of
>>>> paragraph groups – group by group in which paragraphs run in parallel. This
>>>> approach is much more understandable for the users, but the obvious defect
>>>> in comparison with “Explicit definition” is the fact that dependency graph
>>>> and level of parallelism are not so cool.
>>>> I am not sure which option (1) or (2) is correct to implement at the
>>>> moment. I hope to hear from product visionaries which way to choose and to
>>>> get approval for the start of implementation.
>>>> Thank you!
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *Valeriy Polyakov *
>>>>
>>>>
>>>> *From:* Michael Segel [mailto:msegel_hadoop@hotmail.com
>>>> <ms...@hotmail.com>]
>>>> *Sent:* Saturday, September 30, 2017 4:22 PM
>>>> *To:* users@zeppelin.apache.org
>>>> *Subject:* Re: Implementing run all paragraphs sequentially
>>>>
>>>>
>>>> Sorry to jump in…
>>>>
>>>>
>>>> If you want to run paragraphs in parallel, you are going to want to
>>>> have some sort of dependency graph.  Think of a common set up where you
>>>> need to set up common functions and imports. (setup of %spark.dep)
>>>>
>>>>
>>>> A good example is if your notebook is a bunch of unit tests and you
>>>> need to build the common tear down / set up methods to be used by the other
>>>> paragraphs.
>>>>
>>>>
>>>> If you’re going to do that, you’ll need to build out a metadata
>>>> structure where you can set up your dependencies  as well as add things
>>>> like labels beyond the ids (which only need to be unique to the given
>>>> notebook. )
>>>>
>>>>
>>>> Just my $0.02
>>>>
>>>>
>>>>
>>>> On Sep 29, 2017, at 1:30 PM, moon soo Lee <mo...@apache.org> wrote:
>>>>
>>>>
>>>> Current behavior is as parallel as possible.
>>>> Run notebook button currently submits all paragraphs in a notebook into
>>>> each interpreter's own scheduler (FIFO, Parallel) at once. And each
>>>> individual scheduler of interpreter runs the paragraphs.
>>>>
>>>>
>>>> I think we can provide "sequential" run button for easier use, which
>>>> submits paragraph one and waits for finish before submit next paragraphs.
>>>>
>>>>
>>>> And I think sequential run button doesn't stop having more complex /
>>>> flexible DAG in the future?
>>>>
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>>
>>>> On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mo...@gmail.com>
>>>> wrote:
>>>>
>>>> What is the current behavior?
>>>>
>>>>
>>>> On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hf...@twitter.com>
>>>> wrote:
>>>>
>>>> At least in our case, the notebooks that we need to run sequentially
>>>> are expected to *always* run sequentially - thus it makes more sense to be
>>>> a note option than a per-run mode
>>>>
>>>>
>>>> H
>>>>
>>>>
>>>>
>>>> _____________________________
>>>> From: moon soo Lee <mo...@apache.org>
>>>> Sent: Thursday, September 28, 2017 9:03 PM
>>>> Subject: Re: Implementing run all paragraphs sequentially
>>>> To: <us...@zeppelin.apache.org>
>>>>
>>>> This is going to be really useful!
>>>>
>>>>
>>>> Curious why you prefer a 'note option' over a 'run option'?
>>>> Could you compare their pros and cons?
>>>>
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>>
>>>> On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hf...@twitter.com>
>>>> wrote:
>>>>
>>>> +1, our internal users at Twitter also often request this
>>>>
>>>>
>>>> ------------------------------
>>>> *From:* Belousov Maksim Eduardovich <m....@tinkoff.ru>
>>>> *Sent:* Thursday, September 28, 2017 8:28:58 AM
>>>> *To:* users@zeppelin.apache.org
>>>> *Subject:* Implementing run all paragraphs sequentially
>>>>
>>>>
>>>> Hello, users!
>>>>
>>>>
>>>> At the moment our analysts often use mixes of interpreters in their
>>>> notes.
>>>> For example, they prepare data using %jdbc and then use it in %pyspark.
>>>> Besides, they often use scheduling to make some regular reporting. And they
>>>> should do something like `time.sleep()` to wait for the data from %jdbc. It
>>>> doesn't guarantee the result and doesn't look cool.
>>>>
>>>>
>>>> You can find early attempts to implement sequential running of all
>>>> paragraphs in [1].
>>>> We are really interested in implementation of the issue [2] and are
>>>> ready to solve it.
>>>>
>>>>
>>>> It seems a good idea to discuss any requirements.
>>>> My idea is to introduce note setting that defines the type of running
>>>> to use (parallel or sequential) and leave "Run all" to be the only button
>>>> running all the cells in the note. This will make sequential or parallel
>>>> running the `note option` but not `run option`.
>>>> Option will be controlled by nearby button as shown
>>>>
>>>>
>>>> <~WRD000.jpg>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> For new notes the default state would be "Run sequential all", for old
>>>> - "Run parallel for interpreters"
>>>>
>>>>
>>>> We are glad to hear any thoughts.
>>>> Thank you.
>>>>
>>>>
>>>>
>>>>
>>>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>>>> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *Maksim Belousov*
>>>>
>>>>
>>>>
>>>
>>>
>>
>>

Re: Implementing run all paragraphs sequentially

Posted by Mohit Jaggi <mo...@gmail.com>.

+1 for serial run by default. 

Sent from my iPhone

> On Oct 5, 2017, at 3:36 PM, moon soo Lee <mo...@apache.org> wrote:
> 
> I'd like us to also consider simplicity of use.
> 
> We can have two different modes, or two different run buttons for Serial or Parallel run. This gives flexibility of choosing two different scheduler as a benefit, but to make user understand difference between two run button, there must be really good UI treatment. 
> 
> I see there're high user demands for run notebook sequentially. And i think there're 3 action items in this discussion threads.
> 
> 1. Change Parallel -> Serial the current run all button behavior
> 2. Provide both Parallel and Serial run buttons with really good UI treatment.
> 3. Provides DAG 
> 
> I think 1) does not stop 2) and 3) in the future. 2) also does not stop 3) in the future.
> 
> So, why don't we try 1) first and keep discuss and polish idea about 2) and 3)?
> 
> 
> Thanks,
> moon
> 
>> On Mon, Oct 2, 2017 at 10:22 AM Michael Segel <ms...@hotmail.com> wrote:
>> Whoa! 
>> Seems I walked in to something. 
>> 
>> Herval, 
>> 
>> What do you suggest?  A simple switch that runs everything in serial, or everything in parallel? 
>> That would be a very bad idea. 
>> 
>> I gave you an example of a class of solutions where you don’t want that behavior. 
>> E.g Unit testing where you have one setup and then run several unit tests in parallel. 
>> 
>> If that’s not enough for you… how about if you want to test producer/consumer problems?  
>> 
>> Or if you want to define classes in one paragraph but then call on them in later paragraphs. If everything runs in parallel from the start of time 0, you can’t do this.
>> 
>> 
>> So, if you want to do it right the first time… you need to establish a way to control the dependency of paragraphs. This isn’t rocket science. 
>> And frankly not that complex. 
>> 
>> BTW, this is the user list not the dev list… 
>> 
>> Just saying…  ;-)
>> 
>> 
>>> On Oct 2, 2017, at 11:24 AM, Herval Freire <hf...@twitter.com> wrote:
>>> 
>>>  "nice to have" isn't a very strong requirement. I strongly suggest you really, really think about this before you start pounding an overengineered solution to a non-issue :-)
>>> 
>>> h
>>> 
>>>> On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <ms...@hotmail.com> wrote:
>>>> Yes… 
>>>>  You have a bunch of unit tests you can run in parallel where you only need one constructor and one cleanup.
>>>> 
>>>> I would strongly suggest that you really, really think about this long and hard before you start to pound code. 
>>>> It's going to be harder to back out and fix than if you take the time to think through the problem and not make a dumb mistake.
>>>> 
>>>>> On Oct 2, 2017, at 11:02 AM, Herval Freire <hf...@twitter.com> wrote:
>>>>> 
>>>>> Did anyone request such a case ("running some in parallel and some in sequence")? I haven't seen any requests for this in the wild (nor on this thread), other than theoretical "what if" - which is totally fine, when it doesn't introduce a lot of unnecessary complexity for little to no gain (which seems to be the case here)
>>>>> 
>>>>> h
>>>>> 
>>>>>> On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <ms...@hotmail.com> wrote:
>>>>>> Because that simplicity doesn’t work. 
>>>>>> 
>>>>>> You will want to run some things serial and some things in parallel. 
>>>>>> 
>>>>>> Which is why you will need a dependency graph.
>>>>>> 
>>>>>>> On Oct 2, 2017, at 10:40 AM, Herval Freire <hf...@twitter.com> wrote:
>>>>>>> 
>>>>>>> Why do you need rules and graphs and any of that to support running everything sequentially or everything in parallel?
>>>>>>> 
>>>>>>> 3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs one at a time, in the order they’re defined. If parallel, run using current scheme (as many at the same time as the threadpool permits)
>>>>>>> 
>>>>>>> Simpler and covers all cases, imo
>>>>>>> 
>>>>>>>   
>>>>>>> From: Polyakov Valeriy <v....@tinkoff.ru>
>>>>>>> Sent: Monday, October 2, 2017 8:24:35 AM
>>>>>>> To: users@zeppelin.apache.org
>>>>>>> Subject: RE: Implementing run all paragraphs sequentially
>>>>>>>  
>>>>>>> Let me try to summarize the discussion. Evidently, current behavior of running notes does not meet actual requirements. The most important thing that we need is the ability of sequential running. However, at the same time we want to keep functionality of parallel running. We discussed that the most suitable solution of building paragraphs` dependencies is a DAG (directed acyclic graph). Therefore, surely, this kind of dependencies should be defined in note and the running order should not depend on how we launch it (button / scheduler / API). In this way, our objectives are to implement “dependency definition engine” and to use it in “run engine”. What are the options?
>>>>>>> 1)      Explicit dependency definition.
>>>>>>> We could take for a rule that each paragraph should wait for the end of execution of ALL previous paragraphs. Then we add paragraph option “Wait for …” where we can choose paragraph for which we are waiting for to start execution. In case where the option is set, we start execution immediately after the end of execution of selected paragraph. This pattern allows us to implement full-parallel DAG running order. What are the disadvantages? All of them are about the same – not easy understanding of the dependency management process from the perspective of users (and probably redundancy of the functionality – my personal view). At first, we should use strange format of paragraph IDs, which in addition is hidden. We could come up with visible and handsome paragraph ID aliases, but then it appears necessity of duplication control. The second thing is in some kind of scenarios where we should change existing dependencies (e.g. you need to add new paragraph between one and dependent group – you have to change option “Wait for …” for each paragraph in group).
>>>>>>> 2)      Implicit dependency definition.
>>>>>>> We could take for a rule that each paragraph should wait for the end of execution of ALL previous paragraphs. Then we add paragraph option “Run in parallel with previous” which allows us to create paragraph groups to run in parallel. It turns out that we have the way of sequential running of paragraph groups – group by group in which paragraphs run in parallel. This approach is much more understandable for the users, but the obvious defect in comparison with “Explicit definition” is the fact that dependency graph and level of parallelism are not so cool.
>>>>>>> 
>>>>>>> I am not sure which option (1) or (2) is correct to implement at the moment. I hope to hear from product visionaries which way to choose and to get approval for the start of implementation.
>>>>>>> Thank you!
>>>>>>>  
>>>>>>>  
>>>>>>> 
>>>>>>> Valeriy Polyakov
>>>>>>> 
>>>>>>>  
>>>>>>> From: Michael Segel [mailto:msegel_hadoop@hotmail.com] 
>>>>>>> Sent: Saturday, September 30, 2017 4:22 PM
>>>>>>> To: users@zeppelin.apache.org
>>>>>>> Subject: Re: Implementing run all paragraphs sequentially
>>>>>>>  
>>>>>>> Sorry to jump in…  
>>>>>>>  
>>>>>>> If you want to run paragraphs in parallel, you are going to want to have some sort of dependency graph.  Think of a common set up where you need to set up common functions and imports. (setup of %spark.dep) 
>>>>>>>  
>>>>>>> A good example is if your notebook is a bunch of unit tests and you need to build the common tear down / set up methods to be used by the other paragraphs. 
>>>>>>>  
>>>>>>> If you’re going to do that, you’ll need to build out a metadata structure where you can set up your dependencies  as well as add things like labels beyond the ids (which only need to be unique to the given notebook. ) 
>>>>>>>  
>>>>>>> Just my $0.02 
>>>>>>>  
>>>>>>> On Sep 29, 2017, at 1:30 PM, moon soo Lee <mo...@apache.org> wrote:
>>>>>>>  
>>>>>>> Current behavior is as parallel as possible.
>>>>>>> Run notebook button currently submits all paragraphs in a notebook into each interpreter's own scheduler (FIFO, Parallel) at once. And each individual scheduler of interpreter runs the paragraphs.
>>>>>>>  
>>>>>>> I think we can provide "sequential" run button for easier use, which submits paragraph one and waits for finish before submit next paragraphs.
>>>>>>>  
>>>>>>> And I think sequential run button doesn't stop having more complex / flexible DAG in the future?
>>>>>>>  
>>>>>>> Thanks,
>>>>>>> moon
>>>>>>>  
>>>>>>> On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mo...@gmail.com> wrote:
>>>>>>> What is the current behavior?
>>>>>>>  
>>>>>>> On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hf...@twitter.com> wrote:
>>>>>>> At least in our case, the notebooks that we need to run sequentially are expected to *always* run sequentially - thus it makes more sense to be a note option than a per-run mode
>>>>>>>  
>>>>>>> H
>>>>>>>  
>>>>>>> _____________________________
>>>>>>> From: moon soo Lee <mo...@apache.org>
>>>>>>> Sent: Thursday, September 28, 2017 9:03 PM
>>>>>>> Subject: Re: Implementing run all paragraphs sequentially
>>>>>>> To: <us...@zeppelin.apache.org>
>>>>>>> 
>>>>>>> 
>>>>>>> This is going to be really useful!
>>>>>>>  
>>>>>>> Curious why do you prefer 'note option' instead of 'run option'?
>>>>>>> Could you compare their pros and cons?
>>>>>>>  
>>>>>>> Thanks,
>>>>>>> moon
>>>>>>>  
>>>>>>> On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hf...@twitter.com> wrote:
>>>>>>> +1, our internal users at Twitter also often request this
>>>>>>>  
>>>>>>> From: Belousov Maksim Eduardovich <m....@tinkoff.ru>
>>>>>>> Sent: Thursday, September 28, 2017 8:28:58 AM
>>>>>>> To: users@zeppelin.apache.org
>>>>>>> Subject: Implementing run all paragraphs sequentially
>>>>>>>  
>>>>>>> Hello, users!
>>>>>>>  
>>>>>>> At the moment our analysts often use mixes of interpreters in their notes.
>>>>>>> For example, they prepare data using %jdbc and then use it in %pyspark. Besides, they often use scheduling to make some regular reporting. And they should do something like `time.sleep()` to wait for the data from %jdbc. It doesn`t guarantee the result and doesn`t look cool.
>>>>>>>  
>>>>>>> You can find early attempts to implement sequential running of all paragraphs in [1].
>>>>>>> We are really interested in implementation of the issue [2] and are ready to solve it.
>>>>>>>  
>>>>>>> It seems a good idea to discuss any requirements.
>>>>>>> My idea is to introduce note setting that defines the type of running to use (parallel or sequential) and leave "Run all" to be the only button running all the cells in the note. This will make sequential or parallel running the `note option` but not `run option`.
>>>>>>> Option will be controlled by nearby button as shown
>>>>>>>  
>>>>>>> <~WRD000.jpg>
>>>>>>>  
>>>>>>>  
>>>>>>>  
>>>>>>> For new notes the default state would be "Run sequential all", for old - "Run parallel for interpreters"
>>>>>>>  
>>>>>>> We are glad to hear any thoughts.
>>>>>>> Thank you.
>>>>>>>  
>>>>>>>  
>>>>>>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>>>>>>> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>>>>>>>  
>>>>>>>  
>>>>>>> 
>>>>>>> Maksim Belousov
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>

Re: Implementing run all paragraphs sequentially

Posted by moon soo Lee <mo...@apache.org>.

I'd like us to also consider simplicity of use.

We can have two different modes, or two different run buttons, for Serial
and Parallel runs. The benefit is the flexibility of choosing between two
different schedulers, but for users to understand the difference between
the two run buttons, the UI treatment must be really good.

I see there is high user demand for running notebooks sequentially. And I
think there are 3 action items in this discussion thread.

1. Change the current run all button behavior from Parallel to Serial
2. Provide both Parallel and Serial run buttons with really good UI
treatment.
3. Provide a DAG

I think 1) does not stop 2) and 3) in the future. 2) also does not stop 3)
in the future.

So, why don't we try 1) first and keep discussing and polishing the ideas
about 2) and 3)?
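
To make the difference between item 1) and the current behavior concrete,
here is a minimal Python sketch. It is not Zeppelin code: `run_serial`,
`run_parallel`, and the two example paragraphs are hypothetical names, and
each paragraph is assumed to be a plain callable. It only illustrates why a
serial "run all" removes the need for `time.sleep()` between a %jdbc
paragraph and a %pyspark paragraph.

```python
from concurrent.futures import ThreadPoolExecutor


def run_serial(paragraphs):
    """Run each paragraph only after the previous one has finished."""
    results = []
    for run in paragraphs:
        results.append(run())
    return results


def run_parallel(paragraphs, max_workers=4):
    """Submit every paragraph at once, roughly like the current 'run all'."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run) for run in paragraphs]
        return [future.result() for future in futures]


# Hypothetical stand-ins for a %jdbc paragraph and a %pyspark paragraph.
store = {}


def load_with_jdbc():
    store["rows"] = [1, 2, 3]
    return "loaded"


def report_with_pyspark():
    # Only safe if load_with_jdbc has already finished.
    return sum(store["rows"])


print(run_serial([load_with_jdbc, report_with_pyspark]))  # ['loaded', 6]
```

With `run_parallel` the second paragraph could start before the first has
written its data, which is the situation the original post describes.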


Thanks,
moon

On Mon, Oct 2, 2017 at 10:22 AM Michael Segel <ms...@hotmail.com>
wrote:

> Whoa!
> Seems I walked into something.
>
> Herval,
>
> What do you suggest?  A simple switch that runs everything in serial, or
> everything in parallel?
> That would be a very bad idea.
>
> I gave you an example of a class of solutions where you don’t want that
> behavior.
> E.g., unit testing, where you have one setup and then run several unit tests
> in parallel.
>
> If that’s not enough for you… how about if you want to test
> producer/consumer problems?
>
> Or if you want to define classes in one paragraph but then call on them in
> later paragraphs. If everything runs in parallel from the start of time 0,
> you can’t do this.
>
>
> So, if you want to do it right the first time… you need to establish a way
> to control the dependency of paragraphs. This isn’t rocket science.
> And frankly not that complex.
>
> BTW, this is the user list not the dev list…
>
> Just saying…  ;-)
>
>
> On Oct 2, 2017, at 11:24 AM, Herval Freire <hf...@twitter.com> wrote:
>
>  "nice to have" isn't a very strong requirement. I strongly suggest you
> really, really think about this before you start pounding an overengineered
> solution to a non-issue :-)
>
> h
>
> On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <ms...@hotmail.com>
> wrote:
>
>> Yes…
>>  You have a bunch of unit tests you can run in parallel where you only need
>> one constructor and one cleanup.
>>
>> I would strongly suggest that you really, really think about this long
>> and hard before you start to pound code.
>> It's going to be harder to back out and fix than if you take the time to
>> think thru the problem and not make a dumb mistake.
>>
>> On Oct 2, 2017, at 11:02 AM, Herval Freire <hf...@twitter.com> wrote:
>>
>> Did anyone request such a case ("running some in parallel and some in
>> sequence")? I haven't seen any requests for this in the wild (nor on this
>> thread), other than theoretical "what if" - which is totally fine, when it
>> doesn't introduce a lot of unnecessary complexity for little to no gain
>> (which seems to be the case here)
>>
>> h
>>
>> On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <ms...@hotmail.com>
>> wrote:
>>
>>> Because that simplicity doesn’t work.
>>>
>>> You will want to run some things serial and some things in parallel.
>>>
>>> Which is why you will need a dependency graph.
>>>
>>> On Oct 2, 2017, at 10:40 AM, Herval Freire <hf...@twitter.com> wrote:
>>>
>>> Why do you need rules and graphs and any of that to support running
>>> everything sequentially or everything in parallel?
>>>
>>> 3) add a “run mode” to the note. If it’s “sequential”, run the
>>> paragraphs one at a time, in the order they’re defined. If parallel, run
>>> using current scheme (as many at the same time as the threadpool permits)
>>>
>>> Simpler and covers all cases, imo
>>>
>>> ------------------------------
>>> *From:* Polyakov Valeriy <v....@tinkoff.ru>
>>> *Sent:* Monday, October 2, 2017 8:24:35 AM
>>> *To:* users@zeppelin.apache.org
>>> *Subject:* RE: Implementing run all paragraphs sequentially
>>>
>>> Let me try to summarize the discussion. Evidently, current behavior of
>>> running notes does not meet actual requirements. The most important thing
>>> that we need is the ability of sequential running. However, at the same
>>> time we want to keep functionality of parallel running. We discussed that
>>> the most suitable solution of building paragraphs` dependencies is a DAG
>>> (directed acyclic graph). Therefore, surely, this kind of dependencies
>>> should be defined in note and the running order should not depend on how we
>>> launch it (button / scheduler / API). In this way, our objectives are to
>>> implement “dependency definition engine” and to use it in “run engine”.
>>> What are the options?
>>> 1)      Explicit dependency definition.
>>> We could take for a rule that each paragraph should wait for the end of
>>> execution of ALL previous paragraphs. Then we add paragraph option “Wait
>>> for …” where we can choose paragraph for which we are waiting for to start
>>> execution. In case where the option is set, we start execution immediately
>>> after the end of execution of selected paragraph. This pattern allows us to
>>> implement full-parallel DAG running order. What are the disadvantages? All
>>> of them are about the same – not easy understanding of the dependency
>>> management process from the perspective of users (and probably redundancy
>>> of the functionality – my personal view). At first, we should use strange
>>> format of paragraph IDs, which in addition is hidden. We could come up with
>>> visible and handsome paragraph ID aliases, but then it appears necessity of
>>> duplication control. The second thing is in some kind of scenarios where we
>>> should change existing dependencies (e.g. you need to add new paragraph
>>> between one and dependent group – you have to change option “Wait for …”
>>> for each paragraph in group).
>>> 2)      Implicit dependency definition.
>>>
>>> We could take for a rule that each paragraph should wait for the end of
>>> execution of ALL previous paragraphs. Then we add paragraph option “Run in
>>> parallel with previous” which allows us to create paragraph groups to run
>>> in parallel. It turns out that we have the way of sequential running of
>>> paragraph groups – group by group in which paragraphs run in parallel. This
>>> approach is much more understandable for the users, but the obvious defect
>>> in comparison with “Explicit definition” is the fact that dependency graph
>>> and level of parallelism are not so cool.
>>> I am not sure which option (1) or (2) is correct to implement at the
>>> moment. I hope to hear from product visionaries which way to choose and to
>>> get approval for the start of implementation.
>>> Thank you!
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *Valeriy Polyakov *
>>>
>>>
>>> *From:* Michael Segel [mailto:msegel_hadoop@hotmail.com
>>> <ms...@hotmail.com>]
>>> *Sent:* Saturday, September 30, 2017 4:22 PM
>>> *To:* users@zeppelin.apache.org
>>> *Subject:* Re: Implementing run all paragraphs sequentially
>>>
>>>
>>> Sorry to jump in…
>>>
>>>
>>> If you want to run paragraphs in parallel, you are going to want to have
>>> some sort of dependency graph.  Think of a common set up where you need to
>>> set up common functions and imports. (setup of %spark.dep)
>>>
>>>
>>> A good example is if your notebook is a bunch of unit tests and you need
>>> to build the common tear down / set up methods to be used by the other
>>> paragraphs.
>>>
>>>
>>> If you’re going to do that, you’ll need to build out a metadata
>>> structure where you can set up your dependencies  as well as add things
>>> like labels beyond the ids (which only need to be unique to the given
>>> notebook. )
>>>
>>>
>>> Just my $0.02
>>>
>>>
>>>
>>> On Sep 29, 2017, at 1:30 PM, moon soo Lee <mo...@apache.org> wrote:
>>>
>>>
>>> Current behavior is as parallel as possible.
>>> Run notebook button currently submits all paragraphs in a notebook into
>>> each interpreter's own scheduler (FIFO, Parallel) at once. And each
>>> individual scheduler of interpreter runs the paragraphs.
>>>
>>>
>>> I think we can provide "sequential" run button for easier use, which
>>> submits paragraph one and waits for finish before submit next paragraphs.
>>>
>>>
>>> And I think sequential run button doesn't stop having more complex /
>>> flexible DAG in the future?
>>>
>>>
>>> Thanks,
>>> moon
>>>
>>>
>>> On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mo...@gmail.com>
>>> wrote:
>>>
>>> What is the current behavior?
>>>
>>>
>>> On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hf...@twitter.com>
>>> wrote:
>>>
>>> At least in our case, the notebooks that we need to run sequentially are
>>> expected to *always* run sequentially - thus it makes more sense to be a
>>> note option than a per-run mode
>>>
>>>
>>> H
>>>
>>>
>>>
>>> _____________________________
>>> From: moon soo Lee <mo...@apache.org>
>>> Sent: Thursday, September 28, 2017 9:03 PM
>>> Subject: Re: Implementing run all paragraphs sequentially
>>> To: <us...@zeppelin.apache.org>
>>>
>>> This is going to be really useful!
>>>
>>>
>>> Curious why do you prefer 'note option' instead of 'run option'?
>>> Could you compare their pros and cons?
>>>
>>>
>>> Thanks,
>>> moon
>>>
>>>
>>> On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hf...@twitter.com>
>>> wrote:
>>>
>>> +1, our internal users at Twitter also often request this
>>>
>>>
>>> ------------------------------
>>> *From:* Belousov Maksim Eduardovich <m....@tinkoff.ru>
>>> *Sent:* Thursday, September 28, 2017 8:28:58 AM
>>> *To:* users@zeppelin.apache.org
>>> *Subject:* Implementing run all paragraphs sequentially
>>>
>>>
>>> Hello, users!
>>>
>>>
>>> At the moment our analysts often use mixes of interpreters in their
>>> notes.
>>> For example, they prepare data using %jdbc and then use it in %pyspark.
>>> Besides, they often use scheduling to make some regular reporting. And they
>>> should do something like `time.sleep()` to wait for the data from %jdbc. It
>>> doesn`t guarantee the result and doesn`t look cool.
>>>
>>>
>>> You can find early attempts to implement sequential running of all
>>> paragraphs in [1].
>>> We are really interested in implementation of the issue [2] and are
>>> ready to solve it.
>>>
>>>
>>> It seems a good idea to discuss any requirements.
>>> My idea is to introduce note setting that defines the type of running to
>>> use (parallel or sequential) and leave "Run all" to be the only button
>>> running all the cells in the note. This will make sequential or parallel
>>> running the `note option` but not `run option`.
>>> Option will be controlled by nearby button as shown
>>>
>>>
>>> <~WRD000.jpg>
>>>
>>>
>>>
>>>
>>>
>>>
>>> For new notes the default state would be "Run sequential all", for old -
>>> "Run parallel for interpreters"
>>>
>>>
>>> We are glad to hear any thoughts.
>>> Thank you.
>>>
>>>
>>>
>>>
>>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>>> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>>>
>>>
>>>
>>>
>>>
>>>
>>> *Maksim Belousov*
>>>
>>>
>>>
>>
>>
>
>

Re: Implementing run all paragraphs sequentially

Posted by Michael Segel <ms...@hotmail.com>.

Whoa!
Seems I walked into something.

Herval,

What do you suggest?  A simple switch that runs everything in serial, or everything in parallel?
That would be a very bad idea.

I gave you an example of a class of solutions where you don’t want that behavior.
E.g., unit testing, where you have one setup and then run several unit tests in parallel.

If that’s not enough for you… how about if you want to test producer/consumer problems?

Or if you want to define classes in one paragraph but then call on them in later paragraphs. If everything runs in parallel from the start of time 0, you can’t do this.


So, if you want to do it right the first time… you need to establish a way to control the dependency of paragraphs. This isn’t rocket science.
And frankly not that complex.
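
As a rough illustration of the dependency control described above, here is a
minimal Python sketch. It is not Zeppelin code: `run_dag`, the paragraph
labels, and the `depends_on` mapping are hypothetical, and paragraphs are
assumed to be plain callables. It uses the standard library's
`graphlib.TopologicalSorter` (Python 3.9+) so that one setup paragraph runs
first, the unit tests run in parallel, and the teardown runs last.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+


def run_dag(paragraphs, depends_on, max_workers=4):
    """Run callables in dependency order, in parallel where allowed.

    paragraphs: {label: callable}
    depends_on: {label: set of labels that must finish first}
    Returns the labels in the order they finished.
    """
    graph = {label: depends_on.get(label, set()) for label in paragraphs}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on cyclic dependencies
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}  # future -> label
        while sorter.is_active():
            # Start every paragraph whose dependencies are all done.
            for label in sorter.get_ready():
                running[pool.submit(paragraphs[label])] = label
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise a paragraph failure, if any
                label = running.pop(future)
                finished.append(label)
                sorter.done(label)  # unblocks dependent paragraphs
    return finished


# Hypothetical notebook: one setup, two unit tests, one teardown.
log = []
paragraphs = {
    "setup": lambda: log.append("setup"),
    "test_a": lambda: log.append("test_a"),
    "test_b": lambda: log.append("test_b"),
    "teardown": lambda: log.append("teardown"),
}
depends_on = {
    "test_a": {"setup"},
    "test_b": {"setup"},
    "teardown": {"test_a", "test_b"},
}
order = run_dag(paragraphs, depends_on)
```

The relative order of `test_a` and `test_b` is not fixed, but `setup` always
finishes first and `teardown` always finishes last.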

BTW, this is the user list not the dev list…

Just saying…  ;-)


On Oct 2, 2017, at 11:24 AM, Herval Freire <hf...@twitter.com> wrote:

"nice to have" isn't a very strong requirement. I strongly suggest you really, really think about this before you start pounding an overengineered solution to a non-issue :-)

h

On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <ms...@hotmail.com> wrote:
Yes…
You have a bunch of unit tests you can run in parallel where you only need one constructor and one cleanup.

I would strongly suggest that you really, really think about this long and hard before you start to pound code.
It's going to be harder to back out and fix than if you take the time to think thru the problem and not make a dumb mistake.

On Oct 2, 2017, at 11:02 AM, Herval Freire <hf...@twitter.com> wrote:

Did anyone request such a case ("running some in parallel and some in sequence")? I haven't seen any requests for this in the wild (nor on this thread), other than theoretical "what if" - which is totally fine, when it doesn't introduce a lot of unnecessary complexity for little to no gain (which seems to be the case here)

h

On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <ms...@hotmail.com> wrote:
Because that simplicity doesn’t work.

You will want to run some things serial and some things in parallel.

Which is why you will need a dependency graph.

On Oct 2, 2017, at 10:40 AM, Herval Freire <hf...@twitter.com> wrote:

Why do you need rules and graphs and any of that to support running everything sequentially or everything in parallel?

3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs one at a time, in the order they’re defined. If parallel, run using current scheme (as many at the same time as the threadpool permits)

Simpler and covers all cases, imo

________________________________
From: Polyakov Valeriy <v....@tinkoff.ru>
Sent: Monday, October 2, 2017 8:24:35 AM
To: users@zeppelin.apache.org
Subject: RE: Implementing run all paragraphs sequentially

Let me try to summarize the discussion. Evidently, current behavior of running notes does not meet actual requirements. The most important thing that we need is the ability of sequential running. However, at the same time we want to keep functionality of parallel running. We discussed that the most suitable solution of building paragraphs` dependencies is a DAG (directed acyclic graph). Therefore, surely, this kind of dependencies should be defined in note and the running order should not depend on how we launch it (button / scheduler / API). In this way, our objectives are to implement “dependency definition engine” and to use it in “run engine”. What are the options?
1)      Explicit dependency definition.
We could take as a rule that each paragraph waits for ALL previous paragraphs to finish. Then we add a paragraph option “Wait for …” that selects the paragraph whose completion we wait for before starting. When the option is set, execution starts immediately after the selected paragraph finishes. This pattern allows us to implement a fully parallel DAG running order. What are the disadvantages? They all come down to the same thing: the dependency management process is not easy for users to understand (and the functionality is probably redundant, in my personal view). First, we would have to use the strange format of paragraph IDs, which is also hidden. We could come up with visible and friendly paragraph ID aliases, but then we would need duplicate control. Second, some scenarios require changing existing dependencies (e.g. if you add a new paragraph between one paragraph and its dependent group, you have to change the “Wait for …” option for each paragraph in the group).
2)      Implicit dependency definition.

We could take as a rule that each paragraph waits for ALL previous paragraphs to finish. Then we add a paragraph option “Run in parallel with previous”, which allows us to create groups of paragraphs that run in parallel. The result is sequential running of paragraph groups: group by group, with the paragraphs inside each group running in parallel. This approach is much easier for users to understand, but its obvious drawback compared with “Explicit definition” is that the dependency graph and the level of parallelism are less flexible.
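
The grouping rule in option (2) can be sketched in a few lines of Python.
This is only an illustration, not Zeppelin code: `group_paragraphs`,
`run_groups`, the labels, and the boolean flag standing in for “Run in
parallel with previous” are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def group_paragraphs(paragraphs):
    """Split (label, parallel_with_previous) pairs, in note order, into groups."""
    groups = []
    for label, parallel_with_previous in paragraphs:
        if parallel_with_previous and groups:
            groups[-1].append(label)  # joins the group of the previous paragraph
        else:
            groups.append([label])  # starts a new sequential step
    return groups


def run_groups(groups, actions, max_workers=4):
    """Run groups one after another; paragraphs inside a group run in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for group in groups:
            # list(...) waits for the whole group before the next one starts.
            list(pool.map(lambda label: actions[label](), group))


note = [
    ("load", False),
    ("clean_a", False),
    ("clean_b", True),  # flagged: run in parallel with "clean_a"
    ("report", False),
]
print(group_paragraphs(note))  # [['load'], ['clean_a', 'clean_b'], ['report']]
```

A flag on the first paragraph has nothing to attach to, so this sketch
treats it as the start of a new group.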

I am not sure which option (1) or (2) is correct to implement at the moment. I hope to hear from product visionaries which way to choose and to get approval for the start of implementation.
Thank you!



Valeriy Polyakov


From: Michael Segel [mailto:msegel_hadoop@hotmail.com]
Sent: Saturday, September 30, 2017 4:22 PM
To: users@zeppelin.apache.org
Subject: Re: Implementing run all paragraphs sequentially

Sorry to jump in…

If you want to run paragraphs in parallel, you are going to want to have some sort of dependency graph.  Think of a common set up where you need to set up common functions and imports. (setup of %spark.dep)

A good example is if your notebook is a bunch of unit tests and you need to build the common tear down / set up methods to be used by the other paragraphs.

If you’re going to do that, you’ll need to build out a metadata structure where you can set up your dependencies  as well as add things like labels beyond the ids (which only need to be unique to the given notebook. )

Just my $0.02

On Sep 29, 2017, at 1:30 PM, moon soo Lee <mo...@apache.org> wrote:

Current behavior is as parallel as possible.
Run notebook button currently submits all paragraphs in a notebook into each interpreter's own scheduler (FIFO, Parallel) at once. And each individual scheduler of interpreter runs the paragraphs.

I think we can provide "sequential" run button for easier use, which submits paragraph one and waits for finish before submit next paragraphs.

And I think sequential run button doesn't stop having more complex / flexible DAG in the future?

Thanks,
moon

On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mo...@gmail.com> wrote:
What is the current behavior?

On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hf...@twitter.com> wrote:
At least in our case, the notebooks that we need to run sequentially are expected to *always* run sequentially - thus it makes more sense to be a note option than a per-run mode

H

_____________________________
From: moon soo Lee <mo...@apache.org>
Sent: Thursday, September 28, 2017 9:03 PM
Subject: Re: Implementing run all paragraphs sequentially
To: <us...@zeppelin.apache.org>

This is going to be really useful!

Curious why do you prefer 'note option' instead of 'run option'?
Could you compare their pros and cons?

Thanks,
moon

On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hf...@twitter.com> wrote:
+1, our internal users at Twitter also often request this

________________________________
From: Belousov Maksim Eduardovich <m....@tinkoff.ru>
Sent: Thursday, September 28, 2017 8:28:58 AM
To: users@zeppelin.apache.org
Subject: Implementing run all paragraphs sequentially

Hello, users!

At the moment our analysts often use mixes of interpreters in their notes.
For example, they prepare data using %jdbc and then use it in %pyspark. Besides, they often use scheduling to make some regular reporting. And they should do something like `time.sleep()` to wait for the data from %jdbc. It doesn`t guarantee the result and doesn`t look cool.

You can find early attempts to implement sequential running of all paragraphs in [1].
We are really interested in implementation of the issue [2] and are ready to solve it.

It seems a good idea to discuss any requirements.
My idea is to introduce note setting that defines the type of running to use (parallel or sequential) and leave "Run all" to be the only button running all the cells in the note. This will make sequential or parallel running the `note option` but not `run option`.
Option will be controlled by nearby button as shown

<~WRD000.jpg>



For new notes the default state would be "Run sequential all", for old - "Run parallel for interpreters"

We are glad to hear any thoughts.
Thank you.


[1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
[2] https://issues.apache.org/jira/browse/ZEPPELIN-2368



Maksim Belousov

Re: Implementing run all paragraphs sequentially

Posted by Herval Freire <hf...@twitter.com>.

"nice to have" isn't a very strong requirement. I strongly suggest you
really, really think about this before you start pounding an overengineered
solution to a non-issue :-)

h

On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <ms...@hotmail.com>
wrote:

> Yes…
>  You have a bunch of unit tests you can run in parallel where you only need
> one constructor and one cleanup.
>
> I would strongly suggest that you really, really think about this long and
> hard before you start to pound code.
> It's going to be harder to back out and fix than if you take the time to
> think thru the problem and not make a dumb mistake.
>
> On Oct 2, 2017, at 11:02 AM, Herval Freire <hf...@twitter.com> wrote:
>
> Did anyone request such a case ("running some in parallel and some in
> sequence")? I haven't seen any requests for this in the wild (nor on this
> thread), other than theoretical "what if" - which is totally fine, when it
> doesn't introduce a lot of unnecessary complexity for little to no gain
> (which seems to be the case here)
>
> h
>
> On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <ms...@hotmail.com>
> wrote:
>
>> Because that simplicity doesn’t work.
>>
>> You will want to run some things serial and some things in parallel.
>>
>> Which is why you will need a dependency graph.
>>
>> On Oct 2, 2017, at 10:40 AM, Herval Freire <hf...@twitter.com> wrote:
>>
>> Why do you need rules and graphs and any of that to support running
>> everything sequentially or everything in parallel?
>>
>> 3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs
>> one at a time, in the order they’re defined. If parallel, run using current
>> scheme (as many at the same time as the threadpool permits)
>>
>> Simpler and covers all cases, imo
>>
>> ------------------------------
>> *From:* Polyakov Valeriy <v....@tinkoff.ru>
>> *Sent:* Monday, October 2, 2017 8:24:35 AM
>> *To:* users@zeppelin.apache.org
>> *Subject:* RE: Implementing run all paragraphs sequentially
>>
>> Let me try to summarize the discussion. Evidently, current behavior of
>> running notes does not meet actual requirements. The most important thing
>> that we need is the ability of sequential running. However, at the same
>> time we want to keep functionality of parallel running. We discussed that
>> the most suitable solution of building paragraphs` dependencies is a DAG
>> (directed acyclic graph). Therefore, surely, this kind of dependencies
>> should be defined in note and the running order should not depend on how we
>> launch it (button / scheduler / API). In this way, our objectives are to
>> implement “dependency definition engine” and to use it in “run engine”.
>> What are the options?
>> 1)      Explicit dependency definition.
>> We could take for a rule that each paragraph should wait for the end of
>> execution of ALL previous paragraphs. Then we add paragraph option “Wait
>> for …” where we can choose paragraph for which we are waiting for to start
>> execution. In case where the option is set, we start execution immediately
>> after the end of execution of selected paragraph. This pattern allows us to
>> implement full-parallel DAG running order. What are the disadvantages? All
>> of them are about the same – not easy understanding of the dependency
>> management process from the perspective of users (and probably redundancy
>> of the functionality – my personal view). At first, we should use strange
>> format of paragraph IDs, which in addition is hidden. We could come up with
>> visible and handsome paragraph ID aliases, but then it appears necessity of
>> duplication control. The second thing is in some kind of scenarios where we
>> should change existing dependencies (e.g. you need to add new paragraph
>> between one and dependent group – you have to change option “Wait for …”
>> for each paragraph in group).
>> 2)      Implicit dependency definition.
>>
>> We could take for a rule that each paragraph should wait for the end of
>> execution of ALL previous paragraphs. Then we add paragraph option “Run in
>> parallel with previous” which allows us to create paragraph groups to run
>> in parallel. It turns out that we have the way of sequential running of
>> paragraph groups – group by group in which paragraphs run in parallel. This
>> approach is much more understandable for the users, but the obvious defect
>> in comparison with “Explicit definition” is the fact that dependency graph
>> and level of parallelism are not so cool.
>> I am not sure which option (1) or (2) is correct to implement at the
>> moment. I hope to hear from product visionaries which way to choose and to
>> get approval for the start of implementation.
>> Thank you!
>>
>>
>>
>>
>>
>>
>>
>> *Valeriy Polyakov *
>>

Re: Implementing run all paragraphs sequentially

Posted by Michael Segel <ms...@hotmail.com>.

Yes…
You have a bunch of unit tests you can run in parallel, where you only need one constructor and one cleanup.

I would strongly suggest that you really, really think about this long and hard before you start to pound code.
It's going to be harder to back out and fix later than to take the time now to think through the problem and avoid a dumb mistake.
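
To make the unit-test case concrete, here is a rough sketch of a dependency-graph runner. It is an illustration only, not Zeppelin code: `run_dag`, the `depends_on` mapping and the paragraph names are hypothetical, and it assumes Python 3.9+ for the standard-library `graphlib` module. A paragraph is submitted as soon as everything it depends on has finished.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(paragraphs, depends_on, max_workers=4):
    """Run paragraphs as soon as their prerequisites have finished.

    paragraphs: dict of name -> zero-argument callable (hypothetical stand-in
                for a notebook paragraph).
    depends_on: dict of name -> set of names that must finish first.
    Returns the names in the order in which they finished.
    """
    graph = {name: set(depends_on.get(name, ())) for name in paragraphs}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}  # future -> paragraph name
        while sorter.is_active():
            # Submit every paragraph whose prerequisites are all done.
            for name in sorter.get_ready():
                running[pool.submit(paragraphs[name])] = name
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise the paragraph's exception, if any
                name = running.pop(future)
                finished.append(name)
                sorter.done(name)  # unblocks paragraphs that depend on it
    return finished
```

With one set-up paragraph, two test paragraphs that both depend on it, and a tear-down paragraph that depends on both tests, the set-up runs alone, the tests run side by side, and the tear-down runs last.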

On Oct 2, 2017, at 11:02 AM, Herval Freire <hf...@twitter.com> wrote:

Did anyone request such a case ("running some in parallel and some in sequence")? I haven't seen any requests for this in the wild (nor on this thread), other than theoretical "what if" - which is totally fine, when it doesn't introduce a lot of unnecessary complexity for little to no gain (which seems to be the case here)

h

Re: Implementing run all paragraphs sequentially

Posted by Herval Freire <hf...@twitter.com>.

Did anyone request such a case ("running some in parallel and some in
sequence")? I haven't seen any requests for this in the wild (nor on this
thread), other than theoretical "what if" - which is totally fine, when it
doesn't introduce a lot of unnecessary complexity for little to no gain
(which seems to be the case here)

h

On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <ms...@hotmail.com>
wrote:

> Because that simplicity doesn’t work.
>
> You will want to run some things serial and some things in parallel.
>
> Which is why you will need a dependency graph.

Re: Implementing run all paragraphs sequentially

Posted by Patrick Maroney <pm...@wapacklabs.com>.

+1 on DAG.
+1 on adopting an existing mature DAG framework.
+1 on ability to stitch together different Interpreters in this framework (e.g. Python, ElasticSearch).

Patrick Maroney
Principal Engineer - Data Science & Analytics
Wapack Labs LLC


Public Key: http://pgp.mit.edu/pks/lookup?op=get&search=0x7C810C9769BD29AF

On Oct 2, 2017, at 11:48 AM, Michael Segel <ms...@hotmail.com> wrote:

Because that simplicity doesn’t work.

You will want to run some things serial and some things in parallel.

Which is why you will need a dependency graph.

Re: Implementing run all paragraphs sequentially

Posted by Michael Segel <ms...@hotmail.com>.

Because that simplicity doesn’t work.

You will want to run some things serial and some things in parallel.

Which is why you will need a dependency graph.

On Oct 2, 2017, at 10:40 AM, Herval Freire <hf...@twitter.com> wrote:

Why do you need rules and graphs and any of that to support running everything sequentially or everything in parallel?

3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs one at a time, in the order they’re defined. If parallel, run using current scheme (as many at the same time as the threadpool permits)

Simpler and covers all cases, imo

Re: Implementing run all paragraphs sequentially

Posted by Herval Freire <hf...@twitter.com>.

Why do you need rules and graphs and any of that to support running everything sequentially or everything in parallel?

3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs one at a time, in the order they’re defined. If parallel, run using current scheme (as many at the same time as the threadpool permits)

Simpler and covers all cases, imo
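
A minimal sketch of what such a note-level run mode could look like. This is an illustration only, not Zeppelin's implementation: `run_note`, `run_mode` and `max_workers` are hypothetical names, and each paragraph is modelled as a plain zero-argument callable.

```python
from concurrent.futures import ThreadPoolExecutor


def run_note(paragraphs, run_mode="sequential", max_workers=4):
    """Run a note's paragraphs and return their results in note order.

    paragraphs: list of zero-argument callables, in the order they are defined.
    run_mode:   "sequential" or "parallel" (hypothetical note-level setting).
    """
    if run_mode == "sequential":
        # Each paragraph starts only after the previous one has finished.
        return [paragraph() for paragraph in paragraphs]
    if run_mode == "parallel":
        # Submit everything at once; the pool size caps the concurrency.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(paragraph) for paragraph in paragraphs]
            return [future.result() for future in futures]
    raise ValueError(f"unknown run mode: {run_mode!r}")
```

Because the mode is stored with the note, a scheduler or API call would go through the same function as the "Run all" button and get the same ordering.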

________________________________
From: Polyakov Valeriy <v....@tinkoff.ru>
Sent: Monday, October 2, 2017 8:24:35 AM
To: users@zeppelin.apache.org
Subject: RE: Implementing run all paragraphs sequentially

Let me try to summarize the discussion. Evidently, the current behavior of running notes does not meet actual requirements. The most important thing we need is the ability to run paragraphs sequentially. At the same time, we want to keep the ability to run them in parallel. We discussed that the most suitable way to describe paragraphs' dependencies is a DAG (directed acyclic graph). These dependencies should therefore be defined in the note, and the running order should not depend on how we launch it (button / scheduler / API). So our objectives are to implement a “dependency definition engine” and to use it in the “run engine”. What are the options?

1)      Explicit dependency definition.

We could take as a rule that each paragraph waits for ALL previous paragraphs to finish. Then we add a paragraph option “Wait for …” that selects the paragraph whose completion we wait for before starting. When the option is set, the paragraph starts immediately after the selected paragraph finishes. This pattern allows us to implement a fully parallel DAG running order. What are the disadvantages? They all come down to the same thing: the dependency management process is not easy for users to understand (and the functionality is probably redundant, in my personal view). First, we would have to use the strange format of paragraph IDs, which in addition are hidden. We could come up with visible, friendly paragraph ID aliases, but then we would need to check for duplicates. Second, some scenarios require changing existing dependencies (e.g. to add a new paragraph between a paragraph and the group that depends on it, you have to change the “Wait for …” option for each paragraph in the group).

2)      Implicit dependency definition.

We could take as a rule that each paragraph waits for ALL previous paragraphs to finish. Then we add a paragraph option “Run in parallel with previous”, which lets us create groups of paragraphs that run in parallel. The groups then run sequentially, group by group, with the paragraphs inside each group running in parallel. This approach is much easier for users to understand, but the obvious drawback compared with “Explicit definition” is that the dependency graph is less expressive and the level of parallelism is lower.
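
The implicit scheme can be sketched roughly as follows. This is an illustration under assumptions, not Zeppelin code: `group_paragraphs` and `run_groups` are hypothetical names, and each paragraph is modelled as a pair of a zero-argument callable and its “Run in parallel with previous” flag.

```python
from concurrent.futures import ThreadPoolExecutor


def group_paragraphs(paragraphs):
    """Split (callable, parallel_with_previous) pairs into ordered groups."""
    groups = []
    for run, parallel_with_previous in paragraphs:
        if parallel_with_previous and groups:
            groups[-1].append(run)  # joins the previous paragraph's group
        else:
            groups.append([run])    # starts a new group
    return groups


def run_groups(paragraphs, max_workers=4):
    """Run groups one after another; paragraphs inside a group run in parallel."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for group in group_paragraphs(paragraphs):
            # list() drains the map, so the next group starts only after
            # every paragraph of this group has finished.
            results.extend(list(pool.map(lambda run: run(), group)))
    return results
```

A note with no flags set degenerates to purely sequential running, one paragraph per group.
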
I am not sure which option (1) or (2) is correct to implement at the moment. I hope to hear from product visionaries which way to choose and to get approval for the start of implementation.
Thank you!

Valeriy Polyakov

RE: Implementing run all paragraphs sequentially

Posted by Polyakov Valeriy <v....@tinkoff.ru>.

Let me try to summarize the discussion. Evidently, the current way notes are run does not meet actual requirements. The most important thing we need is the ability to run paragraphs sequentially, while keeping the ability to run them in parallel. We discussed that the most suitable way to describe dependencies between paragraphs is a DAG (directed acyclic graph). These dependencies should therefore be defined in the note itself, and the running order should not depend on how the note is launched (button / scheduler / API). So our objectives are to implement a “dependency definition engine” and to use it in the “run engine”. What are the options?

1)      Explicit dependency definition.

We could take it as a rule that, by default, each paragraph waits for ALL previous paragraphs to finish. Then we add a paragraph option “Wait for …” that selects the paragraph whose completion we wait for. When the option is set, the paragraph starts immediately after the selected paragraph finishes. This pattern allows us to implement a fully parallel DAG running order. What are the disadvantages? They all come down to the same thing: the dependency management process is hard for users to understand (and, in my personal view, the functionality is probably redundant). First, we would have to use the strange paragraph ID format, which is also hidden. We could come up with visible, readable paragraph ID aliases, but then we would need to check them for duplicates. Second, changing existing dependencies is laborious in some scenarios (e.g. to add a new paragraph between a paragraph and the group that depends on it, you have to change the “Wait for …” option for each paragraph in the group).
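
To make option (1) concrete, here is a minimal sketch of how the rule could be resolved into a running order. Nothing here is Zeppelin code: the function name `explicit_waves`, the `wait_for` mapping and the paragraph aliases are hypothetical, and I assume “Wait for …” may only point at an earlier paragraph, which keeps the graph acyclic. The "waves" are a simplified level-by-level view; a real run engine would start each paragraph as soon as its own dependency finishes.

```python
from typing import Dict, List


def explicit_waves(paragraphs: List[str], wait_for: Dict[str, str]) -> List[List[str]]:
    """Group paragraphs into waves; paragraphs in the same wave may run in parallel.

    paragraphs: paragraph aliases in note order (hypothetical aliases, not real IDs).
    wait_for:   optional "Wait for ..." setting, alias -> alias of an earlier paragraph.
    """
    position = {pid: i for i, pid in enumerate(paragraphs)}
    wave_of: Dict[str, int] = {}
    for i, pid in enumerate(paragraphs):
        target = wait_for.get(pid)
        if target is None:
            # Default rule: wait for ALL previous paragraphs.
            deps = paragraphs[:i]
        else:
            # Assumption: only earlier paragraphs are valid targets.
            if target not in position or position[target] >= i:
                raise ValueError(f"{pid!r} must wait for an earlier paragraph, got {target!r}")
            deps = [target]
        # A paragraph runs one wave after the latest of its dependencies.
        wave_of[pid] = 1 + max((wave_of[d] for d in deps), default=-1)

    waves: List[List[str]] = [[] for _ in range(max(wave_of.values(), default=-1) + 1)]
    for pid in paragraphs:
        waves[wave_of[pid]].append(pid)
    return waves
```

For example, with paragraphs load, clean, plot_a, plot_b, report, where both plots are set to wait for clean, the two plots end up in the same wave and report (no option set) waits for everything before it.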

2)      Implicit dependency definition.

We could take it as a rule that, by default, each paragraph waits for ALL previous paragraphs to finish. Then we add a paragraph option “Run in parallel with previous”, which allows us to create groups of paragraphs that run in parallel. The result is sequential running of paragraph groups: the groups run one after another, and the paragraphs inside each group run in parallel. This approach is much more understandable for users, but its obvious drawback compared with the “Explicit definition” is that the dependency graph is less flexible and the level of parallelism is lower.
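
A minimal sketch of option (2), under the same caveat that none of this is Zeppelin code: `implicit_groups`, `run_groups` and the `run_paragraph` callback are hypothetical names, and a thread pool stands in for whatever the real run engine would use. Each paragraph carries one boolean flag, “Run in parallel with previous”.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def implicit_groups(paragraphs: List[Tuple[str, bool]]) -> List[List[str]]:
    """Split (alias, parallel_with_previous) pairs into sequential groups."""
    groups: List[List[str]] = []
    for pid, parallel_with_previous in paragraphs:
        if parallel_with_previous and groups:
            groups[-1].append(pid)   # joins the previous paragraph's group
        else:
            groups.append([pid])     # starts a new group (default: sequential)
    return groups


def run_groups(groups: List[List[str]], run_paragraph: Callable[[str], None]) -> None:
    """Run groups one after another; paragraphs inside a group run in parallel."""
    for group in groups:
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            # list(...) waits for the whole group and re-raises any failure.
            list(pool.map(run_paragraph, group))
```

Inserting a new paragraph between a paragraph and its dependent group needs no changes to the other paragraphs here, which is the usability advantage over option (1).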
I am not sure which option, (1) or (2), is the right one to implement at the moment. I hope to hear from the product visionaries which way to choose, and to get approval to start the implementation.
Thank you!



Valeriy Polyakov


Re: Implementing run all paragraphs sequentially

Posted by Michael Segel <ms...@hotmail.com>.

Sorry to jump in…

If you want to run paragraphs in parallel, you are going to want some sort of dependency graph. Think of a common setup where you need to define common functions and imports (e.g. setup of %spark.dep).

A good example is a notebook that is a bunch of unit tests, where you need to build the common setup / teardown methods used by the other paragraphs.

If you’re going to do that, you’ll need to build out a metadata structure where you can set up your dependencies as well as add things like labels beyond the IDs (which only need to be unique within the given notebook).
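
As a rough illustration of such a metadata structure (a sketch only; `ParagraphMeta`, its fields and `run_order` are names made up for this example, not anything in Zeppelin), each paragraph could carry a label and a list of labels it depends on, with Python's standard `graphlib` module (3.9+) producing a valid order:

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List, Set


@dataclass
class ParagraphMeta:
    paragraph_id: str                    # internal ID (hypothetical format)
    label: str                           # human-readable, unique within the notebook
    depends_on: List[str] = field(default_factory=list)  # labels this paragraph needs


def run_order(paragraphs: List[ParagraphMeta]) -> List[str]:
    """Return labels in an order that respects every declared dependency."""
    labels = [p.label for p in paragraphs]
    if len(set(labels)) != len(labels):
        raise ValueError("labels must be unique within a notebook")
    known = set(labels)
    graph: Dict[str, Set[str]] = {}
    for p in paragraphs:
        missing = [d for d in p.depends_on if d not in known]
        if missing:
            raise ValueError(f"{p.label!r} depends on unknown labels: {missing}")
        graph[p.label] = set(p.depends_on)
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())
```

In the unit-test notebook case, a "setup" paragraph would have no dependencies, each test paragraph would depend on "setup", and "teardown" would depend on all the tests.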

Just my $0.02

Re: Implementing run all paragraphs sequentially

Posted by moon soo Lee <mo...@apache.org>.

Current behavior is as parallel as possible.
The run notebook button currently submits all paragraphs in a notebook to
each interpreter's own scheduler (FIFO, Parallel) at once, and each
interpreter's scheduler then runs its paragraphs.

I think we can provide a "sequential" run button for easier use, which
submits one paragraph and waits for it to finish before submitting the next.
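
The difference between the two modes can be sketched roughly as follows. This is a toy model, not the actual Zeppelin scheduler code: each interpreter's scheduler is stood in for by a `ThreadPoolExecutor`, and `run_all_at_once`, `run_sequentially` and the `run` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Paragraph = Tuple[str, str]  # (interpreter name, paragraph text)


def run_all_at_once(paragraphs: List[Paragraph],
                    schedulers: Dict[str, ThreadPoolExecutor],
                    run: Callable[[str, str], object]) -> List[object]:
    """Current behavior: hand everything to the per-interpreter schedulers at once."""
    futures = [schedulers[interp].submit(run, interp, text) for interp, text in paragraphs]
    return [f.result() for f in futures]


def run_sequentially(paragraphs: List[Paragraph],
                     schedulers: Dict[str, ThreadPoolExecutor],
                     run: Callable[[str, str], object]) -> List[object]:
    """Proposed behavior: submit one paragraph, wait for it, then submit the next."""
    results = []
    for interp, text in paragraphs:
        results.append(schedulers[interp].submit(run, interp, text).result())
    return results
```

With all-at-once submission a %pyspark paragraph can start before a slow %jdbc paragraph has finished, because they sit in different schedulers; in the sequential version it cannot.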

And I think a sequential run button wouldn't stop us from adding a more
complex / flexible DAG in the future?

Thanks,
moon

Re: Implementing run all paragraphs sequentially

Posted by Mohit Jaggi <mo...@gmail.com>.

What is the current behavior?

Re: Implementing run all paragraphs sequentially

Posted by Herval Freire <hf...@twitter.com>.

At least in our case, the notebooks that we need to run sequentially are expected to *always* run sequentially, so it makes more sense for this to be a note option than a per-run mode.

H

Re: Implementing run all paragraphs sequentially

Posted by moon soo Lee <mo...@apache.org>.

This is going to be really useful!

Curious why you prefer 'note option' over 'run option'?
Could you compare their pros and cons?

Thanks,
moon

Re: Implementing run all paragraphs sequentially

Posted by Herval Freire <hf...@twitter.com>.

+1, our internal users at Twitter also often request this
