Posted to dev@zeppelin.apache.org by Xun Liu <ne...@163.com> on 2019/03/11 01:43:23 UTC

[discuss] Zeppelin support workflow

Hello, everyone

Zeppelin has more than 20 interpreters, so data analysts can use it for many kinds of data development,
and much of that work is interdependent.
For example, developing a machine learning algorithm relies on Spark to preprocess the data.

Zeppelin should have built-in workflow capabilities instead of relying on external software to schedule its notes, for the following reasons:

1. We have moved from the data processing era to the algorithm era. With its own workflow,
Zeppelin would offer a complete ecosystem for data processing and algorithm operations.
2. Zeppelin's powerful interactive processing capabilities help algorithm engineers work more productively.
Zeppelin should give algorithm engineers more direct control instead of having them hand the algorithm to other teams (or software) to build the workflow.
3. Zeppelin knows more about the processing status of the data than Azkaban or Airflow,
so a built-in workflow should offer better performance, user experience and control.

Typical use case
Workflows matter especially in machine learning, because machine learning tasks generally take a long time to run.
A typical example is as follows:
1) Obtain data from HDFS through Spark;
2) Clean and convert the data through Spark SQL;
3) Extract features from the data through Spark;
4) Write the TensorFlow algorithm through Hadoop Submarine;
5) Distribute the TensorFlow algorithm as a job to YARN or k8s for batch processing;
6) Publish the trained model and provide online prediction services;
7) Run model prediction with Flink;
8) Receive incremental data through Flink to update the model incrementally.
Therefore, Zeppelin especially needs the ability to orchestrate workflows.
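
To make the idea concrete, here is a minimal sketch of the eight steps above as a dependency-ordered workflow. It is only an illustration, not the design in the linked document: the step names, WORKFLOW, execution_order and run are all hypothetical, and the sketch assumes each step depends only on the one before it. It uses Python's standard graphlib module (Python 3.9+) and does not call any Zeppelin API.

```python
from graphlib import TopologicalSorter

# Hypothetical names for the eight steps in the list above (not Zeppelin identifiers).
STEPS = [
    "load_from_hdfs",
    "clean_with_spark_sql",
    "extract_features",
    "write_tensorflow_algorithm",
    "train_on_yarn_or_k8s",
    "publish_model",
    "predict_with_flink",
    "incremental_update_with_flink",
]

# Assumption: each step depends only on the step before it.
# The mapping is step -> set of steps that must finish first.
WORKFLOW = {
    step: ({STEPS[i - 1]} if i > 0 else set())
    for i, step in enumerate(STEPS)
}


def execution_order(workflow):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(workflow).static_order())


def run(workflow, runner):
    """Call runner(step) in dependency order and stop at the first failure.

    runner is a caller-supplied function returning True on success.
    Returns the list of steps that completed successfully.
    """
    completed = []
    for step in execution_order(workflow):
        if not runner(step):
            break
        completed.append(step)
    return completed
```

In a real workflow, runner would be whatever actually executes a note or paragraph; a failed step stops everything that depends on it, which is the kind of control a built-in scheduler would need.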

I have completed a draft of the Zeppelin workflow system design. Please review it; you can edit the document directly or leave comments.

JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
gdoc: https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit

:-)

Xun Liu
2019-03-11

Re: [discuss] Zeppelin support workflow

Posted by Xun Liu <ne...@163.com>.

Hi, Mei Long

I would be very happy to attend the Zeppelin community meeting.
What time is the next meeting? Should I wait for a community email notification?

The Zeppelin workflow ticket is here:
https://issues.apache.org/jira/browse/ZEPPELIN-4018
Everyone is welcome to follow it.

> On Mar 19, 2019, at 1:04 AM, Mei Long <ml...@zepl.com> wrote:
> 
> Very cool! @Xun Liu Would you like to talk about it at our next Apache
> Zeppelin community meeting?

Re: [discuss] Zeppelin support workflow

Posted by Mei Long <ml...@zepl.com>.

Very cool! @Xun Liu Would you like to talk about it at our next Apache
Zeppelin community meeting?

On Sat, Mar 16, 2019 at 1:00 PM Felix Cheung <fe...@hotmail.com>
wrote:

> I like it!

Re: [discuss] Zeppelin support workflow

Posted by Felix Cheung <fe...@hotmail.com>.

I like it!

________________________________
From: Jongyoul Lee <jo...@gmail.com>
Sent: Monday, March 11, 2019 9:05:03 PM
To: dev
Subject: Re: [discuss] Zeppelin support workflow

Thanks for sharing this kind of discussion.

I'm interested in it. I will take a look.

Re: [discuss] Zeppelin support workflow

Posted by Jongyoul Lee <jo...@gmail.com>.

Thanks for sharing this kind of discussion.

I'm interested in it. I will take a look.

-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net