Posted to dev@flink.apache.org by Rong Rong <wa...@gmail.com> on 2019/05/01 22:18:33 UTC

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

Hi Shaoxuan/Weihua,

Thanks for the proposal and driving the effort.
I also replied to the original discussion thread, and I am still +1 on moving
toward the scikit-learn model.
I just left a few comments on the API details and some general questions.
Please kindly take a look.

There's another thread regarding a close-to-merge FLIP-23 implementation
[1]. I agree it might still be too early to talk about productionizing
and model serving. But it would be nice to keep in mind in the design and
implementation that ease of productionizing an ML pipeline is also very
important.
And if we can leverage the FLIP-23 implementation in the future (some
adjustments might be needed), that would be super helpful.

Best,
Rong


[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-23-Model-Serving-td20260.html


On Tue, Apr 30, 2019 at 1:47 AM Shaoxuan Wang <ws...@gmail.com> wrote:

> Thanks for all the feedback.
>
> @Jincheng Sun
> > I recommend adding a detailed implementation plan to the FLIP and google
> doc.
> Yes, I will add a subsection for the implementation plan.
>
> @Chen Qin
> >Just sharing some insights from operating SparkML at scale:
> >- map-reduce may not be the best way to iteratively sync partitioned workers.
> >- native hardware acceleration is key to adopting rapid changes in ML
> improvements in the foreseeable future.
> Thanks for sharing your experience with SparkML. The purpose of this FLIP is
> mainly to provide the interfaces for the ML pipeline and ML lib, and the
> implementations of most standard algorithms. Beyond this FLIP, for AI
> computing on Flink, we will continue to contribute efforts such as enhanced
> iteration support and the integration of deep learning engines (such as
> TensorFlow/PyTorch). I have presented part of this work in
>
> https://www.ververica.com/resources/flink-forward-san-francisco-2019/when-table-meets-ai-build-flink-ai-ecosystem-on-table-api
> I am not sure I have fully understood your comments. Could you please
> elaborate with more details and, if possible, provide some suggestions
> about what we should work on to address the challenges you mentioned?
>
> Regards,
> Shaoxuan
>
> On Mon, Apr 29, 2019 at 11:28 AM Chen Qin <qi...@gmail.com> wrote:
>
> > Just sharing some insights from operating SparkML at scale:
> > - map-reduce may not be the best way to iteratively sync partitioned workers.
> > - native hardware acceleration is key to adopting rapid changes in ML
> > improvements in the foreseeable future.
> >
> > Chen
> >
> > On Apr 29, 2019, at 11:02, jincheng sun <su...@gmail.com>
> wrote:
> > >
> > > Hi Shaoxuan,
> > >
> > > Thanks for putting more effort into enhancing the scalability and ease
> > > of use of Flink ML and taking it one step further. Thank you for sharing
> > > a lot of context information.
> > >
> > > big +1 for this proposal!
> > >
> > > Just one suggestion: there is only a short time until the release of
> > > Flink 1.9, so I recommend adding a detailed implementation plan to the
> > > FLIP and google doc.
> > >
> > > What do you think?
> > >
> > > Best,
> > > Jincheng
> > >
> > > Shaoxuan Wang <ws...@gmail.com> 于2019年4月29日周一 上午10:34写道:
> > >
> > >> Hi everyone,
> > >>
> > >> Weihua proposed rebuilding the Flink ML pipeline on top of the Table
> > >> API several months ago in this mail thread:
> > >>
> > >>
> > >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
> > >>
> > >> Luogen, Becket, Xu, Weihua and I have been working on this proposal
> > >> offline in the past few months. Now we want to share the first phase of
> > >> the entire proposal as a FLIP. In FLIP-39, we want to achieve several
> > >> things (and hope those can be accomplished and released in Flink 1.9):
> > >>
> > >>   - Provide a new set of ML core interfaces (on top of the Flink
> > >>     Table API)
> > >>   - Provide an ML pipeline interface (on top of the Flink Table API)
> > >>   - Provide the interfaces for parameter management and pipeline/model
> > >>     persistence
> > >>   - All the above interfaces should facilitate any new ML algorithm. We
> > >>     will gradually add various standard ML algorithms on top of these
> > >>     newly proposed interfaces to ensure their feasibility and
> > >>     scalability.
> > >>
> > >>
> > >> Part of this FLIP was presented at Flink Forward 2019 @ San Francisco
> > >> by Xu and me.
> > >>
> > >>
> > >> https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api
> > >>
> > >>
> > >> https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink
> > >>
> > >> You can find the videos & slides at
> > >> https://www.ververica.com/flink-forward-san-francisco-2019
> > >>
> > >> The design document for FLIP-39 can be found here:
> > >>
> > >>
> > >> https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo
> > >>
> > >>
> > >> I am looking forward to your feedback.
> > >>
> > >> Regards,
> > >>
> > >> Shaoxuan
> > >>
> >
>
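The interface goals in the quoted announcement follow scikit-learn's Estimator/Transformer/Pipeline pattern. Below is a minimal, self-contained sketch of that pattern only: the type names mirror the proposal, but the bodies and the argument-free fit()/transform() signatures are illustrative stand-ins, not the actual FLIP-39 API (which, as discussed later in this thread, also gained a TableEnvironment argument).

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.apache.flink.table.api.Table; the real API operates on Tables.
interface Table {}

// A Transformer maps one Table to another (feature engineering, model scoring, ...).
interface Transformer {
    Table transform(Table input);
}

// An Estimator learns from a training Table and produces a fitted Transformer.
interface Estimator<M extends Transformer> {
    M fit(Table training);
}

// A Pipeline chains stages. Fitting it fits each Estimator in order (on the data
// as transformed by the preceding stages) and yields a pure Transformer chain.
class Pipeline implements Transformer {
    private final List<Object> stages = new ArrayList<>();

    Pipeline add(Object stage) { stages.add(stage); return this; }

    Pipeline fit(Table training) {
        Pipeline fitted = new Pipeline();
        Table current = training;
        for (Object stage : stages) {
            if (stage instanceof Estimator) {
                Transformer model = ((Estimator<?>) stage).fit(current);
                current = model.transform(current);
                fitted.add(model);
            } else {
                Transformer t = (Transformer) stage;
                current = t.transform(current);
                fitted.add(t);
            }
        }
        return fitted;
    }

    @Override
    public Table transform(Table input) {
        Table current = input;
        for (Object stage : stages) {
            current = ((Transformer) stage).transform(current);
        }
        return current;
    }
}
```

A fitted Pipeline contains only Transformers, so it can be applied to new data, and persisting it reduces to persisting the ordered stage list plus each stage's parameters.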

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

Posted by Shuiqiang Chen <ac...@gmail.com>.
Hi Robert,

Thank you for the reminder! I have added the wiki page [1] for this FLIP.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs

Robert Metzger <rm...@apache.org> 于2019年8月14日周三 下午5:56写道:

> It seems that this FLIP doesn't have a Wiki page yet [1], even though it is
> already partially implemented [2].
> We should try to stick more closely to the FLIP process to manage the
> project more efficiently.
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> [2] https://issues.apache.org/jira/browse/FLINK-12470
>
> On Mon, Jun 17, 2019 at 12:27 PM Gen Luo <lu...@gmail.com> wrote:
>
> > Hi all,
> >
> > In the review of the PR for FLINK-12473, there were a few comments regarding
> > pipeline exportation. We would like to start a follow-up discussion to
> > address some related comments.
> >
> > Currently, the FLIP-39 proposal gives a way for users to persist a pipeline
> > in JSON format. But it does not specify how users can export a pipeline for
> > serving purposes. We summarized some thoughts on this in the following doc.
> >
> >
> > https://docs.google.com/document/d/1B84b-1CvOXtwWQ6_tQyiaHwnSeiRqh-V96Or8uHqCp8/edit?usp=sharing
> >
> > After we reach consensus on the pipeline exportation, we will add a
> > corresponding section in FLIP-39.
> >
> >
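The quoted message distinguishes persisting a pipeline as JSON from exporting it for serving. As a toy illustration of the persistence half, here is a round-trip of a stage's parameter map; the Params class and its hand-rolled flat-string (de)serializer are hypothetical stand-ins, not Flink's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy parameter container: a flat, ordered string->string map serialized to
// JSON. Deliberately minimal -- no escaping, no nesting -- just enough to show
// the persist/restore round trip described in the thread.
class Params {
    private final Map<String, String> map = new LinkedHashMap<>();

    Params set(String key, String value) { map.put(key, value); return this; }
    String get(String key) { return map.get(key); }

    String toJson() {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (!first) sb.append(",");
            sb.append("\"").append(e.getKey()).append("\":\"").append(e.getValue()).append("\"");
            first = false;
        }
        return sb.append("}").toString();
    }

    static Params fromJson(String json) {
        Params p = new Params();
        String body = json.substring(1, json.length() - 1);  // strip braces
        if (body.isEmpty()) return p;
        for (String pair : body.split(",")) {
            String[] kv = pair.split(":");
            p.set(kv[0].replace("\"", ""), kv[1].replace("\"", ""));
        }
        return p;
    }
}
```

Persisting a whole pipeline then amounts to writing out the ordered list of stage class names together with each stage's serialized Params; exporting for serving (the open question above) would additionally need the fitted model data.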
> > Shaoxuan Wang <ws...@gmail.com> 于2019年6月5日周三 上午8:47写道:
> >
> > > Stavros,
> > > They share a similar logical concept, but the implementation details are
> > > quite different. It is hard to migrate the interface with different
> > > implementations. The built-in algorithms are a useful legacy that we will
> > > consider migrating to the new API (but still with different
> > > implementations).
> > > BTW, the new API has already been merged via FLINK-12473.
> > >
> > > Thanks,
> > > Shaoxuan
> > >
> > >
> > >
> > > On Mon, Jun 3, 2019 at 6:08 PM Stavros Kontopoulos <
> > > st.kontopoulos@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Some portion of the code could be migrated to the new Table API, no?
> > > > I am saying that because the new API design is based on scikit-learn
> > > > and the old one was also inspired by it.
> > > >
> > > > Best,
> > > > Stavros
> > > > On Wed, May 22, 2019 at 1:24 PM Shaoxuan Wang <ws...@gmail.com>
> > > wrote:
> > > >
> > > > > Another consensus (from the offline discussion) is that we will
> > > > > delete/deprecate flink-libraries/flink-ml. I have started a survey and
> > > > > discussion [1] in dev/user-ml to collect feedback. Depending on the
> > > > > replies, we will decide whether to delete it in Flink 1.9 or deprecate
> > > > > it in 1.9 and delete it in the next release.
> > > > >
> > > > > [1]
> > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-ml-and-DISCUSS-Delete-flink-ml-td29057.html
> > > > >
> > > > > Regards,
> > > > > Shaoxuan
> > > > >
> > > > >
> > > > > On Tue, May 21, 2019 at 9:22 PM Gen Luo <lu...@gmail.com>
> wrote:
> > > > >
> > > > > > Yes, this is our conclusion. I'd like to add only one point:
> > > > > > registering a user-defined aggregator is also needed, which is
> > > > > > currently provided by the 'bridge' and will finally be merged into
> > > > > > the Table API. The same applies to collect().
> > > > > >
> > > > > > I will add a TableEnvironment argument in Estimator.fit() and
> > > > > > Transformer.transform() to get rid of the dependency on
> > > > > > flink-table-planner. This will be committed soon.
> > > > > >
> > > > > > Aljoscha Krettek <al...@apache.org> 于2019年5月21日周二 下午7:31写道:
> > > > > >
> > > > > > > We discussed this in private and came to the conclusion that we
> > > > > > > should (for now) have the dependency on flink-table-api-xxx-bridge
> > > > > > > because we need access to the collect() method, which is not yet
> > > > > > > available in the Table API. Once that is available the code can be
> > > > > > > refactored, but for now we want to unblock work on this new module.
> > > > > > >
> > > > > > > We also agreed that we don’t need a direct dependency on
> > > > > > > flink-table-planner.
> > > > > > >
> > > > > > > I hope I summarised our discussion correctly.
> > > > > > >
> > > > > > > > On 17. May 2019, at 12:20, Gen Luo <lu...@gmail.com>
> > wrote:
> > > > > > > >
> > > > > > > > Thanks for your reply.
> > > > > > > >
> > > > > > > > For the first question, it's not strictly necessary. But I prefer
> > > > > > > > not to have a TableEnvironment argument in Estimator.fit() or
> > > > > > > > Transformer.transform(), which is not part of the machine learning
> > > > > > > > concept, and may make our API not as clean and pretty as other
> > > > > > > > systems do. I would like another way other than introducing
> > > > > > > > flink-table-planner to do this. If it's impossible or severely
> > > > > > > > opposed, I may make the concession to add the argument.
> > > > > > > >
> > > > > > > > Other than that, the "flink-table-api-xxx-bridge"s are still
> > > > > > > > needed. A very common case is that an algorithm needs to guarantee
> > > > > > > > that it is running under a BatchTableEnvironment, which makes it
> > > > > > > > possible to collect results each iteration. A typical algorithm
> > > > > > > > like this is ALS. As of Flink 1.8, this can only be achieved by
> > > > > > > > converting the Table to a DataSet and then calling
> > > > > > > > DataSet.collect(), which is available in
> > > > > > > > flink-table-api-xxx-bridge. Besides, registering a UDAGG also
> > > > > > > > depends on it.
> > > > > > > >
> > > > > > > > In conclusion, "planner" can be removed from the dependencies,
> > > > > > > > but introducing the "bridge"s is inevitable. Whether and how to
> > > > > > > > acquire a TableEnvironment from a Table can be discussed.
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
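Gen Luo's argument for the bridge modules (quoted above) is that iterative algorithms such as ALS need a blocking collect() per iteration to drive convergence from the driver. A Flink-free sketch of that control loop follows; BatchJob is a hypothetical stand-in for a per-iteration batch computation whose result can be pulled back to the driver, in the way DataSet.collect() allows in Flink 1.8.

```java
import java.util.List;

// Stand-in for a per-iteration batch computation whose result the driver can
// retrieve, analogous to converting a Table to a DataSet and calling collect().
interface BatchJob {
    List<Double> collect();
}

class IterativeDriver {
    // Runs the job once per iteration and stops when the loss improvement
    // drops below `tol`; returns the number of iterations executed. This is
    // the decision the thread says cannot be made without a collect().
    static int runUntilConverged(BatchJob job, double tol, int maxIter) {
        double prevLoss = Double.MAX_VALUE;
        for (int i = 1; i <= maxIter; i++) {
            double loss = job.collect().get(0); // blocking round-trip per iteration
            if (prevLoss - loss < tol) {
                return i;                       // converged: improvement too small
            }
            prevLoss = loss;
        }
        return maxIter;                         // hit the iteration cap
    }
}
```

Without a collect()-like primitive, the convergence test would have to live inside the dataflow itself, which is exactly why the bridge dependency was kept until the Table API offers collect() natively.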

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

Posted by Shaoxuan Wang <ws...@gmail.com>.
Another consensus (from the offline discussion) is that we will
delete/deprecate flink-libraries/flink-ml. I have started a survey and
discussion [1] in dev/user-ml to collect the feedback. Depending on the
replies, we will decide if we shall delete it in Flink1.9 or
deprecate&delete in the next release after 1.9.

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-ml-and-DISCUSS-Delete-flink-ml-td29057.html

Regards,
Shaoxuan


On Tue, May 21, 2019 at 9:22 PM Gen Luo <lu...@gmail.com> wrote:

> Yes, this is our conclusion. I'd like to add only one point that
> registering user defined aggregator is also needed which is currently
> provided by 'bridge' and finally will be merged into Table API. It's same
> with collect().
>
> I will add a TableEnvironment argument in Estimator.fit() and
> Transformer.transform() to get rid of the dependency on
> flink-table-planner. This will be committed soon.
>
> Aljoscha Krettek <al...@apache.org> 于2019年5月21日周二 下午7:31写道:
>
> > We discussed this in private and came to the conclusion that we should
> > (for now) have the dependency on flink-table-api-xxx-bridge because we
> need
> > access to the collect() method, which is not yet available in the Table
> > API. Once that is available the code can be refactored but for now we
> want
> > to unblock work on this new module.
> >
> > We also agreed that we don’t need a direct dependency on
> > flink-table-planner.
> >
> > I hope I summarised our discussion correctly.
> >
> > > On 17. May 2019, at 12:20, Gen Luo <lu...@gmail.com> wrote:
> > >
> > > Thanks for your reply.
> > >
> > > For the first question, it's not strictly necessary. But I perfer not
> to
> > > have a TableEnvironment argument in Estimator.fit() or
> > > Transformer.transform(), which is not part of machine learning concept,
> > and
> > > may make our API not as clean and pretty as other systems do. I would
> > like
> > > another way other than introducing flink-table-planner to do this. If
> > it's
> > > impossible or severely opposed, I may make the concession to add the
> > > argument.
> > >
> > > Other than that, the "flink-table-api-xxx-bridge" modules are still
> > > needed. A very common case is an algorithm that must run under a
> > > BatchTableEnvironment, which makes it possible to collect a result in
> > > each iteration. A typical algorithm like this is ALS. As of Flink 1.8,
> > > this can only be achieved by converting the Table to a DataSet and then
> > > calling DataSet.collect(), which is available in
> > > flink-table-api-xxx-bridge. Besides, registering a UDAGG also depends
> > > on it.
> > >
> > > In conclusion, the "planner" can be removed from the dependencies, but
> > > introducing the "bridge" modules is inevitable. Whether and how to
> > > acquire a TableEnvironment from a Table can be discussed.
> >
> >
>
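The API shape under discussion — fit() and transform() taking an explicit TableEnvironment — can be sketched roughly as follows. This is a hypothetical, self-contained sketch: Table and TableEnvironment here are minimal stand-ins rather than Flink's real classes, and none of the names are the final FLIP-39 signatures.

```java
// Sketch of the interface shape discussed above: Estimator.fit() and
// Transformer.transform() take an explicit TableEnvironment so the library
// depends only on the Table API, not on any planner module.
public class FitTransformShape {

    // Minimal stand-in for a Flink Table: it only carries a name.
    static final class Table {
        final String name;
        Table(String name) { this.name = name; }
    }

    // Minimal stand-in for a Flink TableEnvironment.
    static final class TableEnvironment {
        Table fromPath(String path) { return new Table(path); }
    }

    interface Transformer {
        // The environment is passed in explicitly instead of being
        // acquired from the Table, avoiding the planner dependency.
        Table transform(TableEnvironment tEnv, Table input);
    }

    interface Estimator {
        // Training produces a fitted Transformer (a model).
        Transformer fit(TableEnvironment tEnv, Table training);
    }

    public static void main(String[] args) {
        TableEnvironment tEnv = new TableEnvironment();
        Table training = tEnv.fromPath("training_data");

        // A toy estimator: "fitting" remembers the training table's name,
        // and the resulting model tags the tables it transforms with it.
        Estimator estimator = (env, t) ->
                (env2, input) -> new Table(input.name + "_scored_by_" + t.name);

        Transformer model = estimator.fit(tEnv, training);
        Table result = model.transform(tEnv, tEnv.fromPath("test_data"));
        System.out.println(result.name); // test_data_scored_by_training_data
    }
}
```

The design point debated in the thread is exactly the extra `tEnv` parameter: dropping it would require acquiring the environment from the Table, which today pulls in flink-table-planner.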

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,

Why is it necessary to acquire a TableEnvironment from a Table?

I think you even said yourself what we should do: "I believe it's better to make the api clean and hide the detail of implementation as much as possible." In my opinion this means we can only depend on the generic Table API module and not let any planner/runner specifics or the DataSet/DataStream API leak out. Otherwise we would be setting ourselves up for future problems once we want to deprecate/remove/rework those APIs.

Best,
Aljoscha

> On 17. May 2019, at 09:06, Gen Luo <lu...@gmail.com> wrote:
> 
> It's better not to depend on flink-table-planner indeed. It's currently
> needed for three things: registering a UDAGG, determining whether the
> tableEnv is batch or streaming, and converting a table to a DataSet to
> collect data. Most of these requirements can be fulfilled by
> flink-table-api-java-bridge and flink-table-api-scala-bridge.
> 
> One gap remains: without the current flink-table-planner, it's impossible
> to acquire the tableEnv from a table. If so, all interfaces have to
> require an extra tableEnv argument.
> 
> This does make sense, but personally I don't like it because it has
> nothing to do with the machine learning concept. flink-ml is aimed mainly
> at algorithm engineers and scientists, and I believe it's better to make
> the api clean and hide the detail of implementation as much as possible.
> Hopefully there is another way to acquire the tableEnv so the api can
> stay clean.
> 
> Aljoscha Krettek <al...@apache.org> 于2019年5月16日周四 下午8:16写道:
> 
>> Hi,
>> 
>> I had a look at the document mostly from a module structure/dependency
>> structure perspective.
>> 
>> We should make the expected dependency structure explicit in the document.
>> 
>> From the discussion in the doc it seems that the intention is that
>> flink-ml-lib should depend on flink-table-planner (the current, pre-blink
>> Table API planner that has a dependency on the DataSet API and DataStream
>> API). I think we should not have this because it ties the Flink ML
>> implementation to a module that is going to be deprecated. As far as I
>> understood, the intention for this new Flink ML module is to be the next
>> generation approach, based on the Table API. If this is true, we should
>> make sure that this only depends on the Table API and is independent of the
>> underlying planner implementation. Especially if we want this to work with
>> the new Blink-based planner that is currently being added to Flink.
>> 
>> What do you think?
>> 
>> Best,
>> Aljoscha
>> 
>>> On 10. May 2019, at 11:22, Shaoxuan Wang <ws...@gmail.com> wrote:
>>> 
>>> Hi everyone,
>>> 
>>> I created umbrella Jira FLINK-12470
>>> <https://issues.apache.org/jira/browse/FLINK-12470> for FLIP39 and
>> added an
>>> "implementation plan" section in the google doc
>>> (
>> https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo/edit#heading=h.pggjwvwg8mrx
>> )
>>> We need your special attention on the organization of modules/packages of
>>> flink-ml. @Aljoscha, @Till, @Rong, @Jincheng, @Becket, and all.
>>> 
>>> We anticipate rapid growth of Flink ML development over the next several
>>> releases. Several components (for instance, pipeline, mllib, model
>>> serving, ml integration tests) need to be separated into different
>>> submodules.
>>> Therefore, we propose to create a new flink-ml module at the root, and
>> add
>>> sub-modules for ml-pipeline and ml-lib of FLIP39, and potentially we
>>> can also design FLIP23 as another sub-module under this new flink-ml
>>> module (I will raise a discussion in FLIP23 ML thread about this). The
>>> legacy flink-ml module (under flink-libraries) can remain as it is and
>>> await deprecation in the future, or alternatively we can move it under
>>> this new flink-ml module and rename it to flink-dataset-ml. What do you
>>> think?
>>> 
>>> Looking forward to your feedback.
>>> 
>>> Regards,
>>> Shaoxuan
>>> 
>>> 
>>> On Tue, May 7, 2019 at 8:42 AM Rong Rong <wa...@gmail.com> wrote:
>>> 
>>>> Thanks for following up promptly and sharing the feedback @shaoxuan.
>>>> 
>>>> Yes I share the same view with you on the convergence of these 2 FLIPs
>>>> eventually. I also have some questions regarding the API as well as the
>>>> possible convergence challenges (especially current Co-processor
>> approach
>>>> vs. FLIP-39's table API approach), I will follow up on the discussion
>>>> thread and the PR on FLIP-23 with you and Boris :-)
>>>> 
>>>> --
>>>> Rong
>>>> 
>>>> On Mon, May 6, 2019 at 3:30 AM Shaoxuan Wang <ws...@gmail.com>
>> wrote:
>>>> 
>>>>> 
>>>>> Thanks for the feedback, Rong and Flavio.
>>>>> 
>>>>> @Rong Rong
>>>>>> There's another thread regarding a close-to-merge FLIP-23
>>>>>> implementation [1]. I agree it might still be an early stage to talk
>>>>>> about productionizing and model serving. But it would be nice to
>>>>>> keep in mind in the design/implementation that ease of use for
>>>>>> productionizing an ML pipeline is also very important.
>>>>>> And if we can leverage the FLIP-23 implementation in the future
>>>>>> (some adjustment might be needed), that would be super helpful.
>>>>> You raised a very good point. Actually, I have been reviewing FLIP23 for
>>>>> a while (mostly offline to help Boris polish the PR). FMPOV, FLIP23 and
>>>>> FLIP39 can be well unified at some point. Model serving in FLIP23 is
>>>>> actually a special case of “transformer/model” proposed in FLIP39.
>> Boris's
>>>>> implementation of model serving can be designed as an abstract class
>> on top
>>>>> of transformer/model interface, and then can be used by ML users as a
>>>>> certain ML lib.  I have some other comments WRT FLIP23 x FLIP39, I will
>>>>> reply to the FLIP23 ML later with more details.
>>>>> 
>>>>> @Flavio
>>>>>> I have read many discussions about Flink ML and none of them take
>>>>>> into account the ongoing efforts carried out by the Streamline H2020
>>>>>> project [1] on this topic.
>>>>>> Have you tried to ping them? I think that both projects could benefit
>>>>>> from a joint effort on this side.
>>>>>> [1] https://h2020-streamline-project.eu/objectives/
>>>>> Thank you for the info. I was not aware of the Streamline H2020
>>>>> project before. I just took a quick look at its website and GitHub.
>>>>> IMO these projects could be very good Flink ecosystem projects and
>>>>> could be built on top of the ML pipeline & ML lib interfaces
>>>>> introduced in FLIP39. I will try to contact the owners of these
>>>>> projects to understand their plans and any blockers to using Flink.
>>>>> In the meantime, if you have a direct contact for anyone who might be
>>>>> interested in the ML pipeline & ML lib, please share it with me.
>>>>> 
>>>>> Regards,
>>>>> Shaoxuan
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, May 2, 2019 at 3:59 PM Flavio Pompermaier <
>> pompermaier@okkam.it>
>>>>> wrote:
>>>>> 
>>>>>> Hi to all,
>>>>>> I have read many discussions about Flink ML and none of them take
>>>>>> into account the ongoing efforts carried out by the Streamline H2020
>>>>>> project [1] on this topic.
>>>>>> Have you tried to ping them? I think that both projects could benefit
>>>>>> from a joint effort on this side.
>>>>>> [1] https://h2020-streamline-project.eu/objectives/
>>>>>> 
>>>>>> Best,
>>>>>> Flavio
>>>>>> 
>>>>>> On Thu, May 2, 2019 at 12:18 AM Rong Rong <wa...@gmail.com>
>> wrote:
>>>>>> 
>>>>>>> Hi Shaoxuan/Weihua,
>>>>>>> 
>>>>>>> Thanks for the proposal and driving the effort.
>>>>>>> I also replied to the original discussion thread, and still a +1 on
>>>>>>> moving towards the scikit-learn model.
>>>>>>> I just left a few comments on the API details and some general
>>>>>> questions.
>>>>>>> Please kindly take a look.
>>>>>>> 
>>>>>>> There's another thread regarding a close-to-merge FLIP-23
>>>>>>> implementation [1]. I agree it might still be an early stage to talk
>>>>>>> about productionizing and model serving. But it would be nice to
>>>>>>> keep in mind in the design/implementation that ease of use for
>>>>>>> productionizing an ML pipeline is also very important.
>>>>>>> And if we can leverage the FLIP-23 implementation in the future
>>>>>>> (some adjustment might be needed), that would be super helpful.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Rong
>>>>>>> 
>>>>>>> 
>>>>>>> [1]
>>>>>>> 
>>>>>>> 
>>>>>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-23-Model-Serving-td20260.html
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Apr 30, 2019 at 1:47 AM Shaoxuan Wang <ws...@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> Thanks for all the feedback.
>>>>>>>> 
>>>>>>>> @Jincheng Sun
>>>>>>>>> I recommend adding a detailed implementation plan to the FLIP and
>>>>>>>>> the google doc.
>>>>>>>> Yes, I will add a subsection for the implementation plan.
>>>>>>>> 
>>>>>>>> @Chen Qin
>>>>>>>>> Just sharing some insights from operating SparkML at scale:
>>>>>>>>> - map-reduce may not be the best way to iteratively sync
>>>>>>>>> partitioned workers.
>>>>>>>>> - native hardware acceleration is key to adopting rapid ML
>>>>>>>>> improvements in the foreseeable future.
>>>>>>>> Thanks for sharing your experience with SparkML. The purpose of
>>>>>>>> this FLIP is mainly to provide the interfaces for the ML pipeline
>>>>>>>> and ML lib, and implementations of most standard algorithms.
>>>>>>>> Besides this FLIP, for AI computing on Flink, we will continue to
>>>>>>>> contribute efforts such as enhancing iteration support and
>>>>>>>> integrating deep learning engines (such as TensorFlow/PyTorch). I
>>>>>>>> have presented part of this work in
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>> https://www.ververica.com/resources/flink-forward-san-francisco-2019/when-table-meets-ai-build-flink-ai-ecosystem-on-table-api
>>>>>>>> I am not sure I have fully understood your comments. Could you
>>>>>>>> please elaborate with more details and, if possible, suggest what
>>>>>>>> we should work on to address the challenges you mentioned?
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Shaoxuan
>>>>>>>> 
>>>>>>>> On Mon, Apr 29, 2019 at 11:28 AM Chen Qin <qi...@gmail.com>
>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Just sharing some insights from operating SparkML at scale:
>>>>>>>>> - map-reduce may not be the best way to iteratively sync
>>>>>>>>> partitioned workers.
>>>>>>>>> - native hardware acceleration is key to adopting rapid ML
>>>>>>>>> improvements in the foreseeable future.
>>>>>>>>> 
>>>>>>>>> Chen
>>>>>>>>> 
>>>>>>>>> On Apr 29, 2019, at 11:02, jincheng sun <su...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Shaoxuan,
>>>>>>>>>> 
>>>>>>>>>> Thanks for your efforts to enhance the scalability and ease of
>>>>>>>>>> use of Flink ML and take it one step further. Thank you for
>>>>>>>>>> sharing a lot of context information.
>>>>>>>>>> 
>>>>>>>>>> big +1 for this proposal!
>>>>>>>>>> 
>>>>>>>>>> Only one suggestion: there is little time left until the release
>>>>>>>>>> of Flink 1.9, so I recommend adding a detailed implementation
>>>>>>>>>> plan to the FLIP and the google doc.
>>>>>>>>>> 
>>>>>>>>>> What do you think?
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Jincheng
>>>>>>>>>> 
>>>>>>>>>> Shaoxuan Wang <ws...@gmail.com> 于2019年4月29日周一 上午10:34写道:
>>>>>>>>>> 
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>> 
>>>>>>>>>>> Weihua proposed rebuilding the Flink ML pipeline on top of the
>>>>>>>>>>> Table API several months ago in this mail thread:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
>>>>>>>>>>> 
>>>>>>>>>>> Luogen, Becket, Xu, Weihua and I have been working on this
>>>>>>>>>>> proposal offline for the past few months. Now we want to share
>>>>>>>>>>> the first phase of the entire proposal as a FLIP. In FLIP-39, we
>>>>>>>>>>> want to achieve several things (and hope they can be
>>>>>>>>>>> accomplished and released in Flink 1.9):
>>>>>>>>>>> 
>>>>>>>>>>> - Provide a new set of ML core interfaces (on top of the Flink
>>>>>>>>>>>   Table API)
>>>>>>>>>>> - Provide an ML pipeline interface (on top of the Flink Table
>>>>>>>>>>>   API)
>>>>>>>>>>> - Provide interfaces for parameter management and
>>>>>>>>>>>   pipeline/model persistence
>>>>>>>>>>> - All the above interfaces should accommodate any new ML
>>>>>>>>>>>   algorithm. We will gradually add standard ML algorithms on
>>>>>>>>>>>   top of these newly proposed interfaces to ensure their
>>>>>>>>>>>   feasibility and scalability.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Part of this FLIP was presented at Flink Forward 2019 in San
>>>>>>>>>>> Francisco by Xu and me.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>> https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>> https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink
>>>>>>>>>>> 
>>>>>>>>>>> You can find the videos & slides at
>>>>>>>>>>> https://www.ververica.com/flink-forward-san-francisco-2019
>>>>>>>>>>> 
>>>>>>>>>>> The design document for FLIP-39 can be found here:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>> https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I am looking forward to your feedback.
>>>>>>>>>>> 
>>>>>>>>>>> Regards,
>>>>>>>>>>> 
>>>>>>>>>>> Shaoxuan
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>> 
>> 
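The per-iteration collect() pattern that drives the bridge dependency (e.g. for ALS, as described above) can be sketched roughly as follows. Everything here is a self-contained stand-in with made-up loss numbers, not real Flink code; the real version would convert a Table to a DataSet via the bridge module and call DataSet.collect().

```java
// Sketch of the pattern described in the thread: the driver must materialize
// an intermediate result after each training round to test for convergence.
// In Flink 1.8 that means Table -> DataSet -> DataSet.collect(), which is
// exactly what the flink-table-api-xxx-bridge modules provide.
public class IterativeCollectSketch {

    // Stand-in for "run one training round as a Table program, convert the
    // result to a DataSet via the bridge module, and collect() it".
    static double runRoundAndCollect(double currentLoss) {
        // Real code would look roughly like:
        //   Table result = ...;                                      // Table API
        //   DataSet<Row> ds = tableEnv.toDataSet(result, Row.class); // bridge
        //   List<Row> rows = ds.collect();                           // driver side
        return currentLoss * 0.5; // pretend each round halves the loss
    }

    public static void main(String[] args) {
        double loss = 1.0;
        int round = 0;
        // Driver-side loop: collect after every round, stop on convergence.
        // This per-round collect() is why a pure Table API dependency is not
        // yet sufficient for algorithms like ALS.
        while (loss > 0.01 && round < 100) {
            loss = runRoundAndCollect(loss);
            round++;
        }
        System.out.println("converged after " + round + " rounds, loss=" + loss);
    }
}
```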


Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

Posted by Gen Luo <lu...@gmail.com>.
It's better not to depend on flink-table-planner indeed. It's currently
needed for 3 points: registering udagg, judging the tableEnv batch or
streaming, converting table to dataSet to collect data. Most of these
requirements can be fulfilled by flink-table-api-java-bridge and
flink-table-api-scala-bridge.

But there's a lack that without current flink-table-planner, it's
impossible to acquire the tableEnv from a table. If so, all interfaces have
to require an extra argument tableEnv.

This does make sense, but personally I don't like it because it has nothing
to do with machine learning concept. The flink-ml is mainly towards to
algorithm engineers and scientists, I believe it's better to make the api
clean and hide the detail of implementation as much as possible. Hopefully
there would another way to acquire the tableEnv and the api could stay
clean.

Aljoscha Krettek <al...@apache.org> 于2019年5月16日周四 下午8:16写道:

> Hi,
>
> I had a look at the document mostly from a module structure/dependency
> structure perspective.
>
> We should make the expected dependency structure explicit in the document.
>
> From the discussion in the doc it seems that the intention is that
> flink-ml-lib should depend on flink-table-planner (the current, pre-blink
> Table API planner that has a dependency on the DataSet API and DataStream
> API). I think we should not have this because it ties the Flink ML
> implementation to a module that is going to be deprecated. As far as I
> understood, the intention for this new Flink ML module is to be the next
> generation approach, based on the Table API. If this is true, we should
> make sure that this only depends on the Table API and is independent of the
> underlying planner implementation. Especially if we want this to work with
> the new Blink-based planner that is currently being added to Flink.
>
> What do you think?
>
> Best,
> Aljoscha
>
> > On 10. May 2019, at 11:22, Shaoxuan Wang <ws...@gmail.com> wrote:
> >
> > Hi everyone,
> >
> > I created umbrella Jira FLINK-12470
> > <https://issues.apache.org/jira/browse/FLINK-12470> for FLIP39 and
> added an
> > "implementation plan" section in the google doc
> > (
> https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo/edit#heading=h.pggjwvwg8mrx
> )
> > <http://%28https//
> docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo/edit#heading=h.pggjwvwg8mrx)
> .>
> > .
> > Need your special attention on the organization of modules/packages of
> > flink-ml. @Aljosha, @Till, @Rong, @Jincheng, @Becket, and all.
> >
> > We anticipate a quick development growth of Flink ML in the next several
> > releases. Several components (for instance, pipeline, mllib, model
> serving,
> > ml integration test) need to be separated into different submodules.
> > Therefore, we propose to create a new flink-ml module at the root, and
> add
> > sub-modules for ml-pipeline and ml-lib of FLIP39, and potentially we
> > can also design FLIP23 as another sub-module under this new flink-ml
> > module (I will raise a discussion in FLIP23 ML thread about this). The
> > legacy flink-ml module (under flink-libraries) can be remained as it is
> and
> > await to be deprecated in the future, or alternatively we move it under
> > this new flink-ml module and rename it to flink-dataset-ml. What do you
> > think?
> >
> > Looking forward to your feedback.
> >
> > Regards,
> > Shaoxuan
> >
> >
> > On Tue, May 7, 2019 at 8:42 AM Rong Rong <wa...@gmail.com> wrote:
> >
> >> Thanks for following up promptly and sharing the feedback @shaoxuan.
> >>
> >> Yes I share the same view with you on the convergence of these 2 FLIPs
> >> eventually. I also have some questions regarding the API as well as the
> >> possible convergence challenges (especially current Co-processor
> approach
> >> vs. FLIP-39's table API approach), I will follow up on the discussion
> >> thread and the PR on FLIP-23 with you and Boris :-)
> >>
> >> --
> >> Rong
> >>
> >> On Mon, May 6, 2019 at 3:30 AM Shaoxuan Wang <ws...@gmail.com>
> wrote:
> >>
> >>>
> >>> Thanks for the feedback, Rong and Flavio.
> >>>
> >>> @Rong Rong
> >>>> There's another thread regarding a close to merge FLIP-23
> implementation
> >>>> [1]. I agree this might still be early stage to talk about
> >>> productionizing
> >>>> and model-serving. But I would be nice to keep the
> >>> design/implementation in
> >>>> mind that: ease of use for productionizing a ML pipeline is also very
> >>>> important.
> >>>> And if we can leverage the implementation in FLIP-23 in the future,
> >>> (some
> >>>> adjustment might be needed) that would be super helpful.
> >>> Your raised a very good point. Actually I have been reviewing FLIP23
> for
> >>> a while (mostly offline to help Boris polish the PR). FMPOV, FLIP23 and
> >>> FLIP39 can be well unified at some point. Model serving in FLIP23 is
> >>> actually a special case of “transformer/model” proposed in FLIP39.
> Boris's
> >>> implementation of model serving can be designed as an abstract class
> on top
> >>> of transformer/model interface, and then can be used by ML users as a
> >>> certain ML lib.  I have some other comments WRT FLIP23 x FLIP39, I will
> >>> reply to the FLIP23 ML later with more details.
> >>>
> >>> @Flavio
> >>>> I have read many discussion about Flink ML and none of them take into
> >>>> account the ongoing efforts carried out of by the Streamline H2020
> >>> project
> >>>> [1] on this topic.
> >>>> Have you tried to ping them? I think that both projects could benefits
> >>> from
> >>>> a joined effort on this side..
> >>>> [1] https://h2020-streamline-project.eu/objectives/
> >>> Thank you for your info. I am not aware of the Streamline H2020
> projects
> >>> before. Just did a quick look at its website and github. IMO these
> projects
> >>> could be very good Flink ecosystem projects and can be built on top of
> ML
> >>> pipeline & ML lib interfaces introduced in FLIP39. I will try to
> contact
> >>> the owners of these projects to understand their plans and blockers of
> >>> using Flink (if there is any). In the meantime, if you have the direct
> >>> contact of person who might be interested on ML pipeline & ML lib,
> please
> >>> share with me.
> >>>
> >>> Regards,
> >>> Shaoxuan
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Thu, May 2, 2019 at 3:59 PM Flavio Pompermaier <
> pompermaier@okkam.it>
> >>> wrote:
> >>>
> >>>> Hi to all,
> >>>> I have read many discussion about Flink ML and none of them take into
> >>>> account the ongoing efforts carried out of by the Streamline H2020
> >>>> project
> >>>> [1] on this topic.
> >>>> Have you tried to ping them? I think that both projects could benefits
> >>>> from
> >>>> a joined effort on this side..
> >>>> [1] https://h2020-streamline-project.eu/objectives/
> >>>>
> >>>> Best,
> >>>> Flavio
> >>>>
> >>>> On Thu, May 2, 2019 at 12:18 AM Rong Rong <wa...@gmail.com>
> wrote:
> >>>>
> >>>>> Hi Shaoxuan/Weihua,
> >>>>>
> >>>>> Thanks for the proposal and driving the effort.
> >>>>> I also replied to the original discussion thread, and still a +1 on
> >>>> moving
> >>>>> towards the ski-learn model.
> >>>>> I just left a few comments on the API details and some general
> >>>> questions.
> >>>>> Please kindly take a look.
> >>>>>
> >>>>> There's another thread regarding a close to merge FLIP-23
> >>>> implementation
> >>>>> [1]. I agree this might still be early stage to talk about
> >>>> productionizing
> >>>>> and model-serving. But I would be nice to keep the
> >>>> design/implementation in
> >>>>> mind that: ease of use for productionizing a ML pipeline is also very
> >>>>> important.
> >>>>> And if we can leverage the implementation in FLIP-23 in the future,
> >>>> (some
> >>>>> adjustment might be needed) that would be super helpful.
> >>>>>
> >>>>> Best,
> >>>>> Rong
> >>>>>
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>>
> >>>>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-23-Model-Serving-td20260.html
> >>>>>
> >>>>>
> >>>>> On Tue, Apr 30, 2019 at 1:47 AM Shaoxuan Wang <ws...@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>>> Thanks for all the feedback.
> >>>>>>
> >>>>>> @Jincheng Sun
> >>>>>>> I recommend It's better to add a detailed implementation plan to
> >>>> FLIP
> >>>>> and
> >>>>>> google doc.
> >>>>>> Yes, I will add a subsection for implementation plan.
> >>>>>>
> >>>>>> @Chen Qin
> >>>>>>> Just share some of insights from operating SparkML side at scale
> >>>>>>> - map reduce may not best way to iterative sync partitioned
> workers.
> >>>>>>> - native hardware accelerations is key to adopt rapid changes in ML
> >>>>>> improvements in foreseeable future.
> >>>>>> Thanks for sharing your experience on SparkML. The purpose of this
> >>>> FLIP
> >>>>> is
> >>>>>> mainly to provide the interfaces for ML pipeline and ML lib, and the
> >>>>>> implementations of most standard algorithms. Besides this FLIP, for
> >>>> AI
> >>>>>> computing on Flink, we will continue to contribute the efforts, like
> >>>> the
> >>>>>> enhancement of iterative and the integration of deep learning
> engines
> >>>>> (such
> >>>>>> as Tensoflow/Pytorch). I have presented part of these work in
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> https://www.ververica.com/resources/flink-forward-san-francisco-2019/when-table-meets-ai-build-flink-ai-ecosystem-on-table-api
> >>>>>> I am not sure if I have fully got your comments. Can you please
> >>>> elaborate
> >>>>>> them with more details, and if possible, please provide some
> >>>> suggestions
> >>>>>> about what we should work on to address the challenges you have
> >>>>> mentioned.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Shaoxuan
> >>>>>>
> >>>>>> On Mon, Apr 29, 2019 at 11:28 AM Chen Qin <qi...@gmail.com>
> >>>> wrote:
> >>>>>>
> >>>>>>> Just share some of insights from operating SparkML side at scale
> >>>>>>> - map reduce may not best way to iterative sync partitioned
> >>>> workers.
> >>>>>>> - native hardware accelerations is key to adopt rapid changes in ML
> >>>>>>> improvements in foreseeable future.
> >>>>>>>
> >>>>>>> Chen
> >>>>>>>
> >>>>>>> On Apr 29, 2019, at 11:02, jincheng sun <su...@gmail.com>
> >>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Hi Shaoxuan,
> >>>>>>>>
> >>>>>>>> Thanks for doing more efforts for the enhances of the
> >>>> scalability and
> >>>>>> the
> >>>>>>>> ease of use of Flink ML and make it one step further. Thank you
> >>>> for
> >>>>>>> sharing
> >>>>>>>> a lot of context information.
> >>>>>>>>
> >>>>>>>> big +1 for this proposal!
> >>>>>>>>
> >>>>>>>> Here only one suggestion, that is, It has been a short time
> >>>> until the
> >>>>>>>> release of flink-1.9, so I recommend It's better to add a
> >>>> detailed
> >>>>>>>> implementation plan to FLIP and google doc.
> >>>>>>>>
> >>>>>>>> What do you think?
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jincheng
> >>>>>>>>
> >>>>>>>> Shaoxuan Wang <ws...@gmail.com> 于2019年4月29日周一 上午10:34写道:
> >>>>>>>>
> >>>>>>>>> Hi everyone,
> >>>>>>>>>
> >>>>>>>>> Weihua has proposed to rebuild Flink ML pipeline on top of
> >>>> TableAPI
> >>>>>>> several
> >>>>>>>>> months ago in this mail thread:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
> >>>>>>>>>
> >>>>>>>>> Luogen, Becket, Xu, Weihua and I have been working on this
> >>>> proposal
> >>>>>>>>> offline in
> >>>>>>>>> the past a few months. Now we want to share the first phase of
> >>>> the
> >>>>>>> entire
> >>>>>>>>> proposal with a FLIP. In this FLIP-39, we want to achieve
> >>>> several
> >>>>>> things
> >>>>>>>>> (and hope those can be accomplished and released in Flink-1.9):
> >>>>>>>>>
> >>>>>>>>>  - Provide a new set of ML core interfaces (on top of the Flink TableAPI)
> >>>>>>>>>  - Provide an ML pipeline interface (on top of the Flink TableAPI)
> >>>>>>>>>  - Provide the interfaces for parameter management and pipeline/model persistence
> >>>>>>>>>  - All the above interfaces should facilitate any new ML algorithm. We will gradually add various standard ML algorithms on top of these newly proposed interfaces to ensure their feasibility and scalability.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Part of this FLIP was presented at Flink Forward 2019 in San Francisco by Xu and me.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink
> >>>>>>>>>
> >>>>>>>>> You can find the videos & slides at
> >>>>>>>>> https://www.ververica.com/flink-forward-san-francisco-2019
> >>>>>>>>>
> >>>>>>>>> The design document for FLIP-39 can be found here:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I am looking forward to your feedback.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>>
> >>>>>>>>> Shaoxuan
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
>
>

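The announcement above describes a scikit-learn-style fit/transform design but does not spell out the interfaces. A minimal, self-contained sketch of how such a design fits together is shown below; all names are illustrative assumptions, not the actual FLIP-39 API, and a plain `String` stands in for Flink's `Table` type:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of scikit-learn-style pipeline interfaces.
// In the real FLIP-39 proposal these would operate on Flink's Table type;
// here "String" is a stand-in so the example is self-contained.
public class PipelineSketch {

    interface Transformer {
        String transform(String table);          // Table -> Table
    }

    interface Estimator {
        Transformer fit(String table);           // training Table -> trained Model
    }

    // A Model is simply a Transformer produced by training.
    static class ScalerModel implements Transformer {
        public String transform(String table) { return table + "->scaled"; }
    }

    static class Scaler implements Estimator {
        public Transformer fit(String table) { return new ScalerModel(); }
    }

    // A Pipeline chains transformers and is itself a Transformer.
    static class Pipeline implements Transformer {
        private final List<Transformer> stages = new ArrayList<>();
        Pipeline add(Transformer t) { stages.add(t); return this; }
        public String transform(String table) {
            String result = table;
            for (Transformer t : stages) result = t.transform(result);
            return result;
        }
    }

    static String demo() {
        Transformer model = new Scaler().fit("train");  // fit on training data
        return new Pipeline().add(model).transform("input");
    }

    public static void main(String[] args) {
        System.out.println(demo());                     // input->scaled
    }
}
```

The key design point this sketch illustrates is that a trained model and a pipeline are both just transformers, so pipelines compose and persist uniformly.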
Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,

I had a look at the document mostly from a module structure/dependency structure perspective.

We should make the expected dependency structure explicit in the document.

From the discussion in the doc it seems the intention is for flink-ml-lib to depend on flink-table-planner (the current, pre-Blink Table API planner, which depends on the DataSet and DataStream APIs). I think we should avoid this, because it ties the Flink ML implementation to a module that is going to be deprecated. As far as I understand, the intention for this new Flink ML module is to be the next-generation approach, based on the Table API. If that is true, we should make sure it depends only on the Table API and is independent of the underlying planner implementation, especially if we want it to work with the new Blink-based planner that is currently being added to Flink.

What do you think?

Best,
Aljoscha
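The dependency rule discussed above — write the ML library against the Table API abstraction only, never against a concrete planner — is classic dependency inversion. A self-contained sketch under stated assumptions (`TableEnv`, `LegacyPlannerEnv`, and `BlinkPlannerEnv` are illustrative stand-ins, not Flink classes):

```java
// Dependency-inversion sketch: ML-library code sees only an abstraction,
// so the planner implementation behind it can be swapped freely.
public class PlannerIndependence {

    interface TableEnv {                       // the only type the ML lib sees
        String sqlQuery(String query);
    }

    static class LegacyPlannerEnv implements TableEnv {   // stand-in "old" planner
        public String sqlQuery(String q) { return "legacy:" + q; }
    }

    static class BlinkPlannerEnv implements TableEnv {    // stand-in "new" planner
        public String sqlQuery(String q) { return "blink:" + q; }
    }

    // ML-library code: compiles against the interface alone, so it keeps
    // working unchanged when the underlying planner is replaced.
    static String trainingQuery(TableEnv env) {
        return env.sqlQuery("SELECT features, label FROM training_data");
    }

    public static void main(String[] args) {
        System.out.println(trainingQuery(new LegacyPlannerEnv()));
        System.out.println(trainingQuery(new BlinkPlannerEnv()));
    }
}
```

In build terms, this corresponds to flink-ml depending only on the Table API module, with a planner supplied at runtime rather than at compile time.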



Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

Posted by Shaoxuan Wang <ws...@gmail.com>.
Hi everyone,

I created the umbrella Jira FLINK-12470
<https://issues.apache.org/jira/browse/FLINK-12470> for FLIP-39 and added an
"implementation plan" section in the google doc
(https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo/edit#heading=h.pggjwvwg8mrx).
We especially need your attention on the organization of the modules/packages
of flink-ml. @Aljoscha, @Till, @Rong, @Jincheng, @Becket, and all.

We anticipate rapid growth of Flink ML development over the next several
releases. Several components (for instance, pipeline, mllib, model serving,
and ML integration tests) need to be separated into different submodules.
Therefore, we propose to create a new flink-ml module at the root and add
sub-modules for the ml-pipeline and ml-lib of FLIP-39; potentially we can
also design FLIP-23 as another sub-module under this new flink-ml module
(I will raise a discussion in the FLIP-23 ML thread about this). The legacy
flink-ml module (under flink-libraries) can either remain as it is and be
deprecated in the future, or be moved under this new flink-ml module and
renamed flink-dataset-ml. What do you think?

Looking forward to your feedback.

Regards,
Shaoxuan



Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

Posted by Rong Rong <wa...@gmail.com>.
Thanks for following up promptly and sharing the feedback @shaoxuan.

Yes, I share your view that these two FLIPs will eventually converge. I also
have some questions regarding the API, as well as the possible convergence
challenges (especially the current co-processor approach vs. FLIP-39's Table
API approach); I will follow up on the discussion thread and the PR for
FLIP-23 with you and Boris :-)

--
Rong

On Mon, May 6, 2019 at 3:30 AM Shaoxuan Wang <ws...@gmail.com> wrote:

>
> Thanks for the feedback, Rong and Flavio.
>
> @Rong Rong
> > There's another thread regarding a close to merge FLIP-23 implementation
> > [1]. I agree this might still be early stage to talk about
> productionizing
> > and model-serving. But it would be nice to keep the design/implementation
> in
> > mind that: ease of use for productionizing a ML pipeline is also very
> > important.
> > And if we can leverage the implementation in FLIP-23 in the future, (some
> > adjustment might be needed) that would be super helpful.
> You raised a very good point. Actually I have been reviewing FLIP23 for a
> while (mostly offline to help Boris polish the PR). FMPOV, FLIP23 and
> FLIP39 can be well unified at some point. Model serving in FLIP23 is
> actually a special case of “transformer/model” proposed in FLIP39. Boris's
> implementation of model serving can be designed as an abstract class on top
> of transformer/model interface, and then can be used by ML users as a
> certain ML lib.  I have some other comments WRT FLIP23 x FLIP39, I will
> reply to the FLIP23 ML later with more details.
>
> @Flavio
> > I have read many discussion about Flink ML and none of them take into
> > account the ongoing efforts carried out of by the Streamline H2020
> project
> > [1] on this topic.
> > Have you tried to ping them? I think that both projects could benefits
> from
> > a joined effort on this side..
> > [1] https://h2020-streamline-project.eu/objectives/
> Thank you for your info. I am not aware of the Streamline H2020 projects
> before. Just did a quick look at its website and github. IMO these projects
> could be very good Flink ecosystem projects and can be built on top of ML
> pipeline & ML lib interfaces introduced in FLIP39. I will try to contact
> the owners of these projects to understand their plans and blockers of
> using Flink (if there is any). In the meantime, if you have the direct
> contact of person who might be interested on ML pipeline & ML lib, please
> share with me.
>
> Regards,
> Shaoxuan
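The point quoted above — FLIP-23-style model serving as a special case of a FLIP-39 transformer/model, implemented as an abstract class on top of the transformer interface — might be sketched as follows. All names here are hypothetical stand-ins, not the FLIP-23 or FLIP-39 API:

```java
// Sketch: model serving expressed as a transformer specialization.
// "Transformer", "ModelServer" and "TensorFlowServer" are illustrative names.
public class ModelServingSketch {

    interface Transformer {
        String transform(String table);        // Table -> Table in the real API
    }

    // Model serving as an abstract base on top of Transformer: subclasses
    // only supply how to score a record with an externally trained model,
    // and the serving step plugs into any pipeline as an ordinary stage.
    static abstract class ModelServer implements Transformer {
        public final String transform(String table) {
            return score(table);
        }
        protected abstract String score(String record);
    }

    static class TensorFlowServer extends ModelServer {  // hypothetical backend
        protected String score(String record) { return record + ":scored"; }
    }

    static String demo() {
        Transformer server = new TensorFlowServer();
        return server.transform("event");
    }

    public static void main(String[] args) {
        System.out.println(demo());                      // event:scored
    }
}
```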
>
>
>
>
>
> On Thu, May 2, 2019 at 3:59 PM Flavio Pompermaier <po...@okkam.it>
> wrote:
>
>> Hi to all,
>> I have read many discussion about Flink ML and none of them take into
>> account the ongoing efforts carried out of by the Streamline H2020 project
>> [1] on this topic.
>> Have you tried to ping them? I think that both projects could benefits
>> from
>> a joined effort on this side..
>> [1] https://h2020-streamline-project.eu/objectives/
>>
>> Best,
>> Flavio
>>
>> On Thu, May 2, 2019 at 12:18 AM Rong Rong <wa...@gmail.com> wrote:
>>
>> > Hi Shaoxuan/Weihua,
>> >
>> > Thanks for the proposal and driving the effort.
>> > I also replied to the original discussion thread, and still a +1 on
>> moving
>> > towards the ski-learn model.
>> > I just left a few comments on the API details and some general
>> questions.
>> > Please kindly take a look.
>> >
>> > There's another thread regarding a close to merge FLIP-23 implementation
>> > [1]. I agree this might still be early stage to talk about
>> productionizing
>> > and model-serving. But I would be nice to keep the
>> design/implementation in
>> > mind that: ease of use for productionizing a ML pipeline is also very
>> > important.
>> > And if we can leverage the implementation in FLIP-23 in the future,
>> (some
>> > adjustment might be needed) that would be super helpful.
>> >
>> > Best,
>> > Rong
>> >
>> >
>> > [1]
>> >
>> >
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-23-Model-Serving-td20260.html
>> >
>> >
>> > On Tue, Apr 30, 2019 at 1:47 AM Shaoxuan Wang <ws...@gmail.com>
>> wrote:
>> >
>> > > Thanks for all the feedback.
>> > >
>> > > @Jincheng Sun
>> > > > I recommend It's better to add a detailed implementation plan to
>> FLIP
>> > and
>> > > google doc.
>> > > Yes, I will add a subsection for implementation plan.
>> > >
>> > > @Chen Qin
>> > > >Just share some of insights from operating SparkML side at scale
>> > > >- map reduce may not best way to iterative sync partitioned workers.
>> > > >- native hardware accelerations is key to adopt rapid changes in ML
>> > > improvements in foreseeable future.
>> > > Thanks for sharing your experience on SparkML. The purpose of this
>> FLIP
>> > is
>> > > mainly to provide the interfaces for ML pipeline and ML lib, and the
>> > > implementations of most standard algorithms. Besides this FLIP, for AI
>> > > computing on Flink, we will continue to contribute the efforts, like
>> the
>> > > enhancement of iterative and the integration of deep learning engines
>> > (such
>> > > as TensorFlow/PyTorch). I have presented part of this work in
>> > >
>> > >
>> >
>> https://www.ververica.com/resources/flink-forward-san-francisco-2019/when-table-meets-ai-build-flink-ai-ecosystem-on-table-api
>> > > I am not sure if I have fully got your comments. Can you please
>> elaborate
>> > > them with more details, and if possible, please provide some
>> suggestions
>> > > about what we should work on to address the challenges you have
>> > mentioned.
>> > >
>> > > Regards,
>> > > Shaoxuan
>> > >
>> > > On Mon, Apr 29, 2019 at 11:28 AM Chen Qin <qi...@gmail.com> wrote:
>> > >
>> > > > Just share some of insights from operating SparkML side at scale
>> > > > - map reduce may not best way to iterative sync partitioned workers.
>> > > > - native hardware accelerations is key to adopt rapid changes in ML
>> > > > improvements in foreseeable future.
>> > > >
>> > > > Chen
>> > > >
>> > > > On Apr 29, 2019, at 11:02, jincheng sun <su...@gmail.com>
>> > > wrote:
>> > > > >
>> > > > > Hi Shaoxuan,
>> > > > >
>> > > > > Thanks for doing more efforts for the enhances of the scalability
>> and
>> > > the
>> > > > > ease of use of Flink ML and make it one step further. Thank you
>> for
>> > > > sharing
>> > > > > a lot of context information.
>> > > > >
>> > > > > big +1 for this proposal!
>> > > > >
>> > > > > Here only one suggestion, that is, It has been a short time until
>> the
>> > > > > release of flink-1.9, so I recommend It's better to add a detailed
>> > > > > implementation plan to FLIP and google doc.
>> > > > >
>> > > > > What do you think?
>> > > > >
>> > > > > Best,
>> > > > > Jincheng
>> > > > >
>> > > > > Shaoxuan Wang <ws...@gmail.com> 于2019年4月29日周一 上午10:34写道:
>> > > > >
>> > > > >> Hi everyone,
>> > > > >>
>> > > > >> Weihua has proposed to rebuild Flink ML pipeline on top of
>> TableAPI
>> > > > several
>> > > > >> months ago in this mail thread:
>> > > > >>
>> > > > >>
>> > > > >>
>> > > >
>> > >
>> >
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
>> > > > >>
>> > > > >> Luogen, Becket, Xu, Weihua and I have been working on this proposal
>> > > > >> offline over the past few months. Now we want to share the first
>> > > > >> phase of the entire proposal as a FLIP. In FLIP-39, we want to
>> > > > >> achieve several things (and hope they can be accomplished and
>> > > > >> released in Flink 1.9):
>> > > > >>
>> > > > >>   - Provide a new set of ML core interfaces (on top of the Flink
>> > > > >>     Table API)
>> > > > >>   - Provide an ML pipeline interface (on top of the Flink Table API)
>> > > > >>   - Provide interfaces for parameter management and pipeline/model
>> > > > >>     persistence
>> > > > >>   - All of the above interfaces should accommodate any new ML
>> > > > >>     algorithm. We will gradually add various standard ML algorithms
>> > > > >>     on top of these newly proposed interfaces to ensure their
>> > > > >>     feasibility and scalability.
>> > > > >>
>> > > > >>
>> > > > >> Part of this FLIP was presented at Flink Forward 2019 @ San
>> > > > >> Francisco by Xu and me.
>> > > > >>
>> > > > >>
>> > > > >>
>> > > >
>> > >
>> >
>> https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api
>> > > > >>
>> > > > >>
>> > > > >>
>> > > >
>> > >
>> >
>> https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink
>> > > > >>
>> > > > >> You can find the videos & slides at
>> > > > >> https://www.ververica.com/flink-forward-san-francisco-2019
>> > > > >>
>> > > > >> The design document for FLIP-39 can be found here:
>> > > > >>
>> > > > >>
>> > > > >>
>> > > >
>> > >
>> >
>> https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo
>> > > > >>
>> > > > >>
>> > > > >> I am looking forward to your feedback.
>> > > > >>
>> > > > >> Regards,
>> > > > >>
>> > > > >> Shaoxuan
>> > > > >>
>> > > >
>> > >
>> >
>>
>
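The ML core and pipeline interfaces listed in the quoted proposal follow the scikit-learn fit/transform pattern discussed in this thread. A rough, self-contained sketch of that contract (all names here are illustrative assumptions, not the actual FLIP-39 API, which would operate on Flink Tables rather than plain lists):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-ins for the FLIP-39 concepts (names assumed, not final):
// an Estimator fits on training data and yields a Transformer.

interface Transformer {
    List<Double> transform(List<Double> data);
}

interface Estimator {
    Transformer fit(List<Double> trainingData);
}

// A toy scaler: learns the max of the training data, then divides by it.
class MaxScaler implements Estimator {
    @Override
    public Transformer fit(List<Double> trainingData) {
        double max = trainingData.stream()
                .mapToDouble(Double::doubleValue).max().orElse(1.0);
        // The fitted "model" is itself a Transformer.
        return data -> {
            List<Double> out = new ArrayList<>();
            for (double d : data) out.add(d / max);
            return out;
        };
    }
}

public class PipelineSketch {
    public static void main(String[] args) {
        List<Double> train = List.of(2.0, 4.0, 8.0);
        Transformer scaler = new MaxScaler().fit(train);    // "fit" stage
        System.out.println(scaler.transform(List.of(4.0))); // "transform" stage
    }
}
```

A pipeline would simply chain such stages, fitting each estimator in turn and feeding the transformed output forward.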

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

Posted by Shaoxuan Wang <ws...@gmail.com>.
Thanks for the feedback, Rong and Flavio.

@Rong Rong
> There's another thread regarding a close-to-merge FLIP-23 implementation
> [1]. I agree it might still be too early to talk about productionizing
> and model serving. But it would be nice to keep in mind in the
> design/implementation that ease of use for productionizing an ML
> pipeline is also very important.
> And if we can leverage the FLIP-23 implementation in the future (some
> adjustments might be needed), that would be super helpful.
You raised a very good point. Actually, I have been reviewing FLIP-23 for a
while (mostly offline, helping Boris polish the PR). From my point of view,
FLIP-23 and FLIP-39 can be well unified at some point. Model serving in
FLIP-23 is actually a special case of the "transformer/model" proposed in
FLIP-39. Boris's implementation of model serving can be designed as an
abstract class on top of the transformer/model interface, and then be used
by ML users like any other ML lib. I have some other comments w.r.t.
FLIP-23 x FLIP-39; I will reply on the FLIP-23 mailing list later with more
details.
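To make the idea concrete, model serving as a special case of a transformer could be sketched roughly like this (class and method names are my own assumptions for illustration, not the actual FLIP-23 or FLIP-39 code; the real interfaces would consume and produce Flink Tables):

```java
import java.util.ArrayList;
import java.util.List;

// Assumed stand-in for the FLIP-39 transformer interface.
interface Transformer {
    List<Double> transform(List<Double> input);
}

// FLIP-23 style model serving expressed as an abstract class on top of the
// transformer interface: subclasses only supply the per-record scoring logic
// (e.g. delegating to an externally loaded TensorFlow or PMML model).
abstract class ServedModel implements Transformer {
    protected abstract double score(double record);

    @Override
    public List<Double> transform(List<Double> input) {
        List<Double> scores = new ArrayList<>();
        for (double record : input) {
            scores.add(score(record));
        }
        return scores;
    }
}

public class ModelServingSketch {
    public static void main(String[] args) {
        // A trivial "model" that doubles its input, standing in for a real one.
        ServedModel model = new ServedModel() {
            @Override
            protected double score(double record) { return record * 2.0; }
        };
        System.out.println(model.transform(List.of(1.5, 3.0)));
    }
}
```

A pipeline could then treat a served model exactly like any other transformer stage, which is what would let the FLIP-23 work plug into FLIP-39.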

@Flavio
> I have read many discussions about Flink ML and none of them take into
> account the ongoing efforts carried out by the Streamline H2020 project
> [1] on this topic.
> Have you tried to ping them? I think that both projects could benefit
> from a joint effort on this front.
> [1] https://h2020-streamline-project.eu/objectives/
Thank you for the info. I was not aware of the Streamline H2020 project
before. I just took a quick look at its website and GitHub. IMO these
projects could be very good Flink ecosystem projects and could be built on
top of the ML pipeline & ML lib interfaces introduced in FLIP-39. I will
try to contact the owners of these projects to understand their plans and
any blockers to using Flink. In the meantime, if you have direct contacts
of people who might be interested in the ML pipeline & ML lib, please share
them with me.

Regards,
Shaoxuan





On Thu, May 2, 2019 at 3:59 PM Flavio Pompermaier <po...@okkam.it>
wrote:

> Hi to all,
> I have read many discussion about Flink ML and none of them take into
> account the ongoing efforts carried out of by the Streamline H2020 project
> [1] on this topic.
> Have you tried to ping them? I think that both projects could benefits from
> a joined effort on this side..
> [1] https://h2020-streamline-project.eu/objectives/
>
> Best,
> Flavio
>

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

Posted by Flavio Pompermaier <po...@okkam.it>.
Hi to all,
I have read many discussions about Flink ML and none of them take into
account the ongoing efforts carried out by the Streamline H2020 project
[1] on this topic.
Have you tried to ping them? I think that both projects could benefit from
a joint effort on this front.
[1] https://h2020-streamline-project.eu/objectives/

Best,
Flavio
