Posted to dev@flink.apache.org by Robert Metzger <rm...@apache.org> on 2019/08/14 09:56:10 UTC

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

It seems that this FLIP doesn't have a Wiki page yet [1], even though it is
already partially implemented [2].
We should try to stick more to the FLIP process to manage the project more
efficiently.


[1]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
[2] https://issues.apache.org/jira/browse/FLINK-12470

On Mon, Jun 17, 2019 at 12:27 PM Gen Luo <lu...@gmail.com> wrote:

> Hi all,
>
> In the review of the PR for FLINK-12473, there were a few comments regarding
> pipeline exportation. We would like to start a follow-up discussion to
> address some of the related comments.
>
> Currently, the FLIP-39 proposal gives users a way to persist a pipeline in
> JSON format, but it does not specify how users can export a pipeline for
> serving purposes. We summarized some thoughts on this in the following doc.
>
>
> https://docs.google.com/document/d/1B84b-1CvOXtwWQ6_tQyiaHwnSeiRqh-V96Or8uHqCp8/edit?usp=sharing
>
> After we reach consensus on the pipeline exportation, we will add a
> corresponding section in FLIP-39.
>
>
> Shaoxuan Wang <ws...@gmail.com> wrote on Wed, Jun 5, 2019 at 8:47 AM:
>
> > Stavros,
> > The two share a similar logical concept, but the implementation details are
> > quite different, so it is hard to migrate the interface across the
> > different implementations. The built-in algorithms are a useful legacy
> > that we will consider migrating to the new API (but still with different
> > implementations).
> > BTW, the new API has already been merged via FLINK-12473.
> >
> > Thanks,
> > Shaoxuan
> >
> >
> >
> > On Mon, Jun 3, 2019 at 6:08 PM Stavros Kontopoulos <
> > st.kontopoulos@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Some portion of the code could be migrated to the new Table API, no?
> > > I am saying that because the new API design is based on scikit-learn
> > > and the old one was also inspired by it.
> > >
> > > Best,
> > > Stavros
> > >
> > > On Wed, May 22, 2019 at 1:24 PM Shaoxuan Wang <ws...@gmail.com> wrote:
> > >
> > > > Another consensus (from the offline discussion) is that we will
> > > > delete or deprecate flink-libraries/flink-ml. I have started a survey
> > > > and discussion [1] on the dev and user mailing lists to collect
> > > > feedback. Depending on the replies, we will decide whether to delete
> > > > it in Flink 1.9 or to deprecate it and then delete it in the next
> > > > release after 1.9.
> > > >
> > > > [1]
> > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-ml-and-DISCUSS-Delete-flink-ml-td29057.html
> > > >
> > > > Regards,
> > > > Shaoxuan
> > > >
> > > >
> > > > On Tue, May 21, 2019 at 9:22 PM Gen Luo <lu...@gmail.com> wrote:
> > > >
> > > > > Yes, this is our conclusion. I'd like to add only one point:
> > > > > registering a user-defined aggregator is also needed. This is
> > > > > currently provided by 'bridge' and will eventually be merged into
> > > > > the Table API. The same goes for collect().
> > > > >
> > > > > I will add a TableEnvironment argument in Estimator.fit() and
> > > > > Transformer.transform() to get rid of the dependency on
> > > > > flink-table-planner. This will be committed soon.
> > > > >
> > > > > Aljoscha Krettek <al...@apache.org> wrote on Tue, May 21, 2019 at 7:31 PM:
> > > > >
> > > > > > We discussed this in private and came to the conclusion that we
> > > > > > should (for now) have the dependency on flink-table-api-xxx-bridge
> > > > > > because we need access to the collect() method, which is not yet
> > > > > > available in the Table API. Once that is available the code can be
> > > > > > refactored, but for now we want to unblock work on this new module.
> > > > > >
> > > > > > We also agreed that we don’t need a direct dependency on
> > > > > > flink-table-planner.
> > > > > >
> > > > > > I hope I summarised our discussion correctly.
> > > > > >
> > > > > > > On 17. May 2019, at 12:20, Gen Luo <lu...@gmail.com> wrote:
> > > > > > >
> > > > > > > Thanks for your reply.
> > > > > > >
> > > > > > > For the first question, it's not strictly necessary. But I prefer
> > > > > > > not to have a TableEnvironment argument in Estimator.fit() or
> > > > > > > Transformer.transform(): it is not part of the machine learning
> > > > > > > concept, and it may make our API less clean than those of other
> > > > > > > systems. I would like a way to do this other than introducing
> > > > > > > flink-table-planner. If that's impossible or strongly opposed, I
> > > > > > > may make the concession and add the argument.
> > > > > > >
> > > > > > > Other than that, the "flink-table-api-xxx-bridge" modules are
> > > > > > > still needed. A very common case is that an algorithm needs to
> > > > > > > guarantee that it's running under a BatchTableEnvironment, which
> > > > > > > makes it possible to collect the result of each iteration. A
> > > > > > > typical algorithm like this is ALS. As of Flink 1.8, this can
> > > > > > > only be achieved by converting the Table to a DataSet and then
> > > > > > > calling DataSet.collect(), which is available in
> > > > > > > flink-table-api-xxx-bridge. Besides, registering a UDAGG also
> > > > > > > depends on it.
> > > > > > >
> > > > > > > In conclusion, "planner" can be removed from the dependencies,
> > > > > > > but introducing the "bridge" modules is inevitable. Whether and
> > > > > > > how to acquire a TableEnvironment from a Table can be discussed.
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
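
For readers following the API question in the quoted thread, the shape that
was settled on (a TableEnvironment passed explicitly to Estimator.fit() and
Transformer.transform()) can be sketched roughly as below. This is only an
illustration under stated assumptions: Table and TableEnvironment here are
hypothetical in-memory stand-ins rather than the Flink classes, and
MeanCenterer, InMemoryEnv and PipelineSketch are made-up names. The actual
FLIP-39 interfaces may differ in their generics and parameters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for Flink's Table and TableEnvironment, kept minimal
// so that the sketch compiles without any Flink dependency.
interface Table {
    List<Double> values();
}

interface TableEnvironment {
    Table fromValues(List<Double> values);
}

// A Transformer maps one Table to another. The environment is passed in
// explicitly instead of being derived from the Table.
interface Transformer {
    Table transform(TableEnvironment tEnv, Table input);
}

// An Estimator is fitted on a Table and yields a fitted Transformer.
interface Estimator {
    Transformer fit(TableEnvironment tEnv, Table input);
}

// Illustrative estimator: learns the mean of the values and returns a
// transformer that subtracts that mean from every value.
class MeanCenterer implements Estimator {
    @Override
    public Transformer fit(TableEnvironment tEnv, Table input) {
        List<Double> training = input.values();
        double sum = 0.0;
        for (double v : training) {
            sum += v;
        }
        final double mean = training.isEmpty() ? 0.0 : sum / training.size();
        return (env, table) -> {
            List<Double> out = new ArrayList<>();
            for (double v : table.values()) {
                out.add(v - mean);
            }
            return env.fromValues(out);
        };
    }
}

// Trivial in-memory environment that wraps a list of values as a Table.
class InMemoryEnv implements TableEnvironment {
    @Override
    public Table fromValues(List<Double> values) {
        List<Double> copy = new ArrayList<>(values);
        return () -> copy;
    }
}

class PipelineSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = new InMemoryEnv();
        Table data = tEnv.fromValues(Arrays.asList(1.0, 2.0, 3.0));
        Transformer model = new MeanCenterer().fit(tEnv, data);
        System.out.println(model.transform(tEnv, data).values());
    }
}
```

The point of the sketch is the signature: because the caller supplies the
environment, neither interface needs a way to recover it from the Table,
which is what would have required the dependency on flink-table-planner.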

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

Posted by Shuiqiang Chen <ac...@gmail.com>.

Hi Robert,

Thank you for the reminder! I have added the wiki page [1] for this FLIP.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs

Robert Metzger <rm...@apache.org> wrote on Wed, Aug 14, 2019 at 5:56 PM:

> It seems that this FLIP doesn't have a Wiki page yet [1], even though it is
> already partially implemented [2].
> We should try to stick more to the FLIP process to manage the project more
> efficiently.
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> [2] https://issues.apache.org/jira/browse/FLINK-12470