You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Dong Lin <li...@gmail.com> on 2023/02/01 12:38:43 UTC

[DISCUSS] FLIP-289: Support online inference (Flink ML)

Hi all,

Fan, Jiang, Zhipeng, and I have created FLIP-289: Support online inference
(Flink ML).

The goal of this FLIP is to enable users to use the model trained by Flink
ML to do online inference. More details can be found in the FLIP doc at
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240881268.

We are looking forward to your comments.

Regards,
Dong

Re: [DISCUSS] FLIP-289: Support online inference (Flink ML)

Posted by Dian Fu <di...@gmail.com>.
Hi Dong,

Thanks for the reply. Got your points and it makes sense to me now~

Regards,
Dian

On Thu, Feb 9, 2023 at 3:45 PM Dong Lin <li...@gmail.com> wrote:

> Hi Dian,
>
> Thanks for the review! Please see my reply inline.
>
> Regards,
> Dong
>
> On Wed, Feb 8, 2023 at 11:58 AM Dian Fu <di...@gmail.com> wrote:
>
> > Hi Dong,
> >
> >
> > Thanks for driving this effort! This FLIP LGTM overall. I have just a few
> > minor comments regarding the proposed API:
> >
> > 1) For the method `DataFrame.collect()`, why is it named `collect`
> instead
> > of something else, e.g. `get`? Does it mean that the result will be
> > computed during this method call?
> >
>
> I have chosen the names to be closer to the names of similar classes or
> concepts in the existing popular open-source projects.
>
> The reasons for using `collect` include:
> - mleap is a popular machine learning serving framework and it uses
> LeapFrame#collect()
> <
> https://github.com/combust/mleap/blob/master/mleap-runtime/src/main/scala/ml/combust/mleap/runtime/frame/LeapFrame.scala#L13
> >
> to represent this concept.
> - Collect rows from table seems close to the concept of
> TableResult#collect() in Flink.
> - The method returns a *collection* of rows from this table, which makes
> `collect()` is bit more relevant than `get()`.
>
> Currently, I expect DataFrame.collect() to just return the values already
> computed before this method is called.
>
>
> > 2) For the method `DataFrame.addColumn`, when will this method be used?
> >
>
> For example, we might want to implement KMeansModelServable#transform
> that appends the prediction result to the input dataFrame. This is similar
> to the existing way that KMeansModel#transform
> <
> https://github.com/apache/flink-ml/blob/master/flink-ml-lib/src/main/java/org/apache/flink/ml/clustering/kmeans/KMeansModel.java#L84
> >
> appends
> the result to the input Table. Note that we probably don't want to copy the
> entire input DataFrame to the outpuDataFrame.
>
> We will need to use DataFrame.addColumn to append the prediction result in
> this case.
>
>
> > 3) In the example `runOnlineInferenceOnWebServer`, it uses
> > `output_df.getDataType("output")` to get the result type. It's not quite
> > intuitive. Does it make sense to add a method `getDataType()` in
> DataFrame?
>
>
> I guess you are asking whether we should just have a method getDataType()
> that returns types of all columns of the dataframe.
>
> Here is a scenario where users and algorithms will read only selected
> columns from DataFrame:
> - Users provided a DataFrame with columnA and columnB
> - The 1st algorithm will read columnA and columnB, then computes/appends
> columnC to the DataFrame.
> - The 2nd algorithm will read columnB and columnC, then computes/appends
> columnD to the DataFrame.
> - Users need to read columnD in the final DataFrame.
>
>
> > Regards,
> > Dian
> >
> >
> >
> > On Tue, Feb 7, 2023 at 12:37 PM Dong Lin <li...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > If there is no question related to this FLIP, we will start the voting
> > > thread on 2/10.
> > >
> > > Regards,
> > > Dong
> > >
> > > On Wed, Feb 1, 2023 at 8:38 PM Dong Lin <li...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Fan, Jiang, Zhipeng, and I have created FLIP-289: Support online
> > > inference
> > > > (Flink ML).
> > > >
> > > > The goal of this FLIP is to enable users to use the model trained by
> > > Flink
> > > > ML to do online inference. More details can be found in the FLIP doc
> at
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240881268
> > > > .
> > > >
> > > > We are looking forward to your comments.
> > > >
> > > > Regards,
> > > > Dong
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-289: Support online inference (Flink ML)

Posted by Dong Lin <li...@gmail.com>.
Hi Dian,

Thanks for the review! Please see my reply inline.

Regards,
Dong

On Wed, Feb 8, 2023 at 11:58 AM Dian Fu <di...@gmail.com> wrote:

> Hi Dong,
>
>
> Thanks for driving this effort! This FLIP LGTM overall. I have just a few
> minor comments regarding the proposed API:
>
> 1) For the method `DataFrame.collect()`, why is it named `collect` instead
> of something else, e.g. `get`? Does it mean that the result will be
> computed during this method call?
>

I have chosen the names to be closer to the names of similar classes or
concepts in the existing popular open-source projects.

The reasons for using `collect` include:
- mleap is a popular machine learning serving framework and it uses
LeapFrame#collect()
<https://github.com/combust/mleap/blob/master/mleap-runtime/src/main/scala/ml/combust/mleap/runtime/frame/LeapFrame.scala#L13>
to represent this concept.
- Collect rows from table seems close to the concept of
TableResult#collect() in Flink.
- The method returns a *collection* of rows from this table, which makes
`collect()` is bit more relevant than `get()`.

Currently, I expect DataFrame.collect() to just return the values already
computed before this method is called.


> 2) For the method `DataFrame.addColumn`, when will this method be used?
>

For example, we might want to implement KMeansModelServable#transform
that appends the prediction result to the input dataFrame. This is similar
to the existing way that KMeansModel#transform
<https://github.com/apache/flink-ml/blob/master/flink-ml-lib/src/main/java/org/apache/flink/ml/clustering/kmeans/KMeansModel.java#L84>
appends
the result to the input Table. Note that we probably don't want to copy the
entire input DataFrame to the outpuDataFrame.

We will need to use DataFrame.addColumn to append the prediction result in
this case.


> 3) In the example `runOnlineInferenceOnWebServer`, it uses
> `output_df.getDataType("output")` to get the result type. It's not quite
> intuitive. Does it make sense to add a method `getDataType()` in DataFrame?


I guess you are asking whether we should just have a method getDataType()
that returns types of all columns of the dataframe.

Here is a scenario where users and algorithms will read only selected
columns from DataFrame:
- Users provided a DataFrame with columnA and columnB
- The 1st algorithm will read columnA and columnB, then computes/appends
columnC to the DataFrame.
- The 2nd algorithm will read columnB and columnC, then computes/appends
columnD to the DataFrame.
- Users need to read columnD in the final DataFrame.


> Regards,
> Dian
>
>
>
> On Tue, Feb 7, 2023 at 12:37 PM Dong Lin <li...@gmail.com> wrote:
>
> > Hi all,
> >
> > If there is no question related to this FLIP, we will start the voting
> > thread on 2/10.
> >
> > Regards,
> > Dong
> >
> > On Wed, Feb 1, 2023 at 8:38 PM Dong Lin <li...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > Fan, Jiang, Zhipeng, and I have created FLIP-289: Support online
> > inference
> > > (Flink ML).
> > >
> > > The goal of this FLIP is to enable users to use the model trained by
> > Flink
> > > ML to do online inference. More details can be found in the FLIP doc at
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240881268
> > > .
> > >
> > > We are looking forward to your comments.
> > >
> > > Regards,
> > > Dong
> > >
> >
>

Re: [DISCUSS] FLIP-289: Support online inference (Flink ML)

Posted by Dian Fu <di...@gmail.com>.
Hi Dong,


Thanks for driving this effort! This FLIP LGTM overall. I have just a few
minor comments regarding the proposed API:

1) For the method `DataFrame.collect()`, why is it named `collect` instead
of something else, e.g. `get`? Does it mean that the result will be
computed during this method call?

2) For the method `DataFrame.addColumn`, when will this method be used?

3) In the example `runOnlineInferenceOnWebServer`, it uses
`output_df.getDataType("output")` to get the result type. It's not quite
intuitive. Does it make sense to add a method `getDataType()` in DataFrame?

Regards,
Dian



On Tue, Feb 7, 2023 at 12:37 PM Dong Lin <li...@gmail.com> wrote:

> Hi all,
>
> If there is no question related to this FLIP, we will start the voting
> thread on 2/10.
>
> Regards,
> Dong
>
> On Wed, Feb 1, 2023 at 8:38 PM Dong Lin <li...@gmail.com> wrote:
>
> > Hi all,
> >
> > Fan, Jiang, Zhipeng, and I have created FLIP-289: Support online
> inference
> > (Flink ML).
> >
> > The goal of this FLIP is to enable users to use the model trained by
> Flink
> > ML to do online inference. More details can be found in the FLIP doc at
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240881268
> > .
> >
> > We are looking forward to your comments.
> >
> > Regards,
> > Dong
> >
>

Re: [DISCUSS] FLIP-289: Support online inference (Flink ML)

Posted by Dong Lin <li...@gmail.com>.
Hi all,

If there is no question related to this FLIP, we will start the voting
thread on 2/10.

Regards,
Dong

On Wed, Feb 1, 2023 at 8:38 PM Dong Lin <li...@gmail.com> wrote:

> Hi all,
>
> Fan, Jiang, Zhipeng, and I have created FLIP-289: Support online inference
> (Flink ML).
>
> The goal of this FLIP is to enable users to use the model trained by Flink
> ML to do online inference. More details can be found in the FLIP doc at
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240881268
> .
>
> We are looking forward to your comments.
>
> Regards,
> Dong
>