Posted to user@flink.apache.org by Sonia-Florina Horchidan <sf...@kth.se> on 2022/01/07 09:23:32 UTC

Serving Machine Learning models

Hello,


I recently started looking into serving Machine Learning models for streaming data in Flink. To give more context, that would involve training a model offline (using PyTorch or TensorFlow), and calling it from inside a Flink job to do online inference on newly arrived data. I have found multiple discussions, presentations, and tools that could achieve this, and it seems like the two main alternatives would be: (1) wrap the pre-trained models in an HTTP service (such as TorchServe [1]) and let Flink do async calls for model scoring, or (2) convert the models into a standardized format (e.g., ONNX [2]), pre-load the model in memory for every task manager (or use external storage if needed) and call it for each new data point.
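For concreteness, a minimal sketch of what the scoring call in option (1) could look like, assuming a TorchServe instance on localhost:8080 serving a hypothetical model named "my_model" whose custom handler accepts a JSON payload; inside the Flink job this call would typically be issued from an async operator so scoring does not block the stream:

    # Minimal sketch of option (1): score one record against a TorchServe endpoint.
    # Assumes TorchServe on localhost:8080 and a hypothetical model "my_model"
    # whose handler accepts a JSON feature payload.
    import requests

    TORCHSERVE_URL = "http://localhost:8080/predictions/my_model"

    def score(features: dict):
        # POST /predictions/{model_name} is TorchServe's inference endpoint.
        response = requests.post(TORCHSERVE_URL, json=features, timeout=1.0)
        response.raise_for_status()
        return response.json()  # the response shape depends on the model's handler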

Both approaches come with advantages and drawbacks and, as far as I understand, there is no "silver bullet": one approach may be more suitable than the other depending on the application requirements. However, I would be curious to know what the "recommended" methods for model serving are (if any) and what approaches are currently adopted by users in the wild.


[1] https://pytorch.org/serve/

[2] https://onnx.ai/


Best regards,

Sonia


 [Kth Logo]

Sonia-Florina Horchidan
PhD Student
KTH Royal Institute of Technology
Software and Computer Systems (SCS)
School of Electrical Engineering and Computer Science (EECS)
Mobile: +46769751562
sfhor@kth.se, www.kth.se


Re: Serving Machine Learning models

Posted by Xingbo Huang <hx...@gmail.com>.
Hi Sonia,

As far as I know, PyFlink users prefer to use a Python UDF [1][2] for model
prediction: load the model when the UDF is initialized, and then run a
prediction for each new piece of data.

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/python/table/udfs/overview/
[2]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/python/datastream/operators/process_function/
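A minimal sketch of that pattern, assuming the Flink 1.14 Table API, onnxruntime, and a hypothetical ONNX model at /tmp/model.onnx (the model path, the two feature columns, and the result type are placeholders):

    # Sketch of a PyFlink scalar UDF: load the model once per parallel instance
    # in open(), then score every incoming row in eval().
    # Assumes onnxruntime and a hypothetical ONNX model at /tmp/model.onnx.
    import numpy as np
    from pyflink.table import DataTypes
    from pyflink.table.udf import ScalarFunction, udf

    class Predict(ScalarFunction):
        def open(self, function_context):
            import onnxruntime as ort
            self.session = ort.InferenceSession("/tmp/model.onnx")
            self.input_name = self.session.get_inputs()[0].name

        def eval(self, f0, f1):
            # Input/output shapes depend on the model; two float features assumed here.
            x = np.array([[f0, f1]], dtype=np.float32)
            return float(self.session.run(None, {self.input_name: x})[0][0])

    predict = udf(Predict(), result_type=DataTypes.FLOAT())
    # Register with t_env.create_temporary_function("predict", predict) and use it
    # in SQL or Table API expressions.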

Best,
Xingbo


Re: Serving Machine Learning models

Posted by David Anderson <da...@apache.org>.
Another approach that I find quite natural is to use Flink's Stateful
Functions API [1] for model serving, and this has some nice advantages,
such as zero-downtime deployments of new models, and the ease with which
you can use Python. [2] is an example of this approach.

[1] https://flink.apache.org/stateful-functions.html
[2] https://github.com/ververica/flink-statefun-workshop
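As a rough illustration (not the actual code from [2]), a remote function that serves a model with the StateFun Python SDK could look like the sketch below; the typenames, the pickled model file, and the JSON message layout are placeholders:

    # Rough sketch of a remote StateFun function serving a model.
    # Assumes the apache-flink-statefun Python SDK; the typenames, model.pkl,
    # and the message layout are illustrative placeholders.
    import pickle

    from statefun import RequestReplyHandler, StatefulFunctions, make_json_type

    functions = StatefulFunctions()

    # The remote process owns the model, so deploying a new image of this process
    # swaps the model without restarting the Flink / StateFun cluster.
    with open("model.pkl", "rb") as f:
        MODEL = pickle.load(f)

    FEATURES_TYPE = make_json_type(typename="example/Features")

    @functions.bind(typename="example/predictor")
    def predict(context, message):
        features = message.as_type(FEATURES_TYPE)  # e.g. {"values": [...]}
        score = float(MODEL.predict([features["values"]])[0])
        # Forward the score to an egress (e.g. Kafka) or to another function here.

    # Expose the functions over HTTP (e.g. with aiohttp or Flask) via this handler.
    handler = RequestReplyHandler(functions)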


Re: Serving Machine Learning models

Posted by Yun Gao <yu...@aliyun.com>.
Hi Sonia,

Sorry, I don't have statistics on the two methods you mentioned, but perhaps as
additional input I could offer another option: there is an ecosystem project,
dl-on-flink [1], that supports running DL frameworks on top of Flink and handles
the data exchange between the Java and Python processes, which allows you to use
the native model directly.

Best,
Yun


[1] https://github.com/flink-extended/dl-on-flink



