Posted to user@flink.apache.org by Christophe Jolif <cj...@gmail.com> on 2018/02/05 15:43:53 UTC

ML and Stream

Hi all,

Sorry, this is me again with another question.

Maybe I did not search deep enough, but it seems the FlinkML API is still
purely batch.

If I read
https://cwiki.apache.org/confluence/display/FLINK/FlinkML%3A+Vision+and+Roadmap
it seems there was an intent to "exploit the streaming nature of Flink,
and provide functionality designed specifically for data streams", but from
my external point of view I don't see much happening here. Is there work
in progress towards that?

I would personally see two use-cases around streaming: first, updating an
existing model that was built in batch; second, triggering predictions not
through a batch job but in a stream job.
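
[Editorial note: the first use-case boils down to refining batch-fitted parameters one observation at a time, as a stream would deliver them. A minimal sketch in plain Java, with no Flink dependency; the `LinearModel` class, its weights, and the learning rate are all illustrative assumptions, not FlinkML API.]

```java
import java.util.Arrays;

// Hypothetical linear model whose weights were fitted offline (batch)
// and are then refined per incoming observation (stream).
final class LinearModel {
    double[] w;

    LinearModel(double[] initial) { this.w = initial.clone(); }

    double predict(double[] x) {
        double s = 0.0;
        for (int i = 0; i < w.length; i++) s += w[i] * x[i];
        return s;
    }

    // One stochastic-gradient step: the "update the model on a stream" half.
    void update(double[] x, double y, double lr) {
        double err = predict(x) - y;
        for (int i = 0; i < w.length; i++) w[i] -= lr * err * x[i];
    }
}

public class StreamingUpdateSketch {
    public static void main(String[] args) {
        LinearModel model = new LinearModel(new double[] {0.0, 0.0});
        // Pretend these arrive one by one; the target is y = 2*x1 + 1
        // (x0 is a constant bias term).
        double[][] xs = {{1, 0}, {1, 1}, {1, 2}, {1, 3}};
        double[] ys = {1, 3, 5, 7};
        for (int epoch = 0; epoch < 2000; epoch++)
            for (int i = 0; i < xs.length; i++)
                model.update(xs[i], ys[i], 0.05);
        System.out.println(Arrays.toString(model.w)); // converges toward [1.0, 2.0]
    }
}
```

In a real job the per-observation `update` call would live inside a stream operator; the point is only that incremental updates need no batch re-run.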

Are these things in the works? Or maybe already feasible, even though the
API looks purely batch-oriented?

Thanks,
-- 
Christophe

Re: ML and Stream

Posted by Fabian Hueske <fh...@gmail.com>.
That's correct.
It's not possible to persist data in memory across jobs in Flink's batch
API.

Best, Fabian
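
[Editorial note: since jobs cannot share memory, the usual workaround is to hand the model over through external storage: the batch job writes the fitted parameters out, and the next job reads them back at startup. A minimal sketch with plain Java serialization; the file-based handoff, class name, and weight format are illustrative assumptions (in practice this would be a DFS, object store, or database).]

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hand a model across job boundaries via storage instead of memory.
public class ModelHandoff {
    // Called at the end of the batch (training) job.
    static void save(double[] weights, Path path) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(path))) {
            out.writeObject(weights);
        }
    }

    // Called at the start of the next (e.g. serving) job.
    static double[] load(Path path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(path))) {
            return (double[]) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("model", ".bin");
        save(new double[] {1.0, 2.0}, p);  // job 1 ends here
        double[] w = load(p);              // job 2 starts here
        System.out.println(w.length + " weights restored");
    }
}
```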


Re: ML and Stream

Posted by Christophe Jolif <cj...@gmail.com>.
Fabian,

Ok, thanks for the update. Meanwhile I was looking at how I could still
leverage the current FlinkML API but, as far as I can see, it lacks the
ability to persist its own models? So even for pure batch, it prevents
running your (once built) model in several jobs? Or am I missing
something?

I suspect I am not the only one who would love to apply machine learning
as part of a Flink pipeline. While waiting for FLIP-23, what are the
"best" practices today?

Thanks again for your help,
--
Christophe


Re: ML and Stream

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Christophe,

it is true that FlinkML only targets batch workloads. Also, there has not
been any development for a long time.

In March last year, a discussion was started on the dev mailing list about
different machine learning features for stream processing [1].
One result of this discussion was FLIP-23 [2], which will add a library for
model serving to Flink, i.e., it can load (and update) machine learning
models and evaluate them on a stream.
If you dig through the mailing list thread, you'll find a link to a Google
doc that discusses other possible directions.

Best, Fabian

[1]
https://lists.apache.org/thread.html/eeb80481f3723c160bc923d689416a352d6df4aad98fe7424bf33132@%3Cdev.flink.apache.org%3E
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-23+-+Model+Serving
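
[Editorial note: the model-serving pattern described above can be sketched without any Flink dependency: events are scored against the current model, while a second (control) channel can swap a new model in at runtime. In Flink this would typically be a CoProcessFunction over connected data/control streams; here both "streams" are plain method calls, and all names are illustrative assumptions.]

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Serve a swappable model: evaluate events against it, accept updates at runtime.
public class ModelServingSketch {
    private final AtomicReference<Function<Double, Double>> model;

    ModelServingSketch(Function<Double, Double> initial) {
        this.model = new AtomicReference<>(initial);
    }

    // Data-stream side: evaluate an event with whatever model is current.
    double score(double event) {
        return model.get().apply(event);
    }

    // Control-stream side: load a new (e.g. freshly retrained) model.
    void updateModel(Function<Double, Double> next) {
        model.set(next);
    }

    public static void main(String[] args) {
        ModelServingSketch serving = new ModelServingSketch(x -> 2 * x);
        System.out.println(serving.score(3.0)); // scored with the initial model: 6.0
        serving.updateModel(x -> 2 * x + 1);    // a retrained model arrives
        System.out.println(serving.score(3.0)); // scored with the new model: 7.0
    }
}
```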
