Posted to user@flink.apache.org by Theodore Vasiloudis <th...@gmail.com> on 2017/04/03 07:51:39 UTC

Re: Flink Scheduling and FlinkML

Hello Fabio,

what you describe sounds very possible. The easiest way to do it would be
to save your incoming data to HDFS, as you already do if I understand
correctly, and then use the batch ALS algorithm [1] to build your
recommendations from the static data, which you could do at regular
intervals.

Regards,
Theodore

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/libs/ml/als.html
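A minimal sketch of such a batch job, assuming the logs have already been
turned into (user, item, rating) CSV triples in HDFS (the paths and the
rating extraction are assumptions, not part of your pipeline):

```scala
import org.apache.flink.api.scala._
import org.apache.flink.ml.recommendation.ALS

object BatchRecommendations {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // (user, item, rating) triples; path and format are assumptions
    val ratings: DataSet[(Int, Int, Double)] =
      env.readCsvFile[(Int, Int, Double)]("hdfs:///logs/ratings.csv")

    // Train batch ALS on the static data
    val als = ALS()
      .setNumFactors(10)
      .setIterations(10)
      .setLambda(0.1)
    als.fit(ratings)

    // Score (user, item) pairs; here we simply re-score the observed pairs
    val toPredict: DataSet[(Int, Int)] = ratings.map(r => (r._1, r._2))
    val predictions: DataSet[(Int, Int, Double)] = als.predict(toPredict)

    predictions.writeAsCsv("hdfs:///recommendations/latest")
    env.execute("Batch ALS recommendations")
  }
}
```

You would then submit this job at regular intervals from outside Flink,
e.g. via cron calling `flink run`.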

On Fri, Mar 31, 2017 at 4:10 PM, Fábio Dias <fa...@gmail.com> wrote:

> Hi to all,
>
> I'm building a recommendation system for my application.
> I have a set of logs (containing the user info, the hour, the button
> that was clicked, etc.) that arrive in Flink via Kafka. I save
> every log to HDFS (Hadoop), but now I have a problem: I want to apply ML
> to (all) my data.
>
> I'm considering 2 scenarios:
> First: transform my DataStream into a DataSet and perform the ML task. Is
> that possible?
> Second: run a Flink job that reads the data from Hadoop and performs
> the ML task.
>
> What is the best way to do it?
>
> I already checked the IncrementalLearningSkeleton but I didn't understand
> how to apply it to an actual real case. Is there some more complete example
> that I could look at?
> (https://github.com/apache/flink/tree/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/ml)
>
> Another thing I would like to ask is how to handle the second scenario,
> where I need to perform this task every hour. What is the best way to do
> it?
>
> Thanks,
> Fábio Dias.
>