Posted to user@flink.apache.org by Danyal Awan <da...@gmail.com> on 2023/05/14 19:24:07 UTC

FlinkMl

Hello,

For my master's thesis I am comparing ML frameworks on data streams.

What is the current status on FlinkML? Is distributed learning possible on
multiple nodes? If yes, how?

I played around with FlinkML a bit and modeled a simple pipeline for
sentiment analysis on tweets. For this I used the Sentiment140 dataset,
which contains 1.6 million tweets.
Unfortunately I can only use a small amount of data (about 30,000 samples)
for training; otherwise the TaskManager gets lost or crashes, even though I
have allocated enough memory to it (JVM heap size is set to 50 GB).
Training should also work with more data, right?

greetings