Posted to issues@flink.apache.org by "Zhipeng Zhang (Jira)" <ji...@apache.org> on 2022/05/28 12:42:00 UTC

[jira] [Created] (FLINK-27826) Support machine learning training for high dimensional models

Zhipeng Zhang created FLINK-27826:
-------------------------------------

             Summary: Support machine learning training for high dimensional models
                 Key: FLINK-27826
                 URL: https://issues.apache.org/jira/browse/FLINK-27826
             Project: Flink
          Issue Type: New Feature
          Components: Library / Machine Learning
            Reporter: Zhipeng Zhang
            Assignee: Zhipeng Zhang


FlinkML currently has limited support for training high-dimensional machine learning models, even though such models are often useful, especially in industrial settings. When the model parameters cannot fit in the memory of a single machine, FlinkML currently crashes.

So it would be useful to support high-dimensional model training in FlinkML. To achieve this, we probably need to do the following:
 # Survey how existing machine learning systems train large models (e.g. data parallel, model parallel)
 # Define and implement the infrastructure to support large model training in FlinkML
 # Implement a logistic regression model that can train models with more than ten billion parameters
 # Benchmark the implementation and further improve it
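For scale, ten billion double-precision (8-byte) weights alone occupy 80 GB, before any optimizer state is counted. One common way to handle this, shown here only as a minimal sketch, combines data parallelism with model parallelism: the weight vector is range-partitioned across several shards (parameter-server style), and each worker pulls only the weights its sparse mini-batch touches, then pushes the gradients back to the owning shards. This is not FlinkML's API. The names ShardedModel, pull, push and sgd_step are hypothetical, and the shards are in-process dicts standing in for remote servers.

```python
import math
from typing import Dict, Iterable, List, Tuple

# A sparse example: (feature_index, value) pairs; labels are 0 or 1.
SparseVector = List[Tuple[int, float]]


class ShardedModel:
    """Hypothetical model-parallel layout: the weights of a dim-sized model
    are range-partitioned across num_shards shards, so no single process
    has to hold all dim weights."""

    def __init__(self, dim: int, num_shards: int):
        self.dim = dim
        self.num_shards = num_shards
        # Integer ceiling division avoids float rounding for huge dims.
        self.shard_size = (dim + num_shards - 1) // num_shards
        # Each shard stores only the weights it owns; a missing key means 0.0,
        # so memory grows with the number of features actually touched.
        self.shards: List[Dict[int, float]] = [{} for _ in range(num_shards)]

    def owner(self, index: int) -> int:
        """Index of the shard that owns a given feature."""
        return index // self.shard_size

    def pull(self, indices: Iterable[int]) -> Dict[int, float]:
        """Fetch only the weights needed for a mini-batch."""
        return {i: self.shards[self.owner(i)].get(i, 0.0) for i in indices}

    def push(self, grads: Dict[int, float], lr: float) -> None:
        """Route each gradient entry to its owning shard and apply SGD."""
        for i, g in grads.items():
            shard = self.shards[self.owner(i)]
            shard[i] = shard.get(i, 0.0) - lr * g


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_step(model: ShardedModel,
             batch: List[Tuple[SparseVector, int]], lr: float) -> None:
    """One worker step of logistic regression on a sparse mini-batch."""
    touched = {i for x, _ in batch for i, _ in x}
    weights = model.pull(touched)  # only the sparse slice of the model
    grads: Dict[int, float] = {}
    for x, y in batch:
        margin = sum(weights[i] * v for i, v in x)
        err = sigmoid(margin) - y  # derivative of log-loss w.r.t. the margin
        for i, v in x:
            grads[i] = grads.get(i, 0.0) + err * v / len(batch)
    model.push(grads, lr)


if __name__ == "__main__":
    # A ten-billion-dimensional model split over 4 shards; only the three
    # features that appear in the data are ever materialized.
    model = ShardedModel(dim=10_000_000_000, num_shards=4)
    batch = [([(3, 1.0), (9_999_999_999, 1.0)], 1),
             ([(3, 1.0), (42, 2.0)], 0)]
    for _ in range(100):
        sgd_step(model, batch, lr=0.5)
    print(model.pull([3, 42, 9_999_999_999]))
```

Range partitioning keeps the owner lookup a single integer division; hash partitioning is a common alternative when feature indices are skewed toward one range. Pulling only the touched keys is what makes the approach viable for sparse, high-dimensional data, since each step moves a few weights rather than the full model.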



--
This message was sent by Atlassian Jira
(v8.20.7#820007)