Posted to issues@flink.apache.org by "Flink Jira Bot (Jira)" <ji...@apache.org> on 2022/04/10 10:39:00 UTC

[jira] [Updated] (FLINK-12470) FLIP39: Flink ML pipeline and ML libs

     [ https://issues.apache.org/jira/browse/FLINK-12470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-12470:
-----------------------------------
    Labels: auto-deprioritized-major auto-unassigned pull-request-available stale-minor  (was: auto-deprioritized-major auto-unassigned pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help the community manage its development. I see this issue has been marked as Minor but is unassigned, and neither it nor its Sub-Tasks have been updated for 180 days. I have gone ahead and marked it "stale-minor". If this ticket is still Minor, please either assign yourself or give an update. Afterwards, please remove the label; otherwise, the issue will be deprioritized in 7 days.


> FLIP39: Flink ML pipeline and ML libs
> -------------------------------------
>
>                 Key: FLINK-12470
>                 URL: https://issues.apache.org/jira/browse/FLINK-12470
>             Project: Flink
>          Issue Type: New Feature
>          Components: Library / Machine Learning
>    Affects Versions: 1.9.0
>            Reporter: Shaoxuan Wang
>            Priority: Minor
>              Labels: auto-deprioritized-major, auto-unassigned, pull-request-available, stale-minor
>   Original Estimate: 720h
>          Time Spent: 10m
>  Remaining Estimate: 719h 50m
>
> This is the umbrella Jira for FLIP39, which intends to enhance the scalability and the ease of use of Flink ML.
> ML Discussion thread: [http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-39-Flink-ML-pipeline-and-ML-libs-td28633.html]
> Google Doc (will be converted to an official Confluence page very soon): [https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo|https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo/edit]
> In machine learning, there are mainly two types of people. The first type is MLlib developers, who need a set of standard, well-abstracted core ML APIs to implement algorithms. Every ML algorithm is a concrete implementation on top of these APIs. The second type is MLlib users, who use the existing, packaged MLlib to train or serve a model. It is common for an entire training or inference job to be constructed from a sequence of transformations or algorithms. It is therefore essential to provide a workflow/pipeline API for MLlib users so that they can easily combine multiple algorithms to describe the ML workflow/pipeline.
> Flink currently has a set of ML core interfaces, but they are built on top of the DataSet API. This does not align well with the latest Flink [roadmap|https://flink.apache.org/roadmap.html] (the Table API will become the first-class citizen and primary API for analytics use cases, while the DataSet API will be gradually deprecated). Moreover, Flink at present has no interface that allows MLlib users to describe an ML workflow/pipeline, nor does it provide any way to persist a pipeline or model and reuse it in the future. To address these issues, in this FLIP we propose to:
>  * Provide a new set of ML core interfaces (on top of the Flink Table API)
>  * Provide an ML pipeline interface (on top of the Flink Table API)
>  * Provide interfaces for parameter management and pipeline persistence
>  * All of the above interfaces should accommodate any new ML algorithm. We will gradually add various standard ML algorithms on top of these newly proposed interfaces to ensure their feasibility and scalability.
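
For illustration only, the three kinds of interface listed above (core stages, a pipeline that chains them, and parameter/pipeline persistence) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the API proposed in FLIP39: every class and method name here (Transformer, Estimator, Pipeline, Scale, MeanCenterer, Shift, to_json, from_json) is hypothetical, and a plain list of row dicts stands in for a Flink Table.

```python
import json
from abc import ABC, abstractmethod


class Transformer(ABC):
    """Hypothetical core interface: maps a table (list of row dicts) to a new table."""

    def __init__(self, **params):
        self.params = dict(params)  # parameters are kept as plain, JSON-friendly values

    @abstractmethod
    def transform(self, rows):
        ...


class Estimator(ABC):
    """Hypothetical core interface: learns from a table and returns a fitted Transformer."""

    def __init__(self, **params):
        self.params = dict(params)

    @abstractmethod
    def fit(self, rows):
        ...


class Scale(Transformer):
    """Multiplies one column by a constant factor."""

    def transform(self, rows):
        col, factor = self.params["col"], self.params["factor"]
        return [{**r, col: r[col] * factor} for r in rows]


class Shift(Transformer):
    """Adds a constant offset to one column; also serves as the fitted model below."""

    def transform(self, rows):
        col, offset = self.params["col"], self.params["offset"]
        return [{**r, col: r[col] + offset} for r in rows]


class MeanCenterer(Estimator):
    """Learns the column mean and returns a Shift that subtracts it."""

    def fit(self, rows):
        col = self.params["col"]
        mean = sum(r[col] for r in rows) / len(rows)
        return Shift(col=col, offset=-mean)


class Pipeline:
    """Chains stages; fit() replaces each Estimator with the model it produces."""

    def __init__(self, stages):
        self.stages = list(stages)

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(rows)
            rows = stage.transform(rows)  # later stages see the transformed data
            fitted.append(stage)
        return Pipeline(fitted)

    def transform(self, rows):
        for stage in self.stages:
            if not isinstance(stage, Transformer):
                raise TypeError("call fit() before transform(): pipeline contains an Estimator")
            rows = stage.transform(rows)
        return rows

    def to_json(self):
        # Persist each stage as its class name plus its parameters.
        return json.dumps([{"stage": type(s).__name__, "params": s.params} for s in self.stages])

    @staticmethod
    def from_json(text, registry):
        # registry maps a stage name to its class; it is an assumption of this sketch.
        return Pipeline([registry[d["stage"]](**d["params"]) for d in json.loads(text)])
```

In this sketch, fitting a pipeline yields a new pipeline made only of transformers, which can then be serialized with to_json and restored later with from_json to serve the same model.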



--
This message was sent by Atlassian Jira
(v8.20.1#820001)