Posted to dev@submarine.apache.org by "Kevin Su (Jira)" <ji...@apache.org> on 2021/01/19 06:01:00 UTC

[jira] [Assigned] (SUBMARINE-270) [Umbrella] Submarine-sdk pipeline

     [ https://issues.apache.org/jira/browse/SUBMARINE-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Su reassigned SUBMARINE-270:
----------------------------------

    Assignee: Kevin Su

> [Umbrella] Submarine-sdk pipeline
> ---------------------------------
>
>                 Key: SUBMARINE-270
>                 URL: https://issues.apache.org/jira/browse/SUBMARINE-270
>             Project: Apache Submarine
>          Issue Type: New Feature
>          Components: SDK
>            Reporter: Kevin Su
>            Assignee: Kevin Su
>            Priority: Major
>
> Getting from raw data ingestion to a model pushed to production is complex. Submarine pipeline is being built for deploying portable, scalable machine learning workflows.
> I created this JIRA ticket to discuss the details and the plan for the submarine pipeline.
> The pipeline would have two main components:
> 1. *workflow orchestrator* - helps us manage dependencies between tasks, schedule workflows, and retry when a failure happens. There are 3 ways to build our orchestrator:
>  * airflow - use the Airflow API to build our pipeline
>  * submarine workflow - [~10110346] suggests a built-in [submarine workflow|https://docs.google.com/document/d/1LiRozgumsYadmESQAXJk5gM5GOvB5bXpOiuFew6i9Os/edit#]
>  * abstract orchestrator - provide an abstraction layer like [TFX|https://github.com/tensorflow/tfx/blob/master/docs/guide/index.md#portability-and-interoperability], so that we can support different orchestration frameworks
> 2. *sdk ML library* - reduces routine ML code development. Building an ML pipeline involves several routine tasks; the library would provide callback functions so that users can easily preprocess data, train models, and so on. It may wrap different frameworks to handle both small and large datasets.
>  * preprocessing (Hive, Spark, Pandas)
>  * train (TF, PyTorch)
>  * Evaluation
>  * Model Validator
>  * Pusher
> To find out more, check the link below. Feel free to edit or comment on the documents.
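
As a rough illustration of the two components described in the ticket, the sketch below shows a toy orchestrator that runs tasks in dependency order and retries failed ones, with the SDK stages (preprocess, train, evaluate, validate, push) supplied as user callbacks. Every name here (`Pipeline`, `add`, `run`, `max_retries`, and the stage functions) is hypothetical and made up for this example; it is not an existing Submarine, Airflow, or TFX API, and it only assumes the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical names throughout: none of this is an existing Submarine API.
Task = Callable[[Dict[str, Any]], Any]


class Pipeline:
    """Toy orchestrator: runs tasks in dependency order and retries failures."""

    def __init__(self, max_retries: int = 2) -> None:
        self.max_retries = max_retries
        self._tasks: Dict[str, Task] = {}
        self._deps: Dict[str, List[str]] = {}

    def add(self, name: str, fn: Task, depends_on: Sequence[str] = ()) -> None:
        """Register a callback and the names of the tasks it depends on."""
        self._tasks[name] = fn
        self._deps[name] = list(depends_on)

    def run(self) -> Dict[str, Any]:
        """Run every task once its dependencies are done; return all results."""
        results: Dict[str, Any] = {}
        for name in TopologicalSorter(self._deps).static_order():
            for attempt in range(self.max_retries + 1):
                try:
                    # Each callback receives the results of earlier tasks.
                    results[name] = self._tasks[name](results)
                    break
                except Exception:
                    if attempt == self.max_retries:
                        raise  # retries exhausted, surface the failure
        return results


# Stage callbacks a user might supply (illustrative stand-ins only).
attempts = {"train": 0}


def preprocess(results: Dict[str, Any]) -> List[float]:
    return [1.0, 2.0, 3.0]


def train(results: Dict[str, Any]) -> float:
    attempts["train"] += 1
    if attempts["train"] == 1:
        raise RuntimeError("simulated transient failure")
    data = results["preprocess"]
    return sum(data) / len(data)  # the "model" is just the mean


def evaluate(results: Dict[str, Any]) -> float:
    model = results["train"]
    return max(abs(x - model) for x in results["preprocess"])


def validate(results: Dict[str, Any]) -> bool:
    return results["evaluate"] <= 1.5


def push(results: Dict[str, Any]) -> str:
    return "pushed" if results["validate"] else "skipped"


pipeline = Pipeline(max_retries=2)
pipeline.add("preprocess", preprocess)
pipeline.add("train", train, depends_on=["preprocess"])
pipeline.add("evaluate", evaluate, depends_on=["preprocess", "train"])
pipeline.add("validate", validate, depends_on=["evaluate"])
pipeline.add("push", push, depends_on=["validate"])
results = pipeline.run()
print(results["push"])
```

In this sketch the `train` callback fails once on purpose, so the retry path is exercised before the later stages run. A real orchestrator would also need scheduling, persistence, and distributed execution, which is where the airflow, submarine workflow, or abstraction-layer options from the ticket would come in.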



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
