Posted to dev@submarine.apache.org by "cdmikechen (Jira)" <ji...@apache.org> on 2022/11/05 08:56:00 UTC

[jira] [Updated] (SUBMARINE-857) [Umbrella] Support model management SDK in distributed scenarios

     [ https://issues.apache.org/jira/browse/SUBMARINE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

cdmikechen updated SUBMARINE-857:
---------------------------------
    Target Version: 0.9.0

> [Umbrella] Support model management SDK in distributed scenarios
> ----------------------------------------------------------------
>
>                 Key: SUBMARINE-857
>                 URL: https://issues.apache.org/jira/browse/SUBMARINE-857
>             Project: Apache Submarine
>          Issue Type: Task
>            Reporter: Byron Hsu
>            Assignee: Byron Hsu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.6.0
>
>
> Submarine is a platform designed for distributed training, so its model management SDK should be easy to use in distributed scenarios.
> In a typical distributed experiment, several workers train together.
> Our model management toolkit will support:
>  1. Workers in the same experiment automatically direct their logs to the same group in MLflow, so users can monitor metrics from multiple workers in one graph.
>  2. When saving models, users do not need to store every worker's copy, because some copies are replicated or redundant. When users call save_model in our toolkit, it applies the most efficient saving strategy under the hood, minimizing the space and time required.
> The API design doc can be viewed here: [https://hackmd.io/I6frSeZIQDaKQYK4nGCR5w?both]
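
The two behaviours listed in the description can be illustrated with a minimal sketch. This is not the Submarine SDK or the API from the design doc: every name below (WorkerContext, context_from_env, shared_run_name, metric_key, should_save) and the EXPERIMENT_ID variable are hypothetical, and the "only rank 0 saves replicated weights" rule is one assumed strategy, not necessarily the one Submarine implements. RANK and WORLD_SIZE follow the convention used by common distributed launchers.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WorkerContext:
    """Identity of one worker inside a distributed experiment (hypothetical)."""
    experiment_id: str
    rank: int
    world_size: int


def context_from_env(env: Optional[Mapping[str, str]] = None) -> WorkerContext:
    """Build the context from environment variables.

    EXPERIMENT_ID is an assumed variable name used only for this sketch.
    """
    env = os.environ if env is None else env
    return WorkerContext(
        experiment_id=env.get("EXPERIMENT_ID", "local"),
        rank=int(env.get("RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )


def shared_run_name(ctx: WorkerContext) -> str:
    """All workers of one experiment derive the same group name."""
    return f"experiment-{ctx.experiment_id}"


def metric_key(ctx: WorkerContext, name: str) -> str:
    """Prefix a metric with the worker rank so workers stay distinguishable."""
    return f"worker-{ctx.rank}/{name}"


def should_save(ctx: WorkerContext, replicated: bool = True) -> bool:
    """Decide whether this worker writes the model.

    Assumed rule: replicated weights are saved once, by rank 0;
    non-replicated (sharded) weights are saved by every worker.
    """
    return ctx.rank == 0 if replicated else True


if __name__ == "__main__":
    ctx = context_from_env()
    print(shared_run_name(ctx), metric_key(ctx, "loss"), should_save(ctx))
```

In this sketch a save_model wrapper would consult should_save before writing, so user code could call it unconditionally on every worker and leave the deduplication to the toolkit.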



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@submarine.apache.org
For additional commands, e-mail: dev-help@submarine.apache.org