Posted to commits@dolphinscheduler.apache.org by ch...@apache.org on 2022/09/13 08:20:11 UTC
[dolphinscheduler] branch dev updated: [Doc][Task Plugin] Fix MLflow task plugin doc (#11905)

This is an automated email from the ASF dual-hosted git repository.

chufenggao pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler.git


The following commit(s) were added to refs/heads/dev by this push:
     new 892d867270 [Doc][Task Plugin] Fix MLflow task plugin doc  (#11905)
892d867270 is described below

commit 892d8672704fc16db9e4416ac258f5c97d18758d
Author: JieguangZhou <ji...@163.com>
AuthorDate: Tue Sep 13 16:20:03 2022 +0800

    [Doc][Task Plugin] Fix MLflow task plugin doc  (#11905)
    
    * update mlflow doc
    
    * fix punctuations
---
 docs/docs/en/guide/task/mlflow.md | 31 +++++++++++---------------
 docs/docs/zh/guide/task/mlflow.md | 47 +++++++++++++++++----------------------
 2 files changed, 34 insertions(+), 44 deletions(-)

diff --git a/docs/docs/en/guide/task/mlflow.md b/docs/docs/en/guide/task/mlflow.md
index e4af9a15e3..b1b76bc4c1 100644
--- a/docs/docs/en/guide/task/mlflow.md
+++ b/docs/docs/en/guide/task/mlflow.md
@@ -5,7 +5,7 @@
 [MLflow](https://mlflow.org) is an excellent open source platform to manage the ML lifecycle, including experimentation,
 reproducibility, deployment, and a central model registry.
 
-MLflow task plugin used to execute MLflow tasks，Currently contains MLflow Projects and MLflow Models. (Model Registry will soon be rewarded for support)
+The MLflow task plugin executes MLflow tasks. It currently covers MLflow Projects and MLflow Models. (Model Registry support is planned.)
 
 - MLflow Projects: Package data science code in a format to reproduce runs on any platform.
 - MLflow Models: Deploy machine learning models in diverse serving environments.
@@ -13,19 +13,14 @@ MLflow task plugin used to execute MLflow tasks，Currently contains MLflow Proj
 
 The MLflow plugin currently supports and will support the following:
 
-- [x] MLflow Projects
-    - [x] BasicAlgorithm: contains LogisticRegression, svm, lightgbm, xgboost
-    - [x] AutoML: AutoML tool，contains autosklean, flaml
-    - [x] Custom projects: Support for running your own MLflow projects
-- [ ] MLflow Models
-    - [x] MLFLOW: Use `MLflow models serve` to deploy a model service
-    - [x] Docker: Run the container after packaging the docker image
-    - [x] Docker Compose: Use docker compose to run the container, it will replace the docker run above
-    - [ ] Seldon core: Use Selcon core to deploy model to k8s cluster
-    - [ ] k8s: Deploy containers directly to K8S
-    - [ ] MLflow deployments: Built-in deployment modules, such as built-in deployment to SageMaker, etc
-- [ ] Model Registry
-    - [ ] Register Model: Allows artifacts (Including model and related parameters, indicators) to be registered directly into the model center
+- MLflow Projects
+    - BasicAlgorithm: includes LogisticRegression, SVM, LightGBM, and XGBoost
+    - AutoML: AutoML tools, including autosklearn and flaml
+    - Custom projects: run your own MLflow projects
+- MLflow Models
+    - MLFLOW: use `mlflow models serve` to deploy a model service
+    - Docker: package a Docker image, then run the container
+    - Docker Compose: run the container with Docker Compose; this will replace the plain Docker option above
 
 
 
@@ -64,9 +59,9 @@ The MLflow plugin currently supports and will support the following:
 | Register Model | Whether to register the model. If selected, the following parameter is expanded. |
 | Model Name | Name of the registered model. A new model version is added on top of the existing ones and registered as Production. |
 | Data Path | The absolute path of a file or folder. A file must end with .csv (it is split into training and test sets automatically); a folder must contain train.csv and test.csv (the recommended approach, since users should build their own test sets for model evaluation). |
-| Parameters | Parameter when initializing the algorithm/AutoML model, which can be empty. For example, parameters `"time_budget=30;estimator_list=['lgbm']"` for flaml 。The convention will be passed with '; ' shards each parameter, using the name before the equal sign as the parameter name, and using the name after the equal sign to get the corresponding parameter value through `python eval()`. <ul><li>[Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear [...]
+| Parameters | Parameters used when initializing the algorithm/AutoML model; may be empty. For example, `"time_budget=30;estimator_list=['lgbm']"` for flaml. By convention, the string is split on `;` into individual parameters; the text before each equals sign is the parameter name, and the text after it is evaluated with Python `eval()` to obtain the value. <ul><li>[Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear [...]
 | Algorithm | The selected algorithm. Currently supports `LR`, `SVM`, `LightGBM`, and `XGboost`, all in their [scikit-learn](https://scikit-learn.org/) form. |
-| Parameter Search Space | Parameter search space when running the corresponding algorithm, which can be empty. For example, the parameter `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm 。The convention will be passed with '; 'shards each parameter, using the name before the equal sign as the parameter name, and using the name after the equal sign to get the corresponding parameter value through `python eval()`. |
+| Parameter Search Space | Parameter search space used when running the chosen algorithm; may be empty. For example, `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm. By convention, the string is split on `;` into individual parameters; the text before each equals sign is the parameter name, and the text after it is evaluated with Python `eval()` to obtain the value. |
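
The `name=value;name=value` convention described in the Parameters and Parameter Search Space rows can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's actual code: the function name `parse_params` is hypothetical, and skipping blank items is an assumption. Because the documentation says values go through `eval()`, the sketch does the same; `eval()` executes arbitrary code, so it only suits trusted input.

```python
def parse_params(text):
    """Turn 'name=value;name=value' into a dict of evaluated values.

    Hypothetical helper that mirrors the documented convention.
    """
    params = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            # Assumption: empty input and trailing ';' are tolerated.
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        name, _, value = item.partition("=")
        # The docs state that values are evaluated with Python eval().
        # eval() runs arbitrary code: use it on trusted input only.
        params[name.strip()] = eval(value.strip())
    return params


print(parse_params("time_budget=30;estimator_list=['lgbm']"))
print(parse_params("max_depth=[5, 10];n_estimators=[100, 200]"))
```

For literal-only values such as numbers, strings, and lists, `ast.literal_eval` from the standard library would be a stricter drop-in for `eval` in a sketch like this.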
 
 #### AutoML
 
@@ -89,8 +84,8 @@ The MLflow plugin currently supports and will support the following:
 | **Parameter** | **Description** |
 | ------- | ---------- |
 | Parameters | Passed to `mlflow run` as `--param-list` entries. For example `-P learning_rate=0.2 -P colsample_bytree=0.8 -P subsample=0.9`. |
-| Repository | Repository url of MLflow Project，Support git address and directory on worker. If it's in a subdirectory，We add `#` to support this (same as `mlflow run`) , for example `https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native`. |
-| Project Version | Version of the project，default master. |
+| Repository | Repository URL of the MLflow project. Both a git address and a directory on the worker are supported. If the project lives in a subdirectory, append it after `#` (same as `mlflow run`), for example `https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native`. |
+| Project Version | Version of the project; defaults to master. |
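
The three custom-project fields map naturally onto an `mlflow run` command line. The sketch below shows one way to assemble it, assuming the fields are forwarded unchanged; the helper names `split_repository` and `build_mlflow_run_command` are hypothetical and are not taken from the plugin. The `-P`/`--param-list` and `--version` options belong to the `mlflow run` CLI, which also accepts the `#subdirectory` suffix directly.

```python
import shlex


def split_repository(repository):
    """Separate 'repo#subdirectory' into its two parts (illustration only)."""
    repo, _, subdirectory = repository.partition("#")
    return repo, subdirectory


def build_mlflow_run_command(repository, parameters="", version=None):
    """Assemble an argv list for `mlflow run` (hypothetical helper)."""
    # mlflow run understands the '#subdirectory' suffix, so the
    # repository string is passed through as-is.
    command = ["mlflow", "run", repository]
    # The Parameters field already holds '-P name=value' pairs.
    command += shlex.split(parameters)
    if version:
        # Only meaningful for git repositories, not local directories.
        command += ["--version", version]
    return command


repo = "https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native"
print(split_repository(repo))
print(build_mlflow_run_command(repo, "-P learning_rate=0.2 -P subsample=0.9", "master"))
```

Building an argv list rather than a single shell string avoids quoting problems when a parameter value contains spaces.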
 
 You can now use this feature to run any MLflow project on GitHub (for example, the [MLflow examples](https://github.com/mlflow/mlflow/tree/master/examples)). You can also build your own machine learning library to reuse your work, then run it from DolphinScheduler with one click.
 
diff --git a/docs/docs/zh/guide/task/mlflow.md b/docs/docs/zh/guide/task/mlflow.md
index 2cebb8506e..0662c8a7fe 100644
--- a/docs/docs/zh/guide/task/mlflow.md
+++ b/docs/docs/zh/guide/task/mlflow.md
@@ -4,7 +4,7 @@
 
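 [MLflow](https://mlflow.org) is an excellent open source project in the MLOps field for managing the machine learning lifecycle, including experimentation, reproducibility, deployment, and a central model registry.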
 
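-The MLflow component executes MLflow tasks. It currently covers MLflow Projects and MLflow Models. (Model Registry support is coming in the near future.)
+The MLflow component executes MLflow tasks. It currently covers MLflow Projects and MLflow Models. (Model Registry support is coming in the near future.)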
 
 - MLflow Projects: Package code so that it can run on any platform.
 - MLflow Models: Deploy machine learning models in different serving environments.
@@ -12,19 +12,14 @@ MLflow 组件用于执行 MLflow 任务，目前包含Mlflow Projects, 和MLflow
 
 The MLflow component currently supports, or will soon support, the following:
 
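-- [x] MLflow Projects
-  - [x] BasicAlgorithm: basic algorithms, including LogisticRegression, svm, lightgbm, xgboost
-  - [x] AutoML: AutoML tools, including autosklearn and flaml
-  - [x] Custom projects: run your own MLflow Projects
-- [ ] MLflow Models
-  - [x] MLFLOW: deploy the model directly with `mlflow models serve`.
-  - [x] Docker: package a Docker image, then deploy the model.
-  - [x] Docker Compose: deploy the model with Docker Compose; this will replace the Docker deployment above.
-  - [ ] Seldon core: after building the image, deploy it to a k8s cluster with Seldon Core, which makes Seldon Core's model management capabilities available.
-  - [ ] k8s: after building the image, deploy it to a k8s cluster.
-  - [ ] MLflow deployments: MLflow's built-in deployment modules, such as built-in deployment to SageMaker.
-- [ ] Model Registry
-  - [ ] Register Model: register artifacts (the model and its related parameters and metrics) to the model registry
+- MLflow Projects
+  - BasicAlgorithm: basic algorithms, including LogisticRegression, svm, lightgbm, xgboost
+  - AutoML: AutoML tools, including autosklearn and flaml
+  - Custom projects: run your own MLflow Projects
+- MLflow Models
+  - MLFLOW: deploy the model directly with `mlflow models serve`.
+  - Docker: package a Docker image, then deploy the model.
+  - Docker Compose: deploy the model with Docker Compose; this will replace the Docker deployment above.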
 
 ## Create Task
 
@@ -51,8 +46,8 @@ MLflow 组件用于执行 MLflow 任务，目前包含Mlflow Projects, 和MLflow
 
 The following are some common parameters of the MLflow component
 
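-- **MLflow Tracking Server URI**: connection to the MLflow Tracking Server; defaults to http://localhost:5000.
-- **Experiment Name**: the experiment the task runs under. If the experiment does not exist, it is created. If the name is empty, it is set to `Default`, as in MLflow.
+- **MLflow Tracking Server URI**: connection to the MLflow Tracking Server; defaults to http://localhost:5000.
+- **Experiment Name**: the experiment the task runs under. If the experiment does not exist, it is created. If the name is empty, it is set to `Default`, as in MLflow.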
 
 ### MLflow Projects
 
@@ -64,14 +59,14 @@ MLflow 组件用于执行 MLflow 任务，目前包含Mlflow Projects, 和MLflow
 
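 - **Register Model**: whether to register the model. If selected, the following parameter is expanded.
     - **Model Name**: name of the registered model. A new model version is added on top of the existing ones and registered as Production.
-- **Data Path**: absolute path of a file or folder. A file must end with .csv (it is split into training and test sets automatically); a folder must contain train.csv and test.csv (the recommended approach, since users should build their own test set for model evaluation).
+- **Data Path**: absolute path of a file or folder. A file must end with .csv (it is split into training and test sets automatically); a folder must contain train.csv and test.csv (the recommended approach, since users should build their own test set for model evaluation).
 The detailed parameter lists are as follows: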
   - [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)
   - [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html?highlight=svc#sklearn.svm.SVC)
   - [lightgbm](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier)
   - [xgboost](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBClassifier)
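-- **Algorithm**: the selected algorithm. Currently supports `lr`, `svm`, `lightgbm`, and `xgboost`, all in their [scikit-learn](https://scikit-learn.org/) form.
-- **Parameter Search Space**: parameter search space used when running the chosen algorithm; may be empty. For example, `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm triggers a search over those values. By convention, the input is split on `;` into individual parameters; the text before each equals sign is the parameter name, and the text after it is evaluated with python eval to obtain the value
+- **Algorithm**: the selected algorithm. Currently supports `lr`, `svm`, `lightgbm`, and `xgboost`, all in their [scikit-learn](https://scikit-learn.org/) form.
+- **Parameter Search Space**: parameter search space used when running the chosen algorithm; may be empty. For example, `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm triggers a search over those values. By convention, the input is split on `;` into individual parameters; the text before each equals sign is the parameter name, and the text after it is evaluated with python eval to obtain the value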
 
 #### AutoML
 
@@ -81,12 +76,12 @@ MLflow 组件用于执行 MLflow 任务，目前包含Mlflow Projects, 和MLflow
 
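 - **Register Model**: whether to register the model. If selected, the following parameter is expanded.
     - **Model Name**: name of the registered model. A new model version is added on top of the existing ones and registered as Production.
-- **Data Path**: absolute path of a file or folder. A file must end with .csv (it is split into training and test sets automatically); a folder must contain train.csv and test.csv (the recommended approach, since users should build their own test set for model evaluation).
-- **Parameters**: parameters used when initializing the AutoML trainer; may be empty. For example, `time_budget=30;estimator_list=['lgbm']` for flaml. By convention, the input is split on `;` into individual parameters; the text before each equals sign is the parameter name, and the text after it is evaluated with python eval to obtain the value. The detailed parameter lists are as follows:
+- **Data Path**: absolute path of a file or folder. A file must end with .csv (it is split into training and test sets automatically); a folder must contain train.csv and test.csv (the recommended approach, since users should build their own test set for model evaluation).
+- **Parameters**: parameters used when initializing the AutoML trainer; may be empty. For example, `time_budget=30;estimator_list=['lgbm']` for flaml. By convention, the input is split on `;` into individual parameters; the text before each equals sign is the parameter name, and the text after it is evaluated with python eval to obtain the value. The detailed parameter lists are as follows:
   - [flaml](https://microsoft.github.io/FLAML/docs/reference/automl#automl-objects)
   - [autosklearn](https://automl.github.io/auto-sklearn/master/api.html)
 - **AutoML Tool**: the AutoML tool to use. Currently supports [autosklearn](https://github.com/automl/auto-sklearn)
-  and [flaml](https://github.com/microsoft/FLAML).
+  and [flaml](https://github.com/microsoft/FLAML).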
 
 #### Custom projects
 
@@ -95,7 +90,7 @@ MLflow 组件用于执行 MLflow 任务，目前包含Mlflow Projects, 和MLflow
 **Task Parameters**
 
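 - **Parameters**: the `--param-list` entries of `mlflow run`, for example `-P learning_rate=0.2 -P colsample_bytree=0.8 -P subsample=0.9`
-- **Repository**: repository address of the MLflow Project. It can be a GitHub address or a directory on the worker. If the MLflow project is in a subdirectory, append it after `#`, for example `https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native`
+- **Repository**: repository address of the MLflow Project. It can be a GitHub address or a directory on the worker. If the MLflow project is in a subdirectory, append it after `#`, for example `https://github.com/mlflow/mlflow#examples/xgboost/xgboost_native`
 - **Project Version**: the version under the project's git version control; defaults to master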
 
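 You can now use this feature to run any MLflow Project on GitHub (for example, the [MLflow examples](https://github.com/mlflow/mlflow/tree/master/examples)). You can also build your own machine learning library to reuse your research results, then run it from DolphinScheduler with one click.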
@@ -105,7 +100,7 @@ MLflow 组件用于执行 MLflow 任务，目前包含Mlflow Projects, 和MLflow
 
 Common parameters:
 
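-- **Model URI**: URI of the model in the MLflow service. Supports the `models:/<model_name>/suffix` format and the `runs:/` format.
+- **Model URI**: URI of the model in the MLflow service. Supports the `models:/<model_name>/suffix` format and the `runs:/` format.
 - **Port**: the port the service listens on when deployed.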
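
The two common parameters above, the model URI and the listening port, are what an `mlflow models serve` invocation needs. The following is a rough sketch under stated assumptions: the helper names `split_model_uri` and `build_serve_command` are hypothetical, the validation rules are a guess based only on the two URI formats named above, and the plugin's real command may carry further options. The `-m` (model URI) and `-p` (port) options are part of the `mlflow models serve` CLI.

```python
def split_model_uri(uri):
    """Split 'models:/name/suffix' or 'runs:/id/path' into its parts.

    Hypothetical helper; accepts only the two formats named in the docs.
    """
    scheme, separator, rest = uri.partition(":/")
    if not separator or scheme not in ("models", "runs"):
        raise ValueError(f"unsupported model URI: {uri!r}")
    # First segment is the model name (models:/) or run id (runs:/);
    # whatever follows is the suffix or artifact path.
    head, _, tail = rest.strip("/").partition("/")
    return scheme, head, tail


def build_serve_command(uri, port):
    """Assemble an argv list for `mlflow models serve` (illustration only)."""
    split_model_uri(uri)  # reject URIs outside the two documented formats
    return ["mlflow", "models", "serve", "-m", uri, "-p", str(port)]


print(split_model_uri("models:/my_model/Production"))
print(build_serve_command("runs:/abc123/model", 7000))
```

The model name `my_model`, the run id `abc123`, and port `7000` are placeholder values chosen for the example.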
 
 #### MLFLOW
@@ -120,8 +115,8 @@ MLflow 组件用于执行 MLflow 任务，目前包含Mlflow Projects, 和MLflow
 
 ![mlflow-models-docker-compose](../../../../img/tasks/demo/mlflow-models-docker-compose.png)
 
-- **Max CPU Limit**: for example `1.0` or `0.5`, same as docker compose.
-- **Max Memory Limit**: for example `1G` or `500M`, same as docker compose.
+- **Max CPU Limit**: for example `1.0` or `0.5`, same as docker compose.
+- **Max Memory Limit**: for example `1G` or `500M`, same as docker compose.
 
 ## Environment Preparation