Posted to commits@dolphinscheduler.apache.org by zh...@apache.org on 2022/07/06 07:56:47 UTC

[dolphinscheduler] branch dev updated: [Feature] Enable users to create python env from requirements.txt (#10658)

This is an automated email from the ASF dual-hosted git repository.

zhongjiajie pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler.git


The following commit(s) were added to refs/heads/dev by this push:
     new 71f0168510 [Feature] Enable users to create python env from requirements.txt (#10658)
71f0168510 is described below

commit 71f016851093c18376e3bb301bf6e41dee215706
Author: Eric Gao <er...@gmail.com>
AuthorDate: Wed Jul 6 15:56:39 2022 +0800

    [Feature] Enable users to create python env from requirements.txt (#10658)
---
 docs/docs/en/guide/task/jupyter.md                 | 52 +++++++++++++++++++-
 docs/docs/zh/guide/task/jupyter.md                 | 49 +++++++++++++++++++
 .../dolphinscheduler/spi/utils/DateUtils.java      |  8 ++++
 .../plugin/task/jupyter/JupyterConstants.java      | 27 +++++++++++
 .../plugin/task/jupyter/JupyterTask.java           | 20 +++++++-
 .../plugin/task/jupyter/JupyterTaskTest.java       | 56 ++++++++++++++++++++--
 6 files changed, 204 insertions(+), 8 deletions(-)

diff --git a/docs/docs/en/guide/task/jupyter.md b/docs/docs/en/guide/task/jupyter.md
index 648c42a651..9cf769634a 100644
--- a/docs/docs/en/guide/task/jupyter.md
+++ b/docs/docs/en/guide/task/jupyter.md
@@ -26,7 +26,8 @@ Click [here](https://docs.conda.io/en/latest/) for more information about `conda
 
 1. Use [Conda-Pack](https://conda.github.io/conda-pack/) to pack your conda environment into `tarball`.
 2. Upload packed conda environment to `resource center`.
-3. Select your packed conda environment as `resource` in your `jupyter task`, e.g. `jupyter_env.tar.gz`.
+3. Set `condaEnvName` as the name of your packed conda environment in your `jupyter task`, e.g. `jupyter_env.tar.gz`.
+4. Select your packed conda environment as `resource` in your `jupyter task`, e.g. `jupyter_env.tar.gz`.
 
 > NOTE: Make sure you follow the [Conda-Pack](https://conda.github.io/conda-pack/) official instructions. 
 > If you unpack your packed conda environment, the directory structure should be the same as below:
@@ -46,6 +47,55 @@ Click [here](https://docs.conda.io/en/latest/) for more information about `conda
 > `Jupyter Task Plugin` uses `source` command to activate your packed conda environment.
 > If you are concerned about using `source`, choose other options to manage your python dependency.   
 
+### Construct From Requirements
+
+1. Upload or create a `.txt` file of requirements with your python dependencies in `Resource Center`.
+2. Set `condaEnvName` as the name of your file of requirements in your `jupyter task`, e.g. `requirements.txt`.
+3. Select your file of requirements as `resource` in your `jupyter task`, e.g. `requirements.txt`.
+
+Here is an example file of requirements, from which `jupyter task plugin` will automatically 
+construct your python dependencies, run your python code and finally tear down the environment:
+
+```text
+fastjsonschema==2.15.3
+fonttools==4.33.3
+geojson==2.5.0
+identify==2.4.11
+idna==3.3
+importlib-metadata==4.11.3
+importlib-resources==5.7.1
+ipykernel==5.5.6
+ipython==8.2.0
+ipython-genutils==0.2.0
+jedi==0.18.1
+Jinja2==3.1.1
+json5==0.9.6
+jsonschema==4.4.0
+jupyter-client==7.3.0
+jupyter-core==4.10.0
+jupyter-server==1.17.0
+jupyterlab==3.3.4
+jupyterlab-pygments==0.2.2
+jupyterlab-server==2.13.0
+kiwisolver==1.4.2
+MarkupSafe==2.1.1
+matplotlib==3.5.2
+matplotlib-inline==0.1.3
+mistune==0.8.4
+nbclassic==0.3.7
+nbclient==0.6.0
+nbconvert==6.5.0
+nbformat==5.3.0
+nest-asyncio==1.5.5
+notebook==6.4.11
+notebook-shim==0.1.0
+numpy==1.22.3
+packaging==21.3
+pandas==1.4.2
+pandocfilters==1.5.0
+papermill==2.3.4
+``` 
+
 ## Create Task
 
 - Click `Project Management-Project Name-Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
diff --git a/docs/docs/zh/guide/task/jupyter.md b/docs/docs/zh/guide/task/jupyter.md
index 1698630aed..59ac4628a9 100644
--- a/docs/docs/zh/guide/task/jupyter.md
+++ b/docs/docs/zh/guide/task/jupyter.md
@@ -45,6 +45,55 @@
 > `Jupyter任务插件`使用`source`命令激活您打包的conda环境。
 > 若您对使用`source`命令有安全性上的担忧,请使用其他方法管理您的python依赖。   
 
+### 由依赖需求文本文件临时构建
+
+1. 在`资源中心`创建或上传`.txt`格式的python依赖需求文本文件。
+2. 将`jupyter任务`中的`condaEnvName`参数设置成您的python依赖需求文本文件,如`requirements.txt`。
+3. 在您`jupyter任务`的`资源`中选取您的python依赖需求文本文件,如`requirements.txt`。
+
+如下是一个依赖需求文本文件的样例,通过该文件,`jupyter任务插件`会自动构建您的python依赖,并执行您的python代码,
+执行完成后会自动释放临时构建的环境。 
+
+```text
+fastjsonschema==2.15.3
+fonttools==4.33.3
+geojson==2.5.0
+identify==2.4.11
+idna==3.3
+importlib-metadata==4.11.3
+importlib-resources==5.7.1
+ipykernel==5.5.6
+ipython==8.2.0
+ipython-genutils==0.2.0
+jedi==0.18.1
+Jinja2==3.1.1
+json5==0.9.6
+jsonschema==4.4.0
+jupyter-client==7.3.0
+jupyter-core==4.10.0
+jupyter-server==1.17.0
+jupyterlab==3.3.4
+jupyterlab-pygments==0.2.2
+jupyterlab-server==2.13.0
+kiwisolver==1.4.2
+MarkupSafe==2.1.1
+matplotlib==3.5.2
+matplotlib-inline==0.1.3
+mistune==0.8.4
+nbclassic==0.3.7
+nbclient==0.6.0
+nbconvert==6.5.0
+nbformat==5.3.0
+nest-asyncio==1.5.5
+notebook==6.4.11
+notebook-shim==0.1.0
+numpy==1.22.3
+packaging==21.3
+pandas==1.4.2
+pandocfilters==1.5.0
+papermill==2.3.4
+``` 
+
 ## 创建任务
 
 - 点击项目管理-项目名称-工作流定义,点击"创建工作流"按钮,进入DAG编辑页面。
diff --git a/dolphinscheduler-spi/src/main/java/org/apache/dolphinscheduler/spi/utils/DateUtils.java b/dolphinscheduler-spi/src/main/java/org/apache/dolphinscheduler/spi/utils/DateUtils.java
index 4f4a8e7a17..695e70a7ba 100644
--- a/dolphinscheduler-spi/src/main/java/org/apache/dolphinscheduler/spi/utils/DateUtils.java
+++ b/dolphinscheduler-spi/src/main/java/org/apache/dolphinscheduler/spi/utils/DateUtils.java
@@ -429,4 +429,12 @@ public class DateUtils {
         }
         return TimeZone.getTimeZone(timezoneId);
     }
+
+    /**
+     * get timestamp in String
+     * PowerMock 2.0.9 fails to mock System.currentTimeMillis(), this method helps in UT
+     */
+    public static String getTimestampString() {
+        return String.valueOf(System.currentTimeMillis());
+    }
 }
diff --git a/dolphinscheduler-task-plugin/dolphinscheduler-task-jupyter/src/main/java/org/apache/dolphinscheduler/plugin/task/jupyter/JupyterConstants.java b/dolphinscheduler-task-plugin/dolphinscheduler-task-jupyter/src/main/java/org/apache/dolphinscheduler/plugin/task/jupyter/JupyterConstants.java
index 8b4069c048..7585311eb3 100644
--- a/dolphinscheduler-task-plugin/dolphinscheduler-task-jupyter/src/main/java/org/apache/dolphinscheduler/plugin/task/jupyter/JupyterConstants.java
+++ b/dolphinscheduler-task-plugin/dolphinscheduler-task-jupyter/src/main/java/org/apache/dolphinscheduler/plugin/task/jupyter/JupyterConstants.java
@@ -23,6 +23,16 @@ public class JupyterConstants {
         throw new IllegalStateException("Utility class");
     }
 
+    /**
+     * execution flag, ignore errors and keep executing till the end
+     */
+    public static final String EXECUTION_FLAG = "set +e";
+
+    /**
+     * new line symbol
+     */
+    public static final String NEW_LINE_SYMBOL = "\n";
+
     /**
      * conda init
      */
@@ -40,11 +50,28 @@ public class JupyterConstants {
             "tar -xzf %s -C jupyter_env && " +
             "source jupyter_env/bin/activate";
 
+    /**
+     * create and activate tmp conda env from txt
+     */
+    public static final String CREATE_ENV_FROM_TXT = "conda create -n jupyter-tmp-env-%s -y && " +
+            "conda activate jupyter-tmp-env-%s && " +
+            "pip install -r %s";
+
+    /**
+     * remove tmp conda env
+     */
+    public static final String REMOVE_ENV = "conda deactivate && conda remove --name jupyter-tmp-env-%s --all -y";
+
     /**
      * file suffix tar.gz
      */
     public static final String TAR_SUFFIX = ".tar.gz";
 
+    /**
+     * file suffix .txt
+     */
+    public static final String TXT_SUFFIX = ".txt";
+
     /**
      * jointer to combine two command
      */
diff --git a/dolphinscheduler-task-plugin/dolphinscheduler-task-jupyter/src/main/java/org/apache/dolphinscheduler/plugin/task/jupyter/JupyterTask.java b/dolphinscheduler-task-plugin/dolphinscheduler-task-jupyter/src/main/java/org/apache/dolphinscheduler/plugin/task/jupyter/JupyterTask.java
index cec72c9601..0ce6052bdd 100644
--- a/dolphinscheduler-task-plugin/dolphinscheduler-task-jupyter/src/main/java/org/apache/dolphinscheduler/plugin/task/jupyter/JupyterTask.java
+++ b/dolphinscheduler-task-plugin/dolphinscheduler-task-jupyter/src/main/java/org/apache/dolphinscheduler/plugin/task/jupyter/JupyterTask.java
@@ -28,6 +28,7 @@ import org.apache.dolphinscheduler.plugin.task.api.parameters.AbstractParameters
 import org.apache.dolphinscheduler.plugin.task.api.parser.ParamUtils;
 import org.apache.dolphinscheduler.plugin.task.api.parser.ParameterUtils;
 import org.apache.dolphinscheduler.plugin.task.api.utils.MapUtils;
+import org.apache.dolphinscheduler.spi.utils.DateUtils;
 import org.apache.dolphinscheduler.spi.utils.JSONUtils;
 import org.apache.dolphinscheduler.spi.utils.PropertyUtils;
 import org.apache.dolphinscheduler.spi.utils.StringUtils;
@@ -104,12 +105,20 @@ public class JupyterTask extends AbstractTaskExecutor {
          */
         List<String> args = new ArrayList<>();
         final String condaPath = PropertyUtils.getString(TaskConstants.CONDA_PATH);
+        final String timestamp = DateUtils.getTimestampString();
+        String condaEnvName = jupyterParameters.getCondaEnvName();
+        if (condaEnvName.endsWith(JupyterConstants.TXT_SUFFIX)) {
+            args.add(JupyterConstants.EXECUTION_FLAG);
+            args.add(JupyterConstants.NEW_LINE_SYMBOL);
+        }
+
         args.add(JupyterConstants.CONDA_INIT);
         args.add(condaPath);
         args.add(JupyterConstants.JOINTER);
-        String condaEnvName = jupyterParameters.getCondaEnvName();
         if (condaEnvName.endsWith(JupyterConstants.TAR_SUFFIX)) {
             args.add(String.format(JupyterConstants.CREATE_ENV_FROM_TAR, condaEnvName));
+        } else if (condaEnvName.endsWith(JupyterConstants.TXT_SUFFIX)) {
+            args.add(String.format(JupyterConstants.CREATE_ENV_FROM_TXT, timestamp, timestamp, condaEnvName));
         } else {
             args.add(JupyterConstants.CONDA_ACTIVATE);
             args.add(jupyterParameters.getCondaEnvName());
@@ -126,10 +135,17 @@ public class JupyterTask extends AbstractTaskExecutor {
         // populate jupyter options
         args.addAll(populateJupyterOptions());
 
+        // remove tmp conda env, if created from requirements.txt
+        if (condaEnvName.endsWith(JupyterConstants.TXT_SUFFIX)) {
+            args.add(JupyterConstants.NEW_LINE_SYMBOL);
+            args.add(String.format(JupyterConstants.REMOVE_ENV, timestamp));
+        }
+
         // replace placeholder, and combining local and global parameters
         Map<String, Property> paramsMap = taskExecutionContext.getPrepareParamsMap();
 
-        String command = ParameterUtils.convertParameterPlaceholders(String.join(" ", args), ParamUtils.convert(paramsMap));
+        String command = ParameterUtils
+                .convertParameterPlaceholders(String.join(" ", args), ParamUtils.convert(paramsMap));
 
         logger.info("jupyter task command: {}", command);
 
diff --git a/dolphinscheduler-task-plugin/dolphinscheduler-task-jupyter/src/test/java/org/apache/dolphinscheduler/plugin/task/jupyter/JupyterTaskTest.java b/dolphinscheduler-task-plugin/dolphinscheduler-task-jupyter/src/test/java/org/apache/dolphinscheduler/plugin/task/jupyter/JupyterTaskTest.java
index c55aa4935a..3007a4ee25 100644
--- a/dolphinscheduler-task-plugin/dolphinscheduler-task-jupyter/src/test/java/org/apache/dolphinscheduler/plugin/task/jupyter/JupyterTaskTest.java
+++ b/dolphinscheduler-task-plugin/dolphinscheduler-task-jupyter/src/test/java/org/apache/dolphinscheduler/plugin/task/jupyter/JupyterTaskTest.java
@@ -19,8 +19,8 @@ package org.apache.dolphinscheduler.plugin.task.jupyter;
 
 
 import org.apache.dolphinscheduler.plugin.task.api.TaskExecutionContext;
+import org.apache.dolphinscheduler.spi.utils.DateUtils;
 import org.apache.dolphinscheduler.spi.utils.JSONUtils;
-
 import org.apache.dolphinscheduler.spi.utils.PropertyUtils;
 import org.junit.Assert;
 import org.junit.Test;
@@ -30,16 +30,15 @@ import org.powermock.core.classloader.annotations.PowerMockIgnore;
 import org.powermock.core.classloader.annotations.PrepareForTest;
 import org.powermock.core.classloader.annotations.SuppressStaticInitializationFor;
 import org.powermock.modules.junit4.PowerMockRunner;
-import org.apache.dolphinscheduler.plugin.task.api.TaskConstants;
-
 import static org.mockito.ArgumentMatchers.any;
 import static org.powermock.api.mockito.PowerMockito.spy;
 import static org.powermock.api.mockito.PowerMockito.when;
 
 @RunWith(PowerMockRunner.class)
 @PrepareForTest({
-    JSONUtils.class,
-    PropertyUtils.class,
+        JSONUtils.class,
+        PropertyUtils.class,
+        DateUtils.class
 })
 @PowerMockIgnore({"javax.*"})
 @SuppressStaticInitializationFor("org.apache.dolphinscheduler.spi.utils.PropertyUtils")
@@ -99,6 +98,39 @@ public class JupyterTaskTest {
                         "--progress-bar");
     }
 
+    @Test
+    public void testBuildJupyterCommandWithRequirements() throws Exception {
+        String parameters = buildJupyterCommandWithRequirements();
+        TaskExecutionContext taskExecutionContext = PowerMockito.mock(TaskExecutionContext.class);
+        when(taskExecutionContext.getTaskParams()).thenReturn(parameters);
+        PowerMockito.mockStatic(PropertyUtils.class);
+        when(PropertyUtils.getString(any())).thenReturn("/opt/anaconda3/etc/profile.d/conda.sh");
+        PowerMockito.mockStatic(DateUtils.class);
+        when(DateUtils.getTimestampString()).thenReturn("123456789");
+        JupyterTask jupyterTask = spy(new JupyterTask(taskExecutionContext));
+        jupyterTask.init();
+        Assert.assertEquals(jupyterTask.buildCommand(),
+                "set +e \n " +
+                        "source /opt/anaconda3/etc/profile.d/conda.sh && " +
+                        "conda create -n jupyter-tmp-env-123456789 -y && " +
+                        "conda activate jupyter-tmp-env-123456789 && " +
+                        "pip install -r requirements.txt && " +
+                        "papermill " +
+                        "/test/input_note.ipynb " +
+                        "/test/output_note.ipynb " +
+                        "--parameters city Shanghai " +
+                        "--parameters factor 0.01 " +
+                        "--kernel python3 " +
+                        "--engine default_engine " +
+                        "--execution-timeout 10 " +
+                        "--start-timeout 3 " +
+                        "--version " +
+                        "--inject-paths " +
+                        "--progress-bar \n " +
+                        "conda deactivate && conda remove --name jupyter-tmp-env-123456789 --all -y"
+                );
+    }
+
     private String buildJupyterCommandWithLocalEnv() {
         JupyterParameters jupyterParameters = new JupyterParameters();
         jupyterParameters.setCondaEnvName("jupyter-lab");
@@ -127,4 +159,18 @@ public class JupyterTaskTest {
         return JSONUtils.toJsonString(jupyterParameters);
     }
 
+    private String buildJupyterCommandWithRequirements() {
+        JupyterParameters jupyterParameters = new JupyterParameters();
+        jupyterParameters.setCondaEnvName("requirements.txt");
+        jupyterParameters.setInputNotePath("/test/input_note.ipynb");
+        jupyterParameters.setOutputNotePath("/test/output_note.ipynb");
+        jupyterParameters.setParameters("{\"city\": \"Shanghai\", \"factor\": \"0.01\"}");
+        jupyterParameters.setKernel("python3");
+        jupyterParameters.setEngine("default_engine");
+        jupyterParameters.setExecutionTimeout("10");
+        jupyterParameters.setStartTimeout("3");
+        jupyterParameters.setOthers("--version");
+        return JSONUtils.toJsonString(jupyterParameters);
+    }
+
 }
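
For readers who want to follow the command assembly outside the plugin, here is a minimal, standalone sketch of the logic the JupyterTask.java hunk adds. It is not the plugin's code. The class name `JupyterCommandSketch`, the method signature, and the `papermillCall` parameter (standing in for the papermill arguments the real task builds) are hypothetical. The values of `CONDA_INIT`, `JOINTER` and `CONDA_ACTIVATE` do not appear in this diff and are assumptions inferred from the expected string in JupyterTaskTest. The `.tar.gz` branch is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the plugin's actual class. */
class JupyterCommandSketch {

    // Values copied from the JupyterConstants hunk in this commit.
    static final String EXECUTION_FLAG = "set +e";
    static final String NEW_LINE_SYMBOL = "\n";
    static final String TXT_SUFFIX = ".txt";
    static final String CREATE_ENV_FROM_TXT = "conda create -n jupyter-tmp-env-%s -y && "
            + "conda activate jupyter-tmp-env-%s && "
            + "pip install -r %s";
    static final String REMOVE_ENV =
            "conda deactivate && conda remove --name jupyter-tmp-env-%s --all -y";

    // Assumed values: not shown in the diff, inferred from the test's expected command.
    static final String CONDA_INIT = "source";
    static final String JOINTER = "&&";
    static final String CONDA_ACTIVATE = "conda activate";

    /** Joins the command fragments with single spaces, as the patched method does. */
    static String buildCommand(String condaPath, String condaEnvName,
                               String timestamp, String papermillCall) {
        boolean fromRequirements = condaEnvName.endsWith(TXT_SUFFIX);
        List<String> args = new ArrayList<>();

        // "set +e" on its own line keeps the script running after a failure,
        // so the cleanup line at the end is still reached.
        if (fromRequirements) {
            args.add(EXECUTION_FLAG);
            args.add(NEW_LINE_SYMBOL);
        }

        args.add(CONDA_INIT);
        args.add(condaPath);
        args.add(JOINTER);

        if (fromRequirements) {
            // Temporary env named after the timestamp, then pip install into it.
            args.add(String.format(CREATE_ENV_FROM_TXT, timestamp, timestamp, condaEnvName));
        } else {
            // Pre-existing local conda env.
            args.add(CONDA_ACTIVATE);
            args.add(condaEnvName);
        }

        args.add(JOINTER);
        args.add(papermillCall);

        // Tear down the temporary env on a separate line.
        if (fromRequirements) {
            args.add(NEW_LINE_SYMBOL);
            args.add(String.format(REMOVE_ENV, timestamp));
        }
        return String.join(" ", args);
    }

    public static void main(String[] unused) {
        System.out.println(buildCommand("/opt/anaconda3/etc/profile.d/conda.sh",
                "requirements.txt", "123456789", "papermill in.ipynb out.ipynb"));
    }
}
```

Because the fragments are joined with a single space, the newline symbol ends up surrounded by spaces, which is why the test above expects `"set +e \n "` and `"--progress-bar \n "`. Passing the timestamp in as an argument plays the same role here that mocking `DateUtils.getTimestampString()` plays in the unit test: the environment name becomes deterministic.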