Posted to commits@flink.apache.org by di...@apache.org on 2021/05/01 13:58:58 UTC

[flink] branch release-1.13 updated: [FLINK-22544][python][docs] Add the missing documentation about the command line options for PyFlink

This is an automated email from the ASF dual-hosted git repository.

dianfu pushed a commit to branch release-1.13
in repository https://gitbox.apache.org/repos/asf/flink.git


The following commit(s) were added to refs/heads/release-1.13 by this push:
     new e59e9ec  [FLINK-22544][python][docs] Add the missing documentation about the command line options for PyFlink
e59e9ec is described below

commit e59e9ec180e478f75d92350f0e7c6f8b68b1174f
Author: Dian Fu <di...@apache.org>
AuthorDate: Sat May 1 21:22:18 2021 +0800

    [FLINK-22544][python][docs] Add the missing documentation about the command line options for PyFlink
---
 docs/content.zh/docs/deployment/cli.md             | 79 ++++++++++++++++++++++
 docs/content/docs/deployment/cli.md                | 79 ++++++++++++++++++++++
 .../apache/flink/client/cli/CliFrontendParser.java |  6 +-
 3 files changed, 161 insertions(+), 3 deletions(-)

diff --git a/docs/content.zh/docs/deployment/cli.md b/docs/content.zh/docs/deployment/cli.md
index ae1f399..13d8bb60 100644
--- a/docs/content.zh/docs/deployment/cli.md
+++ b/docs/content.zh/docs/deployment/cli.md
@@ -421,4 +421,83 @@ $ ./bin/flink run-application \
 To learn more available options, please refer to [Kubernetes]({{< ref "docs/deployment/resource-providers/native_kubernetes" >}})
 or [YARN]({{< ref "docs/deployment/resource-providers/yarn" >}}) which are described in more detail in the
 Resource Provider section.
+
+Besides `--pyFiles`, `--pyModule` and `--python` mentioned above, there are also other
+Python-related options. Here's an overview of all the Python-related options for the actions
+`run` and `run-application` supported by Flink's CLI tool:
+<table class="table table-bordered">
+    <thead>
+        <tr>
+          <th class="text-left" style="width: 25%">Option</th>
+          <th class="text-left" style="width: 50%">Description</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td><code class="highlighter-rouge">-py,--python</code></td>
+            <td>
+                Python script with the program entry point. The dependent resources can be configured
+                with the <code class="highlighter-rouge">--pyFiles</code> option.
+            </td>
+        </tr>
+        <tr>
+            <td><code class="highlighter-rouge">-pym,--pyModule</code></td>
+            <td>
+                Python module with the program entry point.
+                This option must be used in conjunction with <code class="highlighter-rouge">--pyFiles</code>.
+            </td>
+        </tr>
+        <tr>
+            <td><code class="highlighter-rouge">-pyfs,--pyFiles</code></td>
+            <td>
+                Attach custom files for the job. Standard resource file suffixes such as .py/.egg/.zip/.whl are supported, as are directories.
+                These files will be added to the PYTHONPATH of both the local client and the remote python UDF worker.
+                Files suffixed with .zip will be extracted and added to PYTHONPATH.
+                Comma (',') can be used as the separator to specify multiple files
+                (e.g., --pyFiles file:///tmp/myresource.zip,hdfs:///$namenode_address/myresource2.zip).
+            </td>
+        </tr>
+        <tr>
+            <td><code class="highlighter-rouge">-pyarch,--pyArchives</code></td>
+            <td>
+                Add python archive files for the job. The archive files will be extracted to the working directory
+                of the python UDF worker. Currently only the zip format is supported. For each archive file, a
+                target directory can optionally be specified. If the target directory name is specified, the
+                archive file will be extracted to a directory with the specified name. Otherwise, the archive file
+                will be extracted to a directory with the same name as the archive file. The files uploaded via
+                this option are accessible via relative paths. '#' can be used as the separator between the archive
+                file path and the target directory name. Comma (',') can be used as the separator to specify
+                multiple archive files. This option can be used to upload the virtual environment and the data
+                files used in Python UDFs (e.g., --pyArchives file:///tmp/py37.zip,file:///tmp/data.zip#data --pyExecutable
+                py37.zip/py37/bin/python). The data files can be accessed in Python UDFs, e.g.:
+                f = open('data/data.txt', 'r').
+            </td>
+        </tr>
+        <tr>
+            <td><code class="highlighter-rouge">-pyexec,--pyExecutable</code></td>
+            <td>
+                Specify the path of the python interpreter used to execute the python UDF worker
+                (e.g., --pyExecutable /usr/local/bin/python3).
+                The python UDF worker requires Python 3.6+, Apache Beam (version == 2.27.0),
+                Pip (version >= 7.1.0) and SetupTools (version >= 37.0.0).
+                Please ensure that the specified environment meets these requirements.
+            </td>
+        </tr>
+        <tr>
+            <td><code class="highlighter-rouge">-pyreq,--pyRequirements</code></td>
+            <td>
+                Specify the requirements.txt file that defines the third-party dependencies.
+                These dependencies will be installed and added to the PYTHONPATH of the python UDF worker.
+                A directory containing the installation packages of these dependencies can
+                optionally be specified. Use '#' as the separator if the optional parameter exists
+                (e.g., --pyRequirements file:///tmp/requirements.txt#file:///tmp/cached_dir).
+            </td>
+        </tr>
+    </tbody>
+</table>
+
+In addition to the command line options used when submitting the job, the dependencies can also be
+specified via configuration or via the Python API inside the code. Please refer to
+[dependency management]({{< ref "docs/dev/python/dependency_management" >}}) for more details.
+
 {{< top >}}
diff --git a/docs/content/docs/deployment/cli.md b/docs/content/docs/deployment/cli.md
index 9e62836..9761288 100644
--- a/docs/content/docs/deployment/cli.md
+++ b/docs/content/docs/deployment/cli.md
@@ -419,4 +419,83 @@ $ ./bin/flink run-application \
 To learn more available options, please refer to [Kubernetes]({{< ref "docs/deployment/resource-providers/native_kubernetes" >}})
 or [YARN]({{< ref "docs/deployment/resource-providers/yarn" >}}) which are described in more detail in the
 Resource Provider section.
+
+Besides `--pyFiles`, `--pyModule` and `--python` mentioned above, there are also other
+Python-related options. Here's an overview of all the Python-related options for the actions
+`run` and `run-application` supported by Flink's CLI tool:
+<table class="table table-bordered">
+    <thead>
+        <tr>
+          <th class="text-left" style="width: 25%">Option</th>
+          <th class="text-left" style="width: 50%">Description</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td><code class="highlighter-rouge">-py,--python</code></td>
+            <td>
+                Python script with the program entry point. The dependent resources can be configured
+                with the <code class="highlighter-rouge">--pyFiles</code> option.
+            </td>
+        </tr>
+        <tr>
+            <td><code class="highlighter-rouge">-pym,--pyModule</code></td>
+            <td>
+                Python module with the program entry point.
+                This option must be used in conjunction with <code class="highlighter-rouge">--pyFiles</code>.
+            </td>
+        </tr>
+        <tr>
+            <td><code class="highlighter-rouge">-pyfs,--pyFiles</code></td>
+            <td>
+                Attach custom files for the job. Standard resource file suffixes such as .py/.egg/.zip/.whl are supported, as are directories.
+                These files will be added to the PYTHONPATH of both the local client and the remote python UDF worker.
+                Files suffixed with .zip will be extracted and added to PYTHONPATH.
+                Comma (',') can be used as the separator to specify multiple files
+                (e.g., --pyFiles file:///tmp/myresource.zip,hdfs:///$namenode_address/myresource2.zip).
+            </td>
+        </tr>
+        <tr>
+            <td><code class="highlighter-rouge">-pyarch,--pyArchives</code></td>
+            <td>
+                Add python archive files for the job. The archive files will be extracted to the working directory
+                of the python UDF worker. Currently only the zip format is supported. For each archive file, a
+                target directory can optionally be specified. If the target directory name is specified, the
+                archive file will be extracted to a directory with the specified name. Otherwise, the archive file
+                will be extracted to a directory with the same name as the archive file. The files uploaded via
+                this option are accessible via relative paths. '#' can be used as the separator between the archive
+                file path and the target directory name. Comma (',') can be used as the separator to specify
+                multiple archive files. This option can be used to upload the virtual environment and the data
+                files used in Python UDFs (e.g., --pyArchives file:///tmp/py37.zip,file:///tmp/data.zip#data --pyExecutable
+                py37.zip/py37/bin/python). The data files can be accessed in Python UDFs, e.g.:
+                f = open('data/data.txt', 'r').
+            </td>
+        </tr>
+        <tr>
+            <td><code class="highlighter-rouge">-pyexec,--pyExecutable</code></td>
+            <td>
+                Specify the path of the python interpreter used to execute the python UDF worker
+                (e.g., --pyExecutable /usr/local/bin/python3).
+                The python UDF worker requires Python 3.6+, Apache Beam (version == 2.27.0),
+                Pip (version >= 7.1.0) and SetupTools (version >= 37.0.0).
+                Please ensure that the specified environment meets these requirements.
+            </td>
+        </tr>
+        <tr>
+            <td><code class="highlighter-rouge">-pyreq,--pyRequirements</code></td>
+            <td>
+                Specify the requirements.txt file that defines the third-party dependencies.
+                These dependencies will be installed and added to the PYTHONPATH of the python UDF worker.
+                A directory containing the installation packages of these dependencies can
+                optionally be specified. Use '#' as the separator if the optional parameter exists
+                (e.g., --pyRequirements file:///tmp/requirements.txt#file:///tmp/cached_dir).
+            </td>
+        </tr>
+    </tbody>
+</table>
+
+In addition to the command line options used when submitting the job, the dependencies can also be
+specified via configuration or via the Python API inside the code. Please refer to
+[dependency management]({{< ref "docs/dev/python/dependency_management" >}}) for more details.
+
 {{< top >}}
diff --git a/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontendParser.java b/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontendParser.java
index dbc51e8..de41b04 100644
--- a/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontendParser.java
+++ b/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontendParser.java
@@ -195,7 +195,7 @@ public class CliFrontendParser {
                             + "These files will be added to the PYTHONPATH of both the local client and the remote python UDF worker. "
                             + "Files suffixed with .zip will be extracted and added to PYTHONPATH. "
                             + "Comma (',') could be used as the separator to specify multiple files "
-                            + "(e.g.: --pyFiles file:///tmp/myresource.zip,hdfs:///$namenode_address/myresource2.zip).");
+                            + "(e.g., --pyFiles file:///tmp/myresource.zip,hdfs:///$namenode_address/myresource2.zip).");
 
     public static final Option PYMODULE_OPTION =
             new Option(
@@ -214,7 +214,7 @@ public class CliFrontendParser {
                             + "These dependencies will be installed and added to the PYTHONPATH of the python UDF worker. "
                             + "A directory which contains the installation packages of these dependencies could be specified "
                             + "optionally. Use '#' as the separator if the optional parameter exists "
-                            + "(e.g.: --pyRequirements file:///tmp/requirements.txt#file:///tmp/cached_dir).");
+                            + "(e.g., --pyRequirements file:///tmp/requirements.txt#file:///tmp/cached_dir).");
 
     public static final Option PYARCHIVE_OPTION =
             new Option(
@@ -229,7 +229,7 @@ public class CliFrontendParser {
                             + "via relative path. '#' could be used as the separator of the archive file path and the target directory "
                             + "name. Comma (',') could be used as the separator to specify multiple archive files. "
                             + "This option can be used to upload the virtual environment, the data files used in Python UDF "
-                            + "(e.g.: --pyArchives file:///tmp/py37.zip,file:///tmp/data.zip#data --pyExecutable "
+                            + "(e.g., --pyArchives file:///tmp/py37.zip,file:///tmp/data.zip#data --pyExecutable "
                             + "py37.zip/py37/bin/python). The data files could be accessed in Python UDF, e.g.: "
                             + "f = open('data/data.txt', 'r').");
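The separator conventions the added documentation describes — ',' between multiple entries, '#' between an archive or requirements path and its optional target — can be illustrated with a short Python sketch. This is not Flink's actual implementation; the helper name is hypothetical and exists only to make the documented value format concrete:

```python
# Illustrative parser for option values such as those passed to
# --pyArchives or --pyRequirements. Per the docs above, ',' separates
# multiple entries and '#' separates a path from its optional target
# (a target directory name, or a cached-packages directory).

def parse_python_dependency_option(value):
    """Return a list of (path, target_or_None) pairs."""
    entries = []
    for entry in value.split(','):
        # partition keeps everything after the first '#' as the target
        path, sep, target = entry.partition('#')
        entries.append((path, target if sep else None))
    return entries

# Example from the --pyArchives documentation: the first archive has no
# target directory, the second is extracted into a directory named 'data'.
archives = parse_python_dependency_option(
    "file:///tmp/py37.zip,file:///tmp/data.zip#data")
print(archives)
# → [('file:///tmp/py37.zip', None), ('file:///tmp/data.zip', 'data')]
```

The same shape covers `--pyRequirements file:///tmp/requirements.txt#file:///tmp/cached_dir`, where the part after '#' is the optional directory of cached installation packages.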