Posted to commits@flink.apache.org by di...@apache.org on 2021/05/01 13:58:58 UTC
[flink] branch release-1.13 updated: [FLINK-22544][python][docs]
Add the missing documentation about the command line options for PyFlink
This is an automated email from the ASF dual-hosted git repository.
dianfu pushed a commit to branch release-1.13
in repository https://gitbox.apache.org/repos/asf/flink.git
The following commit(s) were added to refs/heads/release-1.13 by this push:
new e59e9ec [FLINK-22544][python][docs] Add the missing documentation about the command line options for PyFlink
e59e9ec is described below
commit e59e9ec180e478f75d92350f0e7c6f8b68b1174f
Author: Dian Fu <di...@apache.org>
AuthorDate: Sat May 1 21:22:18 2021 +0800
[FLINK-22544][python][docs] Add the missing documentation about the command line options for PyFlink
---
docs/content.zh/docs/deployment/cli.md | 79 ++++++++++++++++++++++
docs/content/docs/deployment/cli.md | 79 ++++++++++++++++++++++
.../apache/flink/client/cli/CliFrontendParser.java | 6 +-
3 files changed, 161 insertions(+), 3 deletions(-)
diff --git a/docs/content.zh/docs/deployment/cli.md b/docs/content.zh/docs/deployment/cli.md
index ae1f399..13d8bb60 100644
--- a/docs/content.zh/docs/deployment/cli.md
+++ b/docs/content.zh/docs/deployment/cli.md
@@ -421,4 +421,83 @@ $ ./bin/flink run-application \
To learn more available options, please refer to [Kubernetes]({{< ref "docs/deployment/resource-providers/native_kubernetes" >}})
or [YARN]({{< ref "docs/deployment/resource-providers/yarn" >}}) which are described in more detail in the
Resource Provider section.
+
+Besides `--pyFiles`, `--pyModule` and `--python` mentioned above, there are a few other
+Python-related options. Here's an overview of all the Python-related options supported by the
+`run` and `run-application` actions of Flink's CLI tool:
+<table class="table table-bordered">
+ <thead>
+ <tr>
+ <th class="text-left" style="width: 25%">Option</th>
+ <th class="text-left" style="width: 50%">Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><code class="highlighter-rouge">-py,--python</code></td>
+ <td>
+ Python script with the program entry point. The dependent resources can be configured
+ with the <code class="highlighter-rouge">--pyFiles</code> option.
+ </td>
+ </tr>
+ <tr>
+ <td><code class="highlighter-rouge">-pym,--pyModule</code></td>
+ <td>
+ Python module with the program entry point.
+ This option must be used in conjunction with <code class="highlighter-rouge">--pyFiles</code>.
+ </td>
+ </tr>
+ <tr>
+ <td><code class="highlighter-rouge">-pyfs,--pyFiles</code></td>
+ <td>
+ Attach custom files for the job. The standard resource file suffixes such as .py, .egg, .zip and .whl are all supported, as are directories.
+ These files will be added to the PYTHONPATH of both the local client and the remote Python UDF worker.
+ Files suffixed with .zip will be extracted and added to the PYTHONPATH.
+ A comma (',') can be used as the separator to specify multiple files
+ (e.g., --pyFiles file:///tmp/myresource.zip,hdfs:///$namenode_address/myresource2.zip).
+ </td>
+ </tr>
+ <tr>
+ <td><code class="highlighter-rouge">-pyarch,--pyArchives</code></td>
+ <td>
+ Add Python archive files for the job. The archive files will be extracted to the working directory
+ of the Python UDF worker. Currently, only the zip format is supported. For each archive file, a target directory
+ can be specified. If the target directory name is specified, the archive file will be extracted to a
+ directory with the specified name. Otherwise, the archive file will be extracted to a
+ directory with the same name as the archive file. The files uploaded via this option are accessible
+ via relative paths. '#' can be used as the separator between the archive file path and the target directory
+ name. A comma (',') can be used as the separator to specify multiple archive files.
+ This option can be used to upload a virtual environment or the data files used in a Python UDF
+ (e.g., --pyArchives file:///tmp/py37.zip,file:///tmp/data.zip#data --pyExecutable
+ py37.zip/py37/bin/python). The data files can then be accessed in the Python UDF, e.g.:
+ f = open('data/data.txt', 'r').
+ </td>
+ </tr>
+ <tr>
+ <td><code class="highlighter-rouge">-pyexec,--pyExecutable</code></td>
+ <td>
+ Specify the path of the Python interpreter used to execute the Python UDF worker
+ (e.g., --pyExecutable /usr/local/bin/python3).
+ The Python UDF worker depends on Python 3.6+, Apache Beam (version == 2.27.0),
+ Pip (version >= 7.1.0) and SetupTools (version >= 37.0.0).
+ Please ensure that the specified environment meets these requirements.
+ </td>
+ </tr>
+ <tr>
+ <td><code class="highlighter-rouge">-pyreq,--pyRequirements</code></td>
+ <td>
+ Specify a requirements.txt file that defines the third-party dependencies.
+ These dependencies will be installed and added to the PYTHONPATH of the Python UDF worker.
+ A directory containing the installation packages of these dependencies can optionally be specified.
+ Use '#' as the separator if the optional parameter exists
+ (e.g., --pyRequirements file:///tmp/requirements.txt#file:///tmp/cached_dir).
+ </td>
+ </tr>
+ </tbody>
+</table>
+
+In addition to the command line options used when submitting the job, dependencies can also be
+specified via configuration or via the Python API inside the code. Please refer to the
+[dependency management]({{< ref "docs/dev/python/dependency_management" >}}) documentation for more details.
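+The '#' separator convention described above for `--pyArchives` can be illustrated with a small
+sketch. Note that `split_archive_spec` is a hypothetical helper written for this illustration,
+not Flink's actual parsing code:
+
+```python
+# Hypothetical sketch of the '#' separator convention used by
+# --pyArchives; this is not Flink's actual parser.
+import os
+
+def split_archive_spec(spec):
+    """Split 'path#target' into (path, target directory name).
+
+    If no '#' is present, the target directory defaults to the
+    archive file's own base name, as described above.
+    """
+    if "#" in spec:
+        path, target = spec.split("#", 1)
+    else:
+        path = spec
+        target = os.path.basename(path)
+    return path, target
+
+# Multiple archives are separated by commas:
+specs = "file:///tmp/py37.zip,file:///tmp/data.zip#data"
+parsed = [split_archive_spec(s) for s in specs.split(",")]
+# → [('file:///tmp/py37.zip', 'py37.zip'), ('file:///tmp/data.zip', 'data')]
+```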
+
{{< top >}}
diff --git a/docs/content/docs/deployment/cli.md b/docs/content/docs/deployment/cli.md
index 9e62836..9761288 100644
--- a/docs/content/docs/deployment/cli.md
+++ b/docs/content/docs/deployment/cli.md
@@ -419,4 +419,83 @@ $ ./bin/flink run-application \
To learn more available options, please refer to [Kubernetes]({{< ref "docs/deployment/resource-providers/native_kubernetes" >}})
or [YARN]({{< ref "docs/deployment/resource-providers/yarn" >}}) which are described in more detail in the
Resource Provider section.
+
+Besides `--pyFiles`, `--pyModule` and `--python` mentioned above, there are a few other
+Python-related options. Here's an overview of all the Python-related options supported by the
+`run` and `run-application` actions of Flink's CLI tool:
+<table class="table table-bordered">
+ <thead>
+ <tr>
+ <th class="text-left" style="width: 25%">Option</th>
+ <th class="text-left" style="width: 50%">Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><code class="highlighter-rouge">-py,--python</code></td>
+ <td>
+ Python script with the program entry point. The dependent resources can be configured
+ with the <code class="highlighter-rouge">--pyFiles</code> option.
+ </td>
+ </tr>
+ <tr>
+ <td><code class="highlighter-rouge">-pym,--pyModule</code></td>
+ <td>
+ Python module with the program entry point.
+ This option must be used in conjunction with <code class="highlighter-rouge">--pyFiles</code>.
+ </td>
+ </tr>
+ <tr>
+ <td><code class="highlighter-rouge">-pyfs,--pyFiles</code></td>
+ <td>
+ Attach custom files for the job. The standard resource file suffixes such as .py, .egg, .zip and .whl are all supported, as are directories.
+ These files will be added to the PYTHONPATH of both the local client and the remote Python UDF worker.
+ Files suffixed with .zip will be extracted and added to the PYTHONPATH.
+ A comma (',') can be used as the separator to specify multiple files
+ (e.g., --pyFiles file:///tmp/myresource.zip,hdfs:///$namenode_address/myresource2.zip).
+ </td>
+ </tr>
+ <tr>
+ <td><code class="highlighter-rouge">-pyarch,--pyArchives</code></td>
+ <td>
+ Add Python archive files for the job. The archive files will be extracted to the working directory
+ of the Python UDF worker. Currently, only the zip format is supported. For each archive file, a target directory
+ can be specified. If the target directory name is specified, the archive file will be extracted to a
+ directory with the specified name. Otherwise, the archive file will be extracted to a
+ directory with the same name as the archive file. The files uploaded via this option are accessible
+ via relative paths. '#' can be used as the separator between the archive file path and the target directory
+ name. A comma (',') can be used as the separator to specify multiple archive files.
+ This option can be used to upload a virtual environment or the data files used in a Python UDF
+ (e.g., --pyArchives file:///tmp/py37.zip,file:///tmp/data.zip#data --pyExecutable
+ py37.zip/py37/bin/python). The data files can then be accessed in the Python UDF, e.g.:
+ f = open('data/data.txt', 'r').
+ </td>
+ </tr>
+ <tr>
+ <td><code class="highlighter-rouge">-pyexec,--pyExecutable</code></td>
+ <td>
+ Specify the path of the Python interpreter used to execute the Python UDF worker
+ (e.g., --pyExecutable /usr/local/bin/python3).
+ The Python UDF worker depends on Python 3.6+, Apache Beam (version == 2.27.0),
+ Pip (version >= 7.1.0) and SetupTools (version >= 37.0.0).
+ Please ensure that the specified environment meets these requirements.
+ </td>
+ </tr>
+ <tr>
+ <td><code class="highlighter-rouge">-pyreq,--pyRequirements</code></td>
+ <td>
+ Specify a requirements.txt file that defines the third-party dependencies.
+ These dependencies will be installed and added to the PYTHONPATH of the Python UDF worker.
+ A directory containing the installation packages of these dependencies can optionally be specified.
+ Use '#' as the separator if the optional parameter exists
+ (e.g., --pyRequirements file:///tmp/requirements.txt#file:///tmp/cached_dir).
+ </td>
+ </tr>
+ </tbody>
+</table>
+
+In addition to the command line options used when submitting the job, dependencies can also be
+specified via configuration or via the Python API inside the code. Please refer to the
+[dependency management]({{< ref "docs/dev/python/dependency_management" >}}) documentation for more details.
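+The relative-path access described above for `--pyArchives` (e.g. `f = open('data/data.txt', 'r')`)
+can be sketched as follows. The directory layout is simulated locally here rather than produced by
+an actual Flink submission:
+
+```python
+# Simulates the working-directory layout a Python UDF worker would see
+# after Flink extracts an archive passed as --pyArchives .../data.zip#data.
+# This is a local illustration, not an actual Flink job.
+import os
+import tempfile
+
+workdir = tempfile.mkdtemp()
+os.makedirs(os.path.join(workdir, "data"), exist_ok=True)
+with open(os.path.join(workdir, "data", "data.txt"), "w") as f:
+    f.write("hello from the archive")
+
+# The UDF worker runs inside its working directory, so the extracted
+# archive contents are reachable by relative path:
+os.chdir(workdir)
+with open("data/data.txt", "r") as f:
+    content = f.read()
+```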
+
{{< top >}}
diff --git a/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontendParser.java b/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontendParser.java
index dbc51e8..de41b04 100644
--- a/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontendParser.java
+++ b/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontendParser.java
@@ -195,7 +195,7 @@ public class CliFrontendParser {
+ "These files will be added to the PYTHONPATH of both the local client and the remote python UDF worker. "
+ "Files suffixed with .zip will be extracted and added to PYTHONPATH. "
+ "Comma (',') could be used as the separator to specify multiple files "
- + "(e.g.: --pyFiles file:///tmp/myresource.zip,hdfs:///$namenode_address/myresource2.zip).");
+ + "(e.g., --pyFiles file:///tmp/myresource.zip,hdfs:///$namenode_address/myresource2.zip).");
public static final Option PYMODULE_OPTION =
new Option(
@@ -214,7 +214,7 @@ public class CliFrontendParser {
+ "These dependencies will be installed and added to the PYTHONPATH of the python UDF worker. "
+ "A directory which contains the installation packages of these dependencies could be specified "
+ "optionally. Use '#' as the separator if the optional parameter exists "
- + "(e.g.: --pyRequirements file:///tmp/requirements.txt#file:///tmp/cached_dir).");
+ + "(e.g., --pyRequirements file:///tmp/requirements.txt#file:///tmp/cached_dir).");
public static final Option PYARCHIVE_OPTION =
new Option(
@@ -229,7 +229,7 @@ public class CliFrontendParser {
+ "via relative path. '#' could be used as the separator of the archive file path and the target directory "
+ "name. Comma (',') could be used as the separator to specify multiple archive files. "
+ "This option can be used to upload the virtual environment, the data files used in Python UDF "
- + "(e.g.: --pyArchives file:///tmp/py37.zip,file:///tmp/data.zip#data --pyExecutable "
+ + "(e.g., --pyArchives file:///tmp/py37.zip,file:///tmp/data.zip#data --pyExecutable "
+ "py37.zip/py37/bin/python). The data files could be accessed in Python UDF, e.g.: "
+ "f = open('data/data.txt', 'r').");