Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/12/17 02:42:41 UTC

[GitHub] [flink] dianfu commented on a change in pull request #10597: [FLINK-15270][python][docs] Add documentation about how to specify third-party dependencies via API for Python UDFs

dianfu commented on a change in pull request #10597: [FLINK-15270][python][docs] Add documentation about how to specify third-party dependencies via API for Python UDFs
URL: https://github.com/apache/flink/pull/10597#discussion_r358572060
 
 

 ##########
 File path: docs/dev/table/functions/udfs.md
 ##########
 @@ -211,6 +211,76 @@ table_env.register_function("add", add)
 # use the function in Python Table API
 my_table.select("add(a, b)")
 {% endhighlight %}
+
+If the Python scalar function depends on third-party dependencies, you can specify them with the following Table API methods or directly on the <a href="{{ site.baseurl }}/ops/cli.html#usage">command line</a> when submitting the job.
+
+<table class="table table-bordered">
+  <thead>
+    <tr>
+      <th class="text-left" style="width: 20%">Dependencies</th>
+      <th class="text-left">Description</th>
+    </tr>
+  </thead>
+
+  <tbody>
+    <tr>
+      <td>files</td>
+      <td>
+        <p>Adds python file dependencies which could be python files, python packages or local directories. They will be added to the PYTHONPATH of the python UDF worker.</p>
+{% highlight python %}
+table_env.add_python_file(file_path)
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td>requirements</td>
+      <td>
+        <p>Specifies a requirements.txt file which defines the third-party dependencies. These dependencies will be installed to a temporary directory and added to the PYTHONPATH of the Python UDF worker. For dependencies that cannot be accessed from the cluster, a directory containing their installation packages can be specified using the parameter "requirements_cached_dir". It will be uploaded to the cluster to support offline installation.</p>
+{% highlight python %}
+# commands executed in shell
+echo numpy==1.16.5 > requirements.txt
+pip download -d cached_dir -r requirements.txt --no-binary :all:
+
+# python code
+table_env.set_python_requirements("requirements.txt", "cached_dir")
+{% endhighlight %}
+        <p>Please make sure the installation packages match the platform of the cluster and the Python version used. These packages will be installed using pip.</p>
+      </td>
+    </tr>
+    <tr>
+      <td>archive</td>
+      <td>
+        <p>Adds a Python archive file dependency. The file will be extracted to the working directory of the Python UDF worker. If the parameter "target_dir" is specified, the archive file will be extracted to a directory named "target_dir". Otherwise, the archive file will be extracted to a directory with the same name as the archive file.</p>
+{% highlight python %}
+# command executed in shell
+# assume the relative path of the python interpreter is py_env/bin/python
+zip -r py_env.zip py_env
+
+# python code
+table_env.add_python_archive("py_env.zip")
+table_env.get_config().set_python_executable("py_env.zip/py_env/bin/python")
+
+# or
+table_env.add_python_archive("py_env.zip", "myenv")
 
 Review comment:
   How about adding an example that shows how to use the data files of the archive in a Python UDF?
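
   As a rough illustration of what such an example might look like: the sketch below assumes, as the description above states, that the archive is extracted into the working directory of the Python UDF worker under the given "target_dir" (here "myenv"). The data file `py_env/data/data.txt` and the function `add_offset` are hypothetical names, not part of the PR. The lower half only simulates the extraction locally with the standard library so the path logic can be checked without a cluster; it does not call any Flink API.

```python
import os
import tempfile
import zipfile


def add_offset(i, data_path="myenv/py_env/data/data.txt"):
    # Body of a hypothetical Python UDF: the data file is opened through a
    # path relative to the worker's working directory, i.e.
    # <target_dir>/<path inside the archive>.
    with open(data_path) as f:
        return i + int(f.read())


# Local simulation only (no Flink involved): build py_env.zip containing a
# hypothetical data file, extract it into a directory named "myenv", and make
# the parent directory the current working directory, mimicking what
# table_env.add_python_archive("py_env.zip", "myenv") is described to do.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "py_env.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("py_env/data/data.txt", "2")
with zipfile.ZipFile(archive) as zf:
    zf.extractall(os.path.join(workdir, "myenv"))
os.chdir(workdir)

result = add_offset(1)
```

   In the docs, only the function body would be needed, next to the `table_env.add_python_archive("py_env.zip", "myenv")` call; the relative path is the part worth showing to users.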
