Posted to commits@flink.apache.org by di...@apache.org on 2021/05/07 06:41:42 UTC

[flink] branch release-1.13 updated: [hotfix][docs][python] Add an overview page for Python UDFs

This is an automated email from the ASF dual-hosted git repository.

dianfu pushed a commit to branch release-1.13
in repository https://gitbox.apache.org/repos/asf/flink.git


The following commit(s) were added to refs/heads/release-1.13 by this push:
     new c8b3160  [hotfix][docs][python] Add an overview page for Python UDFs
c8b3160 is described below

commit c8b31602a5554d7f53bb198177c211ca20492dc3
Author: Dian Fu <di...@apache.org>
AuthorDate: Fri May 7 14:40:25 2021 +0800

    [hotfix][docs][python] Add an overview page for Python UDFs
---
 .../docs/dev/python/table/udfs/overview.md         | 63 ++++++++++++++++++++++
 .../docs/dev/python/table/udfs/python_udfs.md      |  2 +-
 .../content/docs/dev/python/table/udfs/overview.md | 63 ++++++++++++++++++++++
 .../docs/dev/python/table/udfs/python_udfs.md      | 27 +---------
 .../python/table/udfs/vectorized_python_udfs.md    |  2 +-
 5 files changed, 129 insertions(+), 28 deletions(-)

diff --git a/docs/content.zh/docs/dev/python/table/udfs/overview.md b/docs/content.zh/docs/dev/python/table/udfs/overview.md
new file mode 100644
index 0000000..e721fed
--- /dev/null
+++ b/docs/content.zh/docs/dev/python/table/udfs/overview.md
@@ -0,0 +1,63 @@
+---
+title: "概览"
+weight: 1
+type: docs
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# User-defined Functions
+
+PyFlink Table API empowers users to do data transformations with Python user-defined functions.
+
+Currently, it supports two kinds of Python user-defined functions: the [general Python user-defined
+functions]({{< ref "docs/dev/python/table/udfs/python_udfs" >}}) which process data one row at a time and
+[vectorized Python user-defined functions]({{< ref "docs/dev/python/table/udfs/vectorized_python_udfs" >}})
+which process data one batch at a time.
+
+## Bundling UDFs
+
+To run Python UDFs (as well as Pandas UDFs) in any non-local mode, it is strongly recommended
+to bundle your Python UDF definitions using the config option [`python-files`]({{< ref "docs/dev/python/python_config" >}}#python-files),
+if your Python UDFs live outside the file where the `main()` function is defined.
+Otherwise, you may run into `ModuleNotFoundError: No module named 'my_udf'`
+if you define Python UDFs in a file called `my_udf.py`.
+
+## Loading resources in UDFs
+
+There are scenarios where you want to load some resources in a UDF first, then run the computation
+(i.e., `eval`) over and over again, without having to reload the resources.
+For example, you may want to load a large deep learning model only once,
+then run batch prediction against the model multiple times.
+
+Overriding the `open` method of `UserDefinedFunction` is exactly what you need.
+
+```python
+class Predict(ScalarFunction):
+    def open(self, function_context):
+        import pickle
+
+        with open("resources.zip/resources/model.pkl", "rb") as f:
+            self.model = pickle.load(f)
+
+    def eval(self, x):
+        return self.model.predict(x)
+
+predict = udf(Predict(), result_type=DataTypes.DOUBLE(), func_type="pandas")
+```
diff --git a/docs/content.zh/docs/dev/python/table/udfs/python_udfs.md b/docs/content.zh/docs/dev/python/table/udfs/python_udfs.md
index ced6216..bc1d5bc 100644
--- a/docs/content.zh/docs/dev/python/table/udfs/python_udfs.md
+++ b/docs/content.zh/docs/dev/python/table/udfs/python_udfs.md
@@ -1,5 +1,5 @@
 ---
-title: "普通自定义函数(UDF)"
+title: "普通自定义函数"
 weight: 21
 type: docs
 aliases:
diff --git a/docs/content/docs/dev/python/table/udfs/overview.md b/docs/content/docs/dev/python/table/udfs/overview.md
new file mode 100644
index 0000000..280a2be
--- /dev/null
+++ b/docs/content/docs/dev/python/table/udfs/overview.md
@@ -0,0 +1,63 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# User-defined Functions
+
+PyFlink Table API empowers users to do data transformations with Python user-defined functions.
+
+Currently, it supports two kinds of Python user-defined functions: the [general Python user-defined
+functions]({{< ref "docs/dev/python/table/udfs/python_udfs" >}}) which process data one row at a time and
+[vectorized Python user-defined functions]({{< ref "docs/dev/python/table/udfs/vectorized_python_udfs" >}})
+which process data one batch at a time.
+
+## Bundling UDFs
+
+To run Python UDFs (as well as Pandas UDFs) in any non-local mode, it is strongly recommended
+to bundle your Python UDF definitions using the config option [`python-files`]({{< ref "docs/dev/python/python_config" >}}#python-files),
+if your Python UDFs live outside the file where the `main()` function is defined.
+Otherwise, you may run into `ModuleNotFoundError: No module named 'my_udf'`
+if you define Python UDFs in a file called `my_udf.py`.
+
+## Loading resources in UDFs
+
+There are scenarios where you want to load some resources in a UDF first, then run the computation
+(i.e., `eval`) over and over again, without having to reload the resources.
+For example, you may want to load a large deep learning model only once,
+then run batch prediction against the model multiple times.
+
+Overriding the `open` method of `UserDefinedFunction` is exactly what you need.
+
+```python
+class Predict(ScalarFunction):
+    def open(self, function_context):
+        import pickle
+
+        with open("resources.zip/resources/model.pkl", "rb") as f:
+            self.model = pickle.load(f)
+
+    def eval(self, x):
+        return self.model.predict(x)
+
+predict = udf(Predict(), result_type=DataTypes.DOUBLE(), func_type="pandas")
+```
diff --git a/docs/content/docs/dev/python/table/udfs/python_udfs.md b/docs/content/docs/dev/python/table/udfs/python_udfs.md
index a0445d8..ab31719 100644
--- a/docs/content/docs/dev/python/table/udfs/python_udfs.md
+++ b/docs/content/docs/dev/python/table/udfs/python_udfs.md
@@ -1,6 +1,6 @@
 ---
 title: "General User-defined Functions"
-weight: 1
+weight: 5
 type: docs
 aliases:
   - /dev/python/table-api-users-guide/udfs/python_udfs.html
@@ -552,28 +552,3 @@ class ListViewConcatTableAggregateFunction(TableAggregateFunction):
     def get_result_type(self):
         return DataTypes.ROW([DataTypes.FIELD("a", DataTypes.STRING())])
 ```
-
-## Bundling UDFs
-
-To run Python UDFs (as well as Pandas UDFs) in any non-local mode, it is strongly recommended to bundle your Python UDF definitions using the config option [`python-files`]({{< ref "docs/dev/python/python_config" >}}#python-files), if your Python UDFs live outside of the file where the `main()` function is defined.
-Otherwise, you may run into `ModuleNotFoundError: No module named 'my_udf'` if you define Python UDFs in a file called `my_udf.py`.
-
-## Loading resources in UDFs
-
-There are scenarios when you want to load some resources in UDFs first, then running computation (i.e., `eval`) over and over again, without having to re-load the resources. For example, you may want to load a large deep learning model only once, then run batch prediction against the model multiple times.
-
-Overriding the `open` method of `UserDefinedFunction` is exactly what you need.
-
-```python
-class Predict(ScalarFunction):
-    def open(self, function_context):
-        import pickle
-
-        with open("resources.zip/resources/model.pkl", "rb") as f:
-            self.model = pickle.load(f)
-
-    def eval(self, x):
-        return self.model.predict(x)
-
-predict = udf(Predict(), result_type=DataTypes.DOUBLE(), func_type="pandas")
-```
diff --git a/docs/content/docs/dev/python/table/udfs/vectorized_python_udfs.md b/docs/content/docs/dev/python/table/udfs/vectorized_python_udfs.md
index 0e64a60..0fb19ad 100644
--- a/docs/content/docs/dev/python/table/udfs/vectorized_python_udfs.md
+++ b/docs/content/docs/dev/python/table/udfs/vectorized_python_udfs.md
@@ -1,6 +1,6 @@
 ---
 title: "Vectorized User-defined Functions"
-weight: 2
+weight: 10
 type: docs
 aliases:
   - /dev/python/table-api-users-guide/udfs/vectorized_python_udfs.html