Posted to commits@flink.apache.org by di...@apache.org on 2021/05/07 06:41:42 UTC
[flink] branch release-1.13 updated: [hotfix][docs][python] Add an overview page for Python UDFs
This is an automated email from the ASF dual-hosted git repository.
dianfu pushed a commit to branch release-1.13
in repository https://gitbox.apache.org/repos/asf/flink.git
The following commit(s) were added to refs/heads/release-1.13 by this push:
new c8b3160 [hotfix][docs][python] Add an overview page for Python UDFs
c8b3160 is described below
commit c8b31602a5554d7f53bb198177c211ca20492dc3
Author: Dian Fu <di...@apache.org>
AuthorDate: Fri May 7 14:40:25 2021 +0800
[hotfix][docs][python] Add an overview page for Python UDFs
---
.../docs/dev/python/table/udfs/overview.md | 63 ++++++++++++++++++++++
.../docs/dev/python/table/udfs/python_udfs.md | 2 +-
.../content/docs/dev/python/table/udfs/overview.md | 63 ++++++++++++++++++++++
.../docs/dev/python/table/udfs/python_udfs.md | 27 +---------
.../python/table/udfs/vectorized_python_udfs.md | 2 +-
5 files changed, 129 insertions(+), 28 deletions(-)
diff --git a/docs/content.zh/docs/dev/python/table/udfs/overview.md b/docs/content.zh/docs/dev/python/table/udfs/overview.md
new file mode 100644
index 0000000..e721fed
--- /dev/null
+++ b/docs/content.zh/docs/dev/python/table/udfs/overview.md
@@ -0,0 +1,63 @@
+---
+title: "概览"  # "Overview"
+weight: 1
+type: docs
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# User-defined Functions
+
+The PyFlink Table API lets users perform data transformations with Python user-defined functions.
+
+Currently, it supports two kinds of Python user-defined functions: [general Python user-defined
+functions]({{< ref "docs/dev/python/table/udfs/python_udfs" >}}), which process data one row at a time, and
+[vectorized Python user-defined functions]({{< ref "docs/dev/python/table/udfs/vectorized_python_udfs" >}}),
+which process data one batch at a time.
+
+## Bundling UDFs
+
+When running Python UDFs (as well as Pandas UDFs) in any non-local mode, it is strongly recommended
+to bundle your Python UDF definitions using the config option [`python-files`]({{< ref "docs/dev/python/python_config" >}}#python-files)
+if they live outside the file in which the `main()` function is defined.
+Otherwise, if your Python UDFs are defined in a file called `my_udf.py`, for example,
+the job may fail with `ModuleNotFoundError: No module named 'my_udf'`.
+
+## Loading resources in UDFs
+
+In some scenarios you want to load resources in a UDF once and then run the computation
+(i.e., `eval`) over and over again without reloading the resources.
+For example, you may want to load a large deep learning model only once,
+then run batch prediction against the model multiple times.
+
+To do so, override the `open` method of `UserDefinedFunction`, which is called once for initialization, before `eval` is invoked.
+
+```python
+class Predict(ScalarFunction):
+    def open(self, function_context):
+        import pickle
+
+        with open("resources.zip/resources/model.pkl", "rb") as f:
+            self.model = pickle.load(f)
+
+    def eval(self, x):
+        return self.model.predict(x)
+
+predict = udf(Predict(), result_type=DataTypes.DOUBLE(), func_type="pandas")
+```
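The open-once pattern in the snippet above can be exercised without a cluster. The following is a minimal sketch under stated assumptions, not the documented example: the pickled "model" is a hypothetical dict holding a linear `weight` and `bias`, the model path is passed to the constructor (defaulting to the docs' `resources.zip/resources/model.pkl`), and `write_demo_model` is a hypothetical helper that creates a toy model file. The `ScalarFunction` import falls back to a plain base class when PyFlink is not installed, so the class can be unit-tested directly.

```python
import pickle

try:
    # Real base class when PyFlink is installed.
    from pyflink.table.udf import ScalarFunction
except ImportError:
    # Assumption for this sketch: fall back to a plain base class so the
    # UDF logic can be tested without PyFlink.
    ScalarFunction = object


class Predict(ScalarFunction):
    """Scalar UDF that loads a pickled model once and reuses it for every input."""

    def __init__(self, model_path="resources.zip/resources/model.pkl"):
        super().__init__()
        self.model_path = model_path
        self.model = None

    def open(self, function_context):
        # Called once before eval(), so the (potentially expensive)
        # load is not repeated for every row or batch.
        with open(self.model_path, "rb") as f:
            self.model = pickle.load(f)

    def eval(self, x):
        # Hypothetical model format: a dict with a linear "weight" and "bias".
        return self.model["weight"] * x + self.model["bias"]


def write_demo_model(path, weight=2.0, bias=1.0):
    """Hypothetical helper: pickle a toy model to `path` (stand-in for a trained model)."""
    with open(path, "wb") as f:
        pickle.dump({"weight": weight, "bias": bias}, f)
```

In a real job the instance would be wrapped as in the docs, e.g. `udf(Predict(), result_type=DataTypes.DOUBLE())`; with `func_type="pandas"`, `x` is a `pandas.Series`, which the linear formula above also handles. The archive itself can be shipped with `t_env.add_python_archive("resources.zip")`, which extracts it into a directory named after the archive in the Python worker's working directory, consistent with the relative path used in the snippet.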
diff --git a/docs/content.zh/docs/dev/python/table/udfs/python_udfs.md b/docs/content.zh/docs/dev/python/table/udfs/python_udfs.md
index ced6216..bc1d5bc 100644
--- a/docs/content.zh/docs/dev/python/table/udfs/python_udfs.md
+++ b/docs/content.zh/docs/dev/python/table/udfs/python_udfs.md
@@ -1,5 +1,5 @@
---
-title: "普通自定义函数(UDF)"  # "General User-defined Functions (UDF)"
+title: "普通自定义函数"  # "General User-defined Functions"
weight: 21
type: docs
aliases:
diff --git a/docs/content/docs/dev/python/table/udfs/overview.md b/docs/content/docs/dev/python/table/udfs/overview.md
new file mode 100644
index 0000000..280a2be
--- /dev/null
+++ b/docs/content/docs/dev/python/table/udfs/overview.md
@@ -0,0 +1,63 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# User-defined Functions
+
+The PyFlink Table API lets users perform data transformations with Python user-defined functions.
+
+Currently, it supports two kinds of Python user-defined functions: [general Python user-defined
+functions]({{< ref "docs/dev/python/table/udfs/python_udfs" >}}), which process data one row at a time, and
+[vectorized Python user-defined functions]({{< ref "docs/dev/python/table/udfs/vectorized_python_udfs" >}}),
+which process data one batch at a time.
+
+## Bundling UDFs
+
+When running Python UDFs (as well as Pandas UDFs) in any non-local mode, it is strongly recommended
+to bundle your Python UDF definitions using the config option [`python-files`]({{< ref "docs/dev/python/python_config" >}}#python-files)
+if they live outside the file in which the `main()` function is defined.
+Otherwise, if your Python UDFs are defined in a file called `my_udf.py`, for example,
+the job may fail with `ModuleNotFoundError: No module named 'my_udf'`.
+
+## Loading resources in UDFs
+
+In some scenarios you want to load resources in a UDF once and then run the computation
+(i.e., `eval`) over and over again without reloading the resources.
+For example, you may want to load a large deep learning model only once,
+then run batch prediction against the model multiple times.
+
+To do so, override the `open` method of `UserDefinedFunction`, which is called once for initialization, before `eval` is invoked.
+
+```python
+class Predict(ScalarFunction):
+    def open(self, function_context):
+        import pickle
+
+        with open("resources.zip/resources/model.pkl", "rb") as f:
+            self.model = pickle.load(f)
+
+    def eval(self, x):
+        return self.model.predict(x)
+
+predict = udf(Predict(), result_type=DataTypes.DOUBLE(), func_type="pandas")
+```
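Besides the `python-files` config option, PyFlink's `TableEnvironment` provides `add_python_file()` and `add_python_archive()` for registering dependencies from the driver program, counterparts of the `python.files` and `python.archives` options. The helper below is hypothetical (its name and signature are not part of PyFlink); it only sketches where those two calls would go so that a `my_udf.py` module and a resources archive reach the remote Python workers.

```python
def ship_udf_dependencies(t_env, python_files=(), archives=()):
    """Hypothetical helper: register UDF dependencies on a PyFlink TableEnvironment.

    `t_env` is expected to be a pyflink.table.TableEnvironment (or any object
    exposing the same two methods), so this helper needs no PyFlink import.
    """
    for path in python_files:
        # Added to the PYTHONPATH of the Python UDF workers, so that e.g.
        # `import my_udf` also succeeds when the job runs on a cluster.
        t_env.add_python_file(path)
    for archive in archives:
        # Extracted into the working directory of the Python UDF workers.
        t_env.add_python_archive(archive)
```

For example, calling `ship_udf_dependencies(t_env, ["my_udf.py"], ["resources.zip"])` in `main()` before the job is executed would ship both the UDF module and the archive that the `open()` snippet reads from.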
diff --git a/docs/content/docs/dev/python/table/udfs/python_udfs.md b/docs/content/docs/dev/python/table/udfs/python_udfs.md
index a0445d8..ab31719 100644
--- a/docs/content/docs/dev/python/table/udfs/python_udfs.md
+++ b/docs/content/docs/dev/python/table/udfs/python_udfs.md
@@ -1,6 +1,6 @@
---
title: "General User-defined Functions"
-weight: 1
+weight: 5
type: docs
aliases:
   - /dev/python/table-api-users-guide/udfs/python_udfs.html
@@ -552,28 +552,3 @@ class ListViewConcatTableAggregateFunction(TableAggregateFunction):
     def get_result_type(self):
         return DataTypes.ROW([DataTypes.FIELD("a", DataTypes.STRING())])
```
-
-## Bundling UDFs
-
-To run Python UDFs (as well as Pandas UDFs) in any non-local mode, it is strongly recommended to bundle your Python UDF definitions using the config option [`python-files`]({{< ref "docs/dev/python/python_config" >}}#python-files), if your Python UDFs live outside of the file where the `main()` function is defined.
-Otherwise, you may run into `ModuleNotFoundError: No module named 'my_udf'` if you define Python UDFs in a file called `my_udf.py`.
-
-## Loading resources in UDFs
-
-There are scenarios when you want to load some resources in UDFs first, then run the computation (i.e., `eval`) over and over again, without having to reload the resources. For example, you may want to load a large deep learning model only once, then run batch prediction against the model multiple times.
-
-Overriding the `open` method of `UserDefinedFunction` is exactly what you need.
-
-```python
-class Predict(ScalarFunction):
-    def open(self, function_context):
-        import pickle
-
-        with open("resources.zip/resources/model.pkl", "rb") as f:
-            self.model = pickle.load(f)
-
-    def eval(self, x):
-        return self.model.predict(x)
-
-predict = udf(Predict(), result_type=DataTypes.DOUBLE(), func_type="pandas")
-```
diff --git a/docs/content/docs/dev/python/table/udfs/vectorized_python_udfs.md b/docs/content/docs/dev/python/table/udfs/vectorized_python_udfs.md
index 0e64a60..0fb19ad 100644
--- a/docs/content/docs/dev/python/table/udfs/vectorized_python_udfs.md
+++ b/docs/content/docs/dev/python/table/udfs/vectorized_python_udfs.md
@@ -1,6 +1,6 @@
---
title: "Vectorized User-defined Functions"
-weight: 2
+weight: 10
type: docs
aliases:
   - /dev/python/table-api-users-guide/udfs/vectorized_python_udfs.html