Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/03/21 02:47:28 UTC

[GitHub] [flink] dianfu commented on a change in pull request #19150: [FLINK-26727][python] Fix the implementation of sub-interpreter in Thread Mode

dianfu commented on a change in pull request #19150:
URL: https://github.com/apache/flink/pull/19150#discussion_r830718361



##########
File path: docs/content.zh/docs/dev/python/python_execution_mode.md
##########
@@ -31,61 +31,48 @@ defines how to execute your customized Python functions.
 Prior to release-1.15, there is the only execution mode called `PROCESS` execution mode. The `PROCESS`
 mode means that the Python user-defined functions will be executed in separate Python processes.
 
-In release-1.15, it has introduced another two execution modes called `MULTI-THREAD` execution mode and
-`SUB-INTERPRETER` execution mode. The `MULTI-THREAD` mode means that the Python user-defined functions
-will be executed in the same thread as Java Operator, but it will be affected by GIL performance.
-The `SUB-INTERPRETER` mode means that the Python user-defined functions will be executed in Python
-different sub-interpreters rather than different threads of one interpreter, which can largely overcome
-the effects of the GIL, but some CPython extensions libraries doesn't support it, such as numpy, tensorflow, etc.
+In release-1.15, it has introduced a new execution mode called `THREAD` execution mode. The `THREAD`
+mode means that the Python user-defined functions will be executed in the same thread as Java Operator,
+but it will be affected by GIL performance.

Review comment:
       ```suggestion
   Note that multiple Python user-defined functions running in the same JVM are still affected by the GIL.
   ```

##########
File path: docs/content.zh/docs/dev/python/python_execution_mode.md
##########
@@ -31,61 +31,48 @@ defines how to execute your customized Python functions.
 Prior to release-1.15, there is the only execution mode called `PROCESS` execution mode. The `PROCESS`
 mode means that the Python user-defined functions will be executed in separate Python processes.
 
-In release-1.15, it has introduced another two execution modes called `MULTI-THREAD` execution mode and
-`SUB-INTERPRETER` execution mode. The `MULTI-THREAD` mode means that the Python user-defined functions
-will be executed in the same thread as Java Operator, but it will be affected by GIL performance.
-The `SUB-INTERPRETER` mode means that the Python user-defined functions will be executed in Python
-different sub-interpreters rather than different threads of one interpreter, which can largely overcome
-the effects of the GIL, but some CPython extensions libraries doesn't support it, such as numpy, tensorflow, etc.
+In release-1.15, it has introduced a new execution mode called `THREAD` execution mode. The `THREAD`
+mode means that the Python user-defined functions will be executed in the same thread as Java Operator,
+but it will be affected by GIL performance.
 
-## When can/should I use MULTI-THREAD execution mode or SUB-INTERPRETER execution mode?
+## When can/should I use THREAD execution mode?
 
-The purpose of the introduction of `MULTI-THREAD` mode and `SUB-INTERPRETER` mode is to overcome the
-overhead of serialization/deserialization and network communication caused in `PROCESS` mode.
-So if performance is not your concern, or the computing logic of your customized Python functions is
-the performance bottleneck of the job, `PROCESS` mode will be the best choice as `PROCESS` mode provides
-the best isolation compared to `MULTI-THREAD` mode and `SUB-INTERPRETER` mode.
-
-Compared to `MULTI-THREAD` execution mode, `SUB-INTERPRETER` execution mode can largely overcome the
-effects of the GIL, so you can get better performance usually. However, `SUB-INTERPRETER` may fail in some CPython
-extensions libraries, such as numpy, tensorflow. In this case, you should use `PROCESS` mode or `MULTI-THREAD` mode.
+The purpose of the introduction of `THREAD` mode is to overcome the overhead of serialization/deserialization
+and network communication caused in `PROCESS` mode. So if performance is not your concern, or the computing
+logic of your customized Python functions is the performance bottleneck of the job, `PROCESS` mode will
+be the best choice as `PROCESS` mode provides the best isolation compared to `THREAD` mode.
 
 ## Configuring Python execution mode
 
 The execution mode can be configured via the `python.execution-mode` setting.
-There are three possible values:
+There are two possible values:
 
  - `PROCESS`: The Python user-defined functions will be executed in separate Python process. (default)
- - `MULTI-THREAD`: The Python user-defined functions will be executed in the same thread as Java Operator.
- - `SUB-INTERPRETER`: The Python user-defined functions will be executed in Python different sub-interpreters.
+ - `THREAD`: The Python user-defined functions will be executed in the same thread as Java Operator.

Review comment:
       ```suggestion
    - `THREAD`: The Python user-defined functions will be executed in the same process as the Java operator.
   ```

##########
File path: flink-python/src/main/java/org/apache/flink/python/PythonOptions.java
##########
@@ -231,10 +231,8 @@
                     .stringType()
                     .defaultValue("process")
                     .withDescription(
-                            "Specify the python runtime execution mode. The optional values are `process`, `multi-thread` and `sub-interpreter`. "
+                            "Specify the python runtime execution mode. The optional values are `process` and `thread`. "
                                     + "The `process` mode means that the Python user-defined functions will be executed in separate Python process. "
-                                    + "The `multi-thread` mode means that the Python user-defined functions will be executed in the same thread as Java Operator, but it will be affected by GIL performance. "
-                                    + "The `sub-interpreter` mode means that the Python user-defined functions will be executed in python different sub-interpreters rather than different threads of one interpreter, "
-                                    + "which can largely overcome the effects of the GIL, but it maybe fail in some CPython extensions libraries, such as numpy, tensorflow. "
-                                    + "Note that if the python operator dose not support `multi-thread` and `sub-interpreter` mode, we will still use `process` mode.");
+                                    + "The `thread` mode means that the Python user-defined functions will be executed in the same thread as Java Operator, but it will be affected by GIL performance. "

Review comment:
       ```suggestion
                                       + "The `thread` mode means that the Python user-defined functions will be executed in the same process as the Java operator. "
   ```

##########
File path: docs/content.zh/docs/dev/python/python_execution_mode.md
##########
@@ -31,61 +31,48 @@ defines how to execute your customized Python functions.
 Prior to release-1.15, there is the only execution mode called `PROCESS` execution mode. The `PROCESS`
 mode means that the Python user-defined functions will be executed in separate Python processes.
 
-In release-1.15, it has introduced another two execution modes called `MULTI-THREAD` execution mode and
-`SUB-INTERPRETER` execution mode. The `MULTI-THREAD` mode means that the Python user-defined functions
-will be executed in the same thread as Java Operator, but it will be affected by GIL performance.
-The `SUB-INTERPRETER` mode means that the Python user-defined functions will be executed in Python
-different sub-interpreters rather than different threads of one interpreter, which can largely overcome
-the effects of the GIL, but some CPython extensions libraries doesn't support it, such as numpy, tensorflow, etc.
+In release-1.15, it has introduced a new execution mode called `THREAD` execution mode. The `THREAD`
+mode means that the Python user-defined functions will be executed in the same thread as Java Operator,
+but it will be affected by GIL performance.
 
-## When can/should I use MULTI-THREAD execution mode or SUB-INTERPRETER execution mode?
+## When can/should I use THREAD execution mode?
 
-The purpose of the introduction of `MULTI-THREAD` mode and `SUB-INTERPRETER` mode is to overcome the
-overhead of serialization/deserialization and network communication caused in `PROCESS` mode.
-So if performance is not your concern, or the computing logic of your customized Python functions is
-the performance bottleneck of the job, `PROCESS` mode will be the best choice as `PROCESS` mode provides
-the best isolation compared to `MULTI-THREAD` mode and `SUB-INTERPRETER` mode.
-
-Compared to `MULTI-THREAD` execution mode, `SUB-INTERPRETER` execution mode can largely overcome the
-effects of the GIL, so you can get better performance usually. However, `SUB-INTERPRETER` may fail in some CPython
-extensions libraries, such as numpy, tensorflow. In this case, you should use `PROCESS` mode or `MULTI-THREAD` mode.
+The purpose of the introduction of `THREAD` mode is to overcome the overhead of serialization/deserialization
+and network communication caused in `PROCESS` mode. So if performance is not your concern, or the computing
+logic of your customized Python functions is the performance bottleneck of the job, `PROCESS` mode will
+be the best choice as `PROCESS` mode provides the best isolation compared to `THREAD` mode.
 
 ## Configuring Python execution mode
 
 The execution mode can be configured via the `python.execution-mode` setting.
-There are three possible values:
+There are two possible values:
 
  - `PROCESS`: The Python user-defined functions will be executed in separate Python process. (default)
- - `MULTI-THREAD`: The Python user-defined functions will be executed in the same thread as Java Operator.
- - `SUB-INTERPRETER`: The Python user-defined functions will be executed in Python different sub-interpreters.
+ - `THREAD`: The Python user-defined functions will be executed in the same thread as Java Operator.
 
 You could specify the Python execution mode using Python Table API as following:
 
 ```python
 # Specify `PROCESS` mode
 table_env.get_config().get_configuration().set_string("python.execution-mode", "process")
 
-# Specify `MULTI-THREAD` mode
-table_env.get_config().get_configuration().set_string("python.execution-mode", "multi-thread")
-
-# Specify `SUB-INTERPRETER` mode
-table_env.get_config().get_configuration().set_string("python.execution-mode", "sub-interpreter")
+# Specify `THREAD` mode
+table_env.get_config().get_configuration().set_string("python.execution-mode", "thread")
 ```
 
 {{< hint info >}}
-Currently, it still doesn't support to execute Python UDFs in `MULTI-THREAD` and `SUB-INTERPRETER` execution mode
-in all places. It will fall back to `PROCESS` execution mode in these cases. So it may happen that you configure a job
-to execute in `MULTI-THREAD` or `SUB-INTERPRETER` execution modes, however, it's actually executed in `PROCESS` execution mode.
+Currently, it still doesn't support to execute Python UDFs in `THREAD` execution mode in all places.
+It will fall back to `PROCESS` execution mode in these cases. So it may happen that you configure a job
+to execute in `THREAD` execution modes, however, it's actually executed in `PROCESS` execution mode.
 {{< /hint >}}
 {{< hint info >}}
-`MULTI-THREAD` execution mode only supports Python 3.7+. `SUB-INTERPRETER` execution mode only supports Python 3.8+.  
+`THREAD` execution mode only supports Python 3.7+.

Review comment:
       ```suggestion
   `THREAD` execution mode is only supported in Python 3.7+.
   ```
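
As an illustration of the version constraint in this suggestion, here is a minimal sketch of a guard a job script could apply before requesting `thread` mode. The helper name `pick_execution_mode` is hypothetical, and falling back to `process` on an older interpreter is an assumption of this sketch, not behavior confirmed by the pull request.

```python
import sys


def pick_execution_mode(requested, version_info=sys.version_info):
    """Return the execution mode to configure for this interpreter.

    Hypothetical helper: `thread` is kept only on Python 3.7+, otherwise
    the sketch falls back to `process` (an assumption, see lead-in).
    """
    mode = requested.lower()
    if mode not in ("process", "thread"):
        raise ValueError(f"unsupported python.execution-mode: {requested!r}")
    # Compare only (major, minor) so plain tuples work as well as sys.version_info.
    if mode == "thread" and tuple(version_info[:2]) < (3, 7):
        return "process"
    return mode


if __name__ == "__main__":
    print(pick_execution_mode("thread"))
```

The returned string can then be passed as the value of `python.execution-mode`.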

##########
File path: flink-python/src/main/java/org/apache/flink/python/PythonOptions.java
##########
@@ -231,10 +231,8 @@
                     .stringType()
                     .defaultValue("process")
                     .withDescription(
-                            "Specify the python runtime execution mode. The optional values are `process`, `multi-thread` and `sub-interpreter`. "
+                            "Specify the python runtime execution mode. The optional values are `process` and `thread`. "
                                     + "The `process` mode means that the Python user-defined functions will be executed in separate Python process. "
-                                    + "The `multi-thread` mode means that the Python user-defined functions will be executed in the same thread as Java Operator, but it will be affected by GIL performance. "
-                                    + "The `sub-interpreter` mode means that the Python user-defined functions will be executed in python different sub-interpreters rather than different threads of one interpreter, "
-                                    + "which can largely overcome the effects of the GIL, but it maybe fail in some CPython extensions libraries, such as numpy, tensorflow. "
-                                    + "Note that if the python operator dose not support `multi-thread` and `sub-interpreter` mode, we will still use `process` mode.");
+                                    + "The `thread` mode means that the Python user-defined functions will be executed in the same thread as Java Operator, but it will be affected by GIL performance. "
+                                    + "Note that if the python operator dose not support `thread` mode, we will still use `process` mode.");

Review comment:
       ```suggestion
                                       + "Note that executing Python user-defined functions in `thread` mode is not yet supported in all places. It will fall back to `process` mode in these cases. ");
   ```

##########
File path: docs/content.zh/docs/dev/python/python_execution_mode.md
##########
@@ -31,61 +31,48 @@ defines how to execute your customized Python functions.
 Prior to release-1.15, there is the only execution mode called `PROCESS` execution mode. The `PROCESS`
 mode means that the Python user-defined functions will be executed in separate Python processes.
 
-In release-1.15, it has introduced another two execution modes called `MULTI-THREAD` execution mode and
-`SUB-INTERPRETER` execution mode. The `MULTI-THREAD` mode means that the Python user-defined functions
-will be executed in the same thread as Java Operator, but it will be affected by GIL performance.
-The `SUB-INTERPRETER` mode means that the Python user-defined functions will be executed in Python
-different sub-interpreters rather than different threads of one interpreter, which can largely overcome
-the effects of the GIL, but some CPython extensions libraries doesn't support it, such as numpy, tensorflow, etc.
+In release-1.15, it has introduced a new execution mode called `THREAD` execution mode. The `THREAD`
+mode means that the Python user-defined functions will be executed in the same thread as Java Operator,
+but it will be affected by GIL performance.
 
-## When can/should I use MULTI-THREAD execution mode or SUB-INTERPRETER execution mode?
+## When can/should I use THREAD execution mode?
 
-The purpose of the introduction of `MULTI-THREAD` mode and `SUB-INTERPRETER` mode is to overcome the
-overhead of serialization/deserialization and network communication caused in `PROCESS` mode.
-So if performance is not your concern, or the computing logic of your customized Python functions is
-the performance bottleneck of the job, `PROCESS` mode will be the best choice as `PROCESS` mode provides
-the best isolation compared to `MULTI-THREAD` mode and `SUB-INTERPRETER` mode.
-
-Compared to `MULTI-THREAD` execution mode, `SUB-INTERPRETER` execution mode can largely overcome the
-effects of the GIL, so you can get better performance usually. However, `SUB-INTERPRETER` may fail in some CPython
-extensions libraries, such as numpy, tensorflow. In this case, you should use `PROCESS` mode or `MULTI-THREAD` mode.
+The purpose of the introduction of `THREAD` mode is to overcome the overhead of serialization/deserialization
+and network communication caused in `PROCESS` mode. So if performance is not your concern, or the computing
+logic of your customized Python functions is the performance bottleneck of the job, `PROCESS` mode will
+be the best choice as `PROCESS` mode provides the best isolation compared to `THREAD` mode.
 
 ## Configuring Python execution mode
 
 The execution mode can be configured via the `python.execution-mode` setting.
-There are three possible values:
+There are two possible values:
 
  - `PROCESS`: The Python user-defined functions will be executed in separate Python process. (default)
- - `MULTI-THREAD`: The Python user-defined functions will be executed in the same thread as Java Operator.
- - `SUB-INTERPRETER`: The Python user-defined functions will be executed in Python different sub-interpreters.
+ - `THREAD`: The Python user-defined functions will be executed in the same thread as Java Operator.
 
 You could specify the Python execution mode using Python Table API as following:
 
 ```python
 # Specify `PROCESS` mode
 table_env.get_config().get_configuration().set_string("python.execution-mode", "process")
 
-# Specify `MULTI-THREAD` mode
-table_env.get_config().get_configuration().set_string("python.execution-mode", "multi-thread")
-
-# Specify `SUB-INTERPRETER` mode
-table_env.get_config().get_configuration().set_string("python.execution-mode", "sub-interpreter")
+# Specify `THREAD` mode
+table_env.get_config().get_configuration().set_string("python.execution-mode", "thread")
 ```
 
 {{< hint info >}}
-Currently, it still doesn't support to execute Python UDFs in `MULTI-THREAD` and `SUB-INTERPRETER` execution mode
-in all places. It will fall back to `PROCESS` execution mode in these cases. So it may happen that you configure a job
-to execute in `MULTI-THREAD` or `SUB-INTERPRETER` execution modes, however, it's actually executed in `PROCESS` execution mode.
+Currently, it still doesn't support to execute Python UDFs in `THREAD` execution mode in all places.
+It will fall back to `PROCESS` execution mode in these cases. So it may happen that you configure a job
+to execute in `THREAD` execution modes, however, it's actually executed in `PROCESS` execution mode.

Review comment:
       ```suggestion
   to execute in `THREAD` execution mode, but it is actually executed in `PROCESS` execution mode.
   ```



