Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/07/14 12:17:04 UTC

[GitHub] [flink] HuangXingBo opened a new pull request, #20276: [FLINK-28549][python] Support DataStream PythonProcessOperator in Thread Mode

HuangXingBo opened a new pull request, #20276:
URL: https://github.com/apache/flink/pull/20276

   ## What is the purpose of the change
   
   *This pull request adds support for the DataStream `PythonProcessOperator` in thread mode.*
   
   
   ## Brief change log
   
     - *Refactor the directory structure of the current Python DataStream Operators*
     - *Add support for `EmbeddedPythonProcessOperator`*
   
   
   ## Verifying this change
   
   This change adds tests and can be verified as follows:
   
     - *`EmbeddedDataStreamTests` in `test_data_stream.py`*
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not applicable)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] HuangXingBo closed pull request #20276: [FLINK-28549][python] Support DataStream PythonProcessOperator in Thread Mode

Posted by GitBox <gi...@apache.org>.
HuangXingBo closed pull request #20276: [FLINK-28549][python] Support DataStream PythonProcessOperator in Thread Mode
URL: https://github.com/apache/flink/pull/20276




[GitHub] [flink] flinkbot commented on pull request #20276: [FLINK-28549][python] Support DataStream PythonProcessOperator in Thread Mode

Posted by GitBox <gi...@apache.org>.
flinkbot commented on PR #20276:
URL: https://github.com/apache/flink/pull/20276#issuecomment-1184383187

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "76fbeb7d6a732a7970bb031a9828c91785094be1",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "76fbeb7d6a732a7970bb031a9828c91785094be1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 76fbeb7d6a732a7970bb031a9828c91785094be1 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>




[GitHub] [flink] dianfu commented on a diff in pull request #20276: [FLINK-28549][python] Support DataStream PythonProcessOperator in Thread Mode

Posted by GitBox <gi...@apache.org>.
dianfu commented on code in PR #20276:
URL: https://github.com/apache/flink/pull/20276#discussion_r928771352


##########
flink-python/src/main/java/org/apache/flink/python/util/PythonConfigUtil.java:
##########
@@ -149,8 +151,8 @@ private static void processSideOutput(List<Transformation<?>> transformations) {
                 final Transformation<?> upTransform =
                         Iterables.getOnlyElement(sideTransform.getInputs());
                 if (PythonConfigUtil.isPythonDataStreamOperator(upTransform)) {
-                    final AbstractDataStreamPythonFunctionOperator<?> upOperator =
-                            (AbstractDataStreamPythonFunctionOperator<?>)
+                    final AbstractExternalDataStreamPythonFunctionOperator<?> upOperator =

Review Comment:
   Could this be `DataStreamPythonFunctionOperator` instead?



##########
flink-python/pyflink/datastream/tests/test_data_stream.py:
##########
@@ -1675,6 +1678,101 @@ def flat_map_func2(data):
         self.assert_equals_sorted(expected, results)
 
 
+@pytest.mark.skipif(sys.version_info < (3, 7), reason="requires python3.7")
+class EmbeddedDataStreamTests(PyFlinkStreamingTestCase):
+    def setUp(self):
+        super(EmbeddedDataStreamTests, self).setUp()
+        config = get_j_env_configuration(self.env._j_stream_execution_environment)
+        config.setString("python.execution-mode", "thread")
+        config.setString("akka.ask.timeout", "20 s")
+        self.test_sink = DataStreamTestSinkFunction()
+
+    def tearDown(self) -> None:
+        self.test_sink.clear()
+
+    def assert_equals_sorted(self, expected, actual):
+        expected.sort()
+        actual.sort()
+        self.assertEqual(expected, actual)
+
+    def test_basic_operations(self):

Review Comment:
   Could we refactor the test cases a bit so that the cases meant to run in both process mode and thread mode live in one shared class?
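
One possible shape for such a refactoring is sketched below using plain `unittest`, so that it runs without PyFlink installed. This is only an illustration, not the change that was made in the PR: `DataStreamTestsMixin`, `ProcessDataStreamTests`, `run_shared_tests` and the dictionary standing in for the Flink configuration are hypothetical names, and the test body is a placeholder. Only the option key `python.execution-mode` and the value `thread` come from the diff above; the value `process` is assumed for the other mode.

```python
import io
import unittest


class DataStreamTestsMixin:
    """Test cases shared by every execution mode (hypothetical name)."""

    # Each concrete subclass sets this to the mode it should run in.
    execution_mode = None

    def setUp(self):
        super().setUp()
        # Stand-in for configuring the real environment, e.g.
        # config.setString("python.execution-mode", self.execution_mode)
        self.config = {"python.execution-mode": self.execution_mode}

    def test_basic_operations(self):
        # Placeholder body: a real test would build and run a pipeline.
        results = sorted(x * 2 for x in [3, 1, 2])
        self.assertEqual([2, 4, 6], results)
        self.assertIn(self.config["python.execution-mode"], ("process", "thread"))


class ProcessDataStreamTests(DataStreamTestsMixin, unittest.TestCase):
    execution_mode = "process"  # assumed value for process mode


class EmbeddedDataStreamTests(DataStreamTestsMixin, unittest.TestCase):
    execution_mode = "thread"


def run_shared_tests():
    """Run the shared cases once per execution mode and return the result."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for cls in (ProcessDataStreamTests, EmbeddedDataStreamTests):
        suite.addTests(loader.loadTestsFromTestCase(cls))
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)
```

Because the mixin does not itself derive from `unittest.TestCase`, test discovery collects the shared cases only through the two concrete classes, so each case runs once per mode.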



##########
flink-python/src/main/java/org/apache/flink/python/util/PythonConfigUtil.java:
##########
@@ -289,7 +291,9 @@ private static boolean isPythonDataStreamOperator(
             StreamOperatorFactory<?> streamOperatorFactory) {
         if (streamOperatorFactory instanceof SimpleOperatorFactory) {
             return ((SimpleOperatorFactory<?>) streamOperatorFactory).getOperator()
-                    instanceof AbstractDataStreamPythonFunctionOperator;
+                            instanceof AbstractExternalDataStreamPythonFunctionOperator
+                    || ((SimpleOperatorFactory<?>) streamOperatorFactory).getOperator()

Review Comment:
   Use `instanceof DataStreamPythonFunctionOperator` here?


