Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/27 01:59:31 UTC

[GitHub] [spark] HeartSaVioR opened a new pull request, #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python worker failures

HeartSaVioR opened a new pull request, #38008:
URL: https://github.com/apache/spark/pull/38008

   ### What changes were proposed in this pull request?
   
   This PR proposes a new test case for applyInPandasWithState to verify that the fault-tolerance semantics are not broken despite random Python worker failures. If the sink provides end-to-end exactly-once, the query should respect that guarantee. Otherwise, the query should still guarantee stateful exactly-once, but only at-least-once in terms of outputs.
   
   The test leverages the file stream sink, which is end-to-end exactly-once, but to keep the verification simple, we only verify that stateful exactly-once is guaranteed despite Python worker failures.
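   
   For illustration only, a minimal sketch of such a verification against a file sink (the paths, schema, and the max-per-key aggregation below are assumptions made for the sketch, not the exact code in this PR):
   
   ```python
   from pyspark.sql import SparkSession
   from pyspark.sql import functions as F

   spark = SparkSession.builder.getOrCreate()

   # Sink outputs may contain duplicates (at-least-once) when a batch is replayed
   # after a worker failure, but the state-derived count per key must still be exact.
   actual = (
       spark.read.schema("value string, count long").json("/tmp/hypothetical_sink_output")
       .groupBy("value").agg(F.max("count").alias("count"))  # final state per key
       .sort("value").collect()
   )
   expected = (
       spark.read.text("/tmp/hypothetical_input")
       .groupBy("value").count()
       .sort("value").collect()
   )
   assert actual == expected
   ```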
   
   ### Why are the changes needed?
   
   This strengthens the test coverage, especially for the fault-tolerance semantics.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New test added. Manually ran `./python/run-tests --testnames 'pyspark.sql.tests.test_pandas_grouped_map_with_state'` 10 times and all runs succeeded.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python worker failures

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on code in PR #38008:
URL: https://github.com/apache/spark/pull/38008#discussion_r980664896


##########
python/pyspark/sql/tests/test_pandas_grouped_map_with_state.py:
##########
@@ -90,6 +107,99 @@ def check_results(batch_df, _):
         self.assertTrue(q.isActive)
         q.processAllAvailable()
 
+    def test_apply_in_pandas_with_state_python_worker_random_failure(self):
+        output_path = tempfile.mkdtemp()
+        checkpoint_loc = tempfile.mkdtemp()
+        shutil.rmtree(output_path)
+        shutil.rmtree(checkpoint_loc)
+
+        def run_query():
+            df = self.spark.readStream.format("text") \
+                .option("maxFilesPerTrigger", "1") \

Review Comment:
   This query runs 10 batches from 10 files, each of which has 100 words, so 1,000 words in total.



[GitHub] [spark] HeartSaVioR closed pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python worker failures

Posted by GitBox <gi...@apache.org>.
HeartSaVioR closed pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python worker failures
URL: https://github.com/apache/spark/pull/38008


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python worker failures

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38008:
URL: https://github.com/apache/spark/pull/38008#discussion_r980667435


##########
python/pyspark/sql/tests/test_pandas_grouped_map_with_state.py:
##########
@@ -90,6 +107,99 @@ def check_results(batch_df, _):
         self.assertTrue(q.isActive)
         q.processAllAvailable()
 
+    def test_apply_in_pandas_with_state_python_worker_random_failure(self):
+        output_path = tempfile.mkdtemp()
+        checkpoint_loc = tempfile.mkdtemp()
+        shutil.rmtree(output_path)
+        shutil.rmtree(checkpoint_loc)
+
+        def run_query():
+            df = self.spark.readStream.format("text") \
+                .option("maxFilesPerTrigger", "1") \
+                .load(self.base_path + "/random_failure/input")
+
+            for q in self.spark.streams.active:
+                q.stop()
+            self.assertTrue(df.isStreaming)
+
+            output_type = StructType(
+                [StructField("value", StringType()), StructField("count", LongType())]
+            )
+            state_type = StructType([StructField("cnt", LongType())])
+
+            def func(key, pdf_iter, state):
+                assert isinstance(state, GroupState)
+
+                # should be large enough not to trigger a kill in every batch,
+                # but small enough to still trigger kills multiple times across batches
+                if random.randrange(300) == 1:
+                    sys.exit(1)
+
+                count = state.getOption
+                if count is None:
+                    count = 0
+                else:
+                    count = count[0]
+
+                for pdf in pdf_iter:
+                    count += len(pdf)
+
+                state.update((count,))
+                yield pd.DataFrame({"value": [key[0]], "count": [count]})
+
+            q = (
+                df.groupBy(df["value"])
+                    .applyInPandasWithState(
+                    func, output_type, state_type, "Append", GroupStateTimeout.NoTimeout
+                )
+                    .writeStream.queryName("this_query")
+                    .format("json")
+                    .outputMode("append")
+                    .option("path", output_path)
+                    .option("checkpointLocation", checkpoint_loc)
+                    .start()
+            )
+
+            return q
+
+        q = run_query()
+
+        self.assertEqual(q.name, "this_query")
+        self.assertTrue(q.isActive)
+
+        # expected_output directory is constructed from the query below:
+        # spark.read.format("text").load("./input").groupBy("value").count() \
+        #     .repartition(1).sort("value").write.format("json").save("./output")
+        expected = self.spark.read.schema("value string, count int").format("json") \
+            .load(self.base_path + "/random_failure/expected_output") \
+            .sort("value").collect()
+
+        curr_time = time.time()
+        timeout = curr_time + 120  # 2 minutes

Review Comment:
   There's `eventually` available at `pyspark.testing.utils`. Can we leverage this?
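
   For reference, a rough sketch of how the hand-rolled polling loop could be replaced with it (the helper exists in `pyspark.testing.utils`, though its exact signature has varied across Spark versions; `self.spark`, `output_path` and `expected` are assumed to come from the surrounding test):

   ```python
   from pyspark.testing.utils import eventually

   def check_final_results():
       # Re-read the sink output on every attempt and compare with the expected rows.
       result = (
           self.spark.read.schema("value string, count int").format("json")
           .load(output_path)
           .sort("value").collect()
       )
       assert result == expected
       return True

   # Retry for up to 120 seconds; re-raise the last AssertionError if it never passes.
   eventually(check_final_results, timeout=120, catch_assertions=True)
   ```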



[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python worker failures

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on code in PR #38008:
URL: https://github.com/apache/spark/pull/38008#discussion_r980719058


##########
python/pyspark/sql/tests/test_pandas_grouped_map_with_state.py:
##########
@@ -90,6 +107,99 @@ def check_results(batch_df, _):
         self.assertTrue(q.isActive)
         q.processAllAvailable()
 
+    def test_apply_in_pandas_with_state_python_worker_random_failure(self):
+        output_path = tempfile.mkdtemp()
+        checkpoint_loc = tempfile.mkdtemp()
+        shutil.rmtree(output_path)
+        shutil.rmtree(checkpoint_loc)
+
+        def run_query():
+            df = self.spark.readStream.format("text") \
+                .option("maxFilesPerTrigger", "1") \
+                .load(self.base_path + "/random_failure/input")
+
+            for q in self.spark.streams.active:
+                q.stop()
+            self.assertTrue(df.isStreaming)
+
+            output_type = StructType(
+                [StructField("value", StringType()), StructField("count", LongType())]
+            )
+            state_type = StructType([StructField("cnt", LongType())])
+
+            def func(key, pdf_iter, state):
+                assert isinstance(state, GroupState)
+
+                # should be large enough not to trigger a kill in every batch,
+                # but small enough to still trigger kills multiple times across batches
+                if random.randrange(300) == 1:
+                    sys.exit(1)
+
+                count = state.getOption
+                if count is None:
+                    count = 0
+                else:
+                    count = count[0]
+
+                for pdf in pdf_iter:
+                    count += len(pdf)
+
+                state.update((count,))
+                yield pd.DataFrame({"value": [key[0]], "count": [count]})
+
+            q = (
+                df.groupBy(df["value"])
+                    .applyInPandasWithState(
+                    func, output_type, state_type, "Append", GroupStateTimeout.NoTimeout
+                )
+                    .writeStream.queryName("this_query")
+                    .format("json")
+                    .outputMode("append")
+                    .option("path", output_path)
+                    .option("checkpointLocation", checkpoint_loc)
+                    .start()
+            )
+
+            return q
+
+        q = run_query()
+
+        self.assertEqual(q.name, "this_query")
+        self.assertTrue(q.isActive)
+
+        # expected_output directory is constructed from the query below:
+        # spark.read.format("text").load("./input").groupBy("value").count() \
+        #     .repartition(1).sort("value").write.format("json").save("./output")
+        expected = self.spark.read.schema("value string, count int").format("json") \
+            .load(self.base_path + "/random_failure/expected_output") \
+            .sort("value").collect()
+
+        curr_time = time.time()
+        timeout = curr_time + 120  # 2 minutes

Review Comment:
   Thanks for the info! I didn't notice that.



[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python worker failures

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on code in PR #38008:
URL: https://github.com/apache/spark/pull/38008#discussion_r980721102


##########
python/test_support/sql/streaming/apply_in_pandas_with_state/random_failure/input/test-00000.txt:
##########
@@ -0,0 +1,100 @@
+non

Review Comment:
   I just changed both tests to create the dataset files before running the test; neither the input files nor the golden file is needed anymore.
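
   For reference, a rough sketch of the kind of in-test data generation this implies (file names, word length, and file/word counts below are assumptions, not the exact code):

   ```python
   import os
   import random
   import string
   import tempfile

   input_dir = tempfile.mkdtemp()

   # Write 10 small text files so that maxFilesPerTrigger=1 yields 10 micro-batches,
   # each file containing 100 random words.
   for file_idx in range(10):
       with open(os.path.join(input_dir, "part-%05d.txt" % file_idx), "w") as f:
           for _ in range(100):
               f.write("".join(random.choices(string.ascii_lowercase, k=5)) + "\n")

   # The expected per-word counts can then be computed on the fly with a batch query,
   # e.g. spark.read.text(input_dir).groupBy("value").count(), instead of a golden file.
   ```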



[GitHub] [spark] HeartSaVioR commented on pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python worker failures

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on PR #38008:
URL: https://github.com/apache/spark/pull/38008#issuecomment-1259178740

   https://github.com/HeartSaVioR/spark/runs/8566461025
   
   Looks like the GA build for checking the result couldn't pull the result from the forked repo. Maybe due to concurrent runs?
   Anyway, GA passed.


[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python worker failures

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on code in PR #38008:
URL: https://github.com/apache/spark/pull/38008#discussion_r980721524


##########
python/pyspark/sql/tests/test_pandas_grouped_map_with_state.py:
##########
@@ -90,6 +107,99 @@ def check_results(batch_df, _):
         self.assertTrue(q.isActive)
         q.processAllAvailable()
 
+    def test_apply_in_pandas_with_state_python_worker_random_failure(self):
+        output_path = tempfile.mkdtemp()
+        checkpoint_loc = tempfile.mkdtemp()
+        shutil.rmtree(output_path)
+        shutil.rmtree(checkpoint_loc)
+
+        def run_query():
+            df = self.spark.readStream.format("text") \
+                .option("maxFilesPerTrigger", "1") \
+                .load(self.base_path + "/random_failure/input")
+
+            for q in self.spark.streams.active:
+                q.stop()
+            self.assertTrue(df.isStreaming)
+
+            output_type = StructType(
+                [StructField("value", StringType()), StructField("count", LongType())]
+            )
+            state_type = StructType([StructField("cnt", LongType())])
+
+            def func(key, pdf_iter, state):
+                assert isinstance(state, GroupState)
+
+                # should be large enough not to trigger a kill in every batch,
+                # but small enough to still trigger kills multiple times across batches
+                if random.randrange(300) == 1:
+                    sys.exit(1)
+
+                count = state.getOption
+                if count is None:
+                    count = 0
+                else:
+                    count = count[0]
+
+                for pdf in pdf_iter:
+                    count += len(pdf)
+
+                state.update((count,))
+                yield pd.DataFrame({"value": [key[0]], "count": [count]})
+
+            q = (
+                df.groupBy(df["value"])
+                    .applyInPandasWithState(
+                    func, output_type, state_type, "Append", GroupStateTimeout.NoTimeout
+                )
+                    .writeStream.queryName("this_query")
+                    .format("json")
+                    .outputMode("append")
+                    .option("path", output_path)
+                    .option("checkpointLocation", checkpoint_loc)
+                    .start()
+            )
+
+            return q
+
+        q = run_query()
+
+        self.assertEqual(q.name, "this_query")
+        self.assertTrue(q.isActive)
+
+        # expected_output directory is constructed from the query below:
+        # spark.read.format("text").load("./input").groupBy("value").count() \
+        #     .repartition(1).sort("value").write.format("json").save("./output")
+        expected = self.spark.read.schema("value string, count int").format("json") \
+            .load(self.base_path + "/random_failure/expected_output") \
+            .sort("value").collect()

Review Comment:
   Thanks, I didn't know that was the standard - I thought it was a workaround. Good to know.



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python worker failures

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38008:
URL: https://github.com/apache/spark/pull/38008#discussion_r980666406


##########
python/test_support/sql/streaming/apply_in_pandas_with_state/random_failure/input/test-00000.txt:
##########
@@ -0,0 +1,100 @@
+non

Review Comment:
   Can we avoid adding these files? I try to avoid this in general to make the test case self-contained and more readable.



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python worker failures

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38008:
URL: https://github.com/apache/spark/pull/38008#discussion_r980719432


##########
python/pyspark/sql/tests/test_pandas_grouped_map_with_state.py:
##########
@@ -46,8 +55,27 @@
     cast(str, pandas_requirement_message or pyarrow_requirement_message),
 )
 class GroupedMapInPandasWithStateTests(ReusedSQLTestCase):
+    @classmethod
+    def conf(cls):
+        cfg = SparkConf()
+        cfg.set("spark.sql.shuffle.partitions", "5")
+        return cfg
+
+    def __init__(self, methodName="runTest"):
+        super(GroupedMapInPandasWithStateTests, self).__init__(methodName)
+        self.base_path = "python/test_support/sql/streaming/apply_in_pandas_with_state"

Review Comment:
   Seems like we don't need this anymore



[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python worker failures

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on code in PR #38008:
URL: https://github.com/apache/spark/pull/38008#discussion_r980722405


##########
python/pyspark/sql/tests/test_pandas_grouped_map_with_state.py:
##########
@@ -46,8 +55,27 @@
     cast(str, pandas_requirement_message or pyarrow_requirement_message),
 )
 class GroupedMapInPandasWithStateTests(ReusedSQLTestCase):
+    @classmethod
+    def conf(cls):
+        cfg = SparkConf()
+        cfg.set("spark.sql.shuffle.partitions", "5")
+        return cfg
+
+    def __init__(self, methodName="runTest"):
+        super(GroupedMapInPandasWithStateTests, self).__init__(methodName)
+        self.base_path = "python/test_support/sql/streaming/apply_in_pandas_with_state"

Review Comment:
   Nice find!



[GitHub] [spark] HeartSaVioR commented on pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python worker failures

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on PR #38008:
URL: https://github.com/apache/spark/pull/38008#issuecomment-1259178940

   Thanks! Merging to master.


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python worker failures

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38008:
URL: https://github.com/apache/spark/pull/38008#discussion_r980666658


##########
python/pyspark/sql/tests/test_pandas_grouped_map_with_state.py:
##########
@@ -90,6 +107,99 @@ def check_results(batch_df, _):
         self.assertTrue(q.isActive)
         q.processAllAvailable()
 
+    def test_apply_in_pandas_with_state_python_worker_random_failure(self):
+        output_path = tempfile.mkdtemp()
+        checkpoint_loc = tempfile.mkdtemp()
+        shutil.rmtree(output_path)
+        shutil.rmtree(checkpoint_loc)
+
+        def run_query():
+            df = self.spark.readStream.format("text") \
+                .option("maxFilesPerTrigger", "1") \
+                .load(self.base_path + "/random_failure/input")
+
+            for q in self.spark.streams.active:
+                q.stop()
+            self.assertTrue(df.isStreaming)
+
+            output_type = StructType(
+                [StructField("value", StringType()), StructField("count", LongType())]
+            )
+            state_type = StructType([StructField("cnt", LongType())])
+
+            def func(key, pdf_iter, state):
+                assert isinstance(state, GroupState)
+
+                # should be large enough not to trigger a kill in every batch,
+                # but small enough to still trigger kills multiple times across batches
+                if random.randrange(300) == 1:
+                    sys.exit(1)
+
+                count = state.getOption
+                if count is None:
+                    count = 0
+                else:
+                    count = count[0]
+
+                for pdf in pdf_iter:
+                    count += len(pdf)
+
+                state.update((count,))
+                yield pd.DataFrame({"value": [key[0]], "count": [count]})
+
+            q = (
+                df.groupBy(df["value"])
+                    .applyInPandasWithState(
+                    func, output_type, state_type, "Append", GroupStateTimeout.NoTimeout
+                )
+                    .writeStream.queryName("this_query")
+                    .format("json")
+                    .outputMode("append")
+                    .option("path", output_path)
+                    .option("checkpointLocation", checkpoint_loc)
+                    .start()
+            )
+
+            return q
+
+        q = run_query()
+
+        self.assertEqual(q.name, "this_query")
+        self.assertTrue(q.isActive)
+
+        # expected_output directory is constructed from the query below:
+        # spark.read.format("text").load("./input").groupBy("value").count() \
+        #     .repartition(1).sort("value").write.format("json").save("./output")
+        expected = self.spark.read.schema("value string, count int").format("json") \
+            .load(self.base_path + "/random_failure/expected_output") \
+            .sort("value").collect()

Review Comment:
   This style is recommended by PEP8 IIRC
   ```suggestion
           expected = (
               self.spark.read.schema("value string, count int").format("json")
               .load(self.base_path + "/random_failure/expected_output")
               .sort("value").collect()
           )
   ```



[GitHub] [spark] HeartSaVioR commented on pull request #38008: [SPARK-40571][SS][TESTS] Construct a new test case for applyInPandasWithState to verify fault-tolerance semantic with random python worker failures

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on PR #38008:
URL: https://github.com/apache/spark/pull/38008#issuecomment-1258874132

   cc. @HyukjinKwon @alex-balikov 

