Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/27 21:29:00 UTC

[GitHub] [spark] viirya opened a new pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

viirya opened a new pull request #30162:
URL: https://github.com/apache/spark/pull/30162


   
   ### What changes were proposed in this pull request?
   
   This patch proposes to make StateStore compression codec configurable.
   
   ### Why are the changes needed?
   
   Currently, the StateStore compression codec is not configurable and is hard-coded to lz4. It would be better to follow other Spark modules and make the StateStore compression codec configurable. For example, users could choose the zstd codec, which can in turn be configured with different compression levels.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. After this change, users can configure a different codec for StateStore.
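
A minimal sketch of how a user might select the codec, assuming the key `spark.sql.streaming.stateStore.compression.codec` from this PR's SQLConf diff and a Spark build on the classpath. The app name and local master are placeholders, and whether `spark.io.compression.zstd.level` (an existing Spark core setting) also governs StateStore files is an assumption here, not something this thread confirms.

```scala
import org.apache.spark.sql.SparkSession

object StateStoreCodecExample {
  def main(args: Array[String]): Unit = {
    // Placeholder app name and master, for illustration only.
    val spark = SparkSession.builder()
      .appName("state-store-codec-example")
      .master("local[2]")
      // Key name taken from the SQLConf change in this PR; lz4 stays the default.
      .config("spark.sql.streaming.stateStore.compression.codec", "zstd")
      // Assumption: the core zstd level setting is honored for StateStore files too.
      .config("spark.io.compression.zstd.level", "3")
      .getOrCreate()

    // Print the value the session will hand to stateful operators.
    println(spark.conf.get("spark.sql.streaming.stateStore.compression.codec"))
    spark.stop()
  }
}
```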
   
   ### How was this patch tested?
   
   Unit test. I manually set the config to different codecs and ran StateStoreSuite locally.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513980401



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCompatibilitySuite.scala
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.io.CompressionCodec
+import org.apache.spark.sql.catalyst.plans.PlanTestBase
+import org.apache.spark.sql.catalyst.streaming.InternalOutputModes.Update
+import org.apache.spark.sql.execution.streaming.MemoryStream
+import org.apache.spark.sql.functions.count
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.streaming.StreamTest
+import org.apache.spark.util.Utils
+
+class StateStoreCompatibilitySuite extends StreamTest with StateStoreCodecsTest {
+   testWithAllCodec(

Review comment:
       Thanks, @HeartSaVioR .






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718327938


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718299797








[GitHub] [spark] xuanyuanking commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-720382843


   Thanks for updating :smile:




[GitHub] [spark] HeartSaVioR commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718293911


   I'm sorry to go back and forth, but it feels odd that StateStoreCodecsTest lives in the StateStoreCompatibilitySuite.scala file. We can probably just inline StateStoreCodecsTest into StateStoreSuite.




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513831520



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala
##########
@@ -814,11 +818,53 @@ class StateStoreSuite extends StateStoreSuiteBase[HDFSBackedStateStoreProvider]
   }
 }
 
+class StateStoreCompatibleSuite extends StreamTest with StateStoreCodecsTest {

Review comment:
       Apache Spark uses `XXXCompatibilitySuite` instead of `XXXCompatibleSuite`. Could you rename this?






[GitHub] [spark] SparkQA removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718308057


   **[Test build #130390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130390/testReport)** for PR 30162 at commit [`8e435a1`](https://github.com/apache/spark/commit/8e435a1a7266b13e9cfe0072c5a808f10c41ca6b).




[GitHub] [spark] SparkQA removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717553031


   **[Test build #130341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130341/testReport)** for PR 30162 at commit [`2eb813a`](https://github.com/apache/spark/commit/2eb813a8dc11bbb6b8716a4f7cbecf1a14d1b4df).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717719546


   Merged build finished. Test FAILed.




[GitHub] [spark] viirya commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513833642



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala
##########
@@ -814,11 +818,53 @@ class StateStoreSuite extends StateStoreSuiteBase[HDFSBackedStateStoreProvider]
   }
 }
 
+class StateStoreCompatibleSuite extends StreamTest with StateStoreCodecsTest {

Review comment:
       Sure.






[GitHub] [spark] HeartSaVioR commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717745990


   retest this, please




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718285419


   Merged build finished. Test FAILed.




[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717630563


   I don't have exact numbers for StateStore, but I remember we have some numbers for switching the shuffle compression codec. @dbtsai Do you remember where the numbers are?




[GitHub] [spark] viirya edited a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya edited a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718054995


   > While we may not want to run all stateful tests with all compression, can we make sure we run basic tests against all compressions? All tests in StateStoreSuiteBase may need to be run with these compressions.
   
   Okay. I thought about letting StateStoreSuiteBase run with all compression codecs. However, I ended up manually setting the compression codec config and running it locally, because it seems we don't always run all compression codecs in related modules. I will update StateStoreSuiteBase.
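
Running the basic tests against every codec could be sketched as below. This is a plain-Scala illustration only, not the helper added in this PR: the object name, the `testWithAllCodecs` signature, and the use of a `Map` in place of SQLConf are all hypothetical. The codec list is the one named in the config doc.

```scala
object CodecTestMatrix {
  // Short names listed in the config doc of this PR.
  val codecs: Seq[String] = Seq("lz4", "lzf", "snappy", "zstd")
  val CodecKey = "spark.sql.streaming.stateStore.compression.codec"

  /** Expands one test body into one named case per codec (hypothetical helper). */
  def testWithAllCodecs(name: String)(
      body: Map[String, String] => Unit): Seq[(String, () => Unit)] =
    codecs.map { codec =>
      // Each case gets its own conf with the codec under test.
      val conf = Map(CodecKey -> codec)
      (s"$name - with codec $codec", () => body(conf))
    }
}
```

A real suite would register each generated case with the test framework and apply the conf through the session rather than passing a map.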
   
   > We also tend to add backward compatibility config test if we add relevantSQLConfs - create a checkpoint from older version and try to load in this version. You can search around the existing configs and see how we constructed the tests.
   
   Thanks. Let me check.
   
   




[GitHub] [spark] HeartSaVioR commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717687791


   @dongjoon-hyun @viirya 
   The problem isn't about changing the config during a single run. The problem is about changing the config in a new run that resumes from a checkpoint. That's why we should record the config on the first run, and afterwards always read the value from the checkpoint and ignore further modification.
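
The rule described here can be sketched in plain Scala as follows. Everything in it is hypothetical: `CheckpointMetadata`, `resolveCodec`, and the `Map`-based session conf are stand-ins rather than Spark's classes, and falling back to lz4 for a checkpoint that predates the config is an assumption.

```scala
object CheckpointPinnedConf {
  val CodecKey = "spark.sql.streaming.stateStore.compression.codec"

  // Hypothetical stand-in for the confs a checkpoint records on the first run.
  final case class CheckpointMetadata(confs: Map[String, String])

  /** Returns the codec to use and the metadata to keep in the checkpoint. */
  def resolveCodec(
      sessionConf: Map[String, String],
      checkpoint: Option[CheckpointMetadata],
      default: String = "lz4"): (String, CheckpointMetadata) = checkpoint match {
    case Some(meta) =>
      // Restart: the recorded value wins and the session value is ignored.
      // Assumption: a checkpoint without the key was written with the default.
      (meta.confs.getOrElse(CodecKey, default), meta)
    case None =>
      // First run: take the session value and record it for later runs.
      val codec = sessionConf.getOrElse(CodecKey, default)
      (codec, CheckpointMetadata(Map(CodecKey -> codec)))
  }
}
```

With this shape, a restart that sets a different codec in the session still reads existing state files with the codec that wrote them.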




[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718345319








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718228562


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34986/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717719521


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34954/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718304998








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513168650



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1324,6 +1324,16 @@ object SQLConf {
     .intConf
     .createWithDefault(2)
 
+  val STATE_STORE_COMPRESSION_CODEC =
+    buildConf("spark.sql.streaming.stateStore.compression.codec")
+      .internal()
+      .doc("The codec used to compress delta and snapshot files generated by StateStore. " +
+        "By default, Spark provides four codecs: lz4, lzf, snappy, and zstd. You can also " +
+        "use fully qualified class names to specify the codec. Default codec is lz4.")
+      .version("3.1.0")
+      .stringConf
+      .createWithDefault("lz4")

Review comment:
       This doesn't change the default value. I believe we can add the benchmark result as a follow-up.






[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718259234


   @HeartSaVioR Added checkpoint codec compatibility test.




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513829300



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1324,6 +1324,16 @@ object SQLConf {
     .intConf
     .createWithDefault(2)
 
+  val STATE_STORE_COMPRESSION_CODEC =
+    buildConf("spark.sql.streaming.stateStore.compression.codec")
+      .internal()
+      .doc("The codec used to compress delta and snapshot files generated by StateStore. " +
+        "By default, Spark provides four codecs: lz4, lzf, snappy, and zstd. You can also " +
+        "use fully qualified class names to specify the codec. Default codec is lz4.")

Review comment:
       Also, I'm wondering what happens with an invalid class name. It would be great if we could add a negative test case.
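
The behavior a negative test would pin down might look like the sketch below. It is a hypothetical validator in plain Scala, not Spark's codec loader: `CodecNameCheck` and `validate` are made-up names, and it only checks that a class name can be loaded, not that the class is actually a codec.

```scala
object CodecNameCheck {
  // Short names listed in the config doc of this PR.
  val shortNames: Set[String] = Set("lz4", "lzf", "snappy", "zstd")

  /** Accepts a known short name or a loadable class name; otherwise fails fast. */
  def validate(name: String): String = {
    val lower = name.toLowerCase(java.util.Locale.ROOT)
    if (shortNames.contains(lower)) {
      lower
    } else {
      try {
        // Treat anything else as a fully qualified class name.
        Class.forName(name)
        name
      } catch {
        case _: ClassNotFoundException =>
          throw new IllegalArgumentException(s"Codec [$name] is not available.")
      }
    }
  }
}
```

A negative test would then assert that an unknown class name surfaces as a clear error instead of failing later when state files are written.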






[GitHub] [spark] xuanyuanking commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-719120291


   Late LGTM.
   Just one small concern about the PR description: do we need to emphasize that the new config will be written to the checkpoint? I think this behavior is an important implementation detail and worth letting end users know.




[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717666167


   > either 1) make it as a configuration but prevent the value to be changed after the query starts (like we do in state store formats)
   
   Btw, out of curiosity, I checked the two state format configs `spark.sql.streaming.join.stateFormatVersion` and `spark.sql.streaming.aggregation.stateFormatVersion`. Their docs just explicitly state that `state format version shouldn't be modified after running.`, and I couldn't find related code that prevents the values from being changed after the query starts. Maybe I missed it.




[GitHub] [spark] viirya commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513831661



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1324,6 +1324,16 @@ object SQLConf {
     .intConf
     .createWithDefault(2)
 
+  val STATE_STORE_COMPRESSION_CODEC =
+    buildConf("spark.sql.streaming.stateStore.compression.codec")
+      .internal()
+      .doc("The codec used to compress delta and snapshot files generated by StateStore. " +
+        "By default, Spark provides four codecs: lz4, lzf, snappy, and zstd. You can also " +
+        "use fully qualified class names to specify the codec. Default codec is lz4.")

Review comment:
       Okay.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717584258








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718221781


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34985/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718327938








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718345319








[GitHub] [spark] SparkQA removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717692390








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718343569


   **[Test build #130386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130386/testReport)** for PR 30162 at commit [`ab69ccd`](https://github.com/apache/spark/commit/ab69ccd424d350fc1dbc1d9e7892da5cb2854094).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717719550


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34954/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717872045








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718345308


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34995/
   




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717711223


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34954/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513828434



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala
##########
@@ -814,11 +818,53 @@ class StateStoreSuite extends StateStoreSuiteBase[HDFSBackedStateStoreProvider]
   }
 }
 
+class StateStoreCompatibleSuite extends StreamTest with StateStoreCodecsTest {
+  testWithAllCodec("SPARK-33263: Recovery from checkpoint before codec config introduced") {
+    val resourceUri = this.getClass.getResource(
+      "/structured-streaming/checkpoint-version-3.0.0-streaming-statestore-codec/").toURI
+    val checkpointDir = Utils.createTempDir().getCanonicalFile
+    FileUtils.copyDirectory(new File(resourceUri), checkpointDir)
+
+    import testImplicits._
+
+    val inputData = MemoryStream[Int]
+    val aggregated = inputData.toDF().groupBy("value").agg(count("*"))
+    inputData.addData(1, 2, 3)
+
+    /*
+      Note: The checkpoint was generated using the following input in Spark version 3.0.0:
+      AddData(inputData, 1, 2, 3)
+    */

Review comment:
       nit. Shall we adjust a little?
   - https://github.com/databricks/scala-style-guide#documentation-style






[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717872045








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718358492


   **[Test build #130387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130387/testReport)** for PR 30162 at commit [`c828811`](https://github.com/apache/spark/commit/c8288111064b98107346e93c7686bbd48d5138a1).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718386994


   **[Test build #130390 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130390/testReport)** for PR 30162 at commit [`8e435a1`](https://github.com/apache/spark/commit/8e435a1a7266b13e9cfe0072c5a808f10c41ca6b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513168650



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1324,6 +1324,16 @@ object SQLConf {
     .intConf
     .createWithDefault(2)
 
+  val STATE_STORE_COMPRESSION_CODEC =
+    buildConf("spark.sql.streaming.stateStore.compression.codec")
+      .internal()
+      .doc("The codec used to compress delta and snapshot files generated by StateStore. " +
+        "By default, Spark provides four codecs: lz4, lzf, snappy, and zstd. You can also " +
+        "use fully qualified class names to specify the codec. Default codec is lz4.")
+      .version("3.1.0")
+      .stringConf
+      .createWithDefault("lz4")

Review comment:
       This doesn't change the default value. I believe we can add the benchmark result as a follow-up, @HeartSaVioR.






[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718291157


   **[Test build #130382 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130382/testReport)** for PR 30162 at commit [`a1fb543`](https://github.com/apache/spark/commit/a1fb543c8a075ac5df521cca18133443bfc43391).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718299797








[GitHub] [spark] dongjoon-hyun closed pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #30162:
URL: https://github.com/apache/spark/pull/30162


   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718221794








[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718344106








[GitHub] [spark] SparkQA removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717746808


   **[Test build #130358 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130358/testReport)** for PR 30162 at commit [`bc6f21b`](https://github.com/apache/spark/commit/bc6f21b724bb47932a4cdd1c14f40ce79f630ad9).




[GitHub] [spark] viirya commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513835420



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1324,6 +1324,16 @@ object SQLConf {
     .intConf
     .createWithDefault(2)
 
+  val STATE_STORE_COMPRESSION_CODEC =
+    buildConf("spark.sql.streaming.stateStore.compression.codec")
+      .internal()
+      .doc("The codec used to compress delta and snapshot files generated by StateStore. " +
+        "By default, Spark provides four codecs: lz4, lzf, snappy, and zstd. You can also " +
+        "use fully qualified class names to specify the codec. Default codec is lz4.")

Review comment:
       For an invalid codec name, `CompressionCodec.createCodec` will throw an `IllegalArgumentException`.
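
As a rough illustration of the behavior described here, the following is a minimal, standalone sketch of codec-name validation. It is not Spark's `CompressionCodec.createCodec`; the object `CodecNameSketch`, the method `validate`, and the error message are assumptions made for this example. Only the four short names and the exception type come from the discussion above.

```scala
object CodecNameSketch {
  // Short names listed in the config doc quoted above.
  val shortNames: Set[String] = Set("lz4", "lzf", "snappy", "zstd")

  // Hypothetical helper: returns the accepted codec name, or throws
  // IllegalArgumentException when the name is neither a known short name
  // nor a class that can be loaded.
  def validate(name: String): String = {
    val lowered = name.toLowerCase(java.util.Locale.ROOT)
    if (shortNames.contains(lowered)) {
      lowered
    } else {
      try {
        // Anything else is treated as a fully qualified class name.
        Class.forName(name)
        name
      } catch {
        case _: ClassNotFoundException =>
          throw new IllegalArgumentException(s"Codec [$name] is not available")
      }
    }
  }
}
```

With this sketch, `CodecNameSketch.validate("zstd")` is accepted, while a misspelled name such as `"zsdt"` fails fast instead of surfacing later when state files are written.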






[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718304768


   **[Test build #130385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130385/testReport)** for PR 30162 at commit [`f16c563`](https://github.com/apache/spark/commit/f16c563cddc24d9df596a7a2b690457b514c8464).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class StateStoreCompatibleSuite extends StreamTest with StateStoreCodecsTest `
     * `trait StateStoreCodecsTest extends SparkFunSuite with PlanTestBase `




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717572809


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34943/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718387807








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718260070


   **[Test build #130385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130385/testReport)** for PR 30162 at commit [`f16c563`](https://github.com/apache/spark/commit/f16c563cddc24d9df596a7a2b690457b514c8464).




[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717685222


   > https://github.com/apache/spark/blob/fcf8aa59b5025dde9b4af36953146894659967e2/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L92-L115
   > 
   > Please look into how `relevantSQLConfs` is handled in OffsetSeqMetadata.
   
   Hmm, since the default value is lz4, do we still need to add the new config here?




[GitHub] [spark] viirya commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513177486



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala
##########
@@ -89,10 +89,15 @@ case class OffsetSeqMetadata(
 
 object OffsetSeqMetadata extends Logging {
   private implicit val format = Serialization.formats(NoTypeHints)
+  /**
+   * These configs are related to streaming query execution and should not be changed across
+   * batches of a streaming query. The values of these configs are persisted into the offset
+   * log in the checkpoint position.
+   */

Review comment:
       Added some comments to notify readers.
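
To make the idea in that doc comment concrete, here is a small standalone sketch of how a persisted config value could take precedence over the session value when a query restarts. It does not reproduce the actual `OffsetSeqMetadata` code; `OffsetConfSketch` and `effectiveValue` are hypothetical names, and plain `Map`s stand in for the offset log and the session conf.

```scala
object OffsetConfSketch {
  val codecKey = "spark.sql.streaming.stateStore.compression.codec"

  // Hypothetical helper: decide which value a query run should use.
  //  - checkpointConf = None       : brand-new query, the session value applies.
  //  - checkpointConf = Some(conf) : restarted query, the persisted value wins;
  //    a checkpoint written before the key existed falls back to the default.
  def effectiveValue(
      key: String,
      checkpointConf: Option[Map[String, String]],
      sessionConf: Map[String, String],
      default: String): String = {
    checkpointConf match {
      case None => sessionConf.getOrElse(key, default)
      case Some(persisted) => persisted.getOrElse(key, default)
    }
  }
}
```

Under these assumptions, a user who sets the codec to `zstd` in the session and restarts a query whose checkpoint predates the config would still read the old state with `lz4`, which is the case the backward-compatibility discussion in this thread is concerned with.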






[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718337953


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34995/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718228556








[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717609418


   > How about we follow `spark.eventLog.compression.codec`,
   > 
   > ```
   > The codec to compress logged events. If this is not given, spark.io.compression.codec will be used.
   > ```
   > 
   > so if `spark.sql.streaming.stateStore.compression.codec` is not set, we use `spark.io.compression.codec`.
   
   Basically, the new `spark.sql.streaming.stateStore.compression.codec` defaults to lz4 for backward compatibility. If we let it follow `spark.io.compression.codec`, it might unintentionally be changed to another codec. If users then read an old state store, it will cause problems.
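
The difference between the two approaches can be sketched with plain `Map` lookups. This is only an illustration of the argument, not code from the PR; `CodecDefaultSketch`, `withFixedDefault`, and `withIoFallback` are hypothetical names.

```scala
object CodecDefaultSketch {
  val stateStoreKey = "spark.sql.streaming.stateStore.compression.codec"
  val ioKey = "spark.io.compression.codec"

  // Approach taken in this PR, as described above: a fixed lz4 default.
  def withFixedDefault(conf: Map[String, String]): String =
    conf.getOrElse(stateStoreKey, "lz4")

  // Suggested alternative: fall back to spark.io.compression.codec when unset.
  def withIoFallback(conf: Map[String, String]): String =
    conf.get(stateStoreKey).orElse(conf.get(ioKey)).getOrElse("lz4")
}
```

For a job that only sets `spark.io.compression.codec` to `zstd`, the fixed default keeps resolving to `lz4`, while the fallback variant resolves to `zstd` and would no longer match state files written earlier with `lz4`.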




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717776535








[GitHub] [spark] HeartSaVioR commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717885470


   While we may not want to run all stateful tests with every compression codec, can we make sure we run the basic tests against all codecs? All tests in StateStoreSuiteBase may need to be run with these codecs.
   
   We also tend to add a backward-compatibility test when we add a config to `relevantSQLConfs`: create a checkpoint with an older version and try to load it in this version. You can search for the existing configs to see how we constructed those tests.
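
   As a rough sketch of what running a basic test against all codecs could look like, the helper below runs one test body per codec name. It deliberately avoids ScalaTest and Spark so it stays self-contained; `AllCodecsSketch` and `withAllCodecs` are hypothetical names, and the list is assumed to be the short codec names Spark accepts for `spark.io.compression.codec`.

```scala
// Hypothetical sketch: no ScalaTest or Spark dependency; names are illustrative.
object AllCodecsSketch {
  // Assumed short codec names (lz4, lzf, snappy, zstd).
  val codecNames: Seq[String] = Seq("lz4", "lzf", "snappy", "zstd")

  // Runs `body` once per codec and returns each run's label and result.
  def withAllCodecs[T](name: String)(body: String => T): Seq[(String, T)] =
    codecNames.map { codec => (s"$name - with codec $codec", body(codec)) }
}
```

   In a real suite the body would set the codec config and exercise the state store; here it only receives the codec name.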




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718359146








[GitHub] [spark] dbtsai commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
dbtsai commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717577793


   How about we follow `spark.eventLog.compression.codec`,
   ```
   The codec to compress logged events. If this is not given, spark.io.compression.codec will be used.
   ```
   so if `spark.sql.streaming.stateStore.compression.codec` is not set, we use `spark.io.compression.codec`.




[GitHub] [spark] SparkQA removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718187728


   **[Test build #130382 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130382/testReport)** for PR 30162 at commit [`a1fb543`](https://github.com/apache/spark/commit/a1fb543c8a075ac5df521cca18133443bfc43391).




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717746808


   **[Test build #130358 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130358/testReport)** for PR 30162 at commit [`bc6f21b`](https://github.com/apache/spark/commit/bc6f21b724bb47932a4cdd1c14f40ce79f630ad9).




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718262767


   **[Test build #130386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130386/testReport)** for PR 30162 at commit [`ab69ccd`](https://github.com/apache/spark/commit/ab69ccd424d350fc1dbc1d9e7892da5cb2854094).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718407316


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130392/
   Test FAILed.




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513885144



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCompatibilitySuite.scala
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.io.CompressionCodec
+import org.apache.spark.sql.catalyst.plans.PlanTestBase
+import org.apache.spark.sql.catalyst.streaming.InternalOutputModes.Update
+import org.apache.spark.sql.execution.streaming.MemoryStream
+import org.apache.spark.sql.functions.count
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.streaming.StreamTest
+import org.apache.spark.util.Utils
+
+class StateStoreCompatibleSuite extends StreamTest with StateStoreCodecsTest {

Review comment:
       `StateStoreCompatibleSuite` -> `StateStoreCompatibilitySuite`.






[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717553031


   **[Test build #130341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130341/testReport)** for PR 30162 at commit [`2eb813a`](https://github.com/apache/spark/commit/2eb813a8dc11bbb6b8716a4f7cbecf1a14d1b4df).




[GitHub] [spark] SparkQA removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718262767


   **[Test build #130386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130386/testReport)** for PR 30162 at commit [`ab69ccd`](https://github.com/apache/spark/commit/ab69ccd424d350fc1dbc1d9e7892da5cb2854094).




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718285586


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34989/
   




[GitHub] [spark] SparkQA removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718281337


   **[Test build #130387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130387/testReport)** for PR 30162 at commit [`c828811`](https://github.com/apache/spark/commit/c8288111064b98107346e93c7686bbd48d5138a1).




[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718221794








[GitHub] [spark] HeartSaVioR commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513903355



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCompatibilitySuite.scala
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.io.CompressionCodec
+import org.apache.spark.sql.catalyst.plans.PlanTestBase
+import org.apache.spark.sql.catalyst.streaming.InternalOutputModes.Update
+import org.apache.spark.sql.execution.streaming.MemoryStream
+import org.apache.spark.sql.functions.count
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.streaming.StreamTest
+import org.apache.spark.util.Utils
+
+class StateStoreCompatibilitySuite extends StreamTest with StateStoreCodecsTest {
+   testWithAllCodec(

Review comment:
       I didn't know we apply the all-codec test here as well. If so, it becomes a bit ambiguous where the best place for `StateStoreCodecsTest` is. Any of these options would be fine: 1) here, 2) StateStoreSuite, or 3) even a new file.






[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718327928


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34993/
   




[GitHub] [spark] viirya edited a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya edited a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717630563


   I don't have exact numbers for StateStore, but I remember we have some numbers for switching the shuffle compression codec. @dbtsai Do you remember where the numbers are?
   
   At the least, the bottom line is that we should not rely on a single codec for storage without any alternative. For example, if the lz4 codec has an unexpected issue, we should be able to switch to another codec. For now, users have no other option.




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717744125








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717753995


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718281337


   **[Test build #130387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130387/testReport)** for PR 30162 at commit [`c828811`](https://github.com/apache/spark/commit/c8288111064b98107346e93c7686bbd48d5138a1).




[GitHub] [spark] HeartSaVioR commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513180738



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala
##########
@@ -89,10 +89,15 @@ case class OffsetSeqMetadata(
 
 object OffsetSeqMetadata extends Logging {
   private implicit val format = Serialization.formats(NoTypeHints)
+  /**
+   * These configs are related to streaming query execution and should not be changed across
+   * batches of a streaming query. The values of these configs are persisted into the offset
+   * log in the checkpoint position.
+   */
   private val relevantSQLConfs = Seq(
     SHUFFLE_PARTITIONS, STATE_STORE_PROVIDER_CLASS, STREAMING_MULTIPLE_WATERMARK_POLICY,
     FLATMAPGROUPSWITHSTATE_STATE_FORMAT_VERSION, STREAMING_AGGREGATION_STATE_FORMAT_VERSION,
-    STREAMING_JOIN_STATE_FORMAT_VERSION)
+    STREAMING_JOIN_STATE_FORMAT_VERSION, STATE_STORE_COMPRESSION_CODEC)

Review comment:
       I'd also recommend adding the default value ("lz4") to `relevantSQLConfDefaultValues` to make sure we don't make any mistakes. (That's a sort of defensive programming, though.)
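
       A minimal sketch of the defensive default described above, assuming a lookup order of persisted value, then pinned default, then session value. Plain maps stand in for the conf persisted in the offset log and for the session conf; `RelevantConfDefaults` and `effectiveValue` are hypothetical names, not the actual `OffsetSeqMetadata` code.

```scala
// Hypothetical sketch: plain Maps stand in for the conf values persisted in
// the offset log and for the current session conf.
object RelevantConfDefaults {
  val CodecKey = "spark.sql.streaming.stateStore.compression.codec"

  // Value to assume when a checkpoint written by an older version lacks the key.
  val defaultValues: Map[String, String] = Map(CodecKey -> "lz4")

  // Prefer the persisted value, then the pinned default, then the session value.
  def effectiveValue(
      key: String,
      persisted: Map[String, String],
      session: Map[String, String]): Option[String] =
    persisted.get(key).orElse(defaultValues.get(key)).orElse(session.get(key))
}
```

       Under this assumption, an old checkpoint without the codec key resolves to lz4 even if the current session sets a different codec.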






[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717776514


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34961/
   




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718285407


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34988/
   




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717706418


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34953/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718296486








[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718291708








[GitHub] [spark] HeartSaVioR commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-719046322


   (I'd appreciate it if we could hold the PR for several days to get more eyes on it, if reviewers request that. This doesn't mean we need to take this back; it's just my 2 cents for further reviewing and merging.)




[GitHub] [spark] SparkQA removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718191772


   **[Test build #130383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130383/testReport)** for PR 30162 at commit [`92b0bee`](https://github.com/apache/spark/commit/92b0beefcce229acaa93e3d76b2b7ffa52ae0369).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718407308


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718191772


   **[Test build #130383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130383/testReport)** for PR 30162 at commit [`92b0bee`](https://github.com/apache/spark/commit/92b0beefcce229acaa93e3d76b2b7ffa52ae0369).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717714910


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717649431








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718291708








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717753971


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34959/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718407308








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718406589


   **[Test build #130392 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130392/testReport)** for PR 30162 at commit [`4dc153d`](https://github.com/apache/spark/commit/4dc153db5eeea904e0c2208726cbd41bd514b62f).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class StateStoreCompatibilitySuite extends StreamTest with StateStoreCodecsTest `




[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718305957


   > I'm sorry to try back and forth, but I feel it weird that StateStoreCodecsTest belongs to StateStoreCompatibilitySuite.scala file. Probably we can just inline StateStoreCodecsTest to StateStoreSuite.
   
   Inlining `StateStoreCodecsTest` into `StateStoreSuite` means `StateStoreCompatibleSuite` would need to extend `StateStoreSuite`. That would run the tests in `StateStoreSuite` a second time, and `StateStoreCompatibleSuite` would also need to provide implementations for `newStoreProvider`, `newStoreProvider` and `getLatestData`.
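
The duplication concern can be illustrated with a small, framework-free sketch. Everything here is hypothetical and only stands in for the real suites: `MiniSuite` mimics a test suite by recording registered test names, `CodecsTest` plays the role of a shared helper trait, and the codec names are illustrative.

```scala
// Hypothetical stand-in for a test suite: it only records which tests
// were registered, it does not run them.
trait MiniSuite {
  private val registered = scala.collection.mutable.ArrayBuffer.empty[String]
  def test(name: String)(body: => Unit): Unit = { registered += name; () }
  def testNames: Seq[String] = registered.toSeq
}

// Shared helper kept in its own trait so any suite can mix it in.
trait CodecsTest { self: MiniSuite =>
  // Illustrative codec short names, not read from any real configuration.
  val codecs: Seq[String] = Seq("lz4", "snappy", "zstd")

  // Registers one test per codec.
  def testWithAllCodecs(name: String)(body: String => Unit): Unit =
    codecs.foreach { codec => test(s"$name - with codec $codec")(body(codec)) }
}

// Plays the role of the main store suite.
class StoreSuite extends MiniSuite with CodecsTest {
  test("get, put, remove") {}
  testWithAllCodecs("snapshotting") { _ => }
}

// Option A: mix in only the helper trait; nothing else is inherited.
class CompatSuiteViaTrait extends MiniSuite with CodecsTest {
  testWithAllCodecs("read old checkpoint") { _ => }
}

// Option B: extend the full suite to reach the helper; all of StoreSuite's
// tests get registered again on this class.
class CompatSuiteViaInheritance extends StoreSuite {
  testWithAllCodecs("read old checkpoint") { _ => }
}
```

With option A the compatibility suite registers only its own three tests. With option B it also carries the four tests from the parent, which is the duplicate run mentioned above.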
   




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718291463


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34990/
   




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718278737


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34989/
   




[GitHub] [spark] dongjoon-hyun commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717749023


   Got it. Thanks, @HeartSaVioR !




[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717689524


   @HeartSaVioR Makes sense. Looks like we need to put the config into `relevantSQLConfs`.




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717584235


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34943/
   




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717871064


   **[Test build #130358 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130358/testReport)** for PR 30162 at commit [`bc6f21b`](https://github.com/apache/spark/commit/bc6f21b724bb47932a4cdd1c14f40ce79f630ad9).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718326137


   **[Test build #130392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130392/testReport)** for PR 30162 at commit [`4dc153d`](https://github.com/apache/spark/commit/4dc153db5eeea904e0c2208726cbd41bd514b62f).




[GitHub] [spark] HyukjinKwon commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717626162


   cc @HeartSaVioR and @xuanyuanking FYI




[GitHub] [spark] SparkQA removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718260070


   **[Test build #130385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130385/testReport)** for PR 30162 at commit [`f16c563`](https://github.com/apache/spark/commit/f16c563cddc24d9df596a7a2b690457b514c8464).




[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717630958


   > That said, you'll need to either 1) make it as a configuration but prevent the value to be changed after the query starts (like we do in state store formats) or 2) add the information to the separate metadata file and let state store read it. Probably even for the case of 2) you'll want to prevent the compression codec to be changed across the lifetime of the query - if you allow arbitrary changes of the compression codec across batches, these information should be written and referenced which would become non-trivial overhead.
   
   This is a good point. I'd like to keep it simple, so we won't allow the value to be changed after the query starts.
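
A minimal sketch of what "not allowing the value to change after the query starts" could look like, using plain maps rather than Spark's actual classes. The conf key, the default value, `CheckpointMetadata` and `resolve` are all hypothetical names chosen for illustration; this is not the PR's implementation.

```scala
object PinnedConfs {
  // Hypothetical conf key and default, used only in this sketch.
  val CodecKey = "stateStore.compression.codec"
  val defaults: Map[String, String] = Map(CodecKey -> "lz4")

  // Confs whose values must stay fixed for the lifetime of a query.
  val pinnedKeys: Set[String] = Set(CodecKey)

  // Stand-in for the metadata a query persists in its checkpoint on first start.
  final case class CheckpointMetadata(pinnedConfs: Map[String, String])

  // Returns the effective confs for this run plus the metadata to keep.
  def resolve(
      sessionConfs: Map[String, String],
      checkpoint: Option[CheckpointMetadata]): (Map[String, String], CheckpointMetadata) =
    checkpoint match {
      case None =>
        // First start: take the session values (or defaults) and record them.
        val effective = defaults ++ sessionConfs
        val pinned = effective.filter { case (k, _) => pinnedKeys.contains(k) }
        (effective, CheckpointMetadata(pinned))
      case Some(meta) =>
        // Restart: recorded values win over whatever the session says now.
        (defaults ++ sessionConfs ++ meta.pinnedConfs, meta)
    }
}
```

On a restart with a different session value, the recorded codec is still the one used, so existing state files stay readable without storing the codec per batch.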




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718228556


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717766382


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34961/
   




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718218268


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34986/
   




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718308057


   **[Test build #130390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130390/testReport)** for PR 30162 at commit [`8e435a1`](https://github.com/apache/spark/commit/8e435a1a7266b13e9cfe0072c5a808f10c41ca6b).




[GitHub] [spark] viirya commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513897382



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCompatibilitySuite.scala
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.io.CompressionCodec
+import org.apache.spark.sql.catalyst.plans.PlanTestBase
+import org.apache.spark.sql.catalyst.streaming.InternalOutputModes.Update
+import org.apache.spark.sql.execution.streaming.MemoryStream
+import org.apache.spark.sql.functions.count
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.streaming.StreamTest
+import org.apache.spark.util.Utils
+
+class StateStoreCompatibleSuite extends StreamTest with StateStoreCodecsTest {

Review comment:
       Oops!






[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718285593








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717744954


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34959/
   




[GitHub] [spark] dongjoon-hyun commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-719071979


   Sure, @HeartSaVioR !




[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-719122190


   @xuanyuanking Thanks. I will update the PR description.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717754004


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34959/
   Test FAILed.




[GitHub] [spark] viirya edited a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya edited a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-719122190


   @xuanyuanking Thanks. I updated the PR description.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718285425








[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717776535








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718387807








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718326137


   **[Test build #130392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130392/testReport)** for PR 30162 at commit [`4dc153d`](https://github.com/apache/spark/commit/4dc153db5eeea904e0c2208726cbd41bd514b62f).


[GitHub] [spark] HeartSaVioR commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717625880


   At the very least, we shouldn't rely on the configuration when reading the file. This case differs from others such as event log compression: an event log file carries a suffix naming its compression codec, so a reader can decompress the file correctly regardless of the configuration.
   
   What makes this awkward is that, if I remember correctly, the HDFS state store sometimes needs to know the exact file name so it can read a file without listing the directory. That prevents us from adding a codec suffix to the file name.
   
   That said, you'll need to either 1) keep it as a configuration but prevent the value from being changed after the query starts (as we do for the state store format versions), or 2) write the codec to a separate metadata file and let the state store read it. Even with 2), you'll probably want to prevent the codec from changing during the lifetime of the query: if you allow arbitrary codec changes across batches, that information has to be written and referenced, which would add non-trivial overhead.
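   
   Option 1) could be sketched roughly as below. This is a minimal, self-contained model and not Spark's actual code: `PinnedConfSketch`, `CheckpointMetadata`, `pinConf` and `restoreConf` are hypothetical names, and the checkpoint is modelled as an in-memory map. The only idea it illustrates is that the value recorded when the query first starts wins over whatever the session says on restart.

```scala
// Hypothetical model: pin a config value in checkpoint metadata when the query
// first starts, and let the pinned value win on every later restart.
object PinnedConfSketch {
  val CodecKey = "spark.sql.streaming.stateStore.compression.codec"

  // Stand-in for the metadata persisted with the first batch of a query.
  final case class CheckpointMetadata(conf: Map[String, String])

  // First start: record the session value of each tracked key, if it is set.
  def pinConf(session: Map[String, String], keys: Seq[String]): CheckpointMetadata =
    CheckpointMetadata(keys.flatMap(k => session.get(k).map(v => k -> v)).toMap)

  // Restart: values recorded in the checkpoint override the session values.
  def restoreConf(
      session: Map[String, String],
      meta: CheckpointMetadata): Map[String, String] =
    session ++ meta.conf
}
```

   With this shape, a query started with `zstd` keeps reading and writing `zstd` even if the session is later reconfigured to `snappy`.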


[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717684847


   > @viirya and @HeartSaVioR .
   > Shall we put the new config into `StaticSQLConf.scala` instead of `SQLConf.scala`? I think that is enough.
   
   Yes, that is also what I think. Will update later.


[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718321187


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34993/
   


[GitHub] [spark] HeartSaVioR commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717626419


   Before that, do you have numbers from any experiments? Some measurements would help us judge how much this helps.


[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718295797


   **[Test build #130383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130383/testReport)** for PR 30162 at commit [`92b0bee`](https://github.com/apache/spark/commit/92b0beefcce229acaa93e3d76b2b7ffa52ae0369).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717744322


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130351/
   Test FAILed.


[GitHub] [spark] HeartSaVioR commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513903355



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCompatibilitySuite.scala
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.io.CompressionCodec
+import org.apache.spark.sql.catalyst.plans.PlanTestBase
+import org.apache.spark.sql.catalyst.streaming.InternalOutputModes.Update
+import org.apache.spark.sql.execution.streaming.MemoryStream
+import org.apache.spark.sql.functions.count
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.streaming.StreamTest
+import org.apache.spark.util.Utils
+
+class StateStoreCompatibilitySuite extends StreamTest with StateStoreCodecsTest {
+   testWithAllCodec(

Review comment:
       I didn't realize we run the all-codecs test here as well (I guess this case differs from the existing one because the default value of the configuration isn't changed).
   
   Given that, it's a bit ambiguous where StateStoreCodecsTest belongs. Any of these options would be fine: 1) here, 2) StateStoreSuite, or 3) a new file.




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513829456



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala
##########
@@ -111,7 +116,9 @@ object OffsetSeqMetadata extends Logging {
     STREAMING_AGGREGATION_STATE_FORMAT_VERSION.key ->
       StreamingAggregationStateManager.legacyVersion.toString,
     STREAMING_JOIN_STATE_FORMAT_VERSION.key ->
-      SymmetricHashJoinStateManager.legacyVersion.toString
+      SymmetricHashJoinStateManager.legacyVersion.toString,
+    STATE_STORE_COMPRESSION_CODEC.key -> "lz4"
+

Review comment:
       nit, redundant empty line.




[GitHub] [spark] viirya commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513203243



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala
##########
@@ -89,10 +89,15 @@ case class OffsetSeqMetadata(
 
 object OffsetSeqMetadata extends Logging {
   private implicit val format = Serialization.formats(NoTypeHints)
+  /**
+   * These configs are related to streaming query execution and should not be changed across
+   * batches of a streaming query. The values of these configs are persisted into the offset
+   * log in the checkpoint position.
+   */
   private val relevantSQLConfs = Seq(
     SHUFFLE_PARTITIONS, STATE_STORE_PROVIDER_CLASS, STREAMING_MULTIPLE_WATERMARK_POLICY,
     FLATMAPGROUPSWITHSTATE_STATE_FORMAT_VERSION, STREAMING_AGGREGATION_STATE_FORMAT_VERSION,
-    STREAMING_JOIN_STATE_FORMAT_VERSION)
+    STREAMING_JOIN_STATE_FORMAT_VERSION, STATE_STORE_COMPRESSION_CODEC)

Review comment:
       This seems like a good safeguard in case the default value is changed unintentionally in the future.
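   
   A rough sketch of how a pinned default guards against a later change, assuming a checkpoint written before the config existed simply has no entry for it. `LegacyDefaultSketch`, `legacyDefaults` and `resolve` are hypothetical names for illustration; the lookup order is an assumption, not a copy of what `OffsetSeqMetadata` does.

```scala
// Hypothetical model of resolving a tracked config for an existing checkpoint.
object LegacyDefaultSketch {
  val CodecKey = "spark.sql.streaming.stateStore.compression.codec"

  // Value assumed for checkpoints written before the config was tracked.
  val legacyDefaults: Map[String, String] = Map(CodecKey -> "lz4")

  // Lookup order: value recorded in the checkpoint, then the pinned legacy
  // default, and only then the current session value.
  def resolve(
      key: String,
      checkpoint: Map[String, String],
      session: Map[String, String]): Option[String] =
    checkpoint.get(key).orElse(legacyDefaults.get(key)).orElse(session.get(key))
}
```

   Because the legacy value is spelled out rather than read from the config's current default, changing that default later would not affect how old checkpoints are read.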




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513828975



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1324,6 +1324,16 @@ object SQLConf {
     .intConf
     .createWithDefault(2)
 
+  val STATE_STORE_COMPRESSION_CODEC =
+    buildConf("spark.sql.streaming.stateStore.compression.codec")
+      .internal()
+      .doc("The codec used to compress delta and snapshot files generated by StateStore. " +
+        "By default, Spark provides four codecs: lz4, lzf, snappy, and zstd. You can also " +
+        "use fully qualified class names to specify the codec. Default codec is lz4.")

Review comment:
       Shall we add a test case for the `fully qualified class names` claim? The new test case seems to always use `getShortName`.
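   
   To make the two forms concrete, here is a minimal sketch of short-name versus class-name resolution. `CodecNameSketch` and `resolve` are hypothetical and only mimic the behaviour described in the doc string; the class names are the codec classes in `org.apache.spark.io` as I understand them, and nothing here instantiates a codec.

```scala
import java.util.Locale

// Hypothetical stand-in for codec name resolution; not Spark's implementation.
object CodecNameSketch {
  // Short names listed in the config doc, mapped to codec class names.
  val shortNames: Map[String, String] = Map(
    "lz4" -> "org.apache.spark.io.LZ4CompressionCodec",
    "lzf" -> "org.apache.spark.io.LZFCompressionCodec",
    "snappy" -> "org.apache.spark.io.SnappyCompressionCodec",
    "zstd" -> "org.apache.spark.io.ZStdCompressionCodec")

  // A known short name maps to its class; anything else is taken to be a
  // fully qualified class name and passed through unchanged.
  def resolve(name: String): String =
    shortNames.getOrElse(name.toLowerCase(Locale.ROOT), name)
}
```

   A test for the doc's claim would then set the config to the class-name form and check that it behaves the same as the corresponding short name.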




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718327943


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34993/
   Test FAILed.


[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717551134


   cc @dbtsai @dongjoon-hyun 


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718304998


   Merged build finished. Test FAILed.


[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717724561


   **[Test build #130356 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130356/testReport)** for PR 30162 at commit [`bc6f21b`](https://github.com/apache/spark/commit/bc6f21b724bb47932a4cdd1c14f40ce79f630ad9).


[GitHub] [spark] dongjoon-hyun commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717682587


   @viirya and @HeartSaVioR .
   Shall we put the new config into `StaticSQLConf.scala` instead of `SQLConf.scala`? I think that is enough.


[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717648777


   **[Test build #130341 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130341/testReport)** for PR 30162 at commit [`2eb813a`](https://github.com/apache/spark/commit/2eb813a8dc11bbb6b8716a4f7cbecf1a14d1b4df).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718305004


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130385/
   Test FAILed.


[GitHub] [spark] viirya edited a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya edited a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718305957


   > I'm sorry to try back and forth, but I feel it weird that StateStoreCodecsTest belongs to StateStoreCompatibilitySuite.scala file. Probably we can just inline StateStoreCodecsTest to StateStoreSuite.
   
   Inlining `StateStoreCodecsTest` into `StateStoreSuite` means `StateStoreCompatibleSuite` would need to extend `StateStoreSuite`. That would run the tests in `StateStoreSuite` a second time, and `StateStoreCompatibleSuite` would also have to implement `newStoreProvider`, `newStoreProvider` and `getLatestData`, none of which it uses.
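   
   The mix-in shape being argued for can be sketched without any test framework. Everything below is hypothetical: `TinySuite` stands in for the real suite base class, and `CodecsTestSketch` only imitates what a helper like `testWithAllCodec` might do. The point is that a suite mixing in the helper trait gets one test per codec and inherits no other tests.

```scala
import scala.collection.mutable.ArrayBuffer

// Minimal stand-in for a test framework: it only records named tests.
trait TinySuite {
  val registered = ArrayBuffer.empty[(String, () => Any)]
  def test(name: String)(body: => Any): Unit =
    registered += (name -> (() => body))
}

// Mix-in that adds a per-codec helper and defines no tests of its own.
trait CodecsTestSketch extends TinySuite {
  protected val codecs: Seq[String] = Seq("lz4", "lzf", "snappy", "zstd")

  // Registers one test per codec, passing the codec name to the body.
  protected def testWithAllCodec(name: String)(body: String => Any): Unit =
    codecs.foreach(c => test(s"$name - with codec $c")(body(c)))
}

// A suite that mixes in the helper registers exactly its own tests.
class CompatibilitySuiteSketch extends CodecsTestSketch {
  testWithAllCodec("restore from checkpoint")(codec => codec.length)
}
```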
   


[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718277542


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34988/
   


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513831520



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala
##########
@@ -814,11 +818,53 @@ class StateStoreSuite extends StateStoreSuiteBase[HDFSBackedStateStoreProvider]
   }
 }
 
+class StateStoreCompatibleSuite extends StreamTest with StateStoreCodecsTest {

Review comment:
       Apache Spark uses `XXXCompatibilitySuite` instead of `XXXCompatibleSuite`. Could you rename this?






[GitHub] [spark] HeartSaVioR commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717683158


   https://github.com/apache/spark/blob/fcf8aa59b5025dde9b4af36953146894659967e2/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L92-L115
   
   Please look into how `relevantSQLConfs` is handled in OffsetSeqMetadata.
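
As a rough illustration of the general idea (a conf recorded in the checkpoint takes precedence over the session value on restart, with a fallback for checkpoints written before the conf existed), here is a simplified model in plain Scala. It does not reproduce the `OffsetSeqMetadata` code linked above; the object name, the conf key, and the default value are all assumptions made for this sketch.

```scala
// Simplified model of restoring query-relevant confs from checkpoint metadata.
// All names and values are hypothetical; this is not Spark's implementation.
object RelevantConfRestore {
  // Conf keys whose values should stay fixed across restarts of one query.
  val relevantConfs: Seq[String] = Seq("example.compression.codec")

  // Values assumed for checkpoints written before a key was recorded.
  val legacyDefaults: Map[String, String] =
    Map("example.compression.codec" -> "lz4")

  /** Returns the session conf to use when restarting from `checkpointConf`. */
  def restore(
      sessionConf: Map[String, String],
      checkpointConf: Map[String, String]): Map[String, String] = {
    relevantConfs.foldLeft(sessionConf) { (conf, key) =>
      checkpointConf.get(key).orElse(legacyDefaults.get(key)) match {
        // The checkpointed value (or the legacy default) overrides the session value.
        case Some(value) => conf.updated(key, value)
        // Nothing recorded and no legacy default: keep the session value.
        case None => conf
      }
    }
  }
}
```

With this model, a query restarted with a different session value keeps using the codec it was started with, and a checkpoint that predates the conf falls back to the legacy default.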




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717714903


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34953/
   




[GitHub] [spark] HeartSaVioR edited a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717626419


   Before that, do you have numbers based on some experiments? It'd be nice if we get some numbers to determine how much it will help.




[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-719077722


   Thanks @HeartSaVioR @dongjoon-hyun 




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717692390


   **[Test build #130351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130351/testReport)** for PR 30162 at commit [`86f0924`](https://github.com/apache/spark/commit/86f09243816112112addb8cf18c97118a983f434).




[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718054995


   > While we may not want to run all stateful tests with all compression, can we make sure we run basic tests against all compressions? All tests in StateStoreSuiteBase may need to be run with these compressions.
   
   Okay. I thought about letting StateStoreSuiteBase run with all compression codecs, but ended up setting the compression codec config manually and running it locally. I will update StateStoreSuiteBase.
   
   > We also tend to add a backward compatibility config test when we add to relevantSQLConfs: create a checkpoint from an older version and try to load it in this version. You can search around the existing configs and see how we constructed the tests.
   
   Thanks. Let me check.
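
For reference, running the same basic check against every codec could look roughly like the sketch below. It uses JDK gzip and deflate streams as stand-ins so that it stays self-contained; the `Codec` case class, the registry, and the helper names are assumptions for illustration and do not correspond to Spark's codec classes or to `StateStoreSuiteBase`.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}
import java.util.zip.{DeflaterOutputStream, GZIPInputStream, GZIPOutputStream, InflaterInputStream}

// JDK codecs are used as stand-ins; Spark's own codecs are not referenced here.
object CodecRoundTrip {
  // Hypothetical codec description: a name plus stream wrappers for both directions.
  final case class Codec(
      name: String,
      compress: OutputStream => OutputStream,
      decompress: InputStream => InputStream)

  val codecs: Seq[Codec] = Seq(
    Codec("gzip", new GZIPOutputStream(_), new GZIPInputStream(_)),
    Codec("deflate", new DeflaterOutputStream(_), new InflaterInputStream(_))
  )

  /** Compresses `data` with `codec`, decompresses it again, and returns the result. */
  def roundTrip(codec: Codec, data: Array[Byte]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = codec.compress(buffer)
    out.write(data)
    out.close() // finishes the compressed stream and flushes it into `buffer`

    val in = codec.decompress(new ByteArrayInputStream(buffer.toByteArray))
    val result = new ByteArrayOutputStream()
    val chunk = new Array[Byte](4096)
    var n = in.read(chunk)
    while (n != -1) {
      result.write(chunk, 0, n)
      n = in.read(chunk)
    }
    in.close()
    result.toByteArray
  }

  /** Runs the same round-trip check for every registered codec; returns the failures. */
  def failingCodecs(data: Array[Byte]): Seq[String] =
    codecs.filterNot(c => roundTrip(c, data).sameElements(data)).map(_.name)
}
```

A test base class could loop over `codecs` in the same way and register one test per codec, so that adding a codec to the list extends the coverage without new test code.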
   
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717714917


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34953/




[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717744297








[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718228540


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34986/
   




[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718299791


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34990/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-718344106





