Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/23 05:24:01 UTC

[GitHub] [spark] dongjoon-hyun opened a new pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

dongjoon-hyun opened a new pull request #31618:
URL: https://github.com/apache/spark/pull/31618


   ### What changes were proposed in this pull request?
   
   This PR aims to set `zstd` as the default value for the `spark.eventLog.compression.codec` configuration.
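
A minimal sketch of how a job could either rely on the default or pin a codec explicitly. The helper name `event_log_conf_args` is hypothetical, and the sketch assumes that `spark.eventLog.enabled` and `spark.eventLog.compress` (neither of which is discussed in this thread) are also set so that event logs are written and compressed at all.

```python
def event_log_conf_args(codec=None):
    """Build spark-submit --conf arguments for compressed event logging.

    codec=None leaves spark.eventLog.compression.codec unset, so whatever
    default the Spark version ships with applies. Pass a codec name such
    as "lz4" or "zstd" to pin it explicitly.
    """
    # Assumption: these two keys are needed for compressed event logs.
    conf = {
        "spark.eventLog.enabled": "true",
        "spark.eventLog.compress": "true",
    }
    if codec is not None:
        conf["spark.eventLog.compression.codec"] = codec

    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Rely on the default codec.
    print(" ".join(event_log_conf_args()))
    # Pin lz4 explicitly, for example to keep the earlier behavior.
    print(" ".join(event_log_conf_args("lz4")))
```

Pinning the codec in job submission scripts is one way to keep behavior stable across an upgrade that changes the default.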
   
   ### Why are the changes needed?
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   ### How was this patch tested?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784566920


   **[Test build #135390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135390/testReport)** for PR 31618 at commit [`0e88652`](https://github.com/apache/spark/commit/0e88652b99b833551d3e940a24d9d2c217fe4f51).




[GitHub] [spark] AmplabJenkins commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784027786


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135366/
   




[GitHub] [spark] dongjoon-hyun commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784638616


   Thank you, @HeartSaVioR and @HyukjinKwon .
   
   BTW, for lz4, there appear to be some header issues. The `Apache Parquet` community has also been having an interesting discussion over the last two weeks.
   - https://lists.apache.org/thread.html/r03cf7d1c57feaf556a5d7bfd8d440d96f694114a88bc7d7ed1e51d12%40%3Cdev.parquet.apache.org%3E (`Request deprecation / removal of LZ4 compression`)




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784585913


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39970/
   




[GitHub] [spark] SparkQA commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-783949409


   **[Test build #135366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135366/testReport)** for PR 31618 at commit [`54f14cc`](https://github.com/apache/spark/commit/54f14ccca857e780c6d4370ee45c3302d4437a8c).




[GitHub] [spark] dongjoon-hyun closed pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #31618:
URL: https://github.com/apache/spark/pull/31618


   




[GitHub] [spark] SparkQA commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784583921


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39970/
   




[GitHub] [spark] dongjoon-hyun commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784522469


   Hi, @HyukjinKwon . Why do you think so?
   > I think it's not an obvious win though .. Zstd looks more for archiving purpose with less throughput with high compression ratio vs lz4 is for more throughput with less compression.
   
   According to the benchmark,
   - LZ4 1.7.5 compression time is not a winner. If you consider the upload time to the remote storage, ZSTD can be the winner.
   - LZ4 1.7.5 decompression time might be your reason. However, this is an event log.
      - When you download a log from `Spark History Server`, a ZSTD log file will be downloaded 2~3x faster.
      - Also, when you view the log via `Spark History Server`, Spark History Server also downloads it from remote storage like S3 and decompresses it. A 2~3x faster download will compensate for the decompression slowdown.
    
   In addition, for the storage cost saving, ZSTD is a clear winner.
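
The download-versus-decompression argument above can be sketched as back-of-the-envelope arithmetic. This is only an illustration: the function name, the log size, the bandwidth, the compression ratios, and the decompression speeds are all hypothetical placeholders, not `lzbench` results or measurements from this PR.

```python
def fetch_and_decode_seconds(raw_mb, ratio, network_mb_s, decompress_mb_s):
    """Seconds to download a compressed log and then decompress it.

    raw_mb:          uncompressed log size in MB
    ratio:           compression ratio (raw size / compressed size)
    network_mb_s:    download bandwidth from remote storage, MB/s
    decompress_mb_s: decompression speed, MB/s of uncompressed output
    """
    download = (raw_mb / ratio) / network_mb_s
    decompress = raw_mb / decompress_mb_s
    return download + decompress


# Hypothetical inputs, chosen only to illustrate the shape of the trade-off.
RAW_MB = 1000.0
NETWORK_MB_S = 50.0

# A codec with fast decompression but a lower compression ratio.
fast_codec = fetch_and_decode_seconds(RAW_MB, 2.0, NETWORK_MB_S, 2000.0)
# A codec with slower decompression but a higher compression ratio.
dense_codec = fetch_and_decode_seconds(RAW_MB, 5.0, NETWORK_MB_S, 500.0)

if __name__ == "__main__":
    print(f"fast codec:  {fast_codec:.1f} s")
    print(f"dense codec: {dense_codec:.1f} s")
```

With these placeholder numbers the network transfer dominates, so the denser codec finishes first even though it decompresses more slowly. On a fast local network or with very small logs the comparison could go the other way.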




[GitHub] [spark] SparkQA commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-783991480


   **[Test build #135365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135365/testReport)** for PR 31618 at commit [`6ceea1f`](https://github.com/apache/spark/commit/6ceea1f7d1355f7f7b78fa507783ea742ab771aa).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784662119


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135390/
   




[GitHub] [spark] SparkQA commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-783975896


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39946/
   




[GitHub] [spark] dongjoon-hyun commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784638750


   Merged to master for Apache Spark 3.2.0!




[GitHub] [spark] SparkQA removed a comment on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-783927059


   **[Test build #135365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135365/testReport)** for PR 31618 at commit [`6ceea1f`](https://github.com/apache/spark/commit/6ceea1f7d1355f7f7b78fa507783ea742ab771aa).




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784522469








[GitHub] [spark] AmplabJenkins commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784585913


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39970/
   




[GitHub] [spark] SparkQA commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784652617


   **[Test build #135390 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135390/testReport)** for PR 31618 at commit [`0e88652`](https://github.com/apache/spark/commit/0e88652b99b833551d3e940a24d9d2c217fe4f51).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-783949409


   **[Test build #135366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135366/testReport)** for PR 31618 at commit [`54f14cc`](https://github.com/apache/spark/commit/54f14ccca857e780c6d4370ee45c3302d4437a8c).




[GitHub] [spark] dongjoon-hyun commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784015079


   Thank you, @viirya !




[GitHub] [spark] dongjoon-hyun commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-783979247


   I updated the indirect benchmark result by using `lzbench`.




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31618:
URL: https://github.com/apache/spark/pull/31618#discussion_r585082091



##########
File path: docs/configuration.md
##########
@@ -1040,10 +1040,9 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.eventLog.compression.codec</code></td>
-  <td></td>
+  <td>zstd</td>
   <td>
-    The codec to compress logged events. If this is not given,
-    <code>spark.io.compression.codec</code> will be used.
+    The codec to compress logged events.

Review comment:
       Here, I made a PR.
   - https://github.com/apache/spark/pull/31695






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784662119


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135390/
   




[GitHub] [spark] SparkQA commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-783992753


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39945/
   




[GitHub] [spark] SparkQA commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-783927059


   **[Test build #135365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135365/testReport)** for PR 31618 at commit [`6ceea1f`](https://github.com/apache/spark/commit/6ceea1f7d1355f7f7b78fa507783ea742ab771aa).




[GitHub] [spark] SparkQA commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784576749


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39970/
   




[GitHub] [spark] HyukjinKwon commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784611739


   Okie. I'm good with this change.




[GitHub] [spark] AmplabJenkins commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-783997001








[GitHub] [spark] dongjoon-hyun commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-783973623


   No~ This only decides the write codec for new logs.




[GitHub] [spark] tgravescs commented on a change in pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #31618:
URL: https://github.com/apache/spark/pull/31618#discussion_r584901766



##########
File path: docs/configuration.md
##########
@@ -1040,10 +1040,9 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.eventLog.compression.codec</code></td>
-  <td></td>
+  <td>zstd</td>
   <td>
-    The codec to compress logged events. If this is not given,
-    <code>spark.io.compression.codec</code> will be used.
+    The codec to compress logged events.

Review comment:
       Sorry for coming in late; I was out last week. We may want to reference what other codecs can be used here. @dongjoon-hyun, thoughts?






[GitHub] [spark] HyukjinKwon commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784017849


   I think it's not an obvious win though .. Zstd looks more for archiving purpose with less throughput with high compression ratio vs lz4 is for more throughput with less compression.
   
   > The main purpose of event logs is archiving. Many logs are generated and occupy the storage, but most of them are never accessed by users.
   
   But I tend to agree with this. cc @HeartSaVioR or @tgravescs too FYI in case you guys have different thoughts on this.




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31618:
URL: https://github.com/apache/spark/pull/31618#discussion_r585078575



##########
File path: docs/configuration.md
##########
@@ -1040,10 +1040,9 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.eventLog.compression.codec</code></td>
-  <td></td>
+  <td>zstd</td>
   <td>
-    The codec to compress logged events. If this is not given,
-    <code>spark.io.compression.codec</code> will be used.
+    The codec to compress logged events.

Review comment:
       Thank you for the review, @tgravescs. Sure, I'll make a documentation follow-up.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #31618:
URL: https://github.com/apache/spark/pull/31618#discussion_r580856999



##########
File path: docs/core-migration-guide.md
##########
@@ -24,6 +24,8 @@ license: |
 
 ## Upgrading from Core 3.1 to 3.2
 
+- Since Spark 3.2, `spark.eventLog.compression.codec` has `zstd` by default which means Spark will not fallback to use `spark.io.compression.codec`. To restore the behavior before Spark 3.2, you can set `spark.eventLog.compression.codec` explicitly to the value of `spark.io.compression.codec`.

Review comment:
       nit but maybe: "`spark.eventLog.compression.codec` is set to `zstd` by default which means Spark will not fallback to use `spark.io.compression.codec` anymore"
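
The behavior change described in the migration note can be modeled in a few lines. This is only a sketch of the lookup order as the note states it, not Spark's actual implementation: `resolve_event_log_codec` and `restore_pre_3_2` are hypothetical helpers, the configuration is represented as a plain dictionary, and `lz4` is assumed as the default of `spark.io.compression.codec`.

```python
from typing import Dict, Mapping

EVENT_LOG_KEY = "spark.eventLog.compression.codec"
IO_KEY = "spark.io.compression.codec"
IO_DEFAULT = "lz4"  # assumed default of spark.io.compression.codec


def resolve_event_log_codec(conf: Mapping[str, str], since_3_2: bool) -> str:
    """Hypothetical helper: name of the codec the event log would use."""
    if EVENT_LOG_KEY in conf:
        # An explicit setting wins under both the old and the new behavior.
        return conf[EVENT_LOG_KEY]
    if since_3_2:
        # New behavior: fixed default, no fallback to the I/O codec.
        return "zstd"
    # Old behavior: fall back to the I/O codec (or its own default).
    return conf.get(IO_KEY, IO_DEFAULT)


def restore_pre_3_2(conf: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of conf with the event log codec pinned to the I/O codec."""
    pinned = dict(conf)
    # Do not overwrite a codec the user already chose explicitly.
    pinned.setdefault(EVENT_LOG_KEY, conf.get(IO_KEY, IO_DEFAULT))
    return pinned
```

With `{"spark.io.compression.codec": "snappy"}` as input, the old lookup yields `snappy` and the new one yields `zstd`; passing the output of `restore_pre_3_2` through the new lookup yields `snappy` again, which is the workaround the migration note recommends.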






[GitHub] [spark] HeartSaVioR commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784551734


   I agree with the statement that the event log directory is most likely placed in remote storage in practice, and in that case reducing the size would affect the overall latency. It would be really appreciated if we could see a direct benchmark (compress and send to S3, then receive from S3 and decompress), probably run 10~100 times for each codec and taking the median. But that's optional, and I tend to agree that a small difference in compression/decompression time could be offset by the reduced network cost.
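
A rough harness for the "run it N times and take the median" idea might look like the sketch below. Several things here are assumptions rather than anything from this thread: `median_seconds` and `benchmark_codec` are hypothetical helpers, the payload is synthetic rather than a real event log, zlib levels 1 and 9 stand in for a fast and a dense codec because zstd and lz4 bindings are not in the Python standard library, and the S3 round trip is not modeled at all.

```python
import statistics
import time
import zlib
from typing import Callable, Dict


def median_seconds(fn: Callable[[], object], runs: int = 10) -> float:
    """Call fn `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def benchmark_codec(
    payload: bytes,
    compress: Callable[[bytes], bytes],
    decompress: Callable[[bytes], bytes],
    runs: int = 10,
) -> Dict[str, float]:
    """Measure compression ratio plus median compress/decompress times."""
    compressed = compress(payload)
    # Sanity check: the codec must round-trip the payload.
    assert decompress(compressed) == payload
    return {
        "ratio": len(payload) / len(compressed),
        "compress_s": median_seconds(lambda: compress(payload), runs),
        "decompress_s": median_seconds(lambda: decompress(compressed), runs),
    }


# Synthetic stand-in for an event log: many similar JSON lines.
payload = b'{"Event":"SparkListenerTaskEnd","Stage ID":1}\n' * 20000

# zlib level 1 ("fast") and level 9 ("dense") are stand-ins, not lz4 or zstd.
fast = benchmark_codec(payload, lambda b: zlib.compress(b, 1), zlib.decompress)
dense = benchmark_codec(payload, lambda b: zlib.compress(b, 9), zlib.decompress)
```

To approximate the remote-storage case, the callables passed to `median_seconds` could wrap an upload or download step as well, so that the saved transfer time shows up in the same measurement as the extra CPU time.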
   
   Btw,
   
   ```
   $ lz4 -d spark-d3deba027bd34435ba849e14fc2c42ef.lz4
   Decoding file spark-d3deba027bd34435ba849e14fc2c42ef
   Error 44 : Unrecognized header : file cannot be decoded
   ```
   
   makes me feel Spark does something wrong with lz4, or lz4 has variants that aren't compatible. Does anyone know why this doesn't work?
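
One possible explanation, which nobody in this thread confirms, is a container mismatch: the `lz4` command-line tool expects the LZ4 frame format, while a stream written through lz4-java's `LZ4BlockOutputStream` starts with its own `LZ4Block` header. Whether Spark's lz4 codec writes that block stream is an assumption here. The sketch below only inspects the first bytes of a file to tell the two apart; `sniff_lz4_container` and `sniff_file` are hypothetical helpers.

```python
# Magic number of the LZ4 frame format (0x184D2204), stored little-endian.
LZ4_FRAME_MAGIC = bytes.fromhex("04224d18")
# Header written by lz4-java's LZ4BlockOutputStream.
LZ4_JAVA_BLOCK_MAGIC = b"LZ4Block"


def sniff_lz4_container(header: bytes) -> str:
    """Classify a file by its leading bytes."""
    if header.startswith(LZ4_FRAME_MAGIC):
        return "lz4 frame format"
    if header.startswith(LZ4_JAVA_BLOCK_MAGIC):
        return "lz4-java block stream"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_lz4_container(f.read(8))
```

If `sniff_file` reports an lz4-java block stream for the event log, the "Unrecognized header" error would be expected, since the CLI looks for the frame magic number instead.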




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-783996992








[GitHub] [spark] AmplabJenkins removed a comment on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784027786


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135366/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784566920


   **[Test build #135390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135390/testReport)** for PR 31618 at commit [`0e88652`](https://github.com/apache/spark/commit/0e88652b99b833551d3e940a24d9d2c217fe4f51).




[GitHub] [spark] SparkQA commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-783960565


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39946/
   




[GitHub] [spark] SparkQA commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-783946203


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39945/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31618:
URL: https://github.com/apache/spark/pull/31618#discussion_r581413377



##########
File path: docs/core-migration-guide.md
##########
@@ -24,6 +24,8 @@ license: |
 
 ## Upgrading from Core 3.1 to 3.2
 
+- Since Spark 3.2, `spark.eventLog.compression.codec` has `zstd` by default which means Spark will not fallback to use `spark.io.compression.codec`. To restore the behavior before Spark 3.2, you can set `spark.eventLog.compression.codec` explicitly to the value of `spark.io.compression.codec`.

Review comment:
       Sure.






[GitHub] [spark] SparkQA commented on pull request #31618: [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31618:
URL: https://github.com/apache/spark/pull/31618#issuecomment-784021009


   **[Test build #135366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135366/testReport)** for PR 31618 at commit [`54f14cc`](https://github.com/apache/spark/commit/54f14ccca857e780c6d4370ee45c3302d4437a8c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

