Posted to reviews@spark.apache.org by "beliefer (via GitHub)" <gi...@apache.org> on 2023/10/11 11:25:45 UTC

[PR] [SPARK-45484][SQL][3.5] Deprecated the incorrect parquet compression codec lz4raw [spark]

beliefer opened a new pull request, #43330:
URL: https://github.com/apache/spark/pull/43330

   ### What changes were proposed in this pull request?
   Following the discussion at https://github.com/apache/spark/pull/43310#issuecomment-1757139681, this PR deprecates the incorrect parquet compression codec `lz4raw` in Spark 3.5.1 and adds a warning log.
   
   The warning log tells users that `lz4raw` will be removed in Apache Spark 4.0.0.
   
   
   ### Why are the changes needed?
   To deprecate the incorrect parquet compression codec `lz4raw`.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'Yes'.
   Users will see the warning log below.
   `Parquet compression codec 'lz4raw' is deprecated, please use 'lz4_raw'`
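   
   As a rough illustration of the behavior described above (not the code in this PR), a deprecated codec name can be resolved to its preferred spelling while emitting the warning. The object `ParquetCodecNames`, the method `resolve`, the alias table, and the `warn` callback are all hypothetical names chosen for this sketch; only the warning text is taken from the description.
   
   ```scala
   // Hypothetical sketch; these names are illustrative and are not Spark's implementation.
   object ParquetCodecNames {
     // Assumed alias table: deprecated spelling -> preferred spelling.
     private val deprecated: Map[String, String] = Map("lz4raw" -> "lz4_raw")
   
     /** Returns the preferred codec name, calling `warn` when a deprecated name is given. */
     def resolve(
         name: String,
         warn: String => Unit = msg => Console.err.println(s"WARN $msg")): String = {
       val lower = name.toLowerCase(java.util.Locale.ROOT)
       deprecated.get(lower) match {
         case Some(preferred) =>
           // Same wording as the warning quoted in the PR description.
           warn(s"Parquet compression codec '$lower' is deprecated, please use '$preferred'")
           preferred
         case None => lower
       }
     }
   }
   ```
   
   Passing the warning sink as a parameter keeps the sketch free of any logging dependency and makes the deprecation path easy to check in a test.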
   
   
   ### How was this patch tested?
   Existing test cases and new test cases.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   'No'.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45484][SQL][3.5] Deprecated the incorrect parquet compression codec lz4raw [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #43330:
URL: https://github.com/apache/spark/pull/43330#issuecomment-1762547364

   The GA failure is unrelated to this PR. It occurred in the `Linters, licenses, dependencies and documentation generation` job:
   ```
   /usr/lib/ruby/2.7.0/fileutils.rb:1415:in `copy_stream': No space left on device - copy_file_range (Errno::ENOSPC)
   	from /usr/lib/ruby/2.7.0/fileutils.rb:1415:in `block (2 levels) in copy_file'
   	from /usr/lib/ruby/2.7.0/fileutils.rb:1414:in `open'
   	from /usr/lib/ruby/2.7.0/fileutils.rb:1414:in `block in copy_file'
   	from /usr/lib/ruby/2.7.0/fileutils.rb:1413:in `open'
   	from /usr/lib/ruby/2.7.0/fileutils.rb:1413:in `copy_file'
   	from /usr/lib/ruby/2.7.0/fileutils.rb:1378:in `copy'
   Error: No space left on device : '/home/runner/runners/2.309.0/_diag/pages/b7db6e67-b9c1-4011-a807-e832ba0cf437_fef911b0-771d-54c9-1f3c-e23e2c04b8fb_1.log'
   ```




Re: [PR] [SPARK-45484][SQL][3.5] Deprecated the incorrect parquet compression codec lz4raw [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43330:
URL: https://github.com/apache/spark/pull/43330#discussion_r1355248487


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCompressionCodecPrecedenceSuite.scala:
##########
@@ -94,18 +108,22 @@ class ParquetCompressionCodecPrecedenceSuite extends ParquetTest with SharedSpar
     withTempDir { tmpDir =>
       val tempTableName = "TempParquetTable"
       withTable(tempTableName) {
+

Review Comment:
   nit. This looks like a mistake. Let's remove this empty line. 





Re: [PR] [SPARK-45484][SQL][3.5] Deprecated the incorrect parquet compression codec lz4raw [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #43330:
URL: https://github.com/apache/spark/pull/43330#issuecomment-1762547574

   cc @dongjoon-hyun @wangyum 




Re: [PR] [SPARK-45484][SQL][3.5] Deprecated the incorrect parquet compression codec lz4raw [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43330:
URL: https://github.com/apache/spark/pull/43330#issuecomment-1763682065

   > > Details
   > 
   > cc @zhengruifeng Could we possibly backport `free_disk_space_container` to branch-3.5?
   
   Backporting [SPARK-44619](https://issues.apache.org/jira/browse/SPARK-44619) to branch-3.5 to avoid `No space left on device`: https://github.com/apache/spark/pull/43381




Re: [PR] [SPARK-45484][SQL][3.5] Deprecated the incorrect parquet compression codec lz4raw [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #43330:
URL: https://github.com/apache/spark/pull/43330#issuecomment-1765532789

   @srowen @dongjoon-hyun @LuciferYang Merged! Thank you all!




Re: [PR] [SPARK-45484][SQL][3.5] Deprecated the incorrect parquet compression codec lz4raw [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43330:
URL: https://github.com/apache/spark/pull/43330#discussion_r1355245520


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCompressionCodecPrecedenceSuite.scala:
##########
@@ -29,9 +29,23 @@ import org.apache.spark.sql.test.SharedSparkSession
 
 class ParquetCompressionCodecPrecedenceSuite extends ParquetTest with SharedSparkSession {
   test("Test `spark.sql.parquet.compression.codec` config") {
-    Seq("NONE", "UNCOMPRESSED", "SNAPPY", "GZIP", "LZO", "LZ4", "BROTLI", "ZSTD").foreach { c =>
+    Seq(
+      "NONE",
+      "UNCOMPRESSED",
+      "SNAPPY",
+      "GZIP",
+      "LZO",
+      "LZ4",
+      "BROTLI",
+      "ZSTD",
+      "LZ4RAW",
+      "LZ4_RAW").foreach { c =>
       withSQLConf(SQLConf.PARQUET_COMPRESSION.key -> c) {
-        val expected = if (c == "NONE") "UNCOMPRESSED" else c
+        val expected = c match {
+          case "NONE" => "UNCOMPRESSED"
+          case "LZ4RAW" => "LZ4_RAW"

Review Comment:
   Oh, why is the expected value for `LZ4RAW` changed to `LZ4_RAW`?
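   
   For readers following the hunk above, the expectation under review can be sketched in isolation. This is one possible reading, not the author's answer: if `LZ4RAW` is accepted only as a deprecated alias, the codec reported for it would be `LZ4_RAW`, in the same way that `NONE` is reported as `UNCOMPRESSED`. The object name `ExpectedCodec` is hypothetical, and the fallback case is an assumption, since the hunk is cut off before the end of the `match`.
   
   ```scala
   // Illustrative only: mirrors the shape of the test shown in the diff hunk.
   object ExpectedCodec {
     // Codec names the updated test iterates over, as listed in the hunk.
     val configured: Seq[String] = Seq(
       "NONE", "UNCOMPRESSED", "SNAPPY", "GZIP", "LZO", "LZ4", "BROTLI", "ZSTD",
       "LZ4RAW", "LZ4_RAW")
   
     // The first two cases appear in the hunk; the final case is an assumption
     // based on the removed line `if (c == "NONE") "UNCOMPRESSED" else c`.
     def expected(c: String): String = c match {
       case "NONE" => "UNCOMPRESSED"
       case "LZ4RAW" => "LZ4_RAW"
       case other => other
     }
   }
   ```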





Re: [PR] [SPARK-45484][SQL][3.5] Deprecated the incorrect parquet compression codec lz4raw [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer closed pull request #43330: [SPARK-45484][SQL][3.5] Deprecated the incorrect parquet compression codec lz4raw
URL: https://github.com/apache/spark/pull/43330




Re: [PR] [SPARK-45484][SQL][3.5] Deprecated the incorrect parquet compression codec lz4raw [spark]

Posted by "srowen (via GitHub)" <gi...@apache.org>.
srowen commented on PR #43330:
URL: https://github.com/apache/spark/pull/43330#issuecomment-1757917444

   So this change is only needed in 3.5, and we already fixed it differently in 4.0?




Re: [PR] [SPARK-45484][SQL][3.5] Deprecated the incorrect parquet compression codec lz4raw [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43330:
URL: https://github.com/apache/spark/pull/43330#issuecomment-1763633049

   > Details
   
   cc @zhengruifeng Could we possibly backport `free_disk_space_container` to branch-3.5?




Re: [PR] [SPARK-45484][SQL][3.5] Deprecated the incorrect parquet compression codec lz4raw [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43330:
URL: https://github.com/apache/spark/pull/43330#issuecomment-1757964545

   cc @wangyum 




Re: [PR] [SPARK-45484][SQL][3.5] Deprecated the incorrect parquet compression codec lz4raw [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #43330:
URL: https://github.com/apache/spark/pull/43330#issuecomment-1758993812

   > So this change is only needed in 3.5, and we already fixed it differently in 4.0?
   
   Yes. This PR is only for 3.5.1, and https://github.com/apache/spark/pull/43310 fixes it in 4.0.0.

