You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "beliefer (via GitHub)" <gi...@apache.org> on 2023/10/27 12:48:25 UTC

[PR] [SPARK-45711][SQL] Introduce a mapper for avro compression codecs [spark]

beliefer opened a new pull request, #43562:
URL: https://github.com/apache/spark/pull/43562

   ### What changes were proposed in this pull request?
   Currently, Spark supported all the avro compression codecs, but the avro supported compression codecs and spark supported are not completely one-on-one due to Spark introduce the compression codecs `UNCOMPRESSED`.
   
   On the other hand, there are a lot of magic strings copy from avro compression codecs. This issue lead to developers need to manually maintain its consistency. It is easy to make mistakes and reduce development efficiency.
   
   
   ### Why are the changes needed?
   Let developers easy to use avro compression codecs.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   Introduce a new class.
   
   
   ### How was this patch tested?
   Exists test cases.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   'No'.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45711][SQL] Introduce a mapper for avro compression codecs [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43562:
URL: https://github.com/apache/spark/pull/43562#discussion_r1375231417


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala:
##########
@@ -100,18 +100,19 @@ private[sql] object AvroUtils extends Logging {
 
     AvroJob.setOutputKeySchema(job, outputAvroSchema)
 
-    if (parsedOptions.compression == "uncompressed") {
+    if (parsedOptions.compression == UNCOMPRESSED.name().toLowerCase(Locale.ROOT)) {

Review Comment:
   HaHa! Let's add it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45711][SQL] Introduce a mapper for avro compression codecs [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43562:
URL: https://github.com/apache/spark/pull/43562#discussion_r1375230764


##########
connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala:
##########
@@ -100,18 +100,19 @@ private[sql] object AvroUtils extends Logging {
 
     AvroJob.setOutputKeySchema(job, outputAvroSchema)
 
-    if (parsedOptions.compression == "uncompressed") {
+    if (parsedOptions.compression == UNCOMPRESSED.name().toLowerCase(Locale.ROOT)) {

Review Comment:
   @beliefer For instance, here
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45711][SQL] Introduce a mapper for avro compression codecs [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43562:
URL: https://github.com/apache/spark/pull/43562#discussion_r1375149364


##########
connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala:
##########
@@ -680,18 +680,24 @@ abstract class AvroSuite
       val zstandardDir = s"$dir/zstandard"
 
       val df = spark.read.format("avro").load(testAvro)
-      spark.conf.set(SQLConf.AVRO_COMPRESSION_CODEC.key, "uncompressed")
+      spark.conf.set(SQLConf.AVRO_COMPRESSION_CODEC.key,
+        AvroCompressionCodec.UNCOMPRESSED.name().toLowerCase(Locale.ROOT))

Review Comment:
   How about adding a `lowerCaseName()` method to `AvroCompressionCodec`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45711][SQL] Introduce a mapper for avro compression codecs [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43562:
URL: https://github.com/apache/spark/pull/43562#discussion_r1375228322


##########
connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala:
##########
@@ -680,18 +680,24 @@ abstract class AvroSuite
       val zstandardDir = s"$dir/zstandard"
 
       val df = spark.read.format("avro").load(testAvro)
-      spark.conf.set(SQLConf.AVRO_COMPRESSION_CODEC.key, "uncompressed")
+      spark.conf.set(SQLConf.AVRO_COMPRESSION_CODEC.key,
+        AvroCompressionCodec.UNCOMPRESSED.name().toLowerCase(Locale.ROOT))

Review Comment:
   I have considered doing this before. But these usage just exists in test cases. So I think it's not worth.
   If the non test path need, we add it at that time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45711][SQL] Introduce a mapper for avro compression codecs [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #43562:
URL: https://github.com/apache/spark/pull/43562#issuecomment-1783969081

   Merged to master. @LuciferYang Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45711][SQL] Introduce a mapper for avro compression codecs [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer closed pull request #43562: [SPARK-45711][SQL] Introduce a mapper for avro compression codecs
URL: https://github.com/apache/spark/pull/43562


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org