You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "beliefer (via GitHub)" <gi...@apache.org> on 2023/10/09 12:13:35 UTC

[PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

beliefer opened a new pull request, #43296:
URL: https://github.com/apache/spark/pull/43296

   ### What changes were proposed in this pull request?
   Currently, Hive supports ORC format with some compression codec( Please refer org.apache.hadoop.hive.ql.io.orc.CompressionKind).
   
   Spark pasted many string literal of these compression codec. It is easy to make mistakes and reduce development efficiency.
   
   
   ### Why are the changes needed?
   Avoid paste string value of hive orc compression kind
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   Just update inner implementation.
   
   
   ### How was this patch tested?
   Exists test cases.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   'No'.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #43296:
URL: https://github.com/apache/spark/pull/43296#issuecomment-1754353217

   > Just a question, is this required for [SPARK-44114](https://issues.apache.org/jira/browse/SPARK-44114)? Or, simply want to remove the magic strings?
   
   I investigated the source code of Hive-2 and Hive-3, the supported orc compression kind not changed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43296:
URL: https://github.com/apache/spark/pull/43296#issuecomment-1754448693

   Merged to master for Apache Spark 4.0.0~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43296:
URL: https://github.com/apache/spark/pull/43296#issuecomment-1754419226

   Got it. Are we going to do the same clean-up for the other data sources like Parquet? Or Avro in `avro` module?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43296:
URL: https://github.com/apache/spark/pull/43296#discussion_r1351137985


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcHadoopFsRelationSuite.scala:
##########
@@ -87,7 +88,7 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
       val path = s"${dir.getCanonicalPath}/table1"
       val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
       df.write
-        .option("compression", "ZlIb")
+        .option("compression", CompressionKind.ZLIB.name())

Review Comment:
   I will adjust it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43296:
URL: https://github.com/apache/spark/pull/43296#issuecomment-1754322080

   Just a question, is this required for SPARK-44114? Or, simply want to remove the magic strings?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #43296: [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind
URL: https://github.com/apache/spark/pull/43296


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #43296:
URL: https://github.com/apache/spark/pull/43296#discussion_r1350717818


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcHadoopFsRelationSuite.scala:
##########
@@ -87,7 +88,7 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
       val path = s"${dir.getCanonicalPath}/table1"
       val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
       df.write
-        .option("compression", "ZlIb")
+        .option("compression", CompressionKind.ZLIB.name())

Review Comment:
   Does this one test case-insensitivity? I wouldn't change tests, frankly speaking. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #43296:
URL: https://github.com/apache/spark/pull/43296#issuecomment-1754457307

   @dongjoon-hyun @LuciferYang @MaxGekk Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #43296:
URL: https://github.com/apache/spark/pull/43296#issuecomment-1754437003

   > Got it. Are we going to do the same clean-up for the other data sources like Parquet? Or Avro in `avro` module?
   
   Yes. I will do it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #43296:
URL: https://github.com/apache/spark/pull/43296#issuecomment-1754189027

   > Ur, very sorry, but I'd not do this ORC change when we do the samething still for Parquet , @beliefer . This only increases the compile depdency for `org.apache.hadoop.hive.ql.io.orc.CompressionKind`. We prefer the existing loose-coupled way.
   
   All the change in the hive module already depends on `hive-exec`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43296:
URL: https://github.com/apache/spark/pull/43296#discussion_r1351137985


##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcHadoopFsRelationSuite.scala:
##########
@@ -87,7 +88,7 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
       val path = s"${dir.getCanonicalPath}/table1"
       val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
       df.write
-        .option("compression", "ZlIb")
+        .option("compression", CompressionKind.ZLIB.name())

Review Comment:
   I will revert this one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org