You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/25 14:24:28 UTC

[GitHub] [spark] BrennanStein opened a new pull request, #37659: [SPARK-40212] [SQL] SparkSQL castPartValue does not properly handle byte & short

BrennanStein opened a new pull request, #37659:
URL: https://github.com/apache/spark/pull/37659

   ### What changes were proposed in this pull request?
   The `castPartValueToDesiredType` function now returns byte for ByteType and short for ShortType, rather than ints.
   
   ### Why are the changes needed?
   Previously, attempting to read back in a file partitioned on a byte-type column would result in a `java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Byte` exception at runtime, presumably due to this function returning an integer rather than a byte (or short). I can't think this is anything but a bug, as returning the correct data type prevents the crash.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes: it changes the observed behavior when reading in a byte/short-partitioned file.
   
   ### How was this patch tested?
   Added unit test. Without the `castPartValueToDesiredType` updates, the test fails with the stated exception.
   
   ===
   I'll note that I'm not familiar enough with the spark repo to know if this will have ripple effects elsewhere, but tests pass on my fork and since the very similar https://github.com/apache/spark/pull/36344/files only needed to touch these two files I expect this change is self-contained as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on a diff in pull request #37659: [SPARK-40212] [SQL] SparkSQL castPartValue does not properly handle byte & short

Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #37659:
URL: https://github.com/apache/spark/pull/37659#discussion_r956023458


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala:
##########
@@ -530,7 +530,9 @@ object PartitioningUtils extends SQLConfHelper {
     case _ if value == DEFAULT_PARTITION_NAME => null
     case NullType => null
     case StringType => UTF8String.fromString(unescapePathName(value))
-    case ByteType | ShortType | IntegerType => Integer.parseInt(value)
+    case ByteType => Integer.parseInt(value).toByte
+    case ShortType => Integer.parseInt(value).toShort
+    case IntegerType => Integer.parseInt(value)
     case LongType => JLong.parseLong(value)
     case FloatType | DoubleType => JDouble.parseDouble(value)

Review Comment:
   It seems `FloatType` also has this issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #37659: [SPARK-40212] [SQL] SparkSQL castPartValue does not properly handle byte, short, or float

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #37659:
URL: https://github.com/apache/spark/pull/37659#issuecomment-1229250749

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on a diff in pull request #37659: [SPARK-40212] [SQL] SparkSQL castPartValue does not properly handle byte & short

Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #37659:
URL: https://github.com/apache/spark/pull/37659#discussion_r956026285


##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -4371,6 +4371,23 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
     }
   }
 
+  test("SPARK-40212: SparkSQL castPartValue does not properly handle byte & short") {

Review Comment:
   Move the test to `ParquetPartitionDiscoverySuite`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #37659: [SPARK-40212] [SQL] SparkSQL castPartValue does not properly handle byte, short, or float

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #37659: [SPARK-40212] [SQL] SparkSQL castPartValue does not properly handle byte, short, or float
URL: https://github.com/apache/spark/pull/37659


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #37659: [SPARK-40212] [SQL] SparkSQL castPartValue does not properly handle byte, short, or float

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37659:
URL: https://github.com/apache/spark/pull/37659#issuecomment-1229652492

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on a diff in pull request #37659: [SPARK-40212] [SQL] SparkSQL castPartValue does not properly handle byte & short

Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #37659:
URL: https://github.com/apache/spark/pull/37659#discussion_r956546281


##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -4371,6 +4371,23 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
     }
   }
 
+  test("SPARK-40212: SparkSQL castPartValue does not properly handle byte & short") {

Review Comment:
   Please also add this fix to PR title and PR description.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on pull request #37659: [SPARK-40212] [SQL] SparkSQL castPartValue does not properly handle byte, short, or float

Posted by GitBox <gi...@apache.org>.
wangyum commented on PR #37659:
URL: https://github.com/apache/spark/pull/37659#issuecomment-1229362269

   cc @cloud-fan @HyukjinKwon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org