You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by "johanl-db (via GitHub)" <gi...@apache.org> on 2024/01/19 16:53:49 UTC

[PR] [SPARK-40876][SQL] Widening type promotion from integers to decimal in Parquet vectorized reader [spark]

johanl-db opened a new pull request, #44803:
URL: https://github.com/apache/spark/pull/44803

   ### What changes were proposed in this pull request?
   This is a follow-up from https://github.com/apache/spark/pull/44368 and https://github.com/apache/spark/pull/44513, implementing an additional type promotion from integers to decimals in the parquet vectorized reader, bringing it at parity with the non-vectorized reader in that regard.
   
   ### Why are the changes needed?
   This allows reading parquet files that have different schemas and mix decimals and integers - e.g reading files containing either `Decimal(15, 2)` and `INT32` as `Decimal(15, 2)` - as long as the requested decimal type is large enough to accommodate the integer values without precision loss.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, the following now succeeds when using the vectorized Parquet reader:
   ```
     Seq(20).toDF($"a".cast(IntegerType)).write.parquet(path)
     spark.read.schema("a decimal(12, 0)").parquet(path).collect()
   ```
   It failed before with the vectorized reader and succeeded with the non-vectorized reader.
   
   ### How was this patch tested?
   - Tests added to `ParquetWideningTypeSuite`
   - Updated relevant `ParquetQuerySuite` test.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-40876][SQL] Widening type promotion from integers to decimal in Parquet vectorized reader [spark]

Posted by "johanl-db (via GitHub)" <gi...@apache.org>.

johanl-db commented on code in PR #44803:
URL: https://github.com/apache/spark/pull/44803#discussion_r1463500581


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala:
##########
@@ -201,17 +290,17 @@ class ParquetTypeWideningSuite
     Seq(5 -> 7, 5 -> 10, 5 -> 20, 10 -> 12, 10 -> 20, 20 -> 22) ++
       Seq(7 -> 5, 10 -> 5, 20 -> 5, 12 -> 10, 20 -> 10, 22 -> 20)
   }
-    test(
-      s"parquet decimal precision change Decimal($fromPrecision, 2) -> Decimal($toPrecision, 2)") {
-      checkAllParquetReaders(
-        values = Seq("1.23", "10.34"),
-        fromType = DecimalType(fromPrecision, 2),
-        toType = DecimalType(toPrecision, 2),
-        expectError = fromPrecision > toPrecision &&
-          // parquet-mr allows reading decimals into a smaller precision decimal type without
-          // checking for overflows. See test below checking for the overflow case in parquet-mr.
-          spark.conf.get(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key).toBoolean)
-    }
+  test(
+    s"parquet decimal precision change Decimal($fromPrecision, 2) -> Decimal($toPrecision, 2)") {
+    checkAllParquetReaders(
+      values = Seq("1.23", "10.34"),
+      fromType = DecimalType(fromPrecision, 2),
+      toType = DecimalType(toPrecision, 2),
+      expectError = fromPrecision > toPrecision &&
+        // parquet-mr allows reading decimals into a smaller precision decimal type without
+        // checking for overflows. See test below checking for the overflow case in parquet-mr.

Review Comment:
   Decimal values are set to null on overflow, see https://github.com/apache/spark/pull/44803/files/8935b284e5519038f78fd95c8d12e66224f29d63#diff-a5cfd7285f9adf95b2aeea90aa57cc35d2b8c6bddaa0f4652172d30a264d3614R347
   
   Integers silently overflow on the other hand:
   https://github.com/apache/spark/pull/44803/files/8935b284e5519038f78fd95c8d12e66224f29d63#diff-a5cfd7285f9adf95b2aeea90aa57cc35d2b8c6bddaa0f4652172d30a264d3614R363
   
   Arguably not great but changing it would be a breaking change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-40876][SQL] Widening type promotion from integers to decimal in Parquet vectorized reader [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.

dongjoon-hyun closed pull request #44803: [SPARK-40876][SQL] Widening type promotion from integers to decimal in Parquet vectorized reader
URL: https://github.com/apache/spark/pull/44803


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-40876][SQL][FOLLOWUP] Widening type promotion from integers to decimal in Parquet vectorized reader [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #44803:
URL: https://github.com/apache/spark/pull/44803#discussion_r1464193246


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala:
##########
@@ -201,17 +290,17 @@ class ParquetTypeWideningSuite
     Seq(5 -> 7, 5 -> 10, 5 -> 20, 10 -> 12, 10 -> 20, 20 -> 22) ++
       Seq(7 -> 5, 10 -> 5, 20 -> 5, 12 -> 10, 20 -> 10, 22 -> 20)
   }
-    test(
-      s"parquet decimal precision change Decimal($fromPrecision, 2) -> Decimal($toPrecision, 2)") {
-      checkAllParquetReaders(
-        values = Seq("1.23", "10.34"),
-        fromType = DecimalType(fromPrecision, 2),
-        toType = DecimalType(toPrecision, 2),
-        expectError = fromPrecision > toPrecision &&
-          // parquet-mr allows reading decimals into a smaller precision decimal type without
-          // checking for overflows. See test below checking for the overflow case in parquet-mr.
-          spark.conf.get(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key).toBoolean)
-    }
+  test(
+    s"parquet decimal precision change Decimal($fromPrecision, 2) -> Decimal($toPrecision, 2)") {
+    checkAllParquetReaders(
+      values = Seq("1.23", "10.34"),
+      fromType = DecimalType(fromPrecision, 2),
+      toType = DecimalType(toPrecision, 2),
+      expectError = fromPrecision > toPrecision &&
+        // parquet-mr allows reading decimals into a smaller precision decimal type without
+        // checking for overflows. See test below checking for the overflow case in parquet-mr.

Review Comment:
   and vectorized reader just doesn't allow it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-40876][SQL] Widening type promotion from integers to decimal in Parquet vectorized reader [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #44803:
URL: https://github.com/apache/spark/pull/44803#discussion_r1462975131


##########
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java:
##########
@@ -151,18 +151,17 @@ private boolean isLazyDecodingSupported(
     // rebasing.
     switch (typeName) {
       case INT32: {
-        boolean isDate = logicalTypeAnnotation instanceof DateLogicalTypeAnnotation;
-        boolean isDecimal = logicalTypeAnnotation instanceof DecimalLogicalTypeAnnotation;
+        boolean isDecimal = sparkType instanceof DecimalType;
         boolean needsUpcast = sparkType == LongType || sparkType == DoubleType ||
-          (isDate && sparkType == TimestampNTZType) ||

Review Comment:
   Oh I see, this is inside `isLazyDecodingSupported`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-40876][SQL] Widening type promotion from integers to decimal in Parquet vectorized reader [spark]

Posted by "johanl-db (via GitHub)" <gi...@apache.org>.

johanl-db commented on code in PR #44803:
URL: https://github.com/apache/spark/pull/44803#discussion_r1463500581


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala:
##########
@@ -201,17 +290,17 @@ class ParquetTypeWideningSuite
     Seq(5 -> 7, 5 -> 10, 5 -> 20, 10 -> 12, 10 -> 20, 20 -> 22) ++
       Seq(7 -> 5, 10 -> 5, 20 -> 5, 12 -> 10, 20 -> 10, 22 -> 20)
   }
-    test(
-      s"parquet decimal precision change Decimal($fromPrecision, 2) -> Decimal($toPrecision, 2)") {
-      checkAllParquetReaders(
-        values = Seq("1.23", "10.34"),
-        fromType = DecimalType(fromPrecision, 2),
-        toType = DecimalType(toPrecision, 2),
-        expectError = fromPrecision > toPrecision &&
-          // parquet-mr allows reading decimals into a smaller precision decimal type without
-          // checking for overflows. See test below checking for the overflow case in parquet-mr.
-          spark.conf.get(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key).toBoolean)
-    }
+  test(
+    s"parquet decimal precision change Decimal($fromPrecision, 2) -> Decimal($toPrecision, 2)") {
+    checkAllParquetReaders(
+      values = Seq("1.23", "10.34"),
+      fromType = DecimalType(fromPrecision, 2),
+      toType = DecimalType(toPrecision, 2),
+      expectError = fromPrecision > toPrecision &&
+        // parquet-mr allows reading decimals into a smaller precision decimal type without
+        // checking for overflows. See test below checking for the overflow case in parquet-mr.

Review Comment:
   Decimal values are set to null on overflow, see https://github.com/apache/spark/pull/44803/files/8935b284e5519038f78fd95c8d12e66224f29d63#diff-a5cfd7285f9adf95b2aeea90aa57cc35d2b8c6bddaa0f4652172d30a264d3614R347
   
   Integers wrap around on overflow on the other hand:
   https://github.com/apache/spark/pull/44803/files/8935b284e5519038f78fd95c8d12e66224f29d63#diff-a5cfd7285f9adf95b2aeea90aa57cc35d2b8c6bddaa0f4652172d30a264d3614R363
   
   Arguably not great but changing it would be a breaking change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-40876][SQL][FOLLOWUP] Widening type promotion from integers to decimal in Parquet vectorized reader [spark]

Posted by "johanl-db (via GitHub)" <gi...@apache.org>.

johanl-db commented on code in PR #44803:
URL: https://github.com/apache/spark/pull/44803#discussion_r1464709783


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala:
##########
@@ -201,17 +290,17 @@ class ParquetTypeWideningSuite
     Seq(5 -> 7, 5 -> 10, 5 -> 20, 10 -> 12, 10 -> 20, 20 -> 22) ++
       Seq(7 -> 5, 10 -> 5, 20 -> 5, 12 -> 10, 20 -> 10, 22 -> 20)
   }
-    test(
-      s"parquet decimal precision change Decimal($fromPrecision, 2) -> Decimal($toPrecision, 2)") {
-      checkAllParquetReaders(
-        values = Seq("1.23", "10.34"),
-        fromType = DecimalType(fromPrecision, 2),
-        toType = DecimalType(toPrecision, 2),
-        expectError = fromPrecision > toPrecision &&
-          // parquet-mr allows reading decimals into a smaller precision decimal type without
-          // checking for overflows. See test below checking for the overflow case in parquet-mr.
-          spark.conf.get(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key).toBoolean)
-    }
+  test(
+    s"parquet decimal precision change Decimal($fromPrecision, 2) -> Decimal($toPrecision, 2)") {
+    checkAllParquetReaders(
+      values = Seq("1.23", "10.34"),
+      fromType = DecimalType(fromPrecision, 2),
+      toType = DecimalType(toPrecision, 2),
+      expectError = fromPrecision > toPrecision &&
+        // parquet-mr allows reading decimals into a smaller precision decimal type without
+        // checking for overflows. See test below checking for the overflow case in parquet-mr.

Review Comment:
   Yes the vectorized reader throws an exception which this test is checking



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-40876][SQL][FOLLOWUP] Widening type promotion from integers to decimal in Parquet vectorized reader [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.

dongjoon-hyun commented on PR #44803:
URL: https://github.com/apache/spark/pull/44803#issuecomment-1909136466

   Thank you, @johanl-db . 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-40876][SQL][FOLLOWUP] Widening type promotion from integers to decimal in Parquet vectorized reader [spark]

Posted by "johanl-db (via GitHub)" <gi...@apache.org>.

johanl-db commented on PR #44803:
URL: https://github.com/apache/spark/pull/44803#issuecomment-1907881225

   > Oh, @johanl-db .
   > 
   > * It seems that you forgot to add `[FOLLOWUP]` tag to the PR title.
   > * In addition, please use a new JIRA from now. You already made 5 commits with the same JIRA ID. This is very bad in terms of traceability in the community because we cannot keep track in JIRA when we need to revert one of your contributions.
   > 
   > ```
   > $ git log --oneline | grep SPARK-40876
   > 0356ac00947 [SPARK-40876][SQL] Widening type promotion from integers to decimal in Parquet vectorized reader
   > ee2a87b4642 [SPARK-40876][SQL][TESTS][FOLLOW-UP] Remove invalid decimal test case when ANSI mode is on
   > d439e34d6bd [SPARK-40876][SQL] Widening type promotion for decimals with larger scale in Parquet readers
   > c1888cdf536 [SPARK-40876][SQL][TESTS][FOLLOWUP] Fix failed test in `ParquetTypeWideningSuite` when `SPARK_ANSI_SQL_MODE` is set to true
   > 3361f25dc0f [SPARK-40876][SQL] Widening type promotions in Parquet readers
   > ```
   
   Understood, I'll make sure to create separate tickets for each PRs in the future (and use tags).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-40876][SQL] Widening type promotion from integers to decimal in Parquet vectorized reader [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #44803:
URL: https://github.com/apache/spark/pull/44803#discussion_r1462875360


##########
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java:
##########
@@ -151,18 +151,17 @@ private boolean isLazyDecodingSupported(
     // rebasing.
     switch (typeName) {
       case INT32: {
-        boolean isDate = logicalTypeAnnotation instanceof DateLogicalTypeAnnotation;
-        boolean isDecimal = logicalTypeAnnotation instanceof DecimalLogicalTypeAnnotation;
+        boolean isDecimal = sparkType instanceof DecimalType;
         boolean needsUpcast = sparkType == LongType || sparkType == DoubleType ||
-          (isDate && sparkType == TimestampNTZType) ||

Review Comment:
   why do we remove `isDate`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-40876][SQL] Widening type promotion from integers to decimal in Parquet vectorized reader [spark]

Posted by "johanl-db (via GitHub)" <gi...@apache.org>.

johanl-db commented on PR #44803:
URL: https://github.com/apache/spark/pull/44803#issuecomment-1904492850

   @cloud-fan since you reviewed the two previous changes around type promotion in parquet readers, this is adding (byte, short, int, long) -> decimal to the vectorized reader - the non-vectorized reader can already do it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-40876][SQL] Widening type promotion from integers to decimal in Parquet vectorized reader [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #44803:
URL: https://github.com/apache/spark/pull/44803#discussion_r1462981843


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala:
##########
@@ -201,17 +290,17 @@ class ParquetTypeWideningSuite
     Seq(5 -> 7, 5 -> 10, 5 -> 20, 10 -> 12, 10 -> 20, 20 -> 22) ++
       Seq(7 -> 5, 10 -> 5, 20 -> 5, 12 -> 10, 20 -> 10, 22 -> 20)
   }
-    test(
-      s"parquet decimal precision change Decimal($fromPrecision, 2) -> Decimal($toPrecision, 2)") {
-      checkAllParquetReaders(
-        values = Seq("1.23", "10.34"),
-        fromType = DecimalType(fromPrecision, 2),
-        toType = DecimalType(toPrecision, 2),
-        expectError = fromPrecision > toPrecision &&
-          // parquet-mr allows reading decimals into a smaller precision decimal type without
-          // checking for overflows. See test below checking for the overflow case in parquet-mr.
-          spark.conf.get(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key).toBoolean)
-    }
+  test(
+    s"parquet decimal precision change Decimal($fromPrecision, 2) -> Decimal($toPrecision, 2)") {
+    checkAllParquetReaders(
+      values = Seq("1.23", "10.34"),
+      fromType = DecimalType(fromPrecision, 2),
+      toType = DecimalType(toPrecision, 2),
+      expectError = fromPrecision > toPrecision &&
+        // parquet-mr allows reading decimals into a smaller precision decimal type without
+        // checking for overflows. See test below checking for the overflow case in parquet-mr.

Review Comment:
   for non-vectorized parquet reader, what's the behavior? silent overflow?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-40876][SQL] Widening type promotion from integers to decimal in Parquet vectorized reader [spark]

Posted by "johanl-db (via GitHub)" <gi...@apache.org>.

johanl-db commented on code in PR #44803:
URL: https://github.com/apache/spark/pull/44803#discussion_r1462941494


##########
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java:
##########
@@ -151,18 +151,17 @@ private boolean isLazyDecodingSupported(
     // rebasing.
     switch (typeName) {
       case INT32: {
-        boolean isDate = logicalTypeAnnotation instanceof DateLogicalTypeAnnotation;
-        boolean isDecimal = logicalTypeAnnotation instanceof DecimalLogicalTypeAnnotation;
+        boolean isDecimal = sparkType instanceof DecimalType;
         boolean needsUpcast = sparkType == LongType || sparkType == DoubleType ||
-          (isDate && sparkType == TimestampNTZType) ||

Review Comment:
   This was redundant since reading an INT32 as TimestampNTZType necessarily requires converting the value. The fact that this only happens for parquet dates isn't really relevant here and with the current change this would be the only case where we look at the parquet type annotation which is a bit confusing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-40876][SQL] Widening type promotion from integers to decimal in Parquet vectorized reader [spark]

Posted by "johanl-db (via GitHub)" <gi...@apache.org>.

johanl-db commented on code in PR #44803:
URL: https://github.com/apache/spark/pull/44803#discussion_r1463500581


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala:
##########
@@ -201,17 +290,17 @@ class ParquetTypeWideningSuite
     Seq(5 -> 7, 5 -> 10, 5 -> 20, 10 -> 12, 10 -> 20, 20 -> 22) ++
       Seq(7 -> 5, 10 -> 5, 20 -> 5, 12 -> 10, 20 -> 10, 22 -> 20)
   }
-    test(
-      s"parquet decimal precision change Decimal($fromPrecision, 2) -> Decimal($toPrecision, 2)") {
-      checkAllParquetReaders(
-        values = Seq("1.23", "10.34"),
-        fromType = DecimalType(fromPrecision, 2),
-        toType = DecimalType(toPrecision, 2),
-        expectError = fromPrecision > toPrecision &&
-          // parquet-mr allows reading decimals into a smaller precision decimal type without
-          // checking for overflows. See test below checking for the overflow case in parquet-mr.
-          spark.conf.get(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key).toBoolean)
-    }
+  test(
+    s"parquet decimal precision change Decimal($fromPrecision, 2) -> Decimal($toPrecision, 2)") {
+    checkAllParquetReaders(
+      values = Seq("1.23", "10.34"),
+      fromType = DecimalType(fromPrecision, 2),
+      toType = DecimalType(toPrecision, 2),
+      expectError = fromPrecision > toPrecision &&
+        // parquet-mr allows reading decimals into a smaller precision decimal type without
+        // checking for overflows. See test below checking for the overflow case in parquet-mr.

Review Comment:
   Values are set to null on overflow, see https://github.com/apache/spark/pull/44803/files/8935b284e5519038f78fd95c8d12e66224f29d63#diff-a5cfd7285f9adf95b2aeea90aa57cc35d2b8c6bddaa0f4652172d30a264d3614R347
   
   Arguably not great but changing it would be a breaking change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org