Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/19 00:18:29 UTC

[GitHub] [spark] bersprockets commented on a diff in pull request #39117: [SPARK-41535][SQL] Set null correctly for interval fields in `InterpretedUnsafeProjection` and `InterpretedMutableProjection`

bersprockets commented on code in PR #39117:
URL: https://github.com/apache/spark/pull/39117#discussion_r1051693056


##########
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java:
##########
@@ -191,4 +192,17 @@ public void write(int ordinal, Decimal input, int precision, int scale) {
       setNull(ordinal);
     }
   }
+
+  @Override
+  public void write(int ordinal, CalendarInterval input) {

Review Comment:
   Inspired by how `InterpretedUnsafeProjection` already handles decimals, I changed `InterpretedUnsafeProjection` to always call `unsafeWriter.write` for calendar interval values, regardless of whether the input value is null. This broke the case where the calendar interval value is contained in an array (or map): there, the base implementation (`UnsafeWriter.write(int, CalendarInterval)`) did not set null values properly (it stomped on the `numElements` field).
   
   To fix it, again I looked at how decimals are treated, and therefore added an override of `write(int, CalendarInterval)` to `UnsafeArrayWriter`.
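   
   To illustrate why an array-aware null setter matters (this is a simplified model, not the actual Spark source): in the unsafe array layout, the first 8-byte word holds `numElements` and the null bitset follows it, so a null setter that assumes the bitset starts at word 0 (as a row-style setter does) overwrites the element count. The helper names below (`newArrayBuffer`, `rowStyleSetNull`, `arrayStyleSetNull`) are hypothetical:
   ```java
   // Simplified model of the unsafe array layout discussed above
   // (hypothetical names; not Spark's actual implementation):
   // word 0 = numElements header, word 1 = null bitset, words 2+ = data.
   public class ArrayNullBitDemo {
     public static long[] newArrayBuffer(int numElements) {
       long[] buf = new long[4];
       buf[0] = numElements;
       return buf;
     }
   
     // Row-style setter: assumes the null bitset starts at word 0,
     // so it stomps on the numElements header.
     public static void rowStyleSetNull(long[] buf, int ordinal) {
       buf[0] |= (1L << ordinal);
     }
   
     // Array-aware setter: skips the header word.
     public static void arrayStyleSetNull(long[] buf, int ordinal) {
       buf[1] |= (1L << ordinal);
     }
   
     public static void main(String[] args) {
       long[] buf = newArrayBuffer(3);
       rowStyleSetNull(buf, 2);
       System.out.println("row-style: numElements = " + buf[0]);   // prints 7 (corrupted)
   
       buf = newArrayBuffer(3);
       arrayStyleSetNull(buf, 2);
       System.out.println("array-style: numElements = " + buf[0]); // prints 3 (intact)
     }
   }
   ```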
   
   Note that the lack of this method was never a problem for generated code. That's because `GenerateUnsafeProjection` handles decimals and intervals contained within an array differently from decimals and intervals at the top level or contained in a struct.
   
   In the _array_ case, `GenerateUnsafeProjection` generates code that explicitly sets null when the input value is null:
   ```java
   /* 054 */       for (int index_0 = 0; index_0 < numElements_0; index_0++) {
   /* 055 */
   /* 056 */         if (tmpInput_0.isNullAt(index_0)) {
   /* 057 */           mutableStateArray_1[0].setNull8Bytes(index_0);
   /* 058 */         } else {
   /* 059 */           mutableStateArray_1[0].write(index_0, tmpInput_0.getInterval(index_0));
   /* 060 */         }
   /* 061 */
   /* 062 */       }
   ```
   Compare to the top-level/struct case, where the generated code always uses `writer.write`:
   ```java
   /* 033 */     boolean isNull_0 = i.isNullAt(0);
   /* 034 */     CalendarInterval value_0 = isNull_0 ?
   /* 035 */     null : (i.getInterval(0));
   /* 036 */     if (isNull_0) {
   /* 037 */       mutableStateArray_0[0].write(0, (CalendarInterval) null);
   /* 038 */     } else {
   /* 039 */       mutableStateArray_0[0].write(0, value_0);
   /* 040 */     }
   ```
   This means that `InterpretedUnsafeProjection` and `GenerateUnsafeProjection` have slightly different logic in the handling of null decimal or null calendar interval values in arrays (and maps): `InterpretedUnsafeProjection` always calls `writer.write`, and `GenerateUnsafeProjection` only does so for the top-level/struct case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org