You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "walterddr (via GitHub)" <gi...@apache.org> on 2023/11/01 15:15:47 UTC

Re: [PR] Add nullability to FieldSpec and use it in TypeFactory [pinot]

walterddr commented on code in PR #11824:
URL: https://github.com/apache/pinot/pull/11824#discussion_r1378937058


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -300,6 +303,27 @@ public void setTransformFunction(@Nullable String transformFunction) {
     _transformFunction = transformFunction;
   }
 
+  /**
+   * Returns whether the column is nullable or not.
+   *
+   * This Java property itself is nullable. In case null is returned, actual nullability depends on the value of
+   * {@link IndexingConfig#isNullHandlingEnabled()}.
+   *
+   * @return true if the field is nullable, false if it is not and null if column level nullability should be delegated
+   * to {@link IndexingConfig#isNullHandlingEnabled()}.
+   */
+  @Nullable
+  public Boolean getNullable() {

Review Comment:
   sorry for following up late:
   
   > Phase2: ... otherwise delegate on `IndexingConfig.isNullHandlingEnabled()` ?
   > Phase 3:
   > Deprecate `IndexingConfig.isNullHandlingEnabled()` (although we still need to support it for compatibility reasons)
   
   ^ when it is deprecated in phase 3 what's the logic originally in phase 2? is it false or true?
   
   > There are potential big performance impact
   
   IMO: 4 scenarios currently
   1. `IndexingConfig.isNullHandlingEnabled() = true` and `SET enableNullHandling=true;`
   2. `IndexingConfig.isNullHandlingEnabled() = false` and `SET enableNullHandling=true;`
   3. `IndexingConfig.isNullHandlingEnabled() = true` and `SET enableNullHandling=false;`
   4. `IndexingConfig.isNullHandlingEnabled() = false` and `SET enableNullHandling=false;`
   
   as long as the following are met we should be good: 
   - performance on query should be the same on both 3&4 --> we should guarantee the same if all columns are marked as non-nullable by the user 
   - performance on query should be the same on both 1&2 and will be worst than 3&4 b/c it accesses the null vector --> we should guarantee better performance when some columns are marked as non-nullable by the user
   
   
   > I don't think that is a good idea. Without making this non-primitive we cannot know whether the value was explicitly defined as false or it has defaulted to false. We would also be breaking backward compatibility. By using a primitive boolean with false by default, in order to make a column nullable users would need to:
   > In Phase 1:
   >   - Set IndexingConfig.isNullHandlingEnabled() to true. This is required at write time, given in V1 write path ignores FieldSpec nullability.
   >   - Set each, for each nullable column, FieldSpec nullability to true. This is a breaking change. All tables already created would need to be modified, otherwise they would be suddenly treated as not null.
   > In Phase 2:
   >   - Set each, for each nullable column, FieldSpec nullability to true.
   
   there's no backward compatibility here, b/c the default behavior of V1 is all non-nullable, so setting it to primitive boolean with default to false exactly achieves that, 
   you don't need to set nullability = true for all columns, b/c
     - in v1 this can be controlled (and should still be controlled by `SET enableNullHandling=true` option regardless whether this field is `boolean` or `Boolean`
     - in v2 there's no such option, so it will honor the nullability setting --> which requires it to be set explicitly to true anyway otherwise nothing works.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org