You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "gortiz (via GitHub)" <gi...@apache.org> on 2023/11/07 11:28:27 UTC

[PR] [draft] Explicit null handling [pinot]

gortiz opened a new pull request, #11960:
URL: https://github.com/apache/pinot/pull/11960

   This is another approach to implement column level nullability, as explained in https://github.com/apache/pinot/issues/10381.
   
   # Introducing null handling modes
   
   After discussing with @Jackie-Jiang, @walterddr and @npawar we decided to propose this third approach where each table specify how to handle nulls. There are two possible ways to handle null:
   - Table mode, which is what we were using so far
   - Column mode, which means that each column specifies its own nullability in the FieldSpec.
   
   Multi-stage engine (aka V2) has also been modified to reject queries where `SET enableNullHandling = true` AND at least one of the tables involved in the query uses Table mode null handling.
   
   ## TODO
   - [x] Modifications in `Schema`
   - [x] Modifications in `TypeFactory`
   - [x] Modifications in `NullValueIndexType`
   - [ ] Modifications in the ingestion side
   - [ ] Modifications in test engine
   - [ ] Write resource tests
   
   ## Proposed Syntax
   
   Although the null handling mode semantics are clear, it is not so clear how to configure them. What is explained here is just a proposal open to discussion and can be changed during the PR review. 
   
   The proposal is to add a new JSON property in the schema\*.
   
   This new property can be configured as:
   ```js
   {
     "schemaName": "blablabla",
     "dimensionFieldSpecs": [...],
     "options": {
        "nullHandling": {...}
     }
   }
   ```
   
   Where `nullHandling` can be either 
   * _table mode_, which means to follow null semantics supported in master right now. This is the default mode for backward compatibility reasons. It can be explicitly set as:
   ```js
   {
     "mode": "table"
   }
   ```
   
   * _column_ mode, which means that *IndexingConfig.nullHandlingEnabled is ignored* and *instead* FieldSpec.isNullable() is used to create the null value vector for that column at ingestion time and to read it at query time. It can be specified with:
   ```js
   {
     "mode": "column",
   }
   ```
   
   ### Column mode default nullability
   
   In _column_ mode columns are not nullable by default. This is another opinionated decision totally open to discuss. This means that in tables with a lot of columns where most columns are nullable, users would need to specify `FieldSpec.nullable` for a lot of columns. 
   
   In order to make changes easier in this case, _column_ mode can be customized further to define the default nullability for columns:
   ```js
   {
     "mode": "column",
     "default": false/true
   }
   ```
   
   So users can configure `"default": true` in order to make all columns nullable by default.
   
   \* It is yet to be decide whether it will be added to the TableConfig or the Schema. When this PR was open it was added to the Schema.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397675140


##########
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/queries/QueryRunnerTestBase.java:
##########
@@ -197,14 +199,17 @@ protected List<Object[]> queryH2(String sql)
     return result;
   }
 
-  protected void compareRowEquals(ResultTable resultTable, List<Object[]> expectedRows) {
-    compareRowEquals(resultTable, expectedRows, false);
+  protected void compareRowEquals(ResultTable resultTable, List<Object[]> expectedRows, String sql) {

Review Comment:
   ah. i dont use intellij ui. it is printed in the test report from mvn test :-)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397681341


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/defaultcolumn/BaseDefaultColumnHandler.java:
##########
@@ -550,6 +548,16 @@ private void createDefaultValueColumnV1Indices(String column)
         fieldSpec, true/*hasDictionary*/, dictionaryElementSize);
   }
 
+  private boolean isNullable(FieldSpec fieldSpec) {
+    if (_schema.isEnableColumnBasedNullHandling() && fieldSpec.isNullable()) {
+      return true;
+    } else {
+      return _indexLoadingConfig.getTableConfig() != null
+          && _indexLoadingConfig.getTableConfig().getIndexingConfig() != null
+          && _indexLoadingConfig.getTableConfig().getIndexingConfig().isNullHandlingEnabled();
+    }
+  }

Review Comment:
   This is incorrect. We want to ignore table based null handling when column based is enabled
   ```suggestion
       if (_schema.isEnableColumnBasedNullHandling()) {
         return fieldSpec.isNullable();
       } else {
         return _indexLoadingConfig.getTableConfig() != null
             && _indexLoadingConfig.getTableConfig().getIndexingConfig() != null
             && _indexLoadingConfig.getTableConfig().getIndexingConfig().isNullHandlingEnabled();
       }
     }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397677461


##########
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/queries/QueryRunnerTestBase.java:
##########
@@ -197,14 +199,17 @@ protected List<Object[]> queryH2(String sql)
     return result;
   }
 
-  protected void compareRowEquals(ResultTable resultTable, List<Object[]> expectedRows) {
-    compareRowEquals(resultTable, expectedRows, false);
+  protected void compareRowEquals(ResultTable resultTable, List<Object[]> expectedRows, String sql) {

Review Comment:
   but regardless, i think bottomline is can we agree to separate into a different PR (all the test improvements)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1389260806


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/Schema.java:
##########
@@ -57,13 +59,16 @@
  * <p>There could be multiple DIMENSION or METRIC or DATE_TIME fields, but at most 1 TIME field.
  * <p>In pinot, we store data using 5 <code>DataType</code>s: INT, LONG, FLOAT, DOUBLE, STRING. All other
  * <code>DataType</code>s will be converted to one of them.
+ *
+ * <p>The schema can be configured with different {@link Options}, including how to handle nulls</p>
  */
 @SuppressWarnings("unused")
 @JsonIgnoreProperties(ignoreUnknown = true)
 public final class Schema implements Serializable {
   private static final Logger LOGGER = LoggerFactory.getLogger(Schema.class);
 
   private String _schemaName;
+  private Options _options = new Options();

Review Comment:
   I like the proposed approach, but I've simplified to what Jackie proposed in 11e597f4fe776c53ebaafbef3bed3c68446c0b96.
   
   I've changed the PR description accordingly 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397672344


##########
pinot-query-planner/src/test/java/org/apache/pinot/query/queries/ResourceBasedQueryPlansTest.java:
##########
@@ -81,7 +83,7 @@ public void testQueryExplainPlansWithExceptions(String testCaseName, String quer
     }
   }
 
-  @DataProvider
+  @DataProvider(parallel = true)

Review Comment:
   Done in e73aac404b600f2944324ae42833c974bf2dfe36



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1392853100


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/immutable/ImmutableSegmentLoader.java:
##########
@@ -71,6 +72,17 @@ public static ImmutableSegment load(File indexDir, ReadMode readMode)
     return load(indexDir, defaultIndexLoadingConfig, null, false);
   }
 
+  /**
+   * Loads the segment with empty schema and IndexLoadingConfig. This method is used to
+   * access the segment without modifying it, i.e. in read-only mode.
+   */
+  public static ImmutableSegment load(File indexDir, ReadMode readMode, Schema schema, TableConfig tableConfig)

Review Comment:
   this is visible for testing purpose?



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/nullvalue/NullValueIndexType.java:
##########
@@ -67,7 +70,7 @@ public NullValueVectorCreator createIndexCreator(IndexCreationContext context, I
 
   @Override
   public IndexConfig getDefaultConfig() {
-    return IndexConfig.DISABLED;
+    return IndexConfig.ENABLED;

Review Comment:
   just to confirm:
   1. this means null index type is always enabled, but it could contain no columns?
   2. this has nothing to do with the schema definition JSONs `option.nullHandling.mode/default` value yes?



##########
pinot-query-runtime/src/test/resources/queries/CountDistinct.json:
##########
@@ -119,11 +119,13 @@
       {
         "comments": "table aren't actually partitioned by val thus all segments can produce duplicate results, thus [[8]]",
         "sql": "SELECT SEGMENT_PARTITIONED_DISTINCT_COUNT(val) FROM {tbl1}",
+        "ignored": true,

Review Comment:
   what's the reason behind this?



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -300,6 +302,30 @@ public void setTransformFunction(@Nullable String transformFunction) {
     _transformFunction = transformFunction;
   }
 
+  /**
+   * Returns whether the column is nullable or not.
+   */
+  @JsonIgnore
+  public Boolean isNullable() {

Review Comment:
   primitive boolean



##########
pinot-integration-tests/src/test/resources/test_null_handling.schema:
##########
@@ -1,4 +1,10 @@
 {
+  "options": {
+    "nullHandling": {
+      "mode": "column",
+      "default": true

Review Comment:
   - assume ***mode*** can either be `column` or `table`?
   - so we plan to allow users to configure the ***default nullability*** on a table? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1393449972


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -317,6 +343,7 @@ public ObjectNode toJsonObject() {
     }
     appendDefaultNullValue(jsonObject);
     appendTransformFunction(jsonObject);
+    jsonObject.put("notNull", !_nullable);

Review Comment:
   this would be super error-prone going forward IMO



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397670332


##########
pinot-query-planner/src/test/java/org/apache/pinot/query/queries/ResourceBasedQueryPlansTest.java:
##########
@@ -81,7 +83,7 @@ public void testQueryExplainPlansWithExceptions(String testCaseName, String quer
     }
   }
 
-  @DataProvider
+  @DataProvider(parallel = true)

Review Comment:
   I think it runs in quite more threads than just two. As said I noticed a large performance gain when running with this option. Neither I or the CI found any issue, but as requested I'll be conservative and remove it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1387378495


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -99,6 +100,7 @@ public abstract class FieldSpec implements Comparable<FieldSpec>, Serializable {
   protected String _name;
   protected DataType _dataType;
   protected boolean _isSingleValueField = true;
+  protected Boolean _nullable = null;

Review Comment:
   Per our discussion, should we add it as primitive boolean `_notNull` and default to false?



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/Schema.java:
##########
@@ -520,14 +535,60 @@ public SchemaBuilder setSchemaName(String schemaName) {
       return this;
     }
 
+    public SchemaBuilder withOptions(Consumer<Options> consumer) {
+      consumer.accept(_schema._options);
+      return this;
+    }
+
+    public SchemaBuilder addMetricField(String name, DataType dataType) {
+      return addMetricField(name, dataType, ignore -> {
+      });
+    }
+
+    public SchemaBuilder addMetricField(String name, DataType dataType, Consumer<MetricFieldSpec> customizer) {

Review Comment:
   I don't see much value for adding this flexible customizer. `addField(FieldSpec fieldSpec)` can be handy though. This customizer is more or less creating a `FieldSpec`



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/Schema.java:
##########
@@ -57,13 +59,16 @@
  * <p>There could be multiple DIMENSION or METRIC or DATE_TIME fields, but at most 1 TIME field.
  * <p>In pinot, we store data using 5 <code>DataType</code>s: INT, LONG, FLOAT, DOUBLE, STRING. All other
  * <code>DataType</code>s will be converted to one of them.
+ *
+ * <p>The schema can be configured with different {@link Options}, including how to handle nulls</p>
  */
 @SuppressWarnings("unused")
 @JsonIgnoreProperties(ignoreUnknown = true)
 public final class Schema implements Serializable {
   private static final Logger LOGGER = LoggerFactory.getLogger(Schema.class);
 
   private String _schemaName;
+  private Options _options = new Options();

Review Comment:
   Can we keep it simple by just adding a `boolean enableColumnBasedNullHandling` and default to false? This should be a short lived flag for migration, so I'd suggest keeping it simple



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1389260806


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/Schema.java:
##########
@@ -57,13 +59,16 @@
  * <p>There could be multiple DIMENSION or METRIC or DATE_TIME fields, but at most 1 TIME field.
  * <p>In pinot, we store data using 5 <code>DataType</code>s: INT, LONG, FLOAT, DOUBLE, STRING. All other
  * <code>DataType</code>s will be converted to one of them.
+ *
+ * <p>The schema can be configured with different {@link Options}, including how to handle nulls</p>
  */
 @SuppressWarnings("unused")
 @JsonIgnoreProperties(ignoreUnknown = true)
 public final class Schema implements Serializable {
   private static final Logger LOGGER = LoggerFactory.getLogger(Schema.class);
 
   private String _schemaName;
+  private Options _options = new Options();

Review Comment:
   I like the proposed approach, but I've simplified to what Jackie proposed in 11e597f4fe776c53ebaafbef3bed3c68446c0b96



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1392916051


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/immutable/ImmutableSegmentLoader.java:
##########
@@ -71,6 +72,17 @@ public static ImmutableSegment load(File indexDir, ReadMode readMode)
     return load(indexDir, defaultIndexLoadingConfig, null, false);
   }
 
+  /**
+   * Loads the segment with empty schema and IndexLoadingConfig. This method is used to
+   * access the segment without modifying it, i.e. in read-only mode.
+   */
+  public static ImmutableSegment load(File indexDir, ReadMode readMode, Schema schema, TableConfig tableConfig)

Review Comment:
   I've changed that as suggested by Jackie and you. I forgot to modify this resource.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1393093909


##########
pinot-query-runtime/src/test/resources/queries/CountDistinct.json:
##########
@@ -119,11 +119,13 @@
       {
         "comments": "table aren't actually partitioned by val thus all segments can produce duplicate results, thus [[8]]",
         "sql": "SELECT SEGMENT_PARTITIONED_DISTINCT_COUNT(val) FROM {tbl1}",
+        "ignored": true,

Review Comment:
   ah i see. i recently fixed this in #11937 . PTAL



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397675140


##########
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/queries/QueryRunnerTestBase.java:
##########
@@ -197,14 +199,17 @@ protected List<Object[]> queryH2(String sql)
     return result;
   }
 
-  protected void compareRowEquals(ResultTable resultTable, List<Object[]> expectedRows) {
-    compareRowEquals(resultTable, expectedRows, false);
+  protected void compareRowEquals(ResultTable resultTable, List<Object[]> expectedRows, String sql) {

Review Comment:
   ah. i dont use intellij ui. it is printed in the test report from mvn test :-) the reason why i said this is b/c for example if there's queries with different h2sql and pinot sql field it will not be shown but it will be in the mvn test report. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1387432071


##########
pinot-query-planner/src/main/java/org/apache/pinot/query/type/TypeFactory.java:
##########
@@ -45,20 +46,36 @@ public TypeFactory(RelDataTypeSystem typeSystem) {
 
   public RelDataType createRelDataTypeFromSchema(Schema schema) {
     Builder builder = new Builder(this);
+    Predicate<FieldSpec> isNullable;
+    if (schema.getOptions().getNullHandling().supportsV2()) {
+      isNullable = schema.getOptions().getNullHandling()::isNullable;
+    } else {
+      isNullable = fieldSpec -> false;
+    }
     for (Map.Entry<String, FieldSpec> e : schema.getFieldSpecMap().entrySet()) {
-      builder.add(e.getKey(), toRelDataType(e.getValue()));
+      builder.add(e.getKey(), toRelDataType(e.getValue(), isNullable));
     }
     return builder.build();
   }
 
-  private RelDataType toRelDataType(FieldSpec fieldSpec) {
+  private RelDataType toRelDataType(FieldSpec fieldSpec, Predicate<FieldSpec> isNullable) {
+    RelDataType type = createSqlType(getSqlTypeName(fieldSpec));
+    boolean isArray = !fieldSpec.isSingleValueField();
+    if (isArray) {
+      type = createArrayType(type, -1);
+    }
+    if (isNullable.test(fieldSpec)) {
+      type = createTypeWithNullability(type, true);
+    }

Review Comment:
   do we consider `isNullable` flag here refer to the array itself or this is refering to the element inside the array can be null? 
   it is interesting b/c some DB consider array to be null <=> empty; but postgres do not (e.g. can have null array)
   let's make sure we document it properly here.



##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/plan/server/ServerPlanRequestUtils.java:
##########
@@ -147,6 +147,13 @@ private static ServerPlanRequestContext build(OpChainExecutionContext executionC
     // 2. set pinot query options according to requestMetadataMap
     updateQueryOptions(pinotQuery, executionContext);
 
+    if (QueryOptionsUtils.isNullHandlingEnabled(pinotQuery.getQueryOptions())

Review Comment:
   i am not fully sure i understand the logic here. but I think it is OK to move this to a separate PR (not going to change the behavior if something is broken; and whatever works still works)



##########
pinot-query-planner/src/test/java/org/apache/pinot/query/queries/ResourceBasedQueryPlansTest.java:
##########
@@ -81,7 +83,7 @@ public void testQueryExplainPlansWithExceptions(String testCaseName, String quer
     }
   }
 
-  @DataProvider
+  @DataProvider(parallel = true)

Review Comment:
   what's the reason behind this change? (btw. calcite is [not threadsafe](https://lists.apache.org/thread/wqjjsnyrwxsv9czctkh74v4r8h9w3w2r))



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -99,6 +100,7 @@ public abstract class FieldSpec implements Comparable<FieldSpec>, Serializable {
   protected String _name;
   protected DataType _dataType;
   protected boolean _isSingleValueField = true;
+  protected Boolean _nullable = null;

Review Comment:
   i think primitive `protected boolean _nullable` default to true is also fine 



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/Schema.java:
##########
@@ -57,13 +59,16 @@
  * <p>There could be multiple DIMENSION or METRIC or DATE_TIME fields, but at most 1 TIME field.
  * <p>In pinot, we store data using 5 <code>DataType</code>s: INT, LONG, FLOAT, DOUBLE, STRING. All other
  * <code>DataType</code>s will be converted to one of them.
+ *
+ * <p>The schema can be configured with different {@link Options}, including how to handle nulls</p>
  */
 @SuppressWarnings("unused")
 @JsonIgnoreProperties(ignoreUnknown = true)
 public final class Schema implements Serializable {
   private static final Logger LOGGER = LoggerFactory.getLogger(Schema.class);
 
   private String _schemaName;
+  private Options _options = new Options();

Review Comment:
   +1 on this one. we will most likely default to column level null in the long run anyway



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397656622


##########
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/queries/ResourceBasedQueriesTest.java:
##########
@@ -73,6 +73,7 @@ public class ResourceBasedQueriesTest extends QueryRunnerTestBase {
   private static final String FILE_FILTER_PROPERTY = "pinot.fileFilter";
   private static final String IGNORE_FILTER_PROPERTY = "pinot.runIgnored";
   private static final int DEFAULT_NUM_PARTITIONS = 4;
+  private static final boolean REPORT_IGNORES = true;

Review Comment:
   Sure



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1389035140


##########
pinot-query-planner/src/main/java/org/apache/pinot/query/type/TypeFactory.java:
##########
@@ -45,20 +46,36 @@ public TypeFactory(RelDataTypeSystem typeSystem) {
 
   public RelDataType createRelDataTypeFromSchema(Schema schema) {
     Builder builder = new Builder(this);
+    Predicate<FieldSpec> isNullable;
+    if (schema.getOptions().getNullHandling().supportsV2()) {
+      isNullable = schema.getOptions().getNullHandling()::isNullable;
+    } else {
+      isNullable = fieldSpec -> false;
+    }
     for (Map.Entry<String, FieldSpec> e : schema.getFieldSpecMap().entrySet()) {
-      builder.add(e.getKey(), toRelDataType(e.getValue()));
+      builder.add(e.getKey(), toRelDataType(e.getValue(), isNullable));
     }
     return builder.build();
   }
 
-  private RelDataType toRelDataType(FieldSpec fieldSpec) {
+  private RelDataType toRelDataType(FieldSpec fieldSpec, Predicate<FieldSpec> isNullable) {
+    RelDataType type = createSqlType(getSqlTypeName(fieldSpec));
+    boolean isArray = !fieldSpec.isSingleValueField();
+    if (isArray) {
+      type = createArrayType(type, -1);
+    }
+    if (isNullable.test(fieldSpec)) {
+      type = createTypeWithNullability(type, true);
+    }

Review Comment:
   Most databases do not correctly support arrays (ie mysql 6.x). Postgres, on the other hand, has a sane type system (that even supports inheritance!). AFAIK when a column of type array is nullable it means the array itself is nullable. I don't know if each element in the array can be tuned nullable or not.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1393800724


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/Schema.java:
##########
@@ -520,14 +531,60 @@ public SchemaBuilder setSchemaName(String schemaName) {
       return this;
     }
 
+    public SchemaBuilder withEnableColumnBasedNullHandling(boolean enableColumnBasedNullHandling) {
+      _schema.setEnableColumnBasedNullHandling(enableColumnBasedNullHandling);
+      return this;
+    }
+
+    public SchemaBuilder addMetricField(String name, DataType dataType) {
+      return addMetricField(name, dataType, ignore -> {
+      });
+    }
+
+    public SchemaBuilder addMetricField(String name, DataType dataType, Consumer<MetricFieldSpec> customizer) {
+      MetricFieldSpec fieldSpec = new MetricFieldSpec();
+      return addFieldSpec(fieldSpec, name, dataType, customizer);
+    }
+
+    public SchemaBuilder addDimensionField(String name, DataType dataType) {
+      return addDimensionField(name, dataType, ignore -> {
+      });
+    }
+
+    public SchemaBuilder addDimensionField(String name, DataType dataType, Consumer<DimensionFieldSpec> customizer) {
+      DimensionFieldSpec fieldSpec = new DimensionFieldSpec();
+      return addFieldSpec(fieldSpec, name, dataType, customizer);
+    }
+
+    public SchemaBuilder addDateTimeField(String name, DataType dataType, String format, String granularity) {
+      return addDateTimeField(name, dataType, format, granularity, ignore -> {
+      });
+    }
+
+    public SchemaBuilder addDateTimeField(String name, DataType dataType, String format, String granularity,
+        Consumer<DateTimeFieldSpec> customizer) {
+      DateTimeFieldSpec fieldSpec = new DateTimeFieldSpec();
+      fieldSpec.setFormat(format);
+      fieldSpec.setGranularity(granularity);
+      return addFieldSpec(fieldSpec, name, dataType, customizer);
+    }
+
+    private <E extends FieldSpec> SchemaBuilder addFieldSpec(E fieldSpec, String name, DataType dataType,
+        Consumer<E> customizer) {
+      fieldSpec.setName(name);
+      fieldSpec.setDataType(dataType);
+      customizer.accept(fieldSpec);
+      _schema.addField(fieldSpec);
+      return this;
+    }
+
     /**
      * Add single value dimensionFieldSpec
      */
     public SchemaBuilder addSingleValueDimension(String dimensionName, DataType dataType) {
       _schema.addField(new DimensionFieldSpec(dimensionName, dataType, true));
       return this;
     }
-

Review Comment:
   I already explained this is an improvement. I would prefer to do not remove it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1393805680


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -317,6 +343,7 @@ public ObjectNode toJsonObject() {
     }
     appendDefaultNullValue(jsonObject);
     appendTransformFunction(jsonObject);
+    jsonObject.put("notNull", !_nullable);

Review Comment:
   Fixed in 802e9c4f93caa0815e74d85035847c1a2153f20f



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -99,6 +100,7 @@ public abstract class FieldSpec implements Comparable<FieldSpec>, Serializable {
   protected String _name;
   protected DataType _dataType;
   protected boolean _isSingleValueField = true;
+  protected boolean _nullable = true;

Review Comment:
   Changed in 802e9c4f93caa0815e74d85035847c1a2153f20f



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -44,6 +44,7 @@
  * <p>- <code>IsSingleValueField</code>: single-value or multi-value field.
  * <p>- <code>DefaultNullValue</code>: when no value found for this field, use this value. Stored in string format.
  * <p>- <code>VirtualColumnProvider</code>: the virtual column provider to use for this field.
+ * <p>- <code>Nullable</code>: whether the column is nullable or not. Defaults to null

Review Comment:
   Fixed in ccf1ef0f42aaaad7cdb7145813b88e360aae1ded



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1389027448


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/Schema.java:
##########
@@ -520,14 +535,60 @@ public SchemaBuilder setSchemaName(String schemaName) {
       return this;
     }
 
+    public SchemaBuilder withOptions(Consumer<Options> consumer) {
+      consumer.accept(_schema._options);
+      return this;
+    }
+
+    public SchemaBuilder addMetricField(String name, DataType dataType) {
+      return addMetricField(name, dataType, ignore -> {
+      });
+    }
+
+    public SchemaBuilder addMetricField(String name, DataType dataType, Consumer<MetricFieldSpec> customizer) {

Review Comment:
   It is very valuable in order to keep the fluent dsl of the builder. There are several examples in this PR. Basically it is quite easier to write something like:
   
   ``` 
   Schema schema = new SchemaBuilder()
      .addMetricField("name", type, field -> {
         //do your changes here
      })
      .build();
   ```
   
   Than:
   
   ``` 
   FieldSpec metric = new MetricField("name", type);
   // do your changes here
   Schema schema = new SchemaBuilder()
      .addField(field)
   .build();
   ```
   
   That is clear by the fact we have several builder methods like:
   ```
    public SchemaBuilder addSingleValueDimension(String dimensionName, DataType dataType);
    public SchemaBuilder addSingleValueDimension(String dimensionName, DataType dataType, Object defaultNullValue);
   public SchemaBuilder addSingleValueDimension(String dimensionName, DataType dataType, int maxLength,
           Object defaultNullValue);
   ///etc
   ```
   
   The current method does not scale while having a consumer is cleaner, shorter, scalable and safer. The later comes from the fact that having so many overloaded methods end up being problematic once we have a collision in types, while the consumer method is always type safe and explicit given we use named setters.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1389039129


##########
pinot-query-planner/src/test/java/org/apache/pinot/query/queries/ResourceBasedQueryPlansTest.java:
##########
@@ -81,7 +83,7 @@ public void testQueryExplainPlansWithExceptions(String testCaseName, String quer
     }
   }
 
-  @DataProvider
+  @DataProvider(parallel = true)

Review Comment:
   Calcite is not problematic here given we create a new instance on each query. It has to be thread safe in that case, otherwise two concurrent queries may affect each other.
   
   This change reduce the time to test in my local computer by half. Given it is a free performance gain, I included it here and I encourage to use parallel whenever is possible.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397661235


##########
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/queries/QueryRunnerTestBase.java:
##########
@@ -197,14 +199,17 @@ protected List<Object[]> queryH2(String sql)
     return result;
   }
 
-  protected void compareRowEquals(ResultTable resultTable, List<Object[]> expectedRows) {
-    compareRowEquals(resultTable, expectedRows, false);
+  protected void compareRowEquals(ResultTable resultTable, List<Object[]> expectedRows, String sql) {

Review Comment:
   Sorry, I though you were complaining about the name of the tests. In this case... I definitively missed the query in the assertion.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397697484


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/nullvalue/NullValueIndexType.java:
##########
@@ -77,11 +80,23 @@ public String getPrettyName() {
 
   @Override
   public ColumnConfigDeserializer<IndexConfig> createDeserializer() {
-    return IndexConfigDeserializer.ifIndexingConfig(
-        IndexConfigDeserializer.alwaysCall((TableConfig tableConfig, Schema schema) ->
-            tableConfig.getIndexingConfig().isNullHandlingEnabled()
-                ? IndexConfig.ENABLED
-                : IndexConfig.DISABLED));
+    return (TableConfig tableConfig, Schema schema) -> {
+      Collection<FieldSpec> allFieldSpecs = schema.getAllFieldSpecs();
+      Map<String, IndexConfig> configMap = Maps.newHashMapWithExpectedSize(allFieldSpecs.size());
+
+      boolean enableColumnBasedNullHandling = schema.isEnableColumnBasedNullHandling();
+
+      for (FieldSpec fieldSpec : allFieldSpecs) {
+        IndexConfig indexConfig;
+        if (!enableColumnBasedNullHandling || fieldSpec.isNullable()) {

Review Comment:
   Fixed in 9402a7c4a65a9d031f66a696031bc4154d217915. I think it is also easier to understand now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397677350


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/Schema.java:
##########
@@ -520,14 +531,60 @@ public SchemaBuilder setSchemaName(String schemaName) {
       return this;
     }
 
+    public SchemaBuilder withEnableColumnBasedNullHandling(boolean enableColumnBasedNullHandling) {
+      _schema.setEnableColumnBasedNullHandling(enableColumnBasedNullHandling);
+      return this;
+    }
+
+    public SchemaBuilder addMetricField(String name, DataType dataType) {
+      return addMetricField(name, dataType, ignore -> {
+      });
+    }
+
+    public SchemaBuilder addMetricField(String name, DataType dataType, Consumer<MetricFieldSpec> customizer) {
+      MetricFieldSpec fieldSpec = new MetricFieldSpec();
+      return addFieldSpec(fieldSpec, name, dataType, customizer);
+    }
+
+    public SchemaBuilder addDimensionField(String name, DataType dataType) {
+      return addDimensionField(name, dataType, ignore -> {
+      });
+    }
+
+    public SchemaBuilder addDimensionField(String name, DataType dataType, Consumer<DimensionFieldSpec> customizer) {
+      DimensionFieldSpec fieldSpec = new DimensionFieldSpec();
+      return addFieldSpec(fieldSpec, name, dataType, customizer);
+    }
+
+    public SchemaBuilder addDateTimeField(String name, DataType dataType, String format, String granularity) {
+      return addDateTimeField(name, dataType, format, granularity, ignore -> {
+      });
+    }
+
+    public SchemaBuilder addDateTimeField(String name, DataType dataType, String format, String granularity,
+        Consumer<DateTimeFieldSpec> customizer) {
+      DateTimeFieldSpec fieldSpec = new DateTimeFieldSpec();
+      fieldSpec.setFormat(format);
+      fieldSpec.setGranularity(granularity);
+      return addFieldSpec(fieldSpec, name, dataType, customizer);
+    }
+
+    private <E extends FieldSpec> SchemaBuilder addFieldSpec(E fieldSpec, String name, DataType dataType,
+        Consumer<E> customizer) {
+      fieldSpec.setName(name);
+      fieldSpec.setDataType(dataType);
+      customizer.accept(fieldSpec);
+      _schema.addField(fieldSpec);
+      return this;
+    }
+
     /**
      * Add single value dimensionFieldSpec
      */
     public SchemaBuilder addSingleValueDimension(String dimensionName, DataType dataType) {
       _schema.addField(new DimensionFieldSpec(dimensionName, dataType, true));
       return this;
     }
-

Review Comment:
   Fixed in 7716fe00bdb7f6c3bada07922deb39ec6b528e37



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397679349


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/nullvalue/NullValueIndexType.java:
##########
@@ -77,11 +80,23 @@ public String getPrettyName() {
 
   @Override
   public ColumnConfigDeserializer<IndexConfig> createDeserializer() {
-    return IndexConfigDeserializer.ifIndexingConfig(
-        IndexConfigDeserializer.alwaysCall((TableConfig tableConfig, Schema schema) ->
-            tableConfig.getIndexingConfig().isNullHandlingEnabled()
-                ? IndexConfig.ENABLED
-                : IndexConfig.DISABLED));
+    return (TableConfig tableConfig, Schema schema) -> {
+      Collection<FieldSpec> allFieldSpecs = schema.getAllFieldSpecs();
+      Map<String, IndexConfig> configMap = Maps.newHashMapWithExpectedSize(allFieldSpecs.size());
+
+      boolean enableColumnBasedNullHandling = schema.isEnableColumnBasedNullHandling();
+
+      for (FieldSpec fieldSpec : allFieldSpecs) {
+        IndexConfig indexConfig;
+        if (!enableColumnBasedNullHandling || fieldSpec.isNullable()) {

Review Comment:
   For my understanding, is this to keep the old behavior of always loading the null vector when the index exists?
   In the existing code, we do check `tableConfig.getIndexingConfig().isNullHandlingEnabled()` but the check is dropped here. Is that because we do not check whether it is enabled in the existing code?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397675140


##########
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/queries/QueryRunnerTestBase.java:
##########
@@ -197,14 +199,17 @@ protected List<Object[]> queryH2(String sql)
     return result;
   }
 
-  protected void compareRowEquals(ResultTable resultTable, List<Object[]> expectedRows) {
-    compareRowEquals(resultTable, expectedRows, false);
+  protected void compareRowEquals(ResultTable resultTable, List<Object[]> expectedRows, String sql) {

Review Comment:
   ah. i dont use intellij ui. it is printed in the test report from mvn test :-) the reason why i said this is b/c for example if there's queries with different h2sql and pinot sql field it will not be shown but it will be in the mvn test report. thus it is better to leverage the parameterized test naming instead of invent our own.
   
   i wonder if there's plugin in intellij that can display those. after all a mvn cli too can do it intellij should be able to



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1389041535


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/plan/server/ServerPlanRequestUtils.java:
##########
@@ -147,6 +147,13 @@ private static ServerPlanRequestContext build(OpChainExecutionContext executionC
     // 2. set pinot query options according to requestMetadataMap
     updateQueryOptions(pinotQuery, executionContext);
 
+    if (QueryOptionsUtils.isNullHandlingEnabled(pinotQuery.getQueryOptions())

Review Comment:
   The logic here is:
   ```
   If a query is executed with V2 engine
     And is executed with null handling
     And any table consumed by the query is not using column level nullability
   Then abort the execution with an error saying column null handling must be enable for all tables involved in a multi-stage query with null handling
   ```
   
   I've been asked several times about queries returning strange results in V2 when using null handling. We don't support that but users don't know it. Therefore I strongly suggest to fail fast in order to be explicit about this limitation. This will also help users to understand that is important to migrate to column level nullability.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1393041339


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -44,6 +44,7 @@
  * <p>- <code>IsSingleValueField</code>: single-value or multi-value field.
  * <p>- <code>DefaultNullValue</code>: when no value found for this field, use this value. Stored in string format.
  * <p>- <code>VirtualColumnProvider</code>: the virtual column provider to use for this field.
+ * <p>- <code>Nullable</code>: whether the column is nullable or not. Defaults to null

Review Comment:
   Update this



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -99,6 +100,7 @@ public abstract class FieldSpec implements Comparable<FieldSpec>, Serializable {
   protected String _name;
   protected DataType _dataType;
   protected boolean _isSingleValueField = true;
+  protected boolean _nullable = true;

Review Comment:
   Since we use `notNull` as the key, can we also store `_notNull` internally and default to `false`? It is quite confusing to see it defined as `_nullable`, default to `true`, and has `isNullable()` and `setNullable()` json ignored



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/Schema.java:
##########
@@ -520,14 +531,60 @@ public SchemaBuilder setSchemaName(String schemaName) {
       return this;
     }
 
+    public SchemaBuilder withEnableColumnBasedNullHandling(boolean enableColumnBasedNullHandling) {
+      _schema.setEnableColumnBasedNullHandling(enableColumnBasedNullHandling);
+      return this;
+    }
+
+    public SchemaBuilder addMetricField(String name, DataType dataType) {
+      return addMetricField(name, dataType, ignore -> {
+      });
+    }
+
+    public SchemaBuilder addMetricField(String name, DataType dataType, Consumer<MetricFieldSpec> customizer) {
+      MetricFieldSpec fieldSpec = new MetricFieldSpec();
+      return addFieldSpec(fieldSpec, name, dataType, customizer);
+    }
+
+    public SchemaBuilder addDimensionField(String name, DataType dataType) {
+      return addDimensionField(name, dataType, ignore -> {
+      });
+    }
+
+    public SchemaBuilder addDimensionField(String name, DataType dataType, Consumer<DimensionFieldSpec> customizer) {
+      DimensionFieldSpec fieldSpec = new DimensionFieldSpec();
+      return addFieldSpec(fieldSpec, name, dataType, customizer);
+    }
+
+    public SchemaBuilder addDateTimeField(String name, DataType dataType, String format, String granularity) {
+      return addDateTimeField(name, dataType, format, granularity, ignore -> {
+      });
+    }
+
+    public SchemaBuilder addDateTimeField(String name, DataType dataType, String format, String granularity,
+        Consumer<DateTimeFieldSpec> customizer) {
+      DateTimeFieldSpec fieldSpec = new DateTimeFieldSpec();
+      fieldSpec.setFormat(format);
+      fieldSpec.setGranularity(granularity);
+      return addFieldSpec(fieldSpec, name, dataType, customizer);
+    }
+
+    private <E extends FieldSpec> SchemaBuilder addFieldSpec(E fieldSpec, String name, DataType dataType,
+        Consumer<E> customizer) {
+      fieldSpec.setName(name);
+      fieldSpec.setDataType(dataType);
+      customizer.accept(fieldSpec);
+      _schema.addField(fieldSpec);
+      return this;
+    }
+
     /**
      * Add single value dimensionFieldSpec
      */
     public SchemaBuilder addSingleValueDimension(String dimensionName, DataType dataType) {
       _schema.addField(new DimensionFieldSpec(dimensionName, dataType, true));
       return this;
     }
-

Review Comment:
   (nit) revert



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #11960:
URL: https://github.com/apache/pinot/pull/11960#issuecomment-1798886338

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/11960?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#11960](https://app.codecov.io/gh/apache/pinot/pull/11960?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (e0d14e2) into [master](https://app.codecov.io/gh/apache/pinot/commit/f232a5fae2a1345610744e6c76c7e7fbb5712e72?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (f232a5f) will **decrease** coverage by `33.65%`.
   > Report is 47 commits behind head on master.
   > The diff coverage is `17.64%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #11960       +/-   ##
   =============================================
   - Coverage     61.25%   27.60%   -33.65%     
   + Complexity     1141      201      -940     
   =============================================
     Files          2373     2386       +13     
     Lines        128263   129187      +924     
     Branches      19799    19965      +166     
   =============================================
   - Hits          78563    35665    -42898     
   - Misses        44013    90839    +46826     
   + Partials       5687     2683     -3004     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/pinot/pull/11960/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/11960/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: |
   | [integration](https://app.codecov.io/gh/apache/pinot/pull/11960/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: |
   | [integration1](https://app.codecov.io/gh/apache/pinot/pull/11960/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: |
   | [java-11](https://app.codecov.io/gh/apache/pinot/pull/11960/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.60% <17.64%> (?)` | |
   | [java-21](https://app.codecov.io/gh/apache/pinot/pull/11960/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.59% <17.64%> (-33.66%)` | :arrow_down: |
   | [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/11960/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.60% <17.64%> (+0.04%)` | :arrow_up: |
   | [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/11960/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.59% <17.64%> (-7.11%)` | :arrow_down: |
   | [temurin](https://app.codecov.io/gh/apache/pinot/pull/11960/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.60% <17.64%> (-33.65%)` | :arrow_down: |
   | [unittests](https://app.codecov.io/gh/apache/pinot/pull/11960/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.60% <17.64%> (-33.65%)` | :arrow_down: |
   | [unittests1](https://app.codecov.io/gh/apache/pinot/pull/11960/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [unittests2](https://app.codecov.io/gh/apache/pinot/pull/11960/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.60% <17.64%> (+0.03%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Files](https://app.codecov.io/gh/apache/pinot/pull/11960?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...t/core/query/selection/SelectionOperatorUtils.java](https://app.codecov.io/gh/apache/pinot/pull/11960?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9zZWxlY3Rpb24vU2VsZWN0aW9uT3BlcmF0b3JVdGlscy5qYXZh) | `0.00% <ø> (-92.77%)` | :arrow_down: |
   | [...local/indexsegment/mutable/MutableSegmentImpl.java](https://app.codecov.io/gh/apache/pinot/pull/11960?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvbXV0YWJsZS9NdXRhYmxlU2VnbWVudEltcGwuamF2YQ==) | `71.84% <100.00%> (-0.20%)` | :arrow_down: |
   | [...indexsegment/immutable/ImmutableSegmentLoader.java](https://app.codecov.io/gh/apache/pinot/pull/11960?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRMb2FkZXIuamF2YQ==) | `81.65% <0.00%> (-2.32%)` | :arrow_down: |
   | [...local/segment/index/loader/IndexLoadingConfig.java](https://app.codecov.io/gh/apache/pinot/pull/11960?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9JbmRleExvYWRpbmdDb25maWcuamF2YQ==) | `78.93% <0.00%> (-0.58%)` | :arrow_down: |
   | [...al/segment/index/nullvalue/NullValueIndexType.java](https://app.codecov.io/gh/apache/pinot/pull/11960?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L251bGx2YWx1ZS9OdWxsVmFsdWVJbmRleFR5cGUuamF2YQ==) | `75.75% <82.35%> (+0.75%)` | :arrow_up: |
   | [...ry/runtime/plan/server/ServerPlanRequestUtils.java](https://app.codecov.io/gh/apache/pinot/pull/11960?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtcXVlcnktcnVudGltZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvcXVlcnkvcnVudGltZS9wbGFuL3NlcnZlci9TZXJ2ZXJQbGFuUmVxdWVzdFV0aWxzLmphdmE=) | `0.00% <0.00%> (-78.95%)` | :arrow_down: |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://app.codecov.io/gh/apache/pinot/pull/11960?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `56.03% <42.85%> (-0.06%)` | :arrow_down: |
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://app.codecov.io/gh/apache/pinot/pull/11960?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `75.65% <54.54%> (-0.45%)` | :arrow_down: |
   | [...main/java/org/apache/pinot/spi/data/FieldSpec.java](https://app.codecov.io/gh/apache/pinot/pull/11960?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9GaWVsZFNwZWMuamF2YQ==) | `0.00% <0.00%> (-84.62%)` | :arrow_down: |
   | [.../java/org/apache/pinot/query/type/TypeFactory.java](https://app.codecov.io/gh/apache/pinot/pull/11960?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtcXVlcnktcGxhbm5lci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvcXVlcnkvdHlwZS9UeXBlRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-68.97%)` | :arrow_down: |
   | ... and [2 more](https://app.codecov.io/gh/apache/pinot/pull/11960?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | |
   
   ... and [1209 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/11960/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in [Chrome](https://chrome.google.com/webstore/detail/codecov/gedikamndpbemklijjkncpnolildpbgo) or [Firefox](https://addons.mozilla.org/en-US/firefox/addon/codecov/) today!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1392923138


##########
pinot-query-runtime/src/test/resources/queries/CountDistinct.json:
##########
@@ -119,11 +119,13 @@
       {
         "comments": "table aren't actually partitioned by val thus all segments can produce duplicate results, thus [[8]]",
         "sql": "SELECT SEGMENT_PARTITIONED_DISTINCT_COUNT(val) FROM {tbl1}",
+        "ignored": true,

Review Comment:
   Not this one, but the other tests ignored fail just because I added more tables. It seem the test framework is not able to specify how to partition data and by adding tests with new tables we modify the default partition distribution.
   
   You can try to remove the `ignored`  tests here and run the tests with and without my new tests and you will see that the results are different



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -300,6 +302,30 @@ public void setTransformFunction(@Nullable String transformFunction) {
     _transformFunction = transformFunction;
   }
 
+  /**
+   * Returns whether the column is nullable or not.
+   */
+  @JsonIgnore
+  public Boolean isNullable() {

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #11960:
URL: https://github.com/apache/pinot/pull/11960#issuecomment-1810689447

   There are still some test failing. I wasn't able to understand why. I'll try to dedicate them more time tomorrow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1389264531


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/plan/server/ServerPlanRequestUtils.java:
##########
@@ -147,6 +147,13 @@ private static ServerPlanRequestContext build(OpChainExecutionContext executionC
     // 2. set pinot query options according to requestMetadataMap
     updateQueryOptions(pinotQuery, executionContext);
 
+    if (QueryOptionsUtils.isNullHandlingEnabled(pinotQuery.getQueryOptions())

Review Comment:
   Removed in ac5c5e51c8ce8d418cc7cf35590e1a4e5a273a55



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1389508443


##########
pinot-query-planner/src/main/java/org/apache/pinot/query/type/TypeFactory.java:
##########
@@ -45,20 +46,36 @@ public TypeFactory(RelDataTypeSystem typeSystem) {
 
   public RelDataType createRelDataTypeFromSchema(Schema schema) {
     Builder builder = new Builder(this);
+    Predicate<FieldSpec> isNullable;
+    if (schema.getOptions().getNullHandling().supportsV2()) {
+      isNullable = schema.getOptions().getNullHandling()::isNullable;
+    } else {
+      isNullable = fieldSpec -> false;
+    }
     for (Map.Entry<String, FieldSpec> e : schema.getFieldSpecMap().entrySet()) {
-      builder.add(e.getKey(), toRelDataType(e.getValue()));
+      builder.add(e.getKey(), toRelDataType(e.getValue(), isNullable));
     }
     return builder.build();
   }
 
-  private RelDataType toRelDataType(FieldSpec fieldSpec) {
+  private RelDataType toRelDataType(FieldSpec fieldSpec, Predicate<FieldSpec> isNullable) {
+    RelDataType type = createSqlType(getSqlTypeName(fieldSpec));
+    boolean isArray = !fieldSpec.isSingleValueField();
+    if (isArray) {
+      type = createArrayType(type, -1);
+    }
+    if (isNullable.test(fieldSpec)) {
+      type = createTypeWithNullability(type, true);
+    }

Review Comment:
   yeah agreed. as long as we document & best effort align with postgers it should be good :-D



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1393803213


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -99,6 +100,7 @@ public abstract class FieldSpec implements Comparable<FieldSpec>, Serializable {
   protected String _name;
   protected DataType _dataType;
   protected boolean _isSingleValueField = true;
+  protected boolean _nullable = true;

Review Comment:
   We can store that attribute. But adding too many negations makes things difficult to understand. It is easier to read
   ```
   if (field.isNullable()) {
   ```
   
   Than
   
   ```
   if (!field.isNotNull()) {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang merged PR #11960:
URL: https://github.com/apache/pinot/pull/11960


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397659281


##########
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/queries/QueryRunnerTestBase.java:
##########
@@ -197,14 +199,17 @@ protected List<Object[]> queryH2(String sql)
     return result;
   }
 
-  protected void compareRowEquals(ResultTable resultTable, List<Object[]> expectedRows) {
-    compareRowEquals(resultTable, expectedRows, false);
+  protected void compareRowEquals(ResultTable resultTable, List<Object[]> expectedRows, String sql) {

Review Comment:
   I've added that because in Intellij query is not added, which makes it difficult to find the query that fail. Also the description, if defined, should be part of the test name



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397703543


##########
pinot-query-planner/src/test/java/org/apache/pinot/query/testutils/MockRoutingManagerFactory.java:
##########
@@ -107,6 +113,16 @@ public TableCache buildTableCache() {
       String schemaName = invocationOnMock.getArgument(0);
       return _schemaMap.get(schemaName);
     });
+    when(mock.getTableConfig(anyString())).thenAnswer(invocationOnMock -> {
+      String tableNameWithType = invocationOnMock.getArgument(0);
+      String tableName = TableNameBuilder.extractRawTableName(tableNameWithType);
+      return new TableConfigBuilder(TableType.OFFLINE)
+          .setTableName(tableName)
+//          .setSchemaName(_schemaMap.get(tableName).getSchemaName())

Review Comment:
   (nit) Remove the commented line



##########
pinot-integration-tests/src/test/resources/test_null_handling.schema:
##########
@@ -1,4 +1,10 @@
 {
+  "options": {
+    "nullHandling": {
+      "mode": "column",
+      "default": true

Review Comment:
   ^^
   Also, how does it work right now? Do we have test that rely on column based null handling to be enabled?



##########
pinot-query-runtime/src/test/resources/queries/CountDistinct.json:
##########
@@ -119,11 +119,13 @@
       {
         "comments": "table aren't actually partitioned by val thus all segments can produce duplicate results, thus [[8]]",
         "sql": "SELECT SEGMENT_PARTITIONED_DISTINCT_COUNT(val) FROM {tbl1}",
+        "ignored": true,

Review Comment:
   Shall we try removing the ignored flag and see if they still fail? If so, we might want to debug them because this can cause regression



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1399494892


##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/nullvalue/NullValueIndexTypeTest.java:
##########
@@ -50,6 +50,7 @@ protected void assertEquals(IndexConfig expected) {
 
     @Test(dataProvider = "provideCases", dataProviderClass = NullValueIndexTypeTest.class)
     public void isEnabledWhenNullable(boolean columnNullHandling, boolean fieldNullable, IndexConfig expected) {
+      _tableConfig.getIndexingConfig().setNullHandlingEnabled(true);

Review Comment:
   this is needed b/c the logic now rely on table config. (backward incompat?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1399489236


##########
pinot-query-runtime/src/test/resources/queries/CountDistinct.json:
##########
@@ -119,11 +119,13 @@
       {
         "comments": "table aren't actually partitioned by val thus all segments can produce duplicate results, thus [[8]]",
         "sql": "SELECT SEGMENT_PARTITIONED_DISTINCT_COUNT(val) FROM {tbl1}",
+        "ignored": true,

Review Comment:
   i tried it and it didnt' work. let's address separately



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on PR #11960:
URL: https://github.com/apache/pinot/pull/11960#issuecomment-1814747284

   > In https://github.com/apache/pinot/actions/runs/6874613568/job/18696611890?pr=11960 you can see the test that failed. I'm going to comment them again in order to have a green light.
   
   the output value was changed in #11234 CC @Jackie-Jiang to take a look. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1392919826


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/nullvalue/NullValueIndexType.java:
##########
@@ -67,7 +70,7 @@ public NullValueVectorCreator createIndexCreator(IndexCreationContext context, I
 
   @Override
   public IndexConfig getDefaultConfig() {
-    return IndexConfig.DISABLED;
+    return IndexConfig.ENABLED;

Review Comment:
   > 1. this means null index type is always enabled, but it could contain no columns?
   
   I don't think so. Columns contain indexes, not the other way arround
   
   > 2. this has nothing to do with the schema definition JSONs option.nullHandling.mode/default value yes?
   
   Nothing to do.
   
   The reason to add this change is that before this change we always read the buffer, independently of the actual default config. Now we don't do that and therefore it is important to keep backward compatibility. Also, there are tons of tests that load segments without providing schema and table, which means that they use the default index config for all index types.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1389108084


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/plan/server/ServerPlanRequestUtils.java:
##########
@@ -147,6 +147,13 @@ private static ServerPlanRequestContext build(OpChainExecutionContext executionC
     // 2. set pinot query options according to requestMetadataMap
     updateQueryOptions(pinotQuery, executionContext);
 
+    if (QueryOptionsUtils.isNullHandlingEnabled(pinotQuery.getQueryOptions())

Review Comment:
   But I'm ok removing this from the PR give it side affects several tests which are not actually testing nullability



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1389507192


##########
pinot-query-planner/src/test/java/org/apache/pinot/query/queries/ResourceBasedQueryPlansTest.java:
##########
@@ -81,7 +83,7 @@ public void testQueryExplainPlansWithExceptions(String testCaseName, String quer
     }
   }
 
-  @DataProvider
+  @DataProvider(parallel = true)

Review Comment:
   oh. i see. so this runs 2 individual processes? (calcite do share some static objects IIUC)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on PR #11960:
URL: https://github.com/apache/pinot/pull/11960#issuecomment-1799695843

   CC @ankitsultana to also take a look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #11960:
URL: https://github.com/apache/pinot/pull/11960#issuecomment-1812351515

   Marked as ready to review. There are 3 test failing, but they are the ones defined in CountDistinct.json that (AFAIK) are incorrectly failing


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1393810276


##########
pinot-query-runtime/src/test/resources/queries/CountDistinct.json:
##########
@@ -119,11 +119,13 @@
       {
         "comments": "table aren't actually partitioned by val thus all segments can produce duplicate results, thus [[8]]",
         "sql": "SELECT SEGMENT_PARTITIONED_DISTINCT_COUNT(val) FROM {tbl1}",
+        "ignored": true,

Review Comment:
   I've merged my branch with master and it didn't fix the issue. In fact now we have a third test here that fails. I'm going to remove the ignores so the issue is more evident.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #11960:
URL: https://github.com/apache/pinot/pull/11960#issuecomment-1814550240

   In https://github.com/apache/pinot/actions/runs/6874613568/job/18696611890?pr=11960 you can see the test that failed. I'm going to comment them again in order to have a green light.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] [draft] Explicit null handling [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1388499295


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -99,6 +100,7 @@ public abstract class FieldSpec implements Comparable<FieldSpec>, Serializable {
   protected String _name;
   protected DataType _dataType;
   protected boolean _isSingleValueField = true;
+  protected Boolean _nullable = null;

Review Comment:
   We have run into problems with default true configs before, so it is best practice to use false as default to prevent bugs because people usually assume the default is false



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397674138


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/Schema.java:
##########
@@ -520,14 +531,60 @@ public SchemaBuilder setSchemaName(String schemaName) {
       return this;
     }
 
+    public SchemaBuilder withEnableColumnBasedNullHandling(boolean enableColumnBasedNullHandling) {
+      _schema.setEnableColumnBasedNullHandling(enableColumnBasedNullHandling);
+      return this;
+    }
+
+    public SchemaBuilder addMetricField(String name, DataType dataType) {
+      return addMetricField(name, dataType, ignore -> {
+      });
+    }
+
+    public SchemaBuilder addMetricField(String name, DataType dataType, Consumer<MetricFieldSpec> customizer) {
+      MetricFieldSpec fieldSpec = new MetricFieldSpec();
+      return addFieldSpec(fieldSpec, name, dataType, customizer);
+    }
+
+    public SchemaBuilder addDimensionField(String name, DataType dataType) {
+      return addDimensionField(name, dataType, ignore -> {
+      });
+    }
+
+    public SchemaBuilder addDimensionField(String name, DataType dataType, Consumer<DimensionFieldSpec> customizer) {
+      DimensionFieldSpec fieldSpec = new DimensionFieldSpec();
+      return addFieldSpec(fieldSpec, name, dataType, customizer);
+    }
+
+    public SchemaBuilder addDateTimeField(String name, DataType dataType, String format, String granularity) {
+      return addDateTimeField(name, dataType, format, granularity, ignore -> {
+      });
+    }
+
+    public SchemaBuilder addDateTimeField(String name, DataType dataType, String format, String granularity,
+        Consumer<DateTimeFieldSpec> customizer) {
+      DateTimeFieldSpec fieldSpec = new DateTimeFieldSpec();
+      fieldSpec.setFormat(format);
+      fieldSpec.setGranularity(granularity);
+      return addFieldSpec(fieldSpec, name, dataType, customizer);
+    }
+
+    private <E extends FieldSpec> SchemaBuilder addFieldSpec(E fieldSpec, String name, DataType dataType,
+        Consumer<E> customizer) {
+      fieldSpec.setName(name);
+      fieldSpec.setDataType(dataType);
+      customizer.accept(fieldSpec);
+      _schema.addField(fieldSpec);
+      return this;
+    }
+
     /**
      * Add single value dimensionFieldSpec
      */
     public SchemaBuilder addSingleValueDimension(String dimensionName, DataType dataType) {
       _schema.addField(new DimensionFieldSpec(dimensionName, dataType, true));
       return this;
     }
-

Review Comment:
   I mean the unintended removal of empty line



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397678713


##########
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/queries/ResourceBasedQueriesTest.java:
##########
@@ -73,6 +73,7 @@ public class ResourceBasedQueriesTest extends QueryRunnerTestBase {
   private static final String FILE_FILTER_PROPERTY = "pinot.fileFilter";
   private static final String IGNORE_FILTER_PROPERTY = "pinot.runIgnored";
   private static final int DEFAULT_NUM_PARTITIONS = 4;
+  private static final boolean REPORT_IGNORES = true;

Review Comment:
   I've rollbacked most of these changes in 3dd08862ed9e5c2355f029276675ffc2179be886



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397653274


##########
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/queries/QueryRunnerTestBase.java:
##########
@@ -197,14 +199,17 @@ protected List<Object[]> queryH2(String sql)
     return result;
   }
 
-  protected void compareRowEquals(ResultTable resultTable, List<Object[]> expectedRows) {
-    compareRowEquals(resultTable, expectedRows, false);
+  protected void compareRowEquals(ResultTable resultTable, List<Object[]> expectedRows, String sql) {

Review Comment:
   not sure this change is necessary. in mvn tests the query will be part of the test name.
   if we want to be more verbose here suggest take a stab into how test suite utilizes the parameterized test input



##########
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/queries/ResourceBasedQueriesTest.java:
##########
@@ -73,6 +73,7 @@ public class ResourceBasedQueriesTest extends QueryRunnerTestBase {
   private static final String FILE_FILTER_PROPERTY = "pinot.fileFilter";
   private static final String IGNORE_FILTER_PROPERTY = "pinot.runIgnored";
   private static final int DEFAULT_NUM_PARTITIONS = 4;
+  private static final boolean REPORT_IGNORES = true;

Review Comment:
   can we separate this one from this PR? 



##########
pinot-query-planner/src/test/java/org/apache/pinot/query/queries/ResourceBasedQueryPlansTest.java:
##########
@@ -81,7 +83,7 @@ public void testQueryExplainPlansWithExceptions(String testCaseName, String quer
     }
   }
 
-  @DataProvider
+  @DataProvider(parallel = true)

Review Comment:
   i wasn't sure this is fully verified. if possible let's keep this separate into another PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397691057


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/nullvalue/NullValueIndexType.java:
##########
@@ -77,11 +80,23 @@ public String getPrettyName() {
 
   @Override
   public ColumnConfigDeserializer<IndexConfig> createDeserializer() {
-    return IndexConfigDeserializer.ifIndexingConfig(
-        IndexConfigDeserializer.alwaysCall((TableConfig tableConfig, Schema schema) ->
-            tableConfig.getIndexingConfig().isNullHandlingEnabled()
-                ? IndexConfig.ENABLED
-                : IndexConfig.DISABLED));
+    return (TableConfig tableConfig, Schema schema) -> {
+      Collection<FieldSpec> allFieldSpecs = schema.getAllFieldSpecs();
+      Map<String, IndexConfig> configMap = Maps.newHashMapWithExpectedSize(allFieldSpecs.size());
+
+      boolean enableColumnBasedNullHandling = schema.isEnableColumnBasedNullHandling();
+
+      for (FieldSpec fieldSpec : allFieldSpecs) {
+        IndexConfig indexConfig;
+        if (!enableColumnBasedNullHandling || fieldSpec.isNullable()) {

Review Comment:
   You are right, although in my understanding nobody actually used this to decide whether the index is created/consumed or not. Even then, I'll change this code to be sure it is still backward compatible



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Explicit null handling [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11960:
URL: https://github.com/apache/pinot/pull/11960#discussion_r1397688624


##########
pinot-integration-tests/src/test/resources/test_null_handling.schema:
##########
@@ -1,4 +1,10 @@
 {
+  "options": {
+    "nullHandling": {
+      "mode": "column",
+      "default": true

Review Comment:
   Does this still apply? Should we set "enableColumnBasedNullHandling" instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org