You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/08/06 05:52:28 UTC
[GitHub] [druid] gianm commented on a change in pull request #10219: add isNullable to ColumnCapabilities, ColumnAnalysis

gianm commented on a change in pull request #10219:
URL: https://github.com/apache/druid/pull/10219#discussion_r466140881



##########
File path: processing/src/main/java/org/apache/druid/query/metadata/SegmentAnalyzer.java
##########
@@ -100,6 +100,11 @@ public long numRows(Segment segment)
 
     Map<String, ColumnAnalysis> columns = new TreeMap<>();
 
+    Function<String, ColumnCapabilities> adapterCapabilitesFn =

Review comment:
       I realize you just copied this code from somewhere else, but is there a way we can do this without `instanceof`? Maybe a new method on StorageAdapter with some nice javadocs? This code is pretty brittle otherwise.

##########
File path: processing/src/main/java/org/apache/druid/segment/serde/DictionaryEncodedColumnPartSerde.java
##########
@@ -329,6 +331,7 @@ public void read(ByteBuffer buffer, ColumnBuilder builder, ColumnConfig columnCo
         );
         builder
             .setHasMultipleValues(hasMultipleValues)
+            .setNullable(firstDictionaryEntry == null)

Review comment:
       What if it's a multi-value column with no explicit nulls, but some empty rows? Two questions in that scenario:
   
   1. Should a column like that report "yes" or "no" for having nulls? I'm guessing "yes" makes sense, because if you filter on `col is null` then that should match empty rows. So it stands to reason that it has nulls in some sense, even though you won't see them if you walk through the column.
   2. What will firstDictionaryEntry be?

##########
File path: processing/src/main/java/org/apache/druid/segment/column/ColumnCapabilitiesImpl.java
##########
@@ -72,22 +74,35 @@ public static ColumnCapabilitiesImpl snapshot(@Nullable final ColumnCapabilities
     copy.hasMultipleValues = copy.hasMultipleValues.coerceUnknownToBoolean(unknownIsTrue);
     copy.dictionaryValuesSorted = copy.dictionaryValuesSorted.coerceUnknownToBoolean(unknownIsTrue);
     copy.dictionaryValuesUnique = copy.dictionaryValuesUnique.coerceUnknownToBoolean(unknownIsTrue);
+    copy.nullable = copy.nullable.coerceUnknownToBoolean(unknownIsTrue);

Review comment:
       With this addition, I'm wondering if the `snapshot` method still makes sense. What reason is there that callers generally want to set all these unknown things to the same boolean? They don't seem incredibly related. Maybe we should have callers explicitly resolve all the unknowns in particular directions.

##########
File path: processing/src/main/java/org/apache/druid/segment/serde/DoubleNumericColumnPartSerdeV2.java
##########
@@ -152,13 +152,17 @@ public Deserializer getDeserializer()
 
       buffer.position(initialPos + offset);
       final ImmutableBitmap bitmap;
+      final boolean isNullable;
       if (buffer.hasRemaining()) {
         bitmap = bitmapSerdeFactory.getObjectStrategy().fromByteBufferWithSize(buffer);
+        isNullable = bitmap.size() > 0;

Review comment:
       It would be better to use `!bitmap.isEmpty()`, because `bitmap.size()` isn't always cached and could potentially require walking the bitmap. Similar comment for the other serde types.

##########
File path: processing/src/main/java/org/apache/druid/query/metadata/SegmentAnalyzer.java
##########
@@ -192,6 +193,7 @@ private ColumnAnalysis analyzeNumericColumn(
     return new ColumnAnalysis(
         capabilities.getType().name(),
         capabilities.hasMultipleValues().isTrue(),
+        capabilities.isNullable().isTrue(),

Review comment:
       When would it be unknown if a column has nulls?
   
   Will bad stuff happen if we return `false` when it actually does have nulls, but for some reason it's marked as unknown?

##########
File path: processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
##########
@@ -685,6 +685,9 @@ IncrementalIndexRowResult toIncrementalIndexRow(InputRow row)
         }
         dimsKeySize += indexer.estimateEncodedKeyComponentSize(dimsKey);
         // Set column capabilities as data is coming in
+        if (dimsKey == null) {
+          capabilities.setIsNullable(NullHandling.sqlCompatible());

Review comment:
       If it's string won't it still count as a null, even in replace-with-default mode?
   
   Btw, consider adding a `NullHandling.isNullable(ValueType)` method if seems useful.

##########
File path: processing/src/main/java/org/apache/druid/segment/column/ColumnCapabilities.java
##########
@@ -38,6 +38,7 @@
   boolean hasSpatialIndexes();
   Capable hasMultipleValues();
   boolean isFilterable();
+  Capable isNullable();

Review comment:
       Broad comment: "hasNulls" would be a better name for this property, because it's meant to mean whether or not the column really has nulls in it right now.
   
   "isNullable" will make people think of the SQL sense of the term, which is a different concept: it means a column could have nulls potentially, even if it doesn't right now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org