Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/11/23 06:28:32 UTC

[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #11973: Handle null values in Range Partition dimension distribution

abhishekagarwal87 commented on a change in pull request #11973:
URL: https://github.com/apache/druid/pull/11973#discussion_r754830086



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/PartialDimensionDistributionTask.java
##########
@@ -68,8 +68,9 @@
 {
   public static final String TYPE = "partial_dimension_distribution";
 
-  // Future work: StringDistribution does not handle inserting NULLs. This is the same behavior as hadoop indexing.
-  private static final boolean SKIP_NULL = true;
+  // Do not skip nulls as StringDistribution can handle null values.
+  // This behavior is different from hadoop indexing.
+  private static final boolean SKIP_NULL = false;

Review comment:
       Can this be selectively turned on only when more than one dimension is being used? I don't know for certain what the impact of not skipping nulls will be, but that way the impact would be limited to the new range partitioning only. Alternatively, it could be based on a flag passed via the task context. Thoughts?
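A minimal sketch of the two options raised in the comment, written in Java to match the file under review. It assumes the task can see its list of partition dimensions and its context map. The class name, method name, and context key are hypothetical and do not exist in Druid. The sketch combines both options for illustration: an explicit context flag wins, and otherwise nulls are skipped only for single-dimension partitioning.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: decide per task whether nulls are skipped when
// collecting the dimension distribution, instead of using one static
// SKIP_NULL constant. None of these names are part of Druid's API.
class SkipNullDecisionSketch
{
  // Hypothetical task-context key, used only for illustration.
  static final String CONTEXT_KEY_KEEP_NULLS = "hypotheticalKeepNullsInDistribution";

  // Returns true if null values should be skipped.
  static boolean shouldSkipNulls(List<String> partitionDimensions, Map<String, Object> context)
  {
    Object flag = context.get(CONTEXT_KEY_KEEP_NULLS);
    if (flag != null) {
      // Option 2: an explicit context flag overrides the default.
      return !Boolean.parseBoolean(flag.toString());
    }
    // Option 1: keep the old (skip nulls) behavior for a single dimension,
    // and stop skipping only when more than one dimension is used.
    return partitionDimensions.size() <= 1;
  }

  public static void main(String[] args)
  {
    System.out.println(shouldSkipNulls(List.of("country"), Map.of()));          // true
    System.out.println(shouldSkipNulls(List.of("country", "city"), Map.of()));  // false
    System.out.println(shouldSkipNulls(List.of("country"),
        Map.of(CONTEXT_KEY_KEEP_NULLS, true)));                                 // false
  }
}
```

With this shape, single-dimension range partitioning keeps the Hadoop-compatible behavior by default, so any impact of indexing nulls stays confined to the multi-dimension path unless a task opts in explicitly.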




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org