You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/11/23 08:47:07 UTC

[GitHub] [druid] kfaraz commented on a change in pull request #11973: Handle null values in Range Partition dimension distribution

kfaraz commented on a change in pull request #11973:
URL: https://github.com/apache/druid/pull/11973#discussion_r754900489



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/PartialDimensionDistributionTask.java
##########
@@ -68,8 +68,9 @@
 {
   public static final String TYPE = "partial_dimension_distribution";
 
-  // Future work: StringDistribution does not handle inserting NULLs. This is the same behavior as hadoop indexing.
-  private static final boolean SKIP_NULL = true;
+  // Do not skip nulls as StringDistribution can handle null values.
+  // This behavior is different from hadoop indexing.
+  private static final boolean SKIP_NULL = false;

Review comment:
       I think we should be fine without the flag.
   
   The following effects would be observed on single dim:
   1. Partitioning would now also work on a dimension column that is always null, although it will actually create just one partition.
   2. Estimation of partition boundaries will also take into account null values. So the algorithm would do a better job of estimating the size of the first (it would be closer to the target rows). The sizes of later partitions will not be affected.
   
   With the addition of (multi dimension) range partitioning, single dim is inevitably being affected as it now goes through the multi dim flow itself. So this would only be another part of that overall effect.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org