You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/06/29 20:08:21 UTC
[GitHub] [druid] gianm commented on a change in pull request #10089: Fix native batch range partition segment sizing

gianm commented on a change in pull request #10089:
URL: https://github.com/apache/druid/pull/10089#discussion_r447220850



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/PartialDimensionDistributionTask.java
##########
@@ -223,17 +224,19 @@ public TaskStatus runTask(TaskToolbox toolbox) throws Exception
 
     try (
         CloseableIterator<InputRow> inputRowIterator = inputSourceReader.read();
-        HandlingInputRowIterator iterator = new RangePartitionIndexTaskInputRowIteratorBuilder(partitionDimension, SKIP_NULL)
-            .delegate(inputRowIterator)
-            .granularitySpec(granularitySpec)
-            .nullRowRunnable(IndexTaskInputRowIteratorBuilder.NOOP_RUNNABLE)
-            .absentBucketIntervalConsumer(IndexTaskInputRowIteratorBuilder.NOOP_CONSUMER)
-            .build()
+        HandlingInputRowIterator iterator =
+            new RangePartitionIndexTaskInputRowIteratorBuilder(partitionDimension, SKIP_NULL)
+                .delegate(inputRowIterator)
+                .granularitySpec(granularitySpec)
+                .nullRowRunnable(IndexTaskInputRowIteratorBuilder.NOOP_RUNNABLE)
+                .absentBucketIntervalConsumer(IndexTaskInputRowIteratorBuilder.NOOP_CONSUMER)
+                .build()
     ) {
       Map<Interval, StringDistribution> distribution = determineDistribution(
           iterator,
           granularitySpec,
           partitionDimension,
+          dataSchema.getDimensionsSpec().getDimensionNames(),

Review comment:
       I haven't fully looked into the logic of `determineDistribution`, but a warning: this isn't the right way to get the list of dimension names that will end up in the finished product, since it won't work properly in [schemaless dimensions](https://druid.apache.org/docs/latest/ingestion/schema-design.html#schema-less-dimensions) mode. So `determineDistribution` may need to accept the DimensionsSpec itself and do the same logic that the index generator code itself would do.
   
   Watch out for the queryGranularity as well, which is part of the rollup key. It looks like `determineDistribution` isn't currently inspecting it (it comes from `granularitySpec.getQueryGranularity()`).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org