Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/23 09:54:26 UTC

[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10419: Automatically determine numShards for parallel ingestion hash partitioning

abhishekagarwal87 commented on a change in pull request #10419:
URL: https://github.com/apache/druid/pull/10419#discussion_r493391588



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java
##########
@@ -582,6 +652,50 @@ private TaskStatus runRangePartitionMultiPhaseParallel(TaskToolbox toolbox) thro
     return TaskStatus.fromCode(getId(), mergeState);
   }
 
+  @VisibleForTesting
+  public static int determineNumShardsFromCardinalityReport(
+      Collection<DimensionCardinalityReport> reports,
+      int maxRowsPerSegment
+  )
+  {
+    // aggregate all the sub-reports
+    Map<Interval, Union> finalCollectors = new HashMap<>();
+    reports.forEach(report -> {
+      Map<Interval, byte[]> intervalToCardinality = report.getIntervalToCardinalities();
+      for (Map.Entry<Interval, byte[]> entry : intervalToCardinality.entrySet()) {
+        Union union = finalCollectors.computeIfAbsent(
+            entry.getKey(),
+            (key) -> {
+              return new Union(DimensionCardinalityReport.HLL_SKETCH_LOG_K);
+            }
+        );
+        HllSketch entryHll = HllSketch.wrap(Memory.wrap(entry.getValue()));
+        union.update(entryHll);
+      }
+    });
+
+    // determine the highest cardinality in any interval
+    long maxCardinality = Long.MIN_VALUE;

Review comment:
       How about using `0` instead here?
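The following is a minimal stand-alone sketch of the reduction the snippet sets up. It takes a maximum over the merged per-interval cardinalities and then divides by `maxRowsPerSegment`. It assumes the HLL unions have already been turned into plain `long` estimates. It also uses `String` keys as a stand-in for Joda `Interval`. The `maxCardinality`/`numShards` helpers and the ceiling-division formula with a floor of one shard are hypothetical, because the diff above is truncated before the shard computation. With `0` as the starting value, an empty report set yields a valid cardinality of 0 instead of leaking `Long.MIN_VALUE` into the division.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class NumShardsSketch
{
  // Hypothetical helper: largest per-interval cardinality estimate.
  // Starting from 0 (not Long.MIN_VALUE) means an empty map yields 0,
  // which is a meaningful cardinality rather than a negative sentinel.
  static long maxCardinality(Map<String, Long> intervalToCardinality)
  {
    long max = 0;
    for (long cardinality : intervalToCardinality.values()) {
      max = Math.max(max, cardinality);
    }
    return max;
  }

  // Hypothetical shard formula (assumption, not shown in the diff):
  // ceil(maxCardinality / maxRowsPerSegment), but never fewer than 1 shard.
  static int numShards(long maxCardinality, int maxRowsPerSegment)
  {
    long shards = (maxCardinality + maxRowsPerSegment - 1) / maxRowsPerSegment;
    return (int) Math.max(1L, shards);
  }

  public static void main(String[] args)
  {
    // String keys stand in for org.joda.time.Interval in this sketch.
    Map<String, Long> estimates = new HashMap<>();
    estimates.put("2020-01-01/2020-01-02", 12_500L);
    estimates.put("2020-01-02/2020-01-03", 7_300L);
    System.out.println(numShards(maxCardinality(estimates), 5_000)); // 3
    System.out.println(numShards(maxCardinality(Collections.emptyMap()), 5_000)); // 1
  }
}
```

With `Long.MIN_VALUE` as the seed, the empty-input case would pass a large negative number into the division. A `0` seed keeps the result well-defined without a special case.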




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


