Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/04/16 00:19:23 UTC

[GitHub] [druid] loquisgon opened a new pull request, #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

loquisgon opened a new pull request, #12443:
URL: https://github.com/apache/druid/pull/12443

   We have seen rare instances in the wild where, during hash partitioning, the determine-cardinality phase produces a single shard with a large segment without regard to the value of `maxRowsPerSegment`.  Attempts to reproduce this have not been successful, so adding some defensive programming and logging would be helpful in case it happens again. This PR deals with the improbable case where the estimate from the `Union` `HLLSketch` used in the code is negative. It also adds some logging to report the values used to arrive at the final shard determination.
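   For readers skimming the thread, the defensive guard described above amounts to roughly the following self-contained sketch. The class and method names are made up for illustration, and the fallback constant of 7 mirrors the value discussed in the review comments; this is not the exact Druid code.

   ```java
   // Illustrative sketch of the defensive guard; the real code lives in
   // ParallelIndexSupervisorTask. Names here are invented for the example.
   public class ShardCountSketch
   {
       // Arbitrary fallback used when the sketch estimate is zero or negative.
       static final long FALLBACK_NUM_SHARDS = 7L;

       static long estimateNumShards(double estimatedCardinality, int maxRowsPerSegment)
       {
           if (estimatedCardinality <= 0) {
               // A non-positive estimate is unusable; fall back to a fixed shard
               // count rather than potentially producing a single huge shard.
               return FALLBACK_NUM_SHARDS;
           }
           // Normal path: shard count proportional to cardinality over the per-segment cap.
           return Math.round(estimatedCardinality / maxRowsPerSegment);
       }

       public static void main(String[] args)
       {
           System.out.println(estimateNumShards(-3.5, 5_000_000));         // fallback path: 7
           System.out.println(estimateNumShards(12_000_000.0, 5_000_000)); // 12M / 5M rounds to 2
       }
   }
   ```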
   
   
   
   This PR has:
   - [X] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [X] added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [X] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r874022794


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;
+            LOG.warn("Estimated cardinality for union of estimates is zero or less: %.2f, setting num shards to %d",
+                     estimatedCardinality, estimatedNumShards
+            );
+          } else {
+            // determine numShards based on maxRowsPerSegment and the cardinality
+            estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          }
+          LOG.debug("estimatedNumShards %d given estimated cardinality %.2f and maxRowsPerSegment %d",

Review Comment:
   It is not easy to know how frequently it would be logged without coming up with some data models and retrieving data to understand the real distributions of hash buckets in a given time chunk for a given set of dimensions. This is one of those things where experience and/or experimentation teaches you better, IMO. So turning it on again and watching it seems like the right thing to do this time.





[GitHub] [druid] loquisgon commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r876430129


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -703,6 +704,10 @@ TaskStatus runHashPartitionMultiPhaseParallel(TaskToolbox toolbox) throws Except
             cardinalityRunner.getReports().values(),
             effectiveMaxRowsPerSegment
         );
+
+        // This is for potential debugging in case we suspect bad estimation of cardinalities etc,
+        LOG.debug("intervalToNumShards: %s", intervalToNumShards.toString());
+

Review Comment:
   I have instrumented the `Task`s to make it easier to emit metrics... we can delay this PR until [this other PR with the metrics instrumentation](https://github.com/apache/druid/pull/12488) merges, and then I can easily add the metric.





[GitHub] [druid] loquisgon commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r869784712


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;

Review Comment:
   My thinking is that three is too small, but it is all arbitrary. This is not really a default; it is just for the *remote*, never actually observed, case that the estimate is negative. So I would like to leave it as is.





[GitHub] [druid] loquisgon commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r869785377


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;
+            LOG.warn("Estimated cardinality for union of estimates is zero or less: %.2f, setting num shards to %d",
+                     estimatedCardinality, estimatedNumShards
+            );
+          } else {
+            // determine numShards based on maxRowsPerSegment and the cardinality
+            estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          }
+          LOG.debug("estimatedNumShards %d given estimated cardinality %.2f and maxRowsPerSegment %d",

Review Comment:
   mmm, the goal of this was to do something about the bad estimate, not better logging?





[GitHub] [druid] zachjsh commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r871603644


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;

Review Comment:
   hmm, yeah, failing the ingestion completely may be extreme. I think the best option is to allow the user to set a fallback num-shards value if it can't be computed / estimated properly. Maybe a negative value for the config could indicate that the ingestion should fail?
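   A rough sketch of that suggestion, assuming a hypothetical tuning parameter: the name `fallbackNumShards` and the negative-means-fail convention below are illustrative only, not an actual Druid config.

   ```java
   // Hypothetical sketch of a user-configurable fallback. None of these
   // names exist in Druid; they only illustrate the suggestion above.
   public class ConfigurableFallbackSketch
   {
       static long resolveNumShards(double estimatedCardinality, int maxRowsPerSegment, int fallbackNumShards)
       {
           if (estimatedCardinality > 0) {
               // Normal path, unchanged from the PR.
               return Math.round(estimatedCardinality / maxRowsPerSegment);
           }
           if (fallbackNumShards < 0) {
               // A negative config value signals that the user prefers failing
               // the ingestion over running with a guessed shard count.
               throw new IllegalStateException(
                   "Cannot estimate cardinality (" + estimatedCardinality + ") and no fallback configured"
               );
           }
           return fallbackNumShards;
       }
   }
   ```

   This would let a job that lands in the bad-estimate path be rerun with a different fallback value, as discussed in the thread.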





[GitHub] [druid] zachjsh commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r871602211


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;
+            LOG.warn("Estimated cardinality for union of estimates is zero or less: %.2f, setting num shards to %d",
+                     estimatedCardinality, estimatedNumShards
+            );
+          } else {
+            // determine numShards based on maxRowsPerSegment and the cardinality
+            estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          }
+          LOG.debug("estimatedNumShards %d given estimated cardinality %.2f and maxRowsPerSegment %d",

Review Comment:
   If the logging was deemed too excessive before and explicitly removed, then maybe that's a good reason to keep it at debug. I'm not sure on the details, though. It is still not clear to me how often this log would be written.





[GitHub] [druid] loquisgon commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r870750706


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;
+            LOG.warn("Estimated cardinality for union of estimates is zero or less: %.2f, setting num shards to %d",
+                     estimatedCardinality, estimatedNumShards
+            );
+          } else {
+            // determine numShards based on maxRowsPerSegment and the cardinality
+            estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          }
+          LOG.debug("estimatedNumShards %d given estimated cardinality %.2f and maxRowsPerSegment %d",

Review Comment:
   I made it `info` based on code review... I agree it is useful to have.





[GitHub] [druid] loquisgon commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r870636575


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;

Review Comment:
   In order to enforce collect-and-fix, we could also throw an ISE here so the context is repeatable... how does this sound instead of the guesstimate of seven shards? Rather than guesstimating, just throw an ISE and halt. This may be too harsh, so I think the warning is better, but let's stop there and not try to be more clever. Let's think of this as a fishing expedition for data, to see if this was the original problem. There is no evidence, and those of us who have tried have not been able to reproduce the scenario.





[GitHub] [druid] loquisgon commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r870633973


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;

Review Comment:
   If we fall into this case again, we should collect evidence and fix it for good.





[GitHub] [druid] loquisgon commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r869784953


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -703,6 +704,10 @@ TaskStatus runHashPartitionMultiPhaseParallel(TaskToolbox toolbox) throws Except
             cardinalityRunner.getReports().values(),
             effectiveMaxRowsPerSegment
         );
+
+        // This is for potential debugging in case we suspect bad estimation of cardinalities etc,
+        LOG.debug("intervalToNumShards: %s", intervalToNumShards.toString());
+

Review Comment:
   A similar log message was there before and it was explicitly removed by a previous change.





[GitHub] [druid] zachjsh commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r869920252


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;

Review Comment:
   But in the case that we get into this situation, should the fallback here be configurable, so that if 7 results in a bad estimate / default value, the job can be rerun with a different value and potentially produce better results?





[GitHub] [druid] loquisgon commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r876428969


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;

Review Comment:
   Put the magic number behind a label.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org




[GitHub] [druid] loquisgon commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r869784712


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;

Review Comment:
   My thinking is that three is too small, but it is all arbitrary. This is not really a default; it is just for the *remote* (never actually observed) case that the estimate is negative. So I would like to leave it as is.





[GitHub] [druid] loquisgon commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r869785377


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;
+            LOG.warn("Estimated cardinality for union of estimates is zero or less: %.2f, setting num shards to %d",
+                     estimatedCardinality, estimatedNumShards
+            );
+          } else {
+            // determine numShards based on maxRowsPerSegment and the cardinality
+            estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          }
+          LOG.debug("estimatedNumShards %d given estimated cardinality %.2f and maxRowsPerSegment %d",

Review Comment:
   mmm, the goal of this was to do something about the bad estimate, not to improve logging. Let me think...





[GitHub] [druid] zachjsh commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r869920852


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -703,6 +704,10 @@ TaskStatus runHashPartitionMultiPhaseParallel(TaskToolbox toolbox) throws Except
             cardinalityRunner.getReports().values(),
             effectiveMaxRowsPerSegment
         );
+
+        // This is for potential debugging in case we suspect bad estimation of cardinalities etc,
+        LOG.debug("intervalToNumShards: %s", intervalToNumShards.toString());
+

Review Comment:
   a metric here could be good, as @somu-imply suggested





[GitHub] [druid] zachjsh commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r869623768


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;
+            LOG.warn("Estimated cardinality for union of estimates is zero or less: %.2f, setting num shards to %d",
+                     estimatedCardinality, estimatedNumShards
+            );
+          } else {
+            // determine numShards based on maxRowsPerSegment and the cardinality
+            estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          }
+          LOG.debug("estimatedNumShards %d given estimated cardinality %.2f and maxRowsPerSegment %d",

Review Comment:
   This seems like a useful log; if it is not logged too often, maybe we can make it info level?





[GitHub] [druid] loquisgon commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r870639690


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;
+            LOG.warn("Estimated cardinality for union of estimates is zero or less: %.2f, setting num shards to %d",
+                     estimatedCardinality, estimatedNumShards
+            );
+          } else {
+            // determine numShards based on maxRowsPerSegment and the cardinality
+            estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          }
+          LOG.debug("estimatedNumShards %d given estimated cardinality %.2f and maxRowsPerSegment %d",

Review Comment:
   How many depends on how many shards were created per interval. This code will union all the independent sketches created by the parallel sub-tasks in order to produce the final estimate. I am being cautious again because such logging was previously considered excessive.





[GitHub] [druid] zachjsh commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r872746879


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;

Review Comment:
   I realize that complicates things, so it's your choice whether to add it. It would be good to allow some control here, but it's not a blocker.





[GitHub] [druid] loquisgon merged pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
loquisgon merged PR #12443:
URL: https://github.com/apache/druid/pull/12443




[GitHub] [druid] somu-imply commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
somu-imply commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r869455415


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;

Review Comment:
   instead of setting this 7 here, we should move it to a final static variable up top. Something like `DEFAULT_NUM_SHARDS`.



##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;

Review Comment:
   Just a question: why did we go with 7 and not, say, 3? Any rationale behind it?



##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -703,6 +704,10 @@ TaskStatus runHashPartitionMultiPhaseParallel(TaskToolbox toolbox) throws Except
             cardinalityRunner.getReports().values(),
             effectiveMaxRowsPerSegment
         );
+
+        // This is for potential debugging in case we suspect bad estimation of cardinalities etc,
+        LOG.debug("intervalToNumShards: %s", intervalToNumShards.toString());
+

Review Comment:
   Apart from putting it in the log, do we need a metric around it too?





[GitHub] [druid] zachjsh commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r869623312


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;

Review Comment:
   Could also make this configurable with a default of 7.





[GitHub] [druid] zachjsh commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r869621970


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -703,6 +704,10 @@ TaskStatus runHashPartitionMultiPhaseParallel(TaskToolbox toolbox) throws Except
             cardinalityRunner.getReports().values(),
             effectiveMaxRowsPerSegment
         );
+
+        // This is for potential debugging in case we suspect bad estimation of cardinalities etc,
+        LOG.debug("intervalToNumShards: %s", intervalToNumShards.toString());
+

Review Comment:
   How often would this log message be hit? We rarely turn on debug logging until after an issue is seen. If the logging here won't be too much, could we move it to info level?





[GitHub] [druid] zachjsh commented on a diff in pull request #12443: Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r871603644


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;

Review Comment:
   hmm, yeah, failing the ingestion completely may be extreme. I think it is best to allow the user to set a fallback value for when the estimate can't be determined; maybe a negative value for the config could indicate that ingestion should fail?
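   A rough sketch of that idea (the config name and its negative-means-fail semantics are hypothetical here, not an actual Druid tuning parameter): an unset value keeps the current fallback of 7, a non-negative value overrides it, and a negative value asks for a hard failure instead of a guess.

```java
// Hypothetical sketch; "fallbackNumShards" semantics are invented, not a real
// Druid tuning parameter.
public class FallbackConfigSketch {
    // Matches the PR's hard-coded fallback when the config is unset.
    static final long DEFAULT_FALLBACK_NUM_SHARDS = 7L;

    static long resolveFallbackNumShards(Long configuredFallback) {
        if (configuredFallback == null) {
            return DEFAULT_FALLBACK_NUM_SHARDS;  // unset: keep current behavior
        }
        if (configuredFallback < 0) {
            // A negative value signals the user prefers failing the ingestion
            // over partitioning with a guessed shard count.
            throw new IllegalStateException(
                "Cardinality estimate was invalid and no fallback is allowed");
        }
        return configuredFallback;
    }
}
```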


