You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/08/21 19:20:37 UTC

[GitHub] [druid] jihoonson opened a new pull request #10307: Add support for all partitioing schemes for auto compaction

jihoonson opened a new pull request #10307:
URL: https://github.com/apache/druid/pull/10307


   ### Description
   
   This PR allows using all supported partitioning schemes in auto compaction. In addition to that, all tuningConfigs in `ParallelIndexTuningConfig` are now available for auto compaction except for `forceGuaranteedRollup` and configurations related to parse exceptions. `forceGuaranteedRollup` can be computed based on the type of `partitionsSpec` and the configurations related to parse exceptions are not supported by parallel index task yet. These changes can be found in `ClientCompactionTaskQueryTuningConfig`.
   
   `NewestSegmentFirstIterator` is now modified to allow all partitioning schemes. Before, it always triggered compactions if it finds a segment compacted with non-dynamic partitioning.
   
   `ParallelIndexSupervisorTask` is also now able to annotate the `lastCompactionState` properly with non-dynamic partitioning. Before, the `lastCompactionState` wasn't stored properly when it runs with multiple subtasks for hash/range partitioning.
   
   The web console change is not fixed in this PR.
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `ClientCompactionTaskQueryTuningConfig`
    * `NewestSegmentFirstIterator`
    * `ParallelIndexSupervisorTask`
   * `TransactionalSegmentPublisher`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #10307: Add support for all partitioing schemes for auto compaction

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #10307:
URL: https://github.com/apache/druid/pull/10307#discussion_r475882549



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java
##########
@@ -764,7 +765,28 @@ private PartitionBoundaries determineRangePartition(Collection<StringDistributio
     return Pair.of(start, stop);
   }
 
-  private static void publishSegments(TaskToolbox toolbox, Map<String, PushedSegmentsReport> reportsMap)
+  public static Function<Set<DataSegment>, Set<DataSegment>> compactionStateAnnotateFunction(

Review comment:
       Good idea! Moved this method.

##########
File path: server/src/main/java/org/apache/druid/client/indexing/ClientCompactionTaskQueryTuningConfig.java
##########
@@ -158,27 +222,90 @@ public IndexSpec getIndexSpec()
     return indexSpec;
   }
 
+  @JsonProperty
+  @Nullable
+  public IndexSpec getIndexSpecForIntermediatePersists()
+  {
+    return indexSpecForIntermediatePersists;
+  }
+
   @JsonProperty
   @Nullable
   public Integer getMaxPendingPersists()
   {
     return maxPendingPersists;
   }
 
+  @JsonProperty

Review comment:
       Yes, this should be passed properly when this class is deserialized as `ParallelIndexTuningConfig`.

##########
File path: server/src/main/java/org/apache/druid/server/coordinator/duty/NewestSegmentFirstIterator.java
##########
@@ -247,17 +248,30 @@ public boolean hasNext()
     }
   }
 
-  private boolean needsCompaction(DataSourceCompactionConfig config, SegmentsToCompact candidates)
+  @VisibleForTesting
+  static PartitionsSpec findPartitinosSpecFromConfig(ClientCompactionTaskQueryTuningConfig tuningConfig)
+  {

Review comment:
       The auto compaction supported only dynamic partitioning before this PR. The deprecated `maxRowsPerSegment` and `maxTotalRows` in `DataSourceCompactionConfig` are for dynamic partitioning. Users are recommended to use `partitionsSpec` instead of them. 
   
   > Misread the code. if the maxTotalRows is null in partition spec, should it be read from tuningConfig instead of using the Long.Max?
   
   My intention is using `partitionsSpec` if it's specified and ignoring deprecated ones. I will add docs for this behaviour as well as the newly supported tuning configurations.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on pull request #10307: Add support for all partitioing schemes for auto compaction

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #10307:
URL: https://github.com/apache/druid/pull/10307#issuecomment-679356629


   ```
   org/apache/druid/indexing/common/task/batch/parallel/PartialSegmentMergeTask.java
   ------------------------------------------------------------------------------
   286                               new DataSegment(
   287                                   getDataSource(),
   288                                   interval,
   289 L                                 Preconditions.checkNotNull(
   290 L                                     ParallelIndexSupervisorTask.findVersion(intervalToVersion, interval),
   291                                       "version for interval[%s]",
   292                                       interval
   293 F                                 ),
   303 F | L | B(0/4)            exception -> !(exception instanceof NullPointerException) && exception instanceof Exception,
   ```
   
   The coverage bot complains about this change, but I'm not sure how to test this without a big refactoring around retry.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10307: Add support for all partitioing schemes for auto compaction

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10307:
URL: https://github.com/apache/druid/pull/10307#discussion_r475111599



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java
##########
@@ -764,7 +765,28 @@ private PartitionBoundaries determineRangePartition(Collection<StringDistributio
     return Pair.of(start, stop);
   }
 
-  private static void publishSegments(TaskToolbox toolbox, Map<String, PushedSegmentsReport> reportsMap)
+  public static Function<Set<DataSegment>, Set<DataSegment>> compactionStateAnnotateFunction(

Review comment:
       should this rather stay inside `AbstractBatchIndexTask` so that concrete Task classes remain unaware of each other? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 edited a comment on pull request #10307: Add support for all partitioing schemes for auto compaction

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 edited a comment on pull request #10307:
URL: https://github.com/apache/druid/pull/10307#issuecomment-678677120


   could this happen when the cluster is upgraded to newer version? 
    - The lastCompactionState for non-dynamic partitioned segments will still be null since they were created with older version of ParallelSupervisorIndexTask but would still be compacted by the new auto compaction logic since last compaction state is null. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on pull request #10307: Add support for all partitioing schemes for auto compaction

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #10307:
URL: https://github.com/apache/druid/pull/10307#issuecomment-679355934


   @abhishekagarwal87 thank you for the review.
   
   > could this happen when the cluster is upgraded to newer version?
   > 
   >     * The lastCompactionState for non-dynamic partitioned segments will still be null since they were created with older version of ParallelSupervisorIndexTask but would still be compacted by the new auto compaction logic since last compaction state is null.
   
   Yes, it can happen. Since auto compaction didn't support non-dynamic partitioning before, those segments should be created by a manual compaction if they are compacted. Those segments cannot be recognized by auto compaction automatically.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10307: Add support for all partitioing schemes for auto compaction

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10307:
URL: https://github.com/apache/druid/pull/10307#discussion_r475115873



##########
File path: server/src/main/java/org/apache/druid/server/coordinator/duty/NewestSegmentFirstIterator.java
##########
@@ -247,17 +248,30 @@ public boolean hasNext()
     }
   }
 
-  private boolean needsCompaction(DataSourceCompactionConfig config, SegmentsToCompact candidates)
+  @VisibleForTesting
+  static PartitionsSpec findPartitinosSpecFromConfig(ClientCompactionTaskQueryTuningConfig tuningConfig)
+  {

Review comment:
       so if the partition spec is defined in compaction config and is dynamic partition spec, `MaxRowsPerSegment` is taken from partition spec. if its not dynamic partition and single dimension for example, then `MaxRowsPerSegment` is taken from tuning config? 
   and it the same for `MaxTotalRows`? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on pull request #10307: Add support for all partitioing schemes for auto compaction

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #10307:
URL: https://github.com/apache/druid/pull/10307#issuecomment-681102067


   @maytasm thanks! I will create a follow-up PR soon.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on pull request #10307: Add support for all partitioing schemes for auto compaction

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #10307:
URL: https://github.com/apache/druid/pull/10307#issuecomment-680311655


   @maytasm thanks for the review. 
   
   > LGTM - Would be great if you can add Integration Tests. There are already existing auto compaction integration tests so it should not be too hard to add for this PR.
   
   That's a good idea. Do you mind if I add them in a follow-up PR with doc changes?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10307: Add support for all partitioing schemes for auto compaction

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10307:
URL: https://github.com/apache/druid/pull/10307#discussion_r475117201



##########
File path: server/src/main/java/org/apache/druid/server/coordinator/duty/NewestSegmentFirstIterator.java
##########
@@ -247,17 +248,30 @@ public boolean hasNext()
     }
   }
 
-  private boolean needsCompaction(DataSourceCompactionConfig config, SegmentsToCompact candidates)
+  @VisibleForTesting
+  static PartitionsSpec findPartitinosSpecFromConfig(ClientCompactionTaskQueryTuningConfig tuningConfig)
+  {

Review comment:
       Misread the code. if the maxTotalRows is null in partition spec, should it be read from tuningConfig instead of using the Long.Max?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on pull request #10307: Add support for all partitioing schemes for auto compaction

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on pull request #10307:
URL: https://github.com/apache/druid/pull/10307#issuecomment-678677120


   could this happen if the cluster is upgraded to newer version? 
    - The lastCompactionState for non-dynamic partitioned segments will still be null since they were created with older version of ParallelSupervisorIndexTask but would still be compacted by the new auto compaction logic since last compaction state is null. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10307: Add support for all partitioing schemes for auto compaction

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10307:
URL: https://github.com/apache/druid/pull/10307#discussion_r475112920



##########
File path: server/src/main/java/org/apache/druid/client/indexing/ClientCompactionTaskQueryTuningConfig.java
##########
@@ -158,27 +222,90 @@ public IndexSpec getIndexSpec()
     return indexSpec;
   }
 
+  @JsonProperty
+  @Nullable
+  public IndexSpec getIndexSpecForIntermediatePersists()
+  {
+    return indexSpecForIntermediatePersists;
+  }
+
   @JsonProperty
   @Nullable
   public Integer getMaxPendingPersists()
   {
     return maxPendingPersists;
   }
 
+  @JsonProperty

Review comment:
       should this be marked JsonProperty? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson merged pull request #10307: Add support for all partitioing schemes for auto compaction

Posted by GitBox <gi...@apache.org>.
jihoonson merged pull request #10307:
URL: https://github.com/apache/druid/pull/10307


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on pull request #10307: Add support for all partitioing schemes for auto compaction

Posted by GitBox <gi...@apache.org>.
maytasm commented on pull request #10307:
URL: https://github.com/apache/druid/pull/10307#issuecomment-680413632


   > @maytasm thanks for the review.
   > 
   > > LGTM - Would be great if you can add Integration Tests. There are already existing auto compaction integration tests so it should not be too hard to add for this PR.
   > 
   > That's a good idea. Do you mind if I add them in a follow-up PR with doc changes?
   
   I think it's fine to add in a follow up PR


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org