Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/04/20 16:50:07 UTC

[GitHub] [druid] AmatyaAvadhanula opened a new pull request, #12465: Avoid unnecessary cache building for cachingCost

AmatyaAvadhanula opened a new pull request, #12465:
URL: https://github.com/apache/druid/pull/12465

   ### Description
   
   `CachingCostBalancerStrategy` can be inefficient when there are a large number of segments in the load / drop queue.
   
   It builds a cache, which takes O(N^2) time, and rebuilds it N times in the process of loading N segments.
   
   This can be avoided by simply computing and adding the pairwise costs directly, which takes O(N) time per segment and is done N times.
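
A minimal sketch of the pairwise approach described above, not the code in this PR. `PairwiseCostSketch`, `Segment`, `jointCost`, and `costAgainstQueue` are hypothetical names, and the interval-overlap cost is only a placeholder for Druid's real cost function; the point is that the cost of a proposal against N queued segments is a single O(N) pass with no cache to build.

```java
import java.util.List;

// Hypothetical sketch: names and the cost function are illustrative, not Druid's actual code.
final class PairwiseCostSketch
{
  /** Minimal stand-in for a segment: a half-open time interval [startMillis, endMillis). */
  static final class Segment
  {
    final long startMillis;
    final long endMillis;

    Segment(long startMillis, long endMillis)
    {
      this.startMillis = startMillis;
      this.endMillis = endMillis;
    }
  }

  /**
   * Placeholder joint cost: the overlap of the two intervals in milliseconds.
   * Druid's real cost function differs; only its pairwise shape matters here.
   */
  static double jointCost(Segment a, Segment b)
  {
    long overlap = Math.min(a.endMillis, b.endMillis) - Math.max(a.startMillis, b.startMillis);
    return Math.max(0L, overlap);
  }

  /** One pairwise cost per queued segment: O(N) for a queue of N segments, no cache built. */
  static double costAgainstQueue(Segment proposal, List<Segment> loadingSegments)
  {
    double cost = 0.0;
    for (Segment queued : loadingSegments) {
      cost += jointCost(proposal, queued);
    }
    return cost;
  }

  public static void main(String[] args)
  {
    List<Segment> queue = List.of(new Segment(0, 100), new Segment(50, 150), new Segment(200, 300));
    System.out.println(costAgainstQueue(new Segment(40, 120), queue));
  }
}
```

Calling `costAgainstQueue` once per segment being loaded gives N passes of O(N) each, which matches the "O(N) computed N times" estimate above.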
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `CachingCostBalancerStrategy`
   
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a diff in pull request #12465: Avoid unnecessary cache building for cachingCost

Posted by GitBox <gi...@apache.org>.
kfaraz commented on code in PR #12465:
URL: https://github.com/apache/druid/pull/12465#discussion_r875514860


##########
core/src/main/java/org/apache/druid/timeline/SegmentId.java:
##########
@@ -80,6 +79,12 @@ public final class SegmentId implements Comparable<SegmentId>
    */
   private static final Interner<String> STRING_INTERNER = Interners.newWeakInterner();
 
+  /**
+   * Store Intervals since creating them each time before returning is an expensive operation

Review Comment:
   Thanks for adding this!



##########
server/src/main/java/org/apache/druid/server/coordinator/CachingCostBalancerStrategy.java:
##########
@@ -70,10 +70,19 @@ protected double computeCost(DataSegment proposalSegment, ServerHolder server, b
     return cost * (server.getMaxSize() / server.getAvailableSize());
   }
 
-  private ClusterCostCache costCacheForLoadingSegments(ServerHolder server)
+  private double costCacheForLoadingSegments(ServerHolder server, DataSegment proposalSegment)

Review Comment:
   Nit: Rename to `computeCostForLoadingSegmentOnServer`



##########
core/src/main/java/org/apache/druid/timeline/SegmentId.java:
##########
@@ -278,9 +276,7 @@ public static SegmentId dummy(String dataSource, int partitionNum)
   private SegmentId(String dataSource, Interval interval, String version, int partitionNum)
   {
     this.dataSource = STRING_INTERNER.intern(Objects.requireNonNull(dataSource));
-    this.intervalStartMillis = interval.getStartMillis();
-    this.intervalEndMillis = interval.getEndMillis();
-    this.intervalChronology = interval.getChronology();
+    this.interval = INTERVAL_INTERNER.intern(interval);

Review Comment:
   Can interval ever be null here?
   If not, we can add `Objects.requireNonNull` similar to the datasource validation in the previous line.





[GitHub] [druid] AmatyaAvadhanula closed pull request #12465: Avoid unnecessary cache building for cachingCost

Posted by "AmatyaAvadhanula (via GitHub)" <gi...@apache.org>.
AmatyaAvadhanula closed pull request #12465: Avoid unnecessary cache building for cachingCost
URL: https://github.com/apache/druid/pull/12465




[GitHub] [druid] imply-cheddar commented on pull request #12465: Avoid unnecessary cache building for cachingCost

Posted by GitBox <gi...@apache.org>.
imply-cheddar commented on PR #12465:
URL: https://github.com/apache/druid/pull/12465#issuecomment-1158793347

   Doing just the interval interning would make this mergeable. Definitely do a separate PR for the caching, as the correctness of that is less clear.
   
   We should probably include the flamegraphs that led us to make this code change in this PR.
   
   In terms of memory consumption, the fields being stored on SegmentId are exactly the same as what an Interval stores. By interning and reusing the same reference, given that the same interval tends to show up a lot, we should actually reduce memory consumption rather than increase it, while also improving performance.
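
To illustrate the interning argument, here is a small sketch under stated assumptions. `SimpleInterval` and `SimpleInterner` are hypothetical stand-ins for Joda's `Interval` and for whatever interner backs `INTERVAL_INTERNER` in the PR (the diff does not show its type). This version holds strong references, so unlike a weak interner it never releases entries.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: these classes are illustrative stand-ins, not Druid, Joda, or Guava classes.
final class IntervalInterningSketch
{
  /** Value-equal interval holder, so that equal intervals map to one pooled instance. */
  static final class SimpleInterval
  {
    final long startMillis;
    final long endMillis;

    SimpleInterval(long startMillis, long endMillis)
    {
      this.startMillis = startMillis;
      this.endMillis = endMillis;
    }

    @Override
    public boolean equals(Object o)
    {
      if (!(o instanceof SimpleInterval)) {
        return false;
      }
      SimpleInterval that = (SimpleInterval) o;
      return startMillis == that.startMillis && endMillis == that.endMillis;
    }

    @Override
    public int hashCode()
    {
      return Objects.hash(startMillis, endMillis);
    }
  }

  /** Strong-reference interner: returns the first instance seen for each distinct value. */
  static final class SimpleInterner<T>
  {
    private final Map<T, T> pool = new ConcurrentHashMap<>();

    T intern(T value)
    {
      // Reject null up front, in the spirit of the requireNonNull suggestion in the review.
      Objects.requireNonNull(value, "value");
      T existing = pool.putIfAbsent(value, value);
      return existing == null ? value : existing;
    }

    int size()
    {
      return pool.size();
    }
  }

  public static void main(String[] args)
  {
    SimpleInterner<SimpleInterval> interner = new SimpleInterner<>();
    SimpleInterval a = interner.intern(new SimpleInterval(0L, 3_600_000L));
    SimpleInterval b = interner.intern(new SimpleInterval(0L, 3_600_000L));
    System.out.println(a == b);
  }
}
```

When many segment ids share an interval, each id then holds a reference to one pooled object instead of its own copy of the same fields.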




[GitHub] [druid] AmatyaAvadhanula commented on pull request #12465: Avoid unnecessary cache building for cachingCost

Posted by "AmatyaAvadhanula (via GitHub)" <gi...@apache.org>.
AmatyaAvadhanula commented on PR #12465:
URL: https://github.com/apache/druid/pull/12465#issuecomment-1608791584

   Closing since https://github.com/apache/druid/pull/14484 deprecates cachingCost

