You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "clintropolis (via GitHub)" <gi...@apache.org> on 2023/03/30 04:50:06 UTC

[GitHub] [druid] clintropolis opened a new pull request, #14002: lower segment heap footprint and fix bug with expression type coercion

clintropolis opened a new pull request, #14002:
URL: https://github.com/apache/druid/pull/14002

   ### Description
   While working on some other stuff I noticed that we were doing something when loading segments which results in some extra heap usage storing redundant stuff we don't really need. Specifically, the changes in #12279 causes us to make a combined column and dimension list in the form of `Indexed` to pass to the `SimpleQueryableIndex`, which it was doing prior to this PR with https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/IndexIO.java#L648 `GenericIndexed.fromIterable` which is going to make new heap bytebuffers of the column and dimension lists. The columns `Indexed` is thrown away basically, but the dimensions list is kept around forever.
   
   We actually already have to have all of these column names on heap, and are already interning them via the `FileSmooshMapper` https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/java/util/common/io/smoosh/SmooshedFileMapper.java#L88 which does so whenever it reads `meta.smoosh` to get the internal file list of a smoosh. There is an internal file as the entry point for every column name, so if we re-use this interner for the map of `ColumnHolder` suppliers as well as to back the `Indexed<String>` for the dimension list then we can save some heap.
   
   With my laptop test cluster of ~5k segments it reduces the heap footprint by ~20MB
   
   Before:
   ![Screenshot 2023-03-29 at 8 27 59 PM](https://user-images.githubusercontent.com/1577461/228732099-bcea909c-c400-4d42-b6db-c5ec2033af19.png)
   
   After:
   ![Screenshot 2023-03-29 at 8 28 22 PM](https://user-images.githubusercontent.com/1577461/228732186-f4c6f7a5-1718-4edb-b3dc-28bd402166c7.png)
   
   Looking specifically at strings, we now have about 1/3 as many on heap (from the map of ColumnHolder suppliers), not counting all of the heap bytebuffers which were backing the dimensions lists.
   ![Screenshot 2023-03-29 at 8 38 54 PM](https://user-images.githubusercontent.com/1577461/228732287-5166a7c4-3342-4cee-8240-6ba32ba51f13.png)
   ![Screenshot 2023-03-29 at 8 39 00 PM](https://user-images.githubusercontent.com/1577461/228732298-0dfc568e-0d3f-488d-9ae9-af19de3dc033.png)
   
   
   Completely unrelated, this PR also fixes a bug with native expression type coercion with complex types. The added test would fail where it would return "unknown complex" sometimes when using the `ExpressionTypeCoercion.operator` method. I guess I could have made it its own PR, but both things were small...
   
   
   
   
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] imply-cheddar commented on a diff in pull request #14002: lower segment heap footprint and fix bug with expression type coercion

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on code in PR #14002:
URL: https://github.com/apache/druid/pull/14002#discussion_r1154121916


##########
processing/src/main/java/org/apache/druid/java/util/common/io/smoosh/SmooshedFileMapper.java:
##########
@@ -48,7 +48,11 @@
  */
 public class SmooshedFileMapper implements Closeable
 {
-  private static final Interner<String> STRING_INTERNER = Interners.newWeakInterner();
+  /**
+   * Interner for smoosh internal files, which includes all column names since very column has an internal file

Review Comment:
   typo: s/very/every/



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis merged pull request #14002: lower segment heap footprint and fix bug with expression type coercion

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis merged PR #14002:
URL: https://github.com/apache/druid/pull/14002


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org