Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/12/09 22:43:58 UTC

[GitHub] [druid] clintropolis opened a new pull request #12048: fix IncrementalIndex performance regression

clintropolis opened a new pull request #12048:
URL: https://github.com/apache/druid/pull/12048


   ### Description
   This PR fixes a performance regression caused by #11853, which added type information to the `RowBasedColumnSelectorFactory` of `IncrementalIndex` by calling the latter's `getColumnCapabilities` method. That method built a hash map of all the capabilities and then translated it into a `RowSignature`, which turned out to be very expensive.
   
   To fix this, `IncrementalIndex` now just implements `ColumnInspector` so that it can cut out the middle-man and serve as the `ColumnInspector` for the `RowBasedColumnSelectorFactory`.
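
A rough sketch of the difference, for readers who don't have the code open. Everything here is a hypothetical stand-in rather than the real Druid classes: `SketchColumnInspector` plays the role of `ColumnInspector`, `SketchIndex` the role of `IncrementalIndex`, and column types are plain strings instead of `ColumnCapabilities`. It also assumes the full map was rebuilt whenever type information was requested, which is how the description above reads.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class InspectorSketch
{
  // Hypothetical stand-in for ColumnInspector; not the real Druid interface.
  interface SketchColumnInspector
  {
    // Returns a type name for the column, or null if the column is unknown.
    String getColumnType(String columnName);
  }

  // Hypothetical, heavily simplified stand-in for IncrementalIndex.
  static class SketchIndex implements SketchColumnInspector
  {
    private final Map<String, String> metricTypes = new HashMap<>();
    private final Map<String, String> dimensionTypes = new LinkedHashMap<>();
    // Counts how often the expensive full copy is made.
    int fullMapBuilds = 0;

    SketchIndex()
    {
      metricTypes.put("__time", "LONG");
      metricTypes.put("count", "LONG");
      dimensionTypes.put("page", "STRING");
      dimensionTypes.put("user", "STRING");
    }

    // Slow path: copies every column's type into a new map on each call.
    Map<String, String> buildAllColumnTypes()
    {
      fullMapBuilds++;
      Map<String, String> all = new HashMap<>(metricTypes);
      all.putAll(dimensionTypes);
      return all;
    }

    // Fast path: answers a single-column question with direct lookups.
    @Override
    public String getColumnType(String columnName)
    {
      String type = metricTypes.get(columnName);
      return type != null ? type : dimensionTypes.get(columnName);
    }
  }

  public static void main(String[] args)
  {
    SketchIndex index = new SketchIndex();
    // Slow: the whole map is rebuilt to answer a question about one column.
    String slow = index.buildAllColumnTypes().get("page");
    // Fast: the index itself serves as the inspector.
    SketchColumnInspector inspector = index;
    String fast = inspector.getColumnType("page");
    System.out.println(slow + " " + fast + " builds=" + index.fullMapBuilds);
  }
}
```

Both paths return the same answer; the second simply avoids allocating and filling a map whose size grows with the number of columns.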
   
   Before:
   ![Screen Shot 2021-12-09 at 2 27 38 PM](https://user-images.githubusercontent.com/1577461/145486999-7fe3e064-3779-424d-9c88-9a111617e2ec.png)
   
   After:
   ![Screen Shot 2021-12-09 at 2 27 28 PM](https://user-images.githubusercontent.com/1577461/145487077-b868e1b5-98f6-42f8-bd2f-86d002f0921e.png)
   
   
   Zoomed into sink.add:
   before:
   ![Screen Shot 2021-12-09 at 2 28 40 PM](https://user-images.githubusercontent.com/1577461/145487072-33547634-06bc-4f04-9550-469e4cf3b9b2.png)
   
   after:
   ![Screen Shot 2021-12-09 at 2 28 31 PM](https://user-images.githubusercontent.com/1577461/145487010-83f873a3-5856-4c5c-968d-823ee3cb84c9.png)
   
   
   Task run times are shorter too:
   before:
   <img width="1640" alt="Screen Shot 2021-12-09 at 2 24 21 PM" src="https://user-images.githubusercontent.com/1577461/145487240-9633c828-d300-44e4-bfa6-dd84eed30405.png">
   after:
   <img width="1639" alt="Screen Shot 2021-12-09 at 2 27 00 PM" src="https://user-images.githubusercontent.com/1577461/145487244-6642a2aa-ccfb-456d-9078-9179fcd0be8e.png">
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #12048: fix IncrementalIndex performance regression

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12048:
URL: https://github.com/apache/druid/pull/12048#discussion_r766235589



##########
File path: processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
##########
@@ -441,6 +441,19 @@ public InputRow formatRow(InputRow row)
     return builder.build();
   }
 
+  @Nullable
+  @Override
+  public ColumnCapabilities getColumnCapabilities(String columnName)
+  {
+    if (timeAndMetricsColumnCapabilities.containsKey(columnName)) {
+      return timeAndMetricsColumnCapabilities.get(columnName);
+    }
+    if (dimensionDescs.containsKey(columnName)) {

Review comment:
       btw, I didn't trust any of the users of `dimensionDescs` that weren't obviously safe, so I adjusted a couple of other locations as well before taking the measurement in my previous comment.






[GitHub] [druid] clintropolis commented on a change in pull request #12048: fix IncrementalIndex performance regression

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12048:
URL: https://github.com/apache/druid/pull/12048#discussion_r766224253



##########
File path: processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
##########
@@ -441,6 +441,19 @@ public InputRow formatRow(InputRow row)
     return builder.build();
   }
 
+  @Nullable
+  @Override
+  public ColumnCapabilities getColumnCapabilities(String columnName)
+  {
+    if (timeAndMetricsColumnCapabilities.containsKey(columnName)) {
+      return timeAndMetricsColumnCapabilities.get(columnName);
+    }
+    if (dimensionDescs.containsKey(columnName)) {

Review comment:
       Good question, will have a look. ~`timeAndMetricsColumnCapabilities` would also have this problem potentially?~ Never mind, it doesn't change after the constructor.






[GitHub] [druid] clintropolis commented on a change in pull request #12048: fix IncrementalIndex performance regression

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12048:
URL: https://github.com/apache/druid/pull/12048#discussion_r766233092



##########
File path: processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
##########
@@ -441,6 +441,19 @@ public InputRow formatRow(InputRow row)
     return builder.build();
   }
 
+  @Nullable
+  @Override
+  public ColumnCapabilities getColumnCapabilities(String columnName)
+  {
+    if (timeAndMetricsColumnCapabilities.containsKey(columnName)) {
+      return timeAndMetricsColumnCapabilities.get(columnName);
+    }
+    if (dimensionDescs.containsKey(columnName)) {

Review comment:
       Hmm, I think it should be synchronized anyway, because callers would expect this method to be thread-safe. It doesn't seem to have a performance impact, afaict:
   
   without:
   ![Screen Shot 2021-12-09 at 3 30 16 PM](https://user-images.githubusercontent.com/1577461/145491925-fb52f4b9-93b7-4616-bf9a-0b1a4b3630c3.png)
   
   
   with:
   ![Screen Shot 2021-12-09 at 3 30 04 PM](https://user-images.githubusercontent.com/1577461/145491931-39672b36-e262-471b-beb8-490ccef69477.png)
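
For illustration only, a minimal sketch of the locking pattern discussed in this thread. It is not the code in the PR: `SynchronizedLookupSketch`, `addDimension`, and `getColumnType` are hypothetical names, and column types are plain strings. It assumes, as noted elsewhere in the thread, that the time-and-metrics map does not change after the constructor, while the dimension map can grow during ingestion.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for IncrementalIndex.
public class SynchronizedLookupSketch
{
  // Populated once in the constructor and never modified afterwards,
  // so it is read without locking.
  private final Map<String, String> timeAndMetricTypes;

  // Grows as new dimensions are discovered, so every access in this
  // sketch goes through the same lock.
  private final Map<String, String> dimensionTypes = new LinkedHashMap<>();

  public SynchronizedLookupSketch(Map<String, String> metricTypes)
  {
    Map<String, String> copy = new HashMap<>(metricTypes);
    copy.put("__time", "LONG");
    this.timeAndMetricTypes = Collections.unmodifiableMap(copy);
  }

  // Writer side: called when a row introduces a previously unseen dimension.
  public void addDimension(String name, String type)
  {
    synchronized (dimensionTypes) {
      dimensionTypes.putIfAbsent(name, type);
    }
  }

  // Reader side: returns null for unknown columns.
  public String getColumnType(String columnName)
  {
    String type = timeAndMetricTypes.get(columnName);
    if (type != null) {
      return type;
    }
    synchronized (dimensionTypes) {
      return dimensionTypes.get(columnName);
    }
  }

  public static void main(String[] args) throws InterruptedException
  {
    SynchronizedLookupSketch index =
        new SynchronizedLookupSketch(Collections.singletonMap("count", "LONG"));
    Thread writer = new Thread(() -> {
      for (int i = 0; i < 1000; i++) {
        index.addDimension("dim" + i, "STRING");
      }
    });
    writer.start();
    // Concurrent reads take the same lock as the writer.
    for (int i = 0; i < 1000; i++) {
      index.getColumnType("dim" + i);
    }
    writer.join();
    System.out.println(index.getColumnType("dim999"));
  }
}
```

The lock is only taken on the dimension lookup, and an uncontended `synchronized` block is typically cheap, which would be consistent with the measurements above showing no visible impact.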
   
   








[GitHub] [druid] cheddar commented on a change in pull request #12048: fix IncrementalIndex performance regression

Posted by GitBox <gi...@apache.org>.
cheddar commented on a change in pull request #12048:
URL: https://github.com/apache/druid/pull/12048#discussion_r766220636



##########
File path: processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
##########
@@ -441,6 +441,19 @@ public InputRow formatRow(InputRow row)
     return builder.build();
   }
 
+  @Nullable
+  @Override
+  public ColumnCapabilities getColumnCapabilities(String columnName)
+  {
+    if (timeAndMetricsColumnCapabilities.containsKey(columnName)) {
+      return timeAndMetricsColumnCapabilities.get(columnName);
+    }
+    if (dimensionDescs.containsKey(columnName)) {

Review comment:
       Looking through the code in IncrementalIndex, sometimes `dimensionDescs` is accessed with synchronization.  Sometimes it is not.  Assuming that it's okay to access it without synchronization, this looks good.  Have we verified that we will already be in a synchronized block once we get to this method?



