You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/12/30 00:33:38 UTC

[GitHub] [druid] jasonk000 opened a new pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

jasonk000 opened a new pull request #12105:
URL: https://github.com/apache/druid/pull/12105


   ### Description
   
   If large number of values are required from `DimensionDictionary` during indexing, fetch them all in a single lock/unlock call instead of lock/unlock each individual item.
   
   During indexing there are repeated lock/unlock boundary crossing. In a sample application (57 fields to index), this consumes ~9% of the taskrunner CPU.
   
   Depending on indexing row configuration specifics, the indexer usage of `DimensionDictionary` can consume anywhere from 1-20% of the CPU time during processing. This PR addresses one aspect of the processing, specifically the getValues. (I'll introduce another PR on the add/size change).
   
   #### Introduce a `getValuesInto()` call to the `DimensionDictionary`, and use it
   
   This introduces a `getValuesInto` call, which accepts an array of IDs to fetch, and performs the equivalent of many `getValue()` calls.
   
   #### Expand benchmarks to cover wider rows and some concurrency
   
   ### Design option
   
   There is one other design option, that is in fact much faster again, but is a bit less intuitive.
   
   Here are the benchmark results across all three:
   
   ```
   single getValue()
   Benchmark                                                                  (cardinality)  (rowSize)  Mode  Cnt  Score    Error  Units
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize                    10000          8  avgt   10  0.046 ±  0.001  us/op
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize                    10000         40  avgt   10  0.215 ±  0.007  us/op
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSizeTwoThreads          10000          8  avgt   10  3.484 ±  0.032  us/op
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSizeTwoThreads          10000         40  avgt   10  8.638 ±  0.514  us/op
   
   bulk getValuesInto() <----- THIS SOLUTION
   Benchmark                                                                  (cardinality)  (rowSize)  Mode  Cnt  Score    Error  Units
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize                    10000          8  avgt   10  0.039 ±  0.001  us/op
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize                    10000         40  avgt   10  0.120 ±  0.002  us/op
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSizeTwoThreads          10000          8  avgt   10  0.383 ±  0.052  us/op
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSizeTwoThreads          10000         40  avgt   10  0.386 ±  0.018  us/op
   
   using doInsideReadLock() <----- ALTERNATIVE, FASTER BUT LESS CLEAN
   Benchmark                                                                  (cardinality)  (rowSize)  Mode  Cnt  Score    Error  Units
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize                    10000          8  avgt   10  0.018 ±  0.001  us/op
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize                    10000         40  avgt   10  0.077 ±  0.002  us/op
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSizeTwoThreads          10000          8  avgt   10  0.241 ±  0.004  us/op
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSizeTwoThreads          10000         40  avgt   10  0.486 ±  0.002  us/op
   ```
   
   ### Alternatives (faster!)
   
   The alternative, `doInsideReadLoop()` is to pass a lambda / closure into the `DimensionDictionary` boundary and have it perform locking and execute on the other side. This is likely to be faster for most cases, however, it's a bit less clear in the API (I'll leave an extra comment).
   
   The solution involves similar changes, and passing a closure across the boundary. However, it does leak implementation concerns...
   
   ```
   /************ DimensionDictionary.java **************/
   public void doInsideReadLock(BiConsumer<List<T>, Integer> fn)
   {
     lock.readLock().lock();
     try {
       fn.accept(idToValue, idForNull);
     }
     finally {
       lock.readLock().unlock();
     }
   }
   
   
   /************ StringDimensionIndexer.java **************/
   @Override
   public long estimateEncodedKeyComponentSize(int[] key)
   {
     int[] estimatedSize = new int[]{key.length * Integer.BYTES};
     dimLookup.doInsideReadLock((List<String> idToValue, Integer idForNull) -> {
       for (int id : key) {
         if (id != idForNull) {
             String val = idToValue.get(id);
             int sizeOfString = 28 + 16 + (2 * val.length());
             estimatedSize[0] += sizeOfString;
         }
       }
     });
     return estimatedSize[0];
   }
   ```
   
   
   Otherwise, I think the changes are straightforward!
   
   <hr>
   
   
   This PR has:
   - [x] been self-reviewed.
      - [x] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster. (as part of other changes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 edited a comment on pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
jasonk000 edited a comment on pull request #12105:
URL: https://github.com/apache/druid/pull/12105#issuecomment-1003156510


   while we are here, could you share an opinion on this implementation, it's faster again but with a different API:
   https://github.com/jasonk000/druid/commit/f4354bbe8ccaed556c0488bc875d753afabc1b86 -> https://github.com/jasonk000/druid/pull/8/files


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
suneet-s commented on pull request #12105:
URL: https://github.com/apache/druid/pull/12105#issuecomment-1051215739


   Thanks @jasonk000 !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on a change in pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on a change in pull request #12105:
URL: https://github.com/apache/druid/pull/12105#discussion_r776540734



##########
File path: processing/src/main/java/org/apache/druid/segment/DimensionDictionary.java
##########
@@ -84,6 +84,19 @@ public T getValue(int id)
     }
   }
 
+  public void getValuesInto(int[] ids, T[] values)
+  {
+    lock.readLock().lock();
+    try {
+      for (int i = 0; i < ids.length; i++) {

Review comment:
       Good point on checking array lengths match.
   
   I did explore returning value only (that's my internal impl actually), but it requires wider changes. Specifically, since Java erases generics, it's impossible to do `new T[]`. So, we have to either do something like:
   ```
   StringDimensionIndexer() constructor {
   super(String.class);
   }
   
   // and then propagate this Class<T> to DimensionDictionary, and in DimensionDictionary, do a:
   
   T[] out = (T[]) Array.newInstance(typeFromConstructor, keys.length);
   ```
   To me, both are viable and work fine, this approach seems less invasive.
   
   So, we could either: (1) check array length explicitly or (2) create new array inside `getValues()` call.
   
   What do you suggest?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on a change in pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on a change in pull request #12105:
URL: https://github.com/apache/druid/pull/12105#discussion_r776838503



##########
File path: processing/src/main/java/org/apache/druid/segment/DimensionDictionary.java
##########
@@ -84,6 +84,19 @@ public T getValue(int id)
     }
   }
 
+  public void getValuesInto(int[] ids, T[] values)
+  {
+    lock.readLock().lock();
+    try {
+      for (int i = 0; i < ids.length; i++) {

Review comment:
       two paths here:
   - explicit check solution in f4e4f9e5c0d0622e4f099e0668ae36d90a8a0f1d
   - refactor to use reflection in f91f709a43462b2f7e8ab1339aedccbb10a10cfe




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on pull request #12105:
URL: https://github.com/apache/druid/pull/12105#issuecomment-1003242678


   > while we are here, could you share an opinion on this implementation, it's faster again but with a different API: [jasonk000@f4354bb](https://github.com/jasonk000/druid/commit/f4354bbe8ccaed556c0488bc875d753afabc1b86) -> https://github.com/jasonk000/druid/pull/8/files
   
   As a public API that accepts a callback, I'm worried about if user provides a time-consuming callback, which means the read lock will be held for a long time, and the write lock will be blocked. So I think current solution in this PR might be better.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
suneet-s commented on pull request #12105:
URL: https://github.com/apache/druid/pull/12105#issuecomment-1040612302


   @jasonk000 Thanks for another optimization! Do you think you could clean up the conflicts in this PR so that it can be merged? Sorry it fell off the radar


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s merged pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
suneet-s merged pull request #12105:
URL: https://github.com/apache/druid/pull/12105


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on a change in pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on a change in pull request #12105:
URL: https://github.com/apache/druid/pull/12105#discussion_r776540734



##########
File path: processing/src/main/java/org/apache/druid/segment/DimensionDictionary.java
##########
@@ -84,6 +84,19 @@ public T getValue(int id)
     }
   }
 
+  public void getValuesInto(int[] ids, T[] values)
+  {
+    lock.readLock().lock();
+    try {
+      for (int i = 0; i < ids.length; i++) {

Review comment:
       Good point on checking array lengths match.
   
   I did explore returning value only (that's my internal impl actually), but it requires wider changes. Specifically, since Java erases generics, it's impossible to do `new T[]`. So, we have to either do something like:
   ```
   StringDimensionIndexer() constructor {
   super(String.class);
   }
   
   // and then propagate this Class<T> to DimensionDictionary, and in DimensionDictionary, do a:
   
   T[] out = (T[]) Array.newInstance(typeFromConstructor, keys.length);
   ```
   To me, both are viable and work fine, this approach seems less invasive.
   
   What do you suggest?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #12105:
URL: https://github.com/apache/druid/pull/12105#discussion_r776538984



##########
File path: processing/src/main/java/org/apache/druid/segment/DimensionDictionary.java
##########
@@ -84,6 +84,19 @@ public T getValue(int id)
     }
   }
 
+  public void getValuesInto(int[] ids, T[] values)
+  {
+    lock.readLock().lock();
+    try {
+      for (int i = 0; i < ids.length; i++) {

Review comment:
       Since here you're copying the whole `ids` array to the input array without checking if the input `values` length matches the length of `ids`, I'm suggesting that this new method only accepts `ids` as parameter, and returns `T[]`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on pull request #12105:
URL: https://github.com/apache/druid/pull/12105#issuecomment-1003156510


   while we are here, could you share an opinion on this implementation, it's faster again but with a different API:
   https://github.com/jasonk000/druid/commit/f4354bbe8ccaed556c0488bc875d753afabc1b86


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on a change in pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on a change in pull request #12105:
URL: https://github.com/apache/druid/pull/12105#discussion_r776838503



##########
File path: processing/src/main/java/org/apache/druid/segment/DimensionDictionary.java
##########
@@ -84,6 +84,19 @@ public T getValue(int id)
     }
   }
 
+  public void getValuesInto(int[] ids, T[] values)
+  {
+    lock.readLock().lock();
+    try {
+      for (int i = 0; i < ids.length; i++) {

Review comment:
       fixed in f4e4f9e5c0d0622e4f099e0668ae36d90a8a0f1d and f91f709a43462b2f7e8ab1339aedccbb10a10cfe




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on pull request #12105:
URL: https://github.com/apache/druid/pull/12105#issuecomment-1050466554


   @suneet-s rebased!
   
   > @jasonk000 Thanks for another optimization! Do you think you could clean up the conflicts in this PR so that it can be merged? Sorry it fell off the radar
   
   Rebased! Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on pull request #12105:
URL: https://github.com/apache/druid/pull/12105#issuecomment-1006795945


   > LGTM. +1 after CI.
   
   @FrankChen021 It seems the CI failure on this PR is unrelated area in stage 2 tests. Any suggestions? I can close/open to trigger CI?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12105:
URL: https://github.com/apache/druid/pull/12105#discussion_r791645296



##########
File path: processing/src/main/java/org/apache/druid/segment/DimensionDictionary.java
##########
@@ -84,6 +84,19 @@ public T getValue(int id)
     }
   }
 
+  public void getValuesInto(int[] ids, T[] values)
+  {
+    lock.readLock().lock();
+    try {
+      for (int i = 0; i < ids.length; i++) {

Review comment:
       #12073 is adding a `StringDimensionDictionary`, so this method might be able to be a bit cleaner in the future, https://github.com/apache/druid/pull/12073/files#diff-f8b1f0a09275a697d95c1d716e7c9f7650cdd2d7021497018df75e4635e4377cR28




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on a change in pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on a change in pull request #12105:
URL: https://github.com/apache/druid/pull/12105#discussion_r776838267



##########
File path: processing/src/main/java/org/apache/druid/segment/StringDimensionIndexer.java
##########
@@ -132,8 +132,10 @@ public long estimateEncodedKeyComponentSize(int[] key)
     // even though they are stored just once. It may overestimate the size by a bit, but we wanted to leave
     // more buffer to be safe
     long estimatedSize = key.length * Integer.BYTES;
-    for (int element : key) {

Review comment:
       fixed in d166364a1b76a487fc808d993973ed064cbf8978




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on pull request #12105:
URL: https://github.com/apache/druid/pull/12105#issuecomment-1003156455


   hey @FrankChen021 , feedback covered from your previous review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
suneet-s commented on pull request #12105:
URL: https://github.com/apache/druid/pull/12105#issuecomment-1040612302


   @jasonk000 Thanks for another optimization! Do you think you could clean up the conflicts in this PR so that it can be merged? Sorry it fell off the radar


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #12105:
URL: https://github.com/apache/druid/pull/12105#discussion_r791645296



##########
File path: processing/src/main/java/org/apache/druid/segment/DimensionDictionary.java
##########
@@ -84,6 +84,19 @@ public T getValue(int id)
     }
   }
 
+  public void getValuesInto(int[] ids, T[] values)
+  {
+    lock.readLock().lock();
+    try {
+      for (int i = 0; i < ids.length; i++) {

Review comment:
       #12073 is adding a `StringDimensionDictionary`, so this method might be able to be a bit cleaner in the future, https://github.com/apache/druid/pull/12073/files#diff-f8b1f0a09275a697d95c1d716e7c9f7650cdd2d7021497018df75e4635e4377cR28




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #12105:
URL: https://github.com/apache/druid/pull/12105#discussion_r776539164



##########
File path: processing/src/main/java/org/apache/druid/segment/StringDimensionIndexer.java
##########
@@ -132,8 +132,10 @@ public long estimateEncodedKeyComponentSize(int[] key)
     // even though they are stored just once. It may overestimate the size by a bit, but we wanted to leave
     // more buffer to be safe
     long estimatedSize = key.length * Integer.BYTES;
-    for (int element : key) {

Review comment:
       nit: it's better to rename the original variable name `key` to `keys` to make it intuitive.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #12105: perf: indexing: Introduce a bulk getValuesInto function to read values

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #12105:
URL: https://github.com/apache/druid/pull/12105#discussion_r776549385



##########
File path: processing/src/main/java/org/apache/druid/segment/DimensionDictionary.java
##########
@@ -84,6 +84,19 @@ public T getValue(int id)
     }
   }
 
+  public void getValuesInto(int[] ids, T[] values)
+  {
+    lock.readLock().lock();
+    try {
+      for (int i = 0; i < ids.length; i++) {

Review comment:
       Yes, handling generic in java is a little complicated. How about using `List<T>` as the returning type? This won't introduce any magic code.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org