Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2022/02/13 18:20:31 UTC

[GitHub] [kafka] ruanwenjun opened a new pull request #11754: MINOR: Optimize collection method in Utils

ruanwenjun opened a new pull request #11754:
URL: https://github.com/apache/kafka/pull/11754


   * It's better to set an initial capacity when creating a collection such as a set. A constructor like `HashSet(Collection<? extends E> c)` already computes a suitable initial capacity, so we don't need to calculate it ourselves.
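
To make the comparison concrete, here is a minimal sketch of the two variants side by side. The class name `MkSetSketch` and the helper `requestedCapacity` are hypothetical names used only for illustration; the two method bodies mirror the removed and added lines of the diff. The `HashSet(Collection)` Javadoc states that the backing map is created with the default load factor (0.75) and a capacity sufficient to hold the collection, which is the same goal the manual arithmetic pursues.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class, for illustration only; not part of Kafka's Utils.
class MkSetSketch {
    // Manual variant: request a capacity so that n elements stay under the 0.75 load factor.
    @SafeVarargs
    static <T> Set<T> mkSetManual(T... elems) {
        Set<T> result = new HashSet<>((int) (elems.length / 0.75) + 1);
        for (T elem : elems)
            result.add(elem);
        return result;
    }

    // Constructor variant: HashSet(Collection) sizes the backing map from the collection.
    @SafeVarargs
    static <T> Set<T> mkSetFromList(T... elems) {
        return new HashSet<>(Arrays.asList(elems));
    }

    // The capacity the manual variant asks for when given n elements.
    static int requestedCapacity(int n) {
        return (int) (n / 0.75) + 1;
    }

    public static void main(String[] args) {
        // Both variants build equal sets; duplicates collapse in either case.
        System.out.println(mkSetManual("a", "b", "a").equals(mkSetFromList("a", "b", "a")));
        System.out.println(requestedCapacity(100));
    }
}
```

Both variants produce equal sets, so the choice between them comes down to readability and allocation cost rather than behavior.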
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] ruanwenjun commented on a change in pull request #11754: MINOR: Optimize collection method in Utils

Posted by GitBox <gi...@apache.org>.
ruanwenjun commented on a change in pull request #11754:
URL: https://github.com/apache/kafka/pull/11754#discussion_r810732547



##########
File path: clients/src/main/java/org/apache/kafka/common/utils/Utils.java
##########
@@ -764,10 +764,7 @@ public static ByteBuffer ensureCapacity(ByteBuffer existingBuffer, int newLength
      */
     @SafeVarargs
     public static <T> Set<T> mkSet(T... elems) {
-        Set<T> result = new HashSet<>((int) (elems.length / 0.75) + 1);
-        for (T elem : elems)
-            result.add(elem);
-        return result;
+        return new HashSet<>(Arrays.asList(elems));

Review comment:
       @splett2 Yes, I agree with you: `mkSet` is only used in test code and static variables. This PR is not motivated by performance; it is only intended to improve readability.







[GitHub] [kafka] splett2 commented on a change in pull request #11754: MINOR: Optimize collection method in Utils

Posted by GitBox <gi...@apache.org>.
splett2 commented on a change in pull request #11754:
URL: https://github.com/apache/kafka/pull/11754#discussion_r810211960



##########
File path: clients/src/main/java/org/apache/kafka/common/utils/Utils.java
##########
@@ -764,10 +764,7 @@ public static ByteBuffer ensureCapacity(ByteBuffer existingBuffer, int newLength
      */
     @SafeVarargs
     public static <T> Set<T> mkSet(T... elems) {
-        Set<T> result = new HashSet<>((int) (elems.length / 0.75) + 1);
-        for (T elem : elems)
-            result.add(elem);
-        return result;
+        return new HashSet<>(Arrays.asList(elems));

Review comment:
       This microbenchmark doesn't seem to be written correctly.
   In `testCreateHashSet1` you are creating a singleton list whose only element is the `int[]` array itself.
   
   `inits` needs to be an `Integer` array rather than a primitive array to get the expected array/hashset behavior.
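
The pitfall described above can be reproduced with a small sketch. The class and method names are hypothetical; the only library calls are `Arrays.asList` and the `HashSet(Collection)` constructor. Because `Arrays.asList` takes `T...` and `T` cannot be a primitive type, passing an `int[]` makes `T` resolve to `int[]`, so the resulting list has exactly one element.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class, for illustration only.
class AsListPrimitiveSketch {
    // With int[], T is inferred as int[]; the list holds one element, the array itself.
    static int sizeFromPrimitive(int[] values) {
        List<int[]> wrapped = Arrays.asList(values);
        Set<int[]> set = new HashSet<>(wrapped);
        return set.size();
    }

    // With Integer[], T is Integer; each array slot becomes a list element.
    static int sizeFromBoxed(Integer[] values) {
        Set<Integer> set = new HashSet<>(Arrays.asList(values));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeFromPrimitive(new int[] {1, 2, 3}));
        System.out.println(sizeFromBoxed(new Integer[] {1, 2, 3}));
    }
}
```

A benchmark built on the primitive form therefore measures the cost of a one-element set, which would explain a much lower score for that method.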







[GitHub] [kafka] ruanwenjun commented on a change in pull request #11754: MINOR: Optimize collection method in Utils

Posted by GitBox <gi...@apache.org>.
ruanwenjun commented on a change in pull request #11754:
URL: https://github.com/apache/kafka/pull/11754#discussion_r810426220



##########
File path: clients/src/main/java/org/apache/kafka/common/utils/Utils.java
##########
@@ -764,10 +764,7 @@ public static ByteBuffer ensureCapacity(ByteBuffer existingBuffer, int newLength
      */
     @SafeVarargs
     public static <T> Set<T> mkSet(T... elems) {
-        Set<T> result = new HashSet<>((int) (elems.length / 0.75) + 1);
-        for (T elem : elems)
-            result.add(elem);
-        return result;
+        return new HashSet<>(Arrays.asList(elems));

Review comment:
       Thanks for your review. I changed it as you suggested:
   ```java
   private Integer[] inits = new Integer[100];
   @Setup(Level.Trial)
   public void setUp() {
       for (int i = 0; i < inits.length; i++) {
           inits[i] = ThreadLocalRandom.current().nextInt();
       }
   }
   @Benchmark
   public void testCreateHashSet1() {
       Set<Integer> ints = new HashSet<>(Arrays.asList(inits));
   }
   @Benchmark
   public void testCreateHashSet2() {
       Set<Integer> result = new HashSet<>((int) (inits.length / 0.75) + 1);
       for (Integer elem : inits)
           result.add(elem);
   }
   ```
   
   The benchmark result is:
   ```
   Benchmark                         Mode  Cnt  Score    Error  Units
   CreateSetTest.testCreateHashSet1  avgt   20  0.002 ±  0.001  ms/op
   CreateSetTest.testCreateHashSet2  avgt   20  0.001 ±  0.001  ms/op
   ```
   After swapping the order of the two methods, the result is:
   ```
   Benchmark                         Mode  Cnt  Score    Error  Units
   CreateSetTest.testCreateHashSet2  avgt   20  0.002 ±  0.001  ms/op
   CreateSetTest.testCreateHashSet1  avgt   20  0.001 ±  0.001  ms/op
   ```







[GitHub] [kafka] ruanwenjun commented on a change in pull request #11754: MINOR: Optimize collection method in Utils

Posted by GitBox <gi...@apache.org>.
ruanwenjun commented on a change in pull request #11754:
URL: https://github.com/apache/kafka/pull/11754#discussion_r805769215



##########
File path: clients/src/main/java/org/apache/kafka/common/utils/Utils.java
##########
@@ -764,10 +764,7 @@ public static ByteBuffer ensureCapacity(ByteBuffer existingBuffer, int newLength
      */
     @SafeVarargs
     public static <T> Set<T> mkSet(T... elems) {
-        Set<T> result = new HashSet<>((int) (elems.length / 0.75) + 1);
-        for (T elem : elems)
-            result.add(elem);
-        return result;
+        return new HashSet<>(Arrays.asList(elems));

Review comment:
       Yes, I understand your concern. `Arrays.asList(elems)` only wraps the `elems` array by reference; it does not copy the elements.
   I created a benchmark test that looks like this:
   ```java
   @State(Scope.Benchmark)
   @Fork(value = 1)
   @Warmup(iterations = 3)
   @Measurement(iterations = 10)
   @BenchmarkMode(Mode.AverageTime)
   @OutputTimeUnit(TimeUnit.MILLISECONDS)
   public class CreateSetTest {
   
       private int[] inits = new int[100];
   
       @Setup(Level.Trial)
       public void setUp() {
           for (int i = 0; i < inits.length; i++) {
               inits[i] = ThreadLocalRandom.current().nextInt();
           }
       }
   
       @Benchmark
       public void testCreateHashSet1() {
           HashSet<int[]> ints = new HashSet<>(Arrays.asList(inits));
       }
   
       @Benchmark
       public void testCreateHashSet2() {
           Set<Integer> result = new HashSet<>((int) (inits.length / 0.75) + 1);
           for (Integer elem : inits)
               result.add(elem);
       }
   
   
       public static void main(String[] args) throws RunnerException {
           Options opt = new OptionsBuilder()
               .include(CreateSetTest.class.getSimpleName())
               .forks(2)
               .build();
   
           new Runner(opt).run();
       }
   }
   ```
   
   The test result:
   ```
   Benchmark                         Mode  Cnt   Score    Error  Units
   CreateSetTest.testCreateHashSet1  avgt   20  ≈ 10⁻⁴           ms/op
   CreateSetTest.testCreateHashSet2  avgt   20   0.001 ±  0.001  ms/op
   ```
   
   It seems it will not reduce performance.
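
The point that `Arrays.asList` only holds a reference to the array can be checked with a short sketch. The class and method names are hypothetical. The list returned by `Arrays.asList` is documented as a fixed-size list backed by the specified array, so the temporary object is a thin wrapper, while the `HashSet` built from it holds its own copy of the element references.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class, for illustration only.
class AsListViewSketch {
    // True if a write to the array shows through the list, i.e. the list is a view.
    static boolean listSeesArrayWrite() {
        String[] elems = {"a", "b"};
        List<String> view = Arrays.asList(elems);
        elems[0] = "z";
        return view.get(0).equals("z");
    }

    // True if the set built from the view is unaffected by later writes to the array.
    static boolean setIsIndependentCopy() {
        String[] elems = {"a", "b"};
        Set<String> set = new HashSet<>(Arrays.asList(elems));
        elems[0] = "z";
        return set.contains("a") && !set.contains("z");
    }

    public static void main(String[] args) {
        System.out.println(listSeesArrayWrite());
        System.out.println(setIsIndependentCopy());
    }
}
```

This says nothing about allocation cost by itself; whether the extra wrapper matters would still need a correctly written benchmark.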










[GitHub] [kafka] splett2 commented on a change in pull request #11754: MINOR: Optimize collection method in Utils

Posted by GitBox <gi...@apache.org>.
splett2 commented on a change in pull request #11754:
URL: https://github.com/apache/kafka/pull/11754#discussion_r810532447



##########
File path: clients/src/main/java/org/apache/kafka/common/utils/Utils.java
##########
@@ -764,10 +764,7 @@ public static ByteBuffer ensureCapacity(ByteBuffer existingBuffer, int newLength
      */
     @SafeVarargs
     public static <T> Set<T> mkSet(T... elems) {
-        Set<T> result = new HashSet<>((int) (elems.length / 0.75) + 1);
-        for (T elem : elems)
-            result.add(elem);
-        return result;
+        return new HashSet<>(Arrays.asList(elems));

Review comment:
       It would be good to include the benchmark in the PR.
   As a comment on the benchmark, I think ms/op is too coarse a unit for the code we're benchmarking.
   
   ns/op is probably more appropriate.
   
   I am also wondering whether this code is called in the critical path anywhere. It doesn't seem to be. For instance, almost all of the calls to `mkSet` seem to be in test code or static variables.







[GitHub] [kafka] ijuma commented on a change in pull request #11754: MINOR: Optimize collection method in Utils

Posted by GitBox <gi...@apache.org>.
ijuma commented on a change in pull request #11754:
URL: https://github.com/apache/kafka/pull/11754#discussion_r805423796



##########
File path: clients/src/main/java/org/apache/kafka/common/utils/Utils.java
##########
@@ -764,10 +764,7 @@ public static ByteBuffer ensureCapacity(ByteBuffer existingBuffer, int newLength
      */
     @SafeVarargs
     public static <T> Set<T> mkSet(T... elems) {
-        Set<T> result = new HashSet<>((int) (elems.length / 0.75) + 1);
-        for (T elem : elems)
-            result.add(elem);
-        return result;
+        return new HashSet<>(Arrays.asList(elems));

Review comment:
       This approach creates a temporary object, so it's not clear that it's better. It would need benchmarking to confirm.









