Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2022/02/13 18:20:31 UTC
[GitHub] [kafka] ruanwenjun opened a new pull request #11754: MINOR: Optimize collection method in Utils
ruanwenjun opened a new pull request #11754:
URL: https://github.com/apache/kafka/pull/11754
* It's better to set the initial capacity when creating a collection such as a set. A constructor like `HashSet(Collection<? extends E> c)` already computes a suitable initial capacity, so we don't need to calculate it ourselves.
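For illustration, here is a minimal sketch of the two ways of building the set that this change swaps between. The class and method names (`MkSetSketch`, `presized`, `viaConstructor`) are made up for this example and are not Kafka's; the `0.75` constant is `HashSet`'s documented default load factor. How exactly the `HashSet(Collection)` constructor sizes its backing map is a JDK implementation detail and may differ between versions, so the sketch only checks that both approaches produce equal sets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; names are illustrative and not part of Kafka.
class MkSetSketch {

    // Old style: presize for the default 0.75 load factor, then add one by one.
    @SafeVarargs
    static <T> Set<T> presized(T... elems) {
        Set<T> result = new HashSet<>((int) (elems.length / 0.75) + 1);
        for (T elem : elems)
            result.add(elem);
        return result;
    }

    // New style: let the HashSet(Collection) constructor choose the capacity.
    @SafeVarargs
    static <T> Set<T> viaConstructor(T... elems) {
        return new HashSet<>(Arrays.asList(elems));
    }

    public static void main(String[] args) {
        Set<String> a = presized("x", "y", "x");
        Set<String> b = viaConstructor("x", "y", "x");
        // Both sets contain the same two distinct elements.
        System.out.println(a.equals(b) + " " + b.size()); // prints "true 2"
    }
}
```

Both methods return the same contents; the difference under discussion is only readability versus one extra short-lived list object.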
### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [kafka] ruanwenjun commented on a change in pull request #11754: MINOR: Optimize collection method in Utils
Posted by GitBox <gi...@apache.org>.
ruanwenjun commented on a change in pull request #11754:
URL: https://github.com/apache/kafka/pull/11754#discussion_r810732547
##########
File path: clients/src/main/java/org/apache/kafka/common/utils/Utils.java
##########
@@ -764,10 +764,7 @@ public static ByteBuffer ensureCapacity(ByteBuffer existingBuffer, int newLength
*/
@SafeVarargs
public static <T> Set<T> mkSet(T... elems) {
- Set<T> result = new HashSet<>((int) (elems.length / 0.75) + 1);
- for (T elem : elems)
- result.add(elem);
- return result;
+ return new HashSet<>(Arrays.asList(elems));
Review comment:
@splett2 Yes, I agree with you: `mkSet` is only used in test code and for static variables. This PR is not motivated by performance; it is only meant to improve readability.
[GitHub] [kafka] splett2 commented on a change in pull request #11754: MINOR: Optimize collection method in Utils
Posted by GitBox <gi...@apache.org>.
splett2 commented on a change in pull request #11754:
URL: https://github.com/apache/kafka/pull/11754#discussion_r810211960
##########
File path: clients/src/main/java/org/apache/kafka/common/utils/Utils.java
##########
@@ -764,10 +764,7 @@ public static ByteBuffer ensureCapacity(ByteBuffer existingBuffer, int newLength
*/
@SafeVarargs
public static <T> Set<T> mkSet(T... elems) {
- Set<T> result = new HashSet<>((int) (elems.length / 0.75) + 1);
- for (T elem : elems)
- result.add(elem);
- return result;
+ return new HashSet<>(Arrays.asList(elems));
Review comment:
This microbenchmark doesn't seem to be written correctly.
You are creating a singleton list containing a primitive `int[]` array in `testCreateHashSet1`.
`inits` needs to be an `Integer` array rather than a primitive array to get the expected array/hashset behavior.
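The pitfall described above can be reproduced in a few lines. This is a small sketch with a hypothetical class name (`AsListPitfall`); it relies only on `Arrays.asList(T...)` inferring `T` as `int[]` when it is handed a primitive array, so the resulting list has exactly one entry.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class showing why the element type of the array matters.
class AsListPitfall {

    // With int[], T is inferred as int[], so the list holds the array itself.
    static int primitiveListSize(int[] values) {
        List<int[]> list = Arrays.asList(values);
        return list.size();
    }

    // With Integer[], T is Integer, so the list has one entry per element.
    static int boxedListSize(Integer[] values) {
        List<Integer> list = Arrays.asList(values);
        return list.size();
    }

    public static void main(String[] args) {
        int[] primitives = {1, 2, 3};
        Integer[] boxed = {1, 2, 3};

        System.out.println(primitiveListSize(primitives)); // prints 1
        System.out.println(boxedListSize(boxed));          // prints 3

        // The same difference carries over to the sets built from those lists.
        Set<int[]> oneElement = new HashSet<>(Arrays.asList(primitives));
        Set<Integer> threeElements = new HashSet<>(Arrays.asList(boxed));
        System.out.println(oneElement.size() + " vs " + threeElements.size()); // prints "1 vs 3"
    }
}
```

In the original benchmark this means `testCreateHashSet1` built a one-element set while `testCreateHashSet2` built a 100-element set, so the two scores were not comparable.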
[GitHub] [kafka] ruanwenjun commented on a change in pull request #11754: MINOR: Optimize collection method in Utils
Posted by GitBox <gi...@apache.org>.
ruanwenjun commented on a change in pull request #11754:
URL: https://github.com/apache/kafka/pull/11754#discussion_r810426220
##########
File path: clients/src/main/java/org/apache/kafka/common/utils/Utils.java
##########
@@ -764,10 +764,7 @@ public static ByteBuffer ensureCapacity(ByteBuffer existingBuffer, int newLength
*/
@SafeVarargs
public static <T> Set<T> mkSet(T... elems) {
- Set<T> result = new HashSet<>((int) (elems.length / 0.75) + 1);
- for (T elem : elems)
- result.add(elem);
- return result;
+ return new HashSet<>(Arrays.asList(elems));
Review comment:
@splett2 Thanks for your review. I changed the benchmark as you suggested:
```java
private Integer[] inits = new Integer[100];

@Setup(Level.Trial)
public void setUp() {
    for (int i = 0; i < inits.length; i++) {
        inits[i] = ThreadLocalRandom.current().nextInt();
    }
}

@Benchmark
public void testCreateHashSet1() {
    Set<Integer> ints = new HashSet<>(Arrays.asList(inits));
}

@Benchmark
public void testCreateHashSet2() {
    Set<Integer> result = new HashSet<>((int) (inits.length / 0.75) + 1);
    for (Integer elem : inits)
        result.add(elem);
}
```
The benchmark result is:
```
Benchmark                         Mode  Cnt  Score   Error  Units
CreateSetTest.testCreateHashSet1  avgt   20  0.002 ± 0.001  ms/op
CreateSetTest.testCreateHashSet2  avgt   20  0.001 ± 0.001  ms/op
```
After swapping the order of the two methods, the result is:
```
Benchmark                         Mode  Cnt  Score   Error  Units
CreateSetTest.testCreateHashSet2  avgt   20  0.002 ± 0.001  ms/op
CreateSetTest.testCreateHashSet1  avgt   20  0.001 ± 0.001  ms/op
```
[GitHub] [kafka] ruanwenjun commented on a change in pull request #11754: MINOR: Optimize collection method in Utils
Posted by GitBox <gi...@apache.org>.
ruanwenjun commented on a change in pull request #11754:
URL: https://github.com/apache/kafka/pull/11754#discussion_r805769215
##########
File path: clients/src/main/java/org/apache/kafka/common/utils/Utils.java
##########
@@ -764,10 +764,7 @@ public static ByteBuffer ensureCapacity(ByteBuffer existingBuffer, int newLength
*/
@SafeVarargs
public static <T> Set<T> mkSet(T... elems) {
- Set<T> result = new HashSet<>((int) (elems.length / 0.75) + 1);
- for (T elem : elems)
- result.add(elem);
- return result;
+ return new HashSet<>(Arrays.asList(elems));
Review comment:
Yes, I understand your concern. `Arrays.asList(elems)` only wraps the array: it keeps a reference to `elems` rather than copying the elements.
I also wrote a benchmark, shown below:
```java
@State(Scope.Benchmark)
@Fork(value = 1)
@Warmup(iterations = 3)
@Measurement(iterations = 10)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class CreateSetTest {

    private int[] inits = new int[100];

    @Setup(Level.Trial)
    public void setUp() {
        for (int i = 0; i < inits.length; i++) {
            inits[i] = ThreadLocalRandom.current().nextInt();
        }
    }

    @Benchmark
    public void testCreateHashSet1() {
        HashSet<int[]> ints = new HashSet<>(Arrays.asList(inits));
    }

    @Benchmark
    public void testCreateHashSet2() {
        Set<Integer> result = new HashSet<>((int) (inits.length / 0.75) + 1);
        for (Integer elem : inits)
            result.add(elem);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(CreateSetTest.class.getSimpleName())
                .forks(2)
                .build();
        new Runner(opt).run();
    }
}
```
The test result:
```
Benchmark                         Mode  Cnt  Score    Error  Units
CreateSetTest.testCreateHashSet1  avgt   20  ≈ 10⁻⁴          ms/op
CreateSetTest.testCreateHashSet2  avgt   20  0.001 ± 0.001   ms/op
```
It seems the change does not reduce performance.
[GitHub] [kafka] splett2 commented on a change in pull request #11754: MINOR: Optimize collection method in Utils
Posted by GitBox <gi...@apache.org>.
splett2 commented on a change in pull request #11754:
URL: https://github.com/apache/kafka/pull/11754#discussion_r810532447
##########
File path: clients/src/main/java/org/apache/kafka/common/utils/Utils.java
##########
@@ -764,10 +764,7 @@ public static ByteBuffer ensureCapacity(ByteBuffer existingBuffer, int newLength
*/
@SafeVarargs
public static <T> Set<T> mkSet(T... elems) {
- Set<T> result = new HashSet<>((int) (elems.length / 0.75) + 1);
- for (T elem : elems)
- result.add(elem);
- return result;
+ return new HashSet<>(Arrays.asList(elems));
Review comment:
It would be good to include the benchmark in the PR.
As a comment on the benchmark, I think ms/op is too coarse of a measurement for the code we're benchmarking.
Nanos/op is probably more appropriate.
I am also wondering whether this code is called in the critical path anywhere. It doesn't seem to be. For instance almost all of the calls to `mkSet` seem to be test code or static variables.
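To illustrate why ms/op is too coarse here, the sketch below formats two timings in both units. The timings (1200 ns and 1400 ns) and the class name `ScoreUnits` are invented for this example and are not measurements from this PR; the three-decimal formatting mirrors the score columns shown in this thread.

```java
import java.util.Locale;

// Hypothetical helper; the input timings used in main are made-up values.
class ScoreUnits {

    // Format a per-operation time in milliseconds with three decimals.
    static String asMillis(double nanosPerOp) {
        return String.format(Locale.ROOT, "%.3f ms/op", nanosPerOp / 1_000_000.0);
    }

    // Format the same per-operation time in nanoseconds with three decimals.
    static String asNanos(double nanosPerOp) {
        return String.format(Locale.ROOT, "%.3f ns/op", nanosPerOp);
    }

    public static void main(String[] args) {
        double first = 1200.0;  // invented timing in nanoseconds
        double second = 1400.0; // invented timing in nanoseconds

        // In milliseconds both values collapse to the same printed score.
        System.out.println(asMillis(first) + " / " + asMillis(second));
        // In nanoseconds the difference stays visible.
        System.out.println(asNanos(first) + " / " + asNanos(second));
    }
}
```

In the JMH benchmark itself, this corresponds to changing `@OutputTimeUnit(TimeUnit.MILLISECONDS)` to `@OutputTimeUnit(TimeUnit.NANOSECONDS)`.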
[GitHub] [kafka] ijuma commented on a change in pull request #11754: MINOR: Optimize collection method in Utils
Posted by GitBox <gi...@apache.org>.
ijuma commented on a change in pull request #11754:
URL: https://github.com/apache/kafka/pull/11754#discussion_r805423796
##########
File path: clients/src/main/java/org/apache/kafka/common/utils/Utils.java
##########
@@ -764,10 +764,7 @@ public static ByteBuffer ensureCapacity(ByteBuffer existingBuffer, int newLength
*/
@SafeVarargs
public static <T> Set<T> mkSet(T... elems) {
- Set<T> result = new HashSet<>((int) (elems.length / 0.75) + 1);
- for (T elem : elems)
- result.add(elem);
- return result;
+ return new HashSet<>(Arrays.asList(elems));
Review comment:
This approach creates a temporary object, so it's not clear that it's better. It would need benchmarking to confirm.
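To make the "temporary object" concrete: `Arrays.asList` returns a fixed-size list backed by the array it is given, so the extra allocation is one small wrapper rather than a copy of the elements. The sketch below (hypothetical class name `AsListView`) shows the write-through behavior that follows from this. Whether that wrapper allocation is measurable would still need a benchmark.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; it only exercises documented Arrays.asList behavior.
class AsListView {

    // Returns true if a write to the array is visible through the list view.
    // Assumes elems has at least one element.
    static boolean writesThrough(String[] elems, String replacement) {
        List<String> view = Arrays.asList(elems); // wraps elems, no element copy
        elems[0] = replacement;                   // modify the backing array
        return replacement.equals(view.get(0));   // the list sees the new value
    }

    public static void main(String[] args) {
        String[] elems = {"a", "b"};
        System.out.println(writesThrough(elems, "z")); // prints true
    }
}
```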