You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "neoremind (Jira)" <ji...@apache.org> on 2020/03/27 04:20:00 UTC

[jira] [Comment Edited] (CALCITE-3878) Make ArrayList creation with initial capacity when size is fixed

    [ https://issues.apache.org/jira/browse/CALCITE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068252#comment-17068252 ] 

neoremind edited comment on CALCITE-3878 at 3/27/20, 4:19 AM:
--------------------------------------------------------------

[~julianhyde] Thanks for looking into the issue. I agree Java intrinsic is not related in this case. Java intrinsic mainly works by JIT to make code run faster, like some native JNI interfaces runs faster by inline technique. 

I agree extra bytes waste is not a big deal, but I think resizing is a performance killer. For example,
{code:java}
List<String> list = new ArrayList<>();
for (String s: strs) {
  list.add(s);
}{code}
If strs size is less than 10, then it does no harm. If strs size is more than 10, or even worse, like 256, then the _list_ will grow and resize (involves new array instance creating and copying from old) many times. I test under Java 1.8.0_161-b12, the resized capacity is

_^int newCapacity = oldCapacity + (oldCapacity >> 1);^_

the size will grow like that 15,22,33,49,73,109,163,244 to 366, and there are 9 times array copying.

If we already know the size of strs, we can eliminate resizing and copying, good for performance and GC.

It is good to use immutable data structure in guava, but the original list creation process can be more efficient.

Moreover, in most cases, the list contains object reference which is small, if take CPU caching like cache line, the array copying is also not very friendly.

Overall, the performance improvement regarding to this update could be very small, but I think it is a good way, at least better for performance, even it is small.

 

 


was (Author: neoremind):
[~julianhyde] Thanks for looking into the issue. I agree not in this case Java intrinsic is not related. Java intrinsic mainly works by JIT to make code run faster, like some native JNI interfaces runs faster by inline technique. 

I agree extra bytes waste is not a big deal, but I think resizing is a performance killer. For example,
{code:java}
List<String> list = new ArrayList<>();
for (String s: strs) {
  list.add(s);
}{code}
If strs size is less than 10, then it does no harm. If strs size is more than 10, or even worse, like 256, then the _list_ will grow and resize (involves new array instance creating and copying from old) many times. I test under Java 1.8.0_161-b12, the resized capacity is

_^int newCapacity = oldCapacity + (oldCapacity >> 1);^_

the size will grow like that 15,22,33,49,73,109,163,244 to 366, and there are 9 times array copying.

If we already know the size of strs, we can eliminate resizing and copying, good for performance and GC.

It is good to use immutable data structure in guava, but the original list creation process can be more efficient.

Moreover, in most cases, the list contains object reference which is small, if take CPU caching like cache line, the array copying is also not very friendly.

Overall, the performance improvement regarding to this update could be very small, but I think it is a good way, at least better for performance, even it is small.

 

 

> Make ArrayList creation with initial capacity when size is fixed
> ----------------------------------------------------------------
>
>                 Key: CALCITE-3878
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3878
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.22.0
>            Reporter: neoremind
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> I find many places in Calcite where _new ArrayList<>()_ is used, if the list is expected to be immutable or not resizing, it is always a good manner to create with initial capacity, better for memory usage and performance.
> I search all occurrences, focus on the core module, to make it safe, I only update local variables with fixed size and not working in recursive method. If the local variable reference goes out of scope, if resizing is needed, things will work normally as well, so no side effect, but for the "escaping" case, I am very conservative and do not change them.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)