Posted to issues@orc.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2021/08/27 03:53:00 UTC

[jira] [Updated] (ORC-848) Recycle Internal Buffer in StringHashTableDictionary

     [ https://issues.apache.org/jira/browse/ORC-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated ORC-848:
------------------------------
    Affects Version/s: 1.7.0

> Recycle Internal Buffer in StringHashTableDictionary
> ----------------------------------------------------
>
>                 Key: ORC-848
>                 URL: https://issues.apache.org/jira/browse/ORC-848
>             Project: ORC
>          Issue Type: Improvement
>    Affects Versions: 1.7.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>             Fix For: 1.7.0
>
>
> {code:java|title=StringHashTableDictionary.java}
>   private void initHashBuckets(int capacity) {
>     DynamicIntArray[] buckets = new DynamicIntArray[capacity];
>     for (int i = 0; i < capacity; i++) {
>       // We don't need large bucket: If we have more than a handful of collisions,
>       // then the table is too small or the function isn't good.
>       buckets[i] = createBucket();
>     }
>     hashBuckets = buckets;
>   }
> {code}
> This code showed up in a JMH run of the perf test.  The {{Dictionary}} is regularly cleared and reset to its default state.  I'm sure most of the time is spent creating the {{capacity}} buckets (buffers), but we can save one buffer initialization by creating a new {{buckets}} array only when the requested capacity differs from the length of the existing array (which is not the case with a {{clear()}}).
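
The proposal above could be sketched roughly as follows. This is an illustration only, not the patch attached to ORC-848: {{IntBucket}} is a hypothetical stand-in for {{DynamicIntArray}}, and {{BucketRecycleSketch}}, {{put}}, and the {{arrayAllocations}} counter are names invented here to make the saved allocation visible. The per-slot buckets are still recreated on every call; only the outer array is recycled.

```java
import java.util.Arrays;

// Sketch only: all names here are hypothetical, not ORC's actual classes.
class BucketRecycleSketch {

  // Hypothetical stand-in for ORC's DynamicIntArray (just enough for the sketch).
  static final class IntBucket {
    private int[] data = new int[4];
    private int size = 0;

    void add(int value) {
      if (size == data.length) {
        data = Arrays.copyOf(data, size * 2);
      }
      data[size++] = value;
    }

    int size() {
      return size;
    }
  }

  private IntBucket[] hashBuckets;
  private final int defaultCapacity;

  // Instrumentation for the sketch: counts allocations of the outer array.
  int arrayAllocations = 0;

  BucketRecycleSketch(int capacity) {
    this.defaultCapacity = capacity;
    initHashBuckets(capacity);
  }

  void initHashBuckets(int capacity) {
    // Allocate a new outer array only when the requested capacity differs
    // from what we already hold; otherwise recycle the existing array.
    if (hashBuckets == null || hashBuckets.length != capacity) {
      hashBuckets = new IntBucket[capacity];
      arrayAllocations++;
    }
    // The individual buckets are still recreated, as in the original code.
    for (int i = 0; i < capacity; i++) {
      hashBuckets[i] = new IntBucket();
    }
  }

  void put(int slot, int value) {
    hashBuckets[slot].add(value);
  }

  int bucketSize(int slot) {
    return hashBuckets[slot].size();
  }

  // Reset to the default state; the capacity is unchanged, so the outer
  // array is reused.
  void clear() {
    initHashBuckets(defaultCapacity);
  }

  public static void main(String[] args) {
    BucketRecycleSketch dict = new BucketRecycleSketch(8);
    dict.put(3, 42);
    dict.clear();
    System.out.println(dict.arrayAllocations); // prints 1
    dict.initHashBuckets(16);
    System.out.println(dict.arrayAllocations); // prints 2
  }
}
```

With this shape, repeated {{clear()}} calls leave the allocation count at one, and only a genuine capacity change (for example a resize) allocates a new outer array.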



--
This message was sent by Atlassian Jira
(v8.3.4#803005)