Posted to issues@orc.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2021/07/19 13:49:00 UTC

[jira] [Commented] (ORC-830) Do Not Copy String When Adding to StringHashTableDictionary

    [ https://issues.apache.org/jira/browse/ORC-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383350#comment-17383350 ] 

David Mollitor commented on ORC-830:
------------------------------------

For the {{main}} branch, you can see that {{StringTreeWriter#writeBatch()}} consumes 2.5% of the cycles, much of which is spent in {{getText()}}:

!Capture_StringHashTableAdd_Main.PNG!

In the {{ORC-830}} branch, you can see that {{StringTreeWriter#writeBatch()}} consumes 1.6% of the cycles, and the call to {{getText()}} no longer registers at all:

!Capture_StringHashTableAdd_ORC830.PNG!

> Do Not Copy String When Adding to StringHashTableDictionary
> -----------------------------------------------------------
>
>                 Key: ORC-830
>                 URL: https://issues.apache.org/jira/browse/ORC-830
>             Project: ORC
>          Issue Type: Improvement
>          Components: Java
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>         Attachments: Capture_StringHashTableAdd_Main.PNG, Capture_StringHashTableAdd_ORC830.PNG
>
>
> {code:java|title=StringHashTableDictionary.java}
>     Text tmpText = new Text();
>     for (int i = 0; i < candidateArray.size(); i++) {
>       getText(tmpText, candidateArray.get(i));
>       if (tmpText.equals(newKey)) {
>         return candidateArray.get(i);
>       }
>     }
> {code}
> When there is a collision while adding a value to a {{StringHashTableDictionary}}, a temporary {{Text}} object is created and each candidate value in the byte array is copied into it until a match is found (or, in the worst case, no match is found and every candidate is copied).
> Instead of loading (copying) the values, just compare directly against the byte array without copying the data into an intermediate (temp) buffer.
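
The direct comparison proposed above could look roughly like the following. This is a minimal sketch, not the actual ORC patch: {{DirectCompareSketch}}, {{findMatch}}, the flat {{store}} array, and the {{offsets}} table are all hypothetical stand-ins for the dictionary's internal storage. The key is passed as a byte array plus a length because a Hadoop {{Text}} exposes a backing array that is only valid up to {{getLength()}}.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the dictionary's backing storage; not the ORC classes.
final class DirectCompareSketch {

  // Returns the first candidate whose stored bytes equal the key, or -1 if none match.
  // Entry i occupies store[offsets[i]] up to (but excluding) store[offsets[i + 1]].
  static int findMatch(byte[] store, int[] offsets, int[] candidates,
                       byte[] key, int keyLength) {
    for (int candidate : candidates) {
      int start = offsets[candidate];
      int end = offsets[candidate + 1];
      // Range-based Arrays.equals (Java 9+) compares in place, so no
      // temporary buffer is filled for each candidate.
      if (Arrays.equals(store, start, end, key, 0, keyLength)) {
        return candidate;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    byte[] store = "applebananacherry".getBytes(StandardCharsets.UTF_8);
    int[] offsets = {0, 5, 11, 17};  // apple | banana | cherry
    int[] bucket = {0, 2, 1};        // entries assumed to share one hash bucket
    byte[] key = "banana".getBytes(StandardCharsets.UTF_8);
    System.out.println(findMatch(store, offsets, bucket, key, key.length));
  }
}
```

The range comparison returns false as soon as the lengths differ, so most non-matching candidates are likely rejected without touching their bytes.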


