Posted to jira@arrow.apache.org by "Jin Shang (Jira)" <ji...@apache.org> on 2022/09/24 17:49:00 UTC

[jira] [Updated] (ARROW-17824) [C++][Gandiva] Implement preallocation for variable length output buffer

     [ https://issues.apache.org/jira/browse/ARROW-17824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jin Shang updated ARROW-17824:
------------------------------
    Description: 
When the output type of an expression is variable-length (e.g. string), Gandiva reallocates the output buffer for every row to make space for that row's output. When the number of rows is high, some memory allocators perform poorly under this pattern.

We can use a std::vector-like approach to amortize the allocation cost: first allocate some initial space based on the input size, then double the buffer size each time we run out of space, and finally shrink the buffer to fit the actual size.

Arrow string builder also uses this approach.
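
The growth strategy described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not Gandiva's or Arrow's actual code: the class name GrowableOutputBuffer, its methods, and the realloc counter are all hypothetical, and plain malloc/realloc stands in for whatever allocator the real implementation would use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical illustration only; not part of Gandiva or Arrow.
class GrowableOutputBuffer {
 public:
  // initial_capacity would be estimated from the input size by the caller.
  explicit GrowableOutputBuffer(std::size_t initial_capacity)
      : capacity_(initial_capacity > 0 ? initial_capacity : 1) {
    data_ = static_cast<std::uint8_t*>(std::malloc(capacity_));
    if (data_ == nullptr) throw std::bad_alloc();
  }
  ~GrowableOutputBuffer() { std::free(data_); }
  GrowableOutputBuffer(const GrowableOutputBuffer&) = delete;
  GrowableOutputBuffer& operator=(const GrowableOutputBuffer&) = delete;

  // Appends one row's output, doubling the capacity when space runs out.
  void Append(const void* bytes, std::size_t len) {
    if (len == 0) return;
    assert(bytes != nullptr);
    if (size_ + len > capacity_) {
      std::size_t new_capacity = capacity_;
      while (new_capacity < size_ + len) new_capacity *= 2;
      Resize(new_capacity);
    }
    std::memcpy(data_ + size_, bytes, len);
    size_ += len;
  }

  // Releases unused space once all rows have been written.
  void ShrinkToFit() {
    if (size_ > 0 && size_ < capacity_) Resize(size_);
  }

  const std::uint8_t* data() const { return data_; }
  std::size_t size() const { return size_; }
  std::size_t capacity() const { return capacity_; }
  // Number of realloc calls so far; kept only to make the cost visible.
  std::size_t realloc_count() const { return realloc_count_; }

 private:
  void Resize(std::size_t new_capacity) {
    void* p = std::realloc(data_, new_capacity);
    if (p == nullptr) throw std::bad_alloc();
    data_ = static_cast<std::uint8_t*>(p);
    capacity_ = new_capacity;
    ++realloc_count_;
  }

  std::uint8_t* data_ = nullptr;
  std::size_t size_ = 0;
  std::size_t capacity_ = 0;
  std::size_t realloc_count_ = 0;
};
```

With doubling, the number of realloc calls grows with the logarithm of the final size divided by the initial capacity, instead of with the number of rows, at the price of temporarily holding up to roughly twice the needed memory until the final shrink.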

  was:
When the output type of an expression is variable-length (e.g. string), Gandiva reallocates the output buffer for every row to make space for that row's output. When the number of rows is high, some memory allocators perform poorly under this pattern.

We can use a std::vector-like approach to amortize the allocation cost: first allocate some initial space based on the input size, then double the buffer size each time we run out of space, and finally shrink the buffer to fit the actual size.


> [C++][Gandiva] Implement preallocation for variable length output buffer
> ------------------------------------------------------------------------
>
>                 Key: ARROW-17824
>                 URL: https://issues.apache.org/jira/browse/ARROW-17824
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++ - Gandiva
>    Affects Versions: 9.0.0
>            Reporter: Jin Shang
>            Assignee: Jin Shang
>            Priority: Major
>
> When the output type of an expression is variable-length (e.g. string), Gandiva reallocates the output buffer for every row to make space for that row's output. When the number of rows is high, some memory allocators perform poorly under this pattern.
> We can use a std::vector-like approach to amortize the allocation cost: first allocate some initial space based on the input size, then double the buffer size each time we run out of space, and finally shrink the buffer to fit the actual size.
> Arrow string builder also uses this approach.


