Posted to jira@arrow.apache.org by "Jeroen van Straten (Jira)" <ji...@apache.org> on 2022/01/19 14:40:00 UTC

[jira] [Comment Edited] (ARROW-7051) [C++] Improve MakeArrayOfNull to support creation of multiple arrays

    [ https://issues.apache.org/jira/browse/ARROW-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477904#comment-17477904 ] 

Jeroen van Straten edited comment on ARROW-7051 at 1/19/22, 2:39 PM:
---------------------------------------------------------------------

As it turns out, at least some of the functions that use {{MakeArrayOfNull()}} mutate the resulting array. It should go without saying that this is a good way to create garbage when different kinds of buffers share the same memory. The evidence is that tests now fail, because the new implementation happens to return {{Buffers}} that are actually marked as immutable. The header did not document that the result of {{MakeArrayOfNull()}} is immutable, so this is not really surprising. The same is probably true for {{MakeArrayFromScalar()}}, which calls {{MakeArrayOfNull()}} in some special cases.

I've now made separate versions of {{MakeArrayOfNull()}} and {{MakeArrayFromScalar()}} for the mutable and immutable use cases, and am slowly working out which invocations need the array to be mutable and which don't. I'm also replacing {{MakeArrayOfNull(type, /*length=*/0)}} invocations with {{MakeEmptyArray(type)}}, which seems more suitable in those cases. I'm getting rather worried about changing semantics all over Arrow without fully understanding Arrow first, though...
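
To illustrate the hazard and the mutable/immutable split described above, here is a minimal sketch using a toy model. It deliberately does not use the Arrow API: {{ToyBuffer}}, {{ToyArray}}, {{MakeToyNullArrayShared}}, {{MakeToyNullArrayMutable}} and {{TrySetValue}} are hypothetical names invented for this example, and the real signatures in the patch may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Toy stand-in for a buffer (hypothetical; this is NOT arrow::Buffer).
struct ToyBuffer {
  std::vector<uint8_t> bytes;
  bool is_mutable;
};

// Toy stand-in for an array: a validity bitmap plus a values buffer.
struct ToyArray {
  std::shared_ptr<ToyBuffer> validity;
  std::shared_ptr<ToyBuffer> values;
  int64_t length;
};

// "Immutable" variant: both slots alias one zero-filled, read-only buffer.
ToyArray MakeToyNullArrayShared(int64_t length) {
  auto zeros = std::make_shared<ToyBuffer>();
  zeros->bytes.assign(static_cast<std::size_t>(length), 0);
  zeros->is_mutable = false;
  return ToyArray{zeros, zeros, length};
}

// "Mutable" variant: every slot gets its own writable allocation.
ToyArray MakeToyNullArrayMutable(int64_t length) {
  auto validity = std::make_shared<ToyBuffer>();
  validity->bytes.assign(static_cast<std::size_t>((length + 7) / 8), 0);
  validity->is_mutable = true;
  auto values = std::make_shared<ToyBuffer>();
  values->bytes.assign(static_cast<std::size_t>(length), 0);
  values->is_mutable = true;
  return ToyArray{validity, values, length};
}

// Writes one value byte only if the buffer is flagged mutable.
// Returns true if the write happened, false if it was refused.
bool TrySetValue(ToyArray& array, int64_t i, uint8_t value) {
  if (!array.values->is_mutable) return false;
  array.values->bytes[static_cast<std::size_t>(i)] = value;
  return true;
}
```

In this toy model, a write through the shared variant would also change the validity bitmap, because both slots point at the same bytes. Refusing the write is what makes that kind of caller visible.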


> [C++] Improve MakeArrayOfNull to support creation of multiple arrays
> --------------------------------------------------------------------
>
>                 Key: ARROW-7051
>                 URL: https://issues.apache.org/jira/browse/ARROW-7051
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 0.14.0
>            Reporter: Ben Kietzman
>            Assignee: Jeroen van Straten
>            Priority: Minor
>              Labels: beginner, good-first-issue, pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> MakeArrayOfNull reuses a single zero-filled buffer for all buffers in the array it creates. It could be extended to reuse that same buffer for all buffers in multiple arrays. This optimization would make RecordBatchProjector and ConcatenateTablesWithPromotion more memory-efficient.
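
The proposed optimization can be sketched with a toy model: allocate one zero-filled buffer large enough for the biggest request, then let every buffer slot of every array point at it. This is an illustration under stated assumptions, not Arrow code; {{ToySpec}}, {{ToyNullArray}} and {{MakeToyNullArrays}} are hypothetical names, and the sizing rule here (one validity bitmap plus fixed-width values) is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Read-only handle to shared zero bytes (hypothetical; not arrow::Buffer).
using ZeroBuffer = std::shared_ptr<const std::vector<uint8_t>>;

// What the caller asks for: element count and bytes per element.
struct ToySpec {
  int64_t length;
  int64_t byte_width;
};

// Toy all-null array whose buffer slots alias the shared zeros.
struct ToyNullArray {
  int64_t length;
  int64_t byte_width;
  ZeroBuffer validity;
  ZeroBuffer values;
};

// Creates several all-null arrays backed by a single allocation.
std::vector<ToyNullArray> MakeToyNullArrays(const std::vector<ToySpec>& specs) {
  // Size the shared buffer for the largest bitmap or values buffer needed.
  int64_t max_bytes = 0;
  for (const ToySpec& spec : specs) {
    const int64_t bitmap_bytes = (spec.length + 7) / 8;
    const int64_t value_bytes = spec.length * spec.byte_width;
    max_bytes = std::max({max_bytes, bitmap_bytes, value_bytes});
  }
  ZeroBuffer zeros = std::make_shared<std::vector<uint8_t>>(
      static_cast<std::size_t>(max_bytes), uint8_t{0});

  std::vector<ToyNullArray> arrays;
  arrays.reserve(specs.size());
  for (const ToySpec& spec : specs) {
    arrays.push_back(ToyNullArray{spec.length, spec.byte_width, zeros, zeros});
  }
  return arrays;
}
```

Because the handle is to a const vector, nothing in this sketch can write through it. That matches the constraint raised in the comment above: sharing is only safe when callers treat the result as immutable.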



--
This message was sent by Atlassian Jira
(v8.20.1#820001)