Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2017/04/03 22:26:42 UTC

[jira] [Commented] (ARROW-602) C++: Provide iterator access to primitive elements inside a Column/ChunkedArray

    [ https://issues.apache.org/jira/browse/ARROW-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954254#comment-15954254 ] 

Wes McKinney commented on ARROW-602:
------------------------------------

hi [~JohanMabille], thank you very much for writing this spec document. 

I think an STL-compatible interface to Arrow data structures would be really useful. As for the data structures defined in https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h, my feeling is that they should remain "plain old data" with as few features as necessary beyond access to their metadata and data buffers. There are a couple of convenience methods on {{arrow::Array}} and its subclasses for equality, slicing, and simple value access, but beyond that I am not sure we should add very much to these classes (I'd be more in favor of making {{array.h}} smaller than making it bigger). 

What I'm envisioning is something like:

{code}
std::shared_ptr<Array> my_data = ...;

arrow::ArrayAccessor<Int64Type> container(*my_data);
{code}

From here, {{container}} would unbox the memory in {{my_data}} and implement the interfaces which you've described in your document. We'll have to make decisions about the return value for {{operator[]}}, like perhaps it will return {{std::optional<int64_t>}} for this example, but the return type for nested types may be more complicated. 
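
To make the idea concrete, here is a rough sketch of a read-only accessor whose {{operator[]}} returns {{std::optional<int64_t>}} and which can be used in a range-based for loop. It does not use the real Arrow classes: {{FakeInt64Array}}, {{Int64Accessor}}, and {{SumValid}} are hypothetical names invented for illustration, and the one-flag-per-slot validity vector is a simplification of Arrow's packed validity bitmap. It assumes a C++17 compiler for {{std::optional}}.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for an Arrow int64 array: a values buffer plus one
// validity flag per slot (real Arrow arrays use a packed validity bitmap).
struct FakeInt64Array {
  std::vector<int64_t> values;
  std::vector<bool> valid;
};

// Hypothetical read-only accessor; not an existing Arrow class.
class Int64Accessor {
 public:
  using value_type = std::optional<int64_t>;

  explicit Int64Accessor(const FakeInt64Array& arr) : arr_(arr) {}

  std::size_t size() const { return arr_.values.size(); }

  // Null slots come back as std::nullopt, valid slots as the stored value.
  value_type operator[](std::size_t i) const {
    if (!arr_.valid[i]) return std::nullopt;
    return arr_.values[i];
  }

  // Minimal forward iterator, just enough for a range-based for loop.
  class const_iterator {
   public:
    const_iterator(const Int64Accessor* owner, std::size_t pos)
        : owner_(owner), pos_(pos) {}
    value_type operator*() const { return (*owner_)[pos_]; }
    const_iterator& operator++() {
      ++pos_;
      return *this;
    }
    bool operator!=(const const_iterator& other) const {
      return pos_ != other.pos_;
    }

   private:
    const Int64Accessor* owner_;
    std::size_t pos_;
  };

  const_iterator begin() const { return const_iterator(this, 0); }
  const_iterator end() const { return const_iterator(this, size()); }

 private:
  const FakeInt64Array& arr_;
};

// Example use: sum the non-null slots, skipping nulls.
inline int64_t SumValid(const Int64Accessor& acc) {
  int64_t total = 0;
  for (auto v : acc) {
    if (v) total += *v;
  }
  return total;
}
```

Returning the optional by value keeps the array classes themselves untouched; whether that is the right return type for a real implementation, especially for nested types, is still an open question.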

While Arrow memory is intended to be immutable for most applications, if the buffers in an array are mutable (e.g. {{my_data->data()->is_mutable()}} is true) then this container could permit mutation, subject to const-ness. 
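
One possible shape for this, again as a sketch with stand-in types rather than the real Arrow classes: reads are {{const}} member functions, writes are non-const and additionally check a mutability flag at run time. {{FakeBuffer}}, {{MutableInt64Accessor}}, {{Get}}, and {{Set}} are hypothetical names, and throwing {{std::logic_error}} is only one of several ways a real design might report an immutable buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a data buffer with a mutability flag.
struct FakeBuffer {
  std::vector<int64_t> values;
  bool is_mutable;
};

// Hypothetical accessor; not an existing Arrow class.
class MutableInt64Accessor {
 public:
  explicit MutableInt64Accessor(FakeBuffer& buf) : buf_(buf) {}

  // Reading is allowed through const and non-const accessors alike.
  int64_t Get(std::size_t i) const { return buf_.values[i]; }

  // Writing needs a non-const accessor AND a mutable underlying buffer.
  void Set(std::size_t i, int64_t v) {
    if (!buf_.is_mutable) {
      throw std::logic_error("buffer is not mutable");
    }
    buf_.values[i] = v;
  }

 private:
  FakeBuffer& buf_;
};
```

With this layout, calling {{Set}} through a {{const MutableInt64Accessor&}} fails to compile, while calling it on an accessor over an immutable buffer fails at run time.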

Does this make sense? 

> C++: Provide iterator access to primitive elements inside a Column/ChunkedArray
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-602
>                 URL: https://issues.apache.org/jira/browse/ARROW-602
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Uwe L. Korn
>              Labels: beginner, newbie
>
> Given a ChunkedArray, an Arrow user must currently iterate over all its chunks and then cast them to their types to extract the primitive memory regions to access the values. A convenient way to access the underlying values would be to offer a function that takes a ChunkedArray and returns a C++ iterator over all elements.
> While this may not be the most performant way to access the underlying data, it should have sufficient performance and adds a convenience layer for new users.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)