Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2019/02/08 15:35:00 UTC

[jira] [Closed] (ARROW-4437) [Python] Add builder API

     [ https://issues.apache.org/jira/browse/ARROW-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney closed ARROW-4437.
-------------------------------
    Resolution: Duplicate

Duplicate of ARROW-3917. Help would be appreciated.

> [Python] Add builder API
> ------------------------
>
>                 Key: ARROW-4437
>                 URL: https://issues.apache.org/jira/browse/ARROW-4437
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>         Environment: Python 3.7.0 pyarrow-0.12.0
>            Reporter: Zhuang Tianyi
>            Priority: Minor
>
> There is no [Array Builder|https://arrow.apache.org/docs/cpp/api/builder.html#_CPPv3N5arrow12ArrayBuilderE] API in the Python bindings. When I generate data from a stream, I have to build a Python list (high overhead) or a pandas object, then finalize it by calling pa.array, which copies the data. It seems that we could build an Array directly from a few (two or three) pa.ResizableBuffer objects in O(1) time.
> It is possible to maintain these buffers (value buffer, null bitmap, offset buffer) manually with the currently exported API, but that is not safe enough.
>  
> I found an undocumented StringBuilder API in [python/pyarrow/builder.pxi|https://github.com/apache/arrow/blob/master/python/pyarrow/builder.pxi], corresponding to [https://arrow.apache.org/docs/cpp/api/builder.html#classarrow_1_1_string_builder]. Will other ArrayBuilder APIs be added to the Python bindings?
>  
> ----
> One more thing:
> A BatchBuilder API would be even better, if possible.
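
The manual buffer bookkeeping described in the report can be sketched roughly as follows. This is a minimal illustration in pure Python, not pyarrow code: the Int64Builder class and its methods are hypothetical names invented for this example. It assumes the Arrow layout for a fixed-width nullable column, that is, a value buffer of little-endian 8-byte slots plus a validity bitmap in which bit i (least-significant bit first) is 1 when slot i is non-null.

```python
import struct


class Int64Builder:
    """Hypothetical builder for illustration only; not part of pyarrow."""

    def __init__(self):
        self._values = bytearray()    # 8 bytes per slot, little-endian int64
        self._validity = bytearray()  # 1 bit per slot, LSB first; 1 = valid
        self._length = 0
        self._null_count = 0

    def _push_bit(self, valid):
        # Grow the bitmap by one byte whenever a new group of 8 slots starts.
        byte_index, bit_index = divmod(self._length, 8)
        if byte_index == len(self._validity):
            self._validity.append(0)
        if valid:
            self._validity[byte_index] |= 1 << bit_index
        self._length += 1

    def append(self, value):
        self._values += struct.pack("<q", value)
        self._push_bit(True)

    def append_null(self):
        # A null still occupies a slot in the value buffer.
        self._values += b"\x00" * 8
        self._push_bit(False)
        self._null_count += 1

    def finish(self):
        # Returns (length, null_count, validity bitmap, value buffer).
        return (self._length, self._null_count,
                bytes(self._validity), bytes(self._values))


# Simulate a stream that yields values one at a time.
builder = Int64Builder()
for item in (1, None, 3):
    if item is None:
        builder.append_null()
    else:
        builder.append(item)

length, null_count, validity, values = builder.finish()
```

With pyarrow installed, the two resulting byte strings could presumably be wrapped with pa.py_buffer and handed to pa.Array.from_buffers(pa.int64(), length, [validity_buffer, value_buffer]). Keeping the length, null count, and bitmap consistent by hand is exactly the part the report calls "not safe enough", which is the argument for a real builder API.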



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)