Posted to issues@arrow.apache.org by "Zhuang Tianyi (JIRA)" <ji...@apache.org> on 2019/01/31 12:39:00 UTC

[jira] [Created] (ARROW-4437) [Python] Add builder API

Zhuang Tianyi created ARROW-4437:
------------------------------------

             Summary: [Python] Add builder API
                 Key: ARROW-4437
                 URL: https://issues.apache.org/jira/browse/ARROW-4437
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
         Environment: Python 3.7.0 pyarrow-0.12.0
            Reporter: Zhuang Tianyi


The Python bindings have no [Array Builder|https://arrow.apache.org/docs/cpp/api/builder.html#_CPPv3N5arrow12ArrayBuilderE] API. When I generate data from a stream, I have to build a Python list (high overhead) or a pandas object first, then finalize it by calling pa.array, which copies the data. It seems that we could instead build an Array directly from a few (two or three) pa.ResizableBuffer objects in O(1) time.

It is possible to maintain these buffers (value buffer, null bitmap, offset buffer) manually with the currently exported API, but that is not safe enough.

 

I found an undocumented StringBuilder API in [python/pyarrow/builder.pxi|https://github.com/apache/arrow/blob/master/python/pyarrow/builder.pxi], corresponding to [https://arrow.apache.org/docs/cpp/api/builder.html#classarrow_1_1_string_builder]. Will other ArrayBuilder APIs be added to the Python bindings?

 
----
Something more

A BatchBuilder API would be even better, if possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)