Posted to issues@arrow.apache.org by "Zhuang Tianyi (JIRA)" <ji...@apache.org> on 2019/01/31 12:39:00 UTC
[jira] [Created] (ARROW-4437) [Python] Add builder API
Zhuang Tianyi created ARROW-4437:
------------------------------------
Summary: [Python] Add builder API
Key: ARROW-4437
URL: https://issues.apache.org/jira/browse/ARROW-4437
Project: Apache Arrow
Issue Type: New Feature
Components: Python
Environment: Python 3.7.0 pyarrow-0.12.0
Reporter: Zhuang Tianyi
There is no [Array Builder|https://arrow.apache.org/docs/cpp/api/builder.html#_CPPv3N5arrow12ArrayBuilderE] API in the Python bindings. When I generate data from a stream, I have to build a Python list (high overhead) or a pandas object first, then finalize it by calling pa.array, which copies the data. It seems that we could instead build an Array directly from a few (two or three) pa.ResizableBuffer objects in O(1) time.
It is possible to maintain these buffers (value buffer, null bitmap, offset buffer) manually with the currently exported API, but doing so is not safe enough.
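To illustrate what maintaining the buffers by hand involves, here is a minimal sketch that uses only the Python standard library. The StringBufferBuilder class and its methods are hypothetical names invented for this example, not part of pyarrow. It assumes the Arrow variable-length string layout (validity bitmap with least-significant-bit ordering, int32 offsets, concatenated UTF-8 data) and a little-endian platform.

```python
import struct


class StringBufferBuilder:
    """Hypothetical helper: accumulates the three buffers of a string array."""

    def __init__(self):
        self.length = 0
        self.null_count = 0
        self.validity = bytearray()                      # null bitmap, 1 bit per slot, LSB first
        self.offsets = bytearray(struct.pack("<i", 0))   # int32 offsets, length + 1 entries
        self.data = bytearray()                          # concatenated UTF-8 bytes

    def append(self, value):
        """Append a str, or None for a null slot."""
        if self.length % 8 == 0:
            self.validity.append(0)                      # start a new bitmap byte
        if value is None:
            self.null_count += 1                         # bit stays 0 for a null
        else:
            self.validity[self.length // 8] |= 1 << (self.length % 8)
            self.data += value.encode("utf-8")
        # Every slot, null or not, records the end offset of its data.
        self.offsets += struct.pack("<i", len(self.data))
        self.length += 1

    def finish(self):
        """Return (length, null_count, [validity, offsets, data])."""
        buffers = [bytes(self.validity), bytes(self.offsets), bytes(self.data)]
        return self.length, self.null_count, buffers


builder = StringBufferBuilder()
for item in ["foo", None, "quux"]:
    builder.append(item)
length, null_count, buffers = builder.finish()
```

If pyarrow is available, the three buffers could presumably be wrapped with pa.py_buffer and handed to pa.Array.from_buffers(pa.string(), length, buffers, null_count); that step is not exercised here. Nothing stops a caller from passing a mismatched length, bitmap, or offset buffer, which is the safety concern a real builder API would address.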
I found an undocumented StringBuilder API in [python/pyarrow/builder.pxi|https://github.com/apache/arrow/blob/master/python/pyarrow/builder.pxi], corresponding to [https://arrow.apache.org/docs/cpp/api/builder.html#classarrow_1_1_string_builder]. Will the other ArrayBuilder APIs be added to the Python bindings?
----
One more thing:
A BatchBuilder API would be even better, if possible.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)