Posted to issues@arrow.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/22 10:55:00 UTC

[jira] [Commented] (ARROW-1964) [Python] Expose Builder classes

    [ https://issues.apache.org/jira/browse/ARROW-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16447201#comment-16447201 ] 

ASF GitHub Bot commented on ARROW-1964:
---------------------------------------

dsimmie opened a new pull request #1930: ARROW-1964: [Python] Exposed StringBuilder to Cython
URL: https://github.com/apache/arrow/pull/1930
 
 
   * Partial implementation of ARROW-1964
   * Only implements StringBuilder, not the other builders (notably the DictionaryBuilder mentioned in ARROW-1964)



> [Python] Expose Builder classes
> -------------------------------
>
>                 Key: ARROW-1964
>                 URL: https://issues.apache.org/jira/browse/ARROW-1964
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Uwe L. Korn
>            Priority: Major
>              Labels: beginner, pull-request-available
>             Fix For: 1.0.0
>
>
> Having the builder classes available from Python would be very helpful. Currently, constructing an Arrow array always requires a Python list or NumPy array as an intermediate. Because the builders, in combination with jemalloc, are very efficient at building up non-chunked memory, it would be nice to use them directly in certain cases.
> The most useful builders are the [StringBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L714] and [DictionaryBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L872] as they provide functionality to create columns that are not easily constructed using NumPy methods in Python.
> The basic approach would be to wrap the C++ classes in https://github.com/apache/arrow/blob/master/python/pyarrow/includes/libarrow.pxd so that they can be used from Cython. Afterwards, we should start a new file {{python/pyarrow/builder.pxi}} in which classes take typical Python objects like {{str}} and pass them on to the C++ classes. Finally, these classes should return (Python-accessible) {{pyarrow.Array}} instances.
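To make concrete what the two builders named above accumulate, here is a minimal pure-Python sketch. It does not use pyarrow or the Cython bindings discussed in this issue; the class names StringBuilderSketch and DictionaryBuilderSketch are hypothetical, and the layout is simplified (one Python bool per slot instead of Arrow's bit-packed validity bitmap, Python lists instead of contiguous buffers). It models a string column as an offsets array plus a single UTF-8 data buffer, and a dictionary-encoded column as integer indices into a list of unique values.

```python
from typing import Dict, List, Optional, Tuple


class StringBuilderSketch:
    """Hypothetical model of a string builder's buffers (not the pyarrow API).

    Simplification: validity is one bool per slot, not a packed bitmap.
    """

    def __init__(self) -> None:
        self.validity: List[bool] = []
        self.offsets: List[int] = [0]  # offsets[i]..offsets[i+1] spans slot i
        self.data = bytearray()        # all string bytes, back to back

    def append(self, value: Optional[str]) -> None:
        if value is None:
            # A null slot repeats the previous offset and adds no data bytes.
            self.validity.append(False)
        else:
            self.validity.append(True)
            self.data.extend(value.encode("utf-8"))
        self.offsets.append(len(self.data))

    def finish(self) -> Tuple[List[bool], List[int], bytes]:
        return self.validity, self.offsets, bytes(self.data)


class DictionaryBuilderSketch:
    """Hypothetical model of dictionary encoding (not the pyarrow API)."""

    def __init__(self) -> None:
        self.memo: Dict[str, int] = {}  # value -> index into dictionary
        self.dictionary: List[str] = []
        self.indices: List[Optional[int]] = []

    def append(self, value: Optional[str]) -> None:
        if value is None:
            self.indices.append(None)
            return
        index = self.memo.get(value)
        if index is None:
            # First occurrence: assign the next index and store the value once.
            index = len(self.dictionary)
            self.memo[value] = index
            self.dictionary.append(value)
        self.indices.append(index)

    def finish(self) -> Tuple[List[Optional[int]], List[str]]:
        return self.indices, self.dictionary


if __name__ == "__main__":
    sb = StringBuilderSketch()
    for s in ["foo", None, "bär"]:
        sb.append(s)
    print(sb.finish())  # ([True, False, True], [0, 3, 3, 7], b'foob\xc3\xa4r')

    db = DictionaryBuilderSketch()
    for s in ["a", "b", "a", None, "c", "b"]:
        db.append(s)
    print(db.finish())  # ([0, 1, 0, None, 2, 1], ['a', 'b', 'c'])
```

Note that the offsets count bytes, not characters: "bär" occupies 4 bytes in UTF-8, so its slot spans offsets 3 to 7. Appending values one at a time like this is the pattern the proposed builder.pxi classes would expose, without first collecting the values into a Python list.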


