Posted to issues@arrow.apache.org by "Tim Swast (JIRA)" <ji...@apache.org> on 2019/06/07 18:56:00 UTC
[jira] [Commented] (ARROW-2607) [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm
[ https://issues.apache.org/jira/browse/ARROW-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858909#comment-16858909 ]
Tim Swast commented on ARROW-2607:
----------------------------------
I'm very interested in this issue, as it would be extremely useful for parsing files into Arrow tables from numba. I would expect to be able to do the following:
{code:python}
import pyarrow

my_string_array = pyarrow.Array.from_buffers(
    pyarrow.string(),
    row_count,
    [
        pyarrow.py_buffer(my_string_nullmask),  # validity bitmap
        pyarrow.py_buffer(my_string_offsets),   # int32 offsets into the data
        pyarrow.py_buffer(my_string_bytes),     # concatenated UTF-8 bytes
    ],
)
{code}
But I get:
{quote}File "pyarrow/array.pxi", line 578, in pyarrow.lib.Array.from_buffers
NotImplementedError: from_buffers is only supported for primitive arrays yet.
{quote}
I suppose if I wanted to contribute this fix, I should start looking at pyarrow/array.pxi first?
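For reference, here is a minimal sketch of the three buffers I have in mind, built with only the standard library. The helper name {{build_string_buffers}} is hypothetical, and the layout is my reading of the Arrow format for string arrays (LSB-ordered validity bitmap, little-endian int32 offsets, concatenated UTF-8 data). Padding and alignment are left out for brevity.

```python
import struct


def build_string_buffers(values):
    """Encode a list of str/None as (validity, offsets, data) byte strings.

    Hypothetical helper for illustration; buffer padding is omitted.
    """
    validity = bytearray((len(values) + 7) // 8)  # one bit per row
    offsets = [0]                                 # n + 1 entries for n rows
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            # Set bit i, using least-significant-bit numbering.
            validity[i // 8] |= 1 << (i % 8)
            data += value.encode("utf-8")
        # A null row repeats the previous offset (zero-length slot).
        offsets.append(len(data))
    offsets_bytes = struct.pack("<%di" % len(offsets), *offsets)
    return bytes(validity), offsets_bytes, bytes(data)


my_string_nullmask, my_string_offsets, my_string_bytes = build_string_buffers(
    ["foo", None, "quux"]
)
```

These three byte strings are what I would wrap with {{pyarrow.py_buffer}} in the call above, with {{row_count}} set to 3.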
> [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm
> ---------------------------------------------------------------------------
>
> Key: ARROW-2607
> URL: https://issues.apache.org/jira/browse/ARROW-2607
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Java, Python
> Reporter: Uwe L. Korn
> Priority: Major
>
> Follow-up after https://issues.apache.org/jira/browse/ARROW-2249: Currently only primitive arrays are supported in {{pyarrow.Array.from_jvm}} as it uses {{pyarrow.Array.from_buffers}} underneath. We should extend one of the two functions to be able to deal with string arrays. There is a currently failing unit test {{test_jvm_string_array}} in {{pyarrow/tests/test_jvm.py}} to verify the implementation.