Posted to issues@arrow.apache.org by "Athanassios Hatzis (Jira)" <ji...@apache.org> on 2020/07/16 16:36:00 UTC

[jira] [Created] (ARROW-9505) [Python] pa.struct() dictionary-encode not implemented for decimal

Athanassios Hatzis created ARROW-9505:
-----------------------------------------

             Summary: [Python] pa.struct() dictionary-encode not implemented for decimal
                 Key: ARROW-9505
                 URL: https://issues.apache.org/jira/browse/ARROW-9505
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 0.17.1
            Reporter: Athanassios Hatzis


Hi, given this PyArrow struct array:

 
{code:java}
struct_array.slice(0,3)
Out[52]: 
<pyarrow.lib.StructArray object at 0x7f92061e9dc0>
-- is_valid: all not null
-- child 0 type: int16
 [
 991,
 992,
 993
 ]
-- child 1 type: decimal(6, 3)
 [
 36.100,
 42.300,
 15.300
 ]
{code}
I tried to apply the dictionary_encode() method to it and got this error:

 
{code:java}
struct_array.dictionary_encode()
File "<ipython-input-51-440741990dd7>", line 1, in <module>
 struct_array.dictionary_encode()
 File "pyarrow/array.pxi", line 750, in pyarrow.lib.Array.dictionary_encode
 File "pyarrow/error.pxi", line 106, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: dictionary-encode not implemented for struct<catpid: int16, catcost: decimal(6, 3)> 
{code}
I know that it is possible to apply dictionary_encode() to each field of the struct_array and then create a RecordBatch from the dictionary-encoded fields, so I am not sure why this functionality is not implemented for the struct array itself.

I also noticed that there is a RecordBatch.from_struct_array() conversion, but I want the columns to be dictionary-encoded, and the only way to do this in the current version is to process each field (column) separately.

BTW: In my project I am addressing a basic problem: how to transform tuples from any database table into dictionary-encoded columns of a PyArrow RecordBatch (Table).


