Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2022/09/01 08:07:00 UTC

[jira] [Resolved] (ARROW-14495) [Python] DictionaryArray.from_buffers should not crash

     [ https://issues.apache.org/jira/browse/ARROW-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou resolved ARROW-14495.
------------------------------------
    Fix Version/s: 10.0.0
       Resolution: Fixed

Issue resolved by pull request 13989
[https://github.com/apache/arrow/pull/13989]

> [Python] DictionaryArray.from_buffers should not crash
> ------------------------------------------------------
>
>                 Key: ARROW-14495
>                 URL: https://issues.apache.org/jira/browse/ARROW-14495
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Joris Van den Bossche
>            Assignee: Miles Granger
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 10.0.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> From https://stackoverflow.com/questions/69746789/how-to-make-a-pyarrow-dictionaryarray-with-extensiontype-using-from-buffers-us
> Trying to create a DictionaryArray with {{from_buffers}} crashes:
> {code}
> >>> import pyarrow as pa
> >>> a = pa.array(["one", "two", "three", "two", "one"]).dictionary_encode()
> >>> b = pa.DictionaryArray.from_buffers(a.type, len(a), a.indices.buffers())
> ../src/arrow/array/array_dict.cc:83:  Check failed: (data->dictionary) != (nullptr) 
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0x11bcb26)[0x7fa850076b26]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0x11bcaa4)[0x7fa850076aa4]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0x11bcac6)[0x7fa850076ac6]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN5arrow4util8ArrowLogD1Ev+0x47)[0x7fa850076e25]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN5arrow15DictionaryArrayC2ERKSt10shared_ptrINS_9ArrayDataEE+0x1b9)[0x7fa84fad33fb]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN9__gnu_cxx13new_allocatorIN5arrow15DictionaryArrayEE9constructIS2_JRKSt10shared_ptrINS1_9ArrayDataEEEEEvPT_DpOT0_+0x49)[0x7fa84fc0f9f5]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt16allocator_traitsISaIN5arrow15DictionaryArrayEEE9constructIS1_JRKSt10shared_ptrINS0_9ArrayDataEEEEEvRS2_PT_DpOT0_+0x38)[0x7fa84fc0d44d]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt23_Sp_counted_ptr_inplaceIN5arrow15DictionaryArrayESaIS1_ELN9__gnu_cxx12_Lock_policyE2EEC2IJRKSt10shared_ptrINS0_9ArrayDataEEEEES2_DpOT_+0xaf)[0x7fa84fc0a027]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt14__shared_countILN9__gnu_cxx12_Lock_policyE2EEC2IN5arrow15DictionaryArrayESaIS5_EJRKSt10shared_ptrINS4_9ArrayDataEEEEERPT_St20_Sp_alloc_shared_tagIT0_EDpOT1_+0xb2)[0x7fa84fc04560]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt12__shared_ptrIN5arrow15DictionaryArrayELN9__gnu_cxx12_Lock_policyE2EEC1ISaIS1_EJRKSt10shared_ptrINS0_9ArrayDataEEEEESt20_Sp_alloc_shared_tagIT_EDpOT0_+0x4c)[0x7fa84fbffcdc]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt10shared_ptrIN5arrow15DictionaryArrayEEC2ISaIS1_EJRKS_INS0_9ArrayDataEEEEESt20_Sp_alloc_shared_tagIT_EDpOT0_+0x39)[0x7fa84fbfd8f9]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZSt15allocate_sharedIN5arrow15DictionaryArrayESaIS1_EJRKSt10shared_ptrINS0_9ArrayDataEEEES3_IT_ERKT0_DpOT1_+0x38)[0x7fa84fbfb500]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZSt11make_sharedIN5arrow15DictionaryArrayEJRKSt10shared_ptrINS0_9ArrayDataEEEES2_IT_EDpOT0_+0x54)[0x7fa84fbf7be6]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0xd36104)[0x7fa84fbf0104]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0xd2f2f8)[0x7fa84fbe92f8]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN5arrow9MakeArrayERKSt10shared_ptrINS_9ArrayDataEE+0x99)[0x7fa84fbe1d3d]
> {code}
> I don't know if this can ever work with the current signature, since you can only pass the buffers and not the dictionary itself (which is not included in the buffers). In C++ there is an {{ArrayData::Make}} that also takes a dictionary. I think we should add a custom {{from_buffers}} on DictionaryArray in Cython that uses it, instead of relying on the base class {{from_buffers}} implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)