Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/10/28 08:25:00 UTC

[jira] [Created] (ARROW-14495) [Python] DictionaryArray.from_buffers should not crash

Joris Van den Bossche created ARROW-14495:
---------------------------------------------

             Summary: [Python] DictionaryArray.from_buffers should not crash
                 Key: ARROW-14495
                 URL: https://issues.apache.org/jira/browse/ARROW-14495
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, Python
            Reporter: Joris Van den Bossche
             Fix For: 7.0.0


From https://stackoverflow.com/questions/69746789/how-to-make-a-pyarrow-dictionaryarray-with-extensiontype-using-from-buffers-us

Trying to create a DictionaryArray with {{from_buffers}} crashes:

{code}
>>> import pyarrow as pa
>>> a = pa.array(["one", "two", "three", "two", "one"]).dictionary_encode()
>>> b = pa.DictionaryArray.from_buffers(a.type, len(a), a.indices.buffers())
../src/arrow/array/array_dict.cc:83:  Check failed: (data->dictionary) != (nullptr) 
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0x11bcb26)[0x7fa850076b26]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0x11bcaa4)[0x7fa850076aa4]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0x11bcac6)[0x7fa850076ac6]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN5arrow4util8ArrowLogD1Ev+0x47)[0x7fa850076e25]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN5arrow15DictionaryArrayC2ERKSt10shared_ptrINS_9ArrayDataEE+0x1b9)[0x7fa84fad33fb]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN9__gnu_cxx13new_allocatorIN5arrow15DictionaryArrayEE9constructIS2_JRKSt10shared_ptrINS1_9ArrayDataEEEEEvPT_DpOT0_+0x49)[0x7fa84fc0f9f5]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt16allocator_traitsISaIN5arrow15DictionaryArrayEEE9constructIS1_JRKSt10shared_ptrINS0_9ArrayDataEEEEEvRS2_PT_DpOT0_+0x38)[0x7fa84fc0d44d]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt23_Sp_counted_ptr_inplaceIN5arrow15DictionaryArrayESaIS1_ELN9__gnu_cxx12_Lock_policyE2EEC2IJRKSt10shared_ptrINS0_9ArrayDataEEEEES2_DpOT_+0xaf)[0x7fa84fc0a027]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt14__shared_countILN9__gnu_cxx12_Lock_policyE2EEC2IN5arrow15DictionaryArrayESaIS5_EJRKSt10shared_ptrINS4_9ArrayDataEEEEERPT_St20_Sp_alloc_shared_tagIT0_EDpOT1_+0xb2)[0x7fa84fc04560]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt12__shared_ptrIN5arrow15DictionaryArrayELN9__gnu_cxx12_Lock_policyE2EEC1ISaIS1_EJRKSt10shared_ptrINS0_9ArrayDataEEEEESt20_Sp_alloc_shared_tagIT_EDpOT0_+0x4c)[0x7fa84fbffcdc]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt10shared_ptrIN5arrow15DictionaryArrayEEC2ISaIS1_EJRKS_INS0_9ArrayDataEEEEESt20_Sp_alloc_shared_tagIT_EDpOT0_+0x39)[0x7fa84fbfd8f9]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZSt15allocate_sharedIN5arrow15DictionaryArrayESaIS1_EJRKSt10shared_ptrINS0_9ArrayDataEEEES3_IT_ERKT0_DpOT1_+0x38)[0x7fa84fbfb500]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZSt11make_sharedIN5arrow15DictionaryArrayEJRKSt10shared_ptrINS0_9ArrayDataEEEES2_IT_EDpOT0_+0x54)[0x7fa84fbf7be6]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0xd36104)[0x7fa84fbf0104]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0xd2f2f8)[0x7fa84fbe92f8]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN5arrow9MakeArrayERKSt10shared_ptrINS_9ArrayDataEE+0x99)[0x7fa84fbe1d3d]
{code}

I don't know if this can ever work with the current signature. It only accepts buffers, not the dictionary itself, and the dictionary is not part of the buffers. In C++ there is an {{ArrayData::Make}} overload that additionally takes a dictionary. I think we should add a custom {{from_buffers}} on DictionaryArray in Cython that uses it, instead of inheriting the base class {{from_buffers}} implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)