Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/11/09 10:28:00 UTC

[jira] [Commented] (ARROW-14520) [Python] Non-deterministic Segfault with Pyarrow

    [ https://issues.apache.org/jira/browse/ARROW-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441048#comment-17441048 ] 

Joris Van den Bossche commented on ARROW-14520:
-----------------------------------------------

The segfault occurs somewhere in the Cast implementation, during the conversion from a pandas DataFrame to an Arrow Table (i.e. while each column of the DataFrame is converted to an Arrow array).
Do you specify the schema when writing to Parquet?

I'm not sure whether the information above is enough to triage the issue further. If not, could you provide some additional details? Do you have a code example that (at least occasionally) reproduces the segfault?

> [Python] Non-deterministic Segfault with Pyarrow
> ------------------------------------------------
>
>                 Key: ARROW-14520
>                 URL: https://issues.apache.org/jira/browse/ARROW-14520
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 4.0.1
>         Environment: Ubuntu 18.04.5 LTS
>            Reporter: Karthik Velayutham
>            Priority: Major
>
> Hi all,
> I've been getting this non-deterministic seg fault when writing out parquet files via Dask. I was wondering if anyone could help me figure out what the stack trace means and I can probably triage the issue from there.
> {code:java}
> [epee:23196] *** Process received signal ***
> [epee:23196] Signal: Segmentation fault (11)
> [epee:23196] Signal code:  (128)
> [epee:23196] Failing at address: (nil)
> [epee:23196] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f595b277980]
> [epee:23196] [ 1] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/../../../libarrow.so.400(_ZN5arrow7compute8internal12DoStaticCastIhlEEvPKvlllPv+0x98)[0x7f594720c278]
> [epee:23196] [ 2] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/../../../libarrow.so.400(_ZN5arrow7compute8internal14CastNumberImplINS_9Int64TypeEEEvNS_4Type4typeERKNS_5DatumEPS6_+0xe72)[0x7f5947217592]
> [epee:23196] [ 3] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/../../../libarrow.so.400(_ZN5arrow7compute8internal20CastIntegerToIntegerEPNS0_13KernelContextERKNS0_9ExecBatchEPNS_5DatumE+0x74)[0x7f5947231254]
> [epee:23196] [ 4] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/../../../libarrow.so.400(+0x52e87d)[0x7f594710d87d]
> [epee:23196] [ 5] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/../../../libarrow.so.400(_ZNK5arrow7compute8Function7ExecuteERKSt6vectorINS_5DatumESaIS3_EEPKNS0_15FunctionOptionsEPNS0_11ExecContextE+0xd9b)[0x7f594711455b]
> [epee:23196] [ 6] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/../../../libarrow.so.400(+0x51efbd)[0x7f59470fdfbd]
> [epee:23196] [ 7] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/../../../libarrow.so.400(_ZNK5arrow7compute12MetaFunction7ExecuteERKSt6vectorINS_5DatumESaIS3_EEPKNS0_15FunctionOptionsEPNS0_11ExecContextE+0x7c)[0x7f594710f21c]
> [epee:23196] [ 8] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/../../../libarrow.so.400(_ZN5arrow7compute12CallFunctionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorINS_5DatumESaISA_EEPKNS0_15FunctionOptionsEPNS0_11ExecContextE+0x94)[0x7f5947108df4]
> [epee:23196] [ 9] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/../../../libarrow.so.400(_ZN5arrow7compute4CastERKNS_5DatumERKNS0_11CastOptionsEPNS0_11ExecContextE+0xc8)[0x7f59470ff038]
> [epee:23196] [10] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/../../../libarrow.so.400(_ZN5arrow7compute4CastERKNS_5DatumESt10shared_ptrINS_8DataTypeEERKNS0_11CastOptionsEPNS0_11ExecContextE+0xed)[0x7f59471013ed]
> [epee:23196] [11] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/../../../libarrow.so.400(_ZN5arrow7compute4CastERKNS_5ArrayESt10shared_ptrINS_8DataTypeEERKNS0_11CastOptionsEPNS0_11ExecContextE+0x87)[0x7f5947101957]
> [epee:23196] [12] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/../../../libarrow_python.so.400(+0xbf123)[0x7f5946b4e123]
> [epee:23196] [13] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/../../../libarrow_python.so.400(+0xc5df6)[0x7f5946b54df6]
> [epee:23196] [14] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/../../../libarrow_python.so.400(+0xcc828)[0x7f5946b5b828]
> [epee:23196] [15] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/../../../libarrow_python.so.400(_ZN5arrow2py14NumPyConverter7ConvertEv+0x4d)[0x7f5946b5b97d]
> [epee:23196] [16] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/../../../libarrow_python.so.400(_ZN5arrow2py14NdarrayToArrowEPNS_10MemoryPoolEP7_objectS4_bRKSt10shared_ptrINS_8DataTypeEERKNS_7compute11CastOptionsEPS5_INS_12ChunkedArrayEE+0x2b0)[0x7f5946b5bf60]
> [epee:23196] [17] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so(+0x1b5493)[0x7f5947bfa493]
> [epee:23196] [18] /home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so(+0x1b85ff)[0x7f5947bfd5ff]
> [epee:23196] [19] python3(PyCFunction_Call+0x54)[0x5652b403fc64]
> [epee:23196] [20] python3(_PyObject_MakeTpCall+0x31e)[0x5652b404f0fe]
> [epee:23196] [21] python3(_PyEval_EvalFrameDefault+0x5685)[0x5652b40e5a95]
> [epee:23196] [22] python3(_PyEval_EvalCodeWithName+0x2c3)[0x5652b40c2fa3]
> [epee:23196] [23] python3(_PyFunction_Vectorcall+0x378)[0x5652b40c4388]
> [epee:23196] [24] python3(_PyEval_EvalFrameDefault+0x947)[0x5652b40e0d57]
> [epee:23196] [25] python3(_PyEval_EvalCodeWithName+0x2c3)[0x5652b40c2fa3]
> [epee:23196] [26] python3(_PyFunction_Vectorcall+0x378)[0x5652b40c4388]
> [epee:23196] [27] python3(_PyEval_EvalFrameDefault+0x947)[0x5652b40e0d57]
> [epee:23196] [28] python3(_PyEval_EvalCodeWithName+0x2c3)[0x5652b40c2fa3]
> [epee:23196] [29] python3(_PyFunction_FastCallDict+0x1b2)[0x5652b3fe018a]
> [epee:23196] *** End of error message ***
> {code}
> Here's another concrete stack trace with faulthandler.
> {code:java}
> Current thread 0x00007f0256ad9700 (most recent call first):
>   File "/home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 575 in convert_column
>   File "/home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 594 in <listcomp>
>   File "/home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 594 in dataframe_to_arrays
>   File "/home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/dask/dataframe/io/parquet/arrow.py", line 892 in _pandas_to_arrow_table
>   File "/home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/dask/dataframe/io/parquet/arrow.py", line 922 in write_partition
>   File "/home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/distributed/worker.py", line 3936 in apply_function_simple
>   File "/home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/distributed/worker.py", line 3914 in apply_function
>   File "/home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/distributed/_concurrent_futures_thread.py", line 66 in run
>   File "/home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/site-packages/distributed/threadpoolexecutor.py", line 55 in _worker
>   File "/home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/threading.py", line 870 in run
>   File "/home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/threading.py", line 932 in _bootstrap_inner
>   File "/home/vkarthik/anaconda3/envs/katana-dev/lib/python3.8/threading.py", line 890 in _bootstrap
> {code}


