Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/05/03 14:54:00 UTC
[jira] [Updated] (ARROW-12622) [Python] Segfault when reading CSV inside Flight server
[ https://issues.apache.org/jira/browse/ARROW-12622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-12622:
-----------------------------------
Labels: pull-request-available (was: )
> [Python] Segfault when reading CSV inside Flight server
> -------------------------------------------------------
>
> Key: ARROW-12622
> URL: https://issues.apache.org/jira/browse/ARROW-12622
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 4.0.0
> Environment: Arch Linux 5.11.16-arch1-1
> Originally found on GitHub Actions Ubuntu 20.04.2
> Python 3.8 and Python 3.9
> Reporter: Jeroen Hoekx
> Assignee: David Li
> Priority: Major
> Labels: pull-request-available
> Attachments: crash.py, test.csv
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Using pyarrow.csv.read_csv inside a Flight server results in a segfault. This did not happen in pyarrow 3.0.0.
> The [CI build of a library we're building failed|https://github.com/timeseer-ai/kukur/runs/2467203316] and made us aware of the issue.
> The attached CSV file (test.csv) and Python server/client script (crash.py) demonstrate the problem.
> * Run the server with `python crash.py server`.
> * Run the client with `python crash.py client`. The server segfaults with 'Segmentation fault (core dumped)'.
> The crash does not happen when just reading the CSV (`python crash.py`).
> This is the stacktrace generated by `coredumpctl debug` of a debug build of commit 2746266addddf71d20a4fe49381497b894c4d15c:
> {code:java}
> #0 0x00007f9275cffedc in __gnu_cxx::__atomic_add (__val=1, __mem=0x10) at /usr/include/c++/10.2.0/ext/atomicity.h:55
> #1 __gnu_cxx::__atomic_add_dispatch (__val=1, __mem=0x10) at /usr/include/c++/10.2.0/ext/atomicity.h:96
> #2 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy (this=0x8)
>     at /usr/include/c++/10.2.0/bits/shared_ptr_base.h:142
> #3 0x00007f9275cfe0a5 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count (this=0x7f92735a2778,
>     __r=...) at /usr/include/c++/10.2.0/bits/shared_ptr_base.h:740
> #4 0x00007f9275cfd01f in std::__shared_ptr<arrow::StopSourceImpl, (__gnu_cxx::_Lock_policy)2>::__shared_ptr (
>     this=0x7f92735a2770) at /usr/include/c++/10.2.0/bits/shared_ptr_base.h:1181
> #5 0x00007f9275cfd045 in std::shared_ptr<arrow::StopSourceImpl>::shared_ptr (this=0x7f92735a2770)
>     at /usr/include/c++/10.2.0/bits/shared_ptr.h:149
> #6 0x00007f9275cfd06b in arrow::StopToken::StopToken (this=0x7f92735a2770)
>     at /home/jeroen/dev/python/apache-arrow/dist/include/arrow/util/cancel.h:57
> #7 0x00007f9275ce96f7 in __pyx_pf_7pyarrow_4_csv_read_csv (__pyx_self=0x0, __pyx_v_input_file=0x7f929e9f28b0,
>     __pyx_v_read_options=0x7f929f49ee80 <_Py_NoneStruct>, __pyx_v_parse_options=0x7f929f49ee80 <_Py_NoneStruct>,
>     __pyx_v_convert_options=0x7f929f49ee80 <_Py_NoneStruct>, __pyx_v_memory_pool=0x7f929f49ee80 <_Py_NoneStruct>)
>     at /home/jeroen/dev/python/apache-arrow/arrow/python/build/temp.linux-x86_64-3.8/_csv.cpp:14208
> #8 0x00007f9275ce8b92 in __pyx_pw_7pyarrow_4_csv_1read_csv (__pyx_self=0x0, __pyx_args=0x7f929ea64be0, __pyx_kwds=0x0)
>     at /home/jeroen/dev/python/apache-arrow/arrow/python/build/temp.linux-x86_64-3.8/_csv.cpp:14036
> #9 0x00007f929f22cf98 in ?? () from /usr/lib/libpython3.8.so.1.0
> #10 0x00007f929f22d5f8 in _PyObject_MakeTpCall () from /usr/lib/libpython3.8.so.1.0
> {code}
> Based on my limited understanding of the code, it looks like the error is here:
> [https://github.com/apache/arrow/blob/master/python/pyarrow/_csv.pyx#L799]
> {code:java}
> with SignalStopHandler() as stop_handler:
>     io_context = CIOContext(
>         maybe_unbox_memory_pool(memory_pool),
>         (<StopToken> stop_handler.stop_token).stop_token)
> {code}
> Here `stop_token` is null because the `SignalStopHandler` had an empty list of signals on creation ([https://github.com/apache/arrow/blob/master/python/pyarrow/error.pxi#L191]):
> {code:java}
> if (signal_handlers_enabled and
>         threading.current_thread() is threading.main_thread()):
>     self._signals = [
>         sig for sig in (signal.SIGINT, signal.SIGTERM)
>         if signal.getsignal(sig) not in (signal.SIG_DFL,
>                                          signal.SIG_IGN, None)]
> if not self._signals.empty():
>     self._stop_token = StopToken()
>     self._stop_token.init(GetResultValue(
>         SetSignalStopSource()).token())
>     self._enabled = True
> {code}
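The thread check in the snippet quoted above can be tried in isolation with the standard library alone. The following is a minimal sketch, not pyarrow code: `collect_signals` and `collect_in_worker` are hypothetical helper names, the `signal_handlers_enabled` flag is left out, and it assumes that a Flight request handler runs on a thread other than the main thread. Under that assumption the signal list stays empty, which matches the case where `_stop_token` is never created.

```python
import signal
import threading


def collect_signals():
    """Mimic the quoted condition: only the main thread gathers signals.

    Hypothetical helper for illustration; it is not part of pyarrow.
    """
    if threading.current_thread() is not threading.main_thread():
        # Off the main thread the quoted branch is skipped entirely,
        # so the list of signals stays empty.
        return []
    return [
        sig for sig in (signal.SIGINT, signal.SIGTERM)
        if signal.getsignal(sig) not in (signal.SIG_DFL, signal.SIG_IGN, None)
    ]


def collect_in_worker():
    """Run collect_signals() on a worker thread and return its result."""
    result = {}

    def target():
        result["signals"] = collect_signals()

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return result["signals"]


if __name__ == "__main__":
    print("calling thread:", collect_signals())
    print("worker thread: ", collect_in_worker())
```

If the empty list is the trigger, as the report suggests, this would also fit the observation that `python crash.py` on its own does not crash, since that call is made from the main thread.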
--
This message was sent by Atlassian Jira
(v8.3.4#803005)