Posted to jira@arrow.apache.org by "Apache Arrow JIRA Bot (Jira)" <ji...@apache.org> on 2022/12/20 17:54:00 UTC

[jira] [Assigned] (ARROW-17319) [Python] pyarrow seems to set default CPU affinity to 0 on shutdown, crashes if CPU 0 is not available

     [ https://issues.apache.org/jira/browse/ARROW-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Arrow JIRA Bot reassigned ARROW-17319:
---------------------------------------------

    Assignee:     (was: Mike Gevaert)

> [Python] pyarrow seems to set default CPU affinity to 0 on shutdown, crashes if CPU 0 is not available
> ------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17319
>                 URL: https://issues.apache.org/jira/browse/ARROW-17319
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 9.0.0
>         Environment: Ubuntu 20.04 / Python 3.8.10 (default, Jun 22 2022, 20:18:18)
> $ pip list 
> Package         Version
> --------------- -------
> numpy           1.23.1 
> pandas          1.4.3  
> pip             20.0.2 
> pkg-resources   0.0.0  
> pyarrow         9.0.0  
> python-dateutil 2.8.2  
> pytz            2022.1 
> setuptools      44.0.0 
> six             1.16.0 
>            Reporter: Mike Gevaert
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I get the following fatal error and stack trace when exiting Python after importing {{pyarrow.parquet}}:
> {code}
> Python 3.8.10 (default, Jun 22 2022, 20:18:18) 
> [GCC 9.4.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> os.getpid()
> 25106
> >>> import pyarrow.parquet
> >>> 
> Fatal error condition occurred in /opt/vcpkg/buildtrees/aws-c-io/src/9e6648842a-364b708815.clean/source/event_loop.c:72: aws_thread_launch(&cleanup_thread, s_event_loop_destroy_async_thread_fn, el_group, &thread_options) == AWS_OP_SUCCESS
> Exiting Application
> ################################################################################
> Stack trace:
> ################################################################################
> /tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x200af06) [0x7f831b2b3f06]
> /tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x20028e5) [0x7f831b2ab8e5]
> /tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x1f27e09) [0x7f831b1d0e09]
> /tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x200ba3d) [0x7f831b2b4a3d]
> /tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x1f25948) [0x7f831b1ce948]
> /tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x200ba3d) [0x7f831b2b4a3d]
> /tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x1ee0b46) [0x7f831b189b46]
> /tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x194546a) [0x7f831abee46a]
> /lib/x86_64-linux-gnu/libc.so.6(+0x468a7) [0x7f831c6188a7]
> /lib/x86_64-linux-gnu/libc.so.6(on_exit+0) [0x7f831c618a60]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfa) [0x7f831c5f608a]
>  {code}
> To replicate this, one needs to make sure that CPU 0 isn't available for scheduling tasks. In our HPC environment, this happens because Slurm uses cgroups to constrain CPU usage.
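The failure mode can be probed without pyarrow. When a process is confined to a cpuset that excludes CPU 0, the kernel rejects any attempt to pin a thread to CPU 0 ({{sched_setaffinity}} fails with {{EINVAL}}). If the AWS cleanup thread is indeed launched with CPU 0 as its affinity, as the issue title suggests, that would explain why the {{aws_thread_launch(...) == AWS_OP_SUCCESS}} assertion above fails. The following is a minimal, Linux-only sketch using the standard-library {{os.sched_getaffinity}} / {{os.sched_setaffinity}}; the helper name {{can_pin_to_cpu}} is hypothetical and not part of pyarrow or the AWS SDK.

```python
import os


def can_pin_to_cpu(cpu: int) -> bool:
    """Return True if the caller may be pinned to `cpu` (Linux only).

    Hypothetical helper: the original affinity mask is restored afterwards,
    so this only probes whether the kernel accepts the requested mask.
    """
    original = os.sched_getaffinity(0)  # pid 0 refers to the caller
    try:
        os.sched_setaffinity(0, {cpu})
    except OSError:
        # Typically EINVAL: no CPU in the requested mask is usable,
        # e.g. because it lies outside the cgroup's cpuset.
        return False
    finally:
        # Runs on both paths, so the caller's affinity is left unchanged.
        os.sched_setaffinity(0, original)
    return True


if __name__ == "__main__":
    print("allowed CPUs:", sorted(os.sched_getaffinity(0)))
    print("can pin to CPU 0:", can_pin_to_cpu(0))
```

Inside the cgroup configured in the steps below, {{can_pin_to_cpu(0)}} would be expected to return {{False}} and {{can_pin_to_cpu(1)}} to return {{True}}.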
> On a Linux workstation, one should be able to:
> 1) open Python as a normal user
> 2) get its PID (e.g. with {{os.getpid()}})
> 3) as root:
> {code}
> cd /sys/fs/cgroup/cpuset/
> mkdir pyarrow
> cd pyarrow
> echo 0 > cpuset.mems   # cpuset.mems must be set before tasks can be attached
> echo 1 > cpuset.cpus   # restrict the cgroup to CPU 1 only
> echo $PID > tasks      # $PID is the Python process id from step 2
> {code}
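Before triggering the crash, it may help to confirm from inside Python that the move took effect. This is a minimal, Linux-only sketch assuming the cgroup v1 layout used above (on hosts with the unified cgroup v2 hierarchy, {{/sys/fs/cgroup/cpuset/}} may not exist and the path would differ). It parses the kernel's CPU list format as found in {{cpuset.cpus}} and checks whether CPU 0 is still in the process's affinity mask. The helper names and the {{pyarrow}} cgroup path are illustrative assumptions.

```python
from __future__ import annotations

import os


def parse_cpu_list(text: str) -> set[int]:
    """Parse the kernel CPU list format, e.g. "0-3,8,10-11" (as in cpuset.cpus)."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpuset.cpus file means no CPUs are assigned
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpu0_available() -> bool:
    """True if CPU 0 is in this process's allowed CPU set."""
    return 0 in os.sched_getaffinity(0)


if __name__ == "__main__":
    # Assumed path: the cgroup v1 cpuset created by the commands above.
    path = "/sys/fs/cgroup/cpuset/pyarrow/cpuset.cpus"
    if os.path.exists(path):
        with open(path) as f:
            print("cgroup CPUs:", sorted(parse_cpu_list(f.read())))
    print("CPU 0 available to this process:", cpu0_available())
```

After {{echo 1 > cpuset.cpus}}, the parsed set would be {{{1}}} and {{cpu0_available()}} would be expected to return {{False}} for the moved process.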
> Then, in the Python environment:
> {code}
> import pyarrow.parquet
> exit()
> {code}
> This should trigger the crash.
> Sadly, I couldn't track down which versions of {{aws-c-common}} and {{aws-c-io}} are used in the 9.0.0 py38 manylinux wheels (libarrow.so.900 has BuildID[sha1]=dd6c5a2efd5cacf09657780a58c40f7c930e4df1).
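Because the crash happens during interpreter shutdown, after the user's own code has finished, it can only be observed from outside the process. A minimal sketch, assuming a POSIX system: run the import in a fresh interpreter via {{subprocess}} and inspect the exit status, where a negative value -N means the child was killed by signal N. The helper names are hypothetical, and whether this particular fatal error ends in a signal or a plain nonzero exit is not confirmed by the report.

```python
from __future__ import annotations

import signal
import subprocess
import sys


def exit_status_of(snippet: str) -> tuple[int, str]:
    """Run `snippet` in a fresh interpreter; return (exit status, captured stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", snippet], capture_output=True, text=True
    )
    return result.returncode, result.stderr


def describe(status: int) -> str:
    """Human-readable form of a subprocess exit status (POSIX conventions)."""
    if status < 0:
        # subprocess reports death-by-signal N as returncode -N.
        return f"killed by {signal.Signals(-status).name}"
    return f"exited with status {status}"


if __name__ == "__main__":
    status, stderr = exit_status_of("import pyarrow.parquet")
    print(describe(status))  # a clean shutdown reports "exited with status 0"
    if status != 0:
        print(stderr[-2000:])  # tail of the child's error output
```

Running this once inside the restricted cgroup and once outside it would show whether the shutdown crash depends on CPU 0 being unavailable.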



--
This message was sent by Atlassian Jira
(v8.20.10#820010)