Posted to jira@arrow.apache.org by "H. Vetinari (Jira)" <ji...@apache.org> on 2022/12/09 03:16:00 UTC

[jira] [Commented] (ARROW-15141) [C++] Fatal error condition occurred in aws_thread_launch

    [ https://issues.apache.org/jira/browse/ARROW-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645058#comment-17645058 ] 

H. Vetinari commented on ARROW-15141:
-------------------------------------

A couple of days ago we released arrow 10.0.1 in conda-forge together with aws-sdk-cpp 1.9 (a version that should contain the fix for this issue). Could someone ([~hoeze]?) verify whether the problem still exists with v10? Then we could backport the aws-sdk-cpp 1.9 change to the maintenance branches, which would make the openssl 3 migration much easier to handle.
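For reference, a minimal verification sketch could look like the following (the S3 touch-point, the tiny stand-in "model" and the output path are placeholders of mine, not taken from the original report); the expectation is that with pyarrow 10.0.1 the process exits cleanly instead of aborting in aws_thread_launch:

{code}
# Run in a fresh conda-forge environment with pyarrow 10.0.1.
import tensorflow as tf   # imported alongside pyarrow, as in the original report
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.fs         # S3FileSystem pulls in the aws-c-* / aws-sdk-cpp libraries

print("pyarrow", pa.__version__)

# Touch the S3 code path so the AWS SDK gets initialized (placeholder settings).
pyarrow.fs.S3FileSystem(anonymous=True, region="us-east-1")

# Tiny stand-in for "run inference with a Tensorflow model, then write the result".
result = tf.constant([[1.0, 2.0], [3.0, 4.0]]).numpy()
pq.write_table(pa.table({"col": result.ravel()}), "/tmp/arrow15141_check.parquet")

# With the affected versions, the fatal assert fired here, at interpreter shutdown,
# when the AWS event loop group was torn down.
{code}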

> [C++] Fatal error condition occurred in aws_thread_launch
> ---------------------------------------------------------
>
>                 Key: ARROW-15141
>                 URL: https://issues.apache.org/jira/browse/ARROW-15141
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 6.0.0, 6.0.1
>         Environment: - `uname -a`:
> Linux datalab2 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
> - `mamba list | grep -i "pyarrow\|tensorflow\|^python"`
> pyarrow                   6.0.0           py39hff6fa39_1_cpu    conda-forge
> python                    3.9.7           hb7a2778_3_cpython    conda-forge
> python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
> python-flatbuffers        1.12               pyhd8ed1ab_1    conda-forge
> python-irodsclient        1.0.0              pyhd8ed1ab_0    conda-forge
> python-rocksdb            0.7.0            py39h7fcd5f3_4    conda-forge
> python_abi                3.9                      2_cp39    conda-forge
> tensorflow                2.6.2           cuda112py39h9333c2f_0    conda-forge
> tensorflow-base           2.6.2           cuda112py39h7de589b_0    conda-forge
> tensorflow-estimator      2.6.2           cuda112py39h9333c2f_0    conda-forge
> tensorflow-gpu            2.6.2           cuda112py39h0bbbad9_0    conda-forge
>            Reporter: F. H.
>            Assignee: Uwe Korn
>            Priority: Major
>
> Hi, I am randomly getting the following error when first running inference with a Tensorflow model and then writing the result to a `.parquet` file:
> {code}
> Fatal error condition occurred in /home/conda/feedstock_root/build_artifacts/aws-c-io_1633633131324/work/source/event_loop.c:72: aws_thread_launch(&cleanup_thread, s_event_loop_destroy_async_thread_fn, el_group, &thread_options) == AWS_OP_SUCCESS
> Exiting Application
> ################################################################################
> Stack trace:
> ################################################################################
> /home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_backtrace_print+0x59) [0x7ffb14235f19]
> /home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_fatal_assert+0x48) [0x7ffb14227098]
> /home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././././libaws-c-io.so.1.0.0(+0x10a43) [0x7ffb1406ea43]
> /home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_ref_count_release+0x1d) [0x7ffb14237fad]
> /home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././././libaws-c-io.so.1.0.0(+0xe35a) [0x7ffb1406c35a]
> /home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_ref_count_release+0x1d) [0x7ffb14237fad]
> /home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-crt-cpp.so(_ZN3Aws3Crt2Io15ClientBootstrapD1Ev+0x3a) [0x7ffb142a2f5a]
> /home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././libaws-cpp-sdk-core.so(+0x5f570) [0x7ffb147fd570]
> /lib/x86_64-linux-gnu/libc.so.6(+0x49a27) [0x7ffb17f7da27]
> /lib/x86_64-linux-gnu/libc.so.6(on_exit+0) [0x7ffb17f7dbe0]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfa) [0x7ffb17f5b0ba]
> /home/<user>/miniconda3/envs/spliceai_env/bin/python3.9(+0x20aa51) [0x562576609a51]
> /bin/bash: line 1: 2341494 Aborted                 (core dumped)
> {code}
> My colleague ran into the same issue on CentOS 8 while running the same job with the same environment on SLURM, so I guess it could be some interaction issue between tensorflow and pyarrow.
> I also found a GitHub issue where multiple people ran into the same problem:
> [https://github.com/huggingface/datasets/issues/3310]
>  
> It is very important to my lab that this bug gets resolved, as we cannot work with parquet any more. Unfortunately, we do not have the expertise to fix it ourselves.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)