Posted to jira@arrow.apache.org by "Josh Dimarsky (Jira)" <ji...@apache.org> on 2020/06/25 16:49:00 UTC

[jira] [Updated] (ARROW-9229) Pyarrow.Parquet.read_table Silently Crashes Python

     [ https://issues.apache.org/jira/browse/ARROW-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Dimarsky updated ARROW-9229:
---------------------------------
    Description: 
Reading a Parquet file with PyArrow crashes Python silently: the interpreter exits with no error message or traceback. I've narrowed it down to the following reproduction:

(not sure how to format code on JIRA, first time here)

{code:bash}
conda create -n pa36 python=3.6 pyarrow=0.17 -c conda-forge -y
conda activate pa36
python
{code}
{code:python}
import pyarrow.parquet
tbl = pyarrow.parquet.read_table("some_file.snappy.parquet")
{code}

Result: Python crashes. The process terminates without printing an exception or traceback.
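A silent exit like this most likely points to a crash in native (C++) code rather than a Python exception. One way to get more detail for the report is to rerun the reproduction in a child process with Python's built-in faulthandler enabled (the -X faulthandler interpreter option), which dumps the Python stack when the process hits a fatal signal or, on Windows, an unhandled exception such as an access violation. This is a minimal sketch: run_isolated is a hypothetical helper, not a pyarrow API, and some_file.snappy.parquet is the placeholder path from the reproduction above. It avoids capture_output so it also runs on the Python 3.6 environment used here.

{code:python}
import subprocess
import sys


def run_isolated(snippet, timeout=300):
    """Hypothetical helper (not part of pyarrow).

    Runs `snippet` in a fresh interpreter with faulthandler enabled, so a
    native crash in the child leaves a Python-level stack dump on stderr
    instead of ending silently. Returns (returncode, stdout, stderr).
    """
    cmd = [sys.executable, "-X", "faulthandler", "-c", snippet]
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode; works on Python 3.6 too
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Same steps as the reproduction; the file name is the report's placeholder.
    repro = (
        "import pyarrow.parquet as pq\n"
        "tbl = pq.read_table('some_file.snappy.parquet')\n"
        "print(tbl.num_rows)\n"
    )
    code, out, err = run_isolated(repro)
    print("exit code:", code)
    print(err)
{code}

On Windows an access violation typically shows up as exit code 3221225477 (0xC0000005), and on Linux a segfault as -11. In either case the stderr text, which includes the faulthandler stack dump, is the part worth attaching to the issue.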

 

Environment:

{code:bash}(base) > conda env export{code}
{code:yaml}
 name: pa1
 channels:
 - conda-forge
 - defaults
 dependencies:
 - abseil-cpp=20200225.2=h33f27b4_0
 - arrow-cpp=0.17.1=py36h1234567_8_cpu
 - aws-sdk-cpp=1.7.164=vc14h867dc94_1
 - boost-cpp=1.72.0=h2ba7cf6_1
 - brotli=1.0.7=h33f27b4_1002
 - bzip2=1.0.8=hfa6e2cd_2
 - c-ares=1.15.0=h2fa13f4_1001
 - ca-certificates=2020.6.20=hecda079_0
 - certifi=2020.6.20=py36h9f0ad1d_0
 - curl=7.71.0=h4b64cdc_0
 - gflags=2.2.2=he025d50_1002
 - glog=0.4.0=h0174b99_3
 - grpc-cpp=1.30.0=hfae5148_0
 - intel-openmp=2020.0=166
 - krb5=1.17.1=hc04afaa_1
 - libblas=3.8.0=15_mkl
 - libcblas=3.8.0=15_mkl
 - libcurl=7.71.0=h4b64cdc_0
 - liblapack=3.8.0=15_mkl
 - libprotobuf=3.12.3=h7bd577a_0
 - libssh2=1.9.0=h3235a2c_2
 - lz4-c=1.9.2=h62dcd97_1
 - mkl=2020.0=166
 - numpy=1.18.5=py36h4d86e3b_0
 - openssl=1.1.1g=he774522_0
 - pandas=1.0.5=py36hcc50265_0
 - parquet-cpp=1.5.1=2
 - pip=20.1.1=py_1
 - pyarrow=0.17.1=py36h1234567_8_cpu
 - python=3.6.10=he025d50_1009_cpython
 - python-dateutil=2.8.1=py_0
 - python_abi=3.6=1_cp36m
 - pytz=2020.1=pyh9f0ad1d_0
 - re2=2020.06.01=h33f27b4_0
 - setuptools=47.3.1=py36h9f0ad1d_0
 - six=1.15.0=pyh9f0ad1d_0
 - snappy=1.1.8=ha925a31_2
 - thrift-cpp=0.13.0=h1907cbf_2
 - tk=8.6.10=hfa6e2cd_0
 - vc=14.1=h869be7e_1
 - vs2015_runtime=14.16.27012=h30e32a0_2
 - wheel=0.34.2=py_1
 - wincertstore=0.2=py36_1003
 - xz=5.2.5=h2fa13f4_0
 - zlib=1.2.11=h2fa13f4_1006
 - zstd=1.4.4=h9f78265_3
{code}
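The export above pins every package in the environment. For a quick summary of the pieces most relevant to this crash (Python version, OS build, and the pyarrow/numpy/pandas versions), a short script can print them from inside the affected environment. This is a minimal sketch: collect_versions is a hypothetical helper, and the default module list is an assumption that can be adjusted.

{code:python}
import importlib
import platform
import sys


def collect_versions(modules=("pyarrow", "numpy", "pandas")):
    """Hypothetical helper: interpreter/OS info plus each module's __version__.

    Modules that are not installed map to None instead of raising;
    modules without a __version__ attribute map to "unknown".
    """
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),  # e.g. includes the Windows build
    }
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except ImportError:
            info[name] = None
        else:
            info[name] = getattr(mod, "__version__", "unknown")
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print("{}: {}".format(key, value))
{code}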

  was:
A simple use of reading a Parquet file using PyArrow crashes Python silently with no explanation. Sudden and strange. I've narrowed it down to reproduce it as follows:

(not sure how to format code on JIRA, first time here)

conda create -n pa36 python=3.6 pyarrow=0.17 -c conda-forge -y
 python
 import pyarrow.parquet
 tbl = pyarrow.parquet.read_table("some_file.snappy.parquet")

[It crashes]

 

Environment:


(base) > conda env export
name: pa1
channels:
 - conda-forge
 - defaults
dependencies:
 - abseil-cpp=20200225.2=h33f27b4_0
 - arrow-cpp=0.17.1=py36h1234567_8_cpu
 - aws-sdk-cpp=1.7.164=vc14h867dc94_1
 - boost-cpp=1.72.0=h2ba7cf6_1
 - brotli=1.0.7=h33f27b4_1002
 - bzip2=1.0.8=hfa6e2cd_2
 - c-ares=1.15.0=h2fa13f4_1001
 - ca-certificates=2020.6.20=hecda079_0
 - certifi=2020.6.20=py36h9f0ad1d_0
 - curl=7.71.0=h4b64cdc_0
 - gflags=2.2.2=he025d50_1002
 - glog=0.4.0=h0174b99_3
 - grpc-cpp=1.30.0=hfae5148_0
 - intel-openmp=2020.0=166
 - krb5=1.17.1=hc04afaa_1
 - libblas=3.8.0=15_mkl
 - libcblas=3.8.0=15_mkl
 - libcurl=7.71.0=h4b64cdc_0
 - liblapack=3.8.0=15_mkl
 - libprotobuf=3.12.3=h7bd577a_0
 - libssh2=1.9.0=h3235a2c_2
 - lz4-c=1.9.2=h62dcd97_1
 - mkl=2020.0=166
 - numpy=1.18.5=py36h4d86e3b_0
 - openssl=1.1.1g=he774522_0
 - pandas=1.0.5=py36hcc50265_0
 - parquet-cpp=1.5.1=2
 - pip=20.1.1=py_1
 - pyarrow=0.17.1=py36h1234567_8_cpu
 - python=3.6.10=he025d50_1009_cpython
 - python-dateutil=2.8.1=py_0
 - python_abi=3.6=1_cp36m
 - pytz=2020.1=pyh9f0ad1d_0
 - re2=2020.06.01=h33f27b4_0
 - setuptools=47.3.1=py36h9f0ad1d_0
 - six=1.15.0=pyh9f0ad1d_0
 - snappy=1.1.8=ha925a31_2
 - thrift-cpp=0.13.0=h1907cbf_2
 - tk=8.6.10=hfa6e2cd_0
 - vc=14.1=h869be7e_1
 - vs2015_runtime=14.16.27012=h30e32a0_2
 - wheel=0.34.2=py_1
 - wincertstore=0.2=py36_1003
 - xz=5.2.5=h2fa13f4_0
 - zlib=1.2.11=h2fa13f4_1006
 - zstd=1.4.4=h9f78265_3


> Pyarrow.Parquet.read_table Silently Crashes Python
> --------------------------------------------------
>
>                 Key: ARROW-9229
>                 URL: https://issues.apache.org/jira/browse/ARROW-9229
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.17.0, 0.17.1
>         Environment: Windows 10 1903
>            Reporter: Josh Dimarsky
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)