Posted to dev@arrow.apache.org by "Geoff Quested-Joens (Jira)" <ji...@apache.org> on 2020/04/09 15:53:00 UTC

[jira] [Created] (ARROW-8385) Crash on parquet.read_table on windows python 3.8.2

Geoff Quested-Joens created ARROW-8385:
------------------------------------------

             Summary: Crash on parquet.read_table on windows python 3.8.2
                 Key: ARROW-8385
                 URL: https://issues.apache.org/jira/browse/ARROW-8385
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.16.0
         Environment: Windows 10
Python 3.8.2, pip 20.0.2
pip freeze ->
numpy==1.18.2
pandas==1.0.3
pyarrow==0.16.0
python-dateutil==2.8.1
pytz==2019.3
six==1.14.0
            Reporter: Geoff Quested-Joens
         Attachments: crash.parquet

When reading a parquet file with pyarrow, the program exits abruptly without raising any exception. This happens on Windows only: with the same setup on Linux (Debian 10 in Docker), the same parquet file is read without issue.

The following reproduces the crash in a Python 3.8.2 environment. The full environment is listed in the Environment field, but it amounts to a pip install of pandas and pyarrow.
{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

def test_pandas_write_read():
    df_out = pd.DataFrame.from_dict([{"A":i} for i in range(3)])
    df_out.to_parquet("crash.parquet")
    df_in = pd.read_parquet("crash.parquet")
    print(df_in)

def test_arrow_write_read():
    df = pd.DataFrame.from_dict([{"A":i} for i in range(3)])
    table_out = pa.Table.from_pandas(df)
    pq.write_table(table_out, 'crash.parquet')
    table_in = pq.read_table('crash.parquet')
    print(table_in)

if __name__ == "__main__":
    test_pandas_write_read()
    test_arrow_write_read()
{code}
The interpreter never reaches the print statements. It crashes somewhere inside the call on line 252 of {{parquet.py}}; no error is raised, the program just exits.
{code:python}
    self.reader.read_all(...
{code}
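Because the process dies without a Python exception, one way to learn more about where it stops might be the standard-library {{faulthandler}} module, which dumps the Python traceback when the interpreter hits a fatal error. The following is only a sketch: the helper name {{enable_crash_traceback}} is hypothetical, and whether the handler fires for this particular crash is an assumption that has not been verified.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=None):
    """Dump Python tracebacks to `stream` if the process hits a fatal error.

    Hypothetical helper, not part of pyarrow. Falls back to stderr when no
    stream is given. The stream must expose a real file descriptor.
    """
    target = stream if stream is not None else sys.stderr
    faulthandler.enable(file=target, all_threads=True)
    return faulthandler.is_enabled()
```

Calling {{enable_crash_traceback()}} before {{pq.read_table(...)}}, or starting the script with {{python -X faulthandler}}, should print the last Python frames if the handler is triggered.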
In contrast, running the same code with the same Python environment on Debian 10 produces no error, even when reading the parquet files generated by the Windows run. The SHA-2 checksums of the crash.parquet files generated on Debian and on Windows are equal, so the problem appears to be in the read path. Attached is the crash.parquet file generated on my machine.
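For anyone wanting to repeat the checksum comparison from Python rather than with a command-line tool, a minimal sketch follows. It assumes SHA-256 as the SHA-2 variant (the report does not say which one was used), and {{file_sha256}} is a hypothetical helper name.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    SHA-256 is an assumption; the report only mentions a SHA-2 checksum.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running {{file_sha256("crash.parquet")}} on both machines and comparing the two strings would confirm that the written files are byte-identical.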

Oddly, changing {{range(3)}} to {{range(2)}} gets rid of the crash on Windows.
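Since the crash depends on the number of rows, it may help to probe several row counts, each in its own child interpreter so that a crash does not take down the probing script. This is a rough sketch under stated assumptions: {{run_isolated}}, {{probe_row_counts}} and {{CHILD_TEMPLATE}} are hypothetical names, the child script mirrors the arrow write/read test above, and pandas and pyarrow are assumed to be installed for the interpreter that runs it.

```python
import subprocess
import sys

# Child script: write an n-row table and read it back, as in the report.
CHILD_TEMPLATE = """
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame.from_dict([{{"A": i}} for i in range({n})])
pq.write_table(pa.Table.from_pandas(df), {path!r})
pq.read_table({path!r})
"""


def run_isolated(code, timeout=60):
    """Run `code` in a fresh interpreter and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode


def probe_row_counts(counts, path="crash.parquet"):
    """Map each row count to the exit code of its child process."""
    return {n: run_isolated(CHILD_TEMPLATE.format(n=n, path=path)) for n in counts}
```

For example, {{probe_row_counts(range(1, 6))}} returns a dictionary of exit codes; a non-zero value for a given row count means that child did not exit cleanly.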



--
This message was sent by Atlassian Jira
(v8.3.4#803005)