Posted to dev@arrow.apache.org by "Geoff Quested-Joens (Jira)" <ji...@apache.org> on 2020/04/09 15:53:00 UTC
[jira] [Created] (ARROW-8385) Crash on parquet.read_table on windows python 3.82
Geoff Quested-Joens created ARROW-8385:
------------------------------------------
Summary: Crash on parquet.read_table on windows python 3.82
Key: ARROW-8385
URL: https://issues.apache.org/jira/browse/ARROW-8385
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.16.0
Environment: Windows 10
python 3.8.2 pip 20.0.2
pip freeze ->
numpy==1.18.2
pandas==1.0.3
pyarrow==0.16.0
python-dateutil==2.8.1
pytz==2019.3
six==1.14.0
Reporter: Geoff Quested-Joens
Attachments: crash.parquet
When reading a parquet file with pyarrow, the program exits spontaneously without throwing any exception. This happens on Windows only: with the same setup on Linux (Debian 10 in Docker), the same parquet file is read without issue.
The following reproduces the crash in a Python 3.8.2 environment. The full environment is listed in the Environment field, but it is essentially a pip install of pandas and pyarrow.
{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq


def test_pandas_write_read():
    # Round trip through the pandas parquet wrappers
    df_out = pd.DataFrame.from_dict([{"A": i} for i in range(3)])
    df_out.to_parquet("crash.parquet")
    df_in = pd.read_parquet("crash.parquet")
    print(df_in)


def test_arrow_write_read():
    # Round trip through pyarrow directly
    df = pd.DataFrame.from_dict([{"A": i} for i in range(3)])
    table_out = pa.Table.from_pandas(df)
    pq.write_table(table_out, 'crash.parquet')
    table_in = pq.read_table('crash.parquet')
    print(table_in)


if __name__ == "__main__":
    test_pandas_write_read()
    test_arrow_write_read()
{code}
The interpreter never reaches the print statements. It crashes somewhere in the call on line 252 of {{parquet.py}}; no error is raised, the program just exits.
{code:python}
self.reader.read_all(...
{code}
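Since the process dies without a Python exception, one way to gather more detail is to run the read in a child interpreter and inspect its exit code and stderr. The following is only a sketch, not something from the original report: {{run_isolated}} is a hypothetical helper name, and it relies on the standard {{-X faulthandler}} interpreter option, which asks Python to dump a traceback on a fatal error where it can.

```python
import subprocess
import sys


def run_isolated(snippet, timeout=60):
    """Run `snippet` in a fresh interpreter so a hard crash cannot kill the caller.

    Returns (returncode, stdout, stderr). `run_isolated` is a hypothetical
    helper for illustration, not part of pyarrow or pandas.
    """
    proc = subprocess.run(
        # -X faulthandler enables Python's built-in fatal-error traceback dump
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Assumes crash.parquet is in the current directory and pyarrow is installed.
    code = "import pyarrow.parquet as pq; print(pq.read_table('crash.parquet'))"
    rc, out, err = run_isolated(code)
    print("return code:", rc)
    print("stdout:", out)
    print("stderr:", err)
```

A nonzero return code together with an empty stdout would be consistent with the silent exit described above, and anything faulthandler manages to write to stderr may help narrow down where the crash occurs.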
In contrast, running the same code with the same Python environment on Debian 10 produces no error, including when reading the parquet files generated by the same code on Windows. The sha2sum checksums of the crash.parquet files generated on Debian and on Windows compare equal, so the problem appears to be in the read. Attached is the crash.parquet file generated on my machine.
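For anyone wanting to repeat the checksum comparison without platform-specific tools, a minimal sketch in Python follows. It assumes SHA-256 is an acceptable stand-in for the checksum used above, and {{file_sha256}} is a hypothetical helper name.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumes crash.parquet is in the current directory.
    print(file_sha256("crash.parquet"))
```

Running this on both machines and comparing the printed digests would confirm whether the written files are byte-identical.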
Oddly, changing {{range(3)}} to {{range(2)}} gets rid of the crash on Windows.
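To see which row counts trigger the exit, the write/read round trip could be repeated for several sizes, each in its own child process so that one crash does not stop the sweep. This is a rough sketch under assumptions: {{sweep_row_counts}}, the {{SNIPPET}} template, and the {{sweep.parquet}} file name are all hypothetical, and the child code simply mirrors the reproduction above.

```python
import subprocess
import sys

# Child-process code; {n} is filled in per run, doubled braces are literal.
SNIPPET = """
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
df = pd.DataFrame.from_dict([{{"A": i}} for i in range({n})])
pq.write_table(pa.Table.from_pandas(df), "sweep.parquet")
print(pq.read_table("sweep.parquet"))
"""


def sweep_row_counts(counts, template=SNIPPET):
    """Run the round trip once per row count and map each count to its exit code."""
    results = {}
    for n in counts:
        proc = subprocess.run(
            [sys.executable, "-c", template.format(n=n)],
            capture_output=True,
            text=True,
        )
        results[n] = proc.returncode
    return results


if __name__ == "__main__":
    # Assumes pandas and pyarrow are installed in the current environment.
    for n, rc in sweep_row_counts(range(1, 6)).items():
        print(f"rows={n} exit code={rc}")
```

An exit code of 0 means the child finished normally; any other value marks a row count worth reporting alongside the {{range(3)}} case.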
--
This message was sent by Atlassian Jira
(v8.3.4#803005)