Posted to issues@arrow.apache.org by "Yiannis Liodakis (JIRA)" <ji...@apache.org> on 2018/01/24 02:54:00 UTC
[jira] [Created] (ARROW-2020) pyarrow: Parquet segfaults if coercing ns timestamps and writing 96-bit timestamps
Yiannis Liodakis created ARROW-2020:
---------------------------------------
Summary: pyarrow: Parquet segfaults if coercing ns timestamps and writing 96-bit timestamps
Key: ARROW-2020
URL: https://issues.apache.org/jira/browse/ARROW-2020
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.8.0
Environment: OS: Mac OS X 10.13.2
Python: 3.6.4
PyArrow: 0.8.0
Reporter: Yiannis Liodakis
Attachments: crash-report.txt
If you try to write a PyArrow table containing nanosecond-resolution timestamps to Parquet using `coerce_timestamps` and `use_deprecated_int96_timestamps=True`, the Arrow library will segfault.
The crash doesn't happen if you don't coerce the timestamp resolution or if you don't use 96-bit timestamps.
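For context, coercing `ns` to `us` is just integer truncation of the epoch value, so there is no inherent reason the combination should crash. The helper below is illustrative only (not part of the pyarrow API):

```python
# Illustrative only: the truncation that coerce_timestamps='us'
# performs, shown as plain integer arithmetic on epoch values.
def ns_to_us(ts_ns):
    """Truncate a nanosecond epoch timestamp to microseconds."""
    return ts_ns // 1_000

ts_ns = 1_516_762_440_123_456_789  # an arbitrary nanosecond epoch value
ts_us = ns_to_us(ts_ns)
```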
*To Reproduce:*
{code:python}
import datetime

import pyarrow
from pyarrow import parquet

schema = pyarrow.schema([
    pyarrow.field('last_updated', pyarrow.timestamp('ns')),
])
data = [
    pyarrow.array([datetime.datetime.now()], pyarrow.timestamp('ns')),
]
table = pyarrow.Table.from_arrays(data, ['last_updated'])
with open('test_file.parquet', 'wb') as fdesc:
    parquet.write_table(table, fdesc,
                        coerce_timestamps='us',  # 'ms' crashes too
                        use_deprecated_int96_timestamps=True)
{code}
See attached file for the crash report.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)