Posted to issues@arrow.apache.org by "Joshua Storck (JIRA)" <ji...@apache.org> on 2018/04/17 19:22:00 UTC
[jira] [Closed] (ARROW-2020) [Python] Parquet segfaults if coercing ns timestamps and writing 96-bit timestamps
[ https://issues.apache.org/jira/browse/ARROW-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joshua Storck closed ARROW-2020.
--------------------------------
Resolution: Duplicate
> [Python] Parquet segfaults if coercing ns timestamps and writing 96-bit timestamps
> ----------------------------------------------------------------------------------
>
> Key: ARROW-2020
> URL: https://issues.apache.org/jira/browse/ARROW-2020
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.8.0
> Environment: OS: Mac OS X 10.13.2
> Python: 3.6.4
> PyArrow: 0.8.0
> Reporter: Diego Argueta
> Assignee: Joshua Storck
> Priority: Major
> Labels: timestamps
> Fix For: 0.10.0
>
> Attachments: crash-report.txt
>
>
> If you try to write a PyArrow table containing nanosecond-resolution timestamps to Parquet using `coerce_timestamps` and `use_deprecated_int96_timestamps=True`, the Arrow library will segfault.
> The crash doesn't happen if you don't coerce the timestamp resolution or if you don't use 96-bit timestamps.
>
>
> *To Reproduce:*
>
> {code:python}
> import datetime
>
> import pyarrow
> from pyarrow import parquet
>
> schema = pyarrow.schema([
>     pyarrow.field('last_updated', pyarrow.timestamp('ns')),
> ])
> data = [
>     pyarrow.array([datetime.datetime.now()], pyarrow.timestamp('ns')),
> ]
> table = pyarrow.Table.from_arrays(data, ['last_updated'])
> with open('test_file.parquet', 'wb') as fdesc:
>     parquet.write_table(table, fdesc,
>                         coerce_timestamps='us',  # 'ms' also triggers the crash
>                         use_deprecated_int96_timestamps=True){code}
>
> See attached file for the crash report.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)