Posted to issues@arrow.apache.org by "Eric Conlon (JIRA)" <ji...@apache.org> on 2018/09/25 16:21:00 UTC

[jira] [Commented] (ARROW-2555) [Python] Provide an option to convert on coerce_timestamps instead of error

    [ https://issues.apache.org/jira/browse/ARROW-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627586#comment-16627586 ] 

Eric Conlon commented on ARROW-2555:
------------------------------------

We have been hitting this; here is a very simple repro:
{code:python}
from datetime import datetime
import pandas as pd

# The 1-microsecond component cannot be represented in whole milliseconds, so
# the ns -> ms cast requested via coerce_timestamps='ms' would lose data.
dt = datetime(day=1, month=1, year=2017, hour=1, minute=1, second=1, microsecond=1)
values = [(dt,)]
df = pd.DataFrame.from_records(values, columns=['testname'])
df.to_parquet('/tmp/repro.parquet', coerce_timestamps='ms')
{code}
This fails with:
{code}
Traceback (most recent call last):
<SNIP>
File "pyarrow/_parquet.pyx", line 922, in pyarrow._parquet.ParquetWriter.write_table
File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1483232461000001000
Segmentation fault (core dumped)
{code}
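
Until there is an option for this, a possible workaround (a sketch only, not an official recommendation) is to truncate the timestamps to millisecond precision in pandas before writing, so the requested cast to {{timestamp[ms]}} is no longer lossy:
{code:python}
from datetime import datetime
import pandas as pd

dt = datetime(day=1, month=1, year=2017, hour=1, minute=1, second=1, microsecond=1)
df = pd.DataFrame.from_records([(dt,)], columns=['testname'])

# Series.dt.floor('ms') drops the sub-millisecond part of each timestamp.
df['testname'] = df['testname'].dt.floor('ms')

# With only whole milliseconds left, coerce_timestamps='ms' succeeds.
df.to_parquet('/tmp/repro_truncated.parquet', coerce_timestamps='ms')
{code}
This silently discards the sub-millisecond precision, which is exactly the lossy behaviour the requested option would make opt-in.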

> [Python] Provide an option to convert on coerce_timestamps instead of error
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-2555
>                 URL: https://issues.apache.org/jira/browse/ARROW-2555
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Uwe L. Korn
>            Priority: Major
>             Fix For: 0.13.0
>
>
> At the moment, we error out on {{coerce_timestamps='ms'}} in {{pyarrow.parquet.write_table}} if the data contains a timestamp that would lose information when converted to milliseconds. In many cases the user does not care about this granularity and would rather have the convenience of the timestamps being stored in Parquet regardless. Thus we should provide an option to ignore the error and perform the lossy conversion.
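>
> A rough sketch of what such an option could look like when calling {{write_table}} directly; the {{allow_truncated_timestamps}} flag name is an assumption used for illustration, not something this issue specifies:
> {code:python}
> # Sketch only: assumes a hypothetical allow_truncated_timestamps flag that
> # permits the lossy ns -> ms cast instead of raising ArrowInvalid.
> from datetime import datetime
>
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> dt = datetime(2017, 1, 1, 1, 1, 1, microsecond=1)
> table = pa.Table.from_pandas(pd.DataFrame({'testname': [dt]}))
>
> pq.write_table(table, '/tmp/repro_lossy.parquet',
>                coerce_timestamps='ms',
>                allow_truncated_timestamps=True)
> {code}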



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)