Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2018/12/04 01:42:00 UTC

[jira] [Commented] (ARROW-3907) [Python] from_pandas errors when schemas are used with lower resolution timestamps

    [ https://issues.apache.org/jira/browse/ARROW-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708059#comment-16708059 ] 

Wes McKinney commented on ARROW-3907:
-------------------------------------

ETL can be a messy business. If you have ideas about improving the APIs for schema coercion / casting, I'd be interested in discussing them further.
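
One possible in-memory workaround, sketched here rather than taken from the thread: convert at pandas' native nanosecond resolution first, then cast down with safe=False, which permits the lossy ns -> ms truncation that from_pandas rejects. The sample DataFrame values are illustrative, and this assumes a pyarrow version in which Table.cast is available.

{code:python}
import pandas as pd
import pyarrow as pa

# Illustrative DataFrame; pandas stores datetimes at ns resolution.
df = pd.DataFrame({
    'Id': ['a'],
    'modified': pd.to_datetime(['2018-07-19 15:06:31.753713']),
    'records': [1],
})

processed_schema = pa.schema([
    pa.field('Id', pa.string()),
    pa.field('modified', pa.timestamp('ms')),
    pa.field('records', pa.int32())
])

# Convert without a schema (keeps timestamp[ns]), then cast the whole
# table; safe=False allows the truncation that would otherwise raise
# ArrowInvalid.
table = pa.Table.from_pandas(df, preserve_index=False)
table = table.cast(processed_schema, safe=False)
{code}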

> [Python] from_pandas errors when schemas are used with lower resolution timestamps
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-3907
>                 URL: https://issues.apache.org/jira/browse/ARROW-3907
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.11.1
>            Reporter: David Lee
>            Priority: Major
>             Fix For: 0.11.1
>
>
> When passing a schema object to from_pandas(), a resolution error occurs if the schema uses a lower-resolution timestamp than the DataFrame. Do we also need to add the "coerce_timestamps" and "allow_truncated_timestamps" parameters found in write_table() to from_pandas()? (A sketch of the write_table() coercion appears after this quoted report.)
> Error:
> pyarrow.lib.ArrowInvalid: ('Casting from timestamp[ns] to timestamp[ms] would lose data: 1532015191753713000', 'Conversion failed for column modified with type datetime64[ns]')
> Code:
>  
> {code:python}
> import pyarrow as pa
>
> # df is a pandas DataFrame whose 'modified' column holds
> # nanosecond-resolution timestamps.
> processed_schema = pa.schema([
>     pa.field('Id', pa.string()),
>     pa.field('modified', pa.timestamp('ms')),
>     pa.field('records', pa.int32())
> ])
> pa.Table.from_pandas(df, schema=processed_schema, preserve_index=False)
> {code}
>  
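
For reference, a minimal sketch of the write_table() coercion the report mentions, assuming a pandas DataFrame like the one above and a hypothetical output path; coerce_timestamps and allow_truncated_timestamps are existing write_table() parameters:

{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Illustrative DataFrame with ns-resolution timestamps.
df = pd.DataFrame({
    'Id': ['a'],
    'modified': pd.to_datetime(['2018-07-19 15:06:31.753713']),
    'records': [1],
})

# Build the table at full resolution, then truncate on write.
table = pa.Table.from_pandas(df, preserve_index=False)
pq.write_table(
    table,
    'processed.parquet',              # hypothetical path
    coerce_timestamps='ms',           # write timestamps at ms resolution
    allow_truncated_timestamps=True,  # tolerate the lossy ns -> ms cast
)
{code}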


