
Posted to issues@arrow.apache.org by "Dave Challis (JIRA)" <ji...@apache.org> on 2018/04/09 15:59:00 UTC

[jira] [Created] (ARROW-2429) [Python] Timestamp unit in schema changes when writing to Parquet file then reading back

Dave Challis created ARROW-2429:
-----------------------------------

             Summary: [Python] Timestamp unit in schema changes when writing to Parquet file then reading back
                 Key: ARROW-2429
                 URL: https://issues.apache.org/jira/browse/ARROW-2429
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.9.0
         Environment: Mac OS High Sierra
PyArrow 0.9.0 (py36_1)
Python
            Reporter: Dave Challis


When creating an Arrow table from a pandas DataFrame, the table schema contains a field of type `timestamp[ns]`.

When that table is serialised to a Parquet file and then immediately read back, the schema of the resulting table contains a field of type `timestamp[us]` instead.

{code:python}
#!/usr/bin/env python

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

# create DataFrame with a datetime column
df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
df['created'] = pd.to_datetime(df['created'])

# create Arrow table from DataFrame
table = pa.Table.from_pandas(df, preserve_index=False)

# write the table as a parquet file, then read it back again
pq.write_table(table, 'foo.parquet')
table2 = pq.read_table('foo.parquet')
print(table.schema[0])  # pyarrow.Field<created: timestamp[ns]> (nanosecond units)
print(table2.schema[0]) # pyarrow.Field<created: timestamp[us]> (microsecond units)
{code}
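
The unit change can also be checked programmatically instead of by reading the printed fields. This is a minimal sketch under stated assumptions, not part of the original report: it assumes a recent pyarrow release (it uses `pa.table`, which 0.9.0 does not provide), `timestamp_units` is a hypothetical helper rather than a pyarrow API, and `coerce_timestamps='us'` is passed explicitly to force the microsecond storage seen above, since newer pyarrow releases may keep nanoseconds by default.

{code:python}
import os
import tempfile
from datetime import datetime

import pyarrow as pa
import pyarrow.parquet as pq


def timestamp_units(schema):
    """Hypothetical helper: map each timestamp field name to its unit ('s', 'ms', 'us' or 'ns')."""
    return {
        field.name: field.type.unit
        for field in schema
        if pa.types.is_timestamp(field.type)
    }


# One-row table with an explicit nanosecond timestamp column (no pandas involved)
table = pa.table({
    'created': pa.array([datetime(2018, 4, 4, 10, 14, 14)], type=pa.timestamp('ns')),
})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'foo.parquet')
    # Assumption: force microsecond storage so the unit change appears on any
    # recent pyarrow version, not only on versions that coerce by default
    pq.write_table(table, path, coerce_timestamps='us')
    table2 = pq.read_table(path)

before = timestamp_units(table.schema)
after = timestamp_units(table2.schema)

# Fields whose timestamp unit differs after the Parquet round trip
changed = {
    name: (unit, after.get(name))
    for name, unit in before.items()
    if unit != after.get(name)
}
print(changed)  # {'created': ('ns', 'us')}
{code}

Any field whose unit differs between the written and the read-back schema appears in `changed`; an empty dict means the round trip preserved every timestamp unit.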


