Posted to dev@arrow.apache.org by "Rob Ambalu (JIRA)" <ji...@apache.org> on 2018/06/06 16:21:00 UTC

[jira] [Created] (ARROW-2679) pyarrow dataframe streaming to/from parquet is type-lossy

Rob Ambalu created ARROW-2679:
---------------------------------

             Summary: pyarrow dataframe streaming to/from parquet is type-lossy
                 Key: ARROW-2679
                 URL: https://issues.apache.org/jira/browse/ARROW-2679
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.9.0
            Reporter: Rob Ambalu


While round-tripping a DataFrame through pyarrow to a Parquet file and back, I noticed that my date column changed dtype from "object" (holding datetime.date values, which is what I would expect to get back) to "datetime64[ns]":

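{code:python}
from datetime import date
import pandas as pd
import pyarrow.parquet as pp
import pyarrow as pa

# A column of datetime.date objects is held by pandas with dtype object.
df = pd.DataFrame( { 'a' : [ date( 2017, 1, 1 ), date( 2017, 2, 1 ) ] } )
table = pa.Table.from_pandas( df )

# Write the table to a Parquet file and read it back.
pp.write_table( table, 'C:\\Temp\\parquet_test' )
table2 = pp.read_table( 'C:\\Temp\\parquet_test' )
df2 = table2.to_pandas()

# Original dtype vs. dtype after the round trip ('<M8[ns]' is datetime64[ns]):
>>> df['a'].dtype
dtype('O')
>>> df2['a'].dtype
dtype('<M8[ns]')
{code}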
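To narrow down where the type changes, here is a minimal sketch that prints the Arrow type at each step. It assumes a recent pyarrow and pandas rather than 0.9.0, and it swaps the Windows file path for an in-memory buffer (`pa.BufferOutputStream` / `pa.BufferReader`) so it runs anywhere. The column is inferred as Arrow `date32` and still reads back as `date32` from Parquet, which suggests the conversion to datetime64 happens in `Table.to_pandas()`. The `date_as_object` argument used below may not exist in 0.9.0; treat its availability as an assumption about the installed version. The last two lines are a version-independent workaround on the pandas side.

{code:python}
from datetime import date

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pp

df = pd.DataFrame({'a': [date(2017, 1, 1), date(2017, 2, 1)]})
table = pa.Table.from_pandas(df)
# Arrow infers a 32-bit date type for the datetime.date column.
print(table.column('a').type)

# Round-trip through an in-memory Parquet buffer instead of a file on disk.
sink = pa.BufferOutputStream()
pp.write_table(table, sink)
table2 = pp.read_table(pa.BufferReader(sink.getvalue()))
# The Arrow type read back from Parquet is still date32.
print(table2.column('a').type)

# Assumes a pyarrow version whose to_pandas() accepts date_as_object.
df2 = table2.to_pandas(date_as_object=True)
print(df2['a'].dtype)  # object, holding datetime.date values

# Version-independent fallback: normalize to datetime64, then back to dates.
df3 = table2.to_pandas()
df3['a'] = pd.to_datetime(df3['a']).dt.date
{code}

The fallback calls pd.to_datetime first because, depending on the pyarrow version, the column may arrive either as datetime64 or already as date objects; the .dt accessor only works on the former.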


