Posted to dev@arrow.apache.org by "Rob Ambalu (JIRA)" <ji...@apache.org> on 2018/06/06 16:21:00 UTC
[jira] [Created] (ARROW-2679) pyarrow dataframe streaming to/from
parquet is type-lossy
Rob Ambalu created ARROW-2679:
---------------------------------
Summary: pyarrow dataframe streaming to/from parquet is type-lossy
Key: ARROW-2679
URL: https://issues.apache.org/jira/browse/ARROW-2679
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.9.0
Reporter: Rob Ambalu
While round-tripping a dataframe through pyarrow to a parquet file and back, I noticed that the dtype of my date column changed from "object" (which I would expect to be loaded back as dates) to "datetime":
{code:python}
from datetime import date
import pandas as pd
import pyarrow.parquet as pp
import pyarrow as pa
df = pd.DataFrame( { 'a' : [ date( 2017, 1, 1), date( 2017, 2, 1 ) ] })
table = pa.Table.from_pandas( df )
pp.write_table( table, 'C:\\Temp\\parquet_test')
table2 = pp.read_table( 'C:\\Temp\\parquet_test' )
df2 = table2.to_pandas()
>>> df['a'].dtype
dtype('O')
>>> df2['a'].dtype
dtype('<M8[ns]')
{code}