Posted to dev@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2016/12/20 16:54:58 UTC
[jira] [Closed] (ARROW-436) [Python] pandas-parquet roundtrip dtype mismatch
[ https://issues.apache.org/jira/browse/ARROW-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney closed ARROW-436.
------------------------------
Resolution: Not A Bug
This is a type fidelity issue with the Parquet 1.0 format: because we don't have logical types for specific integer widths, the uint32 column comes back as int64.
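A minimal stdlib-only sketch of why a uint32 value needs a wider signed type when no unsigned annotation is available. It assumes, as an illustration and not as something stated in this thread, that the only integer widths on offer are signed 32-bit and signed 64-bit. The names UINT32_MAX, as_int32 and as_int64 are made up for this example and are not part of pyarrow.

```python
import struct

# Largest value a uint32 column can hold.
UINT32_MAX = 2**32 - 1

# Reinterpreting the same 4 bytes as a signed 32-bit integer changes the value.
as_int32 = struct.unpack("<i", struct.pack("<I", UINT32_MAX))[0]

# Widening to a signed 64-bit integer keeps the value intact.
as_int64 = struct.unpack("<q", struct.pack("<q", UINT32_MAX))[0]

print(as_int32)  # -1
print(as_int64)  # 4294967295
```

Under that assumption, reading the column back as int64 preserves every value and only the dtype label differs, which is consistent with the uint32 to int64 change in the report.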
> [Python] pandas-parquet roundtrip dtype mismatch
> ------------------------------------------------
>
> Key: ARROW-436
> URL: https://issues.apache.org/jira/browse/ARROW-436
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Wes McKinney
>
> As a follow up to ARROW-434, I observed the following odd failure:
> {code}
> @parquet
> def test_pandas_parquet_pyfile_failure(tmpdir):
>     filename = tmpdir.join('pandas_pyfile_roundtrip.parquet').strpath
>     size = 5
>     np.random.seed(0)
>     df = pd.DataFrame({
>         'uint8': np.arange(size, dtype=np.uint8),
>         'uint16': np.arange(size, dtype=np.uint16),
>         'uint32': np.arange(size, dtype=np.uint32),
>         'uint64': np.arange(size, dtype=np.uint64),
>         'int8': np.arange(size, dtype=np.int16),
>         'int16': np.arange(size, dtype=np.int16),
>         'int32': np.arange(size, dtype=np.int32),
>         'int64': np.arange(size, dtype=np.int64),
>         'float32': np.arange(size, dtype=np.float32),
>         'float64': np.arange(size, dtype=np.float64),
>         'bool': np.random.randn(size) > 0
>     })
>     arrow_table = A.from_pandas_dataframe(df)
>     with open(filename, 'wb') as f:
>         A.parquet.write_table(arrow_table, f, version="1.0")
>     data = io.BytesIO(open(filename, 'rb').read())
>     table_read = pq.read_table(data)
>     df_read = table_read.to_pandas()
>     pdt.assert_frame_equal(df, df_read)
> {code}
> I see debugging locally:
> {code}
> (Pdb) df.dtypes
> bool          bool
> float32    float32
> float64    float64
> int16        int16
> int32        int32
> int64        int64
> int8         int16
> uint16      uint16
> uint32      uint32
> uint64      uint64
> uint8        uint8
> dtype: object
> (Pdb) df_read.dtypes
> bool          bool
> float32    float32
> float64    float64
> int16        int16
> int32        int32
> int64        int64
> int8         int16
> uint16      uint16
> uint32       int64
> uint64      uint64
> uint8        uint8
> dtype: object
> {code}
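The two dtype listings above differ in a single column, which is easy to miss by eye. A small sketch of a helper that reports only the columns whose dtype changed; dtype_mismatches is a hypothetical name for this example, and the dictionaries are copied by hand from the pdb output rather than produced by pandas.

```python
# dtypes of the DataFrame before writing, copied from the pdb session.
written = {
    "bool": "bool", "float32": "float32", "float64": "float64",
    "int16": "int16", "int32": "int32", "int64": "int64", "int8": "int16",
    "uint16": "uint16", "uint32": "uint32", "uint64": "uint64",
    "uint8": "uint8",
}

# dtypes after the Parquet roundtrip: identical except for uint32.
read_back = dict(written, uint32="int64")


def dtype_mismatches(expected, actual):
    """Return {column: (expected dtype, actual dtype)} for differing columns."""
    return {
        col: (dtype, actual.get(col))
        for col, dtype in expected.items()
        if actual.get(col) != dtype
    }


print(dtype_mismatches(written, read_back))
# {'uint32': ('uint32', 'int64')}
```

With real DataFrames, the same dictionaries could be built with something like {c: str(t) for c, t in df.dtypes.items()}, assuming pandas is available.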