Posted to dev@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2016/12/20 16:38:58 UTC
[jira] [Created] (ARROW-436) [Python] pandas-parquet roundtrip dtype mismatch
Wes McKinney created ARROW-436:
----------------------------------
Summary: [Python] pandas-parquet roundtrip dtype mismatch
Key: ARROW-436
URL: https://issues.apache.org/jira/browse/ARROW-436
Project: Apache Arrow
Issue Type: Bug
Components: Python
Reporter: Wes McKinney
As a follow-up to ARROW-434, I observed the following odd failure:
{code}
import io

import numpy as np
import pandas as pd
import pandas.util.testing as pdt
import pytest

import pyarrow as A
import pyarrow.parquet as pq

# Assumed here: a pytest marker tagging Parquet tests, as in the pyarrow test module
parquet = pytest.mark.parquet


@parquet
def test_pandas_parquet_pyfile_failure(tmpdir):
    filename = tmpdir.join('pandas_pyfile_roundtrip.parquet').strpath
    size = 5
    np.random.seed(0)
    df = pd.DataFrame({
        'uint8': np.arange(size, dtype=np.uint8),
        'uint16': np.arange(size, dtype=np.uint16),
        'uint32': np.arange(size, dtype=np.uint32),
        'uint64': np.arange(size, dtype=np.uint64),
        # NB: created with dtype=np.int16, so this column is int16, not int8
        'int8': np.arange(size, dtype=np.int16),
        'int16': np.arange(size, dtype=np.int16),
        'int32': np.arange(size, dtype=np.int32),
        'int64': np.arange(size, dtype=np.int64),
        'float32': np.arange(size, dtype=np.float32),
        'float64': np.arange(size, dtype=np.float64),
        'bool': np.random.randn(size) > 0
    })
    # Convert to an Arrow table and write it out as Parquet format version 1.0
    arrow_table = A.from_pandas_dataframe(df)
    with open(filename, 'wb') as f:
        pq.write_table(arrow_table, f, version="1.0")
    # Read the file back through an in-memory buffer
    with open(filename, 'rb') as f:
        data = io.BytesIO(f.read())
    table_read = pq.read_table(data)
    df_read = table_read.to_pandas()
    pdt.assert_frame_equal(df, df_read)
{code}
Debugging locally, I see:
{code}
(Pdb) df.dtypes
bool          bool
float32    float32
float64    float64
int16        int16
int32        int32
int64        int64
int8         int16
uint16      uint16
uint32      uint32
uint64      uint64
uint8        uint8
dtype: object
(Pdb) df_read.dtypes
bool          bool
float32    float32
float64    float64
int16        int16
int32        int32
int64        int64
int8         int16
uint16      uint16
uint32       int64
uint64      uint64
uint8        uint8
dtype: object
{code}
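The two dtype listings differ in a single column. A minimal sketch for spotting such differences mechanically, in plain Python, with the dtype names copied from the Pdb output above (with real DataFrames one could build the mappings with df.dtypes.astype(str).to_dict()). The helpers int_bounds and dtype_mismatches are hypothetical, not part of pandas or pyarrow, and the "lossless" flag only compares the value ranges of NumPy-style integer dtypes.

{code}
import re


def int_bounds(dtype_name):
    """Return (min, max) for an integer dtype name like 'int16' or 'uint32',
    or None for anything else (floats, bool, ...)."""
    m = re.fullmatch(r"(u?)int(8|16|32|64)", dtype_name)
    if m is None:
        return None
    bits = int(m.group(2))
    if m.group(1):  # unsigned
        return 0, (1 << bits) - 1
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def dtype_mismatches(expected, actual):
    """Compare two column -> dtype-name mappings.

    Returns {column: (expected_dtype, actual_dtype, lossless)} for every
    column whose dtype differs. lossless is True/False when both dtypes are
    integers (can the actual dtype hold every value of the expected one?),
    and None otherwise."""
    out = {}
    for col in sorted(set(expected) | set(actual)):
        exp, act = expected.get(col), actual.get(col)
        if exp == act:
            continue
        src = int_bounds(exp) if exp else None
        dst = int_bounds(act) if act else None
        lossless = None
        if src and dst:
            lossless = dst[0] <= src[0] and src[1] <= dst[1]
        out[col] = (exp, act, lossless)
    return out


# Copied verbatim from the Pdb output ('int8' really is int16 in the test)
before = {"bool": "bool", "float32": "float32", "float64": "float64",
          "int16": "int16", "int32": "int32", "int64": "int64",
          "int8": "int16", "uint16": "uint16", "uint32": "uint32",
          "uint64": "uint64", "uint8": "uint8"}
after = dict(before, uint32="int64")

print(dtype_mismatches(before, after))
# {'uint32': ('uint32', 'int64', True)}
{code}

For this report it flags only uint32, read back as int64, and marks the change as value-preserving, since every value in [0, 2**32 - 1] fits in int64.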
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)