Posted to dev@arrow.apache.org by "Joachim Haga (Jira)" <ji...@apache.org> on 2019/11/12 07:51:00 UTC
[jira] [Created] (ARROW-7112) Wrong contents when initializing a
pyarrow.Table from boolean DataFrame
Joachim Haga created ARROW-7112:
-----------------------------------
Summary: Wrong contents when initializing a pyarrow.Table from boolean DataFrame
Key: ARROW-7112
URL: https://issues.apache.org/jira/browse/ARROW-7112
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.14.1
Environment: Tested with 0.14.1 and 0.14.0.RAY from pip3 on Ubuntu
Reporter: Joachim Haga
When initializing a Table from a boolean pandas.DataFrame _that is not in Fortran order_, the contents of the resulting Table differ from the contents of the DataFrame.
Sample:
{code:python}
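import numpy as np
import pandas as pd
import pyarrow as pa

# 3x3 boolean array in NumPy's default C (row-major) order
mask = np.full((3, 3), False)
mask[:, 1] = True  # set the whole middle column to True

df = pd.DataFrame(mask)
print(df)  # original DataFrame
print(pa.table(df).to_pandas())  # after a round trip through pyarrow.Table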
{code}
The output:
{noformat}
       0     1      2
0  False  True  False
1  False  True  False
2  False  True  False
       0      1      2
0  False   True  False
1  False  False  False
2  False  False  False
{noformat}
I.e., column 1 is different before and after roundtripping through pa.Table.
If I add *{{order='F'}}* to the *{{np.full}}* invocation, the result is as expected. Also, the problem seems to disappear if I use {{*dtype=int*}}.
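The layout difference between the failing and the working case can be shown with NumPy alone. This is a minimal sketch, not a diagnosis of the bug. It assumes that {{np.asfortranarray}} produces the same layout as passing {{order='F'}} to {{np.full}}, and the names {{mask_c}}, {{mask_f}}, {{col_c}} and {{col_f}} are introduced here for illustration only.

```python
import numpy as np

# Same mask as in the sample: C order (row-major) is NumPy's default.
mask_c = np.full((3, 3), False)
mask_c[:, 1] = True

# Fortran-order (column-major) copy holding identical values.
# Assumption: this mirrors the report's order='F' variant.
mask_f = np.asfortranarray(mask_c)

# Take the middle column from each array.
col_c = mask_c[:, 1]
col_f = mask_f[:, 1]

# In C order, consecutive elements of a column are one full row apart in
# memory (stride of 3 bytes for 3 one-byte booleans per row), so the column
# view is not contiguous. In Fortran order they are adjacent (stride 1).
print(col_c.strides, col_c.flags["C_CONTIGUOUS"])  # (3,) False
print(col_f.strides, col_f.flags["C_CONTIGUOUS"])  # (1,) True

# The values themselves are the same in both layouts.
print(np.array_equal(mask_c, mask_f))  # True
```

If the conversion mishandles strided boolean columns, that would be consistent with both observations above (Fortran order works, {{dtype=int}} works), but this is a guess based only on the symptoms in this report.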