Posted to dev@arrow.apache.org by "Joachim Haga (Jira)" <ji...@apache.org> on 2019/11/12 07:51:00 UTC

[jira] [Created] (ARROW-7112) Wrong contents when initializing a pyarrow.Table from boolean DataFrame

Joachim Haga created ARROW-7112:
-----------------------------------

             Summary: Wrong contents when initializing a pyarrow.Table from boolean DataFrame
                 Key: ARROW-7112
                 URL: https://issues.apache.org/jira/browse/ARROW-7112
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.14.1
         Environment: Tested with 0.14.1 and 0.14.0.RAY from pip3 on Ubuntu
            Reporter: Joachim Haga


When initializing a Table from a boolean pandas.DataFrame _that is not in Fortran order_, the contents of the resulting Table differ from the contents of the DataFrame.

Sample:

 
{code:python}
import pandas as pd
import pyarrow as pa
import numpy as np
mask = np.full((3,3), False)
mask[:,1] = True
df = pd.DataFrame(mask)
print(df)
print(pa.table(df).to_pandas()) 
{code}
 

The output:

 
{noformat}
       0     1      2
0  False  True  False
1  False  True  False
2  False  True  False
       0      1      2
0  False   True  False
1  False  False  False
2  False  False  False
{noformat}
I.e., column 1 is different before and after roundtripping through pa.Table.

If I add *{{order='F'}}* to the *{{np.full}}* invocation, the result is as expected. Also, the problem seems to disappear if I use {{*dtype=int*}}.
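
For reference, here is a small NumPy-only sketch of what differs between the two layouts. It only inspects the arrays and does not call pyarrow. The idea that the non-contiguous column view is what triggers the wrong result is my assumption, not something I have confirmed in the conversion code. The names {{mask_c}}, {{mask_f}}, {{col_c}} and {{col_f}} are made up for this illustration.

```python
import numpy as np

# Same construction as in the sample: NumPy defaults to C (row-major) order.
mask_c = np.full((3, 3), False)
mask_c[:, 1] = True

# Fortran (column-major) copy holding exactly the same values.
mask_f = np.asfortranarray(mask_c)

# The two arrays compare equal element by element...
print(np.array_equal(mask_c, mask_f))

# ...but they are laid out differently in memory (strides are in bytes).
print(mask_c.flags["C_CONTIGUOUS"], mask_c.strides)
print(mask_f.flags["F_CONTIGUOUS"], mask_f.strides)

# A single column is a strided view in C order and a contiguous
# view in Fortran order.
col_c = mask_c[:, 1]
col_f = mask_f[:, 1]
print(col_c.flags["C_CONTIGUOUS"], col_c.strides)
print(col_f.flags["C_CONTIGUOUS"], col_f.strides)
```

So the values are identical in both cases and only the memory layout of each column differs, which would be consistent with the bug depending on the array order.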

 

 


