Posted to dev@arrow.apache.org by "Florian Jetter (JIRA)" <ji...@apache.org> on 2018/02/21 17:47:00 UTC

[jira] [Created] (ARROW-2194) Pandas columns metadata incorrect for empty string columns

Florian Jetter created ARROW-2194:
-------------------------------------

             Summary: Pandas columns metadata incorrect for empty string columns
                 Key: ARROW-2194
                 URL: https://issues.apache.org/jira/browse/ARROW-2194
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.8.0
            Reporter: Florian Jetter


The {{pandas_type}} recorded in the schema metadata for {{bytes}} or {{unicode}} columns of an empty pandas DataFrame is unexpectedly {{float64}}:

 
{code}
import numpy as np
import pandas as pd
import pyarrow as pa
import json

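# Build an empty DataFrame with one unicode column and one bytes column.
empty_df = pd.DataFrame({
    'unicode': np.array([], dtype=np.unicode_),
    'bytes': np.array([], dtype=np.bytes_),
})
empty_table = pa.Table.from_pandas(empty_df)

# Inspect the per-column pandas metadata stored in the Arrow schema.
json.loads(empty_table.schema.metadata[b'pandas'])['columns']

# Output (same behavior for input dtype np.unicode_):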
[{u'field_name': u'bytes',
u'metadata': None,
u'name': u'bytes',
u'numpy_type': u'object',
u'pandas_type': u'float64'},
{u'field_name': u'unicode',
u'metadata': None,
u'name': u'unicode',
u'numpy_type': u'object',
u'pandas_type': u'float64'},
{u'field_name': u'__index_level_0__',
u'metadata': None,
u'name': None,
u'numpy_type': u'int64',
u'pandas_type': u'int64'}]
{code}
 

Tested on Debian 8 with Python 2.7 and Python 3.6.4.
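
To make the symptom easier to check, here is a minimal sketch of a helper that scans the pandas metadata JSON and lists columns whose {{numpy_type}} is {{object}} while {{pandas_type}} is {{float64}}, which is the combination shown in the output above. The function name {{find_mismatched_columns}} is hypothetical, and the sample metadata is hand-written to mirror the reported output rather than produced by pyarrow. Treating this combination as a mismatch is an assumption based only on this report.

```python
import json


def find_mismatched_columns(pandas_metadata):
    """Return field names whose numpy_type is 'object' but pandas_type is 'float64'.

    `pandas_metadata` is the JSON document (bytes or str) stored under the
    b'pandas' key of an Arrow schema's metadata. Hypothetical helper; the
    mismatch rule is an assumption taken from the output in this report.
    """
    if isinstance(pandas_metadata, bytes):
        pandas_metadata = pandas_metadata.decode('utf-8')
    columns = json.loads(pandas_metadata)['columns']
    return [
        col['field_name']
        for col in columns
        if col['numpy_type'] == 'object' and col['pandas_type'] == 'float64'
    ]


# Hand-written sample mirroring the reported output (not generated by pyarrow).
sample = json.dumps({'columns': [
    {'field_name': 'bytes', 'metadata': None, 'name': 'bytes',
     'numpy_type': 'object', 'pandas_type': 'float64'},
    {'field_name': 'unicode', 'metadata': None, 'name': 'unicode',
     'numpy_type': 'object', 'pandas_type': 'float64'},
    {'field_name': '__index_level_0__', 'metadata': None, 'name': None,
     'numpy_type': 'int64', 'pandas_type': 'int64'},
]}).encode('utf-8')

print(find_mismatched_columns(sample))
```

With a real table, the same helper could be called as {{find_mismatched_columns(empty_table.schema.metadata[b'pandas'])}}.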



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)