Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2018/02/21 18:13:00 UTC

[jira] [Updated] (ARROW-2194) Pandas columns metadata incorrect for empty string columns

     [ https://issues.apache.org/jira/browse/ARROW-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-2194:
--------------------------------
    Fix Version/s: 0.9.0

> Pandas columns metadata incorrect for empty string columns
> ----------------------------------------------------------
>
>                 Key: ARROW-2194
>                 URL: https://issues.apache.org/jira/browse/ARROW-2194
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>            Reporter: Florian Jetter
>            Priority: Minor
>             Fix For: 0.9.0
>
>
> The {{pandas_type}} recorded for {{bytes}} or {{unicode}} columns of an empty pandas DataFrame is unexpectedly {{float64}}, even though the {{numpy_type}} is {{object}}.
>  
> {code}
> import numpy as np
> import pandas as pd
> import pyarrow as pa
> import json
> empty_df = pd.DataFrame({'unicode': np.array([], dtype=np.unicode_), 'bytes': np.array([], dtype=np.bytes_)})
> empty_table = pa.Table.from_pandas(empty_df)
> json.loads(empty_table.schema.metadata[b'pandas'])['columns']
> # Output (the np.bytes_ and np.unicode_ columns behave the same):
> [{u'field_name': u'bytes',
> u'metadata': None,
> u'name': u'bytes',
> u'numpy_type': u'object',
> u'pandas_type': u'float64'},
> {u'field_name': u'unicode',
> u'metadata': None,
> u'name': u'unicode',
> u'numpy_type': u'object',
> u'pandas_type': u'float64'},
> {u'field_name': u'__index_level_0__',
> u'metadata': None,
> u'name': None,
> u'numpy_type': u'int64',
> u'pandas_type': u'int64'}]{code}
>  
> Tested on Debian 8 with Python 2.7 and Python 3.6.4.
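
A minimal sketch of how the mismatch described in the report could be spotted from the schema metadata alone, without pyarrow installed. It runs on a hand-copied subset of the {{columns}} entries from the output in the report. The helper name find_suspicious_columns is hypothetical and is not part of pyarrow or pandas. The rule it applies (numpy_type of object paired with pandas_type of float64) is an assumption drawn only from this report, not a general validity check.

```python
import json

# Subset of the 'columns' metadata copied from the output in the report.
PANDAS_METADATA = json.dumps({
    "columns": [
        {"field_name": "bytes", "metadata": None, "name": "bytes",
         "numpy_type": "object", "pandas_type": "float64"},
        {"field_name": "unicode", "metadata": None, "name": "unicode",
         "numpy_type": "object", "pandas_type": "float64"},
        {"field_name": "__index_level_0__", "metadata": None, "name": None,
         "numpy_type": "int64", "pandas_type": "int64"},
    ]
})


def find_suspicious_columns(pandas_metadata_json):
    """Return field names whose numpy_type is 'object' but pandas_type is 'float64'.

    Hypothetical helper: this pairing is the symptom reported in ARROW-2194.
    """
    columns = json.loads(pandas_metadata_json)["columns"]
    return [
        col["field_name"]
        for col in columns
        if col["numpy_type"] == "object" and col["pandas_type"] == "float64"
    ]


print(find_suspicious_columns(PANDAS_METADATA))
```

Against a real table, the input would presumably be the same value the report reads, empty_table.schema.metadata[b'pandas'].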



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)