Posted to issues@arrow.apache.org by "Uwe L. Korn (JIRA)" <ji...@apache.org> on 2018/02/27 14:20:00 UTC

[jira] [Commented] (ARROW-2194) [Python] Pandas columns metadata incorrect for empty string columns

    [ https://issues.apache.org/jira/browse/ARROW-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378634#comment-16378634 ] 

Uwe L. Korn commented on ARROW-2194:
------------------------------------

The behaviour is different on master: the {{pandas_type}} of both columns is now {{empty}} instead of {{float64}}:
{code}
[{'field_name': 'bytes',
'metadata': None,
'name': 'bytes',
'numpy_type': 'object',
'pandas_type': 'empty'},
{'field_name': 'unicode',
'metadata': None,
'name': 'unicode',
'numpy_type': 'object',
'pandas_type': 'empty'},
{'field_name': '__index_level_0__',
'metadata': None,
'name': None,
'numpy_type': 'int64',
'pandas_type': 'int64'}]{code}
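
To compare such metadata between versions without reading the whole dump, a minimal sketch in plain Python may help. The helper names {{pandas_types}} and {{columns_from_schema_metadata}} are hypothetical and not part of pyarrow; the column list is copied from the output above, and the only assumption about the schema metadata is that it is a mapping with a JSON payload under the {{b'pandas'}} key, as in the reproduction script of the issue.

```python
import json

# Column metadata as reported on master (copied from the output above).
columns = [
    {'field_name': 'bytes', 'metadata': None, 'name': 'bytes',
     'numpy_type': 'object', 'pandas_type': 'empty'},
    {'field_name': 'unicode', 'metadata': None, 'name': 'unicode',
     'numpy_type': 'object', 'pandas_type': 'empty'},
    {'field_name': '__index_level_0__', 'metadata': None, 'name': None,
     'numpy_type': 'int64', 'pandas_type': 'int64'},
]


def pandas_types(columns):
    """Hypothetical helper: map each column's field_name to its pandas_type."""
    return {col['field_name']: col['pandas_type'] for col in columns}


def columns_from_schema_metadata(metadata):
    """Hypothetical helper: extract the column list from a metadata mapping.

    Assumes `metadata` holds a JSON document under the b'pandas' key,
    as `empty_table.schema.metadata` does in the issue description.
    """
    return json.loads(metadata[b'pandas'])['columns']


types = pandas_types(columns)
print(types)
```

With a real table, {{pandas_types(columns_from_schema_metadata(empty_table.schema.metadata))}} would give the same kind of summary, so a {{float64}} entry for an empty string column stands out immediately.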

> [Python] Pandas columns metadata incorrect for empty string columns
> -------------------------------------------------------------------
>
>                 Key: ARROW-2194
>                 URL: https://issues.apache.org/jira/browse/ARROW-2194
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>            Reporter: Florian Jetter
>            Priority: Minor
>             Fix For: 0.9.0
>
>
> The {{pandas_type}} for {{bytes}} or {{unicode}} columns of an empty pandas DataFrame is unexpectedly {{float64}}:
>  
> {code}
> import numpy as np
> import pandas as pd
> import pyarrow as pa
> import json
> empty_df = pd.DataFrame({'unicode': np.array([], dtype=np.unicode_), 'bytes': np.array([], dtype=np.bytes_)})
> empty_table = pa.Table.from_pandas(empty_df)
> json.loads(empty_table.schema.metadata[b'pandas'])['columns']
> # Same behavior for input dtype np.unicode_
> [{u'field_name': u'bytes',
> u'metadata': None,
> u'name': u'bytes',
> u'numpy_type': u'object',
> u'pandas_type': u'float64'},
> {u'field_name': u'unicode',
> u'metadata': None,
> u'name': u'unicode',
> u'numpy_type': u'object',
> u'pandas_type': u'float64'},
> {u'field_name': u'__index_level_0__',
> u'metadata': None,
> u'name': None,
> u'numpy_type': u'int64',
> u'pandas_type': u'int64'}]{code}
>  
> Tested on Debian 8 with Python 2.7 and Python 3.6.4.


