Posted to dev@arrow.apache.org by "Michael Wheeler (Jira)" <ji...@apache.org> on 2019/10/22 17:17:00 UTC

[jira] [Created] (ARROW-6968) [Python] 0.14.1 to 0.15.0 upgrade produces AttributeError

Michael Wheeler created ARROW-6968:
--------------------------------------

             Summary: [Python] 0.14.1 to 0.15.0 upgrade produces AttributeError
                 Key: ARROW-6968
                 URL: https://issues.apache.org/jira/browse/ARROW-6968
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.15.0
         Environment: Python 3.7.4 on macOS Mojave 10.14.6
Python 3.6.7 on Ubuntu 16.04.6 LTS
            Reporter: Michael Wheeler
             Fix For: 0.15.0
         Attachments: attribute_error_pyarrow_0_15_0.py

The code in question:
{code:python}
"""
Reproduce AttributeError with PyArrow == 0.15.0
"""
import io
import logging
import pandas
import pyarrow
import sys
import textwrap

logging.basicConfig(level=logging.DEBUG)
logging.debug(f'Python v{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}')
logging.debug(f'PyArrow v{pyarrow.__version__}' + '\n')

CSV_TEXT = textwrap.dedent("""\
              id,gender,some_date,age
              001,M,01/01/2019,75
              002,F,02/02/2018,32
              003,M,03/03/2017,27
              004,F,04/04/2016,19
              005,M,05/05/2015,55
              006,F,06/06/2014,42
              """)

# Initialize pyarrow table via pandas
mock_file = io.StringIO(CSV_TEXT)
df = pandas.read_csv(mock_file).sort_values(['age', 'gender'])
table = pyarrow.Table.from_pandas(df=df)

# This comprehension generates a map between the name of the column and its index
map_col_names_to_incides = {item.name: table.columns.index(item) for item in table.columns}
logging.debug('The column indices are:')
for name, index in map_col_names_to_incides.items():
    logging.debug(f'Col {name} -> #{index}')
{code}
 

Expected result (generated with 0.14.1):
{code:java}
DEBUG:root:Python v3.7.4
DEBUG:root:PyArrow v0.14.1

DEBUG:root:The column indices are:
DEBUG:root:Col id -> #0
DEBUG:root:Col gender -> #1
DEBUG:root:Col some_date -> #2
DEBUG:root:Col age -> #3
DEBUG:root:Col __index_level_0__ -> #4
{code}
Actual result (generated with 0.15.0):
{code:java}
DEBUG:root:Python v3.7.4
DEBUG:root:PyArrow v0.15.0

Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1758, in <module>
    main()
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1752, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1147, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/mwheeler/Library/Preferences/PyCharm2019.1/scratches/scratch.py", line 31, in <module>
    map_col_names_to_incides = {item.name: table.columns.index(item) for item in table.columns}
  File "/Users/mwheeler/Library/Preferences/PyCharm2019.1/scratches/scratch.py", line 31, in <dictcomp>
    map_col_names_to_incides = {item.name: table.columns.index(item) for item in table.columns}
AttributeError: 'pyarrow.lib.ChunkedArray' object has no attribute 'name'
{code}
 

This error occurs in both environments listed in the Environment field above.


