Posted to dev@arrow.apache.org by "Iemand (Jira)" <ji...@apache.org> on 2020/03/06 06:34:00 UTC

[jira] [Created] (ARROW-8017) [Python] Pyarrow no support for pathlib Path with table = pa.Table.from_pandas() or pd.to_parquet()

Iemand created ARROW-8017:
-----------------------------

             Summary: [Python] Pyarrow no support for pathlib Path with table = pa.Table.from_pandas() or pd.to_parquet()
                 Key: ARROW-8017
                 URL: https://issues.apache.org/jira/browse/ARROW-8017
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.15.1
         Environment: Conda :
arrow-cpp                 0.15.1           py38h7cd5009_5
numba                     0.48.0           py38h0573a6f_0  
numpy                     1.18.1           py38h4f9e942_0  
numpy-base                1.18.1           py38hde5b4d6_1
pandas                    1.0.1            py38h0573a6f_0
pyarrow                   0.15.1           py38h0573a6f_0
pycparser                 2.19                       py_0
python                    3.8.1                h0371630_1  
python-dateutil           2.8.1                      py_0

            Reporter: Iemand


Trying to store a table whose column contains Python pathlib {{Path}} objects raises an {{ArrowInvalid}} error:

{{ArrowInvalid: ('Could not convert foo/spam.wav with type PosixPath: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column filepath with type object')}}
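The error says Arrow's type inference did not recognize {{PosixPath}}. A {{pathlib.Path}} is an {{os.PathLike}} object, not a {{str}} subclass, and {{os.fspath()}} (PEP 519) returns its plain string form. A minimal standard-library illustration (it does not call pyarrow itself):

{code:python}
import os
from pathlib import Path

p = Path("foo", "spam.wav")

# A Path is path-like but is not a str subclass
print(isinstance(p, str))          # False
print(isinstance(p, os.PathLike))  # True

# os.fspath() returns the underlying string ("foo/spam.wav" on POSIX),
# which pyarrow infers as an Arrow string value
s = os.fspath(p)
print(repr(s))
{code}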

Pandas approach:
{code:python}
from pathlib import Path
import pandas as pd

df_test = pd.DataFrame({"filepath": [Path("foo", "spam.wav")]})
df_test.to_parquet("egg.parquet")  # raises ArrowInvalid
{code}
 

Parquet approach:
{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

# df_test is the DataFrame from the pandas snippet above
table = pa.Table.from_pandas(df_test)  # fails here
# pq.write_table(table, 'egg.parquet')  # not reached; optionally version='2.0'
{code}
 

Full error traceback of {{pa.Table.from_pandas}}:
{code:python}
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-220-bce69439945e> in <module>
      2 import pyarrow.parquet as pq
      3 
----> 4 table = pa.Table.from_pandas(df_test)
      5 pq.write_table(table, 'egg.parquet', version='2.0')

~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()

~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
    552 
    553     if nthreads == 1:
--> 554         arrays = [convert_column(c, f)
    555                   for c, f in zip(columns_to_convert, convert_fields)]
    556     else:

~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py in <listcomp>(.0)
    552 
    553     if nthreads == 1:
--> 554         arrays = [convert_column(c, f)
    555                   for c, f in zip(columns_to_convert, convert_fields)]
    556     else:

~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py in convert_column(col, field)
    544             e.args += ("Conversion failed for column {0!s} with type {1!s}"
    545                        .format(col.name, col.dtype),)
--> 546             raise e
    547         if not field_nullable and result.null_count > 0:
    548             raise ValueError("Field {} was non-nullable but pandas column "

~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py in convert_column(col, field)
    538 
    539         try:
--> 540             result = pa.array(col, type=type_, from_pandas=True, safe=safe)
    541         except (pa.ArrowInvalid,
    542                 pa.ArrowNotImplementedError,

~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/array.pxi in pyarrow.lib.array()

~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: ('Could not convert foo/spam.wav with type PosixPath: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column filepath with type object'){code}
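On the affected versions, one workaround is to convert path objects to strings before building the table. The sketch below assumes the data starts as a plain dict of column lists. The helper names {{paths_to_str}} and {{str_to_paths}} are hypothetical and are not part of pyarrow or pandas:

{code:python}
import os
from pathlib import Path
from typing import Any, Dict, List


def paths_to_str(columns: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Return a copy of the columns with every os.PathLike value replaced by its string form.

    Other values, including None (which Arrow treats as null), are left untouched.
    """
    return {
        name: [os.fspath(v) if isinstance(v, os.PathLike) else v for v in values]
        for name, values in columns.items()
    }


def str_to_paths(values: List[Any]) -> List[Any]:
    """Rebuild Path objects from strings read back from Parquet; None stays None."""
    return [Path(v) if v is not None else None for v in values]


data = {"filepath": [Path("foo", "spam.wav"), None], "label": [1, 2]}
clean = paths_to_str(data)
print(clean)

restored = str_to_paths(clean["filepath"])
print(restored == data["filepath"])  # True
{code}

You can pass the cleaned dict to {{pa.table(clean)}} or {{pd.DataFrame(clean)}}. If you already have a DataFrame and the column has no missing values, run {{df_test["filepath"] = df_test["filepath"].map(os.fspath)}} before calling {{pa.Table.from_pandas}} or {{to_parquet}}. Values read back from Parquet are plain strings, so use the reverse helper to turn them back into paths.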

This might be related to https://issues.apache.org/jira/browse/ARROW-2046, although that issue was about the file save location.


