Posted to dev@arrow.apache.org by "Iemand (Jira)" <ji...@apache.org> on 2020/03/06 06:34:00 UTC
[jira] [Created] (ARROW-8017) [Python] Pyarrow no support for
pathlib Path with table = pa.Table.from_pandas() or pd.to_parquet()
Iemand created ARROW-8017:
-----------------------------
Summary: [Python] Pyarrow no support for pathlib Path with table = pa.Table.from_pandas() or pd.to_parquet()
Key: ARROW-8017
URL: https://issues.apache.org/jira/browse/ARROW-8017
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.15.1
Environment: Conda :
arrow-cpp 0.15.1 py38h7cd5009_5
numba 0.48.0 py38h0573a6f_0
numpy 1.18.1 py38h4f9e942_0
numpy-base 1.18.1 py38hde5b4d6_1
pandas 1.0.1 py38h0573a6f_0
pyarrow 0.15.1 py38h0573a6f_0
pycparser 2.19 py_0
python 3.8.1 h0371630_1
python-dateutil 2.8.1 py_0
Reporter: Iemand
Trying to store a table that has a column of Python pathlib {{Path}} objects raises an {{ArrowInvalid}}:
{{ArrowInvalid: ('Could not convert foo/spam.wav with type PosixPath: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column filepath with type object')}}
{{Pandas approach:}}
{code:python}
from pathlib import Path

import pandas as pd

df_test = pd.DataFrame({"filepath": [Path("foo", "spam.wav")]})
df_test.to_parquet("egg.parquet")  # raises ArrowInvalid
{code}
{{Parquet approach}}
{code:python}
import pyarrow as pa
import pyarrow.parquet as pq
table = pa.Table.from_pandas(df_test) # fails here
# pq.write_table(table, 'egg.parquet') # , version='2.0'
{code}
Full error traceback of {{pa.Table.from_pandas}}:
{code:python}
---------------------------------------------------------------------------
ArrowInvalid Traceback (most recent call last)
<ipython-input-220-bce69439945e> in <module>
2 import pyarrow.parquet as pq
3
----> 4 table = pa.Table.from_pandas(df_test)
5 pq.write_table(table, 'egg.parquet', version='2.0')
~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()
~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
552
553 if nthreads == 1:
--> 554 arrays = [convert_column(c, f)
555 for c, f in zip(columns_to_convert, convert_fields)]
556 else:
~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py in <listcomp>(.0)
552
553 if nthreads == 1:
--> 554 arrays = [convert_column(c, f)
555 for c, f in zip(columns_to_convert, convert_fields)]
556 else:
~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py in convert_column(col, field)
544 e.args += ("Conversion failed for column {0!s} with type {1!s}"
545 .format(col.name, col.dtype),)
--> 546 raise e
547 if not field_nullable and result.null_count > 0:
548 raise ValueError("Field {} was non-nullable but pandas column "
~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py in convert_column(col, field)
538
539 try:
--> 540 result = pa.array(col, type=type_, from_pandas=True, safe=safe)
541 except (pa.ArrowInvalid,
542 pa.ArrowNotImplementedError,
~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/array.pxi in pyarrow.lib.array()
~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowInvalid: ('Could not convert foo/spam.wav with type PosixPath: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column filepath with type object'){code}
This might be related to https://issues.apache.org/jira/browse/ARROW-2046 , although that issue was about passing a Path as the file save location rather than storing Path values in a column.