Posted to jira@arrow.apache.org by "ARF (Jira)" <ji...@apache.org> on 2021/02/28 10:09:00 UTC
[jira] [Commented] (ARROW-9369) [Python] Support conversion from python sequence to dictionary type
[ https://issues.apache.org/jira/browse/ARROW-9369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292362#comment-17292362 ]
ARF commented on ARROW-9369:
----------------------------
I think this issue has been fixed and can be closed:
{code:python}
pa.array(['a', 'b', 'a'], pa.dictionary(pa.int32(), pa.string()))
{code}
Output:
{code:none}
<pyarrow.lib.DictionaryArray object at 0x00000172FAABEBA0>
-- dictionary:
[
"a",
"b"
]
-- indices:
[
0,
1,
0
]
{code}
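For readers unfamiliar with the output above, here is a minimal pure-Python sketch of what dictionary encoding produces for this input. It is only an illustration, not pyarrow's implementation; the helper name {{dictionary_encode}} below is hypothetical, and the sketch assumes hashable values and ignores nulls.

```python
def dictionary_encode(values):
    """Return (dictionary, indices) for a sequence of hashable values.

    Illustrative sketch only: not pyarrow's implementation, no null handling.
    """
    dictionary = []   # unique values, in order of first appearance
    positions = {}    # maps each value to its index in `dictionary`
    indices = []      # one index per input element
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


dictionary, indices = dictionary_encode(['a', 'b', 'a'])
print(dictionary)  # ['a', 'b']
print(indices)     # [0, 1, 0]
```

The two printed lists correspond to the "dictionary" and "indices" sections of the DictionaryArray shown above.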
> [Python] Support conversion from python sequence to dictionary type
> -------------------------------------------------------------------
>
> Key: ARROW-9369
> URL: https://issues.apache.org/jira/browse/ARROW-9369
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.17.1
> Reporter: Tomas Remes
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Converting from a Python sequence with a specified target type isn't implemented yet for the dictionary type:
> {code}
> In [1]: pa.array(['a', 'b', 'a'], pa.dictionary(pa.int32(), pa.string()))
> ---------------------------------------------------------------------------
> ArrowNotImplementedError Traceback (most recent call last)
> <ipython-input-1-bda8628a4917> in <module>
> ----> 1 pa.array(['a', 'b', 'a'], pa.dictionary(pa.int32(), pa.string()))
> ~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()
> ~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()
> ~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: Sequence converter for type dictionary<values=string, indices=int32, ordered=0> not implemented
> {code}
> -----
> _Original report_
> Hello, I am trying to do the following (please correct me if I am doing something nonsensical):
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> fields = [pa.field("object", pa.dictionary(pa.int64(), pa.string()))]
> data = {"object": {
>     "a": "a",
>     "b": "b",
>     "c": "c",
>     "s": "d"}}
> df = pd.DataFrame(data)
> table = pa.Table.from_pandas(df, pa.schema(fields))
> pq.write_table(table, "test.parquet")
> {code}
> and I am getting:
> {noformat}
> Traceback (most recent call last):
> File "pa_test.py", line 17, in <module>
> table = pa.Table.from_pandas(df, pa.schema(fields))
> File "pyarrow/table.pxi", line 1451, in pyarrow.lib.Table.from_pandas
> File "/home/tremes/GITHUB/data-pipeline/venv/lib64/python3.7/site-packages/pyarrow/pandas_compat.py", line 575, in dataframe_to_arrays
> for c, f in zip(columns_to_convert, convert_fields)]
> File "/home/tremes/GITHUB/data-pipeline/venv/lib64/python3.7/site-packages/pyarrow/pandas_compat.py", line 575, in <listcomp>
> for c, f in zip(columns_to_convert, convert_fields)]
> File "/home/tremes/GITHUB/data-pipeline/venv/lib64/python3.7/site-packages/pyarrow/pandas_compat.py", line 566, in convert_column
> raise e
> File "/home/tremes/GITHUB/data-pipeline/venv/lib64/python3.7/site-packages/pyarrow/pandas_compat.py", line 560, in convert_column
> result = pa.array(col, type=type_, from_pandas=True, safe=safe)
> File "pyarrow/array.pxi", line 265, in pyarrow.lib.array
> File "pyarrow/array.pxi", line 80, in pyarrow.lib._ndarray_to_array
> File "pyarrow/error.pxi", line 106, in pyarrow.lib.check_status
> pyarrow.lib.ArrowNotImplementedError: ('Sequence converter for type dictionary<values=string, indices=int64, ordered=0> not implemented', 'Conversion failed for column object with type object')
> {noformat}
> A workaround is to use {{df.to_parquet("test.parquet")}}.