Posted to issues@arrow.apache.org by "Gil Forsyth (Jira)" <ji...@apache.org> on 2022/08/31 15:26:00 UTC
[jira] [Created] (ARROW-17582) Relax / extend type checking for pyarrow array creation
Gil Forsyth created ARROW-17582:
-----------------------------------
Summary: Relax / extend type checking for pyarrow array creation
Key: ARROW-17582
URL: https://issues.apache.org/jira/browse/ARROW-17582
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Gil Forsyth
In ibis, we're interested in offering query results as a record batch. Some of the data we start with comes back from a `sqlalchemy.cursor` as rows that _look_ like `tuple`s and `dict`s but are actually `sqlalchemy.engine.row.LegacyRow` and `sqlalchemy.engine.row.RowMapping` objects, respectively.
The checks in `python_to_arrow.cc` are strict enough that these rows can't be readily dumped into an `array` without first calling, e.g., `tuple` on each individual row of the results.
{code:java}
In [168]: batch[:5]
Out[168]: [(1, 2173), (1, 943), (1, 892), (1, 30), (1, 337)]
In [169]: pa_schema = pa.struct([("l_orderkey", pa.int32()), ("l_partkey", pa.int32())])
In [170]: pa.array(batch[:5], type=pa_schema)
---------------------------------------------------------------------------
ArrowTypeError Traceback (most recent call last)
Input In [170], in <cell line: 1>()
----> 1 pa.array(batch[:5], type=pa_schema)
File /nix/store/z9qn3g22d8nx1x4mgzq3497iy8ji5h8x-python3-3.10.6-env/lib/python3.10/site-packages/pyarrow/array.pxi:317, in pyarrow.lib.array()
File /nix/store/z9qn3g22d8nx1x4mgzq3497iy8ji5h8x-python3-3.10.6-env/lib/python3.10/site-packages/pyarrow/array.pxi:39, in pyarrow.lib._sequence_to_array()
File /nix/store/z9qn3g22d8nx1x4mgzq3497iy8ji5h8x-python3-3.10.6-env/lib/python3.10/site-packages/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()
File /nix/store/z9qn3g22d8nx1x4mgzq3497iy8ji5h8x-python3-3.10.6-env/lib/python3.10/site-packages/pyarrow/error.pxi:123, in pyarrow.lib.check_status()
ArrowTypeError: Could not convert 1 with type int: was expecting tuple of (key, value) pair
/build/apache-arrow-9.0.0/cpp/src/arrow/python/python_to_arrow.cc:938 GetKeyValuePair(items, i)
/build/apache-arrow-9.0.0/cpp/src/arrow/python/python_to_arrow.cc:1010 InferKeyKind(items)
/build/apache-arrow-9.0.0/cpp/src/arrow/python/iterators.h:73 func(value, static_cast<int64_t>(i), &keep_going)
/build/apache-arrow-9.0.0/cpp/src/arrow/python/python_to_arrow.cc:1182 converter->Extend(seq, size)
{code}
vs
{code:java}
In [171]: pa.array(map(tuple, batch[:5]), type=pa_schema)
Out[171]:
<pyarrow.lib.StructArray object at 0x7fd4fb52d660>
-- is_valid: all not null
-- child 0 type: int32
[
1,
1,
1,
1,
1
]
-- child 1 type: int32
[
2173,
943,
892,
30,
337
]{code}
To avoid the overhead of this extra conversion, maybe there are some checks other than explicit Python type checks that we could rely on?
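As a rough illustration of what a non-type-based check might look like, here is a pure-Python sketch that classifies rows by protocol (`collections.abc.Sequence` / `collections.abc.Mapping`) instead of by concrete type. This is not pyarrow's implementation, which lives in C++. `FakeRow`, `FakeRowMapping`, and `classify_row` are hypothetical names made up for this example, and it assumes the SQLAlchemy row classes behave like `Sequence` and `Mapping` implementations.

```python
from collections.abc import Mapping, Sequence


class FakeRow(Sequence):
    """Hypothetical stand-in for a tuple-like row that is NOT a tuple subclass."""

    def __init__(self, *values):
        self._values = values

    def __getitem__(self, index):
        return self._values[index]

    def __len__(self):
        return len(self._values)


class FakeRowMapping(Mapping):
    """Hypothetical stand-in for a dict-like row that is NOT a dict subclass."""

    def __init__(self, **items):
        self._items = items

    def __getitem__(self, key):
        return self._items[key]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


def classify_row(row):
    """Classify a row by the protocol it implements, not its concrete type."""
    # str/bytes are Sequences too, so rule them out before the Sequence test.
    if isinstance(row, (str, bytes, bytearray)):
        return "scalar"
    if isinstance(row, Mapping):
        return "mapping"
    if isinstance(row, Sequence):
        return "sequence"
    return "scalar"


row = FakeRow(1, 2173)
mapping = FakeRowMapping(l_orderkey=1, l_partkey=2173)

# Concrete-type checks reject the look-alikes...
strict_tuple_check = isinstance(row, tuple)
strict_dict_check = isinstance(mapping, dict)

# ...while protocol checks accept them without an explicit tuple()/dict() copy.
row_kind = classify_row(row)
mapping_kind = classify_row(mapping)
```

Whether an ABC-style check is cheap enough for the conversion hot path is an open question; the sketch only shows that such a check would recognize tuple-like and dict-like rows that concrete-type checks reject.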
--
This message was sent by Atlassian Jira
(v8.20.10#820010)