Posted to jira@arrow.apache.org by "taichi kato (Jira)" <ji...@apache.org> on 2022/03/19 08:27:00 UTC
[jira] [Created] (ARROW-15977) [Python] Can't ignore the overflow error.
taichi kato created ARROW-15977:
-----------------------------------
Summary: [Python] Can't ignore the overflow error.
Key: ARROW-15977
URL: https://issues.apache.org/jira/browse/ARROW-15977
Project: Apache Arrow
Issue Type: Bug
Components: Python
Reporter: taichi kato
I know that the argument safe=False in pa.Table.from_pandas ignores overflow errors, but it does not ignore overflow inside a list or a struct column.
This works:
{code:java}
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

test_json = [
    {
        "name": "taro",
        "id": 3046682132,
        "points": [2, 2, 2],
        "groups": {
            "group_name": "baseball",
            "group_id": 1234
        }
    },
    {
        "name": "taro",
        "id": 1234,
    }
]

schema = pa.schema([
    pa.field('name', pa.string()),
    pa.field('id', pa.int32()),
    pa.field("points", pa.list_(pa.int32())),
    pa.field('groups', pa.struct([
        pa.field("group_name", pa.string()),
        pa.field("group_id", pa.int32()),
    ])),
])

writer = pq.ParquetWriter('test_schema.parquet', schema=schema)
df = pd.DataFrame(test_json)
table = pa.Table.from_pandas(df, schema=schema, safe=False)
writer.write_table(table)
writer.close()

table = pq.read_table("test_schema.parquet")
print(table)
{code}
{code:java}
name: [["taro","taro"]]
id: [[-1248285164,1234]]
points: [[[2,2,2],null]]
groups: [
  -- is_valid:
    [
      true,
      false
    ]
  -- child 0 type: string
    [
      "baseball",
      null
    ]
  -- child 1 type: int32
    [
      1234,
      null
    ]]
{code}
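For reference, the -1248285164 in the id column is exactly the C-style int32 wraparound of 3046682132, so at the top level safe=False does perform the unsafe truncating cast as expected:

```python
# 3046682132 does not fit in a signed 32-bit integer; an unsafe cast
# wraps it modulo 2**32 back into the signed int32 range.
value = 3046682132
wrapped = (value + 2**31) % 2**32 - 2**31
print(wrapped)  # -1248285164
```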
However, the following two cases do not work. First, with the overflow inside the list:
{code:java}
test_json = [
    {
        "name": "taro",
        "id": 2,
        "points": [2, 3046682132, 2],
        "groups": {
            "group_name": "baseball",
            "group_id": 1234
        }
    },
    {
        "name": "taro",
        "id": 1234,
    }
]
{code}
{code:java}
Traceback (most recent call last):
  File "test_pyarrow.py", line 35, in <module>
    table = pa.Table.from_pandas(df, schema=schema, safe=False)
  File "pyarrow/table.pxi", line 1782, in pyarrow.lib.Table.from_pandas
  File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 594, in dataframe_to_arrays
    arrays = [convert_column(c, f)
  File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 594, in <listcomp>
    arrays = [convert_column(c, f)
  File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 581, in convert_column
    raise e
  File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 575, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ('Value 3046682132 too large to fit in C integer type', 'Conversion failed for column points with type object')
{code}
Second, with the overflow inside the struct:
{code:java}
test_json = [
    {
        "name": "taro",
        "id": 2,
        "points": [2, 2, 2],
        "groups": {
            "group_name": "baseball",
            "group_id": 3046682132
        }
    },
    {
        "name": "taro",
        "id": 1234,
    }
]
{code}
{code:java}
Traceback (most recent call last):
  File "test_pyarrow.py", line 35, in <module>
    table = pa.Table.from_pandas(df, schema=schema, safe=False)
  File "pyarrow/table.pxi", line 1782, in pyarrow.lib.Table.from_pandas
  File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 594, in dataframe_to_arrays
    arrays = [convert_column(c, f)
  File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 594, in <listcomp>
    arrays = [convert_column(c, f)
  File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 581, in convert_column
    raise e
  File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 575, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ('Value 3046682132 too large to fit in C integer type', 'Conversion failed for column groups with type object')
{code}
Could you please fix this bug?
pyarrow==7.0.0
--
This message was sent by Atlassian Jira
(v8.20.1#820001)