Posted to jira@arrow.apache.org by "Ryan Weisman (Jira)" <ji...@apache.org> on 2022/09/23 20:07:00 UTC
[jira] [Updated] (ARROW-17835) pyarrow.json.read_json ignores nullable=True on fields with non-nullable subfields in explicit_schema parse_options
[ https://issues.apache.org/jira/browse/ARROW-17835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Weisman updated ARROW-17835:
---------------------------------
Description:
*Summary:*
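When a nullable struct field contains a non-nullable subfield, the JSON parser seems to ignore the "nullable" flag on the parent field and rejects input in which the parent itself is null or absent.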
This behavior may be related to ARROW-16603, but that issue covers the opposite case - failing to include the "not null" constraints in the schema of the output table.
*Reproducible example:*
{code:python}
import io
import pyarrow as pa
import pyarrow.json  # ensures the pa.json submodule is loaded
interior_struct = pa.struct([
    pa.field(name='some_information', type=pa.int64(), nullable=False)
])
sample_schema = pa.schema([
    pa.field(name='my_struct', type=interior_struct, nullable=True)
])
# encode an empty JSON object -
# the issue persists regardless of the input JSON,
# so long as there is no field named "my_struct" in the input
sample_json_bytes = io.BytesIO('{}'.encode())
table = pa.json.read_json(
    input_file=sample_json_bytes,
    parse_options=pa.json.ParseOptions(explicit_schema=sample_schema)
)
print(table)
{code}
*Expected output:*
Table containing one column, with name "my_struct" and type "interior_struct", whose value is null.
*Actual output:*
{code:java}
ArrowInvalid: JSON parse error: a required field was null{code}
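To make the expected semantics concrete, here is a minimal pure-Python sketch of how nullability would ideally be checked for nested fields. It is only an illustrative model and not pyarrow's implementation: the names Field and is_valid are hypothetical, and plain dicts stand in for parsed JSON objects. The point it shows is that a null parent should short-circuit the check, so a non-nullable child only matters when the parent is actually present.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Field:
    """Hypothetical stand-in for a schema field (not a pyarrow type)."""
    name: str
    nullable: bool = True
    children: List["Field"] = field(default_factory=list)


def is_valid(fld: Field, value: Any) -> bool:
    """Return True if value satisfies the nullability rules of fld."""
    if value is None:
        # A null value is judged by this field's own flag only;
        # its children are never inspected.
        return fld.nullable
    # The value is present, so every child must be valid in turn.
    return all(is_valid(child, value.get(child.name)) for child in fld.children)


my_struct = Field(
    "my_struct",
    nullable=True,
    children=[Field("some_information", nullable=False)],
)

row = {}  # same input as the report: no "my_struct" key at all
print(is_valid(my_struct, row.get("my_struct")))   # True: null parent is allowed
print(is_valid(my_struct, {"some_information": 1}))  # True: child is present
print(is_valid(my_struct, {}))                     # False: parent present, child missing
```

Under this model the empty JSON object from the example is accepted, which matches the expected output above. The error in the actual output suggests the child's "not null" constraint is being applied even though the parent is null.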
> pyarrow.json.read_json ignores nullable=True on fields with non-nullable subfields in explicit_schema parse_options
> -------------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-17835
> URL: https://issues.apache.org/jira/browse/ARROW-17835
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 7.0.0
> Reporter: Ryan Weisman
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)