Posted to dev@arrow.apache.org by "Felipe Santos (Jira)" <ji...@apache.org> on 2020/06/02 22:47:00 UTC
[jira] [Created] (ARROW-9020) read_json won't respect explicit_schema in parse_options
Felipe Santos created ARROW-9020:
------------------------------------
Summary: read_json won't respect explicit_schema in parse_options
Key: ARROW-9020
URL: https://issues.apache.org/jira/browse/ARROW-9020
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.17.1
Environment: CPython 3.8.2, MacOS Mojave 10.14.6
Reporter: Felipe Santos
Fix For: 0.17.1
I am trying to read a JSON file using an explicit schema, but the schema appears to be ignored. Moreover, if my schema contains a field that is not present in the JSON file, the output table contains all the fields from the JSON file plus the fields of my schema that were not found in the file.
A minimal example:
{code:python}
import pyarrow as pa
from pyarrow import json
# allowing for type inference
print(json.read_json('tmp.json'))
# prints:
# pyarrow.Table
# foo: string
# baz: string
# using an explicit schema that would read only "foo"
schema = pa.schema([('foo', pa.string())])
print(json.read_json('tmp.json', parse_options=json.ParseOptions(explicit_schema=schema)))
# prints:
# pyarrow.Table
# foo: string
# baz: string
# using an explicit schema that would read only "not_a_field",
# which is not present in the json file
schema = pa.schema([('not_a_field', pa.string())])
print(json.read_json('tmp.json', parse_options=json.ParseOptions(explicit_schema=schema)))
# prints:
# pyarrow.Table
# not_a_field: string
# foo: string
# baz: string
{code}
And the tmp.json file looks like:
{code:json}
{"foo": "bar", "baz": "1"}
{code}