Posted to dev@arrow.apache.org by "Felipe Santos (Jira)" <ji...@apache.org> on 2020/06/02 22:47:00 UTC

[jira] [Created] (ARROW-9020) read_json won't respect explicit_schema in parse_options

Felipe Santos created ARROW-9020:
------------------------------------

             Summary: read_json won't respect explicit_schema in parse_options
                 Key: ARROW-9020
                 URL: https://issues.apache.org/jira/browse/ARROW-9020
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.17.1
         Environment: CPython 3.8.2, MacOS Mojave 10.14.6
            Reporter: Felipe Santos
             Fix For: 0.17.1


I am trying to read a JSON file using an explicit schema, but the schema appears to be ignored. Moreover, if my schema contains a field that is not present in the JSON file, the output table contains all the fields in the JSON file plus the schema fields that were not found in the file.

A minimal example:
{code:python}
import pyarrow as pa
from pyarrow import json

# allowing for type inference
print(json.read_json('tmp.json'))
# prints:
# pyarrow.Table
# foo: string
# baz: string

# using an explicit schema that would read only "foo"
schema = pa.schema([('foo', pa.string())])
print(json.read_json('tmp.json', parse_options=json.ParseOptions(explicit_schema=schema)))
# prints:
# pyarrow.Table
# foo: string
# baz: string

# using an explicit schema that would read only "not_a_field",
# which is not present in the json file
schema = pa.schema([('not_a_field', pa.string())])
print(json.read_json('tmp.json', parse_options=json.ParseOptions(explicit_schema=schema)))
# prints:
# pyarrow.Table
# not_a_field: string
# foo: string
# baz: string
{code}

And the tmp.json file looks like:
{code:json}
{"foo": "bar", "baz": "1"}

{code}
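
For reference, the result the example above appears to expect can be written out as a plain-Python projection. This is only a sketch of the expected semantics, on the assumption that explicit_schema is meant to restrict the output to the listed fields. It does not use pyarrow and says nothing about how read_json is implemented; project_onto_schema is a hypothetical helper, and json here is the standard-library module, not pyarrow.json.

```python
import json  # standard library module, not pyarrow.json


def project_onto_schema(lines, field_names):
    """Hypothetical helper: keep only the named fields of each JSON record.

    Fields named in the schema but absent from a record become None.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        rows.append({name: record.get(name) for name in field_names})
    return rows


# Same content as tmp.json in the report, one JSON object per line.
tmp_json = ['{"foo": "bar", "baz": "1"}', ""]

# Schema listing only "foo": "baz" would be dropped.
only_foo = project_onto_schema(tmp_json, ["foo"])

# Schema listing only "not_a_field": a single all-null column would remain.
only_missing = project_onto_schema(tmp_json, ["not_a_field"])

print(only_foo)
print(only_missing)
```

Under this reading, the second and third read_json calls in the report would return tables with a single column each, rather than the extra foo and baz columns shown in the actual output.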


