Posted to jira@arrow.apache.org by "Alenka Frim (Jira)" <ji...@apache.org> on 2022/10/17 07:03:00 UTC

[jira] [Updated] (ARROW-16603) [Python] pyarrow.json.read_json ignores nullable=False in explicit_schema parse_options

     [ https://issues.apache.org/jira/browse/ARROW-16603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alenka Frim updated ARROW-16603:
--------------------------------
    Description: 
Reproducible example:
{code:python}
import json
import pyarrow.json as pj
import pyarrow as pa

s = {"id": "value", "nested": {"value": 1}}

with open("issue.json", "w") as write_file:
    json.dump(s, write_file, indent=4)

schema = pa.schema([
    pa.field("id", pa.string(), nullable=False),
    pa.field("nested", pa.struct([pa.field("value", pa.int64(), nullable=False)]))
])

table = pj.read_json('issue.json', parse_options=pj.ParseOptions(explicit_schema=schema))

print(schema)
# id: string not null
# nested: struct<value: int64 not null>
#   child 0, value: int64 not null 
print(table.schema)
# id: string
# nested: struct<value: int64>
#   child 0, value: int64
{code}

  was:
Reproducible example:
{code:python}
import json
import pyarrow.json as pj
import pyarrow as pa

s = {"id": "value", "nested": {"value": 1}}

with open("issue.json", "w") as write_file:
    json.dump(s, write_file, indent=4)

schema = pa.schema([
    pa.field("id", pa.string(), nullable=False),
    pa.field("nested", pa.struct([pa.field("value", pa.int64(), nullable=False)]))
])

table = pj.read_json('issue.json', parse_options=pj.ParseOptions(explicit_schema=schema))

print(schema)
print(table.schema)
{code}


> [Python] pyarrow.json.read_json ignores nullable=False in explicit_schema parse_options
> ---------------------------------------------------------------------------------------
>
>                 Key: ARROW-16603
>                 URL: https://issues.apache.org/jira/browse/ARROW-16603
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Alenka Frim
>            Priority: Major
>
> Reproducible example:
> {code:python}
> import json
> import pyarrow.json as pj
> import pyarrow as pa
> s = {"id": "value", "nested": {"value": 1}}
> with open("issue.json", "w") as write_file:
>     json.dump(s, write_file, indent=4)
> schema = pa.schema([
>     pa.field("id", pa.string(), nullable=False),
>     pa.field("nested", pa.struct([pa.field("value", pa.int64(), nullable=False)]))
> ])
> table = pj.read_json('issue.json', parse_options=pj.ParseOptions(explicit_schema=schema))
> print(schema)
> # id: string not null
> # nested: struct<value: int64 not null>
> #   child 0, value: int64 not null 
> print(table.schema)
> # id: string
> # nested: struct<value: int64>
> #   child 0, value: int64
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)