Posted to issues@arrow.apache.org by "eriteia (via GitHub)" <gi...@apache.org> on 2024/03/19 22:06:56 UTC

[I] Can Struct with "non-nullable" nested attribute be nullable in pyarrow.json.read_json ? [arrow]

eriteia opened a new issue, #40681:
URL: https://github.com/apache/arrow/issues/40681

   ### Describe the enhancement requested
   
   ```
   import pyarrow as pa
   import pyarrow.json as pj
   
   struct_type = pa.struct([
       ('dimensionCm', pa.int64(), False)  # Non-nullable field within the struct, the code works when set to True
   ])
   
   schema = pa.schema([
       ('id', pa.string(), False),
       ('heightCm', struct_type, True)  # Struct field marked as nullable
   ])
   
   table = pj.read_json("test.json", parse_options=pj.ParseOptions(explicit_schema=schema))
   ```
   ```
   // Json payload example
   {"id": "test1", "heightCm": {"dimensionCm": 10}}
   {"id": "test2", "heightCm": {"dimensionCm": 20}}
   {"id": "test3"}
   ```
   
   A nullable struct field that contains a nested non-nullable field cannot be parsed from a JSON file in which the struct is absent from a record (as in the third line of the payload above). `read_json` fails with "ArrowInvalid: JSON parse error: a required field was null".
   If the nested field is nullable instead, no error is raised.
   
   Can this be changed so that a nullable struct type can be used to parse a JSON file even when it has a non-nullable inner field?
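
Until something like this is supported, one possible workaround is to read with the inner field marked nullable (which works, as noted above) and enforce the inner constraint separately. The following is only a minimal sketch of such a check in plain Python, assuming newline-delimited JSON like the payload above; `find_violations` is a hypothetical helper and not part of pyarrow.

```python
import json


def find_violations(lines, struct_key="heightCm", inner_key="dimensionCm"):
    """Return 1-based line numbers where the struct is present but its
    inner field is missing or null (hypothetical helper, not a pyarrow API)."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        struct = record.get(struct_key)
        if struct is None:
            continue  # an absent or null struct is allowed (nullable)
        if struct.get(inner_key) is None:
            bad.append(lineno)  # struct present, required inner field null
    return bad


# Same records as the payload in this report.
payload = [
    '{"id": "test1", "heightCm": {"dimensionCm": 10}}',
    '{"id": "test2", "heightCm": {"dimensionCm": 20}}',
    '{"id": "test3"}',
]
violations = find_violations(payload)
```

This treats a missing struct as valid and flags only records where the struct is present but its required inner field is not, which is the behaviour being requested for the strict schema.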
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org