Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/09 13:01:25 UTC

[GitHub] [arrow-rs] tustvold opened a new issue, #1681: Incorrect Repeated Field Schema Inference

tustvold opened a new issue, #1681:
URL: https://github.com/apache/arrow-rs/issues/1681

   **Describe the bug**
   
   The schema inference logic in the parquet crate does not infer the correct nullability for nested types.
   
   For example:
   
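   ```
   use std::sync::Arc;

   use parquet::arrow::parquet_to_arrow_schema;
   use parquet::schema::parser::parse_message_type;
   use parquet::schema::types::SchemaDescriptor;

   fn main() {
       // Two nested REPEATED groups, neither carrying a LIST annotation
       let message_type = "
       message test_schema {
         OPTIONAL INT32 leaf1;
         REPEATED GROUP outerGroup {
           OPTIONAL INT32 leaf2;
           REPEATED GROUP innerGroup {
             OPTIONAL INT32 leaf3;
           }
         }
       }
       ";
       let parquet_group_type = parse_message_type(message_type).unwrap();
       let parquet_schema = SchemaDescriptor::new(Arc::new(parquet_group_type));
       // Convert to an Arrow schema; the list nullability in the result is wrong
       let converted_arrow_schema =
           parquet_to_arrow_schema(&parquet_schema, None).unwrap();
       println!("{:#?}", converted_arrow_schema);
   }
   ```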
   
   This infers innerGroup and outerGroup as nullable lists with nullable elements, when neither the lists nor their elements are nullable.
   
   **To Reproduce**
   
   See test
   
   **Expected behavior**
   
   The nullability should be inferred correctly: innerGroup and outerGroup should both be non-nullable lists with non-nullable elements.
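   
   As a rough illustration of the expected mapping, the Parquet format's backward-compatibility rules treat a repeated field that has no LIST or MAP annotation as a required list of required elements. The sketch below models only that rule. The `Repetition` and `Inferred` types and the `infer_unannotated` function are hypothetical names made up for this example; they are not part of the parquet or arrow crates, and this is not the crate's actual inference code.
   
   ```rust
   // Hypothetical model of the expected nullability; not the parquet crate's API.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   enum Repetition {
       Required,
       Optional,
       Repeated,
   }

   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   enum Inferred {
       // A plain (non-list) field with the given nullability.
       Plain { nullable: bool },
       // A list field, tracking nullability of the list and of its elements.
       List { list_nullable: bool, element_nullable: bool },
   }

   // Expected inference for a field that carries no LIST or MAP annotation.
   fn infer_unannotated(repetition: Repetition) -> Inferred {
       match repetition {
           Repetition::Required => Inferred::Plain { nullable: false },
           Repetition::Optional => Inferred::Plain { nullable: true },
           // A bare REPEATED field is a required list of required elements.
           Repetition::Repeated => Inferred::List {
               list_nullable: false,
               element_nullable: false,
           },
       }
   }
   ```
   
   Under this model, both outerGroup and innerGroup from the example map to `Inferred::List` with both flags false, while the OPTIONAL leaves stay nullable.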
   
   **Additional context**
   
   This has likely been hidden by the lack of support for repeated fields (https://github.com/apache/arrow-rs/issues/1680).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb closed issue #1681: Incorrect Repeated Field Schema Inference

Posted by GitBox <gi...@apache.org>.
alamb closed issue #1681: Incorrect Repeated Field Schema Inference
URL: https://github.com/apache/arrow-rs/issues/1681
