Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/05 17:45:22 UTC

[GitHub] [arrow-rs] viirya commented on pull request #1517: Fix reading nested lists from parquet files

viirya commented on PR #1517:
URL: https://github.com/apache/arrow-rs/pull/1517#issuecomment-1089107049

   Interesting, I cannot reproduce the problem by writing a simple Parquet file in a test.
   
   The main difference is:
   
   For the original Parquet file, the path to the `name` column is:
   
   ```
    ["table", "table_info", "name"]
   ```
   
   Because `table_info` is a list of structs, we need the change in this PR.
   
   I used the same message type to write out a Parquet file in a test, but when I read it back, the column path for the same `name` column is different:
   
   ```
   parts: ["arrow_schema", "table_info", "table_info", "name"]
   ```
   
   For the first `table_info`, `get_arrow_field` simply picks up the field with the same name from the Arrow schema, which is the list field. Moving to the next `table_info`, it picks the struct field inside that list. The next part is `name`, and because the current field is a struct, it can find the correct field.
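   
   To make the walk above concrete, here is a minimal sketch of a name-by-name path lookup. It is not the arrow-rs implementation: `Node`, `resolve`, and `example_schema` are hypothetical names for illustration, and the sketch assumes the leading part of each path (`arrow_schema` or `table`) is handled separately, so only the parts from `table_info` onward are walked.
   
   ```rust
   // Hypothetical, simplified stand-in for an Arrow field.
   // This is NOT the arrow-rs `Field` type.
   #[derive(Debug, Clone, PartialEq)]
   enum Node {
       Leaf { name: String },
       Struct { name: String, children: Vec<Node> },
       List { name: String, element: Box<Node> },
   }
   
   impl Node {
       fn name(&self) -> &str {
           match self {
               Node::Leaf { name }
               | Node::Struct { name, .. }
               | Node::List { name, .. } => name,
           }
       }
   }
   
   // Walk `parts` one name at a time. A struct steps into the child with
   // the matching name; a list steps into its element only if the element
   // has the matching name. Anything else fails the lookup.
   fn resolve<'a>(root: &'a [Node], parts: &[&str]) -> Option<&'a Node> {
       let (first, rest) = parts.split_first()?;
       let mut current = root.iter().find(|n| n.name() == *first)?;
       for part in rest {
           current = match current {
               Node::Struct { children, .. } => {
                   children.iter().find(|c| c.name() == *part)?
               }
               Node::List { element, .. } if element.name() == *part => &**element,
               _ => return None,
           };
       }
       Some(current)
   }
   
   // Assumed shape: a list `table_info` whose element is a struct that is
   // also named `table_info` and contains a `name` leaf.
   fn example_schema() -> Vec<Node> {
       vec![Node::List {
           name: "table_info".to_string(),
           element: Box::new(Node::Struct {
               name: "table_info".to_string(),
               children: vec![Node::Leaf { name: "name".to_string() }],
           }),
       }]
   }
   ```
   
   With this sketch, `resolve(&example_schema(), &["table_info", "table_info", "name"])` finds the `name` leaf, while the shorter `["table_info", "name"]` returns `None`, because `name` is not a direct child of the list. That second case is the kind of path that needs extra handling for a list of structs.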
   
   I checked the Arrow schema of the original Parquet file and of the one I wrote in the test, and they are the same. So I am currently not sure why the column paths for the same `name` column differ between the two cases.
   