Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/02/03 14:43:50 UTC

[GitHub] [arrow] jorisvandenbossche commented on pull request #33955: GH-33954: [C++][Parquet] Preserve field-id for nested type

jorisvandenbossche commented on PR #33955:
URL: https://github.com/apache/arrow/pull/33955#issuecomment-1415969387

   > This looks good to me. @jorisvandenbossche Do you remember if there were any problems with Parquet field ids?
   
   I remember that we had previous PRs where we wondered what to do with them. Your comment on this earlier PR summarizes how we currently handle them (https://github.com/apache/arrow/pull/10289#issuecomment-839634828):
   
   > Based on my understanding, it seems that we should:
   > * when reading from Parquet, reflect Parquet field_ids (if any) under the `PARQUET:field_id` metadata key
   > * when writing to Parquet, generate Parquet field_ids from the `PARQUET:field_id` metadata key (if present)
   > * not attempt to auto-generate any field_ids if they are not present in metadata
   
   So correctly preserving the field-ids for nested types seems to follow that. 
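
For illustration, the three rules quoted above can be modelled in a few lines of plain Python. This is only a toy sketch, not the Arrow or Parquet API: `ArrowField`, `ParquetNode`, `to_parquet` and `to_arrow` are hypothetical names made up for this example, and only the `PARQUET:field_id` key comes from the quoted comment. It assumes field ids are stored as strings in the Arrow metadata and as integers on the Parquet side.

```python
from dataclasses import dataclass, field
from typing import Optional

FIELD_ID_KEY = "PARQUET:field_id"  # metadata key named in the quoted comment


@dataclass
class ArrowField:
    """Toy stand-in for an Arrow field (hypothetical, not a pyarrow class)."""
    name: str
    metadata: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


@dataclass
class ParquetNode:
    """Toy stand-in for a Parquet schema node (hypothetical)."""
    name: str
    field_id: Optional[int] = None
    children: list = field(default_factory=list)


def to_parquet(f: ArrowField) -> ParquetNode:
    # Writing: take the id from the metadata key if present; never invent one.
    raw = f.metadata.get(FIELD_ID_KEY)
    field_id = int(raw) if raw is not None else None
    # Recurse so that nested (child) fields keep their ids as well.
    return ParquetNode(f.name, field_id, [to_parquet(c) for c in f.children])


def to_arrow(n: ParquetNode) -> ArrowField:
    # Reading: expose the id (if any) under the metadata key.
    meta = {} if n.field_id is None else {FIELD_ID_KEY: str(n.field_id)}
    return ArrowField(n.name, meta, [to_arrow(c) for c in n.children])


# A struct-like field with one child that has an id and one that does not.
schema = ArrowField("point", {FIELD_ID_KEY: "1"}, [
    ArrowField("x", {FIELD_ID_KEY: "2"}),
    ArrowField("y"),  # no field id in metadata, so none is generated
])

parquet_schema = to_parquet(schema)
roundtripped = to_arrow(parquet_schema)
```

In this model a roundtrip leaves the nested ids untouched and leaves `y` without an id, which is the behaviour the rules describe for nested types.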
   
   @wgtmac the current test does a full roundtrip of the Arrow <-> Parquet schemas, so I assume it also indirectly covers that we correctly write the field ids to Parquet when they are present in the field metadata of an Arrow schema?

