Posted to github@arrow.apache.org by "wgtmac (via GitHub)" <gi...@apache.org> on 2023/02/03 14:52:51 UTC

[GitHub] [arrow] wgtmac commented on pull request #33955: GH-33954: [C++][Parquet] Preserve field-id for nested type

wgtmac commented on PR #33955:
URL: https://github.com/apache/arrow/pull/33955#issuecomment-1415983359

   > > This looks good to me. @jorisvandenbossche Do you remember if there were any problems with Parquet field ids?
   > 
   > I remember that we had previous PRs where we wondered what to do with it. Your comment at this PR summarizes how we currently handle it ([#10289 (comment)](https://github.com/apache/arrow/pull/10289#issuecomment-839634828)):
   > 
   > > Based on my understanding, it seems that we should:
   > > 
   > > * when reading from Parquet, reflect Parquet field_ids (if any) under the `PARQUET:field_id` metadata key
   > > * when writing to Parquet, generate Parquet field_ids from the `PARQUET:field_id` metadata key (if present)
   > > * not attempt to auto-generate any field_ids if they are not present in metadata
   > 
   > So correctly preserving the field-ids for nested types seems to follow that.
   > 
   > @wgtmac the current test does a full roundtrip of the arrow<->parquet schemas, so I assume this indirectly also covers that we correctly write them to Parquet if the information is present in the field metadata of an Arrow schema?
   
   Yes, the current fix preserves every field-id present under the `PARQUET:field_id` metadata key, for all data types. This is required if we want to write Parquet files containing nested types in the form that Apache Iceberg expects. @jorisvandenbossche 
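
To make the three rules quoted above concrete, here is a minimal toy model of the round trip, written in plain Python. It is only a sketch: `ArrowField`, `ParquetNode`, `to_parquet` and `from_parquet` are hypothetical names invented for illustration and are not the Arrow or Parquet API. The only thing taken from the discussion is the `PARQUET:field_id` metadata key and the rule that ids are copied when present and never auto-generated, including for children of nested types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

FIELD_ID_KEY = "PARQUET:field_id"  # metadata key named in the discussion


@dataclass
class ArrowField:
    """Toy stand-in for an Arrow field (hypothetical, not the real class)."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    children: List["ArrowField"] = field(default_factory=list)


@dataclass
class ParquetNode:
    """Toy stand-in for a Parquet schema node (hypothetical)."""
    name: str
    field_id: Optional[int] = None
    children: List["ParquetNode"] = field(default_factory=list)


def to_parquet(f: ArrowField) -> ParquetNode:
    # Writing: take the id from metadata if present; never invent one.
    raw = f.metadata.get(FIELD_ID_KEY)
    return ParquetNode(
        name=f.name,
        field_id=int(raw) if raw is not None else None,
        # Recurse so that children of nested types keep their ids too.
        children=[to_parquet(c) for c in f.children],
    )


def from_parquet(n: ParquetNode) -> ArrowField:
    # Reading: reflect the id (if any) back under the metadata key.
    metadata = {} if n.field_id is None else {FIELD_ID_KEY: str(n.field_id)}
    return ArrowField(n.name, metadata, [from_parquet(c) for c in n.children])


# A struct-like field with one child that has an id and one that does not.
point = ArrowField(
    "point",
    {FIELD_ID_KEY: "1"},
    [ArrowField("x", {FIELD_ID_KEY: "2"}), ArrowField("y")],
)
roundtripped = from_parquet(to_parquet(point))
```

In this model the nested child `x` keeps its id across the round trip, while `y`, which had no id, comes back with empty metadata rather than a generated one.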

