Posted to issues@iceberg.apache.org by "amogh-jahagirdar (via GitHub)" <gi...@apache.org> on 2023/01/24 00:34:03 UTC
[GitHub] [iceberg] amogh-jahagirdar opened a new pull request, #6654: Python: Check if optional file kv metadata is None before reading Iceberg Schema
amogh-jahagirdar opened a new pull request, #6654:
URL: https://github.com/apache/iceberg/pull/6654
This is an interim fix for https://github.com/apache/iceberg/issues/6647. Per the Parquet spec, file-level key/value metadata is optional and writers are not required to produce it: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L1022
The real fix will be to derive the Iceberg schema from the Parquet file itself (in this case we don't care about any external definitions; it will come solely from the Parquet schema), see https://github.com/apache/iceberg/issues/6505
If the real solution is ready for a PR soon, we can just close this one. In the interim, though, I think this change is useful so that users know it's a known issue until we release the fix.
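For illustration, the None guard can be sketched roughly as follows. This is a minimal sketch and not the code in the PR: `read_schema_raw` is a hypothetical helper, the file metadata is modelled as a plain dict with bytes keys (or None when the file carries no key/value metadata), and the value given to `ICEBERG_SCHEMA` here is an assumption.

```python
from typing import Dict, Optional

# Assumed key; the real constant is defined inside pyiceberg.
ICEBERG_SCHEMA = b"iceberg.schema"


def read_schema_raw(metadata: Optional[Dict[bytes, bytes]]) -> Optional[bytes]:
    """Hypothetical helper: return the raw Iceberg schema, or None if absent."""
    schema_raw = None
    # Key/value metadata is optional in Parquet, so it may be None.
    if metadata is not None:
        schema_raw = metadata.get(ICEBERG_SCHEMA)
    return schema_raw
```

With this shape, a file without key/value metadata yields None instead of raising an AttributeError on `.get`, so the caller can report a clear error.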
CC: @Fokko @JonasJ-ap
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6654: Python: Check if optional Parquet kv metadata is None before reading Iceberg Schema
Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #6654:
URL: https://github.com/apache/iceberg/pull/6654#discussion_r1085739000
##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -505,7 +505,9 @@ def project_table(
     # Get the schema
     with fs.open_input_file(path) as fout:
         parquet_schema = pq.read_schema(fout)
-        schema_raw = parquet_schema.metadata.get(ICEBERG_SCHEMA)
+        schema_raw = None
+        if parquet_schema.metadata is not None:
Review Comment:
Oh sweet, I never knew about this operator, that's pretty nifty :) Thanks @Fokko !
[GitHub] [iceberg] amogh-jahagirdar commented on pull request #6654: Python: Check if optional Parquet kv metadata is None before reading Iceberg Schema
Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar commented on PR #6654:
URL: https://github.com/apache/iceberg/pull/6654#issuecomment-1402436676
Thanks @Fokko @jackye1995 for the reviews! I'll follow up with @JonasJ-ap about the long-term solution.
[GitHub] [iceberg] Fokko commented on a diff in pull request #6654: Python: Check if optional Parquet kv metadata is None before reading Iceberg Schema
Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #6654:
URL: https://github.com/apache/iceberg/pull/6654#discussion_r1085731519
##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -505,7 +505,9 @@ def project_table(
     # Get the schema
     with fs.open_input_file(path) as fout:
         parquet_schema = pq.read_schema(fout)
-        schema_raw = parquet_schema.metadata.get(ICEBERG_SCHEMA)
+        schema_raw = None
+        if parquet_schema.metadata is not None:
Review Comment:
You could also use the walrus operator (`:=`, Python 3.8+) here:
```suggestion
if metadata := parquet_schema.metadata:
```
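A rough sketch of how that suggestion might read in context, under the same caveats as any illustration: `read_schema_raw_walrus` is a hypothetical helper, the metadata is modelled as a plain dict with bytes keys or None, and the value of `ICEBERG_SCHEMA` is assumed. Assignment expressions need Python 3.8 or later.

```python
from typing import Dict, Optional

# Assumed key; the real constant is defined inside pyiceberg.
ICEBERG_SCHEMA = b"iceberg.schema"


def read_schema_raw_walrus(
    metadata_or_none: Optional[Dict[bytes, bytes]],
) -> Optional[bytes]:
    """Hypothetical helper: bind and test the metadata in one expression."""
    # `:=` assigns the value to `metadata` and the `if` tests its truthiness.
    if metadata := metadata_or_none:
        return metadata.get(ICEBERG_SCHEMA)
    return None
```

Note that this tests truthiness rather than `is not None`, so an empty metadata dict is skipped as well. The result is the same here, because `.get` on an empty dict would return None anyway.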
[GitHub] [iceberg] Fokko merged pull request #6654: Python: Check if optional Parquet kv metadata is None before reading Iceberg Schema
Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko merged PR #6654:
URL: https://github.com/apache/iceberg/pull/6654