Posted to issues@iceberg.apache.org by "amogh-jahagirdar (via GitHub)" <gi...@apache.org> on 2023/01/24 00:34:03 UTC

[GitHub] [iceberg] amogh-jahagirdar opened a new pull request, #6654: Python: Check if optional file kv metadata is None before reading Iceberg Schema

amogh-jahagirdar opened a new pull request, #6654:
URL: https://github.com/apache/iceberg/pull/6654

   This is an interim fix for https://github.com/apache/iceberg/issues/6647. Per the Parquet spec, file-level key/value metadata is optional, so `parquet_schema.metadata` may be None and has to be checked before the Iceberg schema is read from it: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L1022
   
   The real fix will be to derive the Iceberg schema from the Parquet file itself (no external definitions, just the Parquet schema); see https://github.com/apache/iceberg/issues/6505
   
   If the real solution is ready for a PR soon, we can just close this one. In the interim, though, I think this change is useful so that users know it's a known issue until we release the fix.
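
To illustrate the guard described above, here is a minimal standalone sketch. It does not use pyarrow: the `metadata` argument stands in for `parquet_schema.metadata`, the helper name `read_iceberg_schema_raw` is hypothetical, and the value of `ICEBERG_SCHEMA` is assumed to be the bytes key `b"iceberg.schema"`. It only shows the None check, not the rest of `project_table`.

```python
from typing import Mapping, Optional

# Assumption: the Iceberg schema is stored under this bytes key.
ICEBERG_SCHEMA = b"iceberg.schema"


def read_iceberg_schema_raw(
    metadata: Optional[Mapping[bytes, bytes]],
) -> Optional[bytes]:
    """Return the raw Iceberg schema bytes, or None if unavailable.

    `metadata` plays the role of `parquet_schema.metadata`, which is
    None when the file carries no key/value metadata at all.
    """
    if metadata is None:
        # Without this check, calling .get on None raises AttributeError.
        return None
    return metadata.get(ICEBERG_SCHEMA)


# No key/value metadata in the file.
no_metadata = read_iceberg_schema_raw(None)
# Metadata present, but without the Iceberg schema key.
other_keys = read_iceberg_schema_raw({b"writer": b"some-tool"})
# Metadata present and carrying the Iceberg schema.
with_schema = read_iceberg_schema_raw(
    {ICEBERG_SCHEMA: b'{"type": "struct", "fields": []}'}
)
```

In both of the first two cases the caller gets None and can report a clear error rather than failing on the attribute access.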
   
   CC: @Fokko @JonasJ-ap 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6654: Python: Check if optional Parquet kv metadata is None before reading Iceberg Schema

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #6654:
URL: https://github.com/apache/iceberg/pull/6654#discussion_r1085739000


##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -505,7 +505,9 @@ def project_table(
         # Get the schema
         with fs.open_input_file(path) as fout:
             parquet_schema = pq.read_schema(fout)
-            schema_raw = parquet_schema.metadata.get(ICEBERG_SCHEMA)
+            schema_raw = None
+            if parquet_schema.metadata is not None:

Review Comment:
   Oh sweet, I never knew about this operator, that's pretty nifty :) Thanks @Fokko ! 





[GitHub] [iceberg] amogh-jahagirdar commented on pull request #6654: Python: Check if optional Parquet kv metadata is None before reading Iceberg Schema

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar commented on PR #6654:
URL: https://github.com/apache/iceberg/pull/6654#issuecomment-1402436676

   Thanks @Fokko @jackye1995 for the reviews! I'll follow up with @JonasJ-ap about the long-term solution.




[GitHub] [iceberg] Fokko commented on a diff in pull request #6654: Python: Check if optional Parquet kv metadata is None before reading Iceberg Schema

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #6654:
URL: https://github.com/apache/iceberg/pull/6654#discussion_r1085731519


##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -505,7 +505,9 @@ def project_table(
         # Get the schema
         with fs.open_input_file(path) as fout:
             parquet_schema = pq.read_schema(fout)
-            schema_raw = parquet_schema.metadata.get(ICEBERG_SCHEMA)
+            schema_raw = None
+            if parquet_schema.metadata is not None:

Review Comment:
   You could also leverage the Walrus operator here:
   ```suggestion
               if metadata := parquet_schema.metadata:
   ```
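
A standalone sketch of how the suggested form might behave, assuming Python 3.8+ for the `:=` operator. `FakeSchema` is a hypothetical stand-in for the object returned by `pq.read_schema`, and the `ICEBERG_SCHEMA` value is an assumption. The body of the `if` is not shown in this thread, so the `.get` call is inferred from the removed line.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumption: the Iceberg schema is stored under this bytes key.
ICEBERG_SCHEMA = b"iceberg.schema"


@dataclass
class FakeSchema:
    """Hypothetical stand-in exposing only a `metadata` attribute."""

    metadata: Optional[Dict[bytes, bytes]] = None


def schema_raw_walrus(parquet_schema: FakeSchema) -> Optional[bytes]:
    schema_raw = None
    # Bind the attribute to a name and test its truthiness in one step.
    if metadata := parquet_schema.metadata:
        schema_raw = metadata.get(ICEBERG_SCHEMA)
    return schema_raw


missing = schema_raw_walrus(FakeSchema(metadata=None))
empty = schema_raw_walrus(FakeSchema(metadata={}))
present = schema_raw_walrus(FakeSchema(metadata={ICEBERG_SCHEMA: b"{}"}))
```

One difference from `is not None`: the walrus form tests truthiness, so an empty dict also skips the branch. The result is the same here, since `.get` on an empty dict would return None anyway.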





[GitHub] [iceberg] Fokko merged pull request #6654: Python: Check if optional Parquet kv metadata is None before reading Iceberg Schema

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko merged PR #6654:
URL: https://github.com/apache/iceberg/pull/6654

