Posted to issues@iceberg.apache.org by "syun64 (via GitHub)" <gi...@apache.org> on 2024/04/12 21:24:19 UTC

[PR] Add Partitions Metadata Table [iceberg-python]

syun64 opened a new pull request, #603:
URL: https://github.com/apache/iceberg-python/pull/603

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Add Partitions Metadata Table [iceberg-python]

Posted by "syun64 (via GitHub)" <gi...@apache.org>.
syun64 commented on code in PR #603:
URL: https://github.com/apache/iceberg-python/pull/603#discussion_r1564204302


##########
pyiceberg/table/__init__.py:
##########
@@ -3410,6 +3411,94 @@ def _readable_metrics_struct(bound_type: PrimitiveType) -> pa.StructType:
             schema=entries_schema,
         )
 
+    def partitions(self) -> "pa.Table":
+        import pyarrow as pa
+
+        from pyiceberg.io.pyarrow import schema_to_pyarrow
+
+        table_schema = pa.schema([
+            pa.field('record_count', pa.int64(), nullable=False),
+            pa.field('file_count', pa.int32(), nullable=False),
+            pa.field('total_data_file_size_in_bytes', pa.int64(), nullable=False),
+            pa.field('position_delete_record_count', pa.int64(), nullable=False),
+            pa.field('position_delete_file_count', pa.int32(), nullable=False),
+            pa.field('equality_delete_record_count', pa.int64(), nullable=False),
+            pa.field('equality_delete_file_count', pa.int32(), nullable=False),
+            pa.field('last_updated_at', pa.timestamp(unit='ms'), nullable=True),
+            pa.field('last_updated_snapshot_id', pa.int64(), nullable=True),
+        ])
+
+        partition_record = self.tbl.metadata.specs_struct()
+        has_partitions = len(partition_record.fields) > 0
+
+        if has_partitions:

Review Comment:
   Similar to what we did with the entries table, should we opt to keep a consistent metadata table schema for both unpartitioned and partitioned tables?



##########
pyiceberg/table/__init__.py:
##########
@@ -3410,6 +3411,94 @@ def _readable_metrics_struct(bound_type: PrimitiveType) -> pa.StructType:
             schema=entries_schema,
         )
 
+    def partitions(self) -> "pa.Table":

Review Comment:
   TODO: Waiting on https://github.com/apache/iceberg-python/pull/599 to merge to introduce TimeTravel





Re: [PR] Add Partitions Metadata Table [iceberg-python]

Posted by "syun64 (via GitHub)" <gi...@apache.org>.
syun64 commented on code in PR #603:
URL: https://github.com/apache/iceberg-python/pull/603#discussion_r1564235822


##########
pyiceberg/table/__init__.py:
##########
@@ -3410,6 +3411,94 @@ def _readable_metrics_struct(bound_type: PrimitiveType) -> pa.StructType:
             schema=entries_schema,
         )
 
+    def partitions(self) -> "pa.Table":
+        import pyarrow as pa
+
+        from pyiceberg.io.pyarrow import schema_to_pyarrow
+
+        table_schema = pa.schema([
+            pa.field('record_count', pa.int64(), nullable=False),
+            pa.field('file_count', pa.int32(), nullable=False),
+            pa.field('total_data_file_size_in_bytes', pa.int64(), nullable=False),
+            pa.field('position_delete_record_count', pa.int64(), nullable=False),
+            pa.field('position_delete_file_count', pa.int32(), nullable=False),
+            pa.field('equality_delete_record_count', pa.int64(), nullable=False),
+            pa.field('equality_delete_file_count', pa.int32(), nullable=False),
+            pa.field('last_updated_at', pa.timestamp(unit='ms'), nullable=True),
+            pa.field('last_updated_snapshot_id', pa.int64(), nullable=True),
+        ])
+
+        partition_record = self.tbl.metadata.specs_struct()
+        has_partitions = len(partition_record.fields) > 0
+
+        if has_partitions:

Review Comment:
   Sounds good, @Fokko. That makes sense. I just wanted to double-check whether we wanted to carry this logic over.





Re: [PR] Add Partitions Metadata Table [iceberg-python]

Posted by "HonahX (via GitHub)" <gi...@apache.org>.
HonahX commented on PR #603:
URL: https://github.com/apache/iceberg-python/pull/603#issuecomment-2058317984

   Merged! Thanks @syun64 for working on this and thanks @Fokko for reviewing!






Re: [PR] Add Partitions Metadata Table [iceberg-python]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #603:
URL: https://github.com/apache/iceberg-python/pull/603#discussion_r1564221753


##########
pyiceberg/table/__init__.py:
##########
@@ -3410,6 +3411,94 @@ def _readable_metrics_struct(bound_type: PrimitiveType) -> pa.StructType:
             schema=entries_schema,
         )
 
+    def partitions(self) -> "pa.Table":
+        import pyarrow as pa
+
+        from pyiceberg.io.pyarrow import schema_to_pyarrow
+
+        table_schema = pa.schema([
+            pa.field('record_count', pa.int64(), nullable=False),
+            pa.field('file_count', pa.int32(), nullable=False),
+            pa.field('total_data_file_size_in_bytes', pa.int64(), nullable=False),
+            pa.field('position_delete_record_count', pa.int64(), nullable=False),
+            pa.field('position_delete_file_count', pa.int32(), nullable=False),
+            pa.field('equality_delete_record_count', pa.int64(), nullable=False),
+            pa.field('equality_delete_file_count', pa.int32(), nullable=False),
+            pa.field('last_updated_at', pa.timestamp(unit='ms'), nullable=True),
+            pa.field('last_updated_snapshot_id', pa.int64(), nullable=True),
+        ])
+
+        partition_record = self.tbl.metadata.specs_struct()
+        has_partitions = len(partition_record.fields) > 0
+
+        if has_partitions:

Review Comment:
   I don't think that's needed. The schema will change anyway based on the partition spec.
   
   I've checked and we have this behavior in Spark as well:
   
   [Screenshot in the original GitHub comment: https://github.com/apache/iceberg-python/assets/1134248/948d636a-d292-4f66-a291-cb76cf715c12]
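
A minimal, library-free sketch of the behavior discussed in this review thread: the statistics columns are fixed, and a `partition` struct column is added only when the table has partition fields, so the resulting schema depends on the partition spec. The `Field` class, the `partitions_schema` helper, and the `dt_day` partition field are hypothetical names used for illustration; the PR itself builds `pa.field` objects. Placing the struct first and marking it non-nullable is an assumption, not something shown in the quoted diff.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Field:
    """Library-free stand-in for a schema field (hypothetical, not pa.field)."""

    name: str
    type_name: str
    nullable: bool


# Statistics columns, copied from the schema in the quoted diff.
STAT_FIELDS: Tuple[Field, ...] = (
    Field("record_count", "int64", False),
    Field("file_count", "int32", False),
    Field("total_data_file_size_in_bytes", "int64", False),
    Field("position_delete_record_count", "int64", False),
    Field("position_delete_file_count", "int32", False),
    Field("equality_delete_record_count", "int64", False),
    Field("equality_delete_file_count", "int32", False),
    Field("last_updated_at", "timestamp[ms]", True),
    Field("last_updated_snapshot_id", "int64", True),
)


def partitions_schema(partition_fields: List[Field]) -> List[Field]:
    """Return the column list, adding a 'partition' struct only when needed."""
    schema = list(STAT_FIELDS)
    if len(partition_fields) > 0:  # mirrors `has_partitions` in the diff
        inner = ", ".join(f"{f.name}: {f.type_name}" for f in partition_fields)
        # Assumption: the struct column comes first and is non-nullable.
        schema.insert(0, Field("partition", f"struct<{inner}>", False))
    return schema


unpartitioned = partitions_schema([])
partitioned = partitions_schema([Field("dt_day", "date32", True)])
print(len(unpartitioned), len(partitioned))  # 9 10
print(partitioned[0].type_name)  # struct<dt_day: date32>
```

Because the contents of the `partition` struct already vary from one partition spec to another, omitting the column for an unpartitioned table is one more spec-dependent variation rather than a special case.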
   





Re: [PR] Add Partitions Metadata Table [iceberg-python]

Posted by "HonahX (via GitHub)" <gi...@apache.org>.
HonahX merged PR #603:
URL: https://github.com/apache/iceberg-python/pull/603

