Posted to issues@iceberg.apache.org by "Fokko (via GitHub)" <gi...@apache.org> on 2023/05/25 21:32:17 UTC

[GitHub] [iceberg] Fokko commented on a diff in pull request #6323: Python: Alter table plumbing and REST support

Fokko commented on code in PR #6323:
URL: https://github.com/apache/iceberg/pull/6323#discussion_r1206024478


##########
python/mkdocs/docs/api.md:
##########
@@ -241,52 +146,88 @@ catalog.create_table(
 )
 ```
 
-Which returns a newly created table:
+## Altering the table metadata
+
+Using the Python API, you can alter the table metadata.
+
+### Update the schema
+
+Add a new field to the table:
+
+```python
+from pyiceberg.schema import Schema
+from pyiceberg.types import (
+    BooleanType,
+    DoubleType,
+    IntegerType,
+    NestedField,
+    StringType,
+    TimestampType,
+)
+
+schema = Schema(
+    NestedField(field_id=1, name="str", field_type=StringType(), required=False),
+    NestedField(field_id=2, name="int", field_type=IntegerType(), required=True),
+    NestedField(field_id=3, name="bool", field_type=BooleanType(), required=False),
+    NestedField(
+        field_id=4, name="datetime", field_type=TimestampType(), required=False
+    ),
+    # Add a new column to the table
+    NestedField(field_id=5, name="double", field_type=DoubleType(), required=False),
+)
+
+table = table.alter().set_schema(schema).commit()

Review Comment:
   We could also add another API next to it that lets users easily extend the current schema. As the name implies, `set_schema` overrides the whole schema. Since this is Python, people can also script this on top of the `Schema`. I've removed it for now.
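Scripting an extension on top of the current schema could look roughly like the sketch below: it appends one column to the existing fields instead of retyping the whole schema. To stay runnable without pyiceberg, it uses a hypothetical `Field` dataclass in place of `NestedField`, and `add_column` is a hypothetical helper, not part of pyiceberg. It relies on the Iceberg rule that field IDs are never reused (the metadata tracks the highest assigned ID as `last_column_id`), and it assumes new columns must be optional because existing data files hold no values for them. With pyiceberg, the resulting fields would presumably be passed to `Schema(*fields)` and then to `set_schema`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in mirroring the keyword arguments of NestedField."""

    field_id: int
    name: str
    field_type: str
    required: bool = False


def add_column(
    fields: Tuple[Field, ...],
    last_column_id: int,
    name: str,
    field_type: str,
    required: bool = False,
) -> Tuple[Tuple[Field, ...], int]:
    """Hypothetical helper: append a column, return (new fields, new last_column_id)."""
    if any(f.name == name for f in fields):
        raise ValueError(f"Column already exists: {name}")
    if required:
        # Assumption: existing data files hold no values for a new column,
        # so it is refused as required.
        raise ValueError(f"Cannot add required column: {name}")
    # Continue from the table's last_column_id, not from the highest ID in the
    # current schema: the two differ once a column has been dropped.
    new_id = last_column_id + 1
    return fields + (Field(new_id, name, field_type, required),), new_id


current = (
    Field(1, "str", "string"),
    Field(2, "int", "int", required=True),
    Field(3, "bool", "boolean"),
    Field(4, "datetime", "timestamp"),
)
fields, last_column_id = add_column(current, 4, "double", "double")
```

The existing fields keep their IDs, and only the new column gets a fresh one, which is what makes the change safe to apply with an overriding call like `set_schema`.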



##########
python/mkdocs/docs/api.md:
##########
@@ -241,52 +146,88 @@ catalog.create_table(
+### Update the partition spec
+
+Update the partition spec that is applied to all new data added to the table.
 
 ```python
-Table(
-    identifier=('default', 'bids'),
-    metadata_location='/Users/fokkodriesprong/Desktop/docker-spark-iceberg/wh/bids//metadata/00000-c8cd93ab-f784-474d-a167-b1a86b05195f.metadata.json',
-    metadata=TableMetadataV2(
-        location='/Users/fokkodriesprong/Desktop/docker-spark-iceberg/wh/bids/',
-        table_uuid=UUID('38d4cb39-4945-4bf2-b374-984b5c4984d2'),
-        last_updated_ms=1661847562069,
-        last_column_id=4,
-        schemas=[
-            Schema(
-                NestedField(field_id=1, name='datetime', field_type=TimestampType(), required=False),
-                NestedField(field_id=2, name='bid', field_type=DoubleType(), required=False),
-                NestedField(field_id=3, name='ask', field_type=DoubleType(), required=False),
-                NestedField(field_id=4, name='symbol', field_type=StringType(), required=False)),
-                schema_id=1,
-                identifier_field_ids=[])
-        ],
-        current_schema_id=1,
-        partition_specs=[
-            PartitionSpec(
-                PartitionField(source_id=1, field_id=1000, transform=DayTransform(), name='datetime_day'),))
-        ],
-        default_spec_id=0,
-        last_partition_id=1000,
-        properties={},
-        current_snapshot_id=None,
-        snapshots=[],
-        snapshot_log=[],
-        metadata_log=[],
-        sort_orders=[
-            SortOrder(order_id=1, fields=[SortField(source_id=4, transform=IdentityTransform(), direction=SortDirection.ASC, null_order=NullOrder.NULLS_FIRST)])
-        ],
-        default_sort_order_id=1,
-        refs={},
-        format_version=2,
-        last_sequence_number=0
+from pyiceberg.partitioning import PartitionField, PartitionSpec
+from pyiceberg.transforms import DayTransform
+
+spec = PartitionSpec(
+    PartitionField(
+        source_id=4, field_id=1000, transform=DayTransform(), name="datetime_day"
     )
 )
+
+table = table.alter().set_partition_spec(spec).commit()

Review Comment:
   Removed.
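For reference, a sketch of the partition value the `DayTransform` on `datetime` (source field 4) produces. The Iceberg spec defines the day transform as the number of days from 1970-01-01, so every row written on the same calendar day lands in the same partition. This standard-library sketch is not pyiceberg's `DayTransform`; `day_partition` and `day_partition_name` are hypothetical helpers, and it assumes `TimestampType` values arrive as naive (no time zone) `datetime` objects.

```python
from datetime import date, datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def day_partition(ts: datetime) -> int:
    """Hypothetical helper: days since 1970-01-01 for a timestamp without time zone."""
    # timedelta.days rounds toward negative infinity, so every moment of the
    # same calendar day maps to the same integer.
    return (ts - EPOCH).days


def day_partition_name(value: int) -> str:
    """Hypothetical helper: render a day partition value as an ISO date."""
    return (date(1970, 1, 1) + timedelta(days=value)).isoformat()


morning = day_partition(datetime(2023, 5, 25, 0, 0))
evening = day_partition(datetime(2023, 5, 25, 21, 32, 17))
print(morning == evening, day_partition_name(evening))  # True 2023-05-25
```

Because the spec only governs newly written data, files written before the spec change keep the layout they were written with.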



##########
python/mkdocs/docs/api.md:
##########
@@ -241,52 +146,88 @@ catalog.create_table(
+### Update the sort order
+
+Update the sort order of the table.
+
+```python
+from pyiceberg.table.sorting import SortOrder, SortField
+from pyiceberg.transforms import IdentityTransform
+
+order = SortOrder(SortField(source_id=2, transform=IdentityTransform()))
+
+table = table.alter().set_sort_order(order).commit()
+```
+
+### Update the properties
+
+Add, update and remove properties:
+
+```python
+assert table.properties == {}
+
+table = table.alter().set_properties(abc="def").commit()
+
+assert table.properties == {"abc": "def"}
+
+table = table.alter().unset_properties("abc").commit()

Review Comment:
   Done
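The `alter()` ... `commit()` chain used throughout this section reads like a builder that collects changes and applies them together. Below is a minimal in-memory sketch of that pattern for the property calls, assuming `commit()` applies the recorded updates in order and returns a new table object (as `table = table.alter()...commit()` suggests). `InMemoryTable`, `AlterTable`, `SetProperties` and `RemoveProperties` are hypothetical names; in this PR the commit presumably goes through the catalog (for example the REST catalog) rather than a local dict.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class SetProperties:
    updates: Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class RemoveProperties:
    removals: Tuple[str, ...]


TableUpdate = Union[SetProperties, RemoveProperties]


@dataclass
class InMemoryTable:
    """Hypothetical stand-in for a table; only tracks properties."""

    properties: Dict[str, str]

    def alter(self) -> "AlterTable":
        return AlterTable(self)


class AlterTable:
    """Hypothetical builder: records updates, applies them only on commit()."""

    def __init__(self, table: InMemoryTable) -> None:
        self._table = table
        self._updates: List[TableUpdate] = []

    def set_properties(self, **updates: str) -> "AlterTable":
        self._updates.append(SetProperties(tuple(updates.items())))
        return self

    def unset_properties(self, *removals: str) -> "AlterTable":
        self._updates.append(RemoveProperties(removals))
        return self

    def commit(self) -> InMemoryTable:
        # Apply updates in order to a copy, so the original table is untouched.
        properties = dict(self._table.properties)
        for update in self._updates:
            if isinstance(update, SetProperties):
                properties.update(update.updates)
            else:
                for key in update.removals:
                    properties.pop(key, None)  # design choice: ignore keys that are not set
        return InMemoryTable(properties)


table = InMemoryTable(properties={})
table = table.alter().set_properties(abc="def").commit()
assert table.properties == {"abc": "def"}
table = table.alter().unset_properties("abc").commit()
assert table.properties == {}
```

Since `set_properties(abc="def")` takes keyword arguments, dotted Iceberg property names such as `write.format.default` would need dict unpacking: `set_properties(**{"write.format.default": "parquet"})`.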



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: issues-help@iceberg.apache.org