You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/09/01 08:22:21 UTC

[GitHub] [iceberg] Fokko opened a new pull request, #5686: Python: Remove the pre-validators

Fokko opened a new pull request, #5686:
URL: https://github.com/apache/iceberg/pull/5686

   I would like to remove the pre-validators because they are confusing.
   
   Mostly because in the pre-validators the defaults and aliases aren't applied, do we have to check all the permutations of casing. Removing those requires setting defaults, and checking them afterward.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko merged pull request #5686: Python: Remove the pre-validators

Posted by GitBox <gi...@apache.org>.
Fokko merged PR #5686:
URL: https://github.com/apache/iceberg/pull/5686


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on pull request #5686: Python: Remove the pre-validators

Posted by GitBox <gi...@apache.org>.
Fokko commented on PR #5686:
URL: https://github.com/apache/iceberg/pull/5686#issuecomment-1250625007

   Thanks for the review @dhruv-pratap 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #5686: Python: Remove the pre-validators

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5686:
URL: https://github.com/apache/iceberg/pull/5686#discussion_r963099531


##########
python/pyiceberg/table/metadata.py:
##########
@@ -203,24 +210,23 @@ class TableMetadataV1(TableMetadataCommonFields, IcebergBaseModel):
     # because bumping the version should be an explicit operation that is up
     # to the owner of the table.
 
-    @root_validator(pre=True)
+    @root_validator
     def set_v2_compatible_defaults(cls, data: Dict[str, Any]) -> Dict[str, Any]:
         """Sets default values to be compatible with the format v2
 
-        Set some sensible defaults for V1, so we comply with the schema
-        this is in pre=True, meaning that this will be done before validation.
-        We don't want to make the fields optional, since they are required for V2
-
         Args:
             data: The raw arguments when initializing a V1 TableMetadata
 
         Returns:
             The TableMetadata with the defaults applied
         """
-        if data.get("schema") and "schema-id" not in data["schema"]:
-            data["schema"]["schema-id"] = DEFAULT_SCHEMA_ID
-        if data.get("partition-spec") and "last-partition-id" not in data:
-            data["last-partition-id"] = max(spec["field-id"] for spec in data["partition-spec"])
+        # When the schema doesn't have an ID
+        if data.get("schema") and "schema_id" not in data["schema"]:
+            data["schema"]["schema_id"] = DEFAULT_SCHEMA_ID
+
+        if data.get("partition_spec") and "last_partition_id" not in data:
+            data["last_partition_id"] = max(spec["field-id"] for spec in data["partition_spec"])

Review Comment:
   Thanks for asking this, and this is indeed a bug.. And also one of the reasons I don't like the pre-validators. The whole logic doesn't make sense because the `field-id` is on the fields of the spec.
   
   It didn't catch anything because `last_partition_id` is always in `data`. When running with pre-validation it could be empty because the defaults weren't applied.
   
   I've moved this to `construct_partition_specs` and threw in some walrus :=
   ```python
   if partition_specs := data.get(PARTITION_SPECS):
       data["last_partition_id"] = max(spec.last_assigned_field_id for spec in partition_specs)
   ```
   This way we set it every time we construct a table metadata.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5686: Python: Remove the pre-validators

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5686:
URL: https://github.com/apache/iceberg/pull/5686#discussion_r963057581


##########
python/pyiceberg/table/metadata.py:
##########
@@ -203,24 +210,23 @@ class TableMetadataV1(TableMetadataCommonFields, IcebergBaseModel):
     # because bumping the version should be an explicit operation that is up
     # to the owner of the table.
 
-    @root_validator(pre=True)
+    @root_validator
     def set_v2_compatible_defaults(cls, data: Dict[str, Any]) -> Dict[str, Any]:
         """Sets default values to be compatible with the format v2
 
-        Set some sensible defaults for V1, so we comply with the schema
-        this is in pre=True, meaning that this will be done before validation.
-        We don't want to make the fields optional, since they are required for V2
-
         Args:
             data: The raw arguments when initializing a V1 TableMetadata
 
         Returns:
             The TableMetadata with the defaults applied
         """
-        if data.get("schema") and "schema-id" not in data["schema"]:
-            data["schema"]["schema-id"] = DEFAULT_SCHEMA_ID
-        if data.get("partition-spec") and "last-partition-id" not in data:
-            data["last-partition-id"] = max(spec["field-id"] for spec in data["partition-spec"])
+        # When the schema doesn't have an ID
+        if data.get("schema") and "schema_id" not in data["schema"]:
+            data["schema"]["schema_id"] = DEFAULT_SCHEMA_ID
+
+        if data.get("partition_spec") and "last_partition_id" not in data:
+            data["last_partition_id"] = max(spec["field-id"] for spec in data["partition_spec"])

Review Comment:
   This still uses `field-id`. Is that because the key aliases were only applied for this and the partition spec will apply them for those fields?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5686: Python: Remove the pre-validators

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5686:
URL: https://github.com/apache/iceberg/pull/5686#discussion_r963108684


##########
python/pyiceberg/table/metadata.py:
##########
@@ -203,24 +210,23 @@ class TableMetadataV1(TableMetadataCommonFields, IcebergBaseModel):
     # because bumping the version should be an explicit operation that is up
     # to the owner of the table.
 
-    @root_validator(pre=True)
+    @root_validator
     def set_v2_compatible_defaults(cls, data: Dict[str, Any]) -> Dict[str, Any]:
         """Sets default values to be compatible with the format v2
 
-        Set some sensible defaults for V1, so we comply with the schema
-        this is in pre=True, meaning that this will be done before validation.
-        We don't want to make the fields optional, since they are required for V2
-
         Args:
             data: The raw arguments when initializing a V1 TableMetadata
 
         Returns:
             The TableMetadata with the defaults applied
         """
-        if data.get("schema") and "schema-id" not in data["schema"]:
-            data["schema"]["schema-id"] = DEFAULT_SCHEMA_ID
-        if data.get("partition-spec") and "last-partition-id" not in data:
-            data["last-partition-id"] = max(spec["field-id"] for spec in data["partition-spec"])
+        # When the schema doesn't have an ID
+        if data.get("schema") and "schema_id" not in data["schema"]:
+            data["schema"]["schema_id"] = DEFAULT_SCHEMA_ID
+
+        if data.get("partition_spec") and "last_partition_id" not in data:
+            data["last_partition_id"] = max(spec["field-id"] for spec in data["partition_spec"])

Review Comment:
   `last-partition-id` is in the spec, so we shouldn't generate it this way unless it isn't set in `data`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org