You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "Fokko (via GitHub)" <gi...@apache.org> on 2023/05/25 18:25:03 UTC

[GitHub] [iceberg] Fokko opened a new pull request, #7699: Python: Add support for initial default

Fokko opened a new pull request, #7699:
URL: https://github.com/apache/iceberg/pull/7699

   This makes it easier to inherit the sequence numbers


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #7699: Python: Add support for initial default

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #7699:
URL: https://github.com/apache/iceberg/pull/7699#discussion_r1209526383


##########
python/pyiceberg/avro/resolver.py:
##########
@@ -150,8 +152,12 @@ def struct(self, struct: StructType, expected_struct: Optional[IcebergType], fie
             if read_field.field_id not in file_fields:
                 if read_field.required:
                     raise ResolveError(f"{read_field} is non-optional, and not part of the file schema")
-                # Just set the new field to None
-                results.append((pos, NoneReader()))

Review Comment:
   Ah nice, I missed that one. Updated! 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7699: Python: Add support for initial default

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7699:
URL: https://github.com/apache/iceberg/pull/7699#discussion_r1209492285


##########
python/pyiceberg/avro/reader.py:
##########
@@ -103,6 +103,19 @@ def skip(self, decoder: BinaryDecoder) -> None:
         return None
 
 
+class DefaultReader(Reader):
+    default_value: Any
+
+    def __init__(self, default_value: Any) -> None:
+        self.default_value = default_value
+
+    def read(self, _: BinaryDecoder) -> Any:
+        return self.default_value
+
+    def skip(self, decoder: BinaryDecoder) -> None:
+        return None

Review Comment:
   This isn't an issue, but I don't think that we need to return anything from `skip`, do we?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7699: Python: Add support for initial default

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7699:
URL: https://github.com/apache/iceberg/pull/7699#discussion_r1209493059


##########
python/pyiceberg/avro/resolver.py:
##########
@@ -150,8 +152,12 @@ def struct(self, struct: StructType, expected_struct: Optional[IcebergType], fie
             if read_field.field_id not in file_fields:
                 if read_field.required:
                     raise ResolveError(f"{read_field} is non-optional, and not part of the file schema")
-                # Just set the new field to None
-                results.append((pos, NoneReader()))

Review Comment:
   If the initial default is present, then the field _can_ be required. It is valid to add a new required field as long as it has a non-null initial default.
   
   I think the logic should be:
   * If there is a default, return a `DefaultReader`
   * If the field is required (and there is no default), throw an exception
   * If the field is optional, use the `NoneReader`
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #7699: Python: Add support for initial default

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #7699:
URL: https://github.com/apache/iceberg/pull/7699#discussion_r1209558572


##########
python/tests/utils/test_manifest.py:
##########
@@ -191,9 +191,9 @@ def test_read_manifest(generated_manifest_file_file: str) -> None:
 
     assert manifest_list.manifest_length == 7989
     assert manifest_list.partition_spec_id == 0
-    assert manifest_list.content is None
-    assert manifest_list.sequence_number is None
-    assert manifest_list.min_sequence_number is None
+    assert manifest_list.content == 0

Review Comment:
   This will go well for a long time:
   ```python
   ➜  python git:(fd-initial-default) ✗ python3
   Python 3.11.3 (main, Apr  7 2023, 20:13:31) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   >>> from pyiceberg.manifest import ManifestContent
   >>> ManifestContent.DATA == 0
   True
   >>> ManifestContent.DATA == 1
   False
   >>> 1 == ManifestContent.DATA
   False
   ```
   But you're right and this should be an enum, updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on pull request #7699: Python: Add support for initial default

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on PR #7699:
URL: https://github.com/apache/iceberg/pull/7699#issuecomment-1567524552

   @rdblue Thanks for the review. Went over the comments and added extensive tests for v1 and v2 manifest-file's. 
   
   The CI will fail because of: https://github.com/apache/iceberg/pull/7729 The caching logic for caching the docker image worked on the PR ([#7698](https://github.com/apache/iceberg/pull/7698)), but somehow fails on the master branch. I think the easiest way is to revert it for now. I'll open up a new PR with the changes and also refactoring of the caching logic (I don't like it, way to complicated).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #7699: Python: Add support for initial default

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #7699:
URL: https://github.com/apache/iceberg/pull/7699#discussion_r1209559087


##########
python/tests/avro/test_resolver.py:
##########
@@ -280,3 +281,23 @@ class Ints(Record):
             records = list(reader)
 
     assert repr(records) == "[Ints[c=3, d=None]]"
+
+
+def test_resolver_initial_value() -> None:
+    write_schema = Schema(
+        NestedField(1, "name", StringType()),
+        schema_id=1,
+    )
+    read_schema = Schema(
+        NestedField(2, "something", StringType(), required=False, initial_default="vo"),
+        schema_id=1,

Review Comment:
   Eagle eye, I've changed it to two.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7699: Python: Add support for initial default

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7699:
URL: https://github.com/apache/iceberg/pull/7699#discussion_r1209493059


##########
python/pyiceberg/avro/resolver.py:
##########
@@ -150,8 +152,12 @@ def struct(self, struct: StructType, expected_struct: Optional[IcebergType], fie
             if read_field.field_id not in file_fields:
                 if read_field.required:
                     raise ResolveError(f"{read_field} is non-optional, and not part of the file schema")
-                # Just set the new field to None
-                results.append((pos, NoneReader()))

Review Comment:
   If the initial default is present, then the field _can_ be required. It is valid to add a new required field as long as it has a non-null initial default.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7699: Python: Add support for initial default

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7699:
URL: https://github.com/apache/iceberg/pull/7699#discussion_r1209494749


##########
python/tests/utils/test_manifest.py:
##########
@@ -191,9 +191,9 @@ def test_read_manifest(generated_manifest_file_file: str) -> None:
 
     assert manifest_list.manifest_length == 7989
     assert manifest_list.partition_spec_id == 0
-    assert manifest_list.content is None
-    assert manifest_list.sequence_number is None
-    assert manifest_list.min_sequence_number is None
+    assert manifest_list.content == 0

Review Comment:
   Not a blocker, but we should have an enum for `content` and use it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #7699: Python: Add support for initial default

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue merged PR #7699:
URL: https://github.com/apache/iceberg/pull/7699


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #7699: Python: Add support for initial default

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #7699:
URL: https://github.com/apache/iceberg/pull/7699#discussion_r1209525163


##########
python/pyiceberg/avro/reader.py:
##########
@@ -103,6 +103,19 @@ def skip(self, decoder: BinaryDecoder) -> None:
         return None
 
 
+class DefaultReader(Reader):
+    default_value: Any
+
+    def __init__(self, default_value: Any) -> None:
+        self.default_value = default_value
+
+    def read(self, _: BinaryDecoder) -> Any:
+        return self.default_value
+
+    def skip(self, decoder: BinaryDecoder) -> None:
+        return None

Review Comment:
   Fair point, I've changed it into:
   ```suggestion
           pass
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #7699: Python: Add support for initial default

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #7699:
URL: https://github.com/apache/iceberg/pull/7699#discussion_r1209525163


##########
python/pyiceberg/avro/reader.py:
##########
@@ -103,6 +103,19 @@ def skip(self, decoder: BinaryDecoder) -> None:
         return None
 
 
+class DefaultReader(Reader):
+    default_value: Any
+
+    def __init__(self, default_value: Any) -> None:
+        self.default_value = default_value
+
+    def read(self, _: BinaryDecoder) -> Any:
+        return self.default_value
+
+    def skip(self, decoder: BinaryDecoder) -> None:
+        return None

Review Comment:
   Fair point, I've changed it into:
   ```suggestion
           pass
   ```
   that's equivalent



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7699: Python: Add support for initial default

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7699:
URL: https://github.com/apache/iceberg/pull/7699#discussion_r1209494355


##########
python/tests/avro/test_resolver.py:
##########
@@ -280,3 +281,23 @@ class Ints(Record):
             records = list(reader)
 
     assert repr(records) == "[Ints[c=3, d=None]]"
+
+
+def test_resolver_initial_value() -> None:
+    write_schema = Schema(
+        NestedField(1, "name", StringType()),
+        schema_id=1,
+    )
+    read_schema = Schema(
+        NestedField(2, "something", StringType(), required=False, initial_default="vo"),
+        schema_id=1,

Review Comment:
   Probably doesn't matter, but this should be incremented.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #7699: Python: Add support for initial default

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on PR #7699:
URL: https://github.com/apache/iceberg/pull/7699#issuecomment-1574412480

   Merged. Thanks, @Fokko!
   
   The only thing that caught my eye this time around was adding an enum reader. I was thinking that the enum would work like an alias for the ID or that we would translate in the type after it was read, but it looks like this works just fine.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org