You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/31 20:39:20 UTC

[GitHub] [iceberg] CircArgs opened a new pull request #4016: types use new for generics. no metaprogramming

CircArgs opened a new pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016


   This PR is in contrast to https://github.com/apache/iceberg/pull/3981 which sought to use a pattern similar to the python typings e.g. `List, Dict, Union, etc` to create types that could be used inside iceberg and also server as types that could be statically checked when used to type code. 
   
   After discussions with @samredai and @rdblue I've revised it further so there is no real metaprogramming yet we still get much of the value.
   
   The same syntax as the current code is used to create types:
   
   `IntegerType(), StructType(
           [
               NestedField(True, 1, "required_field", StringType()),
               NestedField(False, 2, "optional_field", IntegerType()),
           ]
       )`
   
   yet we get `==` for free (no dedicated `__eq__` methods) and can use `isinstance` to check types instead of `issubclass` as was the case in https://github.com/apache/iceberg/pull/3981. 
   
   Take this example:
   
   ```python
   >>> str(IntegerType())
   integer
   
   >>> IntegerType() is IntegerType() # same object in memory
   True
   
   >>> repr(BooleanType())
   BooleanType()
   
   >>> repr(StructType(
           [
               NestedField(True, 1, "required_field", StringType()),
               NestedField(False, 2, "optional_field", IntegerType()),
           ]
       ))
   StructType(fields=(NestedField(is_optional=True, field_id=1, name='required_field', field_type=StringType(), doc=None), NestedField(is_optional=False, field_id=2, name='optional_field', field_type=IntegerType(), doc=None)))
   
   >>> str(StructType(
           [
               NestedField(True, 1, "required_field", StringType()),
               NestedField(False, 2, "optional_field", IntegerType()),
           ]
       ))
   struct<[nestedfield<True, 1, required_field, string, None>, nestedfield<False, 2, optional_field, integer, None>]>
   
   >>> StructType(
           [
               NestedField(True, 1, "required_field", StringType()),
               NestedField(False, 2, "optional_field", IntegerType()),
           ]
       )==StructType(
           [
               NestedField(True, 1, "required_field", StringType()),
               NestedField(False, 2, "optional_field", IntegerType()),
           ]
       )
   True 
   
   >>> StructType(
           [
               NestedField(True, 1, "required_field", StringType()),
               NestedField(False, 2, "optional_field", IntegerType()),
           ]
       )==StructType(
           [
               NestedField(True, 0, "required_field", StringType()), # id changed from 1 to 0
               NestedField(False, 2, "optional_field", IntegerType()),
           ]
       ) 
   False 
   
   >>> isinstance(StringType(), StringType)
   True
   ```
   
   This `types.py` is about 100 lines less code than the current one with greater functionality as described
   
   The centerpiece of this PR is simply the `__new__` method on the base `IcebergType` which checks the attribute `_implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"]` which you can see keeps track of `IcebergType` _instances_ by storing keys to the type's name and attributes (as defined it the init)
   
   Thank you to @samredai and @rdblue for helping to inspire this change.
   
   Note: if this PR is accepted https://github.com/apache/iceberg/pull/3981 should be closed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800231771



##########
File path: python/src/iceberg/types.py
##########
@@ -48,27 +46,69 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
 
+class PrimitiveType(Type):
+    """Base class for all Iceberg Primitive Types"""
+
+    _implemented = {}  # type: ignore
+
+    def __new__(cls):

Review comment:
       Possible reuse later.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r807052962



##########
File path: python/src/iceberg/types.py
##########
@@ -135,75 +179,122 @@ def doc(self) -> Optional[str]:
         return self._doc
 
     @property
-    def type(self) -> Type:
+    def type(self) -> IcebergType:
         return self._type
 
-    def __repr__(self):
-        return (
-            f"NestedField(is_optional={self._is_optional}, field_id={self._id}, "
-            f"name={repr(self._name)}, field_type={repr(self._type)}, doc={repr(self._doc)})"
-        )
 
-    def __str__(self):
-        return (
-            f"{self._id}: {self._name}: {'optional' if self._is_optional else 'required'} {self._type}" ""
-            if self._doc is None
-            else f" ({self._doc})"
+class StructType(IcebergType):
+    """A struct type in Iceberg
+
+    Example:
+        >>> StructType(
+                NestedField(True, 1, "required_field", StringType()),
+                NestedField(False, 2, "optional_field", IntegerType())
         )
+    """
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return (
-                self.is_optional == other.is_optional
-                and self.field_id == other.field_id
-                and self.name == other.name
-                and self.doc == other.doc
-                and self.type == other.type
-            )
-        return False
+    _instances: Dict[Tuple[NestedField, ...], "StructType"] = {}
 
+    def __new__(cls, *fields: NestedField):
+        cls._instances[fields] = cls._instances.get(fields) or object.__new__(cls)
+        return cls._instances[fields]
 
-class StructType(Type):
-    def __init__(self, fields: list):
+    def __init__(self, *fields: NestedField):
         super().__init__(
             f"struct<{', '.join(map(str, fields))}>",
-            f"StructType(fields={repr(fields)})",
+            f"StructType{repr(fields)}",
         )
         self._fields = fields
 
     @property
-    def fields(self) -> list:
+    def fields(self) -> Tuple[NestedField, ...]:
         return self._fields
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.fields == other.fields
-        return False
 
+class ListType(IcebergType):
+    """A list type in Iceberg
+
+    Example:
+        >>> ListType(element_id=3, element_type=StringType(), element_is_optional=True)
+        ListType(element=NestedField(is_optional=True, field_id=3, name='element', field_type=StringType(), doc=None))
+    """
+
+    _instances: Dict[Tuple[bool, int, IcebergType], "ListType"] = {}
+
+    def __new__(
+        cls,
+        element_id: int,
+        element_type: IcebergType,
+        element_is_optional: bool,
+    ):
+        key = (element_is_optional, element_id, element_type)
+        cls._instances[key] = cls._instances.get(key) or object.__new__(cls)
+        return cls._instances[key]
 
-class ListType(Type):
-    def __init__(self, element: NestedField):
-        super().__init__(f"list<{element.type}>", f"ListType(element={repr(element)})")
-        self._element_field = element
+    def __init__(
+        self,
+        element_id: int,
+        element_type: IcebergType,
+        element_is_optional: bool,
+    ):
+        super().__init__(
+            f"list<{element_type}>",
+            f"ListType(element_is_optional={element_is_optional}, element_id={element_id}, "

Review comment:
       Minor: It would be nice if the argument order were consistent. This is slightly misleading because `element_is_optional` is last if you're passing by position.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800234217



##########
File path: python/src/iceberg/types.py
##########
@@ -135,53 +186,54 @@ def doc(self) -> Optional[str]:
         return self._doc
 
     @property
-    def type(self) -> Type:
+    def type(self) -> IcebergType:
         return self._type
 
-    def __repr__(self):
-        return (
-            f"NestedField(is_optional={self._is_optional}, field_id={self._id}, "
-            f"name={repr(self._name)}, field_type={repr(self._type)}, doc={repr(self._doc)})"
-        )
 
-    def __str__(self):
-        return (
-            f"{self._id}: {self._name}: {'optional' if self._is_optional else 'required'} {self._type}" ""
-            if self._doc is None
-            else f" ({self._doc})"
-        )
+class StructType(IcebergType):
+    """A struct type in Iceberg
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return (
-                self.is_optional == other.is_optional
-                and self.field_id == other.field_id
-                and self.name == other.name
-                and self.doc == other.doc
-                and self.type == other.type
-            )
-        return False
+    Example:
+        >>> StructType(
+            [
+                NestedField(True, 1, "required_field", StringType()),
+                NestedField(False, 2, "optional_field", IntegerType()),
+            ]
+    """
 
+    _instances: Dict[Tuple[NestedField, ...], "StructType"] = {}
 
-class StructType(Type):
-    def __init__(self, fields: list):
+    def __new__(cls, fields: List[NestedField]):

Review comment:
       Is it possible to change this to `*fields` instead of `List[NestedField]`? That would allow a bit more natural syntax:
   
   ```python
   s: StructType = StructType(NestedField(True, 1, "col", StringType()))
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796560526



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):

Review comment:
       > For the decorator example, while the object created using FixedType will be singleton, but IIUC, the FixedType itself is a function not a class any more, right? So we cannot call class methods from it, etc. It is better we don't use decorator.
   
   Decorators return a type they decorate. In this case, the decorator would take the class and return the very same class  (in contrast to most decorators of functions which usually act as a closure and return a wrapper function) so FixedType would still be a class, still have `__name__` FixedType and still have all the same methods and attributes with `__new__` added via the decorator
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796193872



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):
+        init_func = cls.__dict__.get("__init__")
+        names = init_func.__code__.co_varnames[1:] if init_func else []
+        defaults = init_func and init_func.__defaults__ or []
+        args = {**dict(zip(names[::-1], defaults[::-1])), **dict(zip(names, args))}
+        args = {**args, **kwargs}
+        args = tuple(
+            (k, tuple(args[k]) if isinstance(args[k], list) else args[k]) for k in names
+        )
+        key = (cls.__name__, args)
+        if key in cls._implemented:
+            return cls._implemented[key]
+        else:
+            ret = object.__new__(cls)
+            ret._args = args
+            cls._implemented[key] = ret
+            return ret
 
     def __repr__(self):
-        return self._repr_string
+        base = f"{type(self).__name__}("
+        if not self._args:
+            return base + ")"
+        for k, v in self._args:
+            base += k + "=" + repr(v) + ", "
+        return base[:-2] + ")"

Review comment:
       thanks changed it. definitely makes that more concise




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796188322



##########
File path: python/tests/test_types.py
##########
@@ -63,7 +63,7 @@ def test_repr_primitive_types(input_type):
 def test_fixed_type():
     type_var = FixedType(length=5)
     assert type_var.length == 5
-    assert str(type_var) == "fixed[5]"
+    assert str(type_var) == "fixed<5>"

Review comment:
       I think this representation is required by the spec, so we should probably not change it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796198759



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):

Review comment:
       I think there's still a case to be made for a single implementation maybe just one that does not introspect and is more explicit? Say a decorator? 
   
   I think something like 
   ```python
   
   @singleton('length')
   class FixedType(IcebergType):
   # __new__ given via decorator
   ```
   would be doable
   
   `singleton` wouldn't have to do much but set a `new` that looks at what the decorator was fed. it would still have generic logic though just not introspection with the hidden attributes as it is currently. Do you think that's still too much?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796917075



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:

Review comment:
       haha okay I'll use `IcebergType` again




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#issuecomment-1028408742


   This is looking good, but I just committed @jun-he's change that adds __eq__ since it was simple and unblocks him. Can you rebase and merge in those changes? That has a Singleton class that can be used in a couple places.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796187657



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):
+        init_func = cls.__dict__.get("__init__")
+        names = init_func.__code__.co_varnames[1:] if init_func else []
+        defaults = init_func and init_func.__defaults__ or []
+        args = {**dict(zip(names[::-1], defaults[::-1])), **dict(zip(names, args))}
+        args = {**args, **kwargs}
+        args = tuple(
+            (k, tuple(args[k]) if isinstance(args[k], list) else args[k]) for k in names
+        )
+        key = (cls.__name__, args)
+        if key in cls._implemented:
+            return cls._implemented[key]
+        else:
+            ret = object.__new__(cls)
+            ret._args = args
+            cls._implemented[key] = ret
+            return ret
 
     def __repr__(self):
-        return self._repr_string
+        base = f"{type(self).__name__}("

Review comment:
       I'd probably keep the existing `__init__` methods that construct their own repr. I think that's easier to maintain and understand, kind of like the simpler `__new__` approach. If you build a method for each class then it is simpler.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796192638



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:

Review comment:
       yeah it conflicts with a common typing primitive `from typing import Type` which is used when typing something that takes a type and not a literal




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796919451



##########
File path: python/src/iceberg/types.py
##########
@@ -48,27 +46,69 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
 
+class PrimitiveType(Type):
+    """Base class for all Iceberg Primitive Types"""
+
+    _implemented = {}  # type: ignore
+
+    def __new__(cls):
+        if cls in cls._implemented:
+            return cls._implemented[cls]
+        else:
+            ret = object.__new__(cls)

Review comment:
       it's a dictionary, but I put it there so that `get` would work otherwise I have to first check for `None` right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#issuecomment-1043527521


   Hey @rdblue the `__init__` will be called every time we request an instance of the type. First `__new__` will give us the instance (which the custom `__new__` on the class or inherited from `Singleton` will give) and then `__init__` will be called. It's true for both the parameterized typed and the non-parameterized ones.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796322693



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:

Review comment:
       It seems we don't have this conflict based on the python_legacy and the current development. Also, this class is internal and won't be used extensively and not sure if the renaming is required.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796628344



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):
+        init_func = cls.__dict__.get("__init__")
+        names = init_func.__code__.co_varnames[1:] if init_func else []
+        defaults = init_func and init_func.__defaults__ or []
+        args = {**dict(zip(names[::-1], defaults[::-1])), **dict(zip(names, args))}
+        args = {**args, **kwargs}
+        args = tuple(
+            (k, tuple(args[k]) if isinstance(args[k], list) else args[k]) for k in names
+        )
+        key = (cls.__name__, args)
+        if key in cls._implemented:
+            return cls._implemented[key]
+        else:
+            ret = object.__new__(cls)
+            ret._args = args
+            cls._implemented[key] = ret
+            return ret
 
     def __repr__(self):
-        return self._repr_string
+        base = f"{type(self).__name__}("

Review comment:
       added back all the manual definitions for str/repr




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796925557



##########
File path: python/src/iceberg/types.py
##########
@@ -206,19 +204,14 @@ class IntegerType(Type):
 
     min: int = -2147483648
 
-    def __init__(self):
-        super().__init__("int", "IntegerType()", is_primitive=True)
 
-
-class LongType(Type):
+class LongType(PrimitiveType):
     """A Long data type in Iceberg can be represented using an instance of this class. Longs in Iceberg are
     64-bit signed integers.
-
     Example:
         >>> column_foo = LongType()
         >>> isinstance(column_foo, LongType)
         True
-

Review comment:
       I think it's actually a matter of black not formatting docstrings. Seems to be an unresolved issue https://github.com/psf/black/issues/144 mentions some additional formatting libraries but I've never seen them. Added in the whitespace manually for now for all docstrings




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800233203



##########
File path: python/src/iceberg/types.py
##########
@@ -135,53 +186,54 @@ def doc(self) -> Optional[str]:
         return self._doc
 
     @property
-    def type(self) -> Type:
+    def type(self) -> IcebergType:
         return self._type
 
-    def __repr__(self):
-        return (
-            f"NestedField(is_optional={self._is_optional}, field_id={self._id}, "
-            f"name={repr(self._name)}, field_type={repr(self._type)}, doc={repr(self._doc)})"
-        )
 
-    def __str__(self):
-        return (
-            f"{self._id}: {self._name}: {'optional' if self._is_optional else 'required'} {self._type}" ""
-            if self._doc is None
-            else f" ({self._doc})"
-        )
+class StructType(IcebergType):
+    """A struct type in Iceberg
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return (
-                self.is_optional == other.is_optional
-                and self.field_id == other.field_id
-                and self.name == other.name
-                and self.doc == other.doc
-                and self.type == other.type
-            )
-        return False
+    Example:
+        >>> StructType(
+            [
+                NestedField(True, 1, "required_field", StringType()),
+                NestedField(False, 2, "optional_field", IntegerType()),
+            ]
+    """
 
+    _instances: Dict[Tuple[NestedField, ...], "StructType"] = {}
 
-class StructType(Type):
-    def __init__(self, fields: list):
+    def __new__(cls, fields: List[NestedField]):
+        key = tuple(fields)
+        cls._instances[key] = cls._instances.get(key) or object.__new__(cls)
+        return cls._instances[key]
+
+    def __init__(self, fields: List[NestedField] = []):
         super().__init__(
             f"struct<{', '.join(map(str, fields))}>",
             f"StructType(fields={repr(fields)})",
         )
         self._fields = fields
 
     @property
-    def fields(self) -> list:
+    def fields(self) -> List[NestedField]:
         return self._fields
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.fields == other.fields
-        return False
 
+class ListType(IcebergType):
+    """A list type in Iceberg
+
+    Example:
+        >>> ListType(NestedField(True, 1, "required_field", StringType()))

Review comment:
       List elements must always be named "element" so we should verify that in `__init__`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800232127



##########
File path: python/src/iceberg/types.py
##########
@@ -57,30 +57,65 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
+
+class PrimitiveType(IcebergType):
+    """Base class for all Iceberg Primitive Types"""
+
+    _instances = {}  # type: ignore
+
+    def __new__(cls):
+        cls._instances[cls] = cls._instances.get(cls) or object.__new__(cls)
+        return cls._instances[cls]
+
+
+class FixedType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> FixedType(8)
+        FixedType(length=8)
+        >>> FixedType(8)==FixedType(8)
+        True
+    """
+
+    _instances: Dict[int, "FixedType"] = {}
 
+    def __new__(cls, length: int):
+        cls._instances[length] = cls._instances.get(length) or object.__new__(cls)

Review comment:
       Doesn't this need to pass `length` into `__new__` so that `__init__` is called correctly?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#issuecomment-1040514755


   Thanks, @CircArgs! Just a couple minor fixes to go and then we can get this in. The only semi-major thing is my question about `__init__`: is that called every time we return a type? Or does Python keep track and only call it once for the object?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800232677



##########
File path: python/src/iceberg/types.py
##########
@@ -93,21 +128,37 @@ def precision(self) -> int:
     def scale(self) -> int:
         return self._scale
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.precision == other.precision and self.scale == other.scale
-        return False
 
+class NestedField(IcebergType):
+    """equivalent of `NestedField` type from Java implementation"""
+
+    _instances: Dict[Tuple[bool, int, str, IcebergType, Optional[str]], "NestedField"] = {}
+
+    def __new__(
+        cls,
+        is_optional: bool,
+        field_id: int,
+        name: str,
+        field_type: IcebergType,
+        doc: Optional[str] = None,
+    ):
+        key = is_optional, field_id, name, field_type, doc
+        cls._instances[key] = cls._instances.get(key) or object.__new__(cls)

Review comment:
       Need to pass arguments?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#issuecomment-1040514755


   Thanks, @CircArgs! Just a couple minor fixes to go and then we can get this in. The only semi-major thing is my question about `__init__`: is that called every time we return a type? Or does Python keep track and only call it once for the object?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r807049622



##########
File path: python/src/iceberg/types.py
##########
@@ -93,21 +121,37 @@ def precision(self) -> int:
     def scale(self) -> int:
         return self._scale
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.precision == other.precision and self.scale == other.scale
-        return False
 
+class NestedField(IcebergType):
+    """This represents a field of a struct, a map key, a map value, or a list element. This is where field IDs, names, docs, and nullability are tracked."""

Review comment:
       This doc string should be wrapped at 130 characters, right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r803146563



##########
File path: python/src/iceberg/types.py
##########
@@ -57,30 +57,65 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
+
+class PrimitiveType(IcebergType):
+    """Base class for all Iceberg Primitive Types"""
+
+    _instances = {}  # type: ignore
+
+    def __new__(cls):
+        cls._instances[cls] = cls._instances.get(cls) or object.__new__(cls)
+        return cls._instances[cls]
+
+
+class FixedType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> FixedType(8)
+        FixedType(length=8)
+        >>> FixedType(8)==FixedType(8)
+        True
+    """
+
+    _instances: Dict[int, "FixedType"] = {}
 
+    def __new__(cls, length: int):
+        cls._instances[length] = cls._instances.get(length) or object.__new__(cls)
+        return cls._instances[length]
 
-class FixedType(Type):
     def __init__(self, length: int):
-        super().__init__(f"fixed[{length}]", f"FixedType(length={length})", is_primitive=True)
+        super().__init__(f"fixed[{length}]", f"FixedType(length={length})")
         self._length = length
 
     @property
     def length(self) -> int:
         return self._length
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.length == other.length
-        return False
 
+class DecimalType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> DecimalType(32, 3)
+        DecimalType(precision=32, scale=3)
+        >>> DecimalType(8, 3)==DecimalType(8, 3)
+        True
+    """
+
+    _instances: Dict[Tuple[int, int], "DecimalType"] = {}
+
+    def __new__(cls, precision: int, scale: int):
+        key = precision, scale
+        cls._instances[key] = cls._instances.get(key) or object.__new__(cls)

Review comment:
       It's not in `__new__`'s semantics to deal with initialization. It could be done but I think that's bad practice and it's left to `__init__`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796065073



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):
+        init_func = cls.__dict__.get("__init__")
+        names = init_func.__code__.co_varnames[1:] if init_func else []
+        defaults = init_func and init_func.__defaults__ or []
+        args = {**dict(zip(names[::-1], defaults[::-1])), **dict(zip(names, args))}
+        args = {**args, **kwargs}
+        args = tuple(
+            (k, tuple(args[k]) if isinstance(args[k], list) else args[k]) for k in names
+        )
+        key = (cls.__name__, args)
+        if key in cls._implemented:
+            return cls._implemented[key]
+        else:
+            ret = object.__new__(cls)
+            ret._args = args
+            cls._implemented[key] = ret
+            return ret
 
     def __repr__(self):
-        return self._repr_string
+        base = f"{type(self).__name__}("
+        if not self._args:
+            return base + ")"
+        for k, v in self._args:
+            base += k + "=" + repr(v) + ", "
+        return base[:-2] + ")"

Review comment:
       You can likely use `join` and a comprehension so you don't have to chop off the last two characters of the string.
   
   Untested, but maybe something like
   ```python
   ', '.join([f"k=repr(v)" for k, v in self._args if self._args])
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796843127



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):
+        init_func = cls.__dict__.get("__init__")
+        names = init_func.__code__.co_varnames[1:] if init_func else []
+        defaults = init_func and init_func.__defaults__ or []
+        args = {**dict(zip(names[::-1], defaults[::-1])), **dict(zip(names, args))}
+        args = {**args, **kwargs}
+        args = tuple(
+            (k, tuple(args[k]) if isinstance(args[k], list) else args[k]) for k in names
+        )
+        key = (cls.__name__, args)
+        if key in cls._implemented:
+            return cls._implemented[key]
+        else:
+            ret = object.__new__(cls)
+            ret._args = args
+            cls._implemented[key] = ret
+            return ret
 
     def __repr__(self):
-        return self._repr_string
+        base = f"{type(self).__name__}("
+        if not self._args:
+            return base + ")"
+        for k, v in self._args:
+            base += k + "=" + repr(v) + ", "
+        return base[:-2] + ")"
 
     def __str__(self):
-        return self._type_string
+        base = f"{type(self).__name__.lower().replace('type', '')}<"
+        if not self._args:
+            return base[:-1]
+        for k, v in self._args:
+            base += (
+                str(v)
+                if type(v) != tuple
+                else "[" + ", ".join([str(vv) for vv in v]) + "]"
+            ) + ", "
+        return base[:-2] + ">"
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)

Review comment:
       We haven't needed those in the JVM implementation. I think this should be fine.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800233141



##########
File path: python/src/iceberg/types.py
##########
@@ -48,27 +46,69 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
 
+class PrimitiveType(Type):
+    """Base class for all Iceberg Primitive Types"""
+
+    _implemented = {}  # type: ignore
+
+    def __new__(cls):

Review comment:
       I'd probably continue to use Singleton so that you don't have to call `object.__new__` and can use `super().__new__` instead. That seems safer to me for some reason.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r801163104



##########
File path: python/src/iceberg/types.py
##########
@@ -57,30 +57,65 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
+
+class PrimitiveType(IcebergType):
+    """Base class for all Iceberg Primitive Types"""
+
+    _instances = {}  # type: ignore
+
+    def __new__(cls):
+        cls._instances[cls] = cls._instances.get(cls) or object.__new__(cls)
+        return cls._instances[cls]
+
+
+class FixedType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> FixedType(8)
+        FixedType(length=8)
+        >>> FixedType(8)==FixedType(8)
+        True
+    """
+
+    _instances: Dict[int, "FixedType"] = {}
 
+    def __new__(cls, length: int):
+        cls._instances[length] = cls._instances.get(length) or object.__new__(cls)
+        return cls._instances[length]
 
-class FixedType(Type):
     def __init__(self, length: int):
-        super().__init__(f"fixed[{length}]", f"FixedType(length={length})", is_primitive=True)
+        super().__init__(f"fixed[{length}]", f"FixedType(length={length})")
         self._length = length
 
     @property
     def length(self) -> int:
         return self._length
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.length == other.length
-        return False
 
+class DecimalType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> DecimalType(32, 3)
+        DecimalType(precision=32, scale=3)
+        >>> DecimalType(8, 3)==DecimalType(8, 3)
+        True
+    """
+
+    _instances: Dict[Tuple[int, int], "DecimalType"] = {}
+
+    def __new__(cls, precision: int, scale: int):
+        key = precision, scale

Review comment:
       I can see how that might be more explicit. Added




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r807055531



##########
File path: python/src/iceberg/types.py
##########
@@ -93,21 +121,37 @@ def precision(self) -> int:
     def scale(self) -> int:
         return self._scale
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.precision == other.precision and self.scale == other.scale
-        return False
 
+class NestedField(IcebergType):
+    """This represents a field of a struct, a map key, a map value, or a list element. This is where field IDs, names, docs, and nullability are tracked."""
+
+    _instances: Dict[Tuple[bool, int, str, IcebergType, Optional[str]], "NestedField"] = {}
+
+    def __new__(
+        cls,
+        is_optional: bool,

Review comment:
       The optional boolean is at the end for map and list types. What about adding this after `doc` and defaulting it to `True`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r807051760



##########
File path: python/src/iceberg/types.py
##########
@@ -135,75 +179,122 @@ def doc(self) -> Optional[str]:
         return self._doc
 
     @property
-    def type(self) -> Type:
+    def type(self) -> IcebergType:
         return self._type
 
-    def __repr__(self):
-        return (
-            f"NestedField(is_optional={self._is_optional}, field_id={self._id}, "
-            f"name={repr(self._name)}, field_type={repr(self._type)}, doc={repr(self._doc)})"
-        )
 
-    def __str__(self):
-        return (
-            f"{self._id}: {self._name}: {'optional' if self._is_optional else 'required'} {self._type}" ""
-            if self._doc is None
-            else f" ({self._doc})"
+class StructType(IcebergType):
+    """A struct type in Iceberg
+
+    Example:
+        >>> StructType(
+                NestedField(True, 1, "required_field", StringType()),
+                NestedField(False, 2, "optional_field", IntegerType())
         )
+    """
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return (
-                self.is_optional == other.is_optional
-                and self.field_id == other.field_id
-                and self.name == other.name
-                and self.doc == other.doc
-                and self.type == other.type
-            )
-        return False
+    _instances: Dict[Tuple[NestedField, ...], "StructType"] = {}
 
+    def __new__(cls, *fields: NestedField):
+        cls._instances[fields] = cls._instances.get(fields) or object.__new__(cls)
+        return cls._instances[fields]
 
-class StructType(Type):
-    def __init__(self, fields: list):
+    def __init__(self, *fields: NestedField):
         super().__init__(
             f"struct<{', '.join(map(str, fields))}>",
-            f"StructType(fields={repr(fields)})",
+            f"StructType{repr(fields)}",
         )
         self._fields = fields
 
     @property
-    def fields(self) -> list:
+    def fields(self) -> Tuple[NestedField, ...]:
         return self._fields
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.fields == other.fields
-        return False
 
+class ListType(IcebergType):
+    """A list type in Iceberg
+
+    Example:
+        >>> ListType(element_id=3, element_type=StringType(), element_is_optional=True)
+        ListType(element=NestedField(is_optional=True, field_id=3, name='element', field_type=StringType(), doc=None))
+    """
+
+    _instances: Dict[Tuple[bool, int, IcebergType], "ListType"] = {}
+
+    def __new__(
+        cls,
+        element_id: int,
+        element_type: IcebergType,
+        element_is_optional: bool,
+    ):
+        key = (element_is_optional, element_id, element_type)
+        cls._instances[key] = cls._instances.get(key) or object.__new__(cls)
+        return cls._instances[key]
 
-class ListType(Type):
-    def __init__(self, element: NestedField):
-        super().__init__(f"list<{element.type}>", f"ListType(element={repr(element)})")
-        self._element_field = element
+    def __init__(
+        self,
+        element_id: int,
+        element_type: IcebergType,
+        element_is_optional: bool,
+    ):
+        super().__init__(
+            f"list<{element_type}>",
+            f"ListType(element_is_optional={element_is_optional}, element_id={element_id}, "
+            f"element_type={repr(element_type)})",
+        )
+        self._element_field = NestedField(
+            name="element",
+            is_optional=element_is_optional,
+            field_id=element_id,
+            field_type=element_type,
+        )
 
     @property
     def element(self) -> NestedField:
         return self._element_field
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.element == other.element
-        return False
 
+class MapType(IcebergType):
+    """A map type in Iceberg
+
+    Example:
+        >>> MapType(key_id=1, key_type=StringType(), value_id=2, value_type=IntegerType(), value_is_optional=True)
+        MapType(key=NestedField(is_optional=False, field_id=1, name='key', field_type=StringType(), doc=None), value=NestedField(is_optional=True, field_id=2, name='value', field_type=IntegerType(), doc=None))

Review comment:
       Looks like this also needs to be wrapped to the line length.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796188097



##########
File path: python/src/iceberg/types.py
##########
@@ -206,19 +204,14 @@ class IntegerType(Type):
 
     min: int = -2147483648
 
-    def __init__(self):
-        super().__init__("int", "IntegerType()", is_primitive=True)
 
-
-class LongType(Type):
+class LongType(PrimitiveType):
     """A Long data type in Iceberg can be represented using an instance of this class. Longs in Iceberg are
     64-bit signed integers.
-
     Example:
         >>> column_foo = LongType()
         >>> isinstance(column_foo, LongType)
         True
-

Review comment:
       Looks like there are a lot of whitespace changes. We generally avoid these because they can cause git commit conflicts. I think we also use an autoformatter for python.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] samredai commented on pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
samredai commented on pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#issuecomment-1029142996


   I took a look at this test failure and it looks like it might be transient. I suspect rerunning the 3.8 test should succeed. That being said I noticed that `--diff` is not included so we don't see any details in the logs on which lines are causing the lint failure so I opened PR #4034 to add that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#issuecomment-1048051953


   Thanks, @CircArgs! Great to have this in. @jun-he, can you update the transforms PR? I think that one is next.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796200595



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):

Review comment:
       Also, I added some clarifying comments to the `__new__` in the latest commit. I think you already completely understand what's going on there though




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796196559



##########
File path: python/tests/test_types.py
##########
@@ -63,7 +63,7 @@ def test_repr_primitive_types(input_type):
 def test_fixed_type():
     type_var = FixedType(length=5)
     assert type_var.length == 5
-    assert str(type_var) == "fixed[5]"
+    assert str(type_var) == "fixed<5>"

Review comment:
       I must have misremembered then or gotten confused. Is it the case that `Decimal, Fixed` use square brackets and `List, Map, Struct` use angle brackets?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796560526



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):

Review comment:
       > For the decorator example, while the object created using FixedType will be singleton, but IIUC, the FixedType itself is a function not a class any more, right? So we cannot call class methods from it, etc. It is better we don't use decorator.
   
   Decorators return a type they decorate. In this case, the decorator would take the class and return the very same class (in contrast to most decorators of functions which usually act as a closure and return a wrapper function)
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796326867



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):

Review comment:
       I agree with @rdblue about it. It would be much simpler and easy to understand and maintain if we do that separately. I gave it a try in https://github.com/apache/iceberg/pull/3965 and seems quite straightforward.
   
   For the decorator example, while the object created using FixedType will be singleton, but IIUC, the FixedType itself is a function not a class any more, right? So we cannot call class methods from it, etc. It is better we don't use decorator.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796926369



##########
File path: python/src/iceberg/types.py
##########
@@ -48,27 +46,69 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
 
+class PrimitiveType(Type):
+    """Base class for all Iceberg Primitive Types"""
+
+    _implemented = {}  # type: ignore
+
+    def __new__(cls):

Review comment:
       What do you think the benefits of that might be?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r807048847



##########
File path: python/src/iceberg/types.py
##########
@@ -57,30 +57,65 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
+
+class PrimitiveType(IcebergType):
+    """Base class for all Iceberg Primitive Types"""
+
+    _instances = {}  # type: ignore
+
+    def __new__(cls):
+        cls._instances[cls] = cls._instances.get(cls) or object.__new__(cls)
+        return cls._instances[cls]
+
+
+class FixedType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> FixedType(8)
+        FixedType(length=8)
+        >>> FixedType(8)==FixedType(8)
+        True
+    """
+
+    _instances: Dict[int, "FixedType"] = {}
 
+    def __new__(cls, length: int):
+        cls._instances[length] = cls._instances.get(length) or object.__new__(cls)
+        return cls._instances[length]
 
-class FixedType(Type):
     def __init__(self, length: int):
-        super().__init__(f"fixed[{length}]", f"FixedType(length={length})", is_primitive=True)
+        super().__init__(f"fixed[{length}]", f"FixedType(length={length})")
         self._length = length
 
     @property
     def length(self) -> int:
         return self._length
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.length == other.length
-        return False
 
+class DecimalType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> DecimalType(32, 3)
+        DecimalType(precision=32, scale=3)
+        >>> DecimalType(8, 3)==DecimalType(8, 3)
+        True
+    """
+
+    _instances: Dict[Tuple[int, int], "DecimalType"] = {}
+
+    def __new__(cls, precision: int, scale: int):
+        key = precision, scale
+        cls._instances[key] = cls._instances.get(key) or object.__new__(cls)

Review comment:
       Okay, I understand. So Python is still going to call `__init__` after this method on the object that this returns.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#issuecomment-1029227432


   @rdblue Merged in those changes. I think it's in line with the discussion @jun-he and I were having in his PR. Singleton is as it was in his


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796187179



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:

Review comment:
       What's the motivation for renaming `Type` to `IcebergType`? Does `Type` conflict with something?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796870051



##########
File path: python/src/iceberg/types.py
##########
@@ -48,27 +46,69 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
 
+class PrimitiveType(Type):
+    """Base class for all Iceberg Primitive Types"""
+
+    _implemented = {}  # type: ignore
+
+    def __new__(cls):
+        if cls in cls._implemented:
+            return cls._implemented[cls]
+        else:
+            ret = object.__new__(cls)

Review comment:
       Do we need to keep a set here? Seems just `_instance = None` should be OK, right?

##########
File path: python/src/iceberg/types.py
##########
@@ -48,27 +46,69 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
 
+class PrimitiveType(Type):
+    """Base class for all Iceberg Primitive Types"""
+
+    _implemented = {}  # type: ignore
+
+    def __new__(cls):

Review comment:
       Wondering if it is better to move this logic into a Singleton class and any class can extend it to be a singleton.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796065073



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):
+        init_func = cls.__dict__.get("__init__")
+        names = init_func.__code__.co_varnames[1:] if init_func else []
+        defaults = init_func and init_func.__defaults__ or []
+        args = {**dict(zip(names[::-1], defaults[::-1])), **dict(zip(names, args))}
+        args = {**args, **kwargs}
+        args = tuple(
+            (k, tuple(args[k]) if isinstance(args[k], list) else args[k]) for k in names
+        )
+        key = (cls.__name__, args)
+        if key in cls._implemented:
+            return cls._implemented[key]
+        else:
+            ret = object.__new__(cls)
+            ret._args = args
+            cls._implemented[key] = ret
+            return ret
 
     def __repr__(self):
-        return self._repr_string
+        base = f"{type(self).__name__}("
+        if not self._args:
+            return base + ")"
+        for k, v in self._args:
+            base += k + "=" + repr(v) + ", "
+        return base[:-2] + ")"

Review comment:
       You can likely use `join` and a comprehension so you don't have to chop off the last two characters of the string.
   
   Untested, but maybe something like
   ```python
   ', '.join([f"k=repr(v)" for k, v in self._args])
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800231819



##########
File path: python/src/iceberg/types.py
##########
@@ -57,30 +57,65 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
+
+class PrimitiveType(IcebergType):
+    """Base class for all Iceberg Primitive Types"""
+
+    _instances = {}  # type: ignore
+
+    def __new__(cls):
+        cls._instances[cls] = cls._instances.get(cls) or object.__new__(cls)
+        return cls._instances[cls]
+
+
+class FixedType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> FixedType(8)
+        FixedType(length=8)
+        >>> FixedType(8)==FixedType(8)
+        True
+    """
+
+    _instances: Dict[int, "FixedType"] = {}
 
+    def __new__(cls, length: int):
+        cls._instances[length] = cls._instances.get(length) or object.__new__(cls)
+        return cls._instances[length]
 
-class FixedType(Type):
     def __init__(self, length: int):
-        super().__init__(f"fixed[{length}]", f"FixedType(length={length})", is_primitive=True)
+        super().__init__(f"fixed[{length}]", f"FixedType(length={length})")
         self._length = length
 
     @property
     def length(self) -> int:
         return self._length
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.length == other.length
-        return False
 
+class DecimalType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> DecimalType(32, 3)
+        DecimalType(precision=32, scale=3)
+        >>> DecimalType(8, 3)==DecimalType(8, 3)
+        True
+    """
+
+    _instances: Dict[Tuple[int, int], "DecimalType"] = {}
+
+    def __new__(cls, precision: int, scale: int):
+        key = precision, scale

Review comment:
       I think it would be more clear to have an explicit tuple here, but maybe I'm just not used to reading Python.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800233373



##########
File path: python/src/iceberg/types.py
##########
@@ -305,10 +364,10 @@ class DoubleType(Type, Singleton):
     """
 
     def __init__(self):
-        super().__init__("double", "DoubleType()", is_primitive=True)
+        super().__init__("double", "DoubleType()")
 
 
-class DateType(Type, Singleton):
+class DateType(IcebergType, Singleton):

Review comment:
       These all need to be `PrimitiveType` to make the `is_primitive` check work, right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800835256



##########
File path: python/src/iceberg/types.py
##########
@@ -135,53 +186,54 @@ def doc(self) -> Optional[str]:
         return self._doc
 
     @property
-    def type(self) -> Type:
+    def type(self) -> IcebergType:
         return self._type
 
-    def __repr__(self):
-        return (
-            f"NestedField(is_optional={self._is_optional}, field_id={self._id}, "
-            f"name={repr(self._name)}, field_type={repr(self._type)}, doc={repr(self._doc)})"
-        )
 
-    def __str__(self):
-        return (
-            f"{self._id}: {self._name}: {'optional' if self._is_optional else 'required'} {self._type}" ""
-            if self._doc is None
-            else f" ({self._doc})"
-        )
+class StructType(IcebergType):
+    """A struct type in Iceberg
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return (
-                self.is_optional == other.is_optional
-                and self.field_id == other.field_id
-                and self.name == other.name
-                and self.doc == other.doc
-                and self.type == other.type
-            )
-        return False
+    Example:
+        >>> StructType(
+            [
+                NestedField(True, 1, "required_field", StringType()),
+                NestedField(False, 2, "optional_field", IntegerType()),
+            ]
+    """
 
+    _instances: Dict[Tuple[NestedField, ...], "StructType"] = {}
 
-class StructType(Type):
-    def __init__(self, fields: list):
+    def __new__(cls, fields: List[NestedField]):
+        key = tuple(fields)
+        cls._instances[key] = cls._instances.get(key) or object.__new__(cls)
+        return cls._instances[key]
+
+    def __init__(self, fields: List[NestedField] = []):
         super().__init__(
             f"struct<{', '.join(map(str, fields))}>",
             f"StructType(fields={repr(fields)})",
         )
         self._fields = fields
 
     @property
-    def fields(self) -> list:
+    def fields(self) -> List[NestedField]:

Review comment:
       Yeah returning the tuple now came with changing to `*fields`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800927129



##########
File path: python/tests/test_types.py
##########
@@ -101,9 +101,7 @@ def test_struct_type():
     assert len(type_var.fields) == 3
     assert str(type_var) == str(eval(repr(type_var)))
     assert type_var == eval(repr(type_var))
-    assert type_var != StructType(
-        [NestedField(True, 1, "optional_field", IntegerType())]
-    )
+    assert type_var != StructType([NestedField(True, 1, "optional_field", IntegerType())])

Review comment:
       added a test which tests all the types




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] samredai commented on pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
samredai commented on pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#issuecomment-1047383065


   I [commented](https://github.com/apache/iceberg/pull/4174#issuecomment-1047381823) in the other PR and I see those commits were added here, this LGTM. Thanks @CircArgs and @rdblue!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800839125



##########
File path: python/src/iceberg/types.py
##########
@@ -305,10 +364,10 @@ class DoubleType(Type, Singleton):
     """
 
     def __init__(self):
-        super().__init__("double", "DoubleType()", is_primitive=True)
+        super().__init__("double", "DoubleType()")
 
 
-class DateType(Type, Singleton):
+class DateType(IcebergType, Singleton):

Review comment:
       Yeah I messed it up when mergins in Jun's

##########
File path: python/src/iceberg/types.py
##########
@@ -305,10 +364,10 @@ class DoubleType(Type, Singleton):
     """
 
     def __init__(self):
-        super().__init__("double", "DoubleType()", is_primitive=True)
+        super().__init__("double", "DoubleType()")
 
 
-class DateType(Type, Singleton):
+class DateType(IcebergType, Singleton):

Review comment:
       Yeah I messed it up when mergin in Jun's

##########
File path: python/src/iceberg/types.py
##########
@@ -305,10 +364,10 @@ class DoubleType(Type, Singleton):
     """
 
     def __init__(self):
-        super().__init__("double", "DoubleType()", is_primitive=True)
+        super().__init__("double", "DoubleType()")
 
 
-class DateType(Type, Singleton):
+class DateType(IcebergType, Singleton):

Review comment:
       Yeah I messed it up when merging in Jun's




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800232808



##########
File path: python/src/iceberg/types.py
##########
@@ -135,53 +186,54 @@ def doc(self) -> Optional[str]:
         return self._doc
 
     @property
-    def type(self) -> Type:
+    def type(self) -> IcebergType:
         return self._type
 
-    def __repr__(self):
-        return (
-            f"NestedField(is_optional={self._is_optional}, field_id={self._id}, "
-            f"name={repr(self._name)}, field_type={repr(self._type)}, doc={repr(self._doc)})"
-        )
 
-    def __str__(self):
-        return (
-            f"{self._id}: {self._name}: {'optional' if self._is_optional else 'required'} {self._type}" ""
-            if self._doc is None
-            else f" ({self._doc})"
-        )
+class StructType(IcebergType):
+    """A struct type in Iceberg
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return (
-                self.is_optional == other.is_optional
-                and self.field_id == other.field_id
-                and self.name == other.name
-                and self.doc == other.doc
-                and self.type == other.type
-            )
-        return False
+    Example:
+        >>> StructType(
+            [
+                NestedField(True, 1, "required_field", StringType()),
+                NestedField(False, 2, "optional_field", IntegerType()),
+            ]
+    """
 
+    _instances: Dict[Tuple[NestedField, ...], "StructType"] = {}
 
-class StructType(Type):
-    def __init__(self, fields: list):
+    def __new__(cls, fields: List[NestedField]):
+        key = tuple(fields)
+        cls._instances[key] = cls._instances.get(key) or object.__new__(cls)
+        return cls._instances[key]
+
+    def __init__(self, fields: List[NestedField] = []):
         super().__init__(
             f"struct<{', '.join(map(str, fields))}>",
             f"StructType(fields={repr(fields)})",
         )
         self._fields = fields
 
     @property
-    def fields(self) -> list:
+    def fields(self) -> List[NestedField]:

Review comment:
       Should we store and return a Tuple here so that these are unmodifiable?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800232359



##########
File path: python/src/iceberg/types.py
##########
@@ -93,21 +128,37 @@ def precision(self) -> int:
     def scale(self) -> int:
         return self._scale
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.precision == other.precision and self.scale == other.scale
-        return False
 
+class NestedField(IcebergType):
+    """equivalent of `NestedField` type from Java implementation"""

Review comment:
       Doc strings should use normal sentence case, so this should start with a capital letter.
   
   Also, I don't think that we want to refer to the Java implementation. This represents a field of a struct, a map key, a map value, or a list element. This is where field IDs are held.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800231707



##########
File path: python/src/iceberg/types.py
##########
@@ -15,7 +15,6 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-

Review comment:
       Nit: would be nice to leave this newline.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796194881



##########
File path: python/src/iceberg/types.py
##########
@@ -206,19 +204,14 @@ class IntegerType(Type):
 
     min: int = -2147483648
 
-    def __init__(self):
-        super().__init__("int", "IntegerType()", is_primitive=True)
 
-
-class LongType(Type):
+class LongType(PrimitiveType):
     """A Long data type in Iceberg can be represented using an instance of this class. Longs in Iceberg are
     64-bit signed integers.
-
     Example:
         >>> column_foo = LongType()
         >>> isinstance(column_foo, LongType)
         True
-

Review comment:
       hmm I did run the tox formatter before committing `tox -e format` maybe that case was that it was not run on `types.py` previously? (which I think would have been my fault still...)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796199504



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):
+        init_func = cls.__dict__.get("__init__")
+        names = init_func.__code__.co_varnames[1:] if init_func else []
+        defaults = init_func and init_func.__defaults__ or []
+        args = {**dict(zip(names[::-1], defaults[::-1])), **dict(zip(names, args))}
+        args = {**args, **kwargs}
+        args = tuple(
+            (k, tuple(args[k]) if isinstance(args[k], list) else args[k]) for k in names
+        )
+        key = (cls.__name__, args)
+        if key in cls._implemented:
+            return cls._implemented[key]
+        else:
+            ret = object.__new__(cls)
+            ret._args = args
+            cls._implemented[key] = ret
+            return ret
 
     def __repr__(self):
-        return self._repr_string
+        base = f"{type(self).__name__}("

Review comment:
       just to explain how this change started: to get `__new__` to work I back up to `object`'s `__new__` and can't feed it any arguments so using the `__init__` for this as it was I don't think is so easy to get working, so I can change it back especially with how your comment about me changing `fixed[8]->fixed<8>` is shaking out




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r807058828



##########
File path: python/src/iceberg/types.py
##########
@@ -93,21 +121,37 @@ def precision(self) -> int:
     def scale(self) -> int:
         return self._scale
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.precision == other.precision and self.scale == other.scale
-        return False
 
+class NestedField(IcebergType):
+    """This represents a field of a struct, a map key, a map value, or a list element. This is where field IDs, names, docs, and nullability are tracked."""
+
+    _instances: Dict[Tuple[bool, int, str, IcebergType, Optional[str]], "NestedField"] = {}
+
+    def __new__(
+        cls,
+        is_optional: bool,
+        field_id: int,
+        name: str,
+        field_type: IcebergType,
+        doc: Optional[str] = None,
+    ):
+        key = (is_optional, field_id, name, field_type, doc)
+        cls._instances[key] = cls._instances.get(key) or object.__new__(cls)
+        return cls._instances[key]
 
-class NestedField:
     def __init__(
         self,
         is_optional: bool,
         field_id: int,
         name: str,
-        field_type: Type,
+        field_type: IcebergType,
         doc: Optional[str] = None,
     ):
+        super().__init__(
+            (f"{field_id}: {name}: {'optional' if is_optional else 'required'} {field_type}" "" if doc is None else f" ({doc})"),
+            f"NestedField(is_optional={is_optional}, field_id={field_id}, "
+            f"name={repr(name)}, field_type={repr(field_type)}, doc={repr(doc)})",

Review comment:
       Can we omit doc if it is None? In the repr string, I see `doc=None`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796611528



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:

Review comment:
       Definitely not required, but I felt `IcebergType` was clearer and as you say 
   
   > this class is internal and won't be used extensively
   
   
   
   so to me that would mean making the change for the sake of clarity and avoiding the unnecessary collision. Would this base type be touched at all even or is it just for consolidating logic?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796842664



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:

Review comment:
       I'm good either way. Up to you, but now that I understand the motivation I'd probably use `IcebergType` as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796627972



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:

Review comment:
       I switched back to just `Type`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796876298



##########
File path: python/src/iceberg/types.py
##########
@@ -82,7 +122,23 @@ def scale(self) -> int:
         return self._scale
 
 
-class NestedField(object):
+class NestedField(Type):
+    """equivalent of `NestedField` type from Java implementation"""
+
+    _implemented: Dict[Tuple[bool, int, str, Type, Optional[str]], "NestedField"] = {}
+
+    def __new__(
+        cls,
+        is_optional: bool,
+        field_id: int,
+        name: str,
+        field_type: Type,
+        doc: Optional[str] = None,
+    ):
+        key = is_optional, field_id, name, field_type, doc
+        cls._implemented[key] = cls._implemented.get(key, object.__new__(cls))

Review comment:
       In this line, will it always create a new object, `object.__new__(cls)` and pass it to get method call?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796630205



##########
File path: python/src/iceberg/types.py
##########
@@ -206,19 +204,14 @@ class IntegerType(Type):
 
     min: int = -2147483648
 
-    def __init__(self):
-        super().__init__("int", "IntegerType()", is_primitive=True)
 
-
-class LongType(Type):
+class LongType(PrimitiveType):
     """A Long data type in Iceberg can be represented using an instance of this class. Longs in Iceberg are
     64-bit signed integers.
-
     Example:
         >>> column_foo = LongType()
         >>> isinstance(column_foo, LongType)
         True
-

Review comment:
       Now I see I think I added the whitespace in the docstrings. They seem much clearer with a line in between the descriptions and examples. Would you like me to get rid of them?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r807048847



##########
File path: python/src/iceberg/types.py
##########
@@ -57,30 +57,65 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
+
+class PrimitiveType(IcebergType):
+    """Base class for all Iceberg Primitive Types"""
+
+    _instances = {}  # type: ignore
+
+    def __new__(cls):
+        cls._instances[cls] = cls._instances.get(cls) or object.__new__(cls)
+        return cls._instances[cls]
+
+
+class FixedType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> FixedType(8)
+        FixedType(length=8)
+        >>> FixedType(8)==FixedType(8)
+        True
+    """
+
+    _instances: Dict[int, "FixedType"] = {}
 
+    def __new__(cls, length: int):
+        cls._instances[length] = cls._instances.get(length) or object.__new__(cls)
+        return cls._instances[length]
 
-class FixedType(Type):
     def __init__(self, length: int):
-        super().__init__(f"fixed[{length}]", f"FixedType(length={length})", is_primitive=True)
+        super().__init__(f"fixed[{length}]", f"FixedType(length={length})")
         self._length = length
 
     @property
     def length(self) -> int:
         return self._length
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.length == other.length
-        return False
 
+class DecimalType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> DecimalType(32, 3)
+        DecimalType(precision=32, scale=3)
+        >>> DecimalType(8, 3)==DecimalType(8, 3)
+        True
+    """
+
+    _instances: Dict[Tuple[int, int], "DecimalType"] = {}
+
+    def __new__(cls, precision: int, scale: int):
+        key = precision, scale
+        cls._instances[key] = cls._instances.get(key) or object.__new__(cls)

Review comment:
       Okay, I understand. So Python is still going to call `__init__` after this method on the object that this returns?
   
   Is it a concern that `__init__` is called every time this is returned?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r807058055



##########
File path: python/src/iceberg/types.py
##########
@@ -135,75 +179,122 @@ def doc(self) -> Optional[str]:
         return self._doc
 
     @property
-    def type(self) -> Type:
+    def type(self) -> IcebergType:
         return self._type
 
-    def __repr__(self):
-        return (
-            f"NestedField(is_optional={self._is_optional}, field_id={self._id}, "
-            f"name={repr(self._name)}, field_type={repr(self._type)}, doc={repr(self._doc)})"
-        )
 
-    def __str__(self):
-        return (
-            f"{self._id}: {self._name}: {'optional' if self._is_optional else 'required'} {self._type}" ""
-            if self._doc is None
-            else f" ({self._doc})"
+class StructType(IcebergType):
+    """A struct type in Iceberg
+
+    Example:
+        >>> StructType(
+                NestedField(True, 1, "required_field", StringType()),
+                NestedField(False, 2, "optional_field", IntegerType())
         )
+    """
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return (
-                self.is_optional == other.is_optional
-                and self.field_id == other.field_id
-                and self.name == other.name
-                and self.doc == other.doc
-                and self.type == other.type
-            )
-        return False
+    _instances: Dict[Tuple[NestedField, ...], "StructType"] = {}
 
+    def __new__(cls, *fields: NestedField):
+        cls._instances[fields] = cls._instances.get(fields) or object.__new__(cls)
+        return cls._instances[fields]
 
-class StructType(Type):
-    def __init__(self, fields: list):
+    def __init__(self, *fields: NestedField):
         super().__init__(
             f"struct<{', '.join(map(str, fields))}>",
-            f"StructType(fields={repr(fields)})",
+            f"StructType{repr(fields)}",
         )
         self._fields = fields
 
     @property
-    def fields(self) -> list:
+    def fields(self) -> Tuple[NestedField, ...]:
         return self._fields
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.fields == other.fields
-        return False
 
+class ListType(IcebergType):
+    """A list type in Iceberg
+
+    Example:
+        >>> ListType(element_id=3, element_type=StringType(), element_is_optional=True)
+        ListType(element=NestedField(is_optional=True, field_id=3, name='element', field_type=StringType(), doc=None))
+    """
+
+    _instances: Dict[Tuple[bool, int, IcebergType], "ListType"] = {}
+
+    def __new__(
+        cls,
+        element_id: int,
+        element_type: IcebergType,
+        element_is_optional: bool,

Review comment:
       Can we default element_is_optional to True? Same with value_is_optional for maps.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r807057475



##########
File path: python/src/iceberg/types.py
##########
@@ -93,21 +121,37 @@ def precision(self) -> int:
     def scale(self) -> int:
         return self._scale
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.precision == other.precision and self.scale == other.scale
-        return False
 
+class NestedField(IcebergType):
+    """This represents a field of a struct, a map key, a map value, or a list element. This is where field IDs, names, docs, and nullability are tracked."""
+
+    _instances: Dict[Tuple[bool, int, str, IcebergType, Optional[str]], "NestedField"] = {}
+
+    def __new__(

Review comment:
       Testing this out in ipython, I missed a couple args and got this error message:
   
   ```
   __new__() missing 2 required positional arguments: 'name' and 'field_type'
   ```
   
   Is there a way to improve that? Or is that helpful enough?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796627657



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):

Review comment:
       I ended up implementing `__new__` separately for all types that have some parameters and once for those that do not as the decorator would still have needed some resolving of arguments as it was in the single `__new__`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796844790



##########
File path: python/tests/test_types.py
##########
@@ -63,7 +63,7 @@ def test_repr_primitive_types(input_type):
 def test_fixed_type():
     type_var = FixedType(length=5)
     assert type_var.length == 5
-    assert str(type_var) == "fixed[5]"
+    assert str(type_var) == "fixed<5>"

Review comment:
       There's a difference between the nested types that use `<>` and the primitive types that use `[]` or `()`. I think the inconsistency between fixed and decimal is probably my fault, but we can't really change it now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796187765



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):
+        init_func = cls.__dict__.get("__init__")
+        names = init_func.__code__.co_varnames[1:] if init_func else []
+        defaults = init_func and init_func.__defaults__ or []
+        args = {**dict(zip(names[::-1], defaults[::-1])), **dict(zip(names, args))}
+        args = {**args, **kwargs}
+        args = tuple(
+            (k, tuple(args[k]) if isinstance(args[k], list) else args[k]) for k in names
+        )
+        key = (cls.__name__, args)
+        if key in cls._implemented:
+            return cls._implemented[key]
+        else:
+            ret = object.__new__(cls)
+            ret._args = args
+            cls._implemented[key] = ret
+            return ret
 
     def __repr__(self):
-        return self._repr_string
+        base = f"{type(self).__name__}("
+        if not self._args:
+            return base + ")"
+        for k, v in self._args:
+            base += k + "=" + repr(v) + ", "
+        return base[:-2] + ")"
 
     def __str__(self):
-        return self._type_string
+        base = f"{type(self).__name__.lower().replace('type', '')}<"
+        if not self._args:
+            return base[:-1]
+        for k, v in self._args:
+            base += (
+                str(v)
+                if type(v) != tuple
+                else "[" + ", ".join([str(vv) for vv in v]) + "]"
+            ) + ", "
+        return base[:-2] + ">"
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)

Review comment:
       This is a good idea rather than passing `_is_primitive`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r807048847



##########
File path: python/src/iceberg/types.py
##########
@@ -57,30 +57,65 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
+
+class PrimitiveType(IcebergType):
+    """Base class for all Iceberg Primitive Types"""
+
+    _instances = {}  # type: ignore
+
+    def __new__(cls):
+        cls._instances[cls] = cls._instances.get(cls) or object.__new__(cls)
+        return cls._instances[cls]
+
+
+class FixedType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> FixedType(8)
+        FixedType(length=8)
+        >>> FixedType(8)==FixedType(8)
+        True
+    """
+
+    _instances: Dict[int, "FixedType"] = {}
 
+    def __new__(cls, length: int):
+        cls._instances[length] = cls._instances.get(length) or object.__new__(cls)
+        return cls._instances[length]
 
-class FixedType(Type):
     def __init__(self, length: int):
-        super().__init__(f"fixed[{length}]", f"FixedType(length={length})", is_primitive=True)
+        super().__init__(f"fixed[{length}]", f"FixedType(length={length})")
         self._length = length
 
     @property
     def length(self) -> int:
         return self._length
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.length == other.length
-        return False
 
+class DecimalType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> DecimalType(32, 3)
+        DecimalType(precision=32, scale=3)
+        >>> DecimalType(8, 3)==DecimalType(8, 3)
+        True
+    """
+
+    _instances: Dict[Tuple[int, int], "DecimalType"] = {}
+
+    def __new__(cls, precision: int, scale: int):
+        key = precision, scale
+        cls._instances[key] = cls._instances.get(key) or object.__new__(cls)

Review comment:
       Okay, I understand. So Python is still going to call `__init__` after this method on the object that this returns.

##########
File path: python/src/iceberg/types.py
##########
@@ -57,30 +57,65 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
+
+class PrimitiveType(IcebergType):
+    """Base class for all Iceberg Primitive Types"""
+
+    _instances = {}  # type: ignore
+
+    def __new__(cls):
+        cls._instances[cls] = cls._instances.get(cls) or object.__new__(cls)
+        return cls._instances[cls]
+
+
+class FixedType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> FixedType(8)
+        FixedType(length=8)
+        >>> FixedType(8)==FixedType(8)
+        True
+    """
+
+    _instances: Dict[int, "FixedType"] = {}
 
+    def __new__(cls, length: int):
+        cls._instances[length] = cls._instances.get(length) or object.__new__(cls)
+        return cls._instances[length]
 
-class FixedType(Type):
     def __init__(self, length: int):
-        super().__init__(f"fixed[{length}]", f"FixedType(length={length})", is_primitive=True)
+        super().__init__(f"fixed[{length}]", f"FixedType(length={length})")
         self._length = length
 
     @property
     def length(self) -> int:
         return self._length
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.length == other.length
-        return False
 
+class DecimalType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> DecimalType(32, 3)
+        DecimalType(precision=32, scale=3)
+        >>> DecimalType(8, 3)==DecimalType(8, 3)
+        True
+    """
+
+    _instances: Dict[Tuple[int, int], "DecimalType"] = {}
+
+    def __new__(cls, precision: int, scale: int):
+        key = precision, scale
+        cls._instances[key] = cls._instances.get(key) or object.__new__(cls)

Review comment:
       Okay, I understand. So Python is still going to call `__init__` after this method on the object that this returns?
   
   Is it a concern that `__init__` is called every time this is returned?

##########
File path: python/src/iceberg/types.py
##########
@@ -93,21 +121,37 @@ def precision(self) -> int:
     def scale(self) -> int:
         return self._scale
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.precision == other.precision and self.scale == other.scale
-        return False
 
+class NestedField(IcebergType):
+    """This represents a field of a struct, a map key, a map value, or a list element. This is where field IDs, names, docs, and nullability are tracked."""

Review comment:
       This doc string should be wrapped at 130 characters, right?

##########
File path: python/src/iceberg/types.py
##########
@@ -135,75 +179,122 @@ def doc(self) -> Optional[str]:
         return self._doc
 
     @property
-    def type(self) -> Type:
+    def type(self) -> IcebergType:
         return self._type
 
-    def __repr__(self):
-        return (
-            f"NestedField(is_optional={self._is_optional}, field_id={self._id}, "
-            f"name={repr(self._name)}, field_type={repr(self._type)}, doc={repr(self._doc)})"
-        )
 
-    def __str__(self):
-        return (
-            f"{self._id}: {self._name}: {'optional' if self._is_optional else 'required'} {self._type}" ""
-            if self._doc is None
-            else f" ({self._doc})"
+class StructType(IcebergType):
+    """A struct type in Iceberg
+
+    Example:
+        >>> StructType(
+                NestedField(True, 1, "required_field", StringType()),
+                NestedField(False, 2, "optional_field", IntegerType())
         )
+    """
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return (
-                self.is_optional == other.is_optional
-                and self.field_id == other.field_id
-                and self.name == other.name
-                and self.doc == other.doc
-                and self.type == other.type
-            )
-        return False
+    _instances: Dict[Tuple[NestedField, ...], "StructType"] = {}
 
+    def __new__(cls, *fields: NestedField):
+        cls._instances[fields] = cls._instances.get(fields) or object.__new__(cls)
+        return cls._instances[fields]
 
-class StructType(Type):
-    def __init__(self, fields: list):
+    def __init__(self, *fields: NestedField):
         super().__init__(
             f"struct<{', '.join(map(str, fields))}>",
-            f"StructType(fields={repr(fields)})",
+            f"StructType{repr(fields)}",
         )
         self._fields = fields
 
     @property
-    def fields(self) -> list:
+    def fields(self) -> Tuple[NestedField, ...]:
         return self._fields
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.fields == other.fields
-        return False
 
+class ListType(IcebergType):
+    """A list type in Iceberg
+
+    Example:
+        >>> ListType(element_id=3, element_type=StringType(), element_is_optional=True)
+        ListType(element=NestedField(is_optional=True, field_id=3, name='element', field_type=StringType(), doc=None))
+    """
+
+    _instances: Dict[Tuple[bool, int, IcebergType], "ListType"] = {}
+
+    def __new__(
+        cls,
+        element_id: int,
+        element_type: IcebergType,
+        element_is_optional: bool,
+    ):
+        key = (element_is_optional, element_id, element_type)
+        cls._instances[key] = cls._instances.get(key) or object.__new__(cls)
+        return cls._instances[key]
 
-class ListType(Type):
-    def __init__(self, element: NestedField):
-        super().__init__(f"list<{element.type}>", f"ListType(element={repr(element)})")
-        self._element_field = element
+    def __init__(
+        self,
+        element_id: int,
+        element_type: IcebergType,
+        element_is_optional: bool,
+    ):
+        super().__init__(
+            f"list<{element_type}>",
+            f"ListType(element_is_optional={element_is_optional}, element_id={element_id}, "
+            f"element_type={repr(element_type)})",
+        )
+        self._element_field = NestedField(
+            name="element",
+            is_optional=element_is_optional,
+            field_id=element_id,
+            field_type=element_type,
+        )
 
     @property
     def element(self) -> NestedField:
         return self._element_field
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.element == other.element
-        return False
 
+class MapType(IcebergType):
+    """A map type in Iceberg
+
+    Example:
+        >>> MapType(key_id=1, key_type=StringType(), value_id=2, value_type=IntegerType(), value_is_optional=True)
+        MapType(key=NestedField(is_optional=False, field_id=1, name='key', field_type=StringType(), doc=None), value=NestedField(is_optional=True, field_id=2, name='value', field_type=IntegerType(), doc=None))

Review comment:
       Looks like this also needs to be wrapped to the line length.

##########
File path: python/src/iceberg/types.py
##########
@@ -135,75 +179,122 @@ def doc(self) -> Optional[str]:
         return self._doc
 
     @property
-    def type(self) -> Type:
+    def type(self) -> IcebergType:
         return self._type
 
-    def __repr__(self):
-        return (
-            f"NestedField(is_optional={self._is_optional}, field_id={self._id}, "
-            f"name={repr(self._name)}, field_type={repr(self._type)}, doc={repr(self._doc)})"
-        )
 
-    def __str__(self):
-        return (
-            f"{self._id}: {self._name}: {'optional' if self._is_optional else 'required'} {self._type}" ""
-            if self._doc is None
-            else f" ({self._doc})"
+class StructType(IcebergType):
+    """A struct type in Iceberg
+
+    Example:
+        >>> StructType(
+                NestedField(True, 1, "required_field", StringType()),
+                NestedField(False, 2, "optional_field", IntegerType())
         )
+    """
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return (
-                self.is_optional == other.is_optional
-                and self.field_id == other.field_id
-                and self.name == other.name
-                and self.doc == other.doc
-                and self.type == other.type
-            )
-        return False
+    _instances: Dict[Tuple[NestedField, ...], "StructType"] = {}
 
+    def __new__(cls, *fields: NestedField):
+        cls._instances[fields] = cls._instances.get(fields) or object.__new__(cls)
+        return cls._instances[fields]
 
-class StructType(Type):
-    def __init__(self, fields: list):
+    def __init__(self, *fields: NestedField):
         super().__init__(
             f"struct<{', '.join(map(str, fields))}>",
-            f"StructType(fields={repr(fields)})",
+            f"StructType{repr(fields)}",
         )
         self._fields = fields
 
     @property
-    def fields(self) -> list:
+    def fields(self) -> Tuple[NestedField, ...]:
         return self._fields
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.fields == other.fields
-        return False
 
+class ListType(IcebergType):
+    """A list type in Iceberg
+
+    Example:
+        >>> ListType(element_id=3, element_type=StringType(), element_is_optional=True)
+        ListType(element=NestedField(is_optional=True, field_id=3, name='element', field_type=StringType(), doc=None))
+    """
+
+    _instances: Dict[Tuple[bool, int, IcebergType], "ListType"] = {}
+
+    def __new__(
+        cls,
+        element_id: int,
+        element_type: IcebergType,
+        element_is_optional: bool,
+    ):
+        key = (element_is_optional, element_id, element_type)
+        cls._instances[key] = cls._instances.get(key) or object.__new__(cls)
+        return cls._instances[key]
 
-class ListType(Type):
-    def __init__(self, element: NestedField):
-        super().__init__(f"list<{element.type}>", f"ListType(element={repr(element)})")
-        self._element_field = element
+    def __init__(
+        self,
+        element_id: int,
+        element_type: IcebergType,
+        element_is_optional: bool,
+    ):
+        super().__init__(
+            f"list<{element_type}>",
+            f"ListType(element_is_optional={element_is_optional}, element_id={element_id}, "

Review comment:
       Minor: It would be nice if the argument order were consistent. This is slightly misleading because `element_is_optional` is last if you're passing by position.

##########
File path: python/src/iceberg/types.py
##########
@@ -93,21 +121,37 @@ def precision(self) -> int:
     def scale(self) -> int:
         return self._scale
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.precision == other.precision and self.scale == other.scale
-        return False
 
+class NestedField(IcebergType):
+    """This represents a field of a struct, a map key, a map value, or a list element. This is where field IDs, names, docs, and nullability are tracked."""
+
+    _instances: Dict[Tuple[bool, int, str, IcebergType, Optional[str]], "NestedField"] = {}
+
+    def __new__(
+        cls,
+        is_optional: bool,

Review comment:
       The optional boolean is at the end for map and list types. What about adding this after `doc` and defaulting it to `True`?

##########
File path: python/src/iceberg/types.py
##########
@@ -93,21 +121,37 @@ def precision(self) -> int:
     def scale(self) -> int:
         return self._scale
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.precision == other.precision and self.scale == other.scale
-        return False
 
+class NestedField(IcebergType):
+    """This represents a field of a struct, a map key, a map value, or a list element. This is where field IDs, names, docs, and nullability are tracked."""
+
+    _instances: Dict[Tuple[bool, int, str, IcebergType, Optional[str]], "NestedField"] = {}
+
+    def __new__(

Review comment:
       Testing this out in ipython, I missed a couple args and got this error message:
   
   ```
   __new__() missing 2 required positional arguments: 'name' and 'field_type'
   ```
   
   Is there a way to improve that? Or is that helpful enough?

##########
File path: python/src/iceberg/types.py
##########
@@ -135,75 +179,122 @@ def doc(self) -> Optional[str]:
         return self._doc
 
     @property
-    def type(self) -> Type:
+    def type(self) -> IcebergType:
         return self._type
 
-    def __repr__(self):
-        return (
-            f"NestedField(is_optional={self._is_optional}, field_id={self._id}, "
-            f"name={repr(self._name)}, field_type={repr(self._type)}, doc={repr(self._doc)})"
-        )
 
-    def __str__(self):
-        return (
-            f"{self._id}: {self._name}: {'optional' if self._is_optional else 'required'} {self._type}" ""
-            if self._doc is None
-            else f" ({self._doc})"
+class StructType(IcebergType):
+    """A struct type in Iceberg
+
+    Example:
+        >>> StructType(
+                NestedField(True, 1, "required_field", StringType()),
+                NestedField(False, 2, "optional_field", IntegerType())
         )
+    """
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return (
-                self.is_optional == other.is_optional
-                and self.field_id == other.field_id
-                and self.name == other.name
-                and self.doc == other.doc
-                and self.type == other.type
-            )
-        return False
+    _instances: Dict[Tuple[NestedField, ...], "StructType"] = {}
 
+    def __new__(cls, *fields: NestedField):
+        cls._instances[fields] = cls._instances.get(fields) or object.__new__(cls)
+        return cls._instances[fields]
 
-class StructType(Type):
-    def __init__(self, fields: list):
+    def __init__(self, *fields: NestedField):
         super().__init__(
             f"struct<{', '.join(map(str, fields))}>",
-            f"StructType(fields={repr(fields)})",
+            f"StructType{repr(fields)}",
         )
         self._fields = fields
 
     @property
-    def fields(self) -> list:
+    def fields(self) -> Tuple[NestedField, ...]:
         return self._fields
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.fields == other.fields
-        return False
 
+class ListType(IcebergType):
+    """A list type in Iceberg
+
+    Example:
+        >>> ListType(element_id=3, element_type=StringType(), element_is_optional=True)
+        ListType(element=NestedField(is_optional=True, field_id=3, name='element', field_type=StringType(), doc=None))
+    """
+
+    _instances: Dict[Tuple[bool, int, IcebergType], "ListType"] = {}
+
+    def __new__(
+        cls,
+        element_id: int,
+        element_type: IcebergType,
+        element_is_optional: bool,

Review comment:
       Can we default element_is_optional to True? Same with value_is_optional for maps.

##########
File path: python/src/iceberg/types.py
##########
@@ -93,21 +121,37 @@ def precision(self) -> int:
     def scale(self) -> int:
         return self._scale
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.precision == other.precision and self.scale == other.scale
-        return False
 
+class NestedField(IcebergType):
+    """This represents a field of a struct, a map key, a map value, or a list element. This is where field IDs, names, docs, and nullability are tracked."""
+
+    _instances: Dict[Tuple[bool, int, str, IcebergType, Optional[str]], "NestedField"] = {}
+
+    def __new__(
+        cls,
+        is_optional: bool,
+        field_id: int,
+        name: str,
+        field_type: IcebergType,
+        doc: Optional[str] = None,
+    ):
+        key = (is_optional, field_id, name, field_type, doc)
+        cls._instances[key] = cls._instances.get(key) or object.__new__(cls)
+        return cls._instances[key]
 
-class NestedField:
     def __init__(
         self,
         is_optional: bool,
         field_id: int,
         name: str,
-        field_type: Type,
+        field_type: IcebergType,
         doc: Optional[str] = None,
     ):
+        super().__init__(
+            (f"{field_id}: {name}: {'optional' if is_optional else 'required'} {field_type}" "" if doc is None else f" ({doc})"),
+            f"NestedField(is_optional={is_optional}, field_id={field_id}, "
+            f"name={repr(name)}, field_type={repr(field_type)}, doc={repr(doc)})",

Review comment:
       Can we omit doc if it is None? In the repr string, I see `doc=None`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800232170



##########
File path: python/src/iceberg/types.py
##########
@@ -57,30 +57,65 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
+
+class PrimitiveType(IcebergType):
+    """Base class for all Iceberg Primitive Types"""
+
+    _instances = {}  # type: ignore
+
+    def __new__(cls):
+        cls._instances[cls] = cls._instances.get(cls) or object.__new__(cls)
+        return cls._instances[cls]
+
+
+class FixedType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> FixedType(8)
+        FixedType(length=8)
+        >>> FixedType(8)==FixedType(8)
+        True
+    """
+
+    _instances: Dict[int, "FixedType"] = {}
 
+    def __new__(cls, length: int):
+        cls._instances[length] = cls._instances.get(length) or object.__new__(cls)
+        return cls._instances[length]
 
-class FixedType(Type):
     def __init__(self, length: int):
-        super().__init__(f"fixed[{length}]", f"FixedType(length={length})", is_primitive=True)
+        super().__init__(f"fixed[{length}]", f"FixedType(length={length})")
         self._length = length
 
     @property
     def length(self) -> int:
         return self._length
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.length == other.length
-        return False
 
+class DecimalType(PrimitiveType):
+    """A fixed data type in Iceberg.
+
+    Example:
+        >>> DecimalType(32, 3)
+        DecimalType(precision=32, scale=3)
+        >>> DecimalType(8, 3)==DecimalType(8, 3)
+        True
+    """
+
+    _instances: Dict[Tuple[int, int], "DecimalType"] = {}
+
+    def __new__(cls, precision: int, scale: int):
+        key = precision, scale
+        cls._instances[key] = cls._instances.get(key) or object.__new__(cls)

Review comment:
       Doesn't this need to pass precision and scale in for `__init__`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800233058



##########
File path: python/src/iceberg/types.py
##########
@@ -190,13 +242,25 @@ def __init__(self, element: NestedField):
     def element(self) -> NestedField:
         return self._element_field
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.element == other.element
-        return False
 
+class MapType(IcebergType):
+    """A map type in Iceberg
+
+    Example:
+        >>> MapType(
+                NestedField(True, 1, "required_field", StringType()),

Review comment:
       Map keys are not allowed to be null. If we are creating a map by passing two `NestedField` instances, then we should check that. Same thing with names. Map key fields must be named "key" and map value fields must be named "value".
   
   This is why the Java API passes the types and field IDs instead of accepting nested fields:
   
   ```java
   MapType stringToIntMap = MapType.ofRequired(1, 2, StringType.get(), IntegerType.get());
   // Also defines `ofOptional`
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800233479



##########
File path: python/tests/test_types.py
##########
@@ -101,9 +101,7 @@ def test_struct_type():
     assert len(type_var.fields) == 3
     assert str(type_var) == str(eval(repr(type_var)))
     assert type_var == eval(repr(type_var))
-    assert type_var != StructType(
-        [NestedField(True, 1, "optional_field", IntegerType())]
-    )
+    assert type_var != StructType([NestedField(True, 1, "optional_field", IntegerType())])

Review comment:
       Can you add tests for `is_primitive`? I don't think those would be passing because many of the types aren't inheriting from `PrimitiveType`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800232359



##########
File path: python/src/iceberg/types.py
##########
@@ -93,21 +128,37 @@ def precision(self) -> int:
     def scale(self) -> int:
         return self._scale
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return self.precision == other.precision and self.scale == other.scale
-        return False
 
+class NestedField(IcebergType):
+    """equivalent of `NestedField` type from Java implementation"""

Review comment:
       Doc strings should use normal sentence case, so this should start with a capital letter.
   
   Also, I don't think that we want to refer to the Java implementation. This represents a field of a struct, a map key, a map value, or a list element. This is where field IDs, names, docs, and nullability are tracked.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r800828299



##########
File path: python/src/iceberg/types.py
##########
@@ -135,53 +186,54 @@ def doc(self) -> Optional[str]:
         return self._doc
 
     @property
-    def type(self) -> Type:
+    def type(self) -> IcebergType:
         return self._type
 
-    def __repr__(self):
-        return (
-            f"NestedField(is_optional={self._is_optional}, field_id={self._id}, "
-            f"name={repr(self._name)}, field_type={repr(self._type)}, doc={repr(self._doc)})"
-        )
 
-    def __str__(self):
-        return (
-            f"{self._id}: {self._name}: {'optional' if self._is_optional else 'required'} {self._type}" ""
-            if self._doc is None
-            else f" ({self._doc})"
-        )
+class StructType(IcebergType):
+    """A struct type in Iceberg
 
-    def __eq__(self, other):
-        if type(self) is type(other):
-            return (
-                self.is_optional == other.is_optional
-                and self.field_id == other.field_id
-                and self.name == other.name
-                and self.doc == other.doc
-                and self.type == other.type
-            )
-        return False
+    Example:
+        >>> StructType(
+            [
+                NestedField(True, 1, "required_field", StringType()),
+                NestedField(False, 2, "optional_field", IntegerType()),
+            ]
+    """
 
+    _instances: Dict[Tuple[NestedField, ...], "StructType"] = {}
 
-class StructType(Type):
-    def __init__(self, fields: list):
+    def __new__(cls, fields: List[NestedField]):

Review comment:
       sure is. looks much better this way. all set now




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796628720



##########
File path: python/tests/test_types.py
##########
@@ -63,7 +63,7 @@ def test_repr_primitive_types(input_type):
 def test_fixed_type():
     type_var = FixedType(length=5)
     assert type_var.length == 5
-    assert str(type_var) == "fixed[5]"
+    assert str(type_var) == "fixed<5>"

Review comment:
       Fixed by adding back manual str/repr in inits




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796200006



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):
+        init_func = cls.__dict__.get("__init__")
+        names = init_func.__code__.co_varnames[1:] if init_func else []
+        defaults = init_func and init_func.__defaults__ or []
+        args = {**dict(zip(names[::-1], defaults[::-1])), **dict(zip(names, args))}
+        args = {**args, **kwargs}
+        args = tuple(
+            (k, tuple(args[k]) if isinstance(args[k], list) else args[k]) for k in names
+        )
+        key = (cls.__name__, args)
+        if key in cls._implemented:
+            return cls._implemented[key]
+        else:
+            ret = object.__new__(cls)
+            ret._args = args
+            cls._implemented[key] = ret
+            return ret
 
     def __repr__(self):
-        return self._repr_string
+        base = f"{type(self).__name__}("
+        if not self._args:
+            return base + ")"
+        for k, v in self._args:
+            base += k + "=" + repr(v) + ", "
+        return base[:-2] + ")"
 
     def __str__(self):
-        return self._type_string
+        base = f"{type(self).__name__.lower().replace('type', '')}<"
+        if not self._args:
+            return base[:-1]
+        for k, v in self._args:
+            base += (
+                str(v)
+                if type(v) != tuple
+                else "[" + ", ".join([str(vv) for vv in v]) + "]"
+            ) + ", "
+        return base[:-2] + ">"
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)

Review comment:
       is there any need for any other such identifiers like `is_numeric` that you could see this pattern being useful for and I should add?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796843443



##########
File path: python/src/iceberg/types.py
##########
@@ -206,19 +204,14 @@ class IntegerType(Type):
 
     min: int = -2147483648
 
-    def __init__(self):
-        super().__init__("int", "IntegerType()", is_primitive=True)
 
-
-class LongType(Type):
+class LongType(PrimitiveType):
     """A Long data type in Iceberg can be represented using an instance of this class. Longs in Iceberg are
     64-bit signed integers.
-
     Example:
         >>> column_foo = LongType()
         >>> isinstance(column_foo, LongType)
         True
-

Review comment:
       I'd prefer keeping the extra whitespace. Maybe we need to fix tox?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796187007



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):

Review comment:
       I can understand wanting to have one implementation for all classes, but I think that it is likely that the code would be more maintainable if it were split out because you wouldn't need to handle all possible values passed to `__init__`.
   
   For example, adding this works for all of the non-parameterized primitive types:
   
   ```python
   # not modified other than this addition
   class Type:
       _instance = None
       def __new__(cls, *args, **kwargs):
           if not getattr(cls, "_instance"):
               cls._instance = super().__new__(cls, *args, **kwargs)
           return cls._instance
   ```
   ```python
   In [4]: i = test_types.IntegerType()
   
   In [5]: i == test_types.IntegerType()
   Out[5]: True
   
   In [6]: i == test_types.FloatType()
   Out[6]: False
   
   In [7]: test_types.FloatType()
   Out[7]: FloatType()
   
   In [8]: f = test_types.FloatType()
   
   In [9]: f == test_types.FloatType()
   Out[9]: True
   
   In [10]: i == f
   Out[10]: False
   ```
   
   I think it would be much more straightforward to override `__new__` separately. Then because you control each `__init__` method, you know that you don't have to handle default args or kwargs or things that make this so complex.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796919871



##########
File path: python/src/iceberg/types.py
##########
@@ -82,7 +122,23 @@ def scale(self) -> int:
         return self._scale
 
 
-class NestedField(object):
+class NestedField(Type):
+    """equivalent of `NestedField` type from Java implementation"""
+
+    _implemented: Dict[Tuple[bool, int, str, Type, Optional[str]], "NestedField"] = {}
+
+    def __new__(
+        cls,
+        is_optional: bool,
+        field_id: int,
+        name: str,
+        field_type: Type,
+        doc: Optional[str] = None,
+    ):
+        key = is_optional, field_id, name, field_type, doc
+        cls._implemented[key] = cls._implemented.get(key, object.__new__(cls))

Review comment:
       Yeah you're right. Unnecessarily inefficient. changed it now to use `or` instead




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796918240



##########
File path: python/src/iceberg/types.py
##########
@@ -48,27 +46,69 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
 
+class PrimitiveType(Type):
+    """Base class for all Iceberg Primitive Types"""
+
+    _implemented = {}  # type: ignore

Review comment:
       the `implemented` thinking was back from when I was using that factory pattern for generics.  `instances` does sound better actually. changed it




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796065073



##########
File path: python/src/iceberg/types.py
##########
@@ -15,61 +15,84 @@
 # specific language governing permissions and limitations
 # under the License.
 """Data types used in describing Iceberg schemas
-
 This module implements the data types described in the Iceberg specification for Iceberg schemas. To
 describe an Iceberg table schema, these classes can be used in the construction of a StructType instance.
-
 Example:
     >>> StructType(
         [
             NestedField(True, 1, "required_field", StringType()),
             NestedField(False, 2, "optional_field", IntegerType()),
         ]
     )
-
 Notes:
   - https://iceberg.apache.org/#spec/#primitive-types
 """
 
-from typing import Optional
+from typing import Any, Dict, List, Optional, Tuple
 
 
-class Type:
-    def __init__(self, type_string: str, repr_string: str, is_primitive=False):
-        self._type_string = type_string
-        self._repr_string = repr_string
-        self._is_primitive = is_primitive
+class IcebergType:
+
+    _implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"] = {}
+
+    def __new__(cls, *args, **kwargs):
+        init_func = cls.__dict__.get("__init__")
+        names = init_func.__code__.co_varnames[1:] if init_func else []
+        defaults = init_func and init_func.__defaults__ or []
+        args = {**dict(zip(names[::-1], defaults[::-1])), **dict(zip(names, args))}
+        args = {**args, **kwargs}
+        args = tuple(
+            (k, tuple(args[k]) if isinstance(args[k], list) else args[k]) for k in names
+        )
+        key = (cls.__name__, args)
+        if key in cls._implemented:
+            return cls._implemented[key]
+        else:
+            ret = object.__new__(cls)
+            ret._args = args
+            cls._implemented[key] = ret
+            return ret
 
     def __repr__(self):
-        return self._repr_string
+        base = f"{type(self).__name__}("
+        if not self._args:
+            return base + ")"
+        for k, v in self._args:
+            base += k + "=" + repr(v) + ", "
+        return base[:-2] + ")"

Review comment:
       You can likely use `join` and a comprehension so you don't have to chop off the last two characters of the string.
   
   Untested, but maybe something like
   ```python
   ', '.join([f"{k=repr(v)" for k, v in self._args])
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#discussion_r796853739



##########
File path: python/src/iceberg/types.py
##########
@@ -48,27 +46,69 @@ def __str__(self):
 
     @property
     def is_primitive(self) -> bool:
-        return self._is_primitive
+        return isinstance(self, PrimitiveType)
+
 
+class PrimitiveType(Type):
+    """Base class for all Iceberg Primitive Types"""
+
+    _implemented = {}  # type: ignore

Review comment:
       What about `_instances`? I don't quite follow what you mean by the name `_implemented`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on pull request #4016: types use new for generics. no metaprogramming

Posted by GitBox <gi...@apache.org>.
CircArgs commented on pull request #4016:
URL: https://github.com/apache/iceberg/pull/4016#issuecomment-1047376334


   > @CircArgs, FYI. I tried to open a PR against your branch, but I couldn't so I had to create a new PR.
   > 
   > Feel free to pick the changes into your branch if you want to commit the other PR.
   
   Thanks a lot @rdblue. I've pulled in your commits here. Sorry I wasn't quite on the same page with what you were thinking, makes sense with the flag in your changes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org