You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/21 23:02:55 UTC

[GitHub] [iceberg] CircArgs opened a new pull request #3952: Types to types (optional value instantiated) + literals

CircArgs opened a new pull request #3952:
URL: https://github.com/apache/iceberg/pull/3952


   This PR takes a step towards making types functional by incorporating logic that allows them to take values and be literals.
   
   Generic types will take a form like `Fixed(8)` for the type and `Fixed(length=5, value=b'hello')` for the literal. `Type` has been removed from most classes because of this.
   
   Further, a `PrimitiveType` is a parent to all primitive types so that one can check primitive status of a type as `isinstance(Integer(), PrimitiveType)==True`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #3952: Types to types (optional value instantiated) + literals

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3952:
URL: https://github.com/apache/iceberg/pull/3952#issuecomment-1019000279


   @CircArgs, I'm trying to understand the goal here. It looks like this makes the type classes both types and literals at the same time. I can now have both `Integer()` and `Integer(5)`. The first is a type. But the second is also a type, but one that is used as a literal? What is the value of mixing the two classes together?
   
   I probably don't understand the benefit, but I do see some drawbacks. First, I think that currently `Integer() == Integer()` will be `False`. That's confusing to me because I think those should be equivalent. Second, `isinstance(Integer(5), PrimitiveType)` is `True`, so the literal is reporting that it can be used as a type.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #3952: Types to types (optional value instantiated) + literals

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3952:
URL: https://github.com/apache/iceberg/pull/3952#issuecomment-1019540788


   > usage feels very close to how the python standard library works
   
   Can you give examples? The only thing I can think of is that `str` can be used as a type or can be used to cast a value. We could do the same by implementing `__call__` on type objects to produce literals. But there is no need to make the type and the literal share a class.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on pull request #3952: Types to types (optional value instantiated) + literals

Posted by GitBox <gi...@apache.org>.
CircArgs commented on pull request #3952:
URL: https://github.com/apache/iceberg/pull/3952#issuecomment-1021223012


   @rdblue I definitely agree with your comments. Overall, I feel like the state from the previous types PR into this one of having `Integer()` as the method to get a type is pretty ugly and unnecessary. This PR was meant as a stepping stone to something more akin to true generics. I'm thinking that a lot of the hangup for the true sort of generics was me having tried to mix literals in with types to consolidate them, so I've taken a big step backwards and removed all literal functionality while also slimming down the generics implementation. Please have a look at https://github.com/apache/iceberg/pull/3981 to compare


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #3952: Types to types (optional value instantiated) + literals

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3952:
URL: https://github.com/apache/iceberg/pull/3952#issuecomment-1021715031


   Thanks @CircArgs! I'm excited to take a look at the next steps in #3981. I'll hopefully get a chance soon!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3952: Types to types (optional value instantiated) + literals

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3952:
URL: https://github.com/apache/iceberg/pull/3952#discussion_r790079138



##########
File path: python/src/iceberg/types.py
##########
@@ -230,133 +191,260 @@ class LongType(Type):
 
     min: int = -9223372036854775808
 
-    def __init__(self):
-        super().__init__("long", "LongType()", is_primitive=True)
+    def __init__(self, value: Optional[int] = None):
+        super().__init__(value)
 
 
-class FloatType(Type):
+class Float(PrimitiveType):
     """A Float data type in Iceberg can be represented using an instance of this class. Floats in Iceberg are
     32-bit IEEE 754 floating points and can be promoted to Doubles.
 
+    Args:
+        value: an optional argument for creating a literal without which instantiation represents just a type
+
     Example:
-        >>> column_foo = FloatType()
-        >>> isinstance(column_foo, FloatType)
+        >>> column_foo = Float()
+        >>> isinstance(column_foo, Float)
         True
+        >>> Float(3.14)
+        Float(value=3.14)
+
     """
 
-    def __init__(self):
-        super().__init__("float", "FloatType()", is_primitive=True)
+    def __init__(self, value: Optional[float] = None):
+        super().__init__(value)
 
 
-class DoubleType(Type):
+class Double(PrimitiveType):
     """A Double data type in Iceberg can be represented using an instance of this class. Doubles in Iceberg are
     64-bit IEEE 754 floating points.
 
+    Args:
+        value: an optional argument for creating a literal without which instantiation represents just a type
+
     Example:
-        >>> column_foo = DoubleType()
-        >>> isinstance(column_foo, DoubleType)
+        >>> column_foo = Double()
+        >>> isinstance(column_foo, Double)
         True
+        >>> Double(3.14)
+        Double(value=3.14)
+
     """
 
-    def __init__(self):
-        super().__init__("double", "DoubleType()", is_primitive=True)
+    def __init__(self, value: Optional[float] = None):
+        super().__init__(value)
 
 
-class DateType(Type):
+class Date(PrimitiveType):
     """A Date data type in Iceberg can be represented using an instance of this class. Dates in Iceberg are
     calendar dates without a timezone or time.
 
+    Args:
+        value: an optional argument for creating a literal without which instantiation represents just a type
+
     Example:
-        >>> column_foo = DateType()
-        >>> isinstance(column_foo, DateType)
+        >>> column_foo = Date()
+        >>> isinstance(column_foo, Date)
         True
+        >>> Date("2017-11-16")
+        Date(value="2017-11-16")
     """
 
-    def __init__(self):
-        super().__init__("date", "DateType()", is_primitive=True)
+    def __init__(self, value: Optional[str] = None):
+        super().__init__(value)
 
 
-class TimeType(Type):
+class Time(PrimitiveType):
     """A Time data type in Iceberg can be represented using an instance of this class. Times in Iceberg
     have microsecond precision and are a time of day without a date or timezone.
 
+    Args:
+        value: an optional argument for creating a literal without which instantiation represents just a type
+
     Example:
-        >>> column_foo = TimeType()
-        >>> isinstance(column_foo, TimeType)
+        >>> column_foo = Time()
+        >>> isinstance(column_foo, Time)
         True
+        >>> Time("14:31:08")
+        Time(value="14:31:08")
 
     """
 
-    def __init__(self):
-        super().__init__("time", "TimeType()", is_primitive=True)
+    def __init__(self, value: Optional[str] = None):
+        super().__init__(value)
 
 
-class TimestampType(Type):
+class Timestamp(PrimitiveType):
     """A Timestamp data type in Iceberg can be represented using an instance of this class. Timestamps in
     Iceberg have microsecond precision and include a date and a time of day without a timezone.
 
+    Args:
+        value: an optional argument for creating a literal without which instantiation represents just a type
+
     Example:
-        >>> column_foo = TimestampType()
-        >>> isinstance(column_foo, TimestampType)
+        >>> column_foo = Timestamp()
+        >>> isinstance(column_foo, Timestamp)
         True
+        >>> Timestamp("2017-11-16T14:31:08-08:00")
+        Timestamp(value="2017-11-16T14:31:08-08:00")
 
     """
 
-    def __init__(self):
-        super().__init__("timestamp", "TimestampType()", is_primitive=True)
+    def __init__(self, value: Optional[str] = None):
+        super().__init__(value)
 
 
-class TimestamptzType(Type):
+class Timestamptz(PrimitiveType):
     """A Timestamptz data type in Iceberg can be represented using an instance of this class. Timestamptzs in
     Iceberg are stored as UTC and include a date and a time of day with a timezone.
 
     Example:
-        >>> column_foo = TimestamptzType()
-        >>> isinstance(column_foo, TimestamptzType)
+        >>> column_foo = Timestamptz()
+        >>> isinstance(column_foo, Timestamptz)
         True
+        >>> Timestamptz("2017-11-16T14:31:08-08:00")
+        Timestamptz(value="2017-11-16T14:31:08-08:00")
     """
 
-    def __init__(self):
-        super().__init__("timestamptz", "TimestamptzType()", is_primitive=True)
+    def __init__(self, value: Optional[str] = None):
+        super().__init__(value)
 
 
-class StringType(Type):
+class String(PrimitiveType):
     """A String data type in Iceberg can be represented using an instance of this class. Strings in
     Iceberg are arbitrary-length character sequences and are encoded with UTF-8.
 
+    Args:
+        value: an optional argument for creating a literal without which instantiation represents just a type
+
     Example:
-        >>> column_foo = StringType()
-        >>> isinstance(column_foo, StringType)
+        >>> column_foo = String()
+        >>> isinstance(column_foo, String)
         True
+        >>> String("hello")
+        String(value="hello")
     """
 
-    def __init__(self):
-        super().__init__("string", "StringType()", is_primitive=True)
+    def __init__(self, value: Optional[str] = None):
+        super().__init__(value)
 
 
-class UUIDType(Type):
+class UUID(PrimitiveType):
     """A UUID data type in Iceberg can be represented using an instance of this class. UUIDs in
     Iceberg are universally unique identifiers.
 
     Example:
-        >>> column_foo = UUIDType()
-        >>> isinstance(column_foo, UUIDType)
+        >>> column_foo = UUID()
+        >>> isinstance(column_foo, UUID)
         True
+        >>> UUID("f79c3e09-677c-4bbd-a479-3f349cb785e7")
+        UUID(value="f79c3e09-677c-4bbd-a479-3f349cb785e7")
     """
 
-    def __init__(self):
-        super().__init__("uuid", "UUIDType()", is_primitive=True)
+    def __init__(self, value: Optional[str] = None):
+        super().__init__(value)
 
 
-class BinaryType(Type):
+class Binary(PrimitiveType):
     """A Binary data type in Iceberg can be represented using an instance of this class. Binarys in
     Iceberg are arbitrary-length byte arrays.
 
+    Args:
+        value: an optional argument for creating a literal without which instantiation represents just a type
+
     Example:
-        >>> column_foo = BinaryType()
-        >>> isinstance(column_foo, BinaryType)
+        >>> column_foo = Binary()
+        >>> isinstance(column_foo, Binary)
         True
+        >>> Binary(b'hello'))
+
     """
 
-    def __init__(self):
-        super().__init__("binary", "BinaryType()", is_primitive=True)
+    def __init__(self, value: Optional[bytes] = None):
+        super().__init__(value)
+
+
+class NestedField(object):

Review comment:
       Rearranging the classes appears to have introduced a lot more changes than needed. We ended up with primitives being diffed against the old nested types, and we don't get to see changes for these classes. I think making this a two-step change would allow us to see what changed much more easily. Could you move these back to the top, or move them and then base the PR on master after the move?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs closed pull request #3952: Types to types (optional value instantiated) + literals

Posted by GitBox <gi...@apache.org>.
CircArgs closed pull request #3952:
URL: https://github.com/apache/iceberg/pull/3952


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on pull request #3952: Types to types (optional value instantiated) + literals

Posted by GitBox <gi...@apache.org>.
CircArgs commented on pull request #3952:
URL: https://github.com/apache/iceberg/pull/3952#issuecomment-1018927205


   tagging @rdblue @samredai 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs closed pull request #3952: Types to types (optional value instantiated) + literals

Posted by GitBox <gi...@apache.org>.
CircArgs closed pull request #3952:
URL: https://github.com/apache/iceberg/pull/3952


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] samredai commented on pull request #3952: Types to types (optional value instantiated) + literals

Posted by GitBox <gi...@apache.org>.
samredai commented on pull request #3952:
URL: https://github.com/apache/iceberg/pull/3952#issuecomment-1019331217


   There was some discussion in previous python syncs about decreasing the size of this part of the codebase by not having a literal class defined separately for each type. I don't think we reached a consensus so maybe we can do that here.
   
   I believe the only place literals are used is in the expressions logic which handles filtering of the metadata (partitions, manifest files, etc). Types are used in a lot of places since `StructType` is used to define schema in many places (table schema as well as the schema of metadata files).
   
   I like the intuitiveness of only having to look in one place to understand how a type is defined as well as how an instance of that type (a literal) behaves and usage feels very close to how the python standard library works. If the types and literals actually aren't as tightly coupled as the class names in the legacy and java implementation might indicate, then I can see how keeping them as separate classes could be better and they can be tied together by just an attribute on the literal classes. Curious to hear everyone else's thoughts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] samredai commented on pull request #3952: Types to types (optional value instantiated) + literals

Posted by GitBox <gi...@apache.org>.
samredai commented on pull request #3952:
URL: https://github.com/apache/iceberg/pull/3952#issuecomment-1019549170


   @rdblue I meant that both the following are true in built-in types:
   - calling `type` on a literal value returns the class used to instantiate the literal
   - checking the type of a literal value is done using `isinstance`, the literal, and that same class returned by `type`
   
   ```py
   def type_and_literal(literal):
       python_type = type(literal)  # calling type(5) returns int
       new_instance = python_type(literal)  # int(5) returns another literal instance
       
       # int is used in both isinstance and == so all of the below is True for all built-ins
       assert isinstance(new_instance, python_type)
       assert type(new_instance) == python_type
       assert isinstance(literal, python_type)
       assert type(literal) == python_type
   
   # Confirm the assertions for all built-in types
   for literal in "foo", 1, 1.1, True, {"foo": "bar"}, ("foo", "bar", 0, 1), b'foo':
       type_and_literal(literal)
   ```
   
   So there's no differentiation between a type object and a literal object or at least that's how the api makes it seem.
   
   > But there is no need to make the type and the literal share a class.
   
   I see, so there could be an `IntegerType` class that can create an `IntegerLiteral` and `IntegerLiteral` would return it's type as `IntegerType`. Would it be useful to wrap them both in a top-level `Integer` class to match the api in the snippet above? Or is that just getting us back to having the logic mixed?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] CircArgs commented on pull request #3952: Types to types (optional value instantiated) + literals

Posted by GitBox <gi...@apache.org>.
CircArgs commented on pull request #3952:
URL: https://github.com/apache/iceberg/pull/3952#issuecomment-1021223012


   @rdblue I definitely agree with your comments. Overall, I feel like the state from the previous types PR into this one of having `Integer()` as the method to get a type is pretty ugly and unnecessary. This PR was meant as a stepping stone to something more akin to true generics. I'm thinking that a lot of the hangup for the true sort of generics was me having tried to mix literals in with types to consolidate them, so I've taken a big step backwards and removed all literal functionality while also slimming down the generics implementation. Please have a look at https://github.com/apache/iceberg/pull/3981 to compare


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org