You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/13 00:29:18 UTC
[GitHub] [iceberg] samredai commented on a diff in pull request #5258: More Python Expressions

samredai commented on code in PR #5258:
URL: https://github.com/apache/iceberg/pull/5258#discussion_r919444305


##########
python/pyiceberg/expressions/base.py:
##########
@@ -327,6 +347,54 @@ def bind(self, schema: Schema, case_sensitive: bool) -> BoundIn[T]:
         return BoundIn(bound_ref, tuple(lit.to(bound_ref.field.field_type) for lit in self.literals))  # type: ignore
 
 
+@dataclass(frozen=True)
+class BoundEQ(BoundPredicate[T]):
+    def __invert__(self):
+        raise TypeError("EQ expressions do not support negation.")
+
+
+@dataclass(frozen=True)
+class EQ(UnboundPredicate[T]):
+    def __invert__(self):
+        raise TypeError("EQ expressions do not support negation.")
+
+    def bind(self, schema: Schema, case_sensitive: bool) -> BoundEQ[T]:
+        bound_ref = self.term.bind(schema, case_sensitive)
+        return BoundEQ(bound_ref, self.literals[0].to(bound_ref.field.field_type))  # type: ignore
+
+
+@dataclass(frozen=True)
+class BoundLT(BoundPredicate[T]):
+    def __invert__(self):
+        raise TypeError("LT expressions do not support negation.")
+
+
+@dataclass(frozen=True)
+class LT(UnboundPredicate[T]):

Review Comment:
   This isn't a huge deal and I know the Java enum used these abbreviations like `LT`, but I feel like expanding these might be better for the python class names, i.e. `LessThan`.
   
   In a larger expression, `LessThan(Reference("foo_num"), literal(10))` is easier to reason about when seen for the first time than `LT(Reference("foo_num"), literal(10))`.



##########
python/pyiceberg/expressions/base.py:
##########
@@ -172,13 +178,33 @@ def bind(self, schema: Schema, case_sensitive: bool) -> BoundReference[T]:
 @dataclass(frozen=True)  # type: ignore[misc]
 class BoundPredicate(Bound[T], BooleanExpression):
     term: BoundReference[T]
-    literals: Tuple[Literal[T], ...]
+    literals: Union[Tuple[Literal[T], ...], Literal[T]]
+    literals_len_lb: int = field(init=False, repr=False, hash=False, compare=False, default=1)
+    literals_len_ub: Optional[int] = field(init=False, repr=False, hash=False, compare=False, default=1)
+
+    def __post_init__(self):
+        if not isinstance(self.literals, tuple):
+            object.__setattr__(self, "literals", (self.literals,))
+        if not (self.literals_len_lb <= len(self.literals) <= (self.literals_len_ub or float("inf"))):  # type: ignore
+            raise AttributeError(
+                f"{self.__class__} expects a number of literals between at least {self.literals_len_lb} and at most {self.literals_len_ub or float('inf')}"
+            )
 
 
 @dataclass(frozen=True)  # type: ignore[misc]
 class UnboundPredicate(Unbound[T, BooleanExpression], BooleanExpression):
     term: Reference[T]
-    literals: Tuple[Literal[T], ...]
+    literals: Union[Tuple[Literal[T], ...], Literal[T]]
+    literals_len_lb: int = field(init=False, repr=False, hash=False, compare=False, default=1)

Review Comment:
   Should this be `Optional[int]` since it has `default=1` for the field?



##########
python/pyiceberg/expressions/base.py:
##########
@@ -172,13 +178,33 @@ def bind(self, schema: Schema, case_sensitive: bool) -> BoundReference[T]:
 @dataclass(frozen=True)  # type: ignore[misc]
 class BoundPredicate(Bound[T], BooleanExpression):
     term: BoundReference[T]
-    literals: Tuple[Literal[T], ...]
+    literals: Union[Tuple[Literal[T], ...], Literal[T]]
+    literals_len_lb: int = field(init=False, repr=False, hash=False, compare=False, default=1)
+    literals_len_ub: Optional[int] = field(init=False, repr=False, hash=False, compare=False, default=1)
+
+    def __post_init__(self):
+        if not isinstance(self.literals, tuple):
+            object.__setattr__(self, "literals", (self.literals,))
+        if not (self.literals_len_lb <= len(self.literals) <= (self.literals_len_ub or float("inf"))):  # type: ignore
+            raise AttributeError(
+                f"{self.__class__} expects a number of literals between at least {self.literals_len_lb} and at most {self.literals_len_ub or float('inf')}"
+            )
 
 
 @dataclass(frozen=True)  # type: ignore[misc]
 class UnboundPredicate(Unbound[T, BooleanExpression], BooleanExpression):
     term: Reference[T]
-    literals: Tuple[Literal[T], ...]
+    literals: Union[Tuple[Literal[T], ...], Literal[T]]

Review Comment:
   I think using a pipe character will work here for the typehint. ->   `literals: Tuple[Literal[T], ...] | Literal[T]` 



##########
python/pyiceberg/expressions/base.py:
##########
@@ -327,6 +347,54 @@ def bind(self, schema: Schema, case_sensitive: bool) -> BoundIn[T]:
         return BoundIn(bound_ref, tuple(lit.to(bound_ref.field.field_type) for lit in self.literals))  # type: ignore
 
 
+@dataclass(frozen=True)
+class BoundEQ(BoundPredicate[T]):
+    def __invert__(self):
+        raise TypeError("EQ expressions do not support negation.")
+
+
+@dataclass(frozen=True)
+class EQ(UnboundPredicate[T]):
+    def __invert__(self):
+        raise TypeError("EQ expressions do not support negation.")

Review Comment:
   Initially I was thinking we wouldn't need `__invert__` since we could just negate when visiting...but I'm realizing we need to capture these inversions as the user defines the expression. Since `~EQ(Reference("foo"), literal("bar"))` would raise `TypeError: bad operand type for unary ~: EQ`, we actually might need a `NotEQ` class that can be used directly or returned by `EQ.__invert__`. Does that sound right?



##########
python/pyiceberg/expressions/base.py:
##########
@@ -172,13 +178,33 @@ def bind(self, schema: Schema, case_sensitive: bool) -> BoundReference[T]:
 @dataclass(frozen=True)  # type: ignore[misc]
 class BoundPredicate(Bound[T], BooleanExpression):
     term: BoundReference[T]
-    literals: Tuple[Literal[T], ...]
+    literals: Union[Tuple[Literal[T], ...], Literal[T]]
+    literals_len_lb: int = field(init=False, repr=False, hash=False, compare=False, default=1)
+    literals_len_ub: Optional[int] = field(init=False, repr=False, hash=False, compare=False, default=1)
+
+    def __post_init__(self):
+        if not isinstance(self.literals, tuple):
+            object.__setattr__(self, "literals", (self.literals,))
+        if not (self.literals_len_lb <= len(self.literals) <= (self.literals_len_ub or float("inf"))):  # type: ignore
+            raise AttributeError(
+                f"{self.__class__} expects a number of literals between at least {self.literals_len_lb} and at most {self.literals_len_ub or float('inf')}"
+            )
 
 
 @dataclass(frozen=True)  # type: ignore[misc]
 class UnboundPredicate(Unbound[T, BooleanExpression], BooleanExpression):
     term: Reference[T]
-    literals: Tuple[Literal[T], ...]
+    literals: Union[Tuple[Literal[T], ...], Literal[T]]
+    literals_len_lb: int = field(init=False, repr=False, hash=False, compare=False, default=1)
+    literals_len_ub: Optional[int] = field(init=False, repr=False, hash=False, compare=False, default=1)
+
+    def __post_init__(self):
+        if not isinstance(self.literals, tuple):

Review Comment:
   This makes sense to me but `if isinstance(self.literals, Literal):` is more explicit here and seems more intuitive.



##########
python/pyiceberg/expressions/base.py:
##########
@@ -172,13 +178,33 @@ def bind(self, schema: Schema, case_sensitive: bool) -> BoundReference[T]:
 @dataclass(frozen=True)  # type: ignore[misc]
 class BoundPredicate(Bound[T], BooleanExpression):
     term: BoundReference[T]
-    literals: Tuple[Literal[T], ...]
+    literals: Union[Tuple[Literal[T], ...], Literal[T]]
+    literals_len_lb: int = field(init=False, repr=False, hash=False, compare=False, default=1)
+    literals_len_ub: Optional[int] = field(init=False, repr=False, hash=False, compare=False, default=1)
+
+    def __post_init__(self):
+        if not isinstance(self.literals, tuple):
+            object.__setattr__(self, "literals", (self.literals,))
+        if not (self.literals_len_lb <= len(self.literals) <= (self.literals_len_ub or float("inf"))):  # type: ignore
+            raise AttributeError(
+                f"{self.__class__} expects a number of literals between at least {self.literals_len_lb} and at most {self.literals_len_ub or float('inf')}"
+            )
 
 
 @dataclass(frozen=True)  # type: ignore[misc]
 class UnboundPredicate(Unbound[T, BooleanExpression], BooleanExpression):
     term: Reference[T]
-    literals: Tuple[Literal[T], ...]
+    literals: Union[Tuple[Literal[T], ...], Literal[T]]
+    literals_len_lb: int = field(init=False, repr=False, hash=False, compare=False, default=1)
+    literals_len_ub: Optional[int] = field(init=False, repr=False, hash=False, compare=False, default=1)
+
+    def __post_init__(self):
+        if not isinstance(self.literals, tuple):
+            object.__setattr__(self, "literals", (self.literals,))
+        if not (self.literals_len_lb <= len(self.literals) <= (self.literals_len_ub or float("inf"))):  # type: ignore

Review Comment:
   For the `literals_len_lb` and `literals_len_ub`, are there scenarios where we need this specificity or is it the case that there will be two kinds of predicates: those that take a single literal and those that take a set of literals of any size? If it's the latter, should we just handle that in the `__post_init__` methods for the concrete predicate classes?
   
   ```py
   @dataclass(frozen=True)
   class In(UnboundPredicate[T]):
   
       def __post_init__(self):
           if isinstance(self.literals, Literal):
               object.__setattr__(self, "literals", set(self.literals,))
   ```
   
   ...and do the reverse for predicates that require a single literal:
   
   ```py
   @dataclass(frozen=True)
   class EQ(UnboundPredicate[T]):
   
       def __post_init__(self):
           if isinstance(self.literals, Set):
               if len(self.literals) > 1:
                   raise TypeError("...")
               object.__setattr__(self, "literals", self.literals[0])
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org