You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/31 17:59:26 UTC

[GitHub] [iceberg] TGooch44 opened a new pull request #2399: [python] Updating Expressions with the Term object used in Predicates

TGooch44 opened a new pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399


   Part 1 of N(?) updating python expressions to move it closer to the current state of the java implementation.  This was motivated following review of some of the changes when I started working on some issues in the ResidualEvaluator and on non-identity partition columns.  
   
   The implementation in java has diverged a lot since the initial python implementation, this and the following commits should help move it back in line.  I expect there may be some back and forth on this because I was a little unsure about how to approach this.  cc: @rymurr Let me know what your thoughts are on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on a change in pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#discussion_r607780596



##########
File path: python/iceberg/api/expressions/literals.py
##########
@@ -76,6 +77,7 @@ class Literal(object):
     JAVA_MIN_INT = -2147483648
     JAVA_MAX_FLOAT = 3.4028235E38
     JAVA_MIN_FLOAT = -3.4028235E38
+    value: Any

Review comment:
       Not sure I love the class variable here. And its not being used below (the `self.value` in `BaseLiteral` hides it). Maybe thsi should be an `abc.ABC`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on a change in pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on a change in pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#discussion_r606005443



##########
File path: python/iceberg/api/expressions/predicate.py
##########
@@ -70,72 +86,152 @@ def __str__(self):
 
 class BoundPredicate(Predicate):
 
-    def __init__(self, op, ref, lit=None):
-        super(BoundPredicate, self).__init__(op, ref, lit)
+    def __init__(self, op: Operation, term: BoundTerm, lit: Literal = None, literals: List[Literal] = None,
+                 is_unary_predicate: bool = False, is_literal_predicate: bool = False,
+                 is_set_predicate: bool = False):
+        super(BoundPredicate, self).__init__(op, term)
+        ValidationException.check(sum([is_unary_predicate, is_literal_predicate, is_set_predicate]) == 1,
+                                  "Only a single predicate type may be set: %s=%s, %s=%s, %s=%s",
+                                  ("is_unary_predicate", is_unary_predicate,
+                                   "is_literal_predicate", is_literal_predicate,
+                                   "is_set_predicate", is_set_predicate))
+
+        self._literals: Union[None, List[Literal]] = None
+        if is_unary_predicate:
+            ValidationException.check(lit is None, "Unary Predicates may not have a literal", ())
+
+        elif is_literal_predicate:
+            ValidationException.check(lit is not None, "Literal Predicates must have a literal set", ())
+            self._literals = [lit]  # type: ignore
+
+        elif is_set_predicate:
+            ValidationException.check(literals is not None, "Set Predicates must have literals set", ())
+            self._literals = literals
+        else:
+            raise ValueError(f"Unable to instantiate {op} -> (lit={lit}, literal={literals}")
 
-    def negate(self):
-        return BoundPredicate(self.op.negate(), self.ref, self.lit)
+    @property
+    def lit(self) -> Union[None, Literal]:
+        if self._literals is None or len(self._literals) == 0:
+            return None
+        return self._literals[0]
+
+    def eval(self, struct: StructLike) -> bool:
+        raise NotImplementedError("eval not implemented for BoundPredicate")
+
+    def test(self, struct) -> bool:
+        raise NotImplementedError("TO:DO implement test")
 
 
 class UnboundPredicate(Predicate):
 
-    def __init__(self, op, named_ref, value=None, lit=None):
-        if value is None and lit is None:
-            super(UnboundPredicate, self).__init__(op, named_ref, None)
+    def __init__(self, op, term, value=None, lit=None, values=None, literals=None):
+        self._literals = None
+        num_set_args = sum([1 for x in [value, lit, values, literals] if x is not None])
+
+        if num_set_args > 1:
+            raise ValueError(f"Only one of value={value}, lit={lit}, values={values}, literals={literals} may be set")
+        super(UnboundPredicate, self).__init__(op, term)
         if isinstance(value, Literal):
             lit = value
             value = None
         if value is not None:
-            super(UnboundPredicate, self).__init__(op, named_ref, Literals.from_(value))
+            self._literals = [Literals.from_(value)]
         elif lit is not None:
-            super(UnboundPredicate, self).__init__(op, named_ref, lit)
+            self._literals = [lit]
+        elif values is not None:
+            self._literals = map(Literals.from_, values)
+        elif literals is not None:
+            self._literals = literals
+
+    @property
+    def literals(self):
+        return self._literals
+
+    @property
+    def lit(self):
+        if self.op in [Operation.IN, Operation.NOT_IN]:
+            raise ValueError(f"{self.op} predicate cannot return a literal")
+
+        return None if self.literals is None else self.literals[0]
 
     def negate(self):
-        return UnboundPredicate(self.op.negate(), self.ref, self.lit)
+        return UnboundPredicate(self.op.negate(), self.term, literals=self.literals)
 
-    def bind(self, struct, case_sensitive=True):  # noqa: C901
-        if case_sensitive:
-            field = struct.field(self.ref.name)
-        else:
-            field = struct.case_insensitive_field(self.ref.name.lower())
-
-        ValidationException.check(field is not None,
-                                  "Cannot find field '%s' in struct %s", (self.ref.name, struct))
-
-        if self.lit is None:
-            if self.op == Operation.IS_NULL:
-                if field.is_required:
-                    return FALSE
-                return BoundPredicate(Operation.IS_NULL, BoundReference(struct, field.field_id))
-            elif self.op == Operation.NOT_NULL:
-                if field.is_required:
-                    return TRUE
-                return BoundPredicate(Operation.NOT_NULL, BoundReference(struct, field.field_id))
+    def bind(self, struct, case_sensitive=True):
+        bound = self.term.bind(struct, case_sensitive=case_sensitive)
+
+        if self.literals is None:
+            return self.bind_unary_operation(bound)
+        elif self.op in [Operation.IN, Operation.NOT_IN]:
+            return self.bind_in_operation(bound)
+
+        return self.bind_literal_operation(bound)
+
+    def bind_unary_operation(self, bound_term: BoundTerm) -> BoundPredicate:
+        from .expressions import Expressions

Review comment:
       If it's at the top it introduces a circular import because of all the static methods in Expressions that expose shorthand's for the various predicate type, eg. `Expressions.equal`,  `Expressions.not_equal`, etc.  These could probably be moved into the predicate file itself, but leaving it in the expressions keeps it a bit more consistent with the Java API. We could also move it to the predicates file and then expose it in the expressions module init file.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#issuecomment-813629444


   Do you think this is good to merge, or were there some other things you want me to fix/take a look at?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#issuecomment-812672404


   @rymurr I believe I got all the issues you mentioned.  With the Term hierarchy, I agree it's not ideal, but I played around with a couple of different refactors, and nothing seemed much better.  I think a more extensive rewrite would be needed to really reduce the complexity of the class hierarchies.  I like the `Protocol` idea, although I think we're supporting py 3.7(...for now anyways...), so maybe something to think about in the next iteration on this module.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr merged pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
rymurr merged pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#issuecomment-816663489


   LGTM @TGooch44, thanks for the patience!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on a change in pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on a change in pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#discussion_r607989089



##########
File path: python/iceberg/api/expressions/literals.py
##########
@@ -76,6 +77,7 @@ class Literal(object):
     JAVA_MIN_INT = -2147483648
     JAVA_MAX_FLOAT = 3.4028235E38
     JAVA_MIN_FLOAT = -3.4028235E38
+    value: Any

Review comment:
       I believe the `value` is there purely to satisfy mypy checks(per [this PEP](https://www.python.org/dev/peps/pep-0526/ ) .  I do think you are correct that the Literal Class is kinda useless, I'll need to double check, but I think I could just make everything BaseLiteral, which does already have `value` as a instance variable, and move the constants and the `of` method  out to be in the file global scope.  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on a change in pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on a change in pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#discussion_r607838572



##########
File path: python/iceberg/api/expressions/literals.py
##########
@@ -76,6 +77,7 @@ class Literal(object):
     JAVA_MIN_INT = -2147483648
     JAVA_MAX_FLOAT = 3.4028235E38
     JAVA_MIN_FLOAT = -3.4028235E38
+    value: Any

Review comment:
       Let me take a look, that sounds reasonable




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on a change in pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#discussion_r605580800



##########
File path: python/iceberg/api/expressions/predicate.py
##########
@@ -15,23 +15,39 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from __future__ import annotations
+
+from typing import List, TYPE_CHECKING, Union
+
 from iceberg.exceptions import ValidationException
 
 from .expression import (Expression,
-                         FALSE,
-                         Operation,
-                         TRUE)
+                         Operation)
 from .literals import (Literal,
                        Literals)
-from .reference import BoundReference
+from .term import BoundTerm, Term
+from ..types import TypeID
+
+if TYPE_CHECKING:
+    from iceberg.api import StructLike
 
 
 class Predicate(Expression):
 
-    def __init__(self, op, ref, lit):
-        self.op = op
-        self.ref = ref
-        self.lit = lit
+    def __init__(self, op, term):

Review comment:
       nit: can you add types to the `op` and `term` here? 

##########
File path: python/iceberg/api/expressions/predicate.py
##########
@@ -70,72 +86,152 @@ def __str__(self):
 
 class BoundPredicate(Predicate):
 
-    def __init__(self, op, ref, lit=None):
-        super(BoundPredicate, self).__init__(op, ref, lit)
+    def __init__(self, op: Operation, term: BoundTerm, lit: Literal = None, literals: List[Literal] = None,
+                 is_unary_predicate: bool = False, is_literal_predicate: bool = False,
+                 is_set_predicate: bool = False):
+        super(BoundPredicate, self).__init__(op, term)
+        ValidationException.check(sum([is_unary_predicate, is_literal_predicate, is_set_predicate]) == 1,
+                                  "Only a single predicate type may be set: %s=%s, %s=%s, %s=%s",
+                                  ("is_unary_predicate", is_unary_predicate,
+                                   "is_literal_predicate", is_literal_predicate,
+                                   "is_set_predicate", is_set_predicate))
+
+        self._literals: Union[None, List[Literal]] = None
+        if is_unary_predicate:
+            ValidationException.check(lit is None, "Unary Predicates may not have a literal", ())
+
+        elif is_literal_predicate:
+            ValidationException.check(lit is not None, "Literal Predicates must have a literal set", ())
+            self._literals = [lit]  # type: ignore
+
+        elif is_set_predicate:
+            ValidationException.check(literals is not None, "Set Predicates must have literals set", ())
+            self._literals = literals
+        else:
+            raise ValueError(f"Unable to instantiate {op} -> (lit={lit}, literal={literals}")
 
-    def negate(self):
-        return BoundPredicate(self.op.negate(), self.ref, self.lit)
+    @property
+    def lit(self) -> Union[None, Literal]:

Review comment:
       nit: `Optional` may be cleaner than `Union[None, T]`

##########
File path: python/iceberg/api/expressions/term.py
##########
@@ -0,0 +1,53 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+from iceberg.api.types import StructType
+
+if TYPE_CHECKING:
+    from iceberg.api import StructLike
+
+
+class Term(object):
+    pass
+
+
+class BoundTerm(Term):
+
+    @property
+    def ref(self):
+        raise NotImplementedError("Base class does not have implementation")
+
+    @property
+    def type(self):
+        raise NotImplementedError("Base class does not have implementation")
+
+    def eval(self, struct: 'StructLike'):

Review comment:
       nit: `'StructLike'`  here and `StructLike` below

##########
File path: python/iceberg/api/expressions/predicate.py
##########
@@ -70,72 +86,152 @@ def __str__(self):
 
 class BoundPredicate(Predicate):
 
-    def __init__(self, op, ref, lit=None):
-        super(BoundPredicate, self).__init__(op, ref, lit)
+    def __init__(self, op: Operation, term: BoundTerm, lit: Literal = None, literals: List[Literal] = None,
+                 is_unary_predicate: bool = False, is_literal_predicate: bool = False,
+                 is_set_predicate: bool = False):
+        super(BoundPredicate, self).__init__(op, term)
+        ValidationException.check(sum([is_unary_predicate, is_literal_predicate, is_set_predicate]) == 1,
+                                  "Only a single predicate type may be set: %s=%s, %s=%s, %s=%s",
+                                  ("is_unary_predicate", is_unary_predicate,
+                                   "is_literal_predicate", is_literal_predicate,
+                                   "is_set_predicate", is_set_predicate))
+
+        self._literals: Union[None, List[Literal]] = None
+        if is_unary_predicate:
+            ValidationException.check(lit is None, "Unary Predicates may not have a literal", ())
+
+        elif is_literal_predicate:
+            ValidationException.check(lit is not None, "Literal Predicates must have a literal set", ())
+            self._literals = [lit]  # type: ignore
+
+        elif is_set_predicate:
+            ValidationException.check(literals is not None, "Set Predicates must have literals set", ())
+            self._literals = literals
+        else:
+            raise ValueError(f"Unable to instantiate {op} -> (lit={lit}, literal={literals}")
 
-    def negate(self):
-        return BoundPredicate(self.op.negate(), self.ref, self.lit)
+    @property
+    def lit(self) -> Union[None, Literal]:
+        if self._literals is None or len(self._literals) == 0:
+            return None
+        return self._literals[0]
+
+    def eval(self, struct: StructLike) -> bool:
+        raise NotImplementedError("eval not implemented for BoundPredicate")
+
+    def test(self, struct) -> bool:

Review comment:
       nit: `struct: StructLike` and `TODO:`

##########
File path: python/iceberg/api/expressions/term.py
##########
@@ -0,0 +1,53 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+from iceberg.api.types import StructType
+
+if TYPE_CHECKING:
+    from iceberg.api import StructLike
+
+
+class Term(object):

Review comment:
       Is the purpose of this class structure to give type safety or as an abstract class? I don't love these hierarchies in python and found this blog really interesting: https://glyph.twistedmatrix.com/2021/03/interfaces-and-protocols.html
   
   By no means do we need to do this here, just some food for thought.

##########
File path: python/iceberg/api/expressions/predicate.py
##########
@@ -70,72 +86,152 @@ def __str__(self):
 
 class BoundPredicate(Predicate):
 
-    def __init__(self, op, ref, lit=None):
-        super(BoundPredicate, self).__init__(op, ref, lit)
+    def __init__(self, op: Operation, term: BoundTerm, lit: Literal = None, literals: List[Literal] = None,
+                 is_unary_predicate: bool = False, is_literal_predicate: bool = False,
+                 is_set_predicate: bool = False):
+        super(BoundPredicate, self).__init__(op, term)
+        ValidationException.check(sum([is_unary_predicate, is_literal_predicate, is_set_predicate]) == 1,
+                                  "Only a single predicate type may be set: %s=%s, %s=%s, %s=%s",
+                                  ("is_unary_predicate", is_unary_predicate,
+                                   "is_literal_predicate", is_literal_predicate,
+                                   "is_set_predicate", is_set_predicate))
+
+        self._literals: Union[None, List[Literal]] = None
+        if is_unary_predicate:
+            ValidationException.check(lit is None, "Unary Predicates may not have a literal", ())
+
+        elif is_literal_predicate:
+            ValidationException.check(lit is not None, "Literal Predicates must have a literal set", ())
+            self._literals = [lit]  # type: ignore
+
+        elif is_set_predicate:
+            ValidationException.check(literals is not None, "Set Predicates must have literals set", ())
+            self._literals = literals
+        else:
+            raise ValueError(f"Unable to instantiate {op} -> (lit={lit}, literal={literals}")
 
-    def negate(self):
-        return BoundPredicate(self.op.negate(), self.ref, self.lit)
+    @property
+    def lit(self) -> Union[None, Literal]:
+        if self._literals is None or len(self._literals) == 0:
+            return None
+        return self._literals[0]
+
+    def eval(self, struct: StructLike) -> bool:
+        raise NotImplementedError("eval not implemented for BoundPredicate")
+
+    def test(self, struct) -> bool:
+        raise NotImplementedError("TO:DO implement test")
 
 
 class UnboundPredicate(Predicate):
 
-    def __init__(self, op, named_ref, value=None, lit=None):
-        if value is None and lit is None:
-            super(UnboundPredicate, self).__init__(op, named_ref, None)
+    def __init__(self, op, term, value=None, lit=None, values=None, literals=None):
+        self._literals = None
+        num_set_args = sum([1 for x in [value, lit, values, literals] if x is not None])
+
+        if num_set_args > 1:
+            raise ValueError(f"Only one of value={value}, lit={lit}, values={values}, literals={literals} may be set")
+        super(UnboundPredicate, self).__init__(op, term)
         if isinstance(value, Literal):
             lit = value
             value = None
         if value is not None:
-            super(UnboundPredicate, self).__init__(op, named_ref, Literals.from_(value))
+            self._literals = [Literals.from_(value)]
         elif lit is not None:
-            super(UnboundPredicate, self).__init__(op, named_ref, lit)
+            self._literals = [lit]
+        elif values is not None:
+            self._literals = map(Literals.from_, values)
+        elif literals is not None:
+            self._literals = literals
+
+    @property
+    def literals(self):
+        return self._literals
+
+    @property
+    def lit(self):
+        if self.op in [Operation.IN, Operation.NOT_IN]:
+            raise ValueError(f"{self.op} predicate cannot return a literal")
+
+        return None if self.literals is None else self.literals[0]
 
     def negate(self):
-        return UnboundPredicate(self.op.negate(), self.ref, self.lit)
+        return UnboundPredicate(self.op.negate(), self.term, literals=self.literals)
 
-    def bind(self, struct, case_sensitive=True):  # noqa: C901
-        if case_sensitive:
-            field = struct.field(self.ref.name)
-        else:
-            field = struct.case_insensitive_field(self.ref.name.lower())
-
-        ValidationException.check(field is not None,
-                                  "Cannot find field '%s' in struct %s", (self.ref.name, struct))
-
-        if self.lit is None:
-            if self.op == Operation.IS_NULL:
-                if field.is_required:
-                    return FALSE
-                return BoundPredicate(Operation.IS_NULL, BoundReference(struct, field.field_id))
-            elif self.op == Operation.NOT_NULL:
-                if field.is_required:
-                    return TRUE
-                return BoundPredicate(Operation.NOT_NULL, BoundReference(struct, field.field_id))
+    def bind(self, struct, case_sensitive=True):
+        bound = self.term.bind(struct, case_sensitive=case_sensitive)
+
+        if self.literals is None:
+            return self.bind_unary_operation(bound)
+        elif self.op in [Operation.IN, Operation.NOT_IN]:
+            return self.bind_in_operation(bound)
+
+        return self.bind_literal_operation(bound)
+
+    def bind_unary_operation(self, bound_term: BoundTerm) -> BoundPredicate:
+        from .expressions import Expressions

Review comment:
       just curious why this import is at the method only?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on a change in pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#discussion_r607842297



##########
File path: python/iceberg/api/expressions/literals.py
##########
@@ -76,6 +77,7 @@ class Literal(object):
     JAVA_MIN_INT = -2147483648
     JAVA_MAX_FLOAT = 3.4028235E38
     JAVA_MIN_FLOAT = -3.4028235E38
+    value: Any

Review comment:
       Im still musing about adjusting the class heirarchy. I think we can skip the `abc.ABC` for now. I do think the class variable can go though.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on a change in pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#discussion_r608045892



##########
File path: python/iceberg/api/expressions/literals.py
##########
@@ -76,6 +77,7 @@ class Literal(object):
     JAVA_MIN_INT = -2147483648
     JAVA_MAX_FLOAT = 3.4028235E38
     JAVA_MIN_FLOAT = -3.4028235E38
+    value: Any

Review comment:
       thats cool. I think we can also just remove the `value: Any` since its not used anywhere else in the class. The static method doesn't use it right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on a change in pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on a change in pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#discussion_r608049343



##########
File path: python/iceberg/api/expressions/literals.py
##########
@@ -76,6 +77,7 @@ class Literal(object):
     JAVA_MIN_INT = -2147483648
     JAVA_MAX_FLOAT = 3.4028235E38
     JAVA_MIN_FLOAT = -3.4028235E38
+    value: Any

Review comment:
       yeah, it's not used anywhere it's just a type hint.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on a change in pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on a change in pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#discussion_r610600544



##########
File path: python/iceberg/api/expressions/literals.py
##########
@@ -76,6 +77,7 @@ class Literal(object):
     JAVA_MIN_INT = -2147483648
     JAVA_MAX_FLOAT = 3.4028235E38
     JAVA_MIN_FLOAT = -3.4028235E38
+    value: Any

Review comment:
       I removed `value` from `Literal` and changed the type hints in `Predicate` to `BaseLiteral`. Let me know if that looks good now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#issuecomment-812401140


   > I'd just kind of like to get a functional read path(and maybe at least some of the simple append type write path cases) in before iterating too much on the current design.
   
   My thoughts exactly!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#issuecomment-813353404


   > @rymurr I believe I got all the issues you mentioned. With the Term hierarchy, I agree it's not ideal, but I played around with a couple of different refactors, and nothing seemed much better. I think a more extensive rewrite would be needed to really reduce the complexity of the class hierarchies. I like the `Protocol` idea, although I think we're supporting py 3.7(...for now anyways...), so maybe something to think about in the next iteration on this module.
   
   Cool, I agree with you. Lets get it working then possibly re-write it later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on a change in pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on a change in pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#discussion_r607989089



##########
File path: python/iceberg/api/expressions/literals.py
##########
@@ -76,6 +77,7 @@ class Literal(object):
     JAVA_MIN_INT = -2147483648
     JAVA_MAX_FLOAT = 3.4028235E38
     JAVA_MIN_FLOAT = -3.4028235E38
+    value: Any

Review comment:
       I believe the `value` is there purely to satisfy mypy checks(per [this PEP](https://www.python.org/dev/peps/pep-0526/) .  I do think you are correct that the Literal Class is kinda useless, I'll need to double check, but I think I could just make everything BaseLiteral, which does already have `value` as a instance variable, and move the constants and the `of` method  out to be in the file global scope.  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#issuecomment-812240754


   > A handful of minor comments @TGooch44 , I have also been concerned about Java and Python diverging. Been thinking of how to better keep the two in sync with so few people on the python impl. Something we need to figure out if we want to push this to pypi at some point. Curious to see what you think!
   > 
   > This change itself LGTM, its hard to tell w/o all the context. What other changes are you thinking of after this and what is the broader plan for the other N-1 PRs?
   
   The next PR, fleshes out the capability around partition transformation/evaluation, and updates some of the evaluators.  From there, I think a few months back we had discussed a higher level construct over the parquet reader that can take a Table or TableScan and turn that in to a set of parquet reads and then unify that into an arrow table, record batches, etc.  I know it can be difficult to review some of this broad PRs, so I was trying to chunk it up into some more manageable sized chunks and be sure to have good test coverage especially if the java side has the tests already there.
   
   I think it's going to be challenging maintaining a close coupling with the java implementation.  It may be worthwhile sometime soon, to look at what's currently in the python library and how it can be simplified or designed in a way that's less aligned with java and more aligned with a design that works well in python. I'd just kind of like to get a functional read path(and maybe at least some of the simple append type write path cases) in before iterating too much on the current design.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on a change in pull request #2399: [python] Updating Expressions with the Term object used in Predicates

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on a change in pull request #2399:
URL: https://github.com/apache/iceberg/pull/2399#discussion_r606005443



##########
File path: python/iceberg/api/expressions/predicate.py
##########
@@ -70,72 +86,152 @@ def __str__(self):
 
 class BoundPredicate(Predicate):
 
-    def __init__(self, op, ref, lit=None):
-        super(BoundPredicate, self).__init__(op, ref, lit)
+    def __init__(self, op: Operation, term: BoundTerm, lit: Literal = None, literals: List[Literal] = None,
+                 is_unary_predicate: bool = False, is_literal_predicate: bool = False,
+                 is_set_predicate: bool = False):
+        super(BoundPredicate, self).__init__(op, term)
+        ValidationException.check(sum([is_unary_predicate, is_literal_predicate, is_set_predicate]) == 1,
+                                  "Only a single predicate type may be set: %s=%s, %s=%s, %s=%s",
+                                  ("is_unary_predicate", is_unary_predicate,
+                                   "is_literal_predicate", is_literal_predicate,
+                                   "is_set_predicate", is_set_predicate))
+
+        self._literals: Union[None, List[Literal]] = None
+        if is_unary_predicate:
+            ValidationException.check(lit is None, "Unary Predicates may not have a literal", ())
+
+        elif is_literal_predicate:
+            ValidationException.check(lit is not None, "Literal Predicates must have a literal set", ())
+            self._literals = [lit]  # type: ignore
+
+        elif is_set_predicate:
+            ValidationException.check(literals is not None, "Set Predicates must have literals set", ())
+            self._literals = literals
+        else:
+            raise ValueError(f"Unable to instantiate {op} -> (lit={lit}, literal={literals}")
 
-    def negate(self):
-        return BoundPredicate(self.op.negate(), self.ref, self.lit)
+    @property
+    def lit(self) -> Union[None, Literal]:
+        if self._literals is None or len(self._literals) == 0:
+            return None
+        return self._literals[0]
+
+    def eval(self, struct: StructLike) -> bool:
+        raise NotImplementedError("eval not implemented for BoundPredicate")
+
+    def test(self, struct) -> bool:
+        raise NotImplementedError("TO:DO implement test")
 
 
 class UnboundPredicate(Predicate):
 
-    def __init__(self, op, named_ref, value=None, lit=None):
-        if value is None and lit is None:
-            super(UnboundPredicate, self).__init__(op, named_ref, None)
+    def __init__(self, op, term, value=None, lit=None, values=None, literals=None):
+        self._literals = None
+        num_set_args = sum([1 for x in [value, lit, values, literals] if x is not None])
+
+        if num_set_args > 1:
+            raise ValueError(f"Only one of value={value}, lit={lit}, values={values}, literals={literals} may be set")
+        super(UnboundPredicate, self).__init__(op, term)
         if isinstance(value, Literal):
             lit = value
             value = None
         if value is not None:
-            super(UnboundPredicate, self).__init__(op, named_ref, Literals.from_(value))
+            self._literals = [Literals.from_(value)]
         elif lit is not None:
-            super(UnboundPredicate, self).__init__(op, named_ref, lit)
+            self._literals = [lit]
+        elif values is not None:
+            self._literals = map(Literals.from_, values)
+        elif literals is not None:
+            self._literals = literals
+
+    @property
+    def literals(self):
+        return self._literals
+
+    @property
+    def lit(self):
+        if self.op in [Operation.IN, Operation.NOT_IN]:
+            raise ValueError(f"{self.op} predicate cannot return a literal")
+
+        return None if self.literals is None else self.literals[0]
 
     def negate(self):
-        return UnboundPredicate(self.op.negate(), self.ref, self.lit)
+        return UnboundPredicate(self.op.negate(), self.term, literals=self.literals)
 
-    def bind(self, struct, case_sensitive=True):  # noqa: C901
-        if case_sensitive:
-            field = struct.field(self.ref.name)
-        else:
-            field = struct.case_insensitive_field(self.ref.name.lower())
-
-        ValidationException.check(field is not None,
-                                  "Cannot find field '%s' in struct %s", (self.ref.name, struct))
-
-        if self.lit is None:
-            if self.op == Operation.IS_NULL:
-                if field.is_required:
-                    return FALSE
-                return BoundPredicate(Operation.IS_NULL, BoundReference(struct, field.field_id))
-            elif self.op == Operation.NOT_NULL:
-                if field.is_required:
-                    return TRUE
-                return BoundPredicate(Operation.NOT_NULL, BoundReference(struct, field.field_id))
+    def bind(self, struct, case_sensitive=True):
+        bound = self.term.bind(struct, case_sensitive=case_sensitive)
+
+        if self.literals is None:
+            return self.bind_unary_operation(bound)
+        elif self.op in [Operation.IN, Operation.NOT_IN]:
+            return self.bind_in_operation(bound)
+
+        return self.bind_literal_operation(bound)
+
+    def bind_unary_operation(self, bound_term: BoundTerm) -> BoundPredicate:
+        from .expressions import Expressions

Review comment:
       It introduces a circular import because of all the static methods in Expressions that expose shorthand's for the various predicate type, eg. `Expressions.equal`,  `Expressions.not_equal`, etc.  These could probably be moved into the predicate file itself, but leaving it in the expressions keeps it a bit more consistent with the Java API. We could also move it to the predicates file and then expose it in the expressions module init file.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org