You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/01/19 18:41:54 UTC

[GitHub] [iceberg] TGooch44 opened a new pull request #2115: [python] Completing ResidualEvaluator Implementation

TGooch44 opened a new pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115


   This completes the ResidualEvaluator implementation.  This is used to pass to the lower level file scanners any residual filters that must be applied at a row level after file level filtering has been applied.  This borrows heavily from the java implementation.
   
   cc: @rymurr 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] emkornfield commented on a change in pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#discussion_r788246600



##########
File path: python/iceberg/api/expressions/projections.py
##########
@@ -15,32 +15,81 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from typing import TYPE_CHECKING
 
 from .expressions import Expressions, ExpressionVisitors, RewriteNot
 from .predicate import BoundPredicate, UnboundPredicate
 
+if TYPE_CHECKING:
+    from .expression import Expression
+    from .predicate import Predicate
+    from ..partition_spec import PartitionSpec
 
-def inclusive(spec, case_sensitive=True):
+
+def inclusive(spec: 'PartitionSpec', case_sensitive: bool = True):

Review comment:
       Is the return type specifically not-included here?  I see below  the object methods are not annotated.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 closed pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
TGooch44 closed pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on a change in pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#discussion_r561056848



##########
File path: python/iceberg/api/expressions/residual_evaluator.py
##########
@@ -15,98 +15,158 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from .expression import Expression
 from .expressions import Expressions, ExpressionVisitors
+from .literals import Literal
 from .predicate import BoundPredicate, Predicate, UnboundPredicate
+from .reference import BoundReference
+from ..partition_spec import PartitionSpec
+from ..struct_like import StructLike
 
 
 class ResidualEvaluator(object):
+    """
+    Finds the residuals for an {@link Expression} the partitions in the given {@link PartitionSpec}.
 
-    def __init__(self, spec, expr):
+    A residual expression is made by partially evaluating an expression using partition values. For
+    example, if a table is partitioned by day(utc_timestamp) and is read with a filter expression
+    utc_timestamp &gt;= a and utc_timestamp &lt;= b, then there are 4 possible residuals expressions
+    for the partition data, d:
+
+
+        + If d > day(a) and d < day(b), the residual is always true
+        + If d == day(a) and d != day(b), the residual is utc_timestamp >= b

Review comment:
       ```suggestion
           + If d == day(a) and d != day(b), the residual is utc_timestamp >= a
   ```
   
   I think...right?

##########
File path: python/iceberg/api/expressions/residual_evaluator.py
##########
@@ -115,3 +175,14 @@ def unbound_predicate(self, pred):
             return bound_residual
 
         return bound
+
+
+class UnpartitionedEvaluator(ResidualEvaluator):
+
+    def __init__(self, expr):
+        return super(UnpartitionedEvaluator, self).__init__(PartitionSpec.unpartitioned(),

Review comment:
       `return` from an `__init__`? And an expression after it? I guess you don't want the return statement here.

##########
File path: python/iceberg/api/expressions/projections.py
##########
@@ -15,32 +15,81 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from typing import TYPE_CHECKING
 
 from .expressions import Expressions, ExpressionVisitors, RewriteNot
 from .predicate import BoundPredicate, UnboundPredicate
 
+if TYPE_CHECKING:
+    from .expression import Expression
+    from .predicate import Predicate
+    from ..partition_spec import PartitionSpec
 
-def inclusive(spec, case_sensitive=True):
+

Review comment:
       it seems a bit weird to me to expose `inclusive` and `InclusiveProjection` in the public API of this module. I wonder if we should remove `InclusiveProjection` from `__init__.py` of this module to ensure everyone uses the well documented `inclusive` function. (same for `strict` below)

##########
File path: python/iceberg/api/partition_spec.py
##########
@@ -117,9 +118,9 @@ def compatible_with(self, other):
 
     def lazy_fields_by_source_id(self):
         if self.fields_by_source_id is None:
-            self.fields_by_source_id = dict()
+            self.fields_by_source_id = defaultdict(list)
             for field in self.fields:
-                self.fields_by_source_id[field.source_id] = field
+                self.fields_by_source_id[field.source_id].append(field)

Review comment:
       was wondering about this change...what prompted it? Apologies if I missed it. I noticed the similar method below it `lazy_fields_bu_source_name` still uses only a dict. 
   
   None of the added tests exercise this change either,




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on a change in pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on a change in pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#discussion_r561241713



##########
File path: python/iceberg/api/expressions/residual_evaluator.py
##########
@@ -115,3 +175,14 @@ def unbound_predicate(self, pred):
             return bound_residual
 
         return bound
+
+
+class UnpartitionedEvaluator(ResidualEvaluator):
+
+    def __init__(self, expr):
+        return super(UnpartitionedEvaluator, self).__init__(PartitionSpec.unpartitioned(),

Review comment:
       yes, hmm...not sure what I was doing there, but you are of course right.  Will update.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on a change in pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on a change in pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#discussion_r561245043



##########
File path: python/iceberg/api/expressions/residual_evaluator.py
##########
@@ -15,98 +15,158 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from .expression import Expression
 from .expressions import Expressions, ExpressionVisitors
+from .literals import Literal
 from .predicate import BoundPredicate, Predicate, UnboundPredicate
+from .reference import BoundReference
+from ..partition_spec import PartitionSpec
+from ..struct_like import StructLike
 
 
 class ResidualEvaluator(object):
+    """
+    Finds the residuals for an {@link Expression} the partitions in the given {@link PartitionSpec}.
 
-    def __init__(self, spec, expr):
+    A residual expression is made by partially evaluating an expression using partition values. For
+    example, if a table is partitioned by day(utc_timestamp) and is read with a filter expression
+    utc_timestamp &gt;= a and utc_timestamp &lt;= b, then there are 4 possible residuals expressions
+    for the partition data, d:
+
+
+        + If d > day(a) and d < day(b), the residual is always true
+        + If d == day(a) and d != day(b), the residual is utc_timestamp >= b

Review comment:
       yes, definitely right since all values in the file will fall greater than that lower bound.  This looks like a typo/mistake, will update.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on a change in pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on a change in pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#discussion_r561240703



##########
File path: python/iceberg/api/partition_spec.py
##########
@@ -117,9 +118,9 @@ def compatible_with(self, other):
 
     def lazy_fields_by_source_id(self):
         if self.fields_by_source_id is None:
-            self.fields_by_source_id = dict()
+            self.fields_by_source_id = defaultdict(list)
             for field in self.fields:
-                self.fields_by_source_id[field.source_id] = field
+                self.fields_by_source_id[field.source_id].append(field)

Review comment:
       I believe this is because there may be multiple hidden partitions derived from the same table column.  I think Ryan Blue introduced this in the java implementation here:
   https://github.com/apache/iceberg/commit/649cbdde83693ebda8e8dc6e75857426d25414ec#diff-d1905822d843dea78ebe5404ee9ce885b7adbc46970fbaf931a87ae2758abeb6
   
   Let me double check, I had written this in our internal repo a while back and it's a little foggy now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#issuecomment-763896598


   @rymurr Thanks again for taking a look.  Your help on this has really been invaluable.  I'm reviewing your comments now, will comment on each. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on a change in pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on a change in pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#discussion_r561241305



##########
File path: python/iceberg/api/partition_spec.py
##########
@@ -117,9 +118,9 @@ def compatible_with(self, other):
 
     def lazy_fields_by_source_id(self):
         if self.fields_by_source_id is None:
-            self.fields_by_source_id = dict()
+            self.fields_by_source_id = defaultdict(list)
             for field in self.fields:
-                self.fields_by_source_id[field.source_id] = field
+                self.fields_by_source_id[field.source_id].append(field)

Review comment:
       at the very least, I will add some test coverage, but it may be worthwhile looking at the change more critically in general.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] emkornfield commented on pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#issuecomment-1017733969


   Thanks @TGooch44 does this apply to any of the other open PRs?  Or is there an easy way for me to distinguish myself (this PR appeared on the surface to be being made again python and not python legacy.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on a change in pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on a change in pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#discussion_r788397380



##########
File path: python/iceberg/api/expressions/residual_evaluator.py
##########
@@ -15,98 +15,158 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from .expression import Expression
 from .expressions import Expressions, ExpressionVisitors
+from .literals import Literal
 from .predicate import BoundPredicate, Predicate, UnboundPredicate
+from .reference import BoundReference
+from ..partition_spec import PartitionSpec
+from ..struct_like import StructLike
 
 
 class ResidualEvaluator(object):
+    """
+    Finds the residuals for an {@link Expression} the partitions in the given {@link PartitionSpec}.
 
-    def __init__(self, spec, expr):
+    A residual expression is made by partially evaluating an expression using partition values. For
+    example, if a table is partitioned by day(utc_timestamp) and is read with a filter expression
+    utc_timestamp &gt;= a and utc_timestamp &lt;= b, then there are 4 possible residuals expressions
+    for the partition data, d:
+
+
+        + If d > day(a) and d < day(b), the residual is always true
+        + If d == day(a) and d != day(b), the residual is utc_timestamp >= b
+        + if d == day(b) and d != day(a), the residual is utc_timestamp <= b
+        + If d == day(a) == day(b), the residual is utc_timestamp >= a and utc_timestamp <= b
+
+    Partition data is passed using StructLike. Residuals are returned by residualFor(StructLike).
+
+    """
+    @staticmethod
+    def unpartitioned(expr: Expression) -> 'UnpartitionedEvaluator':
+        return UnpartitionedEvaluator(expr)
+
+    @staticmethod
+    def of(spec: PartitionSpec, expr: Expression, case_sensitive=True) -> 'ResidualEvaluator':
+        if len(spec.fields) > 0:
+            return ResidualEvaluator(spec, expr, case_sensitive=case_sensitive)
+        else:
+            return ResidualEvaluator.unpartitioned(expr)
+
+    def __init__(self, spec, expr, case_sensitive=True):
         self._spec = spec
         self._expr = expr
+        self._case_sensitive = case_sensitive
         self.__visitor = None
 
-    def _visitor(self):
+    def _visitor(self) -> 'ResidualVisitor':
         if self.__visitor is None:
-            self.__visitor = ResidualVisitor()
+            self.__visitor = ResidualVisitor(self._spec,
+                                             self._expr,
+                                             self._case_sensitive)
 
         return self.__visitor
 
-    def residual_for(self, partition_data):
+    def residual_for(self, partition_data: StructLike) -> Expression:
+        """
+        Returns a residual expression for the given partition values.
+
+        Parameters
+        ----------
+        partition_data: StructLike
+            partition data values
+
+        Returns
+        -------
+        Expression
+            the residual of this evaluator's expression from the partition values
+        """
         return self._visitor().eval(partition_data)
 
 
 class ResidualVisitor(ExpressionVisitors.BoundExpressionVisitor):
 
-    def __init__(self):
+    def __init__(self, spec, expr, case_sensitive=True):

Review comment:
       Hi Micah,  I'm going to go ahead and close out this PR since the focus has moved on to the redesigned library and this PR is on the legacy implementation, which if I understand correctly is only going to be bug fixes going forward and once there is a viable alternative in the new design, this will be fully deprecated. Sorry for any confusion here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#issuecomment-1017922194


   Let me go through and clear out any that are for the legacy code-base
   
   On Thu, Jan 20, 2022 at 9:16 AM emkornfield ***@***.***>
   wrote:
   
   > Thanks @TGooch44 <https://github.com/TGooch44> does this apply to any of
   > the other open PRs? Or is there an easy way for me to distinguish myself
   > (this PR appeared on the surface to be being made again python and not
   > python legacy.
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/iceberg/pull/2115#issuecomment-1017733969>, or
   > unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AAMETH3EHD6ISNFANTK6KK3UXA7QJANCNFSM4WJGQAIQ>
   > .
   > You are receiving this because you were mentioned.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#issuecomment-1017928254


   @emkornfield I closed out mine as I think they are likely not relevant in the new design.  I labeled the other few that I'm sure are not bug fixes and are for python legacy implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#issuecomment-763896598


   @rymurr Thanks again for taking a look.  Your help on this has really been invaluable.  I'm reviewing your comments now, will comment on each. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on a change in pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on a change in pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#discussion_r561241509



##########
File path: python/iceberg/api/expressions/projections.py
##########
@@ -15,32 +15,81 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from typing import TYPE_CHECKING
 
 from .expressions import Expressions, ExpressionVisitors, RewriteNot
 from .predicate import BoundPredicate, UnboundPredicate
 
+if TYPE_CHECKING:
+    from .expression import Expression
+    from .predicate import Predicate
+    from ..partition_spec import PartitionSpec
 
-def inclusive(spec, case_sensitive=True):
+

Review comment:
       agree, I'll update this




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] TGooch44 commented on a change in pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
TGooch44 commented on a change in pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#discussion_r561240703



##########
File path: python/iceberg/api/partition_spec.py
##########
@@ -117,9 +118,9 @@ def compatible_with(self, other):
 
     def lazy_fields_by_source_id(self):
         if self.fields_by_source_id is None:
-            self.fields_by_source_id = dict()
+            self.fields_by_source_id = defaultdict(list)
             for field in self.fields:
-                self.fields_by_source_id[field.source_id] = field
+                self.fields_by_source_id[field.source_id].append(field)

Review comment:
       I believe this is because there may be multiple hidden partitions derived from the same table column.  I think Ryan Blue introduced this in the java implementation here:
   https://github.com/apache/iceberg/commit/649cbdde83693ebda8e8dc6e75857426d25414ec#diff-d1905822d843dea78ebe5404ee9ce885b7adbc46970fbaf931a87ae2758abeb6
   
   Let me double check, I had written this in our internal repo a while back and it's a little foggy now.

##########
File path: python/iceberg/api/partition_spec.py
##########
@@ -117,9 +118,9 @@ def compatible_with(self, other):
 
     def lazy_fields_by_source_id(self):
         if self.fields_by_source_id is None:
-            self.fields_by_source_id = dict()
+            self.fields_by_source_id = defaultdict(list)
             for field in self.fields:
-                self.fields_by_source_id[field.source_id] = field
+                self.fields_by_source_id[field.source_id].append(field)

Review comment:
       at the very least, I will add some test coverage, but it may be worthwhile looking at the change more critically in general.

##########
File path: python/iceberg/api/expressions/projections.py
##########
@@ -15,32 +15,81 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from typing import TYPE_CHECKING
 
 from .expressions import Expressions, ExpressionVisitors, RewriteNot
 from .predicate import BoundPredicate, UnboundPredicate
 
+if TYPE_CHECKING:
+    from .expression import Expression
+    from .predicate import Predicate
+    from ..partition_spec import PartitionSpec
 
-def inclusive(spec, case_sensitive=True):
+

Review comment:
       agree, I'll update this

##########
File path: python/iceberg/api/expressions/residual_evaluator.py
##########
@@ -115,3 +175,14 @@ def unbound_predicate(self, pred):
             return bound_residual
 
         return bound
+
+
+class UnpartitionedEvaluator(ResidualEvaluator):
+
+    def __init__(self, expr):
+        return super(UnpartitionedEvaluator, self).__init__(PartitionSpec.unpartitioned(),

Review comment:
       yes, hmm...not sure what I was doing there, but you are of course right.  Will update.

##########
File path: python/iceberg/api/expressions/residual_evaluator.py
##########
@@ -15,98 +15,158 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from .expression import Expression
 from .expressions import Expressions, ExpressionVisitors
+from .literals import Literal
 from .predicate import BoundPredicate, Predicate, UnboundPredicate
+from .reference import BoundReference
+from ..partition_spec import PartitionSpec
+from ..struct_like import StructLike
 
 
 class ResidualEvaluator(object):
+    """
+    Finds the residuals for an {@link Expression} the partitions in the given {@link PartitionSpec}.
 
-    def __init__(self, spec, expr):
+    A residual expression is made by partially evaluating an expression using partition values. For
+    example, if a table is partitioned by day(utc_timestamp) and is read with a filter expression
+    utc_timestamp &gt;= a and utc_timestamp &lt;= b, then there are 4 possible residuals expressions
+    for the partition data, d:
+
+
+        + If d > day(a) and d < day(b), the residual is always true
+        + If d == day(a) and d != day(b), the residual is utc_timestamp >= b

Review comment:
       yes, definitely right since all values in the file will fall greater than that lower bound.  This looks like a typo/mistake, will update.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] emkornfield commented on a change in pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#discussion_r788247681



##########
File path: python/iceberg/api/expressions/residual_evaluator.py
##########
@@ -15,98 +15,158 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from .expression import Expression
 from .expressions import Expressions, ExpressionVisitors
+from .literals import Literal
 from .predicate import BoundPredicate, Predicate, UnboundPredicate
+from .reference import BoundReference
+from ..partition_spec import PartitionSpec
+from ..struct_like import StructLike
 
 
 class ResidualEvaluator(object):
+    """
+    Finds the residuals for an {@link Expression} the partitions in the given {@link PartitionSpec}.
 
-    def __init__(self, spec, expr):
+    A residual expression is made by partially evaluating an expression using partition values. For
+    example, if a table is partitioned by day(utc_timestamp) and is read with a filter expression
+    utc_timestamp &gt;= a and utc_timestamp &lt;= b, then there are 4 possible residuals expressions
+    for the partition data, d:
+
+
+        + If d > day(a) and d < day(b), the residual is always true
+        + If d == day(a) and d != day(b), the residual is utc_timestamp >= b
+        + if d == day(b) and d != day(a), the residual is utc_timestamp <= b
+        + If d == day(a) == day(b), the residual is utc_timestamp >= a and utc_timestamp <= b
+
+    Partition data is passed using StructLike. Residuals are returned by residualFor(StructLike).
+
+    """
+    @staticmethod
+    def unpartitioned(expr: Expression) -> 'UnpartitionedEvaluator':
+        return UnpartitionedEvaluator(expr)
+
+    @staticmethod
+    def of(spec: PartitionSpec, expr: Expression, case_sensitive=True) -> 'ResidualEvaluator':
+        if len(spec.fields) > 0:
+            return ResidualEvaluator(spec, expr, case_sensitive=case_sensitive)
+        else:
+            return ResidualEvaluator.unpartitioned(expr)
+
+    def __init__(self, spec, expr, case_sensitive=True):
         self._spec = spec
         self._expr = expr
+        self._case_sensitive = case_sensitive
         self.__visitor = None
 
-    def _visitor(self):
+    def _visitor(self) -> 'ResidualVisitor':
         if self.__visitor is None:
-            self.__visitor = ResidualVisitor()
+            self.__visitor = ResidualVisitor(self._spec,
+                                             self._expr,
+                                             self._case_sensitive)
 
         return self.__visitor
 
-    def residual_for(self, partition_data):
+    def residual_for(self, partition_data: StructLike) -> Expression:
+        """
+        Returns a residual expression for the given partition values.
+
+        Parameters
+        ----------
+        partition_data: StructLike
+            partition data values
+
+        Returns
+        -------
+        Expression
+            the residual of this evaluator's expression from the partition values
+        """
         return self._visitor().eval(partition_data)
 
 
 class ResidualVisitor(ExpressionVisitors.BoundExpressionVisitor):
 
-    def __init__(self):
+    def __init__(self, spec, expr, case_sensitive=True):

Review comment:
       type annotations for spec and expr?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] emkornfield commented on a change in pull request #2115: [python] Completing ResidualEvaluator Implementation

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#discussion_r788244997



##########
File path: python/iceberg/api/expressions/projections.py
##########
@@ -15,32 +15,81 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from typing import TYPE_CHECKING
 
 from .expressions import Expressions, ExpressionVisitors, RewriteNot
 from .predicate import BoundPredicate, UnboundPredicate
 
+if TYPE_CHECKING:
+    from .expression import Expression
+    from .predicate import Predicate
+    from ..partition_spec import PartitionSpec
 
-def inclusive(spec, case_sensitive=True):
+
+def inclusive(spec: 'PartitionSpec', case_sensitive: bool = True):
+    """
+    Creates an inclusive ProjectionEvaluator for the PartitionSpec, defaulting
+    to case sensitive mode.
+
+    An evaluator is used to project expressions for a table's data rows into expressions on the
+    table's partition values. The evaluator returned by this function is inclusive and will build
+    expressions with the following guarantee: if the original expression matches a row, then the
+    projected expression will match that row's partition.
+
+    Each predicate in the expression is projected using Transform.project(String, BoundPredicate).
+
+    Parameters
+    ----------
+    spec: PartitionSpec
+       a partition spec
+    case_sensitive: boolean
+       whether the Projection should consider case sensitivity on column names or not.
+
+    Returns
+    -------
+    InclusiveProjection
+        an inclusive projection evaluator for the partition spec. Inclusive transform used for each predicate
+    """
     return InclusiveProjection(spec, case_sensitive)
 
 
-def strict(spec):
+def strict(spec: 'PartitionSpec'):
+    """
+    Creates a strict ProjectionEvaluatorfor the PartitionSpec, defaulting to case sensitive mode.

Review comment:
       ```suggestion
       Creates a strict ProjectionEvaluator for the PartitionSpec, defaulting to case sensitive mode.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org