You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/01/20 15:49:56 UTC

[GitHub] [iceberg] rymurr commented on a change in pull request #2115: [python] Completing ResidualEvaluator Implementation

rymurr commented on a change in pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#discussion_r561056848



##########
File path: python/iceberg/api/expressions/residual_evaluator.py
##########
@@ -15,98 +15,158 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from .expression import Expression
 from .expressions import Expressions, ExpressionVisitors
+from .literals import Literal
 from .predicate import BoundPredicate, Predicate, UnboundPredicate
+from .reference import BoundReference
+from ..partition_spec import PartitionSpec
+from ..struct_like import StructLike
 
 
 class ResidualEvaluator(object):
+    """
+    Finds the residuals for an {@link Expression} the partitions in the given {@link PartitionSpec}.
 
-    def __init__(self, spec, expr):
+    A residual expression is made by partially evaluating an expression using partition values. For
+    example, if a table is partitioned by day(utc_timestamp) and is read with a filter expression
+    utc_timestamp &gt;= a and utc_timestamp &lt;= b, then there are 4 possible residuals expressions
+    for the partition data, d:
+
+
+        + If d > day(a) and d < day(b), the residual is always true
+        + If d == day(a) and d != day(b), the residual is utc_timestamp >= b

Review comment:
       ```suggestion
           + If d == day(a) and d != day(b), the residual is utc_timestamp >= a
   ```
   
   I think...right?

##########
File path: python/iceberg/api/expressions/residual_evaluator.py
##########
@@ -115,3 +175,14 @@ def unbound_predicate(self, pred):
             return bound_residual
 
         return bound
+
+
+class UnpartitionedEvaluator(ResidualEvaluator):
+
+    def __init__(self, expr):
+        return super(UnpartitionedEvaluator, self).__init__(PartitionSpec.unpartitioned(),

Review comment:
       `return` from an `__init__`? And an expression after it? I guess you don't want the return statement here.

##########
File path: python/iceberg/api/expressions/projections.py
##########
@@ -15,32 +15,81 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from typing import TYPE_CHECKING
 
 from .expressions import Expressions, ExpressionVisitors, RewriteNot
 from .predicate import BoundPredicate, UnboundPredicate
 
+if TYPE_CHECKING:
+    from .expression import Expression
+    from .predicate import Predicate
+    from ..partition_spec import PartitionSpec
 
-def inclusive(spec, case_sensitive=True):
+

Review comment:
       it seems a bit weird to me to expose `inclusive` and `InclusiveProjection` in the public API of this module. I wonder if we should remove `InclusiveProjection` from `__init__.py` of this module to ensure everyone uses the well documented `inclusive` function. (same for `strict` below)

##########
File path: python/iceberg/api/partition_spec.py
##########
@@ -117,9 +118,9 @@ def compatible_with(self, other):
 
     def lazy_fields_by_source_id(self):
         if self.fields_by_source_id is None:
-            self.fields_by_source_id = dict()
+            self.fields_by_source_id = defaultdict(list)
             for field in self.fields:
-                self.fields_by_source_id[field.source_id] = field
+                self.fields_by_source_id[field.source_id].append(field)

Review comment:
       was wondering about this change...what prompted it? Apologies if I missed it. I noticed the similar method below it `lazy_fields_bu_source_name` still uses only a dict. 
   
   None of the added tests exercise this change either,




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org