You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/06 02:28:11 UTC

[GitHub] [iceberg] Fokko opened a new pull request, #6128: Python: Projection

Fokko opened a new pull request, #6128:
URL: https://github.com/apache/iceberg/pull/6128

   - Adds projection for each of the transforms
   - Removes the dataclasses from the expressions. I noticed that the IDE didn't like the way we were using the dataclasses in a hierarchical structure: 
   ![image](https://user-images.githubusercontent.com/1134248/200150958-3634a40c-a530-424c-956d-94b843bb5cb9.png)
   It complaints about unexpected arguments. Also, noticed that mypy was struggling with it. I've removed the dataclasses and added the abstract classes. I've implemented the `__eq__` and `__repr__` where required. Also removed some inconsistencies along the way (`In` was expecting a tuple, and `BoundIn` a set).
   
   More tests still to be added


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022125991


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimeLiteral, TimestampLiteral)):
+        raise ValueError(f"Expected a numeric literal, got: {type(boundary)}")
+
+    if isinstance(pred, BoundLessThan):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary.decrement()))
+    elif isinstance(pred, BoundLessThanOrEqual):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundGreaterThan):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary.increment()))
+    elif isinstance(pred, BoundGreaterThanOrEqual):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(transform, boundary))
+    else:
+        return None
+
+
+def _truncate_array(
+    name: str, pred: BoundLiteralPredicate, func: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if type(pred) in {BoundLessThan, BoundLessThanOrEqual}:
+        return LessThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    elif type(pred) in {BoundGreaterThan, BoundGreaterThanOrEqual}:
+        return GreaterThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    if isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(func, boundary))
+    else:
+        return None
+
+
+def _project_transform_predicate(transform: Transform, partition_name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+    term = pred.term
+    if isinstance(term, BoundTransform) and transform == term.transform:
+        return _remove_transform(partition_name, pred)
+    return None
+
+
+def _remove_transform(partition_name: str, pred: BoundPredicate):
+    if isinstance(pred, BoundUnaryPredicate):
+        return pred.as_unbound(Reference(partition_name))
+    elif isinstance(pred, BoundLiteralPredicate):
+        return pred.as_unbound(Reference(partition_name), pred.literal)
+    elif isinstance(pred, (BoundIn, BoundNotIn)):
+        return pred.as_unbound(Reference(partition_name), pred.literals)
+    else:
+        raise ValueError(f"Cannot replace transform in unknown predicate: {pred}")
+
+
+def _transform_set(name: str, pred: BoundSetPredicate, func: Callable[[Optional[M]], Optional[M]]) -> UnboundPredicate:
+    literals = pred.literals
+    if isinstance(pred, BoundIn):
+        return In(Reference(name), {_transform_literal(func, literal) for literal in literals})

Review Comment:
   Should this be `pred.as_unbound`? That could handle both cases, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027296256


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimeLiteral, TimestampLiteral)):
+        raise ValueError(f"Expected a numeric literal, got: {type(boundary)}")
+
+    if isinstance(pred, BoundLessThan):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary.decrement()))
+    elif isinstance(pred, BoundLessThanOrEqual):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundGreaterThan):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary.increment()))
+    elif isinstance(pred, BoundGreaterThanOrEqual):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(transform, boundary))
+    else:
+        return None
+
+
+def _truncate_array(
+    name: str, pred: BoundLiteralPredicate, func: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if type(pred) in {BoundLessThan, BoundLessThanOrEqual}:
+        return LessThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    elif type(pred) in {BoundGreaterThan, BoundGreaterThanOrEqual}:
+        return GreaterThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    if isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(func, boundary))
+    else:
+        return None
+
+
+def _project_transform_predicate(transform: Transform, partition_name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+    term = pred.term
+    if isinstance(term, BoundTransform) and transform == term.transform:
+        return _remove_transform(partition_name, pred)
+    return None
+
+
+def _remove_transform(partition_name: str, pred: BoundPredicate):
+    if isinstance(pred, BoundUnaryPredicate):
+        return pred.as_unbound(Reference(partition_name))
+    elif isinstance(pred, BoundLiteralPredicate):
+        return pred.as_unbound(Reference(partition_name), pred.literal)
+    elif isinstance(pred, (BoundIn, BoundNotIn)):
+        return pred.as_unbound(Reference(partition_name), pred.literals)
+    else:
+        raise ValueError(f"Cannot replace transform in unknown predicate: {pred}")
+
+
+def _transform_set(name: str, pred: BoundSetPredicate, func: Callable[[Optional[M]], Optional[M]]) -> UnboundPredicate:
+    literals = pred.literals
+    if isinstance(pred, BoundIn):
+        return In(Reference(name), {_transform_literal(func, literal) for literal in literals})

Review Comment:
   Changed that, will also reduce the possibility of typo's πŸ•ΊπŸ» 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027341625


##########
python/pyiceberg/expressions/literals.py:
##########
@@ -381,10 +381,12 @@ def __init__(self, value: Decimal):
         super().__init__(value, Decimal)
 
     def increment(self) -> Literal[Decimal]:
-        return DecimalLiteral(self.value + 1)
+        inc_value = 1 / (10 ** abs(self.value.as_tuple().exponent))

Review Comment:
   We don't want to do decimal math like this because we then have to worry about precision. Instead, can you use the to/from unscaled methods?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1014916040


##########
python/pyiceberg/transforms.py:
##########
@@ -173,6 +201,23 @@ def apply(self, value: Optional[S]) -> Optional[int]:
     def result_type(self, source: IcebergType) -> IcebergType:
         return IntegerType()
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        func = self.transform(pred.term.ref().field.field_type)

Review Comment:
   Is there a better name than `func`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1014916582


##########
python/pyiceberg/transforms.py:
##########
@@ -427,6 +486,20 @@ def can_transform(self, source: IcebergType) -> bool:
             TimestamptzType,
         }
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:

Review Comment:
   Is this different than the other `DatesTransform` implementation? I might be missing it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027342736


##########
python/tests/expressions/test_literals.py:
##########
@@ -805,6 +805,24 @@ def test_string_to_decimal_type_invalid_value():
     assert "Could not convert 18.15 into a decimal(10, 0), scales differ 0 <> 2" in str(e.value)
 
 
+def test_decimal_literal_increment():

Review Comment:
   These tests look good to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027300371


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimeLiteral, TimestampLiteral)):

Review Comment:
   Did this together with removing `.increment()` and `.decrement()` from the Literal πŸ‘πŸ» 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1028605806


##########
python/pyiceberg/transforms.py:
##########
@@ -511,6 +590,31 @@ def preserves_order(self) -> bool:
     def source_type(self) -> IcebergType:
         return self._source_type
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        field_type = pred.term.ref().field.field_type
+
+        if isinstance(pred.term, BoundTransform):
+            return _project_transform_predicate(self, name, pred)
+
+        # Implement startswith and notstartswith for string (and probably binary)
+        # https://github.com/apache/iceberg/issues/6112
+
+        if isinstance(pred, BoundUnaryPredicate):
+            return pred.as_unbound(Reference(name))
+
+        if isinstance(field_type, (IntegerType, LongType, DecimalType)):
+            if isinstance(pred, BoundLiteralPredicate):
+                return _truncate_number(name, pred, self.transform(field_type))
+            elif isinstance(pred, BoundIn):
+                return _transform_set(name, pred, self.transform(field_type))

Review Comment:
   Yeah, I just meant for the inner `elif` case, where both call `_set_apply_transform`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1017838594


##########
python/pyiceberg/transforms.py:
##########
@@ -173,6 +201,23 @@ def apply(self, value: Optional[S]) -> Optional[int]:
     def result_type(self, source: IcebergType) -> IcebergType:
         return IntegerType()
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        func = self.transform(pred.term.ref().field.field_type)

Review Comment:
   `transformer`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1014917239


##########
python/tests/expressions/test_expressions.py:
##########
@@ -269,23 +283,23 @@ def test_bind_not_in_equal_term(table_schema_simple: Schema):
 
 
 def test_in_empty():
-    assert In(Reference("foo"), ()) == AlwaysFalse()
+    assert In(Reference("foo"), set()) == AlwaysFalse()

Review Comment:
   I simplified unbound expression creation in a recent PR. That might help here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022171762


##########
python/pyiceberg/transforms.py:
##########
@@ -511,6 +590,31 @@ def preserves_order(self) -> bool:
     def source_type(self) -> IcebergType:
         return self._source_type
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        field_type = pred.term.ref().field.field_type
+
+        if isinstance(pred.term, BoundTransform):
+            return _project_transform_predicate(self, name, pred)
+
+        # Implement startswith and notstartswith for string (and probably binary)
+        # https://github.com/apache/iceberg/issues/6112
+
+        if isinstance(pred, BoundUnaryPredicate):
+            return pred.as_unbound(Reference(name))
+
+        if isinstance(field_type, (IntegerType, LongType, DecimalType)):
+            if isinstance(pred, BoundLiteralPredicate):
+                return _truncate_number(name, pred, self.transform(field_type))
+            elif isinstance(pred, BoundIn):
+                return _transform_set(name, pred, self.transform(field_type))

Review Comment:
   Minor: this case could be moved to the higher level since it is the same for both numeric and array types.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022136368


##########
python/pyiceberg/expressions/literals.py:
##########
@@ -359,6 +384,12 @@ class TimestampLiteral(Literal[int]):
     def __init__(self, value: int):
         super().__init__(value, int)
 
+    def increment(self) -> TimeLiteral:
+        return TimeLiteral(self.value + 1)

Review Comment:
   `TimestampLiteral`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022129008


##########
python/pyiceberg/transforms.py:
##########
@@ -173,6 +206,23 @@ def apply(self, value: Optional[S]) -> Optional[int]:
     def result_type(self, source: IcebergType) -> IcebergType:
         return IntegerType()
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        transformer = self.transform(pred.term.ref().field.field_type)
+
+        if isinstance(pred.term, BoundTransform):
+            return _project_transform_predicate(self, name, pred)
+        elif isinstance(pred, BoundUnaryPredicate):
+            return pred.as_unbound(Reference(name))
+        elif isinstance(pred, BoundEqualTo):
+            return pred.as_unbound(Reference(name), _transform_literal(transformer, pred.literal))
+        elif isinstance(pred, BoundIn):  # NotIn can't be projected
+            return pred.as_unbound(Reference(name), {_transform_literal(transformer, literal) for literal in pred.literals})
+        else:
+            # - Comparison predicates can't be projected, notEq can't be projected
+            # - Small ranges can be projected:
+            #   For example, (x > 0) and (x < 3) can be turned into in({1, 2}) and projected.
+            return None

Review Comment:
   Looks correct to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#issuecomment-1322801754

   I fixed a formatting issue and tests are passing! Overall this looks great with only a couple nits left, so I'll merge it. Thanks for getting this in, @Fokko!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022181916


##########
python/tests/test_transforms.py:
##########
@@ -506,3 +532,345 @@ def test_datetime_transform_str(transform, transform_str):
 )
 def test_datetime_transform_repr(transform, transform_repr):
     assert repr(transform) == transform_repr
+
+
+@pytest.fixture
+def bound_reference_str() -> BoundReference:
+    return BoundReference(field=NestedField(1, "field", StringType(), required=False), accessor=Accessor(position=0, inner=None))
+
+
+@pytest.fixture
+def bound_reference_date() -> BoundReference:
+    return BoundReference(field=NestedField(1, "field", DateType(), required=False), accessor=Accessor(position=0, inner=None))
+
+
+@pytest.fixture
+def bound_reference_timestamp() -> BoundReference:
+    return BoundReference(
+        field=NestedField(1, "field", TimestampType(), required=False), accessor=Accessor(position=0, inner=None)
+    )
+
+
+@pytest.fixture
+def bound_reference_decimal() -> BoundReference:
+    return BoundReference(
+        field=NestedField(1, "field", DecimalType(8, 2), required=False), accessor=Accessor(position=0, inner=None)
+    )
+
+
+@pytest.fixture
+def bound_reference_long() -> BoundReference:
+    return BoundReference(
+        field=NestedField(1, "field", DecimalType(8, 2), required=False), accessor=Accessor(position=0, inner=None)
+    )
+
+
+def test_projection_bucket_unary(bound_reference_str: BoundReference) -> None:
+    assert BucketTransform(2).project("name", BoundNotNull(term=bound_reference_str)) == NotNull(term=Reference(name="name"))
+
+
+def test_projection_bucket_literal(bound_reference_str: BoundReference) -> None:
+    assert BucketTransform(2).project("name", BoundEqualTo(term=bound_reference_str, literal=literal("data"))) == EqualTo(
+        term="name", literal=literal(1)
+    )
+
+
+def test_projection_bucket_set_same_bucket(bound_reference_str: BoundReference) -> None:
+    assert BucketTransform(2).project(
+        "name", BoundIn(term=bound_reference_str, literals={literal("hello"), literal("world")})
+    ) == EqualTo(term="name", literal=literal(1))
+
+
+def test_projection_bucket_set_in(bound_reference_str: BoundReference) -> None:
+    assert BucketTransform(3).project(
+        "name", BoundIn(term=bound_reference_str, literals={literal("hello"), literal("world")})
+    ) == In(term="name", literals={literal(1), literal(2)})
+
+
+def test_projection_bucket_set_not_in(bound_reference_str: BoundReference) -> None:
+    assert (
+        BucketTransform(3).project("name", BoundNotIn(term=bound_reference_str, literals={literal("hello"), literal("world")}))
+        is None
+    )
+
+
+def test_projection_year_unary(bound_reference_date: BoundReference) -> None:
+    assert YearTransform().project("name", BoundNotNull(term=bound_reference_date)) == NotNull(term="name")
+
+
+def test_projection_year_literal(bound_reference_date: BoundReference) -> None:
+    assert YearTransform().project("name", BoundEqualTo(term=bound_reference_date, literal=DateLiteral(1925))) == EqualTo(
+        term="name", literal=literal(5)
+    )
+
+
+def test_projection_year_set_same_year(bound_reference_date: BoundReference) -> None:
+    assert YearTransform().project(
+        "name", BoundIn(term=bound_reference_date, literals={DateLiteral(1925), DateLiteral(1926)})
+    ) == EqualTo(term="name", literal=literal(5))
+
+
+def test_projection_year_set_in(bound_reference_date: BoundReference) -> None:
+    assert YearTransform().project(
+        "name", BoundIn(term=bound_reference_date, literals={DateLiteral(1925), DateLiteral(2925)})
+    ) == In(term="name", literals={literal(8), literal(5)})
+
+
+def test_projection_year_set_not_in(bound_reference_date: BoundReference) -> None:
+    assert (
+        YearTransform().project("name", BoundNotIn(term=bound_reference_date, literals={DateLiteral(1925), DateLiteral(2925)}))
+        is None
+    )
+
+
+def test_projection_month_unary(bound_reference_date: BoundReference) -> None:
+    assert MonthTransform().project("name", BoundNotNull(term=bound_reference_date)) == NotNull(term="name")
+
+
+def test_projection_month_literal(bound_reference_date: BoundReference) -> None:
+    assert MonthTransform().project("name", BoundEqualTo(term=bound_reference_date, literal=DateLiteral(1925))) == EqualTo(
+        term="name", literal=literal(63)
+    )
+
+
+def test_projection_month_set_same_month(bound_reference_date: BoundReference) -> None:
+    assert MonthTransform().project(
+        "name", BoundIn(term=bound_reference_date, literals={DateLiteral(1925), DateLiteral(1926)})
+    ) == EqualTo(term="name", literal=literal(63))
+
+
+def test_projection_month_set_in(bound_reference_date: BoundReference) -> None:
+    assert MonthTransform().project(
+        "name", BoundIn(term=bound_reference_date, literals={DateLiteral(1925), DateLiteral(2925)})
+    ) == In(term="name", literals={literal(96), literal(63)})
+
+
+def test_projection_day_month_not_in(bound_reference_date: BoundReference) -> None:
+    assert (
+        MonthTransform().project("name", BoundNotIn(term=bound_reference_date, literals={DateLiteral(1925), DateLiteral(2925)}))
+        is None
+    )
+
+
+def test_projection_day_unary(bound_reference_timestamp) -> None:
+    assert DayTransform().project("name", BoundNotNull(term=bound_reference_timestamp)) == NotNull(term="name")
+
+
+def test_projection_day_literal(bound_reference_timestamp) -> None:
+    assert DayTransform().project(
+        "name", BoundEqualTo(term=bound_reference_timestamp, literal=TimestampLiteral(1667696874000))
+    ) == EqualTo(term="name", literal=literal(19))
+
+
+def test_projection_day_set_same_day(bound_reference_timestamp) -> None:
+    assert DayTransform().project(
+        "name",
+        BoundIn(term=bound_reference_timestamp, literals={TimestampLiteral(1667696874001), TimestampLiteral(1667696874000)}),
+    ) == EqualTo(term="name", literal=literal(19))
+
+
+def test_projection_day_set_in(bound_reference_timestamp) -> None:
+    assert DayTransform().project(
+        "name",
+        BoundIn(term=bound_reference_timestamp, literals={TimestampLiteral(1667696874001), TimestampLiteral(1567696874000)}),
+    ) == In(term="name", literals={literal(18), literal(19)})
+
+
+def test_projection_day_set_not_in(bound_reference_timestamp) -> None:
+    assert (
+        DayTransform().project(
+            "name",
+            BoundNotIn(term=bound_reference_timestamp, literals={TimestampLiteral(1567696874), TimestampLiteral(1667696874)}),
+        )
+        is None
+    )
+
+
+def test_projection_day_human(bound_reference_date: BoundReference) -> None:
+    date_literal = literal(date(2018, 1, 1))
+    assert DayTransform().project("dt", BoundEqualTo(term=bound_reference_date, literal=date_literal)) == EqualTo(
+        term="dt", literal=literal(17532)
+    )  # == 2018, 1, 1
+
+    assert DayTransform().project("dt", BoundLessThanOrEqual(term=bound_reference_date, literal=date_literal)) == LessThanOrEqual(
+        term="dt", literal=literal(17532)
+    )  # <= 2018, 1, 1
+
+    assert DayTransform().project("dt", BoundLessThan(term=bound_reference_date, literal=date_literal)) == LessThanOrEqual(
+        term="dt", literal=literal(17531)
+    )  # <= 2017, 12, 31
+
+    assert DayTransform().project(
+        "dt", BoundGreaterThanOrEqual(term=bound_reference_date, literal=date_literal)
+    ) == GreaterThanOrEqual(
+        term="dt", literal=literal(17532)
+    )  # >= 2018, 1, 1
+
+    assert DayTransform().project("dt", BoundGreaterThan(term=bound_reference_date, literal=date_literal)) == GreaterThanOrEqual(
+        term="dt", literal=literal(17533)
+    )  # >= 2018, 1, 2
+
+
+def test_projection_hour_unary(bound_reference_timestamp) -> None:
+    assert HourTransform().project("name", BoundNotNull(term=bound_reference_timestamp)) == NotNull(term="name")
+
+
+def test_projection_hour_literal(bound_reference_timestamp) -> None:
+    assert HourTransform().project(
+        "name", BoundEqualTo(term=bound_reference_timestamp, literal=TimestampLiteral(1667696874000))

Review Comment:
   Minor: these tests use millisecond precision timestamps (Sun Nov 06 2022 01:07:54) but the actual representation is in microseconds. That's why the resulting hours ordinal is small. The actual date/time is Tue Jan 20 1970 07:14:56. It would be better to always use micros even when supplying constants so that future readers aren't confused.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022125991


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimeLiteral, TimestampLiteral)):
+        raise ValueError(f"Expected a numeric literal, got: {type(boundary)}")
+
+    if isinstance(pred, BoundLessThan):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary.decrement()))
+    elif isinstance(pred, BoundLessThanOrEqual):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundGreaterThan):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary.increment()))
+    elif isinstance(pred, BoundGreaterThanOrEqual):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(transform, boundary))
+    else:
+        return None
+
+
+def _truncate_array(
+    name: str, pred: BoundLiteralPredicate, func: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if type(pred) in {BoundLessThan, BoundLessThanOrEqual}:
+        return LessThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    elif type(pred) in {BoundGreaterThan, BoundGreaterThanOrEqual}:
+        return GreaterThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    if isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(func, boundary))
+    else:
+        return None
+
+
+def _project_transform_predicate(transform: Transform, partition_name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+    term = pred.term
+    if isinstance(term, BoundTransform) and transform == term.transform:
+        return _remove_transform(partition_name, pred)
+    return None
+
+
+def _remove_transform(partition_name: str, pred: BoundPredicate):
+    if isinstance(pred, BoundUnaryPredicate):
+        return pred.as_unbound(Reference(partition_name))
+    elif isinstance(pred, BoundLiteralPredicate):
+        return pred.as_unbound(Reference(partition_name), pred.literal)
+    elif isinstance(pred, (BoundIn, BoundNotIn)):
+        return pred.as_unbound(Reference(partition_name), pred.literals)
+    else:
+        raise ValueError(f"Cannot replace transform in unknown predicate: {pred}")
+
+
+def _transform_set(name: str, pred: BoundSetPredicate, func: Callable[[Optional[M]], Optional[M]]) -> UnboundPredicate:
+    literals = pred.literals
+    if isinstance(pred, BoundIn):
+        return In(Reference(name), {_transform_literal(func, literal) for literal in literals})

Review Comment:
   Should this be `pred.as_unbound`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027159853


##########
python/pyiceberg/transforms.py:
##########
@@ -173,6 +206,23 @@ def apply(self, value: Optional[S]) -> Optional[int]:
     def result_type(self, source: IcebergType) -> IcebergType:
         return IntegerType()
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:

Review Comment:
   Updated it to:
   ```python
   def project(self, name: str, pred: BoundPredicate[L])
   ```
   since it is an abstract method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1017812043


##########
python/pyiceberg/expressions/literals.py:
##########
@@ -213,6 +217,26 @@ class LongLiteral(Literal[int]):
     def __init__(self, value: int):
         super().__init__(value, int)
 
+    def __add__(self, other: Any) -> LongLiteral:

Review Comment:
   Good call on the mutability. Python doesn't have a `++` or `--`, this is why I liked overriding `__add__` and `__sub__`. Changed this into `.increment()` and `.decrement()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1014915923


##########
python/pyiceberg/expressions/literals.py:
##########
@@ -213,6 +217,26 @@ class LongLiteral(Literal[int]):
     def __init__(self, value: int):
         super().__init__(value, int)
 
+    def __add__(self, other: Any) -> LongLiteral:

Review Comment:
   What about `increment` and `decrement` methods since we only need to add or subtract 1?
   
   I think I like the idea of implementing this on the literal. The main issue that I have with this is that it modifies the value of the literal. I think introducing mutability is going to cause problems because this will modify the original expression when we project, which should not happen. (That may also be why expressions are no longer immutable?)
   
   I think this should just return a new literal rather than `self`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#issuecomment-1304935137

   This is looking great. There are just two issues:
   1. The date/time transforms should use the same logic as truncate for numbers since they're basically truncating
   2. I don't think that the changes to expressions are necessary
   
   Thanks, @Fokko!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1017839219


##########
python/pyiceberg/transforms.py:
##########
@@ -237,7 +282,7 @@ class TimeResolution(IntEnum):
     SECOND = 0
 
 
-class TimeTransform(Transform[S, int], Singleton):
+class DatesTransform(Transform[S, int], Singleton):

Review Comment:
   I changed this one to match with the Java implementation, but not really required.



##########
python/pyiceberg/transforms.py:
##########
@@ -237,7 +282,7 @@ class TimeResolution(IntEnum):
     SECOND = 0
 
 
-class TimeTransform(Transform[S, int], Singleton):
+class DatesTransform(Transform[S, int], Singleton):

Review Comment:
   I changed this one to match with the Java implementation, but the change is not required



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027300467


##########
python/pyiceberg/transforms.py:
##########
@@ -511,6 +590,31 @@ def preserves_order(self) -> bool:
     def source_type(self) -> IcebergType:
         return self._source_type
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        field_type = pred.term.ref().field.field_type
+
+        if isinstance(pred.term, BoundTransform):
+            return _project_transform_predicate(self, name, pred)
+
+        # Implement startswith and notstartswith for string (and probably binary)
+        # https://github.com/apache/iceberg/issues/6112
+
+        if isinstance(pred, BoundUnaryPredicate):

Review Comment:
   I like `elif`, updated πŸ‘πŸ» 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027327473


##########
python/pyiceberg/transforms.py:
##########
@@ -511,6 +590,31 @@ def preserves_order(self) -> bool:
     def source_type(self) -> IcebergType:
         return self._source_type
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        field_type = pred.term.ref().field.field_type
+
+        if isinstance(pred.term, BoundTransform):
+            return _project_transform_predicate(self, name, pred)
+
+        # Implement startswith and notstartswith for string (and probably binary)
+        # https://github.com/apache/iceberg/issues/6112
+
+        if isinstance(pred, BoundUnaryPredicate):
+            return pred.as_unbound(Reference(name))
+
+        if isinstance(field_type, (IntegerType, LongType, DecimalType)):
+            if isinstance(pred, BoundLiteralPredicate):
+                return _truncate_number(name, pred, self.transform(field_type))
+            elif isinstance(pred, BoundIn):
+                return _transform_set(name, pred, self.transform(field_type))

Review Comment:
   Actually, `_truncate_number` and `_truncate_array` is different



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#issuecomment-1314580426

   I think the only blockers are in the literals, where `TimeLiteral` should not implement the new methods and is also copy-pasted incorrectly, and there's an issue with decimal increment/decrement to fix (it should increment the underlying unscaled value). Otherwise, this looks pretty good. There are a few minor things as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022131508


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimeLiteral, TimestampLiteral)):
+        raise ValueError(f"Expected a numeric literal, got: {type(boundary)}")
+
+    if isinstance(pred, BoundLessThan):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary.decrement()))
+    elif isinstance(pred, BoundLessThanOrEqual):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundGreaterThan):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary.increment()))
+    elif isinstance(pred, BoundGreaterThanOrEqual):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(transform, boundary))
+    else:
+        return None
+
+
+def _truncate_array(
+    name: str, pred: BoundLiteralPredicate, func: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if type(pred) in {BoundLessThan, BoundLessThanOrEqual}:
+        return LessThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    elif type(pred) in {BoundGreaterThan, BoundGreaterThanOrEqual}:
+        return GreaterThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    if isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(func, boundary))
+    else:
+        return None
+
+
+def _project_transform_predicate(transform: Transform, partition_name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+    term = pred.term
+    if isinstance(term, BoundTransform) and transform == term.transform:
+        return _remove_transform(partition_name, pred)
+    return None
+
+
+def _remove_transform(partition_name: str, pred: BoundPredicate):
+    if isinstance(pred, BoundUnaryPredicate):
+        return pred.as_unbound(Reference(partition_name))
+    elif isinstance(pred, BoundLiteralPredicate):
+        return pred.as_unbound(Reference(partition_name), pred.literal)
+    elif isinstance(pred, (BoundIn, BoundNotIn)):
+        return pred.as_unbound(Reference(partition_name), pred.literals)
+    else:
+        raise ValueError(f"Cannot replace transform in unknown predicate: {pred}")
+
+
+def _transform_set(name: str, pred: BoundSetPredicate, func: Callable[[Optional[M]], Optional[M]]) -> UnboundPredicate:

Review Comment:
   I think the difference is that this is only called to truncate? Maybe it should be `_truncate_set` then?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022136111


##########
python/pyiceberg/expressions/literals.py:
##########
@@ -333,6 +346,12 @@ class DateLiteral(Literal[int]):
     def __init__(self, value: int):
         super().__init__(value, int)
 
+    def increment(self) -> TimeLiteral:

Review Comment:
   I think this should be `DateLiteral`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022141327


##########
python/pyiceberg/transforms.py:
##########
@@ -463,6 +530,18 @@ def can_transform(self, source: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:

Review Comment:
   Looks correct.



##########
python/pyiceberg/transforms.py:
##########
@@ -249,6 +294,20 @@ def satisfies_order_of(self, other: Transform) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return IntegerType()
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:

Review Comment:
   Looks correct now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1014917102


##########
python/pyiceberg/transforms.py:
##########
@@ -249,6 +294,20 @@ def satisfies_order_of(self, other: Transform) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return IntegerType()
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:

Review Comment:
   I think the date/time transforms need to use the same truncation logic as the `truncate` function. That's because the operation may need to change. For example: `ts < '2022-11-06T15:56'` with a `day` transform should be `ts_day <= '2022-11-06'`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1018100513


##########
python/pyiceberg/transforms.py:
##########
@@ -249,6 +294,20 @@ def satisfies_order_of(self, other: Transform) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return IntegerType()
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:

Review Comment:
   Nice, thanks! That was one of the areas I was unsure of. Since we can simplify this quite a lot on the Python side, it was a bit tricky to translate. I also added tests:
   ```python
   def test_projection_day_human(bound_reference_date: BoundReference) -> None:
       date_literal = literal(date(2018, 1, 1))
       assert DayTransform().project("dt", BoundEqualTo(term=bound_reference_date, literal=date_literal)) == EqualTo(
           term="dt", literal=literal(17532)
       )  # == 2018, 1, 1
   
       assert DayTransform().project("dt", BoundLessThanOrEqual(term=bound_reference_date, literal=date_literal)) == LessThanOrEqual(
           term="dt", literal=literal(17532)
       )  # <= 2018, 1, 1
   
       assert DayTransform().project("dt", BoundLessThan(term=bound_reference_date, literal=date_literal)) == LessThanOrEqual(
           term="dt", literal=literal(17531)
       )  # <= 2017, 12, 31
   
       assert DayTransform().project(
           "dt", BoundGreaterThanOrEqual(term=bound_reference_date, literal=date_literal)
       ) == GreaterThanOrEqual(
           term="dt", literal=literal(17532)
       )  # >= 2018, 1, 1
   
       assert DayTransform().project("dt", BoundGreaterThan(term=bound_reference_date, literal=date_literal)) == GreaterThanOrEqual(
           term="dt", literal=literal(17533)
       )  # >= 2018, 1, 2
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027349254


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]

Review Comment:
   That's correct! We won't be able to forget this after https://github.com/apache/iceberg/pull/6232



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027298830


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimeLiteral, TimestampLiteral)):
+        raise ValueError(f"Expected a numeric literal, got: {type(boundary)}")
+
+    if isinstance(pred, BoundLessThan):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary.decrement()))
+    elif isinstance(pred, BoundLessThanOrEqual):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundGreaterThan):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary.increment()))
+    elif isinstance(pred, BoundGreaterThanOrEqual):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(transform, boundary))
+    else:
+        return None
+
+
+def _truncate_array(
+    name: str, pred: BoundLiteralPredicate, func: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if type(pred) in {BoundLessThan, BoundLessThanOrEqual}:
+        return LessThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    elif type(pred) in {BoundGreaterThan, BoundGreaterThanOrEqual}:
+        return GreaterThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    if isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(func, boundary))
+    else:
+        return None
+
+
+def _project_transform_predicate(transform: Transform, partition_name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+    term = pred.term
+    if isinstance(term, BoundTransform) and transform == term.transform:
+        return _remove_transform(partition_name, pred)
+    return None
+
+
+def _remove_transform(partition_name: str, pred: BoundPredicate):
+    if isinstance(pred, BoundUnaryPredicate):
+        return pred.as_unbound(Reference(partition_name))
+    elif isinstance(pred, BoundLiteralPredicate):
+        return pred.as_unbound(Reference(partition_name), pred.literal)
+    elif isinstance(pred, (BoundIn, BoundNotIn)):
+        return pred.as_unbound(Reference(partition_name), pred.literals)
+    else:
+        raise ValueError(f"Cannot replace transform in unknown predicate: {pred}")
+
+
+def _transform_set(name: str, pred: BoundSetPredicate, func: Callable[[Optional[M]], Optional[M]]) -> UnboundPredicate:
+    literals = pred.literals
+    if isinstance(pred, BoundIn):
+        return In(Reference(name), {_transform_literal(func, literal) for literal in literals})
+    elif isinstance(pred, BoundIn):
+        return NotIn(Reference(name), {_transform_literal(func, literal) for literal in literals})
+    else:
+        raise ValueError(f"Unknown BoundSetPredicate: {pred}")
+
+
+class BoundTransform(BoundTerm[T]):

Review Comment:
   Yes, but ran into a circular dependency. Expressions would then depend on transform which depends on expressions 😭 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027300222


##########
python/pyiceberg/transforms.py:
##########
@@ -246,9 +296,26 @@ def granularity(self) -> TimeResolution:
     def satisfies_order_of(self, other: Transform) -> bool:
         return self.granularity <= other.granularity if hasattr(other, "granularity") else False
 
-    def result_type(self, source: IcebergType) -> IcebergType:
+    def result_type(self, source: IcebergType) -> IntegerType:
         return IntegerType()
 
+    @abstractmethod
+    def transform(self, source: IcebergType) -> Callable[[Optional[Any]], Optional[int]]:
+        ...
+
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        transformer = self.transform(pred.term.ref().field.field_type)
+        if isinstance(pred.term, BoundTransform):
+            return _project_transform_predicate(self, name, pred)
+        elif isinstance(pred, BoundUnaryPredicate):
+            return pred.as_unbound(Reference(name))

Review Comment:
   Good one! I've split this out in a separate ticket: https://github.com/apache/iceberg/issues/6230



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#issuecomment-1304935954

   > Removes the dataclasses from the expressions
   
   I hit the same issue in #6127 and fixed it a different way. We should talk about how to do this separately, but I think my update might be a reasonable fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022137280


##########
python/pyiceberg/expressions/literals.py:
##########
@@ -346,6 +365,12 @@ class TimeLiteral(Literal[int]):
     def __init__(self, value: int):
         super().__init__(value, int)
 
+    def increment(self) -> TimeLiteral:
+        return TimeLiteral(self.value + 1)

Review Comment:
   I don't think that we allow time literals in partition expressions, do we? That seems weird because time literals are bounded between 0 and at the number of microseconds in a day. I'd probably exclude this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022172621


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimeLiteral, TimestampLiteral)):
+        raise ValueError(f"Expected a numeric literal, got: {type(boundary)}")
+
+    if isinstance(pred, BoundLessThan):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary.decrement()))
+    elif isinstance(pred, BoundLessThanOrEqual):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundGreaterThan):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary.increment()))
+    elif isinstance(pred, BoundGreaterThanOrEqual):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(transform, boundary))
+    else:
+        return None
+
+
+def _truncate_array(
+    name: str, pred: BoundLiteralPredicate, func: Callable[[Optional[M]], Optional[M]]

Review Comment:
   `func` -> `transform`? That would match `_truncate_number`.



##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimeLiteral, TimestampLiteral)):
+        raise ValueError(f"Expected a numeric literal, got: {type(boundary)}")
+
+    if isinstance(pred, BoundLessThan):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary.decrement()))
+    elif isinstance(pred, BoundLessThanOrEqual):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundGreaterThan):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary.increment()))
+    elif isinstance(pred, BoundGreaterThanOrEqual):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(transform, boundary))
+    else:
+        return None
+
+
+def _truncate_array(
+    name: str, pred: BoundLiteralPredicate, func: Callable[[Optional[M]], Optional[M]]

Review Comment:
   Nit: `func` -> `transform`? That would match `_truncate_number`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022170425


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimeLiteral, TimestampLiteral)):
+        raise ValueError(f"Expected a numeric literal, got: {type(boundary)}")
+
+    if isinstance(pred, BoundLessThan):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary.decrement()))
+    elif isinstance(pred, BoundLessThanOrEqual):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundGreaterThan):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary.increment()))
+    elif isinstance(pred, BoundGreaterThanOrEqual):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(transform, boundary))
+    else:
+        return None
+
+
+def _truncate_array(
+    name: str, pred: BoundLiteralPredicate, func: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if type(pred) in {BoundLessThan, BoundLessThanOrEqual}:
+        return LessThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    elif type(pred) in {BoundGreaterThan, BoundGreaterThanOrEqual}:
+        return GreaterThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    if isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(func, boundary))
+    else:
+        return None
+
+
+def _project_transform_predicate(transform: Transform, partition_name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+    term = pred.term
+    if isinstance(term, BoundTransform) and transform == term.transform:
+        return _remove_transform(partition_name, pred)
+    return None
+
+
+def _remove_transform(partition_name: str, pred: BoundPredicate):
+    if isinstance(pred, BoundUnaryPredicate):
+        return pred.as_unbound(Reference(partition_name))
+    elif isinstance(pred, BoundLiteralPredicate):
+        return pred.as_unbound(Reference(partition_name), pred.literal)
+    elif isinstance(pred, (BoundIn, BoundNotIn)):
+        return pred.as_unbound(Reference(partition_name), pred.literals)
+    else:
+        raise ValueError(f"Cannot replace transform in unknown predicate: {pred}")
+
+
+def _transform_set(name: str, pred: BoundSetPredicate, func: Callable[[Optional[M]], Optional[M]]) -> UnboundPredicate:

Review Comment:
   Ah, I think this was copied from Java. Still, a better name would be good.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022127646


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimeLiteral, TimestampLiteral)):
+        raise ValueError(f"Expected a numeric literal, got: {type(boundary)}")
+
+    if isinstance(pred, BoundLessThan):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary.decrement()))
+    elif isinstance(pred, BoundLessThanOrEqual):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundGreaterThan):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary.increment()))
+    elif isinstance(pred, BoundGreaterThanOrEqual):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(transform, boundary))
+    else:
+        return None
+
+
+def _truncate_array(
+    name: str, pred: BoundLiteralPredicate, func: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if type(pred) in {BoundLessThan, BoundLessThanOrEqual}:
+        return LessThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    elif type(pred) in {BoundGreaterThan, BoundGreaterThanOrEqual}:
+        return GreaterThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    if isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(func, boundary))
+    else:
+        return None
+
+
+def _project_transform_predicate(transform: Transform, partition_name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+    term = pred.term
+    if isinstance(term, BoundTransform) and transform == term.transform:
+        return _remove_transform(partition_name, pred)
+    return None
+
+
+def _remove_transform(partition_name: str, pred: BoundPredicate):
+    if isinstance(pred, BoundUnaryPredicate):
+        return pred.as_unbound(Reference(partition_name))
+    elif isinstance(pred, BoundLiteralPredicate):
+        return pred.as_unbound(Reference(partition_name), pred.literal)
+    elif isinstance(pred, (BoundIn, BoundNotIn)):
+        return pred.as_unbound(Reference(partition_name), pred.literals)
+    else:
+        raise ValueError(f"Cannot replace transform in unknown predicate: {pred}")
+
+
+def _transform_set(name: str, pred: BoundSetPredicate, func: Callable[[Optional[M]], Optional[M]]) -> UnboundPredicate:

Review Comment:
   Is the Callable type correct? That doesn't seem to match for bucket functions, where it would be `Optional[Any] -> Optional[int]`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027299006


##########
python/pyiceberg/expressions/literals.py:
##########
@@ -491,3 +531,8 @@ def _(self, type_var: FixedType) -> Optional[Literal[bytes]]:
             return FixedLiteral(self.value)
         else:
             return None
+
+
+def _transform_literal(func: Callable[[T], T], lit: Literal[T]) -> Literal[T]:
+    """Small helper to unwrap the value from the literal, and wrap it again"""
+    return literal(func(lit.value))

Review Comment:
   Good one, done! πŸ‘πŸ» 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027350494


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]

Review Comment:
   We can also re-use `L` here and drop `M`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027342383


##########
python/pyiceberg/expressions/literals.py:
##########
@@ -381,10 +381,12 @@ def __init__(self, value: Decimal):
         super().__init__(value, Decimal)
 
     def increment(self) -> Literal[Decimal]:
-        return DecimalLiteral(self.value + 1)
+        inc_value = 1 / (10 ** abs(self.value.as_tuple().exponent))

Review Comment:
   It may also make sense to have this in `util/decimal.py` instead of here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022122972


##########
python/tests/test_transforms.py:
##########
@@ -506,3 +532,345 @@ def test_datetime_transform_str(transform, transform_str):
 )
 def test_datetime_transform_repr(transform, transform_repr):
     assert repr(transform) == transform_repr
+
+
+@pytest.fixture
+def bound_reference_str() -> BoundReference:
+    return BoundReference(field=NestedField(1, "field", StringType(), required=False), accessor=Accessor(position=0, inner=None))
+
+
+@pytest.fixture
+def bound_reference_date() -> BoundReference:
+    return BoundReference(field=NestedField(1, "field", DateType(), required=False), accessor=Accessor(position=0, inner=None))
+
+
+@pytest.fixture
+def bound_reference_timestamp() -> BoundReference:
+    return BoundReference(
+        field=NestedField(1, "field", TimestampType(), required=False), accessor=Accessor(position=0, inner=None)
+    )
+
+
+@pytest.fixture
+def bound_reference_decimal() -> BoundReference:
+    return BoundReference(
+        field=NestedField(1, "field", DecimalType(8, 2), required=False), accessor=Accessor(position=0, inner=None)
+    )
+
+
+@pytest.fixture
+def bound_reference_long() -> BoundReference:
+    return BoundReference(
+        field=NestedField(1, "field", DecimalType(8, 2), required=False), accessor=Accessor(position=0, inner=None)
+    )
+
+
+def test_projection_bucket_unary(bound_reference_str: BoundReference) -> None:
+    assert BucketTransform(2).project("name", BoundNotNull(term=bound_reference_str)) == NotNull(term=Reference(name="name"))
+
+
+def test_projection_bucket_literal(bound_reference_str: BoundReference) -> None:

Review Comment:
   I think we need tests for every combination of Transform and Predicate. For example, I'd like to see a parameterized test for all of the inequalities that return `None`, to make sure that if the implementation changes, tests are still there to validate it. An important one that is missing is also `NotEqualTo`, which can't be projected through Bucket.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022140064


##########
python/pyiceberg/transforms.py:
##########
@@ -246,9 +296,26 @@ def granularity(self) -> TimeResolution:
     def satisfies_order_of(self, other: Transform) -> bool:
         return self.granularity <= other.granularity if hasattr(other, "granularity") else False
 
-    def result_type(self, source: IcebergType) -> IcebergType:
+    def result_type(self, source: IcebergType) -> IntegerType:
         return IntegerType()
 
+    @abstractmethod
+    def transform(self, source: IcebergType) -> Callable[[Optional[Any]], Optional[int]]:
+        ...
+
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        transformer = self.transform(pred.term.ref().field.field_type)
+        if isinstance(pred.term, BoundTransform):
+            return _project_transform_predicate(self, name, pred)
+        elif isinstance(pred, BoundUnaryPredicate):
+            return pred.as_unbound(Reference(name))

Review Comment:
   We should follow up to validate `NaN` handling in several places. I just checked and we allow NaN values into literals, which we should not do. Java disallows it here: https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/expressions/Literals.java#L61



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022127997


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimeLiteral, TimestampLiteral)):
+        raise ValueError(f"Expected a numeric literal, got: {type(boundary)}")
+
+    if isinstance(pred, BoundLessThan):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary.decrement()))
+    elif isinstance(pred, BoundLessThanOrEqual):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundGreaterThan):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary.increment()))
+    elif isinstance(pred, BoundGreaterThanOrEqual):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(transform, boundary))
+    else:
+        return None
+
+
+def _truncate_array(
+    name: str, pred: BoundLiteralPredicate, func: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if type(pred) in {BoundLessThan, BoundLessThanOrEqual}:
+        return LessThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    elif type(pred) in {BoundGreaterThan, BoundGreaterThanOrEqual}:
+        return GreaterThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    if isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(func, boundary))
+    else:
+        return None
+
+
+def _project_transform_predicate(transform: Transform, partition_name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+    term = pred.term
+    if isinstance(term, BoundTransform) and transform == term.transform:
+        return _remove_transform(partition_name, pred)
+    return None
+
+
+def _remove_transform(partition_name: str, pred: BoundPredicate):
+    if isinstance(pred, BoundUnaryPredicate):
+        return pred.as_unbound(Reference(partition_name))
+    elif isinstance(pred, BoundLiteralPredicate):
+        return pred.as_unbound(Reference(partition_name), pred.literal)
+    elif isinstance(pred, (BoundIn, BoundNotIn)):
+        return pred.as_unbound(Reference(partition_name), pred.literals)
+    else:
+        raise ValueError(f"Cannot replace transform in unknown predicate: {pred}")
+
+
+def _transform_set(name: str, pred: BoundSetPredicate, func: Callable[[Optional[M]], Optional[M]]) -> UnboundPredicate:
+    literals = pred.literals
+    if isinstance(pred, BoundIn):
+        return In(Reference(name), {_transform_literal(func, literal) for literal in literals})
+    elif isinstance(pred, BoundIn):
+        return NotIn(Reference(name), {_transform_literal(func, literal) for literal in literals})
+    else:
+        raise ValueError(f"Unknown BoundSetPredicate: {pred}")
+
+
+class BoundTransform(BoundTerm[T]):

Review Comment:
   Shouldn't this be in expressions rather than here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022135291


##########
python/pyiceberg/expressions/literals.py:
##########
@@ -491,3 +531,8 @@ def _(self, type_var: FixedType) -> Optional[Literal[bytes]]:
             return FixedLiteral(self.value)
         else:
             return None
+
+
+def _transform_literal(func: Callable[[T], T], lit: Literal[T]) -> Literal[T]:
+    """Small helper to unwrap the value from the literal, and wrap it again"""
+    return literal(func(lit.value))

Review Comment:
   I'd probably move this over to transforms since it is only used there.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027301733


##########
python/pyiceberg/transforms.py:
##########
@@ -511,6 +590,31 @@ def preserves_order(self) -> bool:
     def source_type(self) -> IcebergType:
         return self._source_type
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        field_type = pred.term.ref().field.field_type
+
+        if isinstance(pred.term, BoundTransform):
+            return _project_transform_predicate(self, name, pred)
+
+        # Implement startswith and notstartswith for string (and probably binary)
+        # https://github.com/apache/iceberg/issues/6112
+
+        if isinstance(pred, BoundUnaryPredicate):
+            return pred.as_unbound(Reference(name))
+
+        if isinstance(field_type, (IntegerType, LongType, DecimalType)):
+            if isinstance(pred, BoundLiteralPredicate):
+                return _truncate_number(name, pred, self.transform(field_type))
+            elif isinstance(pred, BoundIn):
+                return _transform_set(name, pred, self.transform(field_type))

Review Comment:
   I like it πŸ‘πŸ» 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027296868


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimeLiteral, TimestampLiteral)):
+        raise ValueError(f"Expected a numeric literal, got: {type(boundary)}")
+
+    if isinstance(pred, BoundLessThan):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary.decrement()))
+    elif isinstance(pred, BoundLessThanOrEqual):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundGreaterThan):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary.increment()))
+    elif isinstance(pred, BoundGreaterThanOrEqual):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(transform, boundary))
+    else:
+        return None
+
+
+def _truncate_array(
+    name: str, pred: BoundLiteralPredicate, func: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if type(pred) in {BoundLessThan, BoundLessThanOrEqual}:
+        return LessThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    elif type(pred) in {BoundGreaterThan, BoundGreaterThanOrEqual}:
+        return GreaterThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    if isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(func, boundary))
+    else:
+        return None
+
+
+def _project_transform_predicate(transform: Transform, partition_name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+    term = pred.term
+    if isinstance(term, BoundTransform) and transform == term.transform:
+        return _remove_transform(partition_name, pred)
+    return None
+
+
+def _remove_transform(partition_name: str, pred: BoundPredicate):
+    if isinstance(pred, BoundUnaryPredicate):
+        return pred.as_unbound(Reference(partition_name))
+    elif isinstance(pred, BoundLiteralPredicate):
+        return pred.as_unbound(Reference(partition_name), pred.literal)
+    elif isinstance(pred, (BoundIn, BoundNotIn)):
+        return pred.as_unbound(Reference(partition_name), pred.literals)
+    else:
+        raise ValueError(f"Cannot replace transform in unknown predicate: {pred}")
+
+
+def _transform_set(name: str, pred: BoundSetPredicate, func: Callable[[Optional[M]], Optional[M]]) -> UnboundPredicate:

Review Comment:
   It is also used for the TimeTransform. Changed it into `_apply_transform_set`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027353231


##########
python/tests/test_transforms.py:
##########
@@ -874,7 +874,7 @@ def test_projection_truncate_long_literal_eq(bound_reference_long: BoundReferenc
 def test_projection_truncate_long_literal_gt(bound_reference_long: BoundReference) -> None:
     assert TruncateTransform(2).project(
         "name", BoundGreaterThan(term=bound_reference_long, literal=DecimalLiteral(Decimal(19.25)))
-    ) == GreaterThanOrEqual(term="name", literal=Decimal("20.24"))
+    ) == GreaterThanOrEqual(term="name", literal=Decimal("19.26"))

Review Comment:
   I'm sorry, I don't understand the issue. It looks equivalent to the Java implementation: https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/transforms/ProjectionUtil.java#L141-L145



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #6128:
URL: https://github.com/apache/iceberg/pull/6128


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1015064168


##########
python/pyiceberg/expressions/__init__.py:
##########
@@ -68,7 +72,6 @@ def eval(self, struct: StructProtocol) -> T:  # pylint: disable=W0613
         """Returns the value at the referenced field's position in an object that abides by the StructProtocol"""
 
 
-@dataclass(frozen=True)

Review Comment:
   The rationale is in the PR description. Mostly because my IDE lights up as a Christmas tree, and it is a bit early. It seems that the dataclasses + IDE + mypy don't really know what to do with the inheritance of the different classes.
   
   ![image](https://user-images.githubusercontent.com/1134248/200243609-26f1a3c1-378b-49a7-a1fd-94f747cc16c2.png)
   
   A very slimmed-down example is given in the post around `Reference`:
   
   ![image](https://user-images.githubusercontent.com/1134248/200150958-3634a40c-a530-424c-956d-94b843bb5cb9.png)
   
   PyCharm is unable to figure out the argument:
   
   ![image](https://user-images.githubusercontent.com/1134248/200245808-a5db142c-98dc-4e8b-9f66-94296f8bcde7.png)
   
   It turns out that you shouldn't use positional arguments with data classes: https://medium.com/@aniscampos/python-dataclass-inheritance-finally-686eaf60fbb5



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1014915514


##########
python/pyiceberg/expressions/__init__.py:
##########
@@ -68,7 +72,6 @@ def eval(self, struct: StructProtocol) -> T:  # pylint: disable=W0613
         """Returns the value at the referenced field's position in an object that abides by the StructProtocol"""
 
 
-@dataclass(frozen=True)

Review Comment:
   Why are the changes to these classes necessary? I'd prefer to keep these as frozen data classes if possible.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022188092


##########
python/pyiceberg/expressions/literals.py:
##########
@@ -376,6 +407,12 @@ class DecimalLiteral(Literal[Decimal]):
     def __init__(self, value: Decimal):
         super().__init__(value, Decimal)
 
+    def increment(self) -> DecimalLiteral:
+        return DecimalLiteral(self.value + 1)

Review Comment:
   We actually want to increment by the smallest amount possible, since what is actually happening is adjusting the boundary to closed rather than actually incrementing. That is, `literal("12.34").to(DecimalType(9, 2).increment())` should result in `12.35`, not `13.35`.
   
   We do this in Java by converting to the unscaled value, incrementing, and then converting back. I think we should do the same thing here:
   
   ```
   def increment_unscaled(d: Decimal):
       (_, _, original_scale) = d.to_tuple()
       unscaled = decimal_to_unscaled(self.value) + 1
       return unscaled_to_decimal(unscaled, original_scale)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022174272


##########
python/tests/test_transforms.py:
##########
@@ -506,3 +532,345 @@ def test_datetime_transform_str(transform, transform_str):
 )
 def test_datetime_transform_repr(transform, transform_repr):
     assert repr(transform) == transform_repr
+
+
+@pytest.fixture
+def bound_reference_str() -> BoundReference:
+    return BoundReference(field=NestedField(1, "field", StringType(), required=False), accessor=Accessor(position=0, inner=None))
+
+
+@pytest.fixture
+def bound_reference_date() -> BoundReference:
+    return BoundReference(field=NestedField(1, "field", DateType(), required=False), accessor=Accessor(position=0, inner=None))
+
+
+@pytest.fixture
+def bound_reference_timestamp() -> BoundReference:
+    return BoundReference(
+        field=NestedField(1, "field", TimestampType(), required=False), accessor=Accessor(position=0, inner=None)
+    )
+
+
+@pytest.fixture
+def bound_reference_decimal() -> BoundReference:
+    return BoundReference(
+        field=NestedField(1, "field", DecimalType(8, 2), required=False), accessor=Accessor(position=0, inner=None)
+    )
+
+
+@pytest.fixture
+def bound_reference_long() -> BoundReference:
+    return BoundReference(
+        field=NestedField(1, "field", DecimalType(8, 2), required=False), accessor=Accessor(position=0, inner=None)
+    )
+
+
+def test_projection_bucket_unary(bound_reference_str: BoundReference) -> None:
+    assert BucketTransform(2).project("name", BoundNotNull(term=bound_reference_str)) == NotNull(term=Reference(name="name"))
+
+
+def test_projection_bucket_literal(bound_reference_str: BoundReference) -> None:

Review Comment:
   It would be fine to add these when the projection visitor is added, so that the thorough tests cover other boolean expressions (and/or/not) as well as the handling when a transform projection returns `None`.
   
   I think that's also how we test the Java implementation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022169422


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimeLiteral, TimestampLiteral)):

Review Comment:
   I think we should remove `TimeLiteral` from here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022114942


##########
python/pyiceberg/transforms.py:
##########
@@ -173,6 +206,23 @@ def apply(self, value: Optional[S]) -> Optional[int]:
     def result_type(self, source: IcebergType) -> IcebergType:
         return IntegerType()
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:

Review Comment:
   `BoundPredicate` should have a type (even if it is `Any`), right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027342632


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]

Review Comment:
   Should this be `BoundLiteralPredicate[M]`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1028599283


##########
python/pyiceberg/expressions/__init__.py:
##########
@@ -601,36 +640,65 @@ def __eq__(self, other):
     def __repr__(self) -> str:
         return f"{str(self.__class__.__name__)}(term={repr(self.term)}, literal={repr(self.literal)})"
 
+    @property
+    @abstractmethod
+    def as_unbound(self) -> Type[LiteralPredicate[L]]:
+        ...
+
 
 class BoundEqualTo(BoundLiteralPredicate[L]):
     def __invert__(self) -> BoundNotEqualTo[L]:
         return BoundNotEqualTo[L](self.term, self.literal)
 
+    @property
+    def as_unbound(self) -> Type[EqualTo[L]]:
+        return EqualTo
+
 
 class BoundNotEqualTo(BoundLiteralPredicate[L]):
     def __invert__(self) -> BoundEqualTo[L]:
         return BoundEqualTo[L](self.term, self.literal)
 
+    @property
+    def as_unbound(self) -> Type[NotEqualTo[L]]:
+        return NotEqualTo

Review Comment:
   Does this need the type param, `NotEqualTo[L]`? It looks like `GreaterThan` has it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027341882


##########
python/tests/test_transforms.py:
##########
@@ -874,7 +874,7 @@ def test_projection_truncate_long_literal_eq(bound_reference_long: BoundReferenc
 def test_projection_truncate_long_literal_gt(bound_reference_long: BoundReference) -> None:
     assert TruncateTransform(2).project(
         "name", BoundGreaterThan(term=bound_reference_long, literal=DecimalLiteral(Decimal(19.25)))
-    ) == GreaterThanOrEqual(term="name", literal=Decimal("20.24"))
+    ) == GreaterThanOrEqual(term="name", literal=Decimal("19.26"))

Review Comment:
   I think this test is incorrect because the reference is for a decimal column. The adjustment to the literal is probably correct, but we should never have a case where a bound expression contains a term that doesn't match the literal.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027349034


##########
python/pyiceberg/expressions/literals.py:
##########
@@ -381,10 +381,12 @@ def __init__(self, value: Decimal):
         super().__init__(value, Decimal)
 
     def increment(self) -> Literal[Decimal]:
-        return DecimalLiteral(self.value + 1)
+        inc_value = 1 / (10 ** abs(self.value.as_tuple().exponent))

Review Comment:
   Ah,  wasn't aware that we already had those methods around. I've used those instead πŸ‘πŸ» 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027335851


##########
python/pyiceberg/expressions/literals.py:
##########
@@ -376,6 +407,12 @@ class DecimalLiteral(Literal[Decimal]):
     def __init__(self, value: Decimal):
         super().__init__(value, Decimal)
 
+    def increment(self) -> DecimalLiteral:
+        return DecimalLiteral(self.value + 1)

Review Comment:
   How do you feel about:
   ```python
       def increment(self) -> Literal[Decimal]:
           inc_value = 1 / (10 ** abs(self.value.as_tuple().exponent))
           return DecimalLiteral(self.value + Decimal(str(inc_value)))
   
       def decrement(self) -> Literal[Decimal]:
           inc_value = 1 / (10 ** abs(self.value.as_tuple().exponent))
           return DecimalLiteral(self.value - Decimal(str(inc_value)))
   ```
   Added tests as well:
   ```python
   def test_decimal_literal_increment():
       dec = DecimalLiteral(Decimal('10.123'))
       # Twice to check that we don't mutate the value
       assert dec.increment() == DecimalLiteral(Decimal('10.124'))
       assert dec.increment() == DecimalLiteral(Decimal('10.124'))
       # To check that the scale is still the same
       assert dec.increment().value.as_tuple() == Decimal('10.124').as_tuple()
   
   def test_decimal_literal_dencrement():
       dec = DecimalLiteral(Decimal('10.123'))
       # Twice to check that we don't mutate the value
       assert dec.decrement() == DecimalLiteral(Decimal('10.122'))
       assert dec.decrement() == DecimalLiteral(Decimal('10.122'))
       # To check that the scale is still the same
       assert dec.decrement().value.as_tuple() == Decimal('10.122').as_tuple()
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1027299865


##########
python/pyiceberg/expressions/literals.py:
##########
@@ -346,6 +365,12 @@ class TimeLiteral(Literal[int]):
     def __init__(self, value: int):
         super().__init__(value, int)
 
+    def increment(self) -> TimeLiteral:
+        return TimeLiteral(self.value + 1)

Review Comment:
   Makes sense, removed πŸ‘πŸ» 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1028604963


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+M = TypeVar("M")
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate, transform: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimeLiteral, TimestampLiteral)):
+        raise ValueError(f"Expected a numeric literal, got: {type(boundary)}")
+
+    if isinstance(pred, BoundLessThan):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary.decrement()))
+    elif isinstance(pred, BoundLessThanOrEqual):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundGreaterThan):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary.increment()))
+    elif isinstance(pred, BoundGreaterThanOrEqual):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(transform, boundary))
+    else:
+        return None
+
+
+def _truncate_array(
+    name: str, pred: BoundLiteralPredicate, func: Callable[[Optional[M]], Optional[M]]
+) -> Optional[UnboundPredicate]:
+    boundary = pred.literal
+
+    if type(pred) in {BoundLessThan, BoundLessThanOrEqual}:
+        return LessThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    elif type(pred) in {BoundGreaterThan, BoundGreaterThanOrEqual}:
+        return GreaterThanOrEqual(Reference(name), _transform_literal(func, boundary))
+    if isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(func, boundary))
+    else:
+        return None
+
+
+def _project_transform_predicate(transform: Transform, partition_name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+    term = pred.term
+    if isinstance(term, BoundTransform) and transform == term.transform:
+        return _remove_transform(partition_name, pred)
+    return None
+
+
+def _remove_transform(partition_name: str, pred: BoundPredicate):
+    if isinstance(pred, BoundUnaryPredicate):
+        return pred.as_unbound(Reference(partition_name))
+    elif isinstance(pred, BoundLiteralPredicate):
+        return pred.as_unbound(Reference(partition_name), pred.literal)
+    elif isinstance(pred, (BoundIn, BoundNotIn)):
+        return pred.as_unbound(Reference(partition_name), pred.literals)
+    else:
+        raise ValueError(f"Cannot replace transform in unknown predicate: {pred}")
+
+
+def _transform_set(name: str, pred: BoundSetPredicate, func: Callable[[Optional[M]], Optional[M]]) -> UnboundPredicate:
+    literals = pred.literals
+    if isinstance(pred, BoundIn):
+        return In(Reference(name), {_transform_literal(func, literal) for literal in literals})
+    elif isinstance(pred, BoundIn):
+        return NotIn(Reference(name), {_transform_literal(func, literal) for literal in literals})
+    else:
+        raise ValueError(f"Unknown BoundSetPredicate: {pred}")
+
+
+class BoundTransform(BoundTerm[T]):

Review Comment:
   Strange. I don't see an import for expressions. In any case, we can fix it later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1028602287


##########
python/pyiceberg/transforms.py:
##########
@@ -644,8 +748,88 @@ def can_transform(self, _: IcebergType) -> bool:
     def result_type(self, source: IcebergType) -> IcebergType:
         return source
 
+    def project(self, name: str, pred: BoundPredicate[L]) -> Optional[UnboundPredicate[Any]]:
+        return None
+
     def to_human_string(self, _: IcebergType, value: Optional[S]) -> str:
         return "null"
 
     def __repr__(self) -> str:
         return "VoidTransform()"
+
+
+def _truncate_number(
+    name: str, pred: BoundLiteralPredicate[L], transform: Callable[[Optional[L]], Optional[L]]
+) -> Optional[UnboundPredicate[Any]]:
+    boundary = pred.literal
+
+    if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimestampLiteral)):
+        raise ValueError(f"Expected a numeric literal, got: {type(boundary)}")
+
+    if isinstance(pred, BoundLessThan):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary.decrement()))  # type: ignore
+    elif isinstance(pred, BoundLessThanOrEqual):
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundGreaterThan):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary.increment()))  # type: ignore
+    elif isinstance(pred, BoundGreaterThanOrEqual):
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(transform, boundary))
+    else:
+        return None
+
+
+def _truncate_array(
+    name: str, pred: BoundLiteralPredicate[L], transform: Callable[[Optional[L]], Optional[L]]
+) -> Optional[UnboundPredicate[Any]]:
+    boundary = pred.literal
+
+    if type(pred) in {BoundLessThan, BoundLessThanOrEqual}:
+        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    elif type(pred) in {BoundGreaterThan, BoundGreaterThanOrEqual}:
+        return GreaterThanOrEqual(Reference(name), _transform_literal(transform, boundary))
+    if isinstance(pred, BoundEqualTo):
+        return EqualTo(Reference(name), _transform_literal(transform, boundary))
+    else:
+        return None
+
+
+def _project_transform_predicate(
+    transform: Transform[Any, Any], partition_name: str, pred: BoundPredicate[L]
+) -> Optional[UnboundPredicate[Any]]:
+    term = pred.term
+    if isinstance(term, BoundTransform) and transform == term.transform:
+        return _remove_transform(partition_name, pred)
+    return None
+
+
+def _remove_transform(partition_name: str, pred: BoundPredicate[L]):
+    if isinstance(pred, BoundUnaryPredicate):
+        return pred.as_unbound(Reference(partition_name))
+    elif isinstance(pred, BoundLiteralPredicate):
+        return pred.as_unbound(Reference(partition_name), pred.literal)
+    elif isinstance(pred, (BoundIn, BoundNotIn)):
+        return pred.as_unbound(Reference(partition_name), pred.literals)
+    else:
+        raise ValueError(f"Cannot replace transform in unknown predicate: {pred}")
+
+
+def _set_apply_transform(
+    name: str, pred: BoundSetPredicate[L], transform: Callable[[L], L]
+) -> UnboundPredicate[Any]:
+    literals = pred.literals
+    if isinstance(pred, BoundSetPredicate):
+        return pred.as_unbound(Reference(name), {_transform_literal(transform, literal) for literal in literals})
+    else:
+        raise ValueError(f"Unknown BoundSetPredicate: {pred}")

Review Comment:
   Nit: in this case, it isn't a `BoundSetPredicate` because `isinstance` returned false.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022171096


##########
python/pyiceberg/transforms.py:
##########
@@ -511,6 +590,31 @@ def preserves_order(self) -> bool:
     def source_type(self) -> IcebergType:
         return self._source_type
 
+    def project(self, name: str, pred: BoundPredicate) -> Optional[UnboundPredicate]:
+        field_type = pred.term.ref().field.field_type
+
+        if isinstance(pred.term, BoundTransform):
+            return _project_transform_predicate(self, name, pred)
+
+        # Implement startswith and notstartswith for string (and probably binary)
+        # https://github.com/apache/iceberg/issues/6112
+
+        if isinstance(pred, BoundUnaryPredicate):

Review Comment:
   Other functions use `elif`. I'm good either way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1022174272


##########
python/tests/test_transforms.py:
##########
@@ -506,3 +532,345 @@ def test_datetime_transform_str(transform, transform_str):
 )
 def test_datetime_transform_repr(transform, transform_repr):
     assert repr(transform) == transform_repr
+
+
+@pytest.fixture
+def bound_reference_str() -> BoundReference:
+    return BoundReference(field=NestedField(1, "field", StringType(), required=False), accessor=Accessor(position=0, inner=None))
+
+
+@pytest.fixture
+def bound_reference_date() -> BoundReference:
+    return BoundReference(field=NestedField(1, "field", DateType(), required=False), accessor=Accessor(position=0, inner=None))
+
+
+@pytest.fixture
+def bound_reference_timestamp() -> BoundReference:
+    return BoundReference(
+        field=NestedField(1, "field", TimestampType(), required=False), accessor=Accessor(position=0, inner=None)
+    )
+
+
+@pytest.fixture
+def bound_reference_decimal() -> BoundReference:
+    return BoundReference(
+        field=NestedField(1, "field", DecimalType(8, 2), required=False), accessor=Accessor(position=0, inner=None)
+    )
+
+
+@pytest.fixture
+def bound_reference_long() -> BoundReference:
+    return BoundReference(
+        field=NestedField(1, "field", DecimalType(8, 2), required=False), accessor=Accessor(position=0, inner=None)
+    )
+
+
+def test_projection_bucket_unary(bound_reference_str: BoundReference) -> None:
+    assert BucketTransform(2).project("name", BoundNotNull(term=bound_reference_str)) == NotNull(term=Reference(name="name"))
+
+
+def test_projection_bucket_literal(bound_reference_str: BoundReference) -> None:

Review Comment:
   It would be fine to add these when the projection visitor is added, so that the thorough tests cover other boolean expressions (and/or/not) as well as the handling when a transform projection returns `None`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org