You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/07 18:16:38 UTC

[GitHub] [iceberg] Fokko opened a new pull request, #6140: Python: Fix Evaluator tests

Fokko opened a new pull request, #6140:
URL: https://github.com/apache/iceberg/pull/6140

   Instead of just supplying the unbound expression to the evaluator directly, we created a bogus one and replaced the bound expression with the one we wanted to test. But introduced a bug in the test because all the strings and doubles were converted to a long because that was the schema that was used to initialize the evaluator.
   
   This PR changes the tests to supply the correct schema, and includes the binding in the tests as well


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1016041704


##########
python/tests/expressions/test_visitors.py:
##########
@@ -954,36 +919,33 @@ def test_manifest_evaluator_less_than_or_equal_overlap():
             upper_bound=_to_byte_buffer(StringType(), "b"),
         )
     )
+    assert _ManifestEvalVisitor(string_schema, expr).eval(manifest)
 
-    assert _create_manifest_evaluator(expr).eval(manifest)
-
-
-def test_manifest_evaluator_less_than_or_equal_all_null():
-    expr = BoundLessThanOrEqual(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("a"),
-    )
 
+def test_manifest_evaluator_less_than_or_equal_all_null(string_schema: Schema):
+    expr = LessThanOrEqual[str](Reference("col_str"), StringLiteral("a"))
     manifest = _to_manifest_file(
-        PartitionFieldSummary(contains_null=False, contains_nan=False, lower_bound=None, upper_bound=None)
+        PartitionFieldSummary(contains_null=True, contains_nan=False, lower_bound=None, upper_bound=None)
     )
-
     # All null
-    assert not _create_manifest_evaluator(expr).eval(manifest)
+    assert not _ManifestEvalVisitor(string_schema, expr).eval(manifest)
 
 
-def test_manifest_evaluator_less_than_or_equal_no_match():
-    expr = BoundLessThanOrEqual(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("a"),
+def test_manifest_evaluator_less_than_or_equal_match(string_schema: Schema):

Review Comment:
   This looks like the same test as `test_manifest_evaluator_less_than_or_equal_on_bound`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1017243851


##########
python/tests/expressions/test_visitors.py:
##########
@@ -1585,27 +1326,14 @@ def test_manifest_evaluator_not():
             upper_bound=_to_byte_buffer(LongType(), 10),
         )
     )
-    assert not _create_manifest_evaluator(expr).eval(manifest)
+    assert not _ManifestEvalVisitor(long_schema, expr).eval(manifest)
 
 
-def test_manifest_evaluator_and():
+def test_manifest_evaluator_and(long_schema: Schema):
     expr = And(
-        BoundIn(
-            term=BoundReference(
-                field=NestedField(field_id=1, name="foo", field_type=LongType(), required=False),
-                accessor=Accessor(position=0, inner=None),
-            ),
-            literals={LongLiteral(i) for i in range(22)},
-        ),
-        BoundIn(
-            term=BoundReference(
-                field=NestedField(field_id=1, name="foo", field_type=LongType(), required=False),
-                accessor=Accessor(position=0, inner=None),
-            ),
-            literals={LongLiteral(i) for i in range(22)},
-        ),
+        In(Reference[int]("col_long"), tuple({LongLiteral(i) for i in range(22)})),
+        In(Reference[int]("col_long"), tuple({LongLiteral(i) for i in range(22)})),

Review Comment:
   See my comment below about the predicates inside `Or`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015876595


##########
python/tests/expressions/test_visitors.py:
##########
@@ -874,36 +881,19 @@ def test_manifest_evaluator_less_than_overlap():
             upper_bound=_to_byte_buffer(StringType(), "b"),
         )
     )
+    assert not _ManifestEvalVisitor(string_schema, expr).eval(manifest)
 
-    assert not _create_manifest_evaluator(expr).eval(manifest)
-
-
-def test_manifest_evaluator_less_than_all_null():
-    expr = BoundLessThan(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("a"),
-    )
 
+def test_manifest_evaluator_less_than_all_null(string_schema: Schema):
+    expr = LessThan[str](Reference("col_str"), StringLiteral("a"))
     manifest = _to_manifest_file(
         PartitionFieldSummary(contains_null=False, contains_nan=False, lower_bound=None, upper_bound=None)

Review Comment:
   This doesn't look right. If all values are null then `contains_null` should be `True`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015893351


##########
python/tests/expressions/test_visitors.py:
##########
@@ -836,15 +843,23 @@ def _create_manifest_evaluator(bound_expr: BoundPredicate) -> _ManifestEvalVisit
     return evaluator
 
 
-def test_manifest_evaluator_less_than_no_overlap():
-    expr = BoundLessThan(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("c"),
-    )
+@pytest.fixture
+def string_schema() -> Schema:
+    return Schema(NestedField(field_id=1, name="col_str", field_type=StringType(), required=False))
+
+
+@pytest.fixture
+def double_schema() -> Schema:
+    return Schema(NestedField(field_id=1, name="col_double", field_type=DoubleType(), required=False))
+
+
+@pytest.fixture
+def long_schema() -> Schema:
+    return Schema(NestedField(field_id=1, name="col_long", field_type=LongType(), required=False))
+
 
+def test_manifest_evaluator_less_than_no_overlap(string_schema: Schema):

Review Comment:
   Bad wording. I meant when there is an overlap in the boundaries `{c} ∩ {a, b} = ∅`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #6140:
URL: https://github.com/apache/iceberg/pull/6140


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015778820


##########
python/tests/expressions/test_visitors.py:
##########
@@ -836,15 +843,23 @@ def _create_manifest_evaluator(bound_expr: BoundPredicate) -> _ManifestEvalVisit
     return evaluator
 
 
-def test_manifest_evaluator_less_than_no_overlap():
-    expr = BoundLessThan(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("c"),
-    )
+@pytest.fixture
+def string_schema() -> Schema:
+    return Schema(NestedField(field_id=1, name="foo", field_type=StringType(), required=False))

Review Comment:
   Nit: I'd prefer using better names that make the tests more readable, like like `str_col` because these are in a separate file.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015799107


##########
python/tests/expressions/test_visitors.py:
##########
@@ -853,19 +868,11 @@ def test_manifest_evaluator_less_than_no_overlap():
             upper_bound=_to_byte_buffer(StringType(), "b"),
         )
     )
-
-    assert _create_manifest_evaluator(expr).eval(manifest)
+    assert _ManifestEvalVisitor(string_schema, expr).eval(manifest)
 
 
-def test_manifest_evaluator_less_than_overlap():
-    expr = BoundLessThan(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("a"),
-    )
-
+def test_manifest_evaluator_less_than_overlap(string_schema: Schema):
+    expr = LessThan[str](Reference("foo"), StringLiteral("a"))

Review Comment:
   For these tests, I like that we explicitly use the `StringLiteral`, since we know that we're picking the right `LiteralType`. If there would be a bug in the literal selection process, all the tests would turn red instantly. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015778820


##########
python/tests/expressions/test_visitors.py:
##########
@@ -836,15 +843,23 @@ def _create_manifest_evaluator(bound_expr: BoundPredicate) -> _ManifestEvalVisit
     return evaluator
 
 
-def test_manifest_evaluator_less_than_no_overlap():
-    expr = BoundLessThan(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("c"),
-    )
+@pytest.fixture
+def string_schema() -> Schema:
+    return Schema(NestedField(field_id=1, name="foo", field_type=StringType(), required=False))

Review Comment:
   Nit: I'd prefer using better names that make the tests more readable, like like `str_col` because these aren't located with the tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1016041196


##########
python/tests/expressions/test_visitors.py:
##########
@@ -874,57 +872,32 @@ def test_manifest_evaluator_less_than_overlap():
             upper_bound=_to_byte_buffer(StringType(), "b"),
         )
     )
+    assert not _ManifestEvalVisitor(string_schema, expr).eval(manifest)
 
-    assert not _create_manifest_evaluator(expr).eval(manifest)
-
-
-def test_manifest_evaluator_less_than_all_null():
-    expr = BoundLessThan(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("a"),
-    )
 
+def test_manifest_evaluator_less_than_all_null(string_schema: Schema):

Review Comment:
   You can probably make this generic and test that all inequalities will reject a column of all nulls. The only predicates that should accept this metadata are `!=` and `NOT IN`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015895913


##########
python/tests/expressions/test_visitors.py:
##########
@@ -874,36 +881,19 @@ def test_manifest_evaluator_less_than_overlap():
             upper_bound=_to_byte_buffer(StringType(), "b"),
         )
     )
+    assert not _ManifestEvalVisitor(string_schema, expr).eval(manifest)
 
-    assert not _create_manifest_evaluator(expr).eval(manifest)
-
-
-def test_manifest_evaluator_less_than_all_null():
-    expr = BoundLessThan(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("a"),
-    )
 
+def test_manifest_evaluator_less_than_all_null(string_schema: Schema):
+    expr = LessThan[str](Reference("col_str"), StringLiteral("a"))
     manifest = _to_manifest_file(
         PartitionFieldSummary(contains_null=False, contains_nan=False, lower_bound=None, upper_bound=None)
     )
+    assert not _ManifestEvalVisitor(string_schema, expr).eval(manifest)
 
-    # All null
-    assert not _create_manifest_evaluator(expr).eval(manifest)
-
-
-def test_manifest_evaluator_less_than_no_match():
-    expr = BoundLessThan(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("a"),
-    )
 
+def test_manifest_evaluator_less_than_no_match(string_schema: Schema):

Review Comment:
   Copy paste one, should be:
   ```python
   def test_manifest_evaluator_less_than_no_match(string_schema: Schema):
       expr = LessThan[str](Reference("col_str"), StringLiteral("a"))
       manifest = _to_manifest_file(
           PartitionFieldSummary(
               contains_null=False,
               contains_nan=False,
               lower_bound=_to_byte_buffer(StringType(), "b"),
               upper_bound=_to_byte_buffer(StringType(), "c"),
           )
       )
       assert not _ManifestEvalVisitor(string_schema, expr).eval(manifest)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015880802


##########
python/tests/expressions/test_visitors.py:
##########
@@ -853,19 +868,11 @@ def test_manifest_evaluator_less_than_no_overlap():
             upper_bound=_to_byte_buffer(StringType(), "b"),
         )
     )
+    assert _ManifestEvalVisitor(string_schema, expr).eval(manifest)
 
-    assert _create_manifest_evaluator(expr).eval(manifest)
-
-
-def test_manifest_evaluator_less_than_overlap():
-    expr = BoundLessThan(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("a"),
-    )
 
+def test_manifest_evaluator_less_than_overlap(string_schema: Schema):

Review Comment:
   Isn't this the case without overlap? Values < "a" doesn't overlap values from "a" to "b".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015894495


##########
python/tests/expressions/test_visitors.py:
##########
@@ -853,19 +868,11 @@ def test_manifest_evaluator_less_than_no_overlap():
             upper_bound=_to_byte_buffer(StringType(), "b"),
         )
     )
+    assert _ManifestEvalVisitor(string_schema, expr).eval(manifest)
 
-    assert _create_manifest_evaluator(expr).eval(manifest)
-
-
-def test_manifest_evaluator_less_than_overlap():
-    expr = BoundLessThan(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("a"),
-    )
 
+def test_manifest_evaluator_less_than_overlap(string_schema: Schema):

Review Comment:
   `{a} ∩ {a, b} = {a}`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015882570


##########
python/tests/expressions/test_visitors.py:
##########
@@ -912,19 +902,11 @@ def test_manifest_evaluator_less_than_no_match():
             upper_bound=_to_byte_buffer(StringType(), "b"),
         )
     )
+    assert not _ManifestEvalVisitor(string_schema, expr).eval(manifest)
 
-    assert not _create_manifest_evaluator(expr).eval(manifest)
-
-
-def test_manifest_evaluator_less_than_or_equal_no_overlap():
-    expr = BoundLessThanOrEqual(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("c"),
-    )
 
+def test_manifest_evaluator_less_than_or_equal_no_overlap(string_schema: Schema):
+    expr = LessThanOrEqual[str](Reference("col_str"), StringLiteral("c"))

Review Comment:
   I think this should be `<= "a"` to test that the boundary is used.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015888905


##########
python/tests/expressions/test_visitors.py:
##########
@@ -853,19 +868,11 @@ def test_manifest_evaluator_less_than_no_overlap():
             upper_bound=_to_byte_buffer(StringType(), "b"),
         )
     )
+    assert _ManifestEvalVisitor(string_schema, expr).eval(manifest)
 
-    assert _create_manifest_evaluator(expr).eval(manifest)

Review Comment:
   Removed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015875223


##########
python/tests/expressions/test_visitors.py:
##########
@@ -853,19 +868,11 @@ def test_manifest_evaluator_less_than_no_overlap():
             upper_bound=_to_byte_buffer(StringType(), "b"),
         )
     )
+    assert _ManifestEvalVisitor(string_schema, expr).eval(manifest)
 
-    assert _create_manifest_evaluator(expr).eval(manifest)

Review Comment:
   Is the `_create_manifest_evaluator` method used anywhere? Why keep it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1016944001


##########
python/tests/expressions/test_visitors.py:
##########
@@ -827,24 +834,23 @@ def _to_manifest_file(*partitions: PartitionFieldSummary) -> ManifestFile:
     )
 
 
-def _create_manifest_evaluator(bound_expr: BoundPredicate) -> _ManifestEvalVisitor:
-    """For testing. Creates a bogus evaluator, and then replaces the expression"""
-    evaluator = _ManifestEvalVisitor(
-        Schema(NestedField(1, "id", LongType())), EqualTo(term=Reference("id"), literal=literal("foo"))
-    )
-    evaluator.partition_filter = bound_expr
-    return evaluator
+@pytest.fixture
+def string_schema() -> Schema:

Review Comment:
   Added all of them: https://github.com/apache/iceberg/pull/6140/commits/8b3f2696291f8bdfa8007bbd55c6f3738a4182e6
   With one minor change, I made the field IDs consecutive 🙀:
   https://github.com/apache/iceberg/blob/master/api/src/test/java/org/apache/iceberg/expressions/TestInclusiveManifestEvaluator.java#L57-L58
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015879735


##########
python/tests/expressions/test_visitors.py:
##########
@@ -836,15 +843,23 @@ def _create_manifest_evaluator(bound_expr: BoundPredicate) -> _ManifestEvalVisit
     return evaluator
 
 
-def test_manifest_evaluator_less_than_no_overlap():
-    expr = BoundLessThan(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("c"),
-    )
+@pytest.fixture
+def string_schema() -> Schema:
+    return Schema(NestedField(field_id=1, name="col_str", field_type=StringType(), required=False))
+
+
+@pytest.fixture
+def double_schema() -> Schema:
+    return Schema(NestedField(field_id=1, name="col_double", field_type=DoubleType(), required=False))
+
+
+@pytest.fixture
+def long_schema() -> Schema:
+    return Schema(NestedField(field_id=1, name="col_long", field_type=LongType(), required=False))
+
 
+def test_manifest_evaluator_less_than_no_overlap(string_schema: Schema):

Review Comment:
   What does "no overlap" mean? To me it seems backwards in these tests:
   
   ```
   ----a----b----c---->
   <-------------c  # filter expression
       a----b  # value range
        ^^^^   # overlap
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015798033


##########
python/tests/expressions/test_visitors.py:
##########
@@ -836,15 +843,23 @@ def _create_manifest_evaluator(bound_expr: BoundPredicate) -> _ManifestEvalVisit
     return evaluator
 
 
-def test_manifest_evaluator_less_than_no_overlap():
-    expr = BoundLessThan(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("c"),
-    )
+@pytest.fixture
+def string_schema() -> Schema:
+    return Schema(NestedField(field_id=1, name="foo", field_type=StringType(), required=False))

Review Comment:
   Ah, I like that a lot. You can see the type from which fixture is being used, but more explicit is more better indeed. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015882053


##########
python/tests/expressions/test_visitors.py:
##########
@@ -874,36 +881,19 @@ def test_manifest_evaluator_less_than_overlap():
             upper_bound=_to_byte_buffer(StringType(), "b"),
         )
     )
+    assert not _ManifestEvalVisitor(string_schema, expr).eval(manifest)
 
-    assert not _create_manifest_evaluator(expr).eval(manifest)
-
-
-def test_manifest_evaluator_less_than_all_null():
-    expr = BoundLessThan(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("a"),
-    )
 
+def test_manifest_evaluator_less_than_all_null(string_schema: Schema):
+    expr = LessThan[str](Reference("col_str"), StringLiteral("a"))
     manifest = _to_manifest_file(
         PartitionFieldSummary(contains_null=False, contains_nan=False, lower_bound=None, upper_bound=None)
     )
+    assert not _ManifestEvalVisitor(string_schema, expr).eval(manifest)
 
-    # All null
-    assert not _create_manifest_evaluator(expr).eval(manifest)
-
-
-def test_manifest_evaluator_less_than_no_match():
-    expr = BoundLessThan(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("a"),
-    )
 
+def test_manifest_evaluator_less_than_no_match(string_schema: Schema):

Review Comment:
   Isn't this the same as `test_manifest_evaluator_less_than_overlap`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1017244665


##########
python/tests/expressions/test_visitors.py:
##########
@@ -1039,36 +985,37 @@ def test_manifest_evaluator_equal_overlap():
             )
         ],
     )
-
-    assert _create_manifest_evaluator(expr).eval(manifest)
+    assert _ManifestEvalVisitor(string_schema, expr).eval(manifest)
 
 
-def test_manifest_evaluator_equal_all_null():
-    expr = BoundEqualTo(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("a"),
+def test_manifest_evaluator_equal_on_bound_with_upper_bound(string_schema: Schema):

Review Comment:
   With all of the Java tests copied over, do we need the original tests that were in this suite? The Java ones should be thorough enough that we can just remove these.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1015781049


##########
python/tests/expressions/test_visitors.py:
##########
@@ -853,19 +868,11 @@ def test_manifest_evaluator_less_than_no_overlap():
             upper_bound=_to_byte_buffer(StringType(), "b"),
         )
     )
-
-    assert _create_manifest_evaluator(expr).eval(manifest)
+    assert _ManifestEvalVisitor(string_schema, expr).eval(manifest)
 
 
-def test_manifest_evaluator_less_than_overlap():
-    expr = BoundLessThan(
-        term=BoundReference(
-            field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
-            accessor=Accessor(position=0, inner=None),
-        ),
-        literal=StringLiteral("a"),
-    )
-
+def test_manifest_evaluator_less_than_overlap(string_schema: Schema):
+    expr = LessThan[str](Reference("foo"), StringLiteral("a"))

Review Comment:
   Should we do this after fixing expressions so that you don't need to add `Reference` and `StringLiteral` classes explicitly?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1016042277


##########
python/tests/expressions/test_visitors.py:
##########
@@ -827,24 +834,23 @@ def _to_manifest_file(*partitions: PartitionFieldSummary) -> ManifestFile:
     )
 
 
-def _create_manifest_evaluator(bound_expr: BoundPredicate) -> _ManifestEvalVisitor:
-    """For testing. Creates a bogus evaluator, and then replaces the expression"""
-    evaluator = _ManifestEvalVisitor(
-        Schema(NestedField(1, "id", LongType())), EqualTo(term=Reference("id"), literal=literal("foo"))
-    )
-    evaluator.partition_filter = bound_expr
-    return evaluator
+@pytest.fixture
+def string_schema() -> Schema:

Review Comment:
   You may actually want to just base the tests here on the Java version, which has pretty through tests and reuses the manifest metadata: https://github.com/apache/iceberg/blob/master/api/src/test/java/org/apache/iceberg/expressions/TestInclusiveManifestEvaluator.java



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6140: Python: Fix Evaluator tests

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6140:
URL: https://github.com/apache/iceberg/pull/6140#discussion_r1017243615


##########
python/tests/expressions/test_visitors.py:
##########
@@ -1614,27 +1342,14 @@ def test_manifest_evaluator_and():
             upper_bound=_to_byte_buffer(LongType(), 10),
         )
     )
-    assert _create_manifest_evaluator(expr).eval(manifest)
+    assert _ManifestEvalVisitor(long_schema, expr).eval(manifest)
 
 
-def test_manifest_evaluator_or():
+def test_manifest_evaluator_or(long_schema: Schema):
     expr = Or(
-        BoundIn(
-            term=BoundReference(
-                field=NestedField(field_id=1, name="foo", field_type=LongType(), required=False),
-                accessor=Accessor(position=0, inner=None),
-            ),
-            literals={LongLiteral(i) for i in range(22)},
-        ),
-        BoundIn(
-            term=BoundReference(
-                field=NestedField(field_id=1, name="foo", field_type=LongType(), required=False),
-                accessor=Accessor(position=0, inner=None),
-            ),
-            literals={LongLiteral(i) for i in range(22)},
-        ),
+        In(Reference[int]("col_long"), tuple({LongLiteral(i) for i in range(22)})),
+        In(Reference[int]("col_long"), tuple({LongLiteral(i) for i in range(22)})),

Review Comment:
   I would expect an `Or` test to use different expressions, like `Or(LessThan("id", 5), GreaterThan("id", 10)`. It seems weird to have the same expression twice.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org