You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/12/06 17:55:04 UTC

[PR] Minor: reduce code duplication in PruningPredicate test [arrow-datafusion]

alamb opened a new pull request, #8441:
URL: https://github.com/apache/arrow-datafusion/pull/8441

   Draft:
   - [ ] Builds on https://github.com/apache/arrow-datafusion/pull/8440
   
   ## Which issue does this PR close?
   
   Follow on to https://github.com/apache/arrow-datafusion/pull/8440
   
   
   ## Rationale for this change
   I made a function that was more concise than the current tests, but I wanted to keep https://github.com/apache/arrow-datafusion/pull/8440 smaller so I didn't refactor existing tests to use it.
   
   ## What changes are included in this PR?
   1. refactor other tests to use `prune_with_expr` to reduce code duplication
   
   ## Are these changes tested?
   all tests
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   ## Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: reduce code duplication in PruningPredicate test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #8441:
URL: https://github.com/apache/arrow-datafusion/pull/8441#issuecomment-1873329537

   cc @yahoNanJing 
   
   @crepererum do you have time to review this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: reduce code duplication in PruningPredicate test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #8441:
URL: https://github.com/apache/arrow-datafusion/pull/8441#issuecomment-1877837881

   Thank you @viirya  and @Ted-Jiang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: reduce code duplication in PruningPredicate test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #8441:
URL: https://github.com/apache/arrow-datafusion/pull/8441#discussion_r1435588286


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -2004,100 +2004,97 @@ mod tests {
             DataType::Decimal128(9, 2),
             true,
         )]));
-        // s1 > 5
-        let expr = col("s1").gt(lit(ScalarValue::Decimal128(Some(500), 9, 2)));
-        let expr = logical2physical(&expr, &schema);
-        // If the data is written by spark, the physical data type is INT32 in the parquet
-        // So we use the INT32 type of statistic.
-        let statistics = TestStatistics::new().with(
-            "s1",
-            ContainerStats::new_i32(
-                vec![Some(0), Some(4), None, Some(3)], // min
-                vec![Some(5), Some(6), Some(4), None], // max
+
+        prune_with_expr(

Review Comment:
   The idea is to reduce the boiler plate required to setup the test / prune the expression, and simply list the data and expected results. It may not be many less lines, but the logic is significantly less dense in my opinion



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: reduce code duplication in PruningPredicate test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #8441:
URL: https://github.com/apache/arrow-datafusion/pull/8441#issuecomment-1876029781

   @Ted-Jiang  I wonder if you have some time to review this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: reduce code duplication in PruningPredicate test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #8441:
URL: https://github.com/apache/arrow-datafusion/pull/8441#discussion_r1435588286


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -2004,100 +2004,97 @@ mod tests {
             DataType::Decimal128(9, 2),
             true,
         )]));
-        // s1 > 5
-        let expr = col("s1").gt(lit(ScalarValue::Decimal128(Some(500), 9, 2)));
-        let expr = logical2physical(&expr, &schema);
-        // If the data is written by spark, the physical data type is INT32 in the parquet
-        // So we use the INT32 type of statistic.
-        let statistics = TestStatistics::new().with(
-            "s1",
-            ContainerStats::new_i32(
-                vec![Some(0), Some(4), None, Some(3)], // min
-                vec![Some(5), Some(6), Some(4), None], // max
+
+        prune_with_expr(

Review Comment:
   The idea is to reduce the boiler plate required to setup the test / prune the expression, and simply list the data and expected results



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: reduce code duplication in PruningPredicate test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #8441:
URL: https://github.com/apache/arrow-datafusion/pull/8441


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org