Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/18 11:03:39 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #4278: Fix EXPLAIN plan for ParquetExec to show pruning_predicate

alamb opened a new pull request, #4278:
URL: https://github.com/apache/arrow-datafusion/pull/4278

   # Which issue does this PR close?
   
   Re: https://github.com/apache/arrow-datafusion/issues/4020
   
   
   # Rationale for this change
   
   It is confusing that `ParquetExec` displays the pruning predicate as `predicate` rather than `pruning_predicate`; see https://github.com/apache/arrow-datafusion/issues/4020
   
   # What changes are included in this PR?
   
   A related issue is that the predicate is only pushed down for use as a row filter if it can also be used as a pruning predicate, which I will fix.
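
   To illustrate why a row filter is still useful after pruning, here is a minimal standalone sketch. It does not use DataFusion's API; the `row_filter` function is a hypothetical helper, and the sample rows mirror the test data in this PR. Pruning can only skip a whole row group, so a surviving group may still hold rows that fail `c1 != 'bar'`.

   ```rust
   /// Hypothetical row-level filter for `c1 != value` (not DataFusion's implementation).
   /// NULLs (`None`) never satisfy the comparison under SQL semantics, so they are dropped.
   fn row_filter<'a>(rows: &[Option<&'a str>], value: &str) -> Vec<&'a str> {
       rows.iter()
           .flatten() // skip NULLs
           .copied()
           .filter(|v| *v != value) // keep only rows that pass the predicate
           .collect()
   }

   /// Sample rows mirroring the `c1` column used in the test added by this PR.
   fn sample_rows() -> Vec<Option<&'static str>> {
       vec![
           Some("Foo"),
           None,
           Some("bar"),
           Some("bar"),
           Some("bar"),
           Some("bar"),
           Some("zzz"),
       ]
   }
   ```

   With these rows, the row group cannot be skipped (it contains values other than "bar"), yet only "Foo" and "zzz" pass the row-level filter.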
   
   
   # Are these changes tested?
   
   TBD
   
   # Are there any user-facing changes?
   Yes, better explain plans


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4278: Fix EXPLAIN plan for ParquetExec to show pruning_predicate

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4278:
URL: https://github.com/apache/arrow-datafusion/pull/4278#discussion_r1026317548


##########
datafusion/core/src/physical_plan/file_format/parquet.rs:
##########
@@ -1526,6 +1524,36 @@ mod tests {
         );
     }
 
+    #[tokio::test]
+    async fn parquet_exec_display() {
+        let c1: ArrayRef = Arc::new(StringArray::from(vec![
+            Some("Foo"),
+            None,
+            Some("bar"),
+            Some("bar"),
+            Some("bar"),
+            Some("bar"),
+            Some("zzz"),
+        ]));
+
+        // batch1: c1(string)
+        let batch1 = create_batch(vec![("c1", c1.clone())]);
+
+        // on
+        let filter = col("c1").not_eq(lit("bar"));
+
+        let rt = round_trip(vec![batch1], None, None, Some(filter), true, false).await;
+
+        // convert to explain plan form
+        let display = displayable(rt.parquet_exec.as_ref()).indent().to_string();
+
+        assert_contains!(
+            &display,
+            "pruning_predicate=c1_min@0 != bar OR bar != c1_max@1"

Review Comment:
   Previously this was "predicate=c1_min@0 != bar OR bar != c1_max@1"
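
   For readers unfamiliar with the displayed expression, the following standalone sketch shows how the filter `c1 != 'bar'` relates to the min/max form `c1_min != bar OR bar != c1_max`. It is an illustration only, not DataFusion code: `ColumnStats`, `column_stats`, and `may_contain_not_eq` are hypothetical names, and plain byte-wise string ordering is assumed for the statistics.

   ```rust
   /// Hypothetical min/max statistics for one string column in one row group.
   struct ColumnStats<'a> {
       min: &'a str,
       max: &'a str,
   }

   /// Compute min/max over the non-NULL values; returns None if every value is NULL.
   fn column_stats<'a>(values: &[Option<&'a str>]) -> Option<ColumnStats<'a>> {
       let min = values.iter().flatten().copied().min()?;
       let max = values.iter().flatten().copied().max()?;
       Some(ColumnStats { min, max })
   }

   /// Pruning form of `c1 != value`: the row group may contain a matching row
   /// unless min == max == value, i.e. `min != value OR value != max`.
   fn may_contain_not_eq(stats: &ColumnStats, value: &str) -> bool {
       stats.min != value || value != stats.max
   }
   ```

   A row group whose statistics are min = "bar" and max = "bar" can be skipped for this filter; the test data above (min "Foo", max "zzz") cannot.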







[GitHub] [arrow-datafusion] ursabot commented on pull request #4278: Fix EXPLAIN plan for ParquetExec to show pruning_predicate

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #4278:
URL: https://github.com/apache/arrow-datafusion/pull/4278#issuecomment-1319977649

   Benchmark runs are scheduled for baseline = ff2f1134a1ee96c48621465d47d2e5ea1c07ac1b and contender = 949d5af37291e3424277ef68f0aa20a28e5c6fbc. 949d5af37291e3424277ef68f0aa20a28e5c6fbc is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/380ffc975c954686975e946928932ef6...364dbfa0ac414e969452e4fa6a4a8266/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/b5fff34cb8cf470ea9260ece6bd89996...74548f4c29cf4a82a8977c280c6f3118/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/6fa27dc97e674817a56cfa23f0ea1527...781bed960e7244fd8c326472f358d24b/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/699d02490e1a4845a101e0a2a0801563...7d0fadd388d640ffb8dd3e3e379e3db3/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow-datafusion] Ted-Jiang merged pull request #4278: Fix EXPLAIN plan for ParquetExec to show pruning_predicate

Posted by GitBox <gi...@apache.org>.
Ted-Jiang merged PR #4278:
URL: https://github.com/apache/arrow-datafusion/pull/4278

