Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/07 20:31:16 UTC

[GitHub] [arrow-datafusion] Cheappie commented on a diff in pull request #2170: Handle merged schemas in parquet pruning

Cheappie commented on code in PR #2170:
URL: https://github.com/apache/arrow-datafusion/pull/2170#discussion_r845538137


##########
datafusion/core/src/physical_plan/file_format/parquet.rs:
##########
@@ -919,6 +955,73 @@ mod tests {
         assert_batches_sorted_eq!(expected, &read);
     }
 
+    #[tokio::test]
+    async fn evolved_schema_filter() {
+        let c1: ArrayRef =
+            Arc::new(StringArray::from(vec![Some("Foo"), None, Some("bar")]));
+
+        let c2: ArrayRef = Arc::new(Int64Array::from(vec![Some(1), Some(2), None]));
+
+        let c3: ArrayRef = Arc::new(Int8Array::from(vec![Some(10), Some(20), None]));
+
+        // batch1: c1(string), c2(int64), c3(int8)
+        let batch1 = create_batch(vec![
+            ("c1", c1.clone()),
+            ("c2", c2.clone()),
+            ("c3", c3.clone()),
+        ]);
+
+        // batch2: c3(int8), c2(int64), c1(string)
+        let batch2 = create_batch(vec![("c3", c3), ("c2", c2), ("c1", c1)]);
+
+        let filter = col("c3").eq(lit(0_i8));
+
+        // read/write them files:
+        let read = round_trip_to_parquet(vec![batch1, batch2], None, None, Some(filter))
+            .await
+            .unwrap();
+
+        // Predicate should prune all row groups
+        assert_eq!(read.len(), 0);
+    }
+
+    #[tokio::test]
+    async fn evolved_schema_disjoint_schema_filter() {
+        let c1: ArrayRef =
+            Arc::new(StringArray::from(vec![Some("Foo"), None, Some("bar")]));
+
+        let c2: ArrayRef = Arc::new(Int64Array::from(vec![Some(1), Some(2), None]));

Review Comment:
   I might be missing something, but why are the values in c2 not materialized if we weren't able to _prune_ them? I wonder how a filter like `c2 eq 1_i64` can be satisfied against a null array.
   
   ```
               "+-----+----+",
               "| c1  | c2 |",
               "+-----+----+",
               "| Foo |    |",
               "|     |    |",
               "| bar |    |",
               "+-----+----+",
   ```
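
To make the question concrete, here is a minimal sketch in plain Rust, with `Option<i64>` standing in for a nullable Arrow column. The helpers `sql_eq` and `filter_rows` are hypothetical names used only for illustration and are not DataFusion or Arrow APIs. The sketch assumes SQL-style null semantics, where `NULL = 1` evaluates to NULL and a row-level filter keeps only rows whose predicate is definitely true. It says nothing about how the parquet reader actually applies the predicate.

```rust
/// SQL-style equality (hypothetical helper): comparing NULL (`None`)
/// with a literal yields NULL (`None`) rather than `false`.
fn sql_eq(value: Option<i64>, literal: i64) -> Option<bool> {
    value.map(|v| v == literal)
}

/// Row-level filter (hypothetical helper): returns the indices of rows
/// for which the predicate is definitely true. NULL results are dropped.
fn filter_rows(column: &[Option<i64>], literal: i64) -> Vec<usize> {
    column
        .iter()
        .enumerate()
        .filter(|(_, v)| sql_eq(**v, literal) == Some(true))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // A c2 column that is entirely null, as in the expected output above.
    let all_null: Vec<Option<i64>> = vec![None, None, None];
    // The c2 values used to build the test batch.
    let populated: Vec<Option<i64>> = vec![Some(1), Some(2), None];

    println!("all-null c2 matches:  {:?}", filter_rows(&all_null, 1));
    println!("populated c2 matches: {:?}", filter_rows(&populated, 1));
}
```

Under these assumptions, no row of an all-null `c2` can satisfy `c2 eq 1_i64`, so the rows in the expected output would only survive if the predicate is used for pruning and not re-applied row by row.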


