Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/28 06:52:32 UTC

[GitHub] [arrow] rdettai commented on a change in pull request #7291: ARROW-8455: [Rust] Parquet Arrow column read on partially compatible files FIX

rdettai commented on a change in pull request #7291:
URL: https://github.com/apache/arrow/pull/7291#discussion_r431617782



##########
File path: rust/parquet/src/arrow/array_reader.rs
##########
@@ -609,23 +609,34 @@ where
 {
     let mut leaves = HashMap::<*const Type, usize>::new();
 
-    let mut filtered_fields: Vec<Rc<Type>> = Vec::new();
+    let mut filtered_root_names = HashSet::<String>::new();
 
     for c in column_indices {
         let column = parquet_schema.column(c).self_type() as *const Type;
         leaves.insert(column, c);
 
         let root = parquet_schema.get_column_root_ptr(c);
-        filtered_fields.push(root);
+        filtered_root_names.insert(root.name().to_string());
     }
 
     if leaves.is_empty() {
         return Err(general_err!("Can't build array reader without columns!"));
     }
 
+    // Only pass root fields that take part in the projection
+    // to avoid traversal of columns that are not read.
+    // TODO: also prune unread parts of the tree in child structures
+    let filtered_root_fields = parquet_schema
+        .root_schema()
+        .get_fields()
+        .into_iter()
+        .filter(|field| filtered_root_names.contains(field.name()))
+        .map(|field| field.clone())

Review comment:
       Thanks for the tip!
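
For readers following the hunk above, the root-field filtering step can be sketched in isolation. This is a minimal, hedged illustration only: `Field` and `filter_root_fields` are hypothetical stand-ins invented for this example, not the parquet crate's `Type` or any function from the pull request. It also uses `Iterator::cloned`, the standard-library shorthand for `.map(|x| x.clone())`.

```rust
use std::collections::HashSet;
use std::rc::Rc;

// Hypothetical stand-in for a schema node; the real parquet `Type` is richer.
#[derive(Debug)]
struct Field {
    name: String,
}

impl Field {
    fn name(&self) -> &str {
        &self.name
    }
}

/// Keep only the root fields whose names were collected from the projected
/// leaf columns, preserving the schema's original field order.
fn filter_root_fields(
    root_fields: &[Rc<Field>],
    projected_root_names: &HashSet<String>,
) -> Vec<Rc<Field>> {
    root_fields
        .iter()
        .filter(|field| projected_root_names.contains(field.name()))
        .cloned() // clones the Rc handle, not the underlying field
        .collect()
}
```

Because the result is built by walking the schema's fields rather than the projection list, a root that is hit by several projected leaves appears only once, and the output order follows the schema rather than the order of the requested column indices.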



