Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/14 17:03:39 UTC
[GitHub] [arrow-datafusion] alamb commented on a change in pull request #55: Support qualified columns in queries

alamb commented on a change in pull request #55:
URL: https://github.com/apache/arrow-datafusion/pull/55#discussion_r651112853



##########
File path: datafusion/src/logical_plan/dfschema.rs
##########
@@ -208,6 +297,28 @@ impl Into<Schema> for DFSchema {
     }
 }
 
+impl Into<Schema> for &DFSchema {
+    /// Convert a schema into a DFSchema
+    fn into(self) -> Schema {
+        Schema::new(
+            self.fields
+                .iter()
+                .map(|f| {
+                    if f.qualifier().is_some() {

Review comment:
       It's almost like this could be `f.field.clone()` even if `f.qualifier()` is `Some`.
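A simplified sketch of the two conversions under discussion. It uses hypothetical stand-in types rather than the real `arrow`/DataFusion ones (`Field`, `Schema`, `DFField` and `DFSchema` below are minimal stand-ins), and it assumes `qualified_name` joins qualifier and name with a dot. Option A bakes the qualifier into the Arrow field name; option B just clones the inner field whether or not a qualifier is present.

```rust
/// Minimal stand-in for an Arrow field (hypothetical; the real type also
/// carries a data type, nullability and metadata).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
}

/// Minimal stand-in for an Arrow schema.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
}

/// Stand-in for DFField: an Arrow field plus an optional relation qualifier.
#[derive(Debug, Clone)]
pub struct DFField {
    pub qualifier: Option<String>,
    pub field: Field,
}

impl DFField {
    pub fn qualifier(&self) -> Option<&String> {
        self.qualifier.as_ref()
    }

    /// Assumed format: "relation.name" when qualified, plain "name" otherwise.
    pub fn qualified_name(&self) -> String {
        match &self.qualifier {
            Some(q) => format!("{}.{}", q, self.field.name),
            None => self.field.name.clone(),
        }
    }
}

/// Stand-in for DFSchema: just a list of (possibly qualified) fields.
pub struct DFSchema {
    pub fields: Vec<DFField>,
}

/// Option A: rename qualified fields to "relation.name".
pub fn to_schema_qualified(schema: &DFSchema) -> Schema {
    Schema {
        fields: schema
            .fields
            .iter()
            .map(|f| {
                if f.qualifier().is_some() {
                    Field { name: f.qualified_name() }
                } else {
                    f.field.clone()
                }
            })
            .collect(),
    }
}

/// Option B: always clone the inner field, dropping the qualifier.
pub fn to_schema_unqualified(schema: &DFSchema) -> Schema {
    Schema {
        fields: schema.fields.iter().map(|f| f.field.clone()).collect(),
    }
}
```

For a field `b` qualified by `test`, option A yields an Arrow field named `test.b` and option B yields `b`. The trade-off is that option B can produce duplicate names in the Arrow schema when two relations (for example, both sides of a join) contribute a column with the same name.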

##########
File path: datafusion/src/optimizer/projection_push_down.rs
##########
@@ -417,15 +492,52 @@ mod tests {
             .aggregate(vec![], vec![max(col("b"))])?
             .build()?;
 
-        let expected = "Aggregate: groupBy=[[]], aggr=[[MAX(#b)]]\
-        \n  Filter: #c\
+        let expected = "Aggregate: groupBy=[[]], aggr=[[MAX(#test.b)]]\
+        \n  Filter: #test.c\
         \n    TableScan: test projection=Some([1, 2])";
 
         assert_optimized_plan_eq(&plan, expected);
 
         Ok(())
     }
 
+    #[test]
+    fn join_schema_trim() -> Result<()> {
+        let table_scan = test_table_scan()?;
+
+        let schema = Schema::new(vec![Field::new("c1", DataType::UInt32, false)]);
+        let table2_scan =
+            LogicalPlanBuilder::scan_empty(Some("test2"), &schema, None)?.build()?;
+
+        let plan = LogicalPlanBuilder::from(&table_scan)
+            .join(&table2_scan, JoinType::Left, vec!["a"], vec!["c1"])?
+            .project(vec![col("a"), col("b"), col("c1")])?
+            .build()?;
+
+        // make sure projections are pushed down to table scan
+        let expected = "Projection: #test.a, #test.b, #test2.c1\

Review comment:
       👍 
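The test above checks that only the columns needed by the join and the final projection reach each table scan. Below is a rough sketch of that bookkeeping with qualified columns. It is not DataFusion's optimizer code: `Column` and `required_indices` are hypothetical, and the field lists (`a`, `b`, `c` for `test`; `c1` for `test2`) are assumptions made for illustration.

```rust
use std::collections::BTreeSet;

/// Hypothetical qualified column reference: optional relation plus name.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Column {
    pub relation: Option<String>,
    pub name: String,
}

impl Column {
    pub fn new(relation: &str, name: &str) -> Self {
        Column {
            relation: Some(relation.to_string()),
            name: name.to_string(),
        }
    }
}

/// Given the columns required above a scan, return the sorted, de-duplicated
/// indices of `table`'s fields that must be read. Columns qualified with a
/// different relation, and unqualified columns, are ignored in this sketch.
pub fn required_indices(table: &str, table_fields: &[&str], required: &[Column]) -> Vec<usize> {
    let mut indices = BTreeSet::new();
    for col in required {
        if col.relation.as_deref() == Some(table) {
            if let Some(i) = table_fields.iter().position(|f| *f == col.name) {
                indices.insert(i);
            }
        }
    }
    indices.into_iter().collect()
}
```

With the projection `test.a`, `test.b`, `test2.c1` plus the join keys `test.a` and `test2.c1`, this sketch keeps indices `[0, 1]` for `test` and `[0]` for `test2`. The qualifier is what routes each required column to the correct scan.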

##########
File path: datafusion/src/logical_plan/plan.rs
##########
@@ -125,9 +130,11 @@ pub enum LogicalPlan {
         /// Right input
         right: Arc<LogicalPlan>,
         /// Equijoin clause expressed as pairs of (left, right) join columns
-        on: Vec<(String, String)>,
+        on: Vec<(Column, Column)>,

Review comment:
       👍 
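Switching the join keys from `(String, String)` to `(Column, Column)` lets each side carry its relation, so same-named keys from two tables stay distinct. The sketch below uses a hypothetical `Column` that mirrors the fields shown later in this PR (`relation: Option<String>`, `name: String`). The `#relation.name` display format is an assumption based on the expected plan strings in the test above (e.g. `#test.b`). `format_on` and the table names `orders`/`customers` are illustrative only.

```rust
use std::fmt;

/// Hypothetical qualified column: optional relation plus name.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Column {
    pub relation: Option<String>,
    pub name: String,
}

impl fmt::Display for Column {
    /// Renders "#relation.name" when qualified, "#name" otherwise (assumed format).
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.relation {
            Some(r) => write!(f, "#{}.{}", r, self.name),
            None => write!(f, "#{}", self.name),
        }
    }
}

/// Renders equijoin pairs as "left = right", separated by ", " (assumed format).
pub fn format_on(on: &[(Column, Column)]) -> String {
    on.iter()
        .map(|(l, r)| format!("{} = {}", l, r))
        .collect::<Vec<_>>()
        .join(", ")
}
```

With plain string pairs, both keys in a join on `orders.id = customers.id` would just be `"id"`. Here the two `Column` values compare unequal and render as `#orders.id = #customers.id`.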

##########
File path: datafusion/src/logical_plan/expr.rs
##########
@@ -33,6 +33,90 @@ use std::collections::HashSet;
 use std::fmt;
 use std::sync::Arc;
 
+/// A named reference to a qualified filed in a schema.

Review comment:
       ```suggestion
   /// A named reference to a qualified field in a schema.
   ```

##########
File path: ballista/rust/core/proto/ballista.proto
##########
@@ -408,6 +421,119 @@ message PhysicalPlanNode {
   }
 }
 
+// physical expressions

Review comment:
       I think this is a very common tactic in other systems, so 👍 
   
   My only regret (?) is that I have spent non-trivial amounts of time tracking down bugs when the offsets get messed up, lol! But I don't have any better suggestions.

##########
File path: datafusion/src/logical_plan/expr.rs
##########
@@ -27,14 +27,89 @@ use aggregates::{AccumulatorFunctionImplementation, StateTypeFunction};
 use arrow::{compute::can_cast_types, datatypes::DataType};
 
 use crate::error::{DataFusionError, Result};
-use crate::logical_plan::{DFField, DFSchema};
+use crate::logical_plan::{DFField, DFSchema, DFSchemaRef};
 use crate::physical_plan::{
     aggregates, expressions::binary_operator_data_type, functions, udf::ScalarUDF,
 };
 use crate::{physical_plan::udaf::AggregateUDF, scalar::ScalarValue};
 use functions::{ReturnTypeFunction, ScalarFunctionImplementation, Signature};
 use std::collections::HashSet;
 
+/// A named reference to a qualified filed in a schema.
+#[derive(Debug, Clone, PartialEq, Eq, Hash)]
+pub struct Column {
+    /// relation/table name.
+    pub relation: Option<String>,
+    /// field/column name.
+    pub name: String,

Review comment:
       I suggest we track this as a follow-on ticket.
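The following is a minimal sketch of how a `Column` with these two fields might be built from, and rendered back to, a dotted string. `from_qualified_name` and `flat_name` are hypothetical helpers, and DataFusion's actual API may differ. Splitting on the first `.` is an assumption, and it would mis-handle unqualified names that themselves contain a dot.

```rust
/// A named reference to a qualified field in a schema (sketch).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Column {
    /// relation/table name.
    pub relation: Option<String>,
    /// field/column name.
    pub name: String,
}

impl Column {
    /// Parses "t.a" into relation Some("t") and name "a"; "a" has no relation.
    pub fn from_qualified_name(s: &str) -> Self {
        match s.split_once('.') {
            Some((rel, name)) => Column {
                relation: Some(rel.to_string()),
                name: name.to_string(),
            },
            None => Column {
                relation: None,
                name: s.to_string(),
            },
        }
    }

    /// Renders "relation.name" when qualified, plain "name" otherwise.
    pub fn flat_name(&self) -> String {
        match &self.relation {
            Some(r) => format!("{}.{}", r, self.name),
            None => self.name.clone(),
        }
    }
}
```

For example, `Column::from_qualified_name("test.b")` has relation `Some("test")` and name `b`, and `flat_name()` turns it back into `test.b`.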



