You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/17 15:03:45 UTC

[GitHub] [arrow-datafusion] waynexia commented on a diff in pull request #4258: Add ambiguous check for join

waynexia commented on code in PR #4258:
URL: https://github.com/apache/arrow-datafusion/pull/4258#discussion_r1025284507


##########
datafusion/sql/src/planner.rs:
##########
@@ -3005,6 +3011,66 @@ fn extract_possible_join_keys(
     }
 }
 
+/// Ensure any column reference of the expression is determined.
+/// Assume we have two schema:
+/// schema1: a, b ,c
+/// schema2: a, d, e
+///
+/// `schema1.a + schema2.a` is determined.
+/// `a + d` is not determined, because `a` may come from schema1 or schema2.
+fn ensure_any_column_reference_is_determined(
+    expr: &Expr,
+    schemas: &[DFSchemaRef],
+) -> Result<()> {
+    if schemas.len() == 1 {
+        return Ok(());
+    }
+
+    // extract columns both in one more schemas.
+    let mut column_count_map: HashMap<String, usize> = HashMap::new();
+    schemas
+        .iter()
+        .flat_map(|schema| schema.fields())
+        .for_each(|field| {
+            column_count_map
+                .entry(field.name().into())
+                .and_modify(|v| *v += 1)
+                .or_insert(1usize);
+        });
+
+    let duplicated_column_set = column_count_map
+        .iter()
+        .filter(|(_, count)| **count > 1usize)
+        .map(|(column, _)| column)
+        .cloned()
+        .collect::<HashSet<String>>();
+
+    // check if there is ambiguous column.
+    let using_columns = expr.to_columns()?;
+    let ambiguous_column = using_columns.iter().find(|column| {
+        column.relation.is_none() && duplicated_column_set.contains(&column.name)
+    });
+
+    if let Some(column) = ambiguous_column {
+        let maybe_field = schemas
+            .iter()
+            .flat_map(|schema| {
+                schema
+                    .field_with_unqualified_name(&column.name)
+                    .map(|f| f.qualified_name())
+                    .ok()
+            })
+            .collect::<Vec<_>>();
+        Err(DataFusionError::Plan(format!(
+            "reference \'{}\' is ambiguous, could be {};",
+            column.name,
+            maybe_field.join(","),

Review Comment:
   Very concrete error hint 👍 



##########
datafusion/sql/src/planner.rs:
##########
@@ -3005,6 +3011,66 @@ fn extract_possible_join_keys(
     }
 }
 
+/// Ensure any column reference of the expression is determined.
+/// Assume we have two schema:
+/// schema1: a, b ,c
+/// schema2: a, d, e
+///
+/// `schema1.a + schema2.a` is determined.
+/// `a + d` is not determined, because `a` may come from schema1 or schema2.
+fn ensure_any_column_reference_is_determined(
+    expr: &Expr,
+    schemas: &[DFSchemaRef],
+) -> Result<()> {
+    if schemas.len() == 1 {
+        return Ok(());
+    }
+
+    // extract columns both in one more schemas.
+    let mut column_count_map: HashMap<String, usize> = HashMap::new();
+    schemas
+        .iter()
+        .flat_map(|schema| schema.fields())
+        .for_each(|field| {
+            column_count_map
+                .entry(field.name().into())
+                .and_modify(|v| *v += 1)
+                .or_insert(1usize);
+        });
+
+    let duplicated_column_set = column_count_map
+        .iter()
+        .filter(|(_, count)| **count > 1usize)
+        .map(|(column, _)| column)
+        .cloned()
+        .collect::<HashSet<String>>();

Review Comment:
   nit:
   ```suggestion
       let duplicated_column_set = column_count_map
           .into_iter()
           .filter_map(|(column, count)| if count > 1 { Some(column) } else { None })
           .collect::<HashSet<String>>();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org