You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/01 10:29:25 UTC

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #796: Convert unsupported conditions in left right join to filters

alamb commented on a change in pull request #796:
URL: https://github.com/apache/arrow-datafusion/pull/796#discussion_r680489263



##########
File path: datafusion/src/sql/planner.rs
##########
@@ -372,25 +373,91 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 // extract join keys
                 extract_join_keys(&expr, &mut keys, &mut filter);
 
+                let mut cols = HashSet::new();
+                exprlist_to_columns(&filter, &mut cols)?;
+
                 let (left_keys, right_keys): (Vec<Column>, Vec<Column>) =
                     keys.into_iter().unzip();
-                // return the logical plan representing the join
-                let join = LogicalPlanBuilder::from(left).join(
-                    right,
-                    join_type,
-                    (left_keys, right_keys),
-                )?;
 
+                // return the logical plan representing the join
                 if filter.is_empty() {
+                    let join = LogicalPlanBuilder::from(left).join(
+                        &right,
+                        join_type,
+                        (left_keys, right_keys),
+                    )?;
                     join.build()
                 } else if join_type == JoinType::Inner {
+                    let join = LogicalPlanBuilder::from(left).join(
+                        &right,
+                        join_type,
+                        (left_keys, right_keys),
+                    )?;
                     join.filter(
                         filter
                             .iter()
                             .skip(1)
                             .fold(filter[0].clone(), |acc, e| acc.and(e.clone())),
                     )?
                     .build()
+                }
+                // Left join with all non-equijoin expressions from the right

Review comment:
       > I am curious for left join, is it valid to have the non-equijoin filter expression filter on columns from the left side as well?
   
   Yes, as @Dandandan  mentioned it is perfectly valid (and DataFusion today reasonably says "not supported")
   
   However, to implement the semantics correctly, again as @Dandandan , you have to implement the predicate within the join itself. 
   
   The reason for this is if you have a query like `A LEFT JOIN B` the semantics mean *output at least one entry for all rows in A; If the ON condition evaluates to FALSE then output the row from A with the values for B set to NULL"
   
   The technique in this PR is to recognize you can push filtering on the non-preserved side below the join using a filter

##########
File path: benchmarks/src/bin/tpch.rs
##########
@@ -744,6 +744,11 @@ mod tests {
         run_query(12).await
     }
 
+    #[tokio::test]
+    async fn run_q13() -> Result<()> {

Review comment:
       ❤️ 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org