You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/30 21:28:06 UTC

[GitHub] [arrow-datafusion] Dandandan opened a new pull request #796: Unsupported conditions in join

Dandandan opened a new pull request #796:
URL: https://github.com/apache/arrow-datafusion/pull/796


   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #.
   
    # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   Move more unsupported conditions in LEFT join to filters.
   This is needed to get TPC-H query 13 to run.
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #796: Convert unsupported conditions in left right join to filters

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #796:
URL: https://github.com/apache/arrow-datafusion/pull/796#issuecomment-890340547


   I plan to review this tomorrow


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #796: Convert unsupported conditions in left right join to filters

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #796:
URL: https://github.com/apache/arrow-datafusion/pull/796#discussion_r680472226



##########
File path: datafusion/src/sql/planner.rs
##########
@@ -372,25 +373,91 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 // extract join keys
                 extract_join_keys(&expr, &mut keys, &mut filter);
 
+                let mut cols = HashSet::new();
+                exprlist_to_columns(&filter, &mut cols)?;
+
                 let (left_keys, right_keys): (Vec<Column>, Vec<Column>) =
                     keys.into_iter().unzip();
-                // return the logical plan representing the join
-                let join = LogicalPlanBuilder::from(left).join(
-                    right,
-                    join_type,
-                    (left_keys, right_keys),
-                )?;
 
+                // return the logical plan representing the join
                 if filter.is_empty() {
+                    let join = LogicalPlanBuilder::from(left).join(
+                        &right,
+                        join_type,
+                        (left_keys, right_keys),
+                    )?;
                     join.build()
                 } else if join_type == JoinType::Inner {
+                    let join = LogicalPlanBuilder::from(left).join(
+                        &right,
+                        join_type,
+                        (left_keys, right_keys),
+                    )?;
                     join.filter(
                         filter
                             .iter()
                             .skip(1)
                             .fold(filter[0].clone(), |acc, e| acc.and(e.clone())),
                     )?
                     .build()
+                }
+                // Left join with all non-equijoin expressions from the right

Review comment:
       If you would push the filters to the left logical plan, it would filter out the left keys.
   
   For example:
   
   
   ```
   left | right
   1    |   2
   2    |   4
   ```
   
   if we would push the filter `left > 2` to the left side it would filter out the left rows, but the left join requires to result in all left keys (and only match / not match to values on the right).
   
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp merged pull request #796: Convert unsupported conditions in left right join to filters

Posted by GitBox <gi...@apache.org>.
houqp merged pull request #796:
URL: https://github.com/apache/arrow-datafusion/pull/796


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on a change in pull request #796: Convert unsupported conditions in left right join to filters

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #796:
URL: https://github.com/apache/arrow-datafusion/pull/796#discussion_r680634953



##########
File path: datafusion/src/sql/planner.rs
##########
@@ -372,25 +373,91 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 // extract join keys
                 extract_join_keys(&expr, &mut keys, &mut filter);
 
+                let mut cols = HashSet::new();
+                exprlist_to_columns(&filter, &mut cols)?;
+
                 let (left_keys, right_keys): (Vec<Column>, Vec<Column>) =
                     keys.into_iter().unzip();
-                // return the logical plan representing the join
-                let join = LogicalPlanBuilder::from(left).join(
-                    right,
-                    join_type,
-                    (left_keys, right_keys),
-                )?;
 
+                // return the logical plan representing the join
                 if filter.is_empty() {
+                    let join = LogicalPlanBuilder::from(left).join(
+                        &right,
+                        join_type,
+                        (left_keys, right_keys),
+                    )?;
                     join.build()
                 } else if join_type == JoinType::Inner {
+                    let join = LogicalPlanBuilder::from(left).join(
+                        &right,
+                        join_type,
+                        (left_keys, right_keys),
+                    )?;
                     join.filter(
                         filter
                             .iter()
                             .skip(1)
                             .fold(filter[0].clone(), |acc, e| acc.and(e.clone())),
                     )?
                     .build()
+                }
+                // Left join with all non-equijoin expressions from the right

Review comment:
       Thanks @Dandandan and @alamb for the detailed and clear explanation, makes total sense to me :+1: 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #796: Convert unsupported conditions in left right join to filters

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #796:
URL: https://github.com/apache/arrow-datafusion/pull/796#discussion_r680472226



##########
File path: datafusion/src/sql/planner.rs
##########
@@ -372,25 +373,91 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 // extract join keys
                 extract_join_keys(&expr, &mut keys, &mut filter);
 
+                let mut cols = HashSet::new();
+                exprlist_to_columns(&filter, &mut cols)?;
+
                 let (left_keys, right_keys): (Vec<Column>, Vec<Column>) =
                     keys.into_iter().unzip();
-                // return the logical plan representing the join
-                let join = LogicalPlanBuilder::from(left).join(
-                    right,
-                    join_type,
-                    (left_keys, right_keys),
-                )?;
 
+                // return the logical plan representing the join
                 if filter.is_empty() {
+                    let join = LogicalPlanBuilder::from(left).join(
+                        &right,
+                        join_type,
+                        (left_keys, right_keys),
+                    )?;
                     join.build()
                 } else if join_type == JoinType::Inner {
+                    let join = LogicalPlanBuilder::from(left).join(
+                        &right,
+                        join_type,
+                        (left_keys, right_keys),
+                    )?;
                     join.filter(
                         filter
                             .iter()
                             .skip(1)
                             .fold(filter[0].clone(), |acc, e| acc.and(e.clone())),
                     )?
                     .build()
+                }
+                // Left join with all non-equijoin expressions from the right

Review comment:
       If you would push the filters to the left logical plan, it would filter out the left keys.
   
   For example:
   
   
   ```
   left | right
   1    |   2
   2    |   4
   ```
   
   if we would push the filter `left > 2` to the left side it would filter out that row, but the left join requires to result in all left keys (and only match / not match to values on the right).
   
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #796: Convert unsupported conditions in left right join to filters

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #796:
URL: https://github.com/apache/arrow-datafusion/pull/796#discussion_r680489263



##########
File path: datafusion/src/sql/planner.rs
##########
@@ -372,25 +373,91 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 // extract join keys
                 extract_join_keys(&expr, &mut keys, &mut filter);
 
+                let mut cols = HashSet::new();
+                exprlist_to_columns(&filter, &mut cols)?;
+
                 let (left_keys, right_keys): (Vec<Column>, Vec<Column>) =
                     keys.into_iter().unzip();
-                // return the logical plan representing the join
-                let join = LogicalPlanBuilder::from(left).join(
-                    right,
-                    join_type,
-                    (left_keys, right_keys),
-                )?;
 
+                // return the logical plan representing the join
                 if filter.is_empty() {
+                    let join = LogicalPlanBuilder::from(left).join(
+                        &right,
+                        join_type,
+                        (left_keys, right_keys),
+                    )?;
                     join.build()
                 } else if join_type == JoinType::Inner {
+                    let join = LogicalPlanBuilder::from(left).join(
+                        &right,
+                        join_type,
+                        (left_keys, right_keys),
+                    )?;
                     join.filter(
                         filter
                             .iter()
                             .skip(1)
                             .fold(filter[0].clone(), |acc, e| acc.and(e.clone())),
                     )?
                     .build()
+                }
+                // Left join with all non-equijoin expressions from the right

Review comment:
       > I am curious for left join, is it valid to have the non-equijoin filter expression filter on columns from the left side as well?
   
   Yes, as @Dandandan  mentioned it is perfectly valid (and DataFusion today reasonably says "not supported")
   
   However, to implement the semantics correctly, again as @Dandandan , you have to implement the predicate within the join itself. 
   
   The reason for this is if you have a query like `A LEFT JOIN B` the semantics mean *output at least one entry for all rows in A; If the ON condition evaluates to FALSE then output the row from A with the values for B set to NULL"
   
   The technique in this PR is to recognize you can push filtering on the non-preserved side below the join using a filter

##########
File path: benchmarks/src/bin/tpch.rs
##########
@@ -744,6 +744,11 @@ mod tests {
         run_query(12).await
     }
 
+    #[tokio::test]
+    async fn run_q13() -> Result<()> {

Review comment:
       ❤️ 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on a change in pull request #796: Convert unsupported conditions in left right join to filters

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #796:
URL: https://github.com/apache/arrow-datafusion/pull/796#discussion_r680467346



##########
File path: datafusion/src/sql/planner.rs
##########
@@ -372,25 +373,91 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 // extract join keys
                 extract_join_keys(&expr, &mut keys, &mut filter);
 
+                let mut cols = HashSet::new();
+                exprlist_to_columns(&filter, &mut cols)?;
+
                 let (left_keys, right_keys): (Vec<Column>, Vec<Column>) =
                     keys.into_iter().unzip();
-                // return the logical plan representing the join
-                let join = LogicalPlanBuilder::from(left).join(
-                    right,
-                    join_type,
-                    (left_keys, right_keys),
-                )?;
 
+                // return the logical plan representing the join
                 if filter.is_empty() {
+                    let join = LogicalPlanBuilder::from(left).join(
+                        &right,
+                        join_type,
+                        (left_keys, right_keys),
+                    )?;
                     join.build()
                 } else if join_type == JoinType::Inner {
+                    let join = LogicalPlanBuilder::from(left).join(
+                        &right,
+                        join_type,
+                        (left_keys, right_keys),
+                    )?;
                     join.filter(
                         filter
                             .iter()
                             .skip(1)
                             .fold(filter[0].clone(), |acc, e| acc.and(e.clone())),
                     )?
                     .build()
+                }
+                // Left join with all non-equijoin expressions from the right

Review comment:
       I am curious for left join, is it valid to have the non-equijoin filter expression filter on columns from the left side as well?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #796: Convert unsupported conditions in left right join to filters

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #796:
URL: https://github.com/apache/arrow-datafusion/pull/796#discussion_r680475141



##########
File path: datafusion/src/sql/planner.rs
##########
@@ -372,25 +373,91 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 // extract join keys
                 extract_join_keys(&expr, &mut keys, &mut filter);
 
+                let mut cols = HashSet::new();
+                exprlist_to_columns(&filter, &mut cols)?;
+
                 let (left_keys, right_keys): (Vec<Column>, Vec<Column>) =
                     keys.into_iter().unzip();
-                // return the logical plan representing the join
-                let join = LogicalPlanBuilder::from(left).join(
-                    right,
-                    join_type,
-                    (left_keys, right_keys),
-                )?;
 
+                // return the logical plan representing the join
                 if filter.is_empty() {
+                    let join = LogicalPlanBuilder::from(left).join(
+                        &right,
+                        join_type,
+                        (left_keys, right_keys),
+                    )?;
                     join.build()
                 } else if join_type == JoinType::Inner {
+                    let join = LogicalPlanBuilder::from(left).join(
+                        &right,
+                        join_type,
+                        (left_keys, right_keys),
+                    )?;
                     join.filter(
                         filter
                             .iter()
                             .skip(1)
                             .fold(filter[0].clone(), |acc, e| acc.and(e.clone())),
                     )?
                     .build()
+                }
+                // Left join with all non-equijoin expressions from the right

Review comment:
       An option to support non-equijoin filter expression for left/left and right/right combinations would be to support filter expression in the join implementation.
   
   For efficiency, for expressions only referring to the left (or right for right join) side, this can be done before loading those values into the hashmap and directly produce values for the not matching left + null values for the right - this saves memory / time if the additional condition filters out a lot.
   
   Filters that include both the left and right columns (e.g.  `left_col > right_col`) could be done inside the probing loop when producing the (left, right) tuples or as some "post processing" by nullifying the non-matching rows on the right side.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org