You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/29 19:07:58 UTC

[GitHub] [arrow] drusso opened a new pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

drusso opened a new pull request #9364:
URL: https://github.com/apache/arrow/pull/9364


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] drusso commented on a change in pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
drusso commented on a change in pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#discussion_r567486688



##########
File path: rust/datafusion/src/sql/planner.rs
##########
@@ -421,16 +415,85 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
         // The SELECT expressions, with wildcards expanded.
         let select_exprs = self.prepare_select_exprs(&plan, &select.projection)?;
 
+        // Optionally the HAVING expression.
+        let having_expr_opt = if let Some(having_expr) = &select.having {
+            let having_expr = self.sql_expr_to_logical_expr(having_expr)?;
+
+            // This step "dereferences" any aliases in the HAVING clause.
+            //
+            // This is how we support queries with HAVING expressions that
+            // refer to aliased columns.
+            //
+            // For example:
+            //
+            //   SELECT c1 AS m FROM t HAVING m > 10;
+            //   SELECT c1, MAX(c2) AS m FROM t GROUP BY c1 HAVING m > 10;
+            //
+            // are rewritten as, respectively:
+            //
+            //   SELECT c1 AS m FROM t HAVING c1 > 10;
+            //   SELECT c1, MAX(c2) AS m FROM t GROUP BY c1 HAVING MAX(c2) > 10;
+            //
+            let having_expr =
+                resolve_aliases_to_exprs(&having_expr, &extract_aliases(&select_exprs))?;
+
+            Some(having_expr)
+        } else {
+            None
+        };
+
+        // The outer expressions we will search through for
+        // aggregates. Aggregates may be sourced from the SELECT...
+        let mut aggr_expr_haystack = select_exprs.clone();
+
+        // ... or from the HAVING.
+        if let Some(having_expr) = &having_expr_opt {
+            aggr_expr_haystack.push(having_expr.clone());
+        }
+
         // All of the aggregate expressions (deduplicated).
-        let aggr_exprs = find_aggregate_exprs(&select_exprs);
+        let aggr_exprs = find_aggregate_exprs(&aggr_expr_haystack);
 
-        let (plan, select_exprs_post_aggr) =
+        let (plan, select_exprs_post_aggr, having_expr_post_aggr_opt) =
             if !select.group_by.is_empty() || !aggr_exprs.is_empty() {
-                self.aggregate(&plan, &select_exprs, &select.group_by, &aggr_exprs)?
+                self.aggregate(
+                    &plan,
+                    &select_exprs,
+                    &having_expr_opt,
+                    &select.group_by,
+                    &aggr_exprs,
+                )?
             } else {
-                (plan, select_exprs)
+                if let Some(having_expr) = &having_expr_opt {
+                    let available_columns = select_exprs
+                        .iter()
+                        .map(|expr| expr_as_column_expr(expr, &plan))
+                        .collect::<Result<Vec<Expr>>>()?;
+
+                    // Ensure the HAVING expression is using only columns
+                    // provided by the SELECT.
+                    if !can_columns_satisfy_exprs(
+                        &available_columns,
+                        &vec![having_expr.clone()],

Review comment:
       I didn't end up making this change since the callee is expecting a `&Vec`. I gather we might generally prefer slices rather than vectors (please correct me if I am mistaken!), and since there's a handful of functions in ./utils.rs that can be updated, I can follow up with another PR to address them all in one shot. 
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#issuecomment-772730164


   🎉 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#issuecomment-772685101


   Looks like github is operating in degraded fashion: https://www.githubstatus.com/


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#issuecomment-772730565


   🎉 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#issuecomment-771222900


   I plan to review this carefully tomorrow


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan edited a comment on pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#issuecomment-770023513


   This is looking good @drusso ! I think it can use some tests with example (tabular) input as well? To make sure the results are as expected. There are some more end to end tests in the tests directory.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] drusso commented on a change in pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
drusso commented on a change in pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#discussion_r567506751



##########
File path: rust/datafusion/src/sql/planner.rs
##########
@@ -421,16 +415,85 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
         // The SELECT expressions, with wildcards expanded.
         let select_exprs = self.prepare_select_exprs(&plan, &select.projection)?;
 
+        // Optionally the HAVING expression.
+        let having_expr_opt = if let Some(having_expr) = &select.having {
+            let having_expr = self.sql_expr_to_logical_expr(having_expr)?;
+
+            // This step "dereferences" any aliases in the HAVING clause.
+            //
+            // This is how we support queries with HAVING expressions that
+            // refer to aliased columns.
+            //
+            // For example:
+            //
+            //   SELECT c1 AS m FROM t HAVING m > 10;
+            //   SELECT c1, MAX(c2) AS m FROM t GROUP BY c1 HAVING m > 10;
+            //
+            // are rewritten as, respectively:
+            //
+            //   SELECT c1 AS m FROM t HAVING c1 > 10;
+            //   SELECT c1, MAX(c2) AS m FROM t GROUP BY c1 HAVING MAX(c2) > 10;
+            //
+            let having_expr =
+                resolve_aliases_to_exprs(&having_expr, &extract_aliases(&select_exprs))?;
+
+            Some(having_expr)
+        } else {
+            None
+        };
+
+        // The outer expressions we will search through for
+        // aggregates. Aggregates may be sourced from the SELECT...
+        let mut aggr_expr_haystack = select_exprs.clone();
+
+        // ... or from the HAVING.
+        if let Some(having_expr) = &having_expr_opt {
+            aggr_expr_haystack.push(having_expr.clone());
+        }
+
         // All of the aggregate expressions (deduplicated).
-        let aggr_exprs = find_aggregate_exprs(&select_exprs);
+        let aggr_exprs = find_aggregate_exprs(&aggr_expr_haystack);
 
-        let (plan, select_exprs_post_aggr) =
+        let (plan, select_exprs_post_aggr, having_expr_post_aggr_opt) =
             if !select.group_by.is_empty() || !aggr_exprs.is_empty() {
-                self.aggregate(&plan, &select_exprs, &select.group_by, &aggr_exprs)?
+                self.aggregate(
+                    &plan,
+                    &select_exprs,
+                    &having_expr_opt,
+                    &select.group_by,
+                    &aggr_exprs,
+                )?
             } else {
-                (plan, select_exprs)
+                if let Some(having_expr) = &having_expr_opt {
+                    let available_columns = select_exprs
+                        .iter()
+                        .map(|expr| expr_as_column_expr(expr, &plan))
+                        .collect::<Result<Vec<Expr>>>()?;
+
+                    // Ensure the HAVING expression is using only columns
+                    // provided by the SELECT.
+                    if !can_columns_satisfy_exprs(
+                        &available_columns,
+                        &vec![having_expr.clone()],

Review comment:
       Great, thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] drusso commented on a change in pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
drusso commented on a change in pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#discussion_r567507844



##########
File path: rust/datafusion/src/sql/planner.rs
##########
@@ -421,16 +415,85 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
         // The SELECT expressions, with wildcards expanded.
         let select_exprs = self.prepare_select_exprs(&plan, &select.projection)?;
 
+        // Optionally the HAVING expression.
+        let having_expr_opt = if let Some(having_expr) = &select.having {

Review comment:
       Updated in c04743c. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#discussion_r567063657



##########
File path: rust/datafusion/src/sql/planner.rs
##########
@@ -421,16 +415,85 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
         // The SELECT expressions, with wildcards expanded.
         let select_exprs = self.prepare_select_exprs(&plan, &select.projection)?;
 
+        // Optionally the HAVING expression.
+        let having_expr_opt = if let Some(having_expr) = &select.having {

Review comment:
       This probably could use `Option::map` instead of `if let`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb closed pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9364:
URL: https://github.com/apache/arrow/pull/9364


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#discussion_r567487475



##########
File path: rust/datafusion/src/sql/planner.rs
##########
@@ -421,16 +415,85 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
         // The SELECT expressions, with wildcards expanded.
         let select_exprs = self.prepare_select_exprs(&plan, &select.projection)?;
 
+        // Optionally the HAVING expression.
+        let having_expr_opt = if let Some(having_expr) = &select.having {
+            let having_expr = self.sql_expr_to_logical_expr(having_expr)?;
+
+            // This step "dereferences" any aliases in the HAVING clause.
+            //
+            // This is how we support queries with HAVING expressions that
+            // refer to aliased columns.
+            //
+            // For example:
+            //
+            //   SELECT c1 AS m FROM t HAVING m > 10;
+            //   SELECT c1, MAX(c2) AS m FROM t GROUP BY c1 HAVING m > 10;
+            //
+            // are rewritten as, respectively:
+            //
+            //   SELECT c1 AS m FROM t HAVING c1 > 10;
+            //   SELECT c1, MAX(c2) AS m FROM t GROUP BY c1 HAVING MAX(c2) > 10;
+            //
+            let having_expr =
+                resolve_aliases_to_exprs(&having_expr, &extract_aliases(&select_exprs))?;
+
+            Some(having_expr)
+        } else {
+            None
+        };
+
+        // The outer expressions we will search through for
+        // aggregates. Aggregates may be sourced from the SELECT...
+        let mut aggr_expr_haystack = select_exprs.clone();
+
+        // ... or from the HAVING.
+        if let Some(having_expr) = &having_expr_opt {
+            aggr_expr_haystack.push(having_expr.clone());
+        }
+
         // All of the aggregate expressions (deduplicated).
-        let aggr_exprs = find_aggregate_exprs(&select_exprs);
+        let aggr_exprs = find_aggregate_exprs(&aggr_expr_haystack);
 
-        let (plan, select_exprs_post_aggr) =
+        let (plan, select_exprs_post_aggr, having_expr_post_aggr_opt) =
             if !select.group_by.is_empty() || !aggr_exprs.is_empty() {
-                self.aggregate(&plan, &select_exprs, &select.group_by, &aggr_exprs)?
+                self.aggregate(
+                    &plan,
+                    &select_exprs,
+                    &having_expr_opt,
+                    &select.group_by,
+                    &aggr_exprs,
+                )?
             } else {
-                (plan, select_exprs)
+                if let Some(having_expr) = &having_expr_opt {
+                    let available_columns = select_exprs
+                        .iter()
+                        .map(|expr| expr_as_column_expr(expr, &plan))
+                        .collect::<Result<Vec<Expr>>>()?;
+
+                    // Ensure the HAVING expression is using only columns
+                    // provided by the SELECT.
+                    if !can_columns_satisfy_exprs(
+                        &available_columns,
+                        &vec![having_expr.clone()],

Review comment:
       FYI, I openedhttps://github.com/apache/arrow/pull/9380 to address this!

##########
File path: rust/datafusion/src/sql/planner.rs
##########
@@ -421,16 +415,85 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
         // The SELECT expressions, with wildcards expanded.
         let select_exprs = self.prepare_select_exprs(&plan, &select.projection)?;
 
+        // Optionally the HAVING expression.
+        let having_expr_opt = if let Some(having_expr) = &select.having {
+            let having_expr = self.sql_expr_to_logical_expr(having_expr)?;
+
+            // This step "dereferences" any aliases in the HAVING clause.
+            //
+            // This is how we support queries with HAVING expressions that
+            // refer to aliased columns.
+            //
+            // For example:
+            //
+            //   SELECT c1 AS m FROM t HAVING m > 10;
+            //   SELECT c1, MAX(c2) AS m FROM t GROUP BY c1 HAVING m > 10;
+            //
+            // are rewritten as, respectively:
+            //
+            //   SELECT c1 AS m FROM t HAVING c1 > 10;
+            //   SELECT c1, MAX(c2) AS m FROM t GROUP BY c1 HAVING MAX(c2) > 10;
+            //
+            let having_expr =
+                resolve_aliases_to_exprs(&having_expr, &extract_aliases(&select_exprs))?;
+
+            Some(having_expr)
+        } else {
+            None
+        };
+
+        // The outer expressions we will search through for
+        // aggregates. Aggregates may be sourced from the SELECT...
+        let mut aggr_expr_haystack = select_exprs.clone();
+
+        // ... or from the HAVING.
+        if let Some(having_expr) = &having_expr_opt {
+            aggr_expr_haystack.push(having_expr.clone());
+        }
+
         // All of the aggregate expressions (deduplicated).
-        let aggr_exprs = find_aggregate_exprs(&select_exprs);
+        let aggr_exprs = find_aggregate_exprs(&aggr_expr_haystack);
 
-        let (plan, select_exprs_post_aggr) =
+        let (plan, select_exprs_post_aggr, having_expr_post_aggr_opt) =
             if !select.group_by.is_empty() || !aggr_exprs.is_empty() {
-                self.aggregate(&plan, &select_exprs, &select.group_by, &aggr_exprs)?
+                self.aggregate(
+                    &plan,
+                    &select_exprs,
+                    &having_expr_opt,
+                    &select.group_by,
+                    &aggr_exprs,
+                )?
             } else {
-                (plan, select_exprs)
+                if let Some(having_expr) = &having_expr_opt {
+                    let available_columns = select_exprs
+                        .iter()
+                        .map(|expr| expr_as_column_expr(expr, &plan))
+                        .collect::<Result<Vec<Expr>>>()?;
+
+                    // Ensure the HAVING expression is using only columns
+                    // provided by the SELECT.
+                    if !can_columns_satisfy_exprs(
+                        &available_columns,
+                        &vec![having_expr.clone()],

Review comment:
       FYI, I opened https://github.com/apache/arrow/pull/9380 to address this!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] drusso commented on pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
drusso commented on pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#issuecomment-771639766


   Thanks @jorgecarleitao and @alamb! 
   
   I added a note in the README. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#issuecomment-772592093


   Thanks @drusso !


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#discussion_r568781006



##########
File path: rust/datafusion/src/sql/planner.rs
##########
@@ -1233,6 +1321,257 @@ mod tests {
         quick_test(sql, expected);
     }
 
+    #[test]
+    fn select_with_having() {
+        let sql = "SELECT id, age
+                   FROM person
+                   HAVING age > 100 AND age < 200";
+        let expected = "Projection: #id, #age\
+                        \n  Filter: #age Gt Int64(100) And #age Lt Int64(200)\
+                        \n    TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_with_having_referencing_column_not_in_select() {
+        let sql = "SELECT id, age
+                   FROM person
+                   HAVING first_name = 'M'";
+        let err = logical_plan(sql).expect_err("query should have failed");
+        assert_eq!(
+            "Plan(\"Having references column(s) not provided by the select\")",
+            format!("{:?}", err)
+        );
+    }
+
+    #[test]
+    fn select_with_having_referencing_column_nested_in_select_expression() {
+        let sql = "SELECT id, age + 1
+                   FROM person
+                   HAVING age > 100";
+        let err = logical_plan(sql).expect_err("query should have failed");
+        assert_eq!(
+            "Plan(\"Having references column(s) not provided by the select\")",
+            format!("{:?}", err)
+        );
+    }
+
+    #[test]
+    fn select_with_having_with_aggregate_not_in_select() {
+        let sql = "SELECT first_name
+                   FROM person
+                   HAVING MAX(age) > 100";
+        let err = logical_plan(sql).expect_err("query should have failed");
+        assert_eq!(
+            "Plan(\"Projection references non-aggregate values\")",
+            format!("{:?}", err)
+        );
+    }
+
+    #[test]
+    fn select_aggregate_with_having_that_reuses_aggregate() {
+        let sql = "SELECT MAX(age)
+                   FROM person
+                   HAVING MAX(age) < 30";
+        let expected = "Filter: #MAX(age) Lt Int64(30)\
+                        \n  Aggregate: groupBy=[[]], aggr=[[MAX(#age)]]\
+                        \n    TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_having_with_aggregate_not_in_select() {
+        let sql = "SELECT MAX(age)
+                   FROM person
+                   HAVING MAX(first_name) > 'M'";
+        let expected = "Projection: #MAX(age)\
+                        \n  Filter: #MAX(first_name) Gt Utf8(\"M\")\
+                        \n    Aggregate: groupBy=[[]], aggr=[[MAX(#age), MAX(#first_name)]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_having_referencing_column_not_in_select() {
+        let sql = "SELECT COUNT(*)
+                   FROM person
+                   HAVING first_name = 'M'";
+        let err = logical_plan(sql).expect_err("query should have failed");
+        assert_eq!(
+            "Plan(\"Having references non-aggregate values\")",
+            format!("{:?}", err)
+        );
+    }
+
+    #[test]
+    fn select_aggregate_aliased_with_having_referencing_aggregate_by_its_alias() {
+        let sql = "SELECT MAX(age) as max_age
+                   FROM person
+                   HAVING max_age < 30";
+        let expected = "Projection: #MAX(age) AS max_age\
+                        \n  Filter: #MAX(age) Lt Int64(30)\
+                        \n    Aggregate: groupBy=[[]], aggr=[[MAX(#age)]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_aliased_with_having_that_reuses_aggregate_but_not_by_its_alias() {
+        let sql = "SELECT MAX(age) as max_age
+                   FROM person
+                   HAVING MAX(age) < 30";
+        let expected = "Projection: #MAX(age) AS max_age\
+                        \n  Filter: #MAX(age) Lt Int64(30)\
+                        \n    Aggregate: groupBy=[[]], aggr=[[MAX(#age)]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having() {
+        let sql = "SELECT first_name, MAX(age)
+                   FROM person
+                   GROUP BY first_name
+                   HAVING first_name = 'M'";
+        let expected = "Filter: #first_name Eq Utf8(\"M\")\
+                        \n  Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age)]]\
+                        \n    TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having_using_column_by_alias() {
+        let sql = "SELECT first_name AS fn, MAX(age)
+                   FROM person
+                   GROUP BY first_name
+                   HAVING MAX(age) > 2 AND fn = 'M'";
+        let expected = "Projection: #first_name AS fn, #MAX(age)\
+                        \n  Filter: #MAX(age) Gt Int64(2) And #first_name Eq Utf8(\"M\")\
+                        \n    Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age)]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having_using_columns_with_and_without_their_aliases(
+    ) {
+        let sql = "SELECT first_name AS fn, MAX(age) AS max_age
+                   FROM person
+                   GROUP BY first_name
+                   HAVING MAX(age) > 2 AND max_age < 5 AND first_name = 'M' AND fn = 'N'";
+        let expected = "Projection: #first_name AS fn, #MAX(age) AS max_age\
+                        \n  Filter: #MAX(age) Gt Int64(2) And #MAX(age) Lt Int64(5) And #first_name Eq Utf8(\"M\") And #first_name Eq Utf8(\"N\")\
+                        \n    Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age)]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having_that_reuses_aggregate() {
+        let sql = "SELECT first_name, MAX(age)
+                   FROM person
+                   GROUP BY first_name
+                   HAVING MAX(age) > 100";
+        let expected = "Filter: #MAX(age) Gt Int64(100)\
+                        \n  Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age)]]\
+                        \n    TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having_referencing_column_not_in_group_by() {
+        let sql = "SELECT first_name, MAX(age)
+                   FROM person
+                   GROUP BY first_name
+                   HAVING MAX(age) > 10 AND last_name = 'M'";
+        let err = logical_plan(sql).expect_err("query should have failed");
+        assert_eq!(
+            "Plan(\"Having references non-aggregate values\")",
+            format!("{:?}", err)
+        );
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having_that_reuses_aggregate_multiple_times() {
+        let sql = "SELECT first_name, MAX(age)
+                   FROM person
+                   GROUP BY first_name
+                   HAVING MAX(age) > 100 AND MAX(age) < 200";
+        let expected = "Filter: #MAX(age) Gt Int64(100) And #MAX(age) Lt Int64(200)\
+                        \n  Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age)]]\
+                        \n    TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having_using_aggreagate_not_in_select() {
+        let sql = "SELECT first_name, MAX(age)
+                   FROM person
+                   GROUP BY first_name
+                   HAVING MAX(age) > 100 AND MIN(id) < 50";
+        let expected = "Projection: #first_name, #MAX(age)\
+                        \n  Filter: #MAX(age) Gt Int64(100) And #MIN(id) Lt Int64(50)\
+                        \n    Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age), MIN(#id)]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_aliased_with_group_by_with_having_referencing_aggregate_by_its_alias(
+    ) {
+        let sql = "SELECT first_name, MAX(age) AS max_age
+                   FROM person
+                   GROUP BY first_name
+                   HAVING max_age > 100";
+        let expected = "Projection: #first_name, #MAX(age) AS max_age\
+                        \n  Filter: #MAX(age) Gt Int64(100)\
+                        \n    Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age)]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_compound_aliased_with_group_by_with_having_referencing_compound_aggregate_by_its_alias(
+    ) {
+        let sql = "SELECT first_name, MAX(age) + 1 AS max_age_plus_one
+                   FROM person
+                   GROUP BY first_name
+                   HAVING max_age_plus_one > 100";
+        let expected =
+            "Projection: #first_name, #MAX(age) Plus Int64(1) AS max_age_plus_one\
+                        \n  Filter: #MAX(age) Plus Int64(1) Gt Int64(100)\
+                        \n    Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age)]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having_using_derived_column_aggreagate_not_in_select(
+    ) {
+        let sql = "SELECT first_name, MAX(age)
+                   FROM person
+                   GROUP BY first_name
+                   HAVING MAX(age) > 100 AND MIN(id - 2) < 50";
+        let expected = "Projection: #first_name, #MAX(age)\
+                        \n  Filter: #MAX(age) Gt Int64(100) And #MIN(id Minus Int64(2)) Lt Int64(50)\
+                        \n    Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age), MIN(#id Minus Int64(2))]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having_using_count_star_not_in_select() {
+        let sql = "SELECT first_name, MAX(age)
+                   FROM person
+                   GROUP BY first_name
+                   HAVING MAX(age) > 100 AND COUNT(*) < 50";
+        let expected = "Projection: #first_name, #MAX(age)\
+                        \n  Filter: #MAX(age) Gt Int64(100) And #COUNT(UInt8(1)) Lt Int64(50)\
+                        \n    Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age), COUNT(UInt8(1))]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+

Review comment:
       These test cases are pretty amazing @drusso  -- thank you. The only one I think we should also add is a query that has both a `HAVING` and `WHERE` clause
   
   Something like 
   
   ```
    "SELECT first_name, MAX(age)
                      FROM person
                      WHERE id > 5
                      GROUP BY first_name
                      HAVING MAX(age) <100";
   ```
   
   And 
   ```
    "SELECT first_name, MAX(age)
                      FROM person
                      WHERE id > 5 AND age > 18
                      GROUP BY first_name
                      HAVING MAX(age) <100";
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#discussion_r567055101



##########
File path: rust/datafusion/src/sql/planner.rs
##########
@@ -421,16 +415,85 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
         // The SELECT expressions, with wildcards expanded.
         let select_exprs = self.prepare_select_exprs(&plan, &select.projection)?;
 
+        // Optionally the HAVING expression.
+        let having_expr_opt = if let Some(having_expr) = &select.having {
+            let having_expr = self.sql_expr_to_logical_expr(having_expr)?;
+
+            // This step "dereferences" any aliases in the HAVING clause.
+            //
+            // This is how we support queries with HAVING expressions that
+            // refer to aliased columns.
+            //
+            // For example:
+            //
+            //   SELECT c1 AS m FROM t HAVING m > 10;
+            //   SELECT c1, MAX(c2) AS m FROM t GROUP BY c1 HAVING m > 10;
+            //
+            // are rewritten as, respectively:
+            //
+            //   SELECT c1 AS m FROM t HAVING c1 > 10;
+            //   SELECT c1, MAX(c2) AS m FROM t GROUP BY c1 HAVING MAX(c2) > 10;
+            //
+            let having_expr =
+                resolve_aliases_to_exprs(&having_expr, &extract_aliases(&select_exprs))?;
+
+            Some(having_expr)
+        } else {
+            None
+        };
+
+        // The outer expressions we will search through for
+        // aggregates. Aggregates may be sourced from the SELECT...
+        let mut aggr_expr_haystack = select_exprs.clone();
+
+        // ... or from the HAVING.
+        if let Some(having_expr) = &having_expr_opt {
+            aggr_expr_haystack.push(having_expr.clone());
+        }
+
         // All of the aggregate expressions (deduplicated).
-        let aggr_exprs = find_aggregate_exprs(&select_exprs);
+        let aggr_exprs = find_aggregate_exprs(&aggr_expr_haystack);
 
-        let (plan, select_exprs_post_aggr) =
+        let (plan, select_exprs_post_aggr, having_expr_post_aggr_opt) =
             if !select.group_by.is_empty() || !aggr_exprs.is_empty() {
-                self.aggregate(&plan, &select_exprs, &select.group_by, &aggr_exprs)?
+                self.aggregate(
+                    &plan,
+                    &select_exprs,
+                    &having_expr_opt,
+                    &select.group_by,
+                    &aggr_exprs,
+                )?
             } else {
-                (plan, select_exprs)
+                if let Some(having_expr) = &having_expr_opt {
+                    let available_columns = select_exprs
+                        .iter()
+                        .map(|expr| expr_as_column_expr(expr, &plan))
+                        .collect::<Result<Vec<Expr>>>()?;
+
+                    // Ensure the HAVING expression is using only columns
+                    // provided by the SELECT.
+                    if !can_columns_satisfy_exprs(
+                        &available_columns,
+                        &vec![having_expr.clone()],

Review comment:
       I believe this can be `&[having_expr.clone()]`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#issuecomment-772620858


   I am just waiting for the CI to finish on this PR and then I plan to merge it!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#discussion_r567415267



##########
File path: rust/datafusion/src/sql/utils.rs
##########
@@ -335,3 +336,35 @@ where
         },
     }
 }
+
+/// Returns mapping of each alias (`String`) to the expression (`Expr`) it is
+/// aliasing.
+pub(crate) fn extract_aliases(exprs: &Vec<Expr>) -> HashMap<String, Expr> {

Review comment:
       This can probably be something like the following if you wanted more idomatic rust style (I don't think it really matters in this case)
   
   ```suggestion
   pub(crate) fn extract_aliases(exprs: &[Expr]) -> HashMap<String, Expr> {
   ```
   
   Maybe even something fancier like anything that can become an iterator:
   
   ```
   pub(crate) fn extract_aliases(exprs: impl Iterator<Item=&Expr>) -> HashMap<String, Expr> {
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] drusso commented on pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
drusso commented on pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#issuecomment-771639766


   Thanks @jorgecarleitao and @alamb! 
   
   I added a note in the README. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] drusso commented on a change in pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
drusso commented on a change in pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#discussion_r567507800



##########
File path: rust/datafusion/src/sql/utils.rs
##########
@@ -335,3 +336,35 @@ where
         },
     }
 }
+
+/// Returns mapping of each alias (`String`) to the expression (`Expr`) it is
+/// aliasing.
+pub(crate) fn extract_aliases(exprs: &Vec<Expr>) -> HashMap<String, Expr> {

Review comment:
       Thanks for the tip! I hadn't realized there's automatic deref coercion from vectors to slices. 
   
   Updated in 3e52167. 
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] drusso commented on pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
drusso commented on pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#issuecomment-770478760


   @alamb @Dandandan I've updated the PR, let me know if there's anything outstanding. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#discussion_r568781006



##########
File path: rust/datafusion/src/sql/planner.rs
##########
@@ -1233,6 +1321,257 @@ mod tests {
         quick_test(sql, expected);
     }
 
+    #[test]
+    fn select_with_having() {
+        let sql = "SELECT id, age
+                   FROM person
+                   HAVING age > 100 AND age < 200";
+        let expected = "Projection: #id, #age\
+                        \n  Filter: #age Gt Int64(100) And #age Lt Int64(200)\
+                        \n    TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_with_having_referencing_column_not_in_select() {
+        let sql = "SELECT id, age
+                   FROM person
+                   HAVING first_name = 'M'";
+        let err = logical_plan(sql).expect_err("query should have failed");
+        assert_eq!(
+            "Plan(\"Having references column(s) not provided by the select\")",
+            format!("{:?}", err)
+        );
+    }
+
+    #[test]
+    fn select_with_having_referencing_column_nested_in_select_expression() {
+        let sql = "SELECT id, age + 1
+                   FROM person
+                   HAVING age > 100";
+        let err = logical_plan(sql).expect_err("query should have failed");
+        assert_eq!(
+            "Plan(\"Having references column(s) not provided by the select\")",
+            format!("{:?}", err)
+        );
+    }
+
+    #[test]
+    fn select_with_having_with_aggregate_not_in_select() {
+        let sql = "SELECT first_name
+                   FROM person
+                   HAVING MAX(age) > 100";
+        let err = logical_plan(sql).expect_err("query should have failed");
+        assert_eq!(
+            "Plan(\"Projection references non-aggregate values\")",
+            format!("{:?}", err)
+        );
+    }
+
+    #[test]
+    fn select_aggregate_with_having_that_reuses_aggregate() {
+        let sql = "SELECT MAX(age)
+                   FROM person
+                   HAVING MAX(age) < 30";
+        let expected = "Filter: #MAX(age) Lt Int64(30)\
+                        \n  Aggregate: groupBy=[[]], aggr=[[MAX(#age)]]\
+                        \n    TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_having_with_aggregate_not_in_select() {
+        let sql = "SELECT MAX(age)
+                   FROM person
+                   HAVING MAX(first_name) > 'M'";
+        let expected = "Projection: #MAX(age)\
+                        \n  Filter: #MAX(first_name) Gt Utf8(\"M\")\
+                        \n    Aggregate: groupBy=[[]], aggr=[[MAX(#age), MAX(#first_name)]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_having_referencing_column_not_in_select() {
+        let sql = "SELECT COUNT(*)
+                   FROM person
+                   HAVING first_name = 'M'";
+        let err = logical_plan(sql).expect_err("query should have failed");
+        assert_eq!(
+            "Plan(\"Having references non-aggregate values\")",
+            format!("{:?}", err)
+        );
+    }
+
+    #[test]
+    fn select_aggregate_aliased_with_having_referencing_aggregate_by_its_alias() {
+        let sql = "SELECT MAX(age) as max_age
+                   FROM person
+                   HAVING max_age < 30";
+        let expected = "Projection: #MAX(age) AS max_age\
+                        \n  Filter: #MAX(age) Lt Int64(30)\
+                        \n    Aggregate: groupBy=[[]], aggr=[[MAX(#age)]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_aliased_with_having_that_reuses_aggregate_but_not_by_its_alias() {
+        let sql = "SELECT MAX(age) as max_age
+                   FROM person
+                   HAVING MAX(age) < 30";
+        let expected = "Projection: #MAX(age) AS max_age\
+                        \n  Filter: #MAX(age) Lt Int64(30)\
+                        \n    Aggregate: groupBy=[[]], aggr=[[MAX(#age)]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having() {
+        let sql = "SELECT first_name, MAX(age)
+                   FROM person
+                   GROUP BY first_name
+                   HAVING first_name = 'M'";
+        let expected = "Filter: #first_name Eq Utf8(\"M\")\
+                        \n  Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age)]]\
+                        \n    TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having_using_column_by_alias() {
+        let sql = "SELECT first_name AS fn, MAX(age)
+                   FROM person
+                   GROUP BY first_name
+                   HAVING MAX(age) > 2 AND fn = 'M'";
+        let expected = "Projection: #first_name AS fn, #MAX(age)\
+                        \n  Filter: #MAX(age) Gt Int64(2) And #first_name Eq Utf8(\"M\")\
+                        \n    Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age)]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having_using_columns_with_and_without_their_aliases(
+    ) {
+        let sql = "SELECT first_name AS fn, MAX(age) AS max_age
+                   FROM person
+                   GROUP BY first_name
+                   HAVING MAX(age) > 2 AND max_age < 5 AND first_name = 'M' AND fn = 'N'";
+        let expected = "Projection: #first_name AS fn, #MAX(age) AS max_age\
+                        \n  Filter: #MAX(age) Gt Int64(2) And #MAX(age) Lt Int64(5) And #first_name Eq Utf8(\"M\") And #first_name Eq Utf8(\"N\")\
+                        \n    Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age)]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having_that_reuses_aggregate() {
+        let sql = "SELECT first_name, MAX(age)
+                   FROM person
+                   GROUP BY first_name
+                   HAVING MAX(age) > 100";
+        let expected = "Filter: #MAX(age) Gt Int64(100)\
+                        \n  Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age)]]\
+                        \n    TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having_referencing_column_not_in_group_by() {
+        let sql = "SELECT first_name, MAX(age)
+                   FROM person
+                   GROUP BY first_name
+                   HAVING MAX(age) > 10 AND last_name = 'M'";
+        let err = logical_plan(sql).expect_err("query should have failed");
+        assert_eq!(
+            "Plan(\"Having references non-aggregate values\")",
+            format!("{:?}", err)
+        );
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having_that_reuses_aggregate_multiple_times() {
+        let sql = "SELECT first_name, MAX(age)
+                   FROM person
+                   GROUP BY first_name
+                   HAVING MAX(age) > 100 AND MAX(age) < 200";
+        let expected = "Filter: #MAX(age) Gt Int64(100) And #MAX(age) Lt Int64(200)\
+                        \n  Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age)]]\
+                        \n    TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having_using_aggreagate_not_in_select() {
+        let sql = "SELECT first_name, MAX(age)
+                   FROM person
+                   GROUP BY first_name
+                   HAVING MAX(age) > 100 AND MIN(id) < 50";
+        let expected = "Projection: #first_name, #MAX(age)\
+                        \n  Filter: #MAX(age) Gt Int64(100) And #MIN(id) Lt Int64(50)\
+                        \n    Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age), MIN(#id)]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_aliased_with_group_by_with_having_referencing_aggregate_by_its_alias(
+    ) {
+        let sql = "SELECT first_name, MAX(age) AS max_age
+                   FROM person
+                   GROUP BY first_name
+                   HAVING max_age > 100";
+        let expected = "Projection: #first_name, #MAX(age) AS max_age\
+                        \n  Filter: #MAX(age) Gt Int64(100)\
+                        \n    Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age)]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_compound_aliased_with_group_by_with_having_referencing_compound_aggregate_by_its_alias(
+    ) {
+        let sql = "SELECT first_name, MAX(age) + 1 AS max_age_plus_one
+                   FROM person
+                   GROUP BY first_name
+                   HAVING max_age_plus_one > 100";
+        let expected =
+            "Projection: #first_name, #MAX(age) Plus Int64(1) AS max_age_plus_one\
+                        \n  Filter: #MAX(age) Plus Int64(1) Gt Int64(100)\
+                        \n    Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age)]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having_using_derived_column_aggreagate_not_in_select(
+    ) {
+        let sql = "SELECT first_name, MAX(age)
+                   FROM person
+                   GROUP BY first_name
+                   HAVING MAX(age) > 100 AND MIN(id - 2) < 50";
+        let expected = "Projection: #first_name, #MAX(age)\
+                        \n  Filter: #MAX(age) Gt Int64(100) And #MIN(id Minus Int64(2)) Lt Int64(50)\
+                        \n    Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age), MIN(#id Minus Int64(2))]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn select_aggregate_with_group_by_with_having_using_count_star_not_in_select() {
+        let sql = "SELECT first_name, MAX(age)
+                   FROM person
+                   GROUP BY first_name
+                   HAVING MAX(age) > 100 AND COUNT(*) < 50";
+        let expected = "Projection: #first_name, #MAX(age)\
+                        \n  Filter: #MAX(age) Gt Int64(100) And #COUNT(UInt8(1)) Lt Int64(50)\
+                        \n    Aggregate: groupBy=[[#first_name]], aggr=[[MAX(#age), COUNT(UInt8(1))]]\
+                        \n      TableScan: person projection=None";
+        quick_test(sql, expected);
+    }
+

Review comment:
       These test cases are pretty amazing @drusso  -- thank you. The only one I think we should also add is a query that has both a `HAVING` and `WHERE` clause
   
   Something like 
   
   ```
    "SELECT first_name, MAX(age)
                      FROM person
                      WHERE id > 5
                      GROUP BY first_name
                      HAVING MAX(age) <100";
   ```
   
   And 
   ```
    "SELECT first_name, MAX(age)
                      FROM person
                      WHERE id > 5 AND age > 18
                      GROUP BY first_name
                      HAVING MAX(age) <100";
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#issuecomment-769992523


   https://issues.apache.org/jira/browse/ARROW-11431


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#issuecomment-772729243


   All the rust tests are passing. Merging this in even though Travis is still running.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#issuecomment-772738853


   🚀


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#issuecomment-770023513


   This is looking good @drusso ! I think it can use some tests with example (tabular) input as well? To make sure the results are expected. There are some more end to end tests in the tests directory.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#discussion_r567487475



##########
File path: rust/datafusion/src/sql/planner.rs
##########
@@ -421,16 +415,85 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
         // The SELECT expressions, with wildcards expanded.
         let select_exprs = self.prepare_select_exprs(&plan, &select.projection)?;
 
+        // Optionally the HAVING expression.
+        let having_expr_opt = if let Some(having_expr) = &select.having {
+            let having_expr = self.sql_expr_to_logical_expr(having_expr)?;
+
+            // This step "dereferences" any aliases in the HAVING clause.
+            //
+            // This is how we support queries with HAVING expressions that
+            // refer to aliased columns.
+            //
+            // For example:
+            //
+            //   SELECT c1 AS m FROM t HAVING m > 10;
+            //   SELECT c1, MAX(c2) AS m FROM t GROUP BY c1 HAVING m > 10;
+            //
+            // are rewritten as, respectively:
+            //
+            //   SELECT c1 AS m FROM t HAVING c1 > 10;
+            //   SELECT c1, MAX(c2) AS m FROM t GROUP BY c1 HAVING MAX(c2) > 10;
+            //
+            let having_expr =
+                resolve_aliases_to_exprs(&having_expr, &extract_aliases(&select_exprs))?;
+
+            Some(having_expr)
+        } else {
+            None
+        };
+
+        // The outer expressions we will search through for
+        // aggregates. Aggregates may be sourced from the SELECT...
+        let mut aggr_expr_haystack = select_exprs.clone();
+
+        // ... or from the HAVING.
+        if let Some(having_expr) = &having_expr_opt {
+            aggr_expr_haystack.push(having_expr.clone());
+        }
+
         // All of the aggregate expressions (deduplicated).
-        let aggr_exprs = find_aggregate_exprs(&select_exprs);
+        let aggr_exprs = find_aggregate_exprs(&aggr_expr_haystack);
 
-        let (plan, select_exprs_post_aggr) =
+        let (plan, select_exprs_post_aggr, having_expr_post_aggr_opt) =
             if !select.group_by.is_empty() || !aggr_exprs.is_empty() {
-                self.aggregate(&plan, &select_exprs, &select.group_by, &aggr_exprs)?
+                self.aggregate(
+                    &plan,
+                    &select_exprs,
+                    &having_expr_opt,
+                    &select.group_by,
+                    &aggr_exprs,
+                )?
             } else {
-                (plan, select_exprs)
+                if let Some(having_expr) = &having_expr_opt {
+                    let available_columns = select_exprs
+                        .iter()
+                        .map(|expr| expr_as_column_expr(expr, &plan))
+                        .collect::<Result<Vec<Expr>>>()?;
+
+                    // Ensure the HAVING expression is using only columns
+                    // provided by the SELECT.
+                    if !can_columns_satisfy_exprs(
+                        &available_columns,
+                        &vec![having_expr.clone()],

Review comment:
       FYI, I opened https://github.com/apache/arrow/pulls to address this!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#issuecomment-771222900


   I plan to review this carefully tomorrow


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] drusso commented on pull request #9364: ARROW-11431: [Rust][DataFusion] Support the HAVING clause.

Posted by GitBox <gi...@apache.org>.
drusso commented on pull request #9364:
URL: https://github.com/apache/arrow/pull/9364#issuecomment-770044136


   Thanks!
   
   I will definitely add some more tests. There are also a couple of clippy errors for me to address. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org