You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/06/22 17:42:07 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #6748: Minor: Encapsulate `return_type` and `signature` in `

alamb opened a new pull request, #6748:
URL: https://github.com/apache/arrow-datafusion/pull/6748

   Reviewing the PR with whitespace blind diff I think makes it easier to see what is going on
   
   # Which issue does this PR close?
   
   Follow on to https://github.com/apache/arrow-datafusion/pull/6612
   
   # Rationale for this change
   
   
   This has been a pet peeve of mine and I think it makes working with this code harder than needed as  finding important functionality like "what is the signature of this function" harder because they aren't documented on https://docs.rs/datafusion-expr/26.0.0/datafusion_expr/window_function/enum.WindowFunction.html or https://docs.rs/datafusion-expr/26.0.0/datafusion_expr/aggregate_function/enum.AggregateFunction.html
   
   
   # What changes are included in this PR?
   
   1. Move `signature`, `return_type`  to functions on `WindowFunction` and `BuiltInWindowFunction` and `AggregateFunction`
   2. Mark the existing functions as deprecated
   
   # Are these changes tested?
   existing coverage
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   # Are there any user-facing changes?
   
   Hopefully easier to navigate code
   
   There is no API change as the existing methods still exist, they are just deprecated
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6748: Minor: Encapsulate `return_type` and `signature` in `

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #6748:
URL: https://github.com/apache/arrow-datafusion/pull/6748#discussion_r1238855825


##########
datafusion/expr/src/window_function.rs:
##########
@@ -246,10 +276,10 @@ mod tests {
     #[test]
     fn test_count_return_type() -> Result<()> {
         let fun = find_df_window_func("count").unwrap();
-        let observed = return_type(&fun, &[DataType::Utf8])?;
+        let observed = fun.return_type(&[DataType::Utf8])?;

Review Comment:
   This basically shows the new API -- use a method on the `fun` rather than have to find a free function



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6748: Minor: Encapsulate `return_type` and `signature` in `AggregateFunction` and `WindowFunction`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #6748:
URL: https://github.com/apache/arrow-datafusion/pull/6748#discussion_r1238857253


##########
datafusion/expr/src/aggregate_function.rs:
##########
@@ -172,58 +172,79 @@ impl FromStr for AggregateFunction {
 
 /// Returns the datatype of the aggregate function.
 /// This is used to get the returned data type for aggregate expr.
+#[deprecated(
+    since = "27.0.0",
+    note = "please use `AggregateFunction::return_type` instead"
+)]
 pub fn return_type(
     fun: &AggregateFunction,
     input_expr_types: &[DataType],
 ) -> Result<DataType> {
-    // Note that this function *must* return the same type that the respective physical expression returns
-    // or the execution panics.
+    fun.return_type(input_expr_types)
+}
 
-    let coerced_data_types = crate::type_coercion::aggregates::coerce_types(
-        fun,
-        input_expr_types,
-        &signature(fun),
-    )?;
+impl AggregateFunction {
+    /// Returns the datatype of the aggregate function given its argument types
+    ///
+    /// This is used to get the returned data type for aggregate expr.
+    pub fn return_type(&self, input_expr_types: &[DataType]) -> Result<DataType> {
+        // Note that this function *must* return the same type that the respective physical expression returns
+        // or the execution panics.
 
-    match fun {
-        AggregateFunction::Count | AggregateFunction::ApproxDistinct => {
-            Ok(DataType::Int64)
-        }
-        AggregateFunction::Max | AggregateFunction::Min => {
-            // For min and max agg function, the returned type is same as input type.
-            // The coerced_data_types is same with input_types.
-            Ok(coerced_data_types[0].clone())
-        }
-        AggregateFunction::Sum => sum_return_type(&coerced_data_types[0]),
-        AggregateFunction::BitAnd
-        | AggregateFunction::BitOr
-        | AggregateFunction::BitXor => Ok(coerced_data_types[0].clone()),
-        AggregateFunction::BoolAnd | AggregateFunction::BoolOr => Ok(DataType::Boolean),
-        AggregateFunction::Variance => variance_return_type(&coerced_data_types[0]),
-        AggregateFunction::VariancePop => variance_return_type(&coerced_data_types[0]),
-        AggregateFunction::Covariance => covariance_return_type(&coerced_data_types[0]),
-        AggregateFunction::CovariancePop => {
-            covariance_return_type(&coerced_data_types[0])
-        }
-        AggregateFunction::Correlation => correlation_return_type(&coerced_data_types[0]),
-        AggregateFunction::Stddev => stddev_return_type(&coerced_data_types[0]),
-        AggregateFunction::StddevPop => stddev_return_type(&coerced_data_types[0]),
-        AggregateFunction::Avg => avg_return_type(&coerced_data_types[0]),
-        AggregateFunction::ArrayAgg => Ok(DataType::List(Arc::new(Field::new(
-            "item",
-            coerced_data_types[0].clone(),
-            true,
-        )))),
-        AggregateFunction::ApproxPercentileCont => Ok(coerced_data_types[0].clone()),
-        AggregateFunction::ApproxPercentileContWithWeight => {
-            Ok(coerced_data_types[0].clone())
-        }
-        AggregateFunction::ApproxMedian | AggregateFunction::Median => {
-            Ok(coerced_data_types[0].clone())
-        }
-        AggregateFunction::Grouping => Ok(DataType::Int32),
-        AggregateFunction::FirstValue | AggregateFunction::LastValue => {
-            Ok(coerced_data_types[0].clone())
+        let coerced_data_types = crate::type_coercion::aggregates::coerce_types(
+            self,
+            input_expr_types,
+            &self.signature(),
+        )?;
+
+        match self {

Review Comment:
   This code is now indented one level more



##########
datafusion/expr/src/aggregate_function.rs:
##########
@@ -172,58 +172,79 @@ impl FromStr for AggregateFunction {
 
 /// Returns the datatype of the aggregate function.
 /// This is used to get the returned data type for aggregate expr.
+#[deprecated(
+    since = "27.0.0",
+    note = "please use `AggregateFunction::return_type` instead"
+)]
 pub fn return_type(
     fun: &AggregateFunction,
     input_expr_types: &[DataType],
 ) -> Result<DataType> {
-    // Note that this function *must* return the same type that the respective physical expression returns
-    // or the execution panics.
+    fun.return_type(input_expr_types)
+}
 
-    let coerced_data_types = crate::type_coercion::aggregates::coerce_types(
-        fun,
-        input_expr_types,
-        &signature(fun),
-    )?;
+impl AggregateFunction {
+    /// Returns the datatype of the aggregate function given its argument types
+    ///
+    /// This is used to get the returned data type for aggregate expr.
+    pub fn return_type(&self, input_expr_types: &[DataType]) -> Result<DataType> {
+        // Note that this function *must* return the same type that the respective physical expression returns
+        // or the execution panics.
 
-    match fun {
-        AggregateFunction::Count | AggregateFunction::ApproxDistinct => {
-            Ok(DataType::Int64)
-        }
-        AggregateFunction::Max | AggregateFunction::Min => {
-            // For min and max agg function, the returned type is same as input type.
-            // The coerced_data_types is same with input_types.
-            Ok(coerced_data_types[0].clone())
-        }
-        AggregateFunction::Sum => sum_return_type(&coerced_data_types[0]),
-        AggregateFunction::BitAnd
-        | AggregateFunction::BitOr
-        | AggregateFunction::BitXor => Ok(coerced_data_types[0].clone()),
-        AggregateFunction::BoolAnd | AggregateFunction::BoolOr => Ok(DataType::Boolean),
-        AggregateFunction::Variance => variance_return_type(&coerced_data_types[0]),
-        AggregateFunction::VariancePop => variance_return_type(&coerced_data_types[0]),
-        AggregateFunction::Covariance => covariance_return_type(&coerced_data_types[0]),
-        AggregateFunction::CovariancePop => {
-            covariance_return_type(&coerced_data_types[0])
-        }
-        AggregateFunction::Correlation => correlation_return_type(&coerced_data_types[0]),
-        AggregateFunction::Stddev => stddev_return_type(&coerced_data_types[0]),
-        AggregateFunction::StddevPop => stddev_return_type(&coerced_data_types[0]),
-        AggregateFunction::Avg => avg_return_type(&coerced_data_types[0]),
-        AggregateFunction::ArrayAgg => Ok(DataType::List(Arc::new(Field::new(
-            "item",
-            coerced_data_types[0].clone(),
-            true,
-        )))),
-        AggregateFunction::ApproxPercentileCont => Ok(coerced_data_types[0].clone()),
-        AggregateFunction::ApproxPercentileContWithWeight => {
-            Ok(coerced_data_types[0].clone())
-        }
-        AggregateFunction::ApproxMedian | AggregateFunction::Median => {
-            Ok(coerced_data_types[0].clone())
-        }
-        AggregateFunction::Grouping => Ok(DataType::Int32),
-        AggregateFunction::FirstValue | AggregateFunction::LastValue => {
-            Ok(coerced_data_types[0].clone())
+        let coerced_data_types = crate::type_coercion::aggregates::coerce_types(
+            self,
+            input_expr_types,
+            &self.signature(),
+        )?;
+
+        match self {

Review Comment:
   This code is now indented one level more -- there are no logical changes



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #6748: Minor: Encapsulate `return_type` and `signature` in `AggregateFunction` and `WindowFunction`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #6748:
URL: https://github.com/apache/arrow-datafusion/pull/6748


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #6748: Minor: Encapsulate `return_type` and `signature` in `AggregateFunction` and `WindowFunction`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #6748:
URL: https://github.com/apache/arrow-datafusion/pull/6748#issuecomment-1616603477

   Thank you for the review @jackwener 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org