You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "jackwener (via GitHub)" <gi...@apache.org> on 2023/04/02 15:42:23 UTC

[GitHub] [arrow-datafusion] jackwener opened a new pull request, #5831: refactor: move type_coercion to analyzer

jackwener opened a new pull request, #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #.
   
   # Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   # Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #5831: refactor: move type_coercion to analyzer

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on code in PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#discussion_r1158126963


##########
benchmarks/expected-plans/q11.txt:
##########
@@ -16,7 +16,7 @@
 |               |                 Filter: nation.n_name = Utf8("GERMANY")                                                                                                                         |
 |               |                   TableScan: nation projection=[n_nationkey, n_name]                                                                                                            |
 |               |         SubqueryAlias: __scalar_sq_1                                                                                                                                            |
-|               |           Projection: CAST(SUM(partsupp.ps_supplycost * partsupp.ps_availqty) AS Float64) * Float64(0.0001) AS __value                                                          |
+|               |           Projection: CAST(CAST(SUM(partsupp.ps_supplycost * partsupp.ps_availqty) AS Float64) * Float64(0.0001) AS Decimal128(38, 15)) AS __value                              |

Review Comment:
   About merge cast, I prepare to do it in following PR with polish subquery type-coercion



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #5831: refactor: move type_coercion to analyzer

Posted by "liukun4515 (via GitHub)" <gi...@apache.org>.
liukun4515 commented on code in PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#discussion_r1158420864


##########
datafusion/optimizer/src/analyzer/mod.rs:
##########
@@ -17,10 +17,12 @@
 
 mod count_wildcard_rule;
 mod inline_table_scan;
+pub(crate) mod type_coercion;
 
 use crate::analyzer::count_wildcard_rule::CountWildcardRule;
 use crate::analyzer::inline_table_scan::InlineTableScan;
 
+use crate::analyzer::type_coercion::TypeCoercion;

Review Comment:
   we can also close this issue https://github.com/apache/arrow-datafusion/issues/3582
   cc @alamb @jackwener 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] melgenek commented on a diff in pull request #5831: refactor: move type_coercion to analyzer

Posted by "melgenek (via GitHub)" <gi...@apache.org>.
melgenek commented on code in PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#discussion_r1158188354


##########
datafusion/core/tests/sqllogictests/test_files/dates.slt:
##########
@@ -85,6 +85,6 @@ g
 h
 
 ## Plan error when compare Utf8 and timestamp in where clause
-statement error DataFusion error: Error during planning: The type of Timestamp\(Nanosecond, Some\("\+00:00"\)\) Plus Utf8 of binary physical should be same
+statement error DataFusion error: Error during planning: Timestamp(Nanosecond, Some("+00:00")) + Utf8 can't be evaluated because there isn't a common type to coerce the types to

Review Comment:
   ```suggestion
   statement error DataFusion error: Error during planning: Timestamp\(Nanosecond, Some\("\+00:00"\)\) \+ Utf8 can't be evaluated because there isn't a common type to coerce the types to
   ```
   
   Regarding https://github.com/apache/arrow-datafusion/pull/5831#issuecomment-1497036876: Sqllogictest supports regex expressions. `()+` are special characters, so the error was parsed as a regex.
   
   Could you please bump the sqllogictest version to `0.13.2` here https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/Cargo.toml#L120. Then `--complete` would work properly for regex-like errors. I am sorry, I somehow missed that the sqllogictest update didn't reach main.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on pull request #5831: refactor: move type_coercion to analyzer

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#issuecomment-1497033428

   This PR contains a fix for type coercion of subquery. I will polish it in following PR.
   This fix is important, because it will move `cast` from `expression` into `subplan`. It means that we don't `cast expression` in `eval expression` and we do cast before `eval expression`.
   But look like it just a little help for performance.
   
   ```sql
   --- before
   tpch q17
   cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path ./data --format tbl --query 17 --batch-size 4096
   Query 17 iteration 0 took 5233.1 ms and returned 1 rows
   Query 17 iteration 1 took 4940.8 ms and returned 1 rows
   Query 17 iteration 2 took 5160.2 ms and returned 1 rows
   Query 17 iteration 3 took 5315.6 ms and returned 1 rows
   Query 17 iteration 4 took 4967.7 ms and returned 1 rows
   Query 17 avg time: 5123.48 ms
   ```
   
   ```sql
   --- after
   tpch q17
   Query 17 iteration 0 took 4789.5 ms and returned 1 rows
   Query 17 iteration 1 took 4785.2 ms and returned 1 rows
   Query 17 iteration 2 took 4791.5 ms and returned 1 rows
   Query 17 iteration 3 took 5051.4 ms and returned 1 rows
   Query 17 iteration 4 took 4817.7 ms and returned 1 rows
   Query 17 avg time: 4847.07 ms
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #5831: refactor: move type_coercion to analyzer

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on code in PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#discussion_r1158362515


##########
datafusion/expr/src/expr_schema.rs:
##########
@@ -290,13 +293,30 @@ impl ExprSchemable for Expr {
     /// This function errors when it is impossible to cast the
     /// expression to the target [arrow::datatypes::DataType].
     fn cast_to<S: ExprSchema>(self, cast_to_type: &DataType, schema: &S) -> Result<Expr> {
+        let this_type = self.get_type(schema)?;
+        if this_type == *cast_to_type {
+            return Ok(self);
+        }
+
         // TODO(kszucs): most of the operations do not validate the type correctness
         // like all of the binary expressions below. Perhaps Expr should track the
         // type of the expression?
-        let this_type = self.get_type(schema)?;
-        if this_type == *cast_to_type {
-            Ok(self)
-        } else if can_cast_types(&this_type, cast_to_type) {
+
+        // TODO(jackwener): Handle subqueries separately, need to refactor it.

Review Comment:
   add a ticket #5877. 
   
   >  I am not sure if this comment mean you want to improve the code or if there is some bug / limitiation
   It's just for improving codeπŸ˜‚.



##########
datafusion/expr/src/expr_schema.rs:
##########
@@ -290,13 +293,30 @@ impl ExprSchemable for Expr {
     /// This function errors when it is impossible to cast the
     /// expression to the target [arrow::datatypes::DataType].
     fn cast_to<S: ExprSchema>(self, cast_to_type: &DataType, schema: &S) -> Result<Expr> {
+        let this_type = self.get_type(schema)?;
+        if this_type == *cast_to_type {
+            return Ok(self);
+        }
+
         // TODO(kszucs): most of the operations do not validate the type correctness
         // like all of the binary expressions below. Perhaps Expr should track the
         // type of the expression?
-        let this_type = self.get_type(schema)?;
-        if this_type == *cast_to_type {
-            Ok(self)
-        } else if can_cast_types(&this_type, cast_to_type) {
+
+        // TODO(jackwener): Handle subqueries separately, need to refactor it.

Review Comment:
   add a ticket #5877. 
   
   >  I am not sure if this comment mean you want to improve the code or if there is some bug / limitiation
   
   It's just for improving codeπŸ˜‚.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5831: refactor: move type_coercion to analyzer

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#discussion_r1158472961


##########
datafusion/optimizer/src/analyzer/mod.rs:
##########
@@ -17,10 +17,12 @@
 
 mod count_wildcard_rule;
 mod inline_table_scan;
+pub(crate) mod type_coercion;
 
 use crate::analyzer::count_wildcard_rule::CountWildcardRule;
 use crate::analyzer::inline_table_scan::InlineTableScan;
 
+use crate::analyzer::type_coercion::TypeCoercion;

Review Comment:
   FYI @mingmwang  I think you have discussed this as well in the past. It is great to see it has finally been done -- thanks @jackwener !



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #5831: refactor: move type_coercion to analyzer

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on code in PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#discussion_r1158126963


##########
benchmarks/expected-plans/q11.txt:
##########
@@ -16,7 +16,7 @@
 |               |                 Filter: nation.n_name = Utf8("GERMANY")                                                                                                                         |
 |               |                   TableScan: nation projection=[n_nationkey, n_name]                                                                                                            |
 |               |         SubqueryAlias: __scalar_sq_1                                                                                                                                            |
-|               |           Projection: CAST(SUM(partsupp.ps_supplycost * partsupp.ps_availqty) AS Float64) * Float64(0.0001) AS __value                                                          |
+|               |           Projection: CAST(CAST(SUM(partsupp.ps_supplycost * partsupp.ps_availqty) AS Float64) * Float64(0.0001) AS Decimal128(38, 15)) AS __value                              |

Review Comment:
   About merge cast, I prepare to do it in following PR with polish subquery type-coercion



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #5831: refactor: move type_coercion to analyzer

Posted by "liukun4515 (via GitHub)" <gi...@apache.org>.
liukun4515 commented on code in PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#discussion_r1158401646


##########
datafusion/optimizer/src/analyzer/mod.rs:
##########
@@ -17,10 +17,12 @@
 
 mod count_wildcard_rule;
 mod inline_table_scan;
+pub(crate) mod type_coercion;
 
 use crate::analyzer::count_wildcard_rule::CountWildcardRule;
 use crate::analyzer::inline_table_scan::InlineTableScan;
 
+use crate::analyzer::type_coercion::TypeCoercion;

Review Comment:
   πŸ‘



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on pull request #5831: refactor: move type_coercion to analyzer

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#issuecomment-1499030395

   > @jackwener One thing need to confirm with you is the `Error` behavior.
   > Originally the optimizer rules might return an error, and the logical plan optimizer will skip the failed rule by default and will
   > not fail the SQL. But I think for Analyzer rules, if the rule return an `Error`, I think the SQL should failure immediately.
   > For the `type_coercion` rule, the rule itself is to ensure the correctness, I think if there is error, the SQL should failure.
   
    Agree with it, before this PR, I already fix some bug about it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #5831: refactor: move type_coercion to analyzer

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on code in PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#discussion_r1158119708


##########
datafusion/core/tests/sql/timestamp.rs:
##########
@@ -925,14 +925,12 @@ async fn test_ts_dt_binary_ops() -> Result<()> {
     let batch = &plan[0];
     let mut res: Option<String> = None;
     for row in 0..batch.num_rows() {
-        if &array_value_to_string(batch.column(0), row)?
-            == "logical_plan after type_coercion"
-        {
+        if &array_value_to_string(batch.column(0), row)? == "initial_logical_plan" {
             res = Some(array_value_to_string(batch.column(1), row)?);
             break;
         }
     }
-    assert_eq!(res, Some("Projection: CAST(Utf8(\"2000-01-01\") AS Timestamp(Nanosecond, None)) >= CAST(CAST(Utf8(\"2000-01-01\") AS Date32) AS Timestamp(Nanosecond, None))\n  EmptyRelation".to_string()));
+    assert_eq!(res, Some("Projection: CAST(Utf8(\"2000-01-01\") AS Timestamp(Nanosecond, None)) >= CAST(Utf8(\"2000-01-01\") AS Date32)\n  EmptyRelation".to_string()));

Review Comment:
   After move type_coercion into analyzer, it don't run multiple time, so we can avoid useless cast.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on pull request #5831: refactor: move type_coercion to analyzer

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#issuecomment-1497036876

   slt run fail, but it's strange. cc @alamb @melgenek 
   
   ```sql
   [dates.slt] Running query: "select i_item_desc
   from test
   where d3_date > d2_date + INTERVAL '1 days';"
   Error: statement is expected to fail with error:
   [dates.slt] Running query: "select i_item_desc
   	DataFusion error: Error during planning: Timestamp(Nanosecond, Some("+00:00")) + Utf8 can't be evaluated because there isn't a common type to coerce the types to
   from test
   but got error:
   where d3_date > d2_date + INTERVAL '5 days';"
   	DataFusion error: Error during planning: Timestamp(Nanosecond, Some("+00:00")) + Utf8 can't be evaluated because there isn't a common type to coerce the types to
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5831: refactor: move type_coercion to analyzer

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#discussion_r1158370742


##########
datafusion/expr/src/expr_schema.rs:
##########
@@ -290,13 +293,30 @@ impl ExprSchemable for Expr {
     /// This function errors when it is impossible to cast the
     /// expression to the target [arrow::datatypes::DataType].
     fn cast_to<S: ExprSchema>(self, cast_to_type: &DataType, schema: &S) -> Result<Expr> {
+        let this_type = self.get_type(schema)?;
+        if this_type == *cast_to_type {
+            return Ok(self);
+        }
+
         // TODO(kszucs): most of the operations do not validate the type correctness
         // like all of the binary expressions below. Perhaps Expr should track the
         // type of the expression?
-        let this_type = self.get_type(schema)?;
-        if this_type == *cast_to_type {
-            Ok(self)
-        } else if can_cast_types(&this_type, cast_to_type) {
+
+        // TODO(jackwener): Handle subqueries separately, need to refactor it.

Review Comment:
   I don't think there is any need for a ticket to track improving the code πŸ‘  in the future . Thank you for filing one anyway!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on pull request #5831: refactor: move type_coercion to analyzer

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#issuecomment-1497040552

   PR is ready for reivew, cc @mingmwang @alamb @liukun4515 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5831: refactor: move type_coercion to analyzer

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#discussion_r1158338978


##########
datafusion/core/tests/sqllogictests/test_files/dates.slt:
##########
@@ -85,6 +85,6 @@ g
 h
 
 ## Plan error when compare Utf8 and timestamp in where clause
-statement error DataFusion error: Error during planning: The type of Timestamp\(Nanosecond, Some\("\+00:00"\)\) Plus Utf8 of binary physical should be same
+statement error DataFusion error: Error during planning: Timestamp(Nanosecond, Some("+00:00")) + Utf8 can't be evaluated because there isn't a common type to coerce the types to

Review Comment:
   > Then --complete would work properly for regex-like errors. I am sorry, I somehow missed that the sqllogictest update didn't reach main
   
   If you run `cargo update` in your workspace `sqlloictest` will get updated, but @melgenek is right until we increase Cargo.toml it won't happen automatically.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #5831: refactor: move type_coercion to analyzer

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on code in PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#discussion_r1158191327


##########
datafusion/core/tests/sqllogictests/test_files/dates.slt:
##########
@@ -85,6 +85,6 @@ g
 h
 
 ## Plan error when compare Utf8 and timestamp in where clause
-statement error DataFusion error: Error during planning: The type of Timestamp\(Nanosecond, Some\("\+00:00"\)\) Plus Utf8 of binary physical should be same
+statement error DataFusion error: Error during planning: Timestamp(Nanosecond, Some("+00:00")) + Utf8 can't be evaluated because there isn't a common type to coerce the types to

Review Comment:
   Thank you! @melgenek 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mingmwang commented on pull request #5831: refactor: move type_coercion to analyzer

Posted by "mingmwang (via GitHub)" <gi...@apache.org>.
mingmwang commented on PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#issuecomment-1499002868

   I will take a look tomorrow. 
   
   @jackwener One thing need to confirm with you is the `Error` behavior. 
   Originally the optimizer rules might return an error, and the logical plan optimizer will skip the failed rule by default and will
   not fail the SQL.  But I think for Analyzer rules, if the rule return an `Error`, I think the SQL should failure immediately. 
   For the `type_coercion` rule, the rule itself is to ensure the correctness, I think if there is error, the SQL should failure.
   
   #5831
   
   ```rust
          /// When set to true, the logical plan optimizer will produce warning
           /// messages if any optimization rules produce errors and then proceed to the next
           /// rule. When set to false, any rules that produce errors will cause the query to fail
           pub skip_failed_rules: bool, default = true
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5831: refactor: move type_coercion to analyzer

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#discussion_r1158343680


##########
datafusion/expr/src/expr_schema.rs:
##########
@@ -290,13 +293,30 @@ impl ExprSchemable for Expr {
     /// This function errors when it is impossible to cast the
     /// expression to the target [arrow::datatypes::DataType].
     fn cast_to<S: ExprSchema>(self, cast_to_type: &DataType, schema: &S) -> Result<Expr> {
+        let this_type = self.get_type(schema)?;
+        if this_type == *cast_to_type {
+            return Ok(self);
+        }
+
         // TODO(kszucs): most of the operations do not validate the type correctness
         // like all of the binary expressions below. Perhaps Expr should track the
         // type of the expression?
-        let this_type = self.get_type(schema)?;
-        if this_type == *cast_to_type {
-            Ok(self)
-        } else if can_cast_types(&this_type, cast_to_type) {
+
+        // TODO(jackwener): Handle subqueries separately, need to refactor it.

Review Comment:
   Is this worth tracking with a ticket? I am not sure if  this comment mean you want to improve the code or if there is some bug / limitiation



##########
datafusion/optimizer/src/analyzer/mod.rs:
##########
@@ -17,10 +17,12 @@
 
 mod count_wildcard_rule;
 mod inline_table_scan;
+pub(crate) mod type_coercion;
 
 use crate::analyzer::count_wildcard_rule::CountWildcardRule;
 use crate::analyzer::inline_table_scan::InlineTableScan;
 
+use crate::analyzer::type_coercion::TypeCoercion;

Review Comment:
   πŸŽ‰ 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener merged pull request #5831: refactor: move type_coercion to analyzer

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener merged PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5831: refactor: move type_coercion to analyzer

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5831:
URL: https://github.com/apache/arrow-datafusion/pull/5831#discussion_r1158370742


##########
datafusion/expr/src/expr_schema.rs:
##########
@@ -290,13 +293,30 @@ impl ExprSchemable for Expr {
     /// This function errors when it is impossible to cast the
     /// expression to the target [arrow::datatypes::DataType].
     fn cast_to<S: ExprSchema>(self, cast_to_type: &DataType, schema: &S) -> Result<Expr> {
+        let this_type = self.get_type(schema)?;
+        if this_type == *cast_to_type {
+            return Ok(self);
+        }
+
         // TODO(kszucs): most of the operations do not validate the type correctness
         // like all of the binary expressions below. Perhaps Expr should track the
         // type of the expression?
-        let this_type = self.get_type(schema)?;
-        if this_type == *cast_to_type {
-            Ok(self)
-        } else if can_cast_types(&this_type, cast_to_type) {
+
+        // TODO(jackwener): Handle subqueries separately, need to refactor it.

Review Comment:
   I don't think there is any need for a ticket to track improving the code πŸ‘ 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org