You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/05/03 15:43:38 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #6206: RFC: Distinguish between argument coercion and output type

alamb opened a new pull request, #6206:
URL: https://github.com/apache/arrow-datafusion/pull/6206

   I am trying to collect some feedback on this approach before pushing along with it
   
   # Which issue does this PR close?
   
   Closes https://github.com/apache/arrow-datafusion/issues/3419
   
   # Rationale for this change
   
   Every time I see the type coercion and binary.rs I am saddened -- it is overly complicated for what it is doing (which is already complicated). Since I spent a while this morning loading in the context into my head to review https://github.com/apache/arrow-datafusion/pull/6196 and related code, I figured I would start trying to clean this up.
   
   # What changes are included in this PR?
   
   1. Introduce a structure `Coercion` to represent what coercion is needed for given argument operands and operator
   
   
   # Are these changes tested?
   
   Existing tests
   
   # Are there any user-facing changes?
   
   As I have it now, there is no user facing API change, but I would like to press on and remove the old API eventually


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #6206: RFC: Distinguish between argument coercion and output type

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on code in PR #6206:
URL: https://github.com/apache/arrow-datafusion/pull/6206#discussion_r1184606516


##########
datafusion/expr/src/type_coercion/binary.rs:
##########
@@ -106,39 +106,62 @@ pub fn binary_operator_data_type(
     }
 }
 
-/// Coercion rules for all binary operators. Returns the output type
-/// of applying `op` to an argument of `lhs_type` and `rhs_type`.
-///
-/// Returns None if no suitable type can be found.
-///
-/// TODO this function is trying to serve two purposes at once; it
-/// determines the result type of the binary operation and also
-/// determines how the inputs can be coerced but this results in
-/// inconsistencies in some cases (particular around date + interval)
-/// when the input argument types do not match the output argument
-/// types
+/// The result of applying coercion to a binary operator.
+pub struct Coercion {

Review Comment:
   This is a very nice PR!👍. 
   
   I have discuss about it with @liukun4515 . 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #6206: RFC: Distinguish between argument coercion and output type

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on code in PR #6206:
URL: https://github.com/apache/arrow-datafusion/pull/6206#discussion_r1184606516


##########
datafusion/expr/src/type_coercion/binary.rs:
##########
@@ -106,39 +106,62 @@ pub fn binary_operator_data_type(
     }
 }
 
-/// Coercion rules for all binary operators. Returns the output type
-/// of applying `op` to an argument of `lhs_type` and `rhs_type`.
-///
-/// Returns None if no suitable type can be found.
-///
-/// TODO this function is trying to serve two purposes at once; it
-/// determines the result type of the binary operation and also
-/// determines how the inputs can be coerced but this results in
-/// inconsistencies in some cases (particular around date + interval)
-/// when the input argument types do not match the output argument
-/// types
+/// The result of applying coercion to a binary operator.
+pub struct Coercion {

Review Comment:
   This is a very nice PR!👍. 
   
   I have discuss about it with @liukun4515 . 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb closed pull request #6206: RFC: Distinguish between argument coercion and output type

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed pull request #6206: RFC: Distinguish between argument coercion and output type
URL: https://github.com/apache/arrow-datafusion/pull/6206


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #6206: RFC: Distinguish between argument coercion and output type

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #6206:
URL: https://github.com/apache/arrow-datafusion/pull/6206#issuecomment-1534855911

   related PR https://github.com/apache/arrow-datafusion/pull/6221


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #6206: RFC: Distinguish between argument coercion and output type

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #6206:
URL: https://github.com/apache/arrow-datafusion/pull/6206#issuecomment-1533546277

   > I am doing a PR to separate getCommonType / getResultType
   
   
   
   Sounds great -- thank you @jackwener 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6206: RFC: Distinguish between argument coercion and output type

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #6206:
URL: https://github.com/apache/arrow-datafusion/pull/6206#discussion_r1183877003


##########
datafusion/expr/src/type_coercion/binary.rs:
##########
@@ -106,39 +106,62 @@ pub fn binary_operator_data_type(
     }
 }
 
-/// Coercion rules for all binary operators. Returns the output type
-/// of applying `op` to an argument of `lhs_type` and `rhs_type`.
-///
-/// Returns None if no suitable type can be found.
-///
-/// TODO this function is trying to serve two purposes at once; it
-/// determines the result type of the binary operation and also
-/// determines how the inputs can be coerced but this results in
-/// inconsistencies in some cases (particular around date + interval)
-/// when the input argument types do not match the output argument
-/// types
+/// The result of applying coercion to a binary operator.
+pub struct Coercion {

Review Comment:
   This is the key difference in this PR -- the result of the coercion calculation that splits up the argument types needed from the output type.  The current API (`coerce_types`) intermixes the two, as explained in https://github.com/apache/arrow-datafusion/issues/3419
   
   I am not sure this would cover all coercions, but I think it would cover the binary ones. 
   
   I also noticed that @viirya  has added some non trivial decimal coercion logic that delays coercion that I don't really understand that I haven't fully worked out



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on pull request #6206: RFC: Distinguish between argument coercion and output type

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on PR #6206:
URL: https://github.com/apache/arrow-datafusion/pull/6206#issuecomment-1533442450

   I am doing a PR to separate getCommonType / getResultType


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org