You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/23 13:26:14 UTC

[GitHub] [arrow] andygrove opened a new pull request #7253: ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype

andygrove opened a new pull request #7253:
URL: https://github.com/apache/arrow/pull/7253


   Re-implement get_supertype so that it is complete and easier to comprehend.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #7253: ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #7253:
URL: https://github.com/apache/arrow/pull/7253#discussion_r431971445



##########
File path: rust/datafusion/src/optimizer/utils.rs
##########
@@ -131,116 +131,90 @@ pub fn exprlist_to_fields(expr: &[Expr], input_schema: &Schema) -> Result<Vec<Fi
 
 /// Given two datatypes, determine the supertype that both types can safely be cast to
 pub fn get_supertype(l: &DataType, r: &DataType) -> Result<DataType> {
-    match _get_supertype(l, r) {
-        Some(dt) => Ok(dt),
-        None => _get_supertype(r, l).ok_or_else(|| {
-            ExecutionError::InternalError(format!(
-                "Failed to determine supertype of {:?} and {:?}",
-                l, r
-            ))
-        }),
-    }
-}
-
-/// Given two datatypes, determine the supertype that both types can safely be cast to
-fn _get_supertype(l: &DataType, r: &DataType) -> Option<DataType> {
     use arrow::datatypes::DataType::*;
-    match (l, r) {
-        (UInt8, Int8) => Some(Int8),
-        (UInt8, Int16) => Some(Int16),
-        (UInt8, Int32) => Some(Int32),
-        (UInt8, Int64) => Some(Int64),
-
-        (UInt16, Int16) => Some(Int16),
-        (UInt16, Int32) => Some(Int32),
-        (UInt16, Int64) => Some(Int64),
-
-        (UInt32, Int32) => Some(Int32),
-        (UInt32, Int64) => Some(Int64),
-
-        (UInt64, Int64) => Some(Int64),
-
-        (Int8, UInt8) => Some(Int8),
-
-        (Int16, UInt8) => Some(Int16),
-        (Int16, UInt16) => Some(Int16),
-
-        (Int32, UInt8) => Some(Int32),
-        (Int32, UInt16) => Some(Int32),
-        (Int32, UInt32) => Some(Int32),
-
-        (Int64, UInt8) => Some(Int64),
-        (Int64, UInt16) => Some(Int64),
-        (Int64, UInt32) => Some(Int64),
-        (Int64, UInt64) => Some(Int64),
-
-        (UInt8, UInt8) => Some(UInt8),
-        (UInt8, UInt16) => Some(UInt16),
-        (UInt8, UInt32) => Some(UInt32),
-        (UInt8, UInt64) => Some(UInt64),
-        (UInt8, Float32) => Some(Float32),
-        (UInt8, Float64) => Some(Float64),
 
-        (UInt16, UInt8) => Some(UInt16),
-        (UInt16, UInt16) => Some(UInt16),
-        (UInt16, UInt32) => Some(UInt32),
-        (UInt16, UInt64) => Some(UInt64),
-        (UInt16, Float32) => Some(Float32),
-        (UInt16, Float64) => Some(Float64),
-
-        (UInt32, UInt8) => Some(UInt32),
-        (UInt32, UInt16) => Some(UInt32),
-        (UInt32, UInt32) => Some(UInt32),
-        (UInt32, UInt64) => Some(UInt64),
-        (UInt32, Float32) => Some(Float32),
-        (UInt32, Float64) => Some(Float64),
-
-        (UInt64, UInt8) => Some(UInt64),
-        (UInt64, UInt16) => Some(UInt64),
-        (UInt64, UInt32) => Some(UInt64),
-        (UInt64, UInt64) => Some(UInt64),
-        (UInt64, Float32) => Some(Float32),
-        (UInt64, Float64) => Some(Float64),
-
-        (Int8, Int8) => Some(Int8),
-        (Int8, Int16) => Some(Int16),
-        (Int8, Int32) => Some(Int32),
-        (Int8, Int64) => Some(Int64),
-        (Int8, Float32) => Some(Float32),
-        (Int8, Float64) => Some(Float64),
-
-        (Int16, Int8) => Some(Int16),
-        (Int16, Int16) => Some(Int16),
-        (Int16, Int32) => Some(Int32),
-        (Int16, Int64) => Some(Int64),
-        (Int16, Float32) => Some(Float32),
-        (Int16, Float64) => Some(Float64),
-
-        (Int32, Int8) => Some(Int32),
-        (Int32, Int16) => Some(Int32),
-        (Int32, Int32) => Some(Int32),
-        (Int32, Int64) => Some(Int64),
-        (Int32, Float32) => Some(Float32),
-        (Int32, Float64) => Some(Float64),
-
-        (Int64, Int8) => Some(Int64),
-        (Int64, Int16) => Some(Int64),
-        (Int64, Int32) => Some(Int64),
-        (Int64, Int64) => Some(Int64),
-        (Int64, Float32) => Some(Float32),
-        (Int64, Float64) => Some(Float64),
-
-        (Float32, Float32) => Some(Float32),
-        (Float32, Float64) => Some(Float64),
-        (Float64, Float32) => Some(Float64),
-        (Float64, Float64) => Some(Float64),
-
-        (Utf8, _) => Some(Utf8),
-        (_, Utf8) => Some(Utf8),
-
-        (Boolean, Boolean) => Some(Boolean),
+    if l == r {
+        return Ok(l.clone());
+    }
 
+    let super_type = match l {
+        UInt8 => match r {
+            UInt16 | UInt32 | UInt64 => Some(r.clone()),
+            Int16 | Int32 | Int64 => Some(r.clone()),
+            Float32 | Float64 => Some(r.clone()),
+            _ => None,
+        },
+        UInt16 => match r {
+            UInt8 => Some(l.clone()),
+            UInt32 | UInt64 => Some(r.clone()),
+            Int8 => Some(l.clone()),
+            Int32 | Int64 => Some(r.clone()),
+            Float32 | Float64 => Some(r.clone()),
+            _ => None,
+        },
+        UInt32 => match r {
+            UInt8 | UInt16 => Some(l.clone()),
+            UInt64 => Some(r.clone()),
+            Int8 | Int16 => Some(l.clone()),
+            Int64 => Some(r.clone()),
+            Float32 | Float64 => Some(r.clone()),
+            _ => None,
+        },
+        UInt64 => match r {
+            UInt8 | UInt16 | UInt32 => Some(l.clone()),
+            Int8 | Int16 | Int32 => Some(l.clone()),
+            Float32 | Float64 => Some(r.clone()),
+            _ => None,
+        },
+        Int8 => match r {

Review comment:
       with regards to int8 vs int16, I was thinking 255 from uint8 doesn't fit into range of int8, so increase of width to 16 is required.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on pull request #7253: ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #7253:
URL: https://github.com/apache/arrow/pull/7253#issuecomment-633146402


   Since integer to float conversion can also cause data loss, do we want to leave that to explicit cast path as well?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on a change in pull request #7253: ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #7253:
URL: https://github.com/apache/arrow/pull/7253#discussion_r429578850



##########
File path: rust/datafusion/src/logicalplan.rs
##########
@@ -725,36 +725,47 @@ impl fmt::Debug for LogicalPlan {
 /// Verify a given type cast can be performed
 pub fn can_coerce_from(type_into: &DataType, type_from: &DataType) -> bool {
     use self::DataType::*;
+
+    if type_from == type_into {
+        return true;
+    }
+
     match type_into {
         Int8 => match type_from {
             Int8 => true,
             _ => false,
         },
         Int16 => match type_from {
-            Int8 | Int16 | UInt8 => true,
+            Int8 | Int16 => true,
+            UInt8 => true,
             _ => false,
         },
         Int32 => match type_from {
-            Int8 | Int16 | Int32 | UInt8 | UInt16 => true,
+            Int8 | Int16 | Int32 => true,
+            UInt8 | UInt16 => true,
             _ => false,
         },
         Int64 => match type_from {
-            Int8 | Int16 | Int32 | Int64 | UInt8 | UInt16 | UInt32 => true,
+            Int8 | Int16 | Int32 | Int64 => true,
+            UInt8 | UInt16 | UInt32 => true,
             _ => false,
         },
         UInt8 => match type_from {
             UInt8 => true,
             _ => false,
         },
         UInt16 => match type_from {
+            Int8 => true,

Review comment:
       Thanks, this is actually a bug. The goal of this can_coalesce method is to determine if DataFusion can safely cast from one type to another without any data loss, so it can add implicit casts in queries. This is convenient when the user is comparing two expressions of different types.
   
   Separately from this, users can add any explicit CAST they want to in their query plan.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7253: ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7253:
URL: https://github.com/apache/arrow/pull/7253#issuecomment-633054362


   https://issues.apache.org/jira/browse/ARROW-4957


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #7253: ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #7253:
URL: https://github.com/apache/arrow/pull/7253#discussion_r429568842



##########
File path: rust/datafusion/src/logicalplan.rs
##########
@@ -725,36 +725,47 @@ impl fmt::Debug for LogicalPlan {
 /// Verify a given type cast can be performed
 pub fn can_coerce_from(type_into: &DataType, type_from: &DataType) -> bool {
     use self::DataType::*;
+
+    if type_from == type_into {
+        return true;
+    }
+
     match type_into {
         Int8 => match type_from {
             Int8 => true,
             _ => false,
         },
         Int16 => match type_from {
-            Int8 | Int16 | UInt8 => true,
+            Int8 | Int16 => true,
+            UInt8 => true,
             _ => false,
         },
         Int32 => match type_from {
-            Int8 | Int16 | Int32 | UInt8 | UInt16 => true,
+            Int8 | Int16 | Int32 => true,
+            UInt8 | UInt16 => true,
             _ => false,
         },
         Int64 => match type_from {
-            Int8 | Int16 | Int32 | Int64 | UInt8 | UInt16 | UInt32 => true,
+            Int8 | Int16 | Int32 | Int64 => true,
+            UInt8 | UInt16 | UInt32 => true,
             _ => false,
         },
         UInt8 => match type_from {
             UInt8 => true,
             _ => false,
         },
         UInt16 => match type_from {
+            Int8 => true,

Review comment:
       for my own curiosity, what's the expected behavior when a negative int8 is coerced into uint16?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #7253: ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #7253:
URL: https://github.com/apache/arrow/pull/7253#discussion_r429569531



##########
File path: rust/datafusion/src/optimizer/utils.rs
##########
@@ -131,116 +131,90 @@ pub fn exprlist_to_fields(expr: &[Expr], input_schema: &Schema) -> Result<Vec<Fi
 
 /// Given two datatypes, determine the supertype that both types can safely be cast to
 pub fn get_supertype(l: &DataType, r: &DataType) -> Result<DataType> {
-    match _get_supertype(l, r) {
-        Some(dt) => Ok(dt),
-        None => _get_supertype(r, l).ok_or_else(|| {
-            ExecutionError::InternalError(format!(
-                "Failed to determine supertype of {:?} and {:?}",
-                l, r
-            ))
-        }),
-    }
-}
-
-/// Given two datatypes, determine the supertype that both types can safely be cast to
-fn _get_supertype(l: &DataType, r: &DataType) -> Option<DataType> {
     use arrow::datatypes::DataType::*;
-    match (l, r) {
-        (UInt8, Int8) => Some(Int8),
-        (UInt8, Int16) => Some(Int16),
-        (UInt8, Int32) => Some(Int32),
-        (UInt8, Int64) => Some(Int64),
-
-        (UInt16, Int16) => Some(Int16),
-        (UInt16, Int32) => Some(Int32),
-        (UInt16, Int64) => Some(Int64),
-
-        (UInt32, Int32) => Some(Int32),
-        (UInt32, Int64) => Some(Int64),
-
-        (UInt64, Int64) => Some(Int64),
-
-        (Int8, UInt8) => Some(Int8),
-
-        (Int16, UInt8) => Some(Int16),
-        (Int16, UInt16) => Some(Int16),
-
-        (Int32, UInt8) => Some(Int32),
-        (Int32, UInt16) => Some(Int32),
-        (Int32, UInt32) => Some(Int32),
-
-        (Int64, UInt8) => Some(Int64),
-        (Int64, UInt16) => Some(Int64),
-        (Int64, UInt32) => Some(Int64),
-        (Int64, UInt64) => Some(Int64),
-
-        (UInt8, UInt8) => Some(UInt8),
-        (UInt8, UInt16) => Some(UInt16),
-        (UInt8, UInt32) => Some(UInt32),
-        (UInt8, UInt64) => Some(UInt64),
-        (UInt8, Float32) => Some(Float32),
-        (UInt8, Float64) => Some(Float64),
 
-        (UInt16, UInt8) => Some(UInt16),
-        (UInt16, UInt16) => Some(UInt16),
-        (UInt16, UInt32) => Some(UInt32),
-        (UInt16, UInt64) => Some(UInt64),
-        (UInt16, Float32) => Some(Float32),
-        (UInt16, Float64) => Some(Float64),
-
-        (UInt32, UInt8) => Some(UInt32),
-        (UInt32, UInt16) => Some(UInt32),
-        (UInt32, UInt32) => Some(UInt32),
-        (UInt32, UInt64) => Some(UInt64),
-        (UInt32, Float32) => Some(Float32),
-        (UInt32, Float64) => Some(Float64),
-
-        (UInt64, UInt8) => Some(UInt64),
-        (UInt64, UInt16) => Some(UInt64),
-        (UInt64, UInt32) => Some(UInt64),
-        (UInt64, UInt64) => Some(UInt64),
-        (UInt64, Float32) => Some(Float32),
-        (UInt64, Float64) => Some(Float64),
-
-        (Int8, Int8) => Some(Int8),
-        (Int8, Int16) => Some(Int16),
-        (Int8, Int32) => Some(Int32),
-        (Int8, Int64) => Some(Int64),
-        (Int8, Float32) => Some(Float32),
-        (Int8, Float64) => Some(Float64),
-
-        (Int16, Int8) => Some(Int16),
-        (Int16, Int16) => Some(Int16),
-        (Int16, Int32) => Some(Int32),
-        (Int16, Int64) => Some(Int64),
-        (Int16, Float32) => Some(Float32),
-        (Int16, Float64) => Some(Float64),
-
-        (Int32, Int8) => Some(Int32),
-        (Int32, Int16) => Some(Int32),
-        (Int32, Int32) => Some(Int32),
-        (Int32, Int64) => Some(Int64),
-        (Int32, Float32) => Some(Float32),
-        (Int32, Float64) => Some(Float64),
-
-        (Int64, Int8) => Some(Int64),
-        (Int64, Int16) => Some(Int64),
-        (Int64, Int32) => Some(Int64),
-        (Int64, Int64) => Some(Int64),
-        (Int64, Float32) => Some(Float32),
-        (Int64, Float64) => Some(Float64),
-
-        (Float32, Float32) => Some(Float32),
-        (Float32, Float64) => Some(Float64),
-        (Float64, Float32) => Some(Float64),
-        (Float64, Float64) => Some(Float64),
-
-        (Utf8, _) => Some(Utf8),
-        (_, Utf8) => Some(Utf8),
-
-        (Boolean, Boolean) => Some(Boolean),
+    if l == r {
+        return Ok(l.clone());
+    }
 
+    let super_type = match l {
+        UInt8 => match r {
+            UInt16 | UInt32 | UInt64 => Some(r.clone()),
+            Int16 | Int32 | Int64 => Some(r.clone()),
+            Float32 | Float64 => Some(r.clone()),
+            _ => None,
+        },
+        UInt16 => match r {
+            UInt8 => Some(l.clone()),
+            UInt32 | UInt64 => Some(r.clone()),
+            Int8 => Some(l.clone()),
+            Int32 | Int64 => Some(r.clone()),
+            Float32 | Float64 => Some(r.clone()),
+            _ => None,
+        },
+        UInt32 => match r {
+            UInt8 | UInt16 => Some(l.clone()),
+            UInt64 => Some(r.clone()),
+            Int8 | Int16 => Some(l.clone()),
+            Int64 => Some(r.clone()),
+            Float32 | Float64 => Some(r.clone()),
+            _ => None,
+        },
+        UInt64 => match r {
+            UInt8 | UInt16 | UInt32 => Some(l.clone()),
+            Int8 | Int16 | Int32 => Some(l.clone()),
+            Float32 | Float64 => Some(r.clone()),
+            _ => None,
+        },
+        Int8 => match r {

Review comment:
       should we support signed and unsigned with the same width as well? for example (int8, uint8) -> int16.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #7253: ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #7253:
URL: https://github.com/apache/arrow/pull/7253#issuecomment-633144310


   @houqp I pushed more changes. There will be more to do but for now, DataFusion will only add implicit casts when safe (so no more conversions between signed and unsigned int types).
   
   Also, for now, no unsafe casts are supported explicitly either. I will create a JIRA to add support for explicit casts between signed and unsigned types.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #7253: ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #7253:
URL: https://github.com/apache/arrow/pull/7253#discussion_r429575153



##########
File path: rust/datafusion/src/logicalplan.rs
##########
@@ -725,36 +725,47 @@ impl fmt::Debug for LogicalPlan {
 /// Verify a given type cast can be performed
 pub fn can_coerce_from(type_into: &DataType, type_from: &DataType) -> bool {
     use self::DataType::*;
+
+    if type_from == type_into {
+        return true;
+    }
+
     match type_into {
         Int8 => match type_from {
             Int8 => true,
             _ => false,
         },
         Int16 => match type_from {
-            Int8 | Int16 | UInt8 => true,
+            Int8 | Int16 => true,
+            UInt8 => true,
             _ => false,
         },
         Int32 => match type_from {
-            Int8 | Int16 | Int32 | UInt8 | UInt16 => true,
+            Int8 | Int16 | Int32 => true,
+            UInt8 | UInt16 => true,
             _ => false,
         },
         Int64 => match type_from {
-            Int8 | Int16 | Int32 | Int64 | UInt8 | UInt16 | UInt32 => true,
+            Int8 | Int16 | Int32 | Int64 => true,
+            UInt8 | UInt16 | UInt32 => true,
             _ => false,
         },
         UInt8 => match type_from {
             UInt8 => true,
             _ => false,
         },
         UInt16 => match type_from {
+            Int8 => true,

Review comment:
       looking at the code, looks like it will return null for negative values. if that's the case, we should be able to support conversion from signed to unsigned with the same width as well right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #7253: ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #7253:
URL: https://github.com/apache/arrow/pull/7253#issuecomment-633254544


   I'm rethinking the approach here. I am fixing two things in this PR and it might be better to attempt to address them separately. The initial goal was to re-implement `get_supertype` for the implicit casting case. 
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove closed pull request #7253: ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype

Posted by GitBox <gi...@apache.org>.
andygrove closed pull request #7253:
URL: https://github.com/apache/arrow/pull/7253


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #7253: ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #7253:
URL: https://github.com/apache/arrow/pull/7253#issuecomment-654248872


   I'm going to take another run at this, with smaller changes in new PRs. Thanks for the reviews so far.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org