You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/06 12:34:27 UTC

[GitHub] [arrow] alamb commented on a change in pull request #8359: ARROW-10163: [Rust] [DataFusion] Add DictionaryArray coercion support to physical plans

alamb commented on a change in pull request #8359:
URL: https://github.com/apache/arrow/pull/8359#discussion_r500236277



##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -1166,16 +1201,13 @@ fn order_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType>
         return Some(lhs_type.clone());
     }
 
-    match numerical_coercion(lhs_type, rhs_type) {
-        None => {
-            // strings are naturally ordered, and thus ordering can be applied to them.
-            string_coercion(lhs_type, rhs_type)
-        }
-        t => t,
-    }
+    numerical_coercion(lhs_type, rhs_type)
+        .or_else(|| string_coercion(lhs_type, rhs_type))

Review comment:
       FWIW I don't understand why `order_coercion` and `eq_coercion` are different (eq_coercion does not include string_coercion).
   
   

##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -1542,20 +1574,7 @@ pub fn cast(
     if expr_type == cast_type {
         return Ok(expr.clone());
     }
-    if is_numeric(&expr_type) && (is_numeric(&cast_type) || cast_type == DataType::Utf8) {
-        Ok(Arc::new(CastExpr { expr, cast_type }))
-    } else if expr_type == DataType::Binary && cast_type == DataType::Utf8 {
-        Ok(Arc::new(CastExpr { expr, cast_type }))
-    } else if is_numeric(&expr_type)
-        && cast_type == DataType::Timestamp(TimeUnit::Nanosecond, None)
-    {
-        Ok(Arc::new(CastExpr { expr, cast_type }))
-    } else {
-        Err(ExecutionError::General(format!(
-            "Invalid CAST from {:?} to {:?}",
-            expr_type, cast_type
-        )))
-    }
+    Ok(Arc::new(CastExpr { expr, cast_type }))

Review comment:
       As in https://github.com/apache/arrow/pull/8340, I don't think partially redundant and incomplete checks for casting support in DataFusion adds much over the actual arrow casting, so I have removed the plan time checks

##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -1242,33 +1242,23 @@ async fn query_on_string_dictionary() -> Result<()> {
     let expected = vec![vec!["one"], vec!["three"]];
     assert_eq!(expected, actual);
 
-    // The following queries are not yet supported
-
-    // // filtering with constant
-    // let sql = "SELECT * FROM test WHERE d1 = 'three'";
-    // let actual = execute(&mut ctx, sql).await;
-    // let expected = vec![
-    //     vec!["three"],
-    // ];
-    // assert_eq!(expected, actual);
-
-    // // Expression evaluation
-    // let sql = "SELECT concat(d1, '-foo') FROM test";
-    // let actual = execute(&mut ctx, sql).await;
-    // let expected = vec![
-    //     vec!["one-foo"],
-    //     vec!["NULL"],
-    //     vec!["three-foo"],
-    // ];
-    // assert_eq!(expected, actual);
-
-    // // aggregation
-    // let sql = "SELECT COUNT(d1) FROM test";
-    // let actual = execute(&mut ctx, sql).await;
-    // let expected = vec![
-    //     vec!["2"]
-    // ];
-    // assert_eq!(expected, actual);
+    // filtering with constant

Review comment:
       here is the (simple) end to end test for DictionaryArray support

##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -1782,11 +1804,13 @@ mod tests {
 
     // runs an end-to-end test of physical type coercion:
     // 1. construct a record batch with two columns of type A and B
+    //  (*_ARRAY is the Rust Arrow array type, and *_TYPE is the DataType of the elements)

Review comment:
       Unrelated to this PR, I just renamed some of these arguments to match the comment description




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org