You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/14 16:47:50 UTC

[GitHub] [arrow] alamb opened a new pull request #8463: ARROW-10163: [Rust] [DataFusion] Add DictionaryArray coercion support

alamb opened a new pull request #8463:
URL: https://github.com/apache/arrow/pull/8463


   This PR builds on https://github.com/apache/arrow/pull/8460 so leaving as a draft until that is merged
   
   It adds:
   1. Basic `DictionaryArray` coercion (not cast) support in DataFusion
   2. A test in `sql.rs` demonstrating basic  `DictionaryArray` support 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me closed pull request #8463: ARROW-10163: [Rust] [DataFusion] Add DictionaryArray coercion support

Posted by GitBox <gi...@apache.org>.
nevi-me closed pull request #8463:
URL: https://github.com/apache/arrow/pull/8463


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8463: ARROW-10163: [Rust] [DataFusion] Add DictionaryArray coercion support

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8463:
URL: https://github.com/apache/arrow/pull/8463#discussion_r507172485



##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -930,14 +930,20 @@ fn register_alltypes_parquet(ctx: &mut ExecutionContext) {
 /// Execute query and return result set as 2-d table of Vecs
 /// `result[row][column]`
 async fn execute(ctx: &mut ExecutionContext, sql: &str) -> Vec<Vec<String>> {
-    let plan = ctx.create_logical_plan(&sql).unwrap();
+    let msg = format!("Creating logical plan for '{}'", sql);
+    let plan = ctx.create_logical_plan(&sql).expect(&msg);

Review comment:
       I see what you mean. I take my comment back (was LGTM regardless) :) 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8463: ARROW-10163: [Rust] [DataFusion] Add DictionaryArray coercion support

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8463:
URL: https://github.com/apache/arrow/pull/8463#issuecomment-708528485


   https://issues.apache.org/jira/browse/ARROW-10163


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8463: ARROW-10163: [Rust] [DataFusion] Add DictionaryArray coercion support

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8463:
URL: https://github.com/apache/arrow/pull/8463#discussion_r507106463



##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -930,14 +930,20 @@ fn register_alltypes_parquet(ctx: &mut ExecutionContext) {
 /// Execute query and return result set as 2-d table of Vecs
 /// `result[row][column]`
 async fn execute(ctx: &mut ExecutionContext, sql: &str) -> Vec<Vec<String>> {
-    let plan = ctx.create_logical_plan(&sql).unwrap();
+    let msg = format!("Creating logical plan for '{}'", sql);
+    let plan = ctx.create_logical_plan(&sql).expect(&msg);

Review comment:
       I am not quite sure what you are suggesting here @jorgecarleitao  -- I can revert these changes to `execute` if they are too confusing 
   
   Perhaps you suggesting I rewrite `execute` like:
   ```rust
   
   /// Execute query and return result set as 2-d table of Vecs
   /// `result[row][column]`
   async fn execute(ctx: &mut ExecutionContext, sql: &str) -> Result<Vec<Vec<String>>> {
       let plan = ctx.create_logical_plan(&sql)?
       let logical_schema = plan.schema();
   ....
       result_vec(&results)
   }
   ```
   ? If I did so, that would require changing most of the other `#[test]` functions in this file -- which I would prefer not to do in this PR (I could do it in a subsequent one if you like)
   
   
   The reason I added these messages was so that I could get more context of where  the tests were failing. For example, without these changes, failures generate an error like:
   
   ```
   ---- query_on_string_dictionary stdout ----
   thread 'query_on_string_dictionary' panicked at 'called `Result::unwrap()` on an `Err` value: General("\'Dictionary(Int32, Utf8) = Utf8\' can\'t be evaluated because there isn\'t a common type to coerce the types to")', datafusion/tests/sql.rs:940:16
   ```
   
   But with these changes, then the actual error message is displayed:
   
   ```
   ---- query_on_string_dictionary stdout ----
   thread 'query_on_string_dictionary' panicked at 'Creating physical plan for 'SELECT * FROM test WHERE d1 = 'three'': Projection: #d1
     Filter: #d1 Eq Utf8("three")
       TableScan: test projection=Some([0]): General("\'Dictionary(Int32, Utf8) = Utf8\' can\'t be evaluated because there isn\'t a common type to coerce the types to")', datafusion/tests/sql.rs:942:16
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   ```
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8463: ARROW-10163: [Rust] [DataFusion] Add DictionaryArray coercion support

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8463:
URL: https://github.com/apache/arrow/pull/8463#discussion_r506793405



##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -930,14 +930,20 @@ fn register_alltypes_parquet(ctx: &mut ExecutionContext) {
 /// Execute query and return result set as 2-d table of Vecs
 /// `result[row][column]`
 async fn execute(ctx: &mut ExecutionContext, sql: &str) -> Vec<Vec<String>> {
-    let plan = ctx.create_logical_plan(&sql).unwrap();
+    let msg = format!("Creating logical plan for '{}'", sql);
+    let plan = ctx.create_logical_plan(&sql).expect(&msg);

Review comment:
       an alternative here is return `Result` and place a `?`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8463: ARROW-10163: [Rust] [DataFusion] Add DictionaryArray coercion support

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8463:
URL: https://github.com/apache/arrow/pull/8463#issuecomment-711154929


   > LGTM, there's a comment from Jorge @alamb, let us know if you'd like to address it, otherwise we can merge this
   
   Thanks @nevi-me  -- I responded to @jorgecarleitao  and I have rebased this PR against master. I think it is good to go, but I will wait for @jorgecarleitao 's response


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8463: ARROW-10163: [Rust] [DataFusion] Add DictionaryArray coercion support

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8463:
URL: https://github.com/apache/arrow/pull/8463#discussion_r504827840



##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -1238,3 +1244,59 @@ async fn query_count_distinct() -> Result<()> {
     assert_eq!(expected, actual);
     Ok(())
 }
+

Review comment:
       This test demonstrates `DictionaryArray`s being used in DataFusion

##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -1161,16 +1197,13 @@ fn order_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType>
         return Some(lhs_type.clone());
     }
 
-    match numerical_coercion(lhs_type, rhs_type) {
-        None => {
-            // strings are naturally ordered, and thus ordering can be applied to them.
-            string_coercion(lhs_type, rhs_type)
-        }
-        t => t,
-    }
+    numerical_coercion(lhs_type, rhs_type)

Review comment:
       this is just a refactor, it is not meant to change the semantics

##########
File path: rust/datafusion/src/physical_plan/expressions.rs
##########
@@ -1773,11 +1809,13 @@ mod tests {
 
     // runs an end-to-end test of physical type coercion:
     // 1. construct a record batch with two columns of type A and B
+    //  (*_ARRAY is the Rust Arrow array type, and *_TYPE is the DataType of the elements)
     // 2. construct a physical expression of A OP B
     // 3. evaluate the expression
     // 4. verify that the resulting expression is of type C
+    // 5. verify that the results of evaluation are $VEC
     macro_rules! test_coercion {
-        ($A_ARRAY:ident, $A_TYPE:expr, $A_VEC:expr, $B_ARRAY:ident, $B_TYPE:expr, $B_VEC:expr, $OP:expr, $TYPEARRAY:ident, $TYPE:expr, $VEC:expr) => {{

Review comment:
       I updated the parameter names here to make this macro more consistent with its documentation




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org