Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/05 11:06:33 UTC

[GitHub] [arrow] alamb commented on a change in pull request #8340: ARROW-10165: [Rust] [DataFusion]: Remove special case DataFusion casting checks in favor of Arrow cast kernel

alamb commented on a change in pull request #8340:
URL: https://github.com/apache/arrow/pull/8340#discussion_r499515276



##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -918,14 +918,20 @@ fn register_alltypes_parquet(ctx: &mut ExecutionContext) {
 /// Execute query and return result set as 2-d table of Vecs
 /// `result[row][column]`
 async fn execute(ctx: &mut ExecutionContext, sql: &str) -> Vec<Vec<String>> {
-    let plan = ctx.create_logical_plan(&sql).unwrap();
+    let msg = format!("Creating logical plan for '{}'", sql);

Review comment:
       These changes improve the test error reporting: instead of a bare panic, the failure now also prints some diagnostic information, such as the SQL being planned.
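
       As a rough illustration of the pattern described above, here is a minimal, self-contained sketch. It assumes the formatted message is attached to the panic via `expect`; the visible diff only shows the `format!` line, so that part is a guess. `create_plan` and `plan_or_panic` are hypothetical stand-ins, not DataFusion functions.

```rust
// Hypothetical stand-in for `ctx.create_logical_plan`; NOT the DataFusion API.
// It fails on an empty query so the error path can be exercised.
fn create_plan(sql: &str) -> Result<String, String> {
    if sql.trim().is_empty() {
        Err("empty query".to_string())
    } else {
        Ok(format!("Plan({})", sql))
    }
}

// Build a context message up front, then attach it to the panic.
// Assumption: the real test helper does something similar with `expect`.
fn plan_or_panic(sql: &str) -> String {
    let msg = format!("Creating logical plan for '{}'", sql);
    // On failure the panic text contains both `msg` and the Debug form of the error.
    create_plan(sql).expect(&msg)
}
```

       With a plain `unwrap()` a failing test would report only the error value; with the message attached, the panic also names the query that was being planned.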

##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -1200,3 +1206,69 @@ async fn query_is_not_null() -> Result<()> {
     assert_eq!(expected, actual);
     Ok(())
 }
+
+#[tokio::test]
+async fn query_on_string_dictionary() -> Result<()> {

Review comment:
       Here is the end-to-end test case I am working on for `DictionaryArray` support. With this PR we can do basic filtering in DataFusion. A few more PRs are needed to complete expressions and aggregation, but they are coming.
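
       For readers unfamiliar with dictionary encoding, the following standard-library-only sketch shows the general idea behind the test: rows store small integer keys into a list of distinct values, and an `IS NOT NULL` filter only needs to look at the keys. The `StringDictionary` type and its methods here are hypothetical and simplified; they are not Arrow's `DictionaryArray` or `StringDictionaryBuilder`.

```rust
/// Hypothetical, simplified model of a dictionary-encoded string column.
/// This is an illustration only, NOT the Arrow implementation.
struct StringDictionary {
    /// One entry per row: an index into `values`, or `None` for NULL.
    keys: Vec<Option<i32>>,
    /// Distinct string values, each stored once.
    values: Vec<String>,
}

impl StringDictionary {
    fn new() -> Self {
        StringDictionary { keys: Vec::new(), values: Vec::new() }
    }

    /// Append a value, reusing the existing key if the value was seen before.
    fn append(&mut self, value: &str) {
        let key = match self.values.iter().position(|v| v == value) {
            Some(i) => i,
            None => {
                self.values.push(value.to_string());
                self.values.len() - 1
            }
        };
        self.keys.push(Some(key as i32));
    }

    /// Append a NULL row; no value is stored for it.
    fn append_null(&mut self) {
        self.keys.push(None);
    }

    /// Decode one row back to its string, or `None` if the row is NULL.
    fn get(&self, row: usize) -> Option<&str> {
        self.keys[row].map(|k| self.values[k as usize].as_str())
    }

    /// Rough equivalent of `WHERE d1 IS NOT NULL`: keep rows whose key is present.
    fn not_null(&self) -> Vec<&str> {
        (0..self.keys.len()).filter_map(|row| self.get(row)).collect()
    }
}
```

       Appending "one", NULL, "three" mirrors the data built in the test; repeated values share a single dictionary entry, which is the main benefit of the encoding.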

##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -1200,3 +1206,69 @@ async fn query_is_not_null() -> Result<()> {
     assert_eq!(expected, actual);
     Ok(())
 }
+
+#[tokio::test]
+async fn query_on_string_dictionary() -> Result<()> {
+    // Test to ensure DataFusion can operate on dictionary types
+    // Use StringDictionary (32 bit indexes = keys)
+    let field_type =
+        DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8));
+    let schema = Arc::new(Schema::new(vec![Field::new("d1", field_type, true)]));
+
+    let keys_builder = PrimitiveBuilder::<Int32Type>::new(10);
+    let values_builder = StringBuilder::new(10);
+    let mut builder = StringDictionaryBuilder::new(keys_builder, values_builder);
+
+    builder.append("one")?;
+    builder.append_null()?;
+    builder.append("three")?;
+    let array = Arc::new(builder.finish());
+
+    let data = RecordBatch::try_new(schema.clone(), vec![array])?;
+
+    let table = MemTable::new(schema, vec![vec![data]])?;
+    let mut ctx = ExecutionContext::new();
+    ctx.register_table("test", Box::new(table));
+
+    // Basic SELECT
+    let sql = "SELECT * FROM test";
+    let actual = execute(&mut ctx, sql).await;
+    let expected = vec![vec!["one"], vec!["NULL"], vec!["three"]];
+    assert_eq!(expected, actual);
+
+    // basic filtering
+    let sql = "SELECT * FROM test WHERE d1 IS NOT NULL";
+    let actual = execute(&mut ctx, sql).await;
+    let expected = vec![vec!["one"], vec!["three"]];
+    assert_eq!(expected, actual);
+
+    // The following queries are not yet supported
+
+    // // filtering with constant

Review comment:
       I have PRs in the works to support these cases, and I will uncomment each query as the corresponding support lands.



