Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/14 22:48:47 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request #1449: Support identifiers with `.` in them

alamb opened a new pull request #1449:
URL: https://github.com/apache/arrow-datafusion/pull/1449


   # Which issue does this PR close?
   
   
   Closes https://github.com/apache/arrow-datafusion/issues/1432
   
   # Rationale for this change
   Field names containing a period, such as `f.c1`, cannot be referenced in a SQL query.
   
   
   # What changes are included in this PR?
   1. Directly construct `Expr::Column` references, rather than trying to parse a string using `col()`, when converting SQL expressions to logical expressions.
   
   # Are there any user-facing changes?
   Columns whose names contain periods can now be referenced in SQL by enclosing the name in double quotes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1449: Support identifiers with `.` in them

Posted by GitBox <gi...@apache.org>.
xudong963 commented on a change in pull request #1449:
URL: https://github.com/apache/arrow-datafusion/pull/1449#discussion_r769217116



##########
File path: datafusion/src/sql/planner.rs
##########
@@ -1062,8 +1062,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 }
 
                 let field = schema.field(field_index - 1);
-                let col_ident = SQLExpr::Identifier(Ident::new(field.qualified_name()));
-                self.sql_expr_to_logical_expr(&col_ident, schema)?
+                Expr::Column(field.qualified_column())

Review comment:
       👍, FYI @hntd187 

##########
File path: datafusion/tests/sql.rs
##########
@@ -5426,6 +5426,67 @@ async fn qualified_table_references() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn qualified_table_references_and_fields() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+
+    let c1: StringArray = vec!["foofoo", "foobar", "foobaz"]
+        .into_iter()
+        .map(Some)
+        .collect();
+    let c2: Int64Array = vec![1, 2, 3].into_iter().map(Some).collect();
+
+    let batch = RecordBatch::try_from_iter(vec![
+        ("f.c1", Arc::new(c1) as ArrayRef),
+        //  evil -- use the same name as the table
+        ("test.c2", Arc::new(c2) as ArrayRef),
+    ])?;
+
+    let table = MemTable::try_new(batch.schema(), vec![vec![batch]])?;
+    ctx.register_table("test", Arc::new(table))?;
+
+    // referring to the unquoted column is an error
+    let sql = format!(r#"SELECT f1.c1 from test"#);

Review comment:
       Clippy: useless_format
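
For context, Clippy's `useless_format` lint flags `format!` calls that only wrap a literal and interpolate nothing. A minimal sketch of the alternatives follows; the function names `query_literal` and `query_for` are hypothetical and are not part of this PR.

```rust
/// Hypothetical helper: with nothing to interpolate, the literal itself can
/// be used as a `&'static str`, so no `format!` call or allocation is needed.
pub fn query_literal() -> &'static str {
    r#"SELECT f1.c1 from test"#
}

/// Hypothetical helper: `format!` is the right tool once a value is
/// actually interpolated into the string.
pub fn query_for(table: &str) -> String {
    format!(r#"SELECT f1.c1 from {}"#, table)
}
```

In the test under review, this would presumably mean assigning the raw string literal to `sql` directly and adjusting the call sites that borrow it.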







[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1449: Support identifiers with `.` in them

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1449:
URL: https://github.com/apache/arrow-datafusion/pull/1449#discussion_r769107230



##########
File path: datafusion/src/sql/planner.rs
##########
@@ -1323,9 +1325,14 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                     let var_names = vec![id.value.clone()];
                     Ok(Expr::ScalarVariable(var_names))
                 } else {
-                    // create a column expression based on raw user input, this column will be
-                    // normalized with qualifer later by the SQL planner.
-                    Ok(col(&id.value))
+                    // Don't use `col()` here because it will try to
+                    // interpret names with '.' as if they were
+                    // compound identifiers, but this is not a compound
+                    // identifier. (e.g. it is "foo.bar" not foo.bar)
+                    Ok(Expr::Column(Column {

Review comment:
       This is the core change: `col()` attempts to parse a string as a potentially compound identifier. For a `SQLExpr::Identifier` this should not be done; the value should be used as-is, as a column name with no relation qualifier.
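
The difference can be sketched with simplified stand-in types. This is an illustration only, not DataFusion's actual code: `Column` here is a reduced model, and `parse_flat_name` and `unqualified` are hypothetical names. The sketch assumes a string-based helper splits at the first `.`.

```rust
/// Simplified stand-in for a column reference: an optional relation (table)
/// qualifier plus the column name. Not DataFusion's actual type.
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub relation: Option<String>,
    pub name: String,
}

/// Hypothetical string-parsing constructor: treats the first '.' as the
/// separator between relation and column name.
pub fn parse_flat_name(flat: &str) -> Column {
    match flat.split_once('.') {
        Some((relation, name)) => Column {
            relation: Some(relation.to_string()),
            name: name.to_string(),
        },
        None => Column {
            relation: None,
            name: flat.to_string(),
        },
    }
}

/// Direct construction: the identifier text is taken verbatim as the column
/// name, with no relation qualifier.
pub fn unqualified(name: &str) -> Column {
    Column {
        relation: None,
        name: name.to_string(),
    }
}
```

Under these assumptions, `parse_flat_name("f.c1")` yields relation `f` and name `c1`, while `unqualified("f.c1")` keeps `f.c1` as the name, which is what a quoted identifier such as `"f.c1"` needs.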







[GitHub] [arrow-datafusion] alamb commented on pull request #1449: Support identifiers with `.` in them

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1449:
URL: https://github.com/apache/arrow-datafusion/pull/1449#issuecomment-994115746


   FYI @liukun4515 





[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1449: Support identifiers with `.` in them

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1449:
URL: https://github.com/apache/arrow-datafusion/pull/1449#discussion_r769108105



##########
File path: datafusion/src/sql/planner.rs
##########
@@ -1062,8 +1062,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 }
 
                 let field = schema.field(field_index - 1);
-                let col_ident = SQLExpr::Identifier(Ident::new(field.qualified_name()));
-                self.sql_expr_to_logical_expr(&col_ident, schema)?
+                Expr::Column(field.qualified_column())

Review comment:
       This change avoids trying to reparse the string as a column name and instead creates the `Column` reference directly.
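
One way to see why the round trip through a string is lossy: two different fields can flatten to the same qualified name. The sketch below uses simplified stand-in types, not DataFusion's actual `DFField` or `Column`; the method names mirror the ones in the diff, but their bodies are assumptions made for illustration.

```rust
/// Simplified stand-in for a structured column reference.
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub relation: Option<String>,
    pub name: String,
}

/// Simplified stand-in for a schema field with an optional table qualifier.
pub struct Field {
    qualifier: Option<String>,
    name: String,
}

impl Field {
    pub fn new(qualifier: Option<&str>, name: &str) -> Self {
        Field {
            qualifier: qualifier.map(String::from),
            name: name.to_string(),
        }
    }

    /// Flattens the field to "qualifier.name" (or just "name"). The
    /// boundary between qualifier and name is lost in the resulting string.
    pub fn qualified_name(&self) -> String {
        match &self.qualifier {
            Some(q) => format!("{}.{}", q, self.name),
            None => self.name.clone(),
        }
    }

    /// Builds the structured reference directly, keeping qualifier and name
    /// as separate parts.
    pub fn qualified_column(&self) -> Column {
        Column {
            relation: self.qualifier.clone(),
            name: self.name.clone(),
        }
    }
}
```

In this model, a field `c1` qualified by `f` and an unqualified field named `f.c1` both flatten to `f.c1`, so no parser can recover the original from the string alone, whereas `qualified_column()` keeps the two distinct.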







[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1449: Support identifiers with `.` in them

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1449:
URL: https://github.com/apache/arrow-datafusion/pull/1449#discussion_r769995757



##########
File path: datafusion/tests/sql.rs
##########
@@ -5426,6 +5426,67 @@ async fn qualified_table_references() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn qualified_table_references_and_fields() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+
+    let c1: StringArray = vec!["foofoo", "foobar", "foobaz"]
+        .into_iter()
+        .map(Some)
+        .collect();
+    let c2: Int64Array = vec![1, 2, 3].into_iter().map(Some).collect();
+
+    let batch = RecordBatch::try_from_iter(vec![
+        ("f.c1", Arc::new(c1) as ArrayRef),
+        //  evil -- use the same name as the table
+        ("test.c2", Arc::new(c2) as ArrayRef),
+    ])?;
+
+    let table = MemTable::try_new(batch.schema(), vec![vec![batch]])?;
+    ctx.register_table("test", Arc::new(table))?;
+
+    // referring to the unquoted column is an error
+    let sql = format!(r#"SELECT f1.c1 from test"#);
+    let error = ctx.create_logical_plan(&sql).unwrap_err();
+    assert_contains!(
+        error.to_string(),
+        "No field named 'f1.c1'. Valid fields are 'test.f.c1', 'test.test.c2'"
+    );
+
+    // however, enclosing it in double quotes is ok
+    let sql = format!(r#"SELECT "f.c1" from test"#);
+    let actual = execute_to_batches(&mut ctx, &sql).await;
+    let expected = vec![
+        "+--------+",
+        "| f.c1   |",

Review comment:
       I think it is "ok" in the sense that DataFusion should be able to run the query. I don't think anyone should actually do it 😆 
   
   Added a test for this case in 3e4418c67







[GitHub] [arrow-datafusion] alamb merged pull request #1449: Support identifiers with `.` in them

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #1449:
URL: https://github.com/apache/arrow-datafusion/pull/1449


   





[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1449: Support identifiers with `.` in them

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1449:
URL: https://github.com/apache/arrow-datafusion/pull/1449#discussion_r769990495



##########
File path: datafusion/src/sql/planner.rs
##########
@@ -1062,8 +1062,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 }
 
                 let field = schema.field(field_index - 1);
-                let col_ident = SQLExpr::Identifier(Ident::new(field.qualified_name()));
-                self.sql_expr_to_logical_expr(&col_ident, schema)?
+                Expr::Column(field.qualified_column())

Review comment:
       We are all learning together!







[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1449: Support identifiers with `.` in them

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1449:
URL: https://github.com/apache/arrow-datafusion/pull/1449#discussion_r769990897



##########
File path: datafusion/tests/sql.rs
##########
@@ -5426,6 +5426,67 @@ async fn qualified_table_references() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn qualified_table_references_and_fields() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+
+    let c1: StringArray = vec!["foofoo", "foobar", "foobaz"]
+        .into_iter()
+        .map(Some)
+        .collect();
+    let c2: Int64Array = vec![1, 2, 3].into_iter().map(Some).collect();
+
+    let batch = RecordBatch::try_from_iter(vec![
+        ("f.c1", Arc::new(c1) as ArrayRef),
+        //  evil -- use the same name as the table
+        ("test.c2", Arc::new(c2) as ArrayRef),
+    ])?;
+
+    let table = MemTable::try_new(batch.schema(), vec![vec![batch]])?;
+    ctx.register_table("test", Arc::new(table))?;
+
+    // referring to the unquoted column is an error
+    let sql = format!(r#"SELECT f1.c1 from test"#);

Review comment:
       will fix







[GitHub] [arrow-datafusion] hntd187 commented on a change in pull request #1449: Support identifiers with `.` in them

Posted by GitBox <gi...@apache.org>.
hntd187 commented on a change in pull request #1449:
URL: https://github.com/apache/arrow-datafusion/pull/1449#discussion_r769232972



##########
File path: datafusion/src/sql/planner.rs
##########
@@ -1062,8 +1062,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 }
 
                 let field = schema.field(field_index - 1);
-                let col_ident = SQLExpr::Identifier(Ident::new(field.qualified_name()));
-                self.sql_expr_to_logical_expr(&col_ident, schema)?
+                Expr::Column(field.qualified_column())

Review comment:
       This is much better than what I had :-)







[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #1449: Support identifiers with `.` in them

Posted by GitBox <gi...@apache.org>.
Jimexist commented on a change in pull request #1449:
URL: https://github.com/apache/arrow-datafusion/pull/1449#discussion_r769181060



##########
File path: datafusion/tests/sql.rs
##########
@@ -5426,6 +5426,67 @@ async fn qualified_table_references() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn qualified_table_references_and_fields() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+
+    let c1: StringArray = vec!["foofoo", "foobar", "foobaz"]
+        .into_iter()
+        .map(Some)
+        .collect();
+    let c2: Int64Array = vec![1, 2, 3].into_iter().map(Some).collect();
+
+    let batch = RecordBatch::try_from_iter(vec![
+        ("f.c1", Arc::new(c1) as ArrayRef),
+        //  evil -- use the same name as the table
+        ("test.c2", Arc::new(c2) as ArrayRef),
+    ])?;
+
+    let table = MemTable::try_new(batch.schema(), vec![vec![batch]])?;
+    ctx.register_table("test", Arc::new(table))?;
+
+    // referring to the unquoted column is an error
+    let sql = format!(r#"SELECT f1.c1 from test"#);
+    let error = ctx.create_logical_plan(&sql).unwrap_err();
+    assert_contains!(
+        error.to_string(),
+        "No field named 'f1.c1'. Valid fields are 'test.f.c1', 'test.test.c2'"
+    );
+
+    // however, enclosing it in double quotes is ok
+    let sql = format!(r#"SELECT "f.c1" from test"#);
+    let actual = execute_to_batches(&mut ctx, &sql).await;
+    let expected = vec![
+        "+--------+",
+        "| f.c1   |",

Review comment:
       I wonder if it is okay to create a column named `....`

##########
File path: datafusion/tests/sql.rs
##########
@@ -5426,6 +5426,67 @@ async fn qualified_table_references() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn qualified_table_references_and_fields() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+
+    let c1: StringArray = vec!["foofoo", "foobar", "foobaz"]
+        .into_iter()
+        .map(Some)
+        .collect();
+    let c2: Int64Array = vec![1, 2, 3].into_iter().map(Some).collect();
+
+    let batch = RecordBatch::try_from_iter(vec![
+        ("f.c1", Arc::new(c1) as ArrayRef),
+        //  evil -- use the same name as the table
+        ("test.c2", Arc::new(c2) as ArrayRef),
+    ])?;
+
+    let table = MemTable::try_new(batch.schema(), vec![vec![batch]])?;
+    ctx.register_table("test", Arc::new(table))?;
+
+    // referring to the unquoted column is an error
+    let sql = format!(r#"SELECT f1.c1 from test"#);
+    let error = ctx.create_logical_plan(&sql).unwrap_err();
+    assert_contains!(
+        error.to_string(),
+        "No field named 'f1.c1'. Valid fields are 'test.f.c1', 'test.test.c2'"
+    );
+
+    // however, enclosing it in double quotes is ok
+    let sql = format!(r#"SELECT "f.c1" from test"#);
+    let actual = execute_to_batches(&mut ctx, &sql).await;
+    let expected = vec![
+        "+--------+",
+        "| f.c1   |",

Review comment:
       I wonder if it is okay to create a column named `"...."`



