You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/12 14:34:30 UTC

[GitHub] [arrow-datafusion] ReggieFan opened a new pull request, #2211: fix: ‘Invalid identifier #xxx’ caused by Case-to-case conversion in SQL

ReggieFan opened a new pull request, #2211:
URL: https://github.com/apache/arrow-datafusion/pull/2211

   # Which issue does this PR close?
   Closes #2210
   
    # Rationale for this change
   fix: ‘Invalid identifier #xxx’ caused by Case-to-case conversion in SQL
   
   # What changes are included in this PR?
   Make the tablename and filed name case-insensitive in SQL. 
   
   # Are there any user-facing changes?
   Field name with uppercase letters can be used in SQL correctly and SQL statement will be case-insensitive. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #2211: fix: ‘Invalid identifier #xxx’ caused by Case-to-case conversion in SQL

Posted by GitBox <gi...@apache.org>.
jackwener commented on code in PR #2211:
URL: https://github.com/apache/arrow-datafusion/pull/2211#discussion_r849645647


##########
datafusion/core/tests/sql/select.rs:
##########
@@ -1001,3 +1001,41 @@ async fn query_empty_table() {
     let expected = vec!["++", "++"];
     assert_batches_sorted_eq!(expected, &result);
 }
+
+#[tokio::test]
+async fn case_insensitive_in_sql() -> Result<()> {
+    // Test that field name and table name in sql is case-insensitive
+    let fields = vec![
+        Field::new("Column1", DataType::Utf8, true),
+        Field::new("COLUMN2", DataType::Utf8, true),
+        Field::new("column3", DataType::Utf8, true),
+    ];
+    let schema = Schema::new(fields);
+    let a = StringArray::from(vec!["content1"]);
+    let b = StringArray::from(vec!["content2"]);
+    let c = StringArray::from(vec!["content3"]);
+    let record_batch = RecordBatch::try_new(
+        Arc::new(schema),
+        vec![Arc::new(a), Arc::new(b), Arc::new(c)],
+    )
+    .unwrap();
+    let table =
+        MemTable::try_new(record_batch.schema(), vec![vec![record_batch]]).unwrap();
+
+    let ctx = SessionContext::new();
+    ctx.register_table("test", Arc::new(table))?;
+
+    // Select with lower and upper case
+    let sql = "SELECT COLumn1,colUMN2,column3 FROM test where TEst.COLUMN1='content1' and column2='content2' and COLumn3='content3'";

Review Comment:
   minor change
   ```suggestion
       let sql = "SELECT COLumn1, colUMN2, column3 FROM test WHERE TEst.COLUMN1='content1' and column2='content2' and COLumn3='content3'";
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ReggieFan closed pull request #2211: fix: ‘Invalid identifier #xxx’ caused by Case-to-case conversion in SQL

Posted by GitBox <gi...@apache.org>.
ReggieFan closed pull request #2211: fix: ‘Invalid identifier #xxx’ caused by Case-to-case conversion in SQL
URL: https://github.com/apache/arrow-datafusion/pull/2211


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] liukun4515 commented on pull request #2211: fix: ‘Invalid identifier #xxx’ caused by Case-to-case conversion in SQL

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on PR #2211:
URL: https://github.com/apache/arrow-datafusion/pull/2211#issuecomment-1097615212

   @ReggieFan thanks for your contributions.
   Could you please add the test for this issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #2211: fix: ‘Invalid identifier #xxx’ caused by Case-to-case conversion in SQL

Posted by GitBox <gi...@apache.org>.
andygrove commented on code in PR #2211:
URL: https://github.com/apache/arrow-datafusion/pull/2211#discussion_r849855154


##########
datafusion/core/tests/sql/select.rs:
##########
@@ -1001,3 +1001,41 @@ async fn query_empty_table() {
     let expected = vec!["++", "++"];
     assert_batches_sorted_eq!(expected, &result);
 }
+
+#[tokio::test]
+async fn case_insensitive_in_sql() -> Result<()> {
+    // Test that field name and table name in sql is case-insensitive
+    let fields = vec![
+        Field::new("Column1", DataType::Utf8, true),

Review Comment:
   It seems like we should have had a unit test to catch the regression. I filed https://github.com/apache/arrow-datafusion/issues/2227 to add one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on pull request #2211: fix: ‘Invalid identifier #xxx’ caused by Case-to-case conversion in SQL

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #2211:
URL: https://github.com/apache/arrow-datafusion/pull/2211#issuecomment-1098242516

   I haven't reviewed the entire codebase for how we handle case sensitivity of identifiers but it looks like DataFusion is case-sensitive today and this PR proposes to change it to case-insensitive.
   
   Postgres supports both case-sensitive and case-insensitive identifiers, depending on whether they are quoted or not, as described in https://www.postgresql.org/docs/current/sql-syntax-lexical.html
   
   Perhaps it would be better to follow Postgres's approach of converting unquoted identifiers to lower-case internally rather than change comparisons to use case-insensitive matching?
   
   @alamb What do you think?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2211: fix: ‘Invalid identifier #xxx’ caused by Case-to-case conversion in SQL

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2211:
URL: https://github.com/apache/arrow-datafusion/pull/2211#discussion_r849761590


##########
datafusion/core/tests/sql/select.rs:
##########
@@ -1001,3 +1001,41 @@ async fn query_empty_table() {
     let expected = vec!["++", "++"];
     assert_batches_sorted_eq!(expected, &result);
 }
+
+#[tokio::test]
+async fn case_insensitive_in_sql() -> Result<()> {
+    // Test that field name and table name in sql is case-insensitive
+    let fields = vec![
+        Field::new("Column1", DataType::Utf8, true),

Review Comment:
   This table has schema equivalent to doing the following (note the `"` are very important as they say to not normalize case):
   
   ```sql
   CREATE TABLE test(
     "Column1" varchar,
     "COLUMN2" varchar,
     "column3" varchar
   )
   ```
   
   And in that case, I would expect 
   
   ```sql
   "SELECT COLumn1, colUMN2, column3 FROM test WHERE TEst.COLUMN1='content1' and column2='content2' and COLumn3='content3'"
   ``` 
   
   to fail. And in fact it does in postgres:
   
   ```sql
   alamb=# CREATE TABLE test(
     "Column1" varchar,
     "COLUMN2" varchar,
     "column3" varchar
   );
   CREATE TABLE
   alamb=# SELECT COLumn1, colUMN2, column3 FROM test WHERE TEst.COLUMN1='content1' and column2='content2' and COLumn3='content3';
   ERROR:  column "column1" does not exist
   LINE 1: SELECT COLumn1, colUMN2, column3 FROM test WHERE TEst.COLUMN...
                  ^
   HINT:  Perhaps you meant to reference the column "test.Column1" or the column "test.column3".
   alamb=# 
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2211: fix: ‘Invalid identifier #xxx’ caused by Case-to-case conversion in SQL

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2211:
URL: https://github.com/apache/arrow-datafusion/pull/2211#discussion_r850374596


##########
datafusion/core/tests/sql/select.rs:
##########
@@ -1001,3 +1001,41 @@ async fn query_empty_table() {
     let expected = vec!["++", "++"];
     assert_batches_sorted_eq!(expected, &result);
 }
+
+#[tokio::test]
+async fn case_insensitive_in_sql() -> Result<()> {
+    // Test that field name and table name in sql is case-insensitive
+    let fields = vec![
+        Field::new("Column1", DataType::Utf8, true),

Review Comment:
   Sorry -- I should have been more specific -- I don't think #2211 is a bug. I left a comment there explaining why: https://github.com/apache/arrow-datafusion/issues/2210#issuecomment-1099113763



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ReggieFan commented on a diff in pull request #2211: fix: ‘Invalid identifier #xxx’ caused by Case-to-case conversion in SQL

Posted by GitBox <gi...@apache.org>.
ReggieFan commented on code in PR #2211:
URL: https://github.com/apache/arrow-datafusion/pull/2211#discussion_r850400855


##########
datafusion/core/tests/sql/select.rs:
##########
@@ -1001,3 +1001,41 @@ async fn query_empty_table() {
     let expected = vec!["++", "++"];
     assert_batches_sorted_eq!(expected, &result);
 }
+
+#[tokio::test]
+async fn case_insensitive_in_sql() -> Result<()> {
+    // Test that field name and table name in sql is case-insensitive
+    let fields = vec![
+        Field::new("Column1", DataType::Utf8, true),

Review Comment:
   Thanks, i got it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ReggieFan commented on a diff in pull request #2211: fix: ‘Invalid identifier #xxx’ caused by Case-to-case conversion in SQL

Posted by GitBox <gi...@apache.org>.
ReggieFan commented on code in PR #2211:
URL: https://github.com/apache/arrow-datafusion/pull/2211#discussion_r850032198


##########
datafusion/core/tests/sql/select.rs:
##########
@@ -1001,3 +1001,41 @@ async fn query_empty_table() {
     let expected = vec!["++", "++"];
     assert_batches_sorted_eq!(expected, &result);
 }
+
+#[tokio::test]
+async fn case_insensitive_in_sql() -> Result<()> {
+    // Test that field name and table name in sql is case-insensitive
+    let fields = vec![
+        Field::new("Column1", DataType::Utf8, true),

Review Comment:
   This PR is to fix the issue #2211 , so i make it case-insensitive to solve it. I think that if we want to keep it case-sensitive, we should solve the problem in another way.
   
   Accroding to the error: No field named 'test.column1'. i think the field name was converted to lower-case and compare to the real filed name. If the real field name contains uppercase letters, it would always fail.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2211: fix: ‘Invalid identifier #xxx’ caused by Case-to-case conversion in SQL

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2211:
URL: https://github.com/apache/arrow-datafusion/pull/2211#discussion_r849761590


##########
datafusion/core/tests/sql/select.rs:
##########
@@ -1001,3 +1001,41 @@ async fn query_empty_table() {
     let expected = vec!["++", "++"];
     assert_batches_sorted_eq!(expected, &result);
 }
+
+#[tokio::test]
+async fn case_insensitive_in_sql() -> Result<()> {
+    // Test that field name and table name in sql is case-insensitive
+    let fields = vec![
+        Field::new("Column1", DataType::Utf8, true),

Review Comment:
   This table has schema equivalent to doing the following (note the `"` are very important as they say to not normalize case):
   
   ```sql
   CREATE TABLE test(
     "Column1" varchar,
     "COLUMN2" varchar,
     "column3" varchar
   )
   ```
   
   And in that case, I would expect 
   
   ```sql
   "SELECT COLumn1, colUMN2, column3 FROM test WHERE TEst.COLUMN1='content1' and column2='content2' and COLumn3='content3'"
   ``` 
   
   to fail



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ReggieFan commented on pull request #2211: fix: ‘Invalid identifier #xxx’ caused by Case-to-case conversion in SQL

Posted by GitBox <gi...@apache.org>.
ReggieFan commented on PR #2211:
URL: https://github.com/apache/arrow-datafusion/pull/2211#issuecomment-1097948321

   @liukun4515 Sure, i'ill make it up later. I ran into another problem now, can the field name contains '.' ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #2211: fix: ‘Invalid identifier #xxx’ caused by Case-to-case conversion in SQL

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2211:
URL: https://github.com/apache/arrow-datafusion/pull/2211#issuecomment-1100137962

   Thanks again @ReggieFan  -- I put up a "negative test case" PR in https://github.com/apache/arrow-datafusion/pull/2243 as suggested by @andygrove 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org