You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/07 22:24:42 UTC

[GitHub] [arrow-datafusion] sarahyurick opened a new pull request, #3763: Support datetime keywords in functions

sarahyurick opened a new pull request, #3763:
URL: https://github.com/apache/arrow-datafusion/pull/3763

   # Rationale for this change
   While working on the `TIMESTAMPADD` function for https://github.com/dask-contrib/dask-sql/pull/813, I was having trouble parsing something like `TIMESTAMPADD(YEAR, 2, d)` (SchemaError FieldNotField) because `YEAR` is not a column name in the table.
   
   # What changes are included in this PR?
   I created a new function to iterate through the arguments of a function and check if any of them are unquoted datetime keywords. If so, I essentially modify the argument to be a quoted string instead.
   
   # Are there any user-facing changes?
   None.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #3763: Support datetime keywords in functions

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #3763:
URL: https://github.com/apache/arrow-datafusion/pull/3763#issuecomment-1274637788

   Marking as draft to indicate this is waiting on some items -- please mark it as ready for review when it is ready for another round of review. Thanks again @sarahyurick 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on pull request #3763: Support datetime keywords in functions

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #3763:
URL: https://github.com/apache/arrow-datafusion/pull/3763#issuecomment-1273955139

   I plan on reviewing this tomorrow @sarahyurick 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] sarahyurick commented on pull request #3763: Support datetime keywords in functions

Posted by GitBox <gi...@apache.org>.
sarahyurick commented on PR #3763:
URL: https://github.com/apache/arrow-datafusion/pull/3763#issuecomment-1272135142

   One other thing that I considered was that perhaps the table _does_ have a column called "year", etc. These changes don't affect anything like `SELECT year FROM table`, only functions (`SELECT my_func(year, bla, blah, month, etc) FROM table`).
   
   We could have the function first check whether or not a field called "year" already exists (such as with the existing `fields_with_unqualified_name` function) and act accordingly. The only case where this wouldn't work is if we had a table with a column called "year" _and_ we were doing a `TIMESTAMPADD` with it, like: `TIMESTAMPADD(year, 1, year)` (where the first "year" is a DateTimeField and the second is the column name).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on pull request #3763: Support datetime keywords in functions

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #3763:
URL: https://github.com/apache/arrow-datafusion/pull/3763#issuecomment-1276394804

   @sarahyurick I created a PR in dask-sql to add support for parsing `timestampadd(year, ...)` using a custom prefix parser. This means that we do not need changes in DataFusion.
   
   https://github.com/dask-contrib/dask-sql/pull/857
   
   @alamb You may be interested to see a real-world use case for the new functionality of overriding a prefix parser.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #3763: Support datetime keywords in functions

Posted by GitBox <gi...@apache.org>.
andygrove commented on code in PR #3763:
URL: https://github.com/apache/arrow-datafusion/pull/3763#discussion_r992828969


##########
datafusion/sql/src/planner.rs:
##########
@@ -5309,6 +5456,19 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_datetime_keyword_conversion() {
+        // Field "year" does not exist in table.
+        let sql_expr = "SELECT AVG(year) FROM person";
+        let err1 = logical_plan(sql_expr).expect_err("query should have failed");
+        assert_plan_error(err1, "Avg");

Review Comment:
   > The general idea of this test is that originally, the query would have failed with a "Field Not Found" error because "year" isn't in the "person" table. 
   
   That sounds like the correct behavior to me. Here is an example from Postgres:
   
   ```
   postgres=# select avg(year) from person;
   ERROR:  column "year" does not exist
   LINE 1: select avg(year) from person;
   ```
   
   > However, because Avg is a function and "year" isn't in the table, we want to convert "year" to a datetime keyword to check if it is valid, so with my changes it fails with a "The function Avg does not support inputs of type Utf8" instead.
   
   I think it would be useful to have a test of a valid use of year as a datetime keyword, as I am not fully understanding the goal here yet.
   
   > This is a pretty roundabout way of testing the functionality, but because these changes were implemented to allow datetime keyword conversion before calling something like Timestampadd in Dask-SQL, which DataFusion does not support, this was the alternative testing method I came up with. Let me know what you think.
   
   I will ping you directly with questions about the Dask SQL requirement driving this.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] sarahyurick commented on a diff in pull request #3763: Support datetime keywords in functions

Posted by GitBox <gi...@apache.org>.
sarahyurick commented on code in PR #3763:
URL: https://github.com/apache/arrow-datafusion/pull/3763#discussion_r992663284


##########
datafusion/sql/src/planner.rs:
##########
@@ -5309,6 +5456,19 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_datetime_keyword_conversion() {
+        // Field "year" does not exist in table.
+        let sql_expr = "SELECT AVG(year) FROM person";
+        let err1 = logical_plan(sql_expr).expect_err("query should have failed");
+        assert_plan_error(err1, "Avg");

Review Comment:
   The general idea of this test is that originally, the query would have failed with a "Field Not Found" error because "year" isn't in the "person" table. However, because Avg is a function and "year" isn't in the table, we want to convert "year" to a datetime keyword to check if it is valid, so with my changes it fails with a "The function Avg does not support inputs of type Utf8" instead.
   
   This is a pretty roundabout way of testing the functionality, but because these changes were implemented to allow datetime keyword conversion before calling something like Timestampadd in Dask-SQL, which DataFusion does not support, this was the alternative testing method I came up with.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] sarahyurick commented on a diff in pull request #3763: Support datetime keywords in functions

Posted by GitBox <gi...@apache.org>.
sarahyurick commented on code in PR #3763:
URL: https://github.com/apache/arrow-datafusion/pull/3763#discussion_r992663284


##########
datafusion/sql/src/planner.rs:
##########
@@ -5309,6 +5456,19 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_datetime_keyword_conversion() {
+        // Field "year" does not exist in table.
+        let sql_expr = "SELECT AVG(year) FROM person";
+        let err1 = logical_plan(sql_expr).expect_err("query should have failed");
+        assert_plan_error(err1, "Avg");

Review Comment:
   The general idea of this test is that originally, the query would have failed with a "Field Not Found" error because "year" isn't in the "person" table. However, because Avg is a function and "year" isn't in the table, we want to convert "year" to a datetime keyword to check if it is valid, so with my changes it fails with a "The function Avg does not support inputs of type Utf8" instead.
   
   This is a pretty roundabout way of testing the functionality, but because these changes were implemented to allow datetime keyword conversion before calling something like Timestampadd in Dask-SQL, which DataFusion does not support, this was the alternative testing method I came up with. Let me know what you think.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] sarahyurick closed pull request #3763: Support datetime keywords in functions

Posted by GitBox <gi...@apache.org>.
sarahyurick closed pull request #3763: Support datetime keywords in functions
URL: https://github.com/apache/arrow-datafusion/pull/3763


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #3763: Support datetime keywords in functions

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #3763:
URL: https://github.com/apache/arrow-datafusion/pull/3763#issuecomment-1274625168

   Looks like there is a small clippy failuire


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] sarahyurick commented on pull request #3763: Support datetime keywords in functions

Posted by GitBox <gi...@apache.org>.
sarahyurick commented on PR #3763:
URL: https://github.com/apache/arrow-datafusion/pull/3763#issuecomment-1275116991

   I'm not really sure why the cargo check is failing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] sarahyurick commented on pull request #3763: Support datetime keywords in functions

Posted by GitBox <gi...@apache.org>.
sarahyurick commented on PR #3763:
URL: https://github.com/apache/arrow-datafusion/pull/3763#issuecomment-1276415504

   Cool, thanks @andygrove 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org