You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/23 10:43:01 UTC

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #387: Optimize to_timestamp function

alamb commented on a change in pull request #387:
URL: https://github.com/apache/arrow-datafusion/pull/387#discussion_r637523360



##########
File path: datafusion/src/optimizer/constant_folding.rs
##########
@@ -217,6 +218,29 @@ impl<'a> ExprRewriter for ConstantRewriter<'a> {
                     .query_execution_start_time
                     .timestamp_nanos(),
             ))),
+            Expr::ScalarFunction {

Review comment:
       Matching on the literal argument is good as the `rewrite` pass happens *after* all children are visited 👍 
   

##########
File path: datafusion/src/optimizer/constant_folding.rs
##########
@@ -217,6 +218,29 @@ impl<'a> ExprRewriter for ConstantRewriter<'a> {
                     .query_execution_start_time
                     .timestamp_nanos(),
             ))),
+            Expr::ScalarFunction {
+                fun: BuiltinScalarFunction::ToTimestamp,
+                args,
+            } => {
+                if !args.is_empty() {
+                    match &args[0] {
+                        Expr::Literal(ScalarValue::Utf8(Some(val))) => {
+                            Expr::Literal(ScalarValue::TimestampNanosecond(
+                                string_to_timestamp_nanos(val).ok(),

Review comment:
       I don't think the `ok()` here preserves the same semantics.
   
   I tested what happens when an invalid argument is passed to `to_timestamp` on master (via `cargo run -p datafusion-cli`):
   ```
   > select to_timestamp('foo');
   ArrowError(ExternalError(Execution("Error parsing \'foo\' as timestamp")))
   ```
   
   On this branch
   ```
   > select to_timestamp('foo');
   +---------------------------+
   | TimestampNanosecond(NULL) |
   +---------------------------+
   |                           |
   +---------------------------+
   ```
   
   
   I suggest that if the `to_timestamp(..)` call would generate a runtime error, then the optimizer pass should just leave the expression as is (aka don't do the rewrite and leave the original expression)
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org