You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "徐洲 (Jira)" <ji...@apache.org> on 2019/10/10 07:36:00 UTC

[jira] [Created] (IMPALA-9035) Simplify cast string to timestamp

徐洲 created IMPALA-9035:
--------------------------

             Summary: Simplify cast string to timestamp
                 Key: IMPALA-9035
                 URL: https://issues.apache.org/jira/browse/IMPALA-9035
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
            Reporter: 徐洲
         Attachments: image-2019-10-10-15-22-31-320.png

Some BI tools generate compatible sqls to cast string to timestamp:

cast(unix_timestamp('timestr', 'fmt') as timestamp) or cast(unix_timestamp('timestr') as timestamp)

The internal type change in impala: StringVal->TimestampValue->BigIntVal→TimestampValue→TimestampVal

In earlier 2.x versions,  casting TimestampValue to BigIntVal using libc functions which is more expensive than current design with CCTZ.

However, the cast to BIgIntVal seems to be redundant. We can simplify cast(unix_timestamp('timestr', 'fmt') as timestamp) to to_timestamp('timestr', 'fmt'), simplify cast(unix_timestamp('timestr') as timestamp) to cast('timestr' as timestamp).

I managed to modify the fe to support such expr rewriting.

Benefiting from the rewriting, the query time cost is reduced from 2.52s to 2.02s.

Here is the sql:

SELECT `t2`.`phy_category1_name` `d0`, SUM(`t1`.`apply_count`) `m0` FROM `tr_orders` `t1` LEFT JOIN `dim_sku` `t2` ON (`t1`.`sku_id` = `t2`.`sku_id`) WHERE (((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) >= cast('2018-08-26' as timestamp)) AND ((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) < cast('2019-09-02' as timestamp))) GROUP BY `t2`.`phy_category1_name` order by d0;
Query: SELECT `t2`.`phy_category1_name` `d0`, SUM(`t1`.`apply_count`) `m0` FROM `tr_orders` `t1` LEFT JOIN `dim_sku` `t2` ON (`t1`.`sku_id` = `t2`.`sku_id`) WHERE (((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) >= cast('2018-08-26' as timestamp)) AND ((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) < cast('2019-09-02' as timestamp))) GROUP BY `t2`.`phy_category1_name` order by d0

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)