You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "徐洲 (Jira)" <ji...@apache.org> on 2019/10/10 07:36:00 UTC
[jira] [Created] (IMPALA-9035) Simplify cast string to timestamp
徐洲 created IMPALA-9035:
--------------------------
Summary: Simplify cast string to timestamp
Key: IMPALA-9035
URL: https://issues.apache.org/jira/browse/IMPALA-9035
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Reporter: 徐洲
Attachments: image-2019-10-10-15-22-31-320.png
Some BI tools generate compatible sqls to cast string to timestamp:
cast(unix_timestamp('timestr', 'fmt') as timestamp) or cast(unix_timestamp('timestr') as timestamp)
The internal type change in impala: StringVal->TimestampValue->BigIntVal→TimestampValue→TimestampVal
In earlier 2.x versions, casting TimestampValue to BigIntVal using libc functions which is more expensive than current design with CCTZ.
However, the cast to BIgIntVal seems to be redundant. We can simplify cast(unix_timestamp('timestr', 'fmt') as timestamp) to to_timestamp('timestr', 'fmt'), simplify cast(unix_timestamp('timestr') as timestamp) to cast('timestr' as timestamp).
I managed to modify the fe to support such expr rewriting.
Benefiting from the rewriting, the query time cost is reduced from 2.52s to 2.02s.
Here is the sql:
SELECT `t2`.`phy_category1_name` `d0`, SUM(`t1`.`apply_count`) `m0` FROM `tr_orders` `t1` LEFT JOIN `dim_sku` `t2` ON (`t1`.`sku_id` = `t2`.`sku_id`) WHERE (((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) >= cast('2018-08-26' as timestamp)) AND ((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) < cast('2019-09-02' as timestamp))) GROUP BY `t2`.`phy_category1_name` order by d0;
Query: SELECT `t2`.`phy_category1_name` `d0`, SUM(`t1`.`apply_count`) `m0` FROM `tr_orders` `t1` LEFT JOIN `dim_sku` `t2` ON (`t1`.`sku_id` = `t2`.`sku_id`) WHERE (((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) >= cast('2018-08-26' as timestamp)) AND ((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) < cast('2019-09-02' as timestamp))) GROUP BY `t2`.`phy_category1_name` order by d0
--
This message was sent by Atlassian Jira
(v8.3.4#803005)