You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "xuzhou (Jira)" <ji...@apache.org> on 2020/01/10 02:39:00 UTC

[jira] [Resolved] (IMPALA-9035) Simplify cast string to timestamp

     [ https://issues.apache.org/jira/browse/IMPALA-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuzhou resolved IMPALA-9035.
----------------------------
    Resolution: Fixed

> Simplify cast string to timestamp
> ---------------------------------
>
>                 Key: IMPALA-9035
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9035
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: 徐洲
>            Assignee: xuzhou
>            Priority: Critical
>
> Some BI tools generate compatible sqls to cast string to timestamp:
> cast(unix_timestamp('timestr', 'fmt') as timestamp) or cast(unix_timestamp('timestr') as timestamp)
> The internal type change in impala: StringVal->TimestampValue->BigIntVal→TimestampValue→TimestampVal
> In earlier 2.x versions,  casting TimestampValue to BigIntVal using libc functions which is more expensive than current design with CCTZ.
> However, the cast to BIgIntVal seems to be redundant. We can simplify cast(unix_timestamp('timestr', 'fmt') as timestamp) to to_timestamp('timestr', 'fmt'), simplify cast(unix_timestamp('timestr') as timestamp) to cast('timestr' as timestamp).
> I managed to modify the fe to support such expr rewriting.
> Benefiting from the rewriting, the query time cost is reduced from 2.52s to 2.02s.
> Here is the sql:
> SELECT `t2`.`phy_category1_name` `d0`, SUM(`t1`.`apply_count`) `m0` FROM `tr_orders` `t1` LEFT JOIN `dim_sku` `t2` ON (`t1`.`sku_id` = `t2`.`sku_id`) WHERE (((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) >= cast('2018-08-26' as timestamp)) AND ((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) < cast('2019-09-02' as timestamp))) GROUP BY `t2`.`phy_category1_name` order by d0;
> Query: SELECT `t2`.`phy_category1_name` `d0`, SUM(`t1`.`apply_count`) `m0` FROM `tr_orders` `t1` LEFT JOIN `dim_sku` `t2` ON (`t1`.`sku_id` = `t2`.`sku_id`) WHERE (((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) >= cast('2018-08-26' as timestamp)) AND ((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) < cast('2019-09-02' as timestamp))) GROUP BY `t2`.`phy_category1_name` order by d0
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)