You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Zhihua Deng (Jira)" <ji...@apache.org> on 2020/07/22 03:47:00 UTC

[jira] [Updated] (HIVE-23893) Extract deterministic conditions for pdd when the predicate contains non-deterministic function

     [ https://issues.apache.org/jira/browse/HIVE-23893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihua Deng updated HIVE-23893:
-------------------------------
    Description: 
Taken the following query for example, assume unix_timestamp is non-deterministic before version 1.3.0:
  
 {{SELECT}}
 {{        from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') AS ft,}}
 {{        b.game_id AS game_id,}}
 {{        b.game_name AS game_name,}}
 {{        count(DISTINCT a.sha1_imei) uv}}
 {{FROM}}
 {{        gamesdk_userprofile a}}
 {{        JOIN game_info_all b ON a.appid = b.dev_app_id}}
 {{WHERE}}
 {{        a.date = 20200704}}
 {{        AND from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') = 20200704}}
 {{        AND b.date = 20200704}}
 {{GROUP BY}}
 {{        from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd'),}}
 {{        b.game_id,}}
 {{        b.game_name}}
 {{ORDER BY}}
 {{        uv DESC}}
 {{LIMIT 200;}}
  
 The predicates(a.date = 20200704, b.date = 20200704) are unable to push down to join op, make the optimizer unable to prune partitions, which may result  to a full scan on tables gamesdk_userprofile and game_info_all.

  was:
Taken the following query for example, assume unix_timestamp is non-deterministic before version 1.3.0:
 
{{SELECT}}
{{        from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') AS ft,}}
{{        b.game_id AS game_id,}}
{{        b.game_name AS game_name,}}
{{        count(DISTINCT a.sha1_imei) uv}}
{{FROM}}
{{        gamesdk_userprofile a}}
{{        JOIN game_info_all b ON a.appid = b.dev_app_id}}
{{WHERE}}
{{        a.date = 20200704}}
{{        AND from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') = 20200704}}
{{        AND b.date = 20200704}}
{{GROUP BY}}
{{        from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd'),}}
{{        b.game_id,}}
{{        b.game_name}}
{{ORDER BY}}
{{        uv DESC}}
{{LIMIT 200;}}
 
The predicates(a.date = 20200704, b.date = 20200704) are unable to push down to join op, make the optimizer unable to prune partitions, which may result  to a full scan on tables gamesdk_userprofile and game_info_all.
{{}}


> Extract deterministic conditions for pdd when the predicate contains non-deterministic function
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-23893
>                 URL: https://issues.apache.org/jira/browse/HIVE-23893
>             Project: Hive
>          Issue Type: Improvement
>          Components: Logical Optimizer
>            Reporter: Zhihua Deng
>            Priority: Major
>
> Taken the following query for example, assume unix_timestamp is non-deterministic before version 1.3.0:
>   
>  {{SELECT}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') AS ft,}}
>  {{        b.game_id AS game_id,}}
>  {{        b.game_name AS game_name,}}
>  {{        count(DISTINCT a.sha1_imei) uv}}
>  {{FROM}}
>  {{        gamesdk_userprofile a}}
>  {{        JOIN game_info_all b ON a.appid = b.dev_app_id}}
>  {{WHERE}}
>  {{        a.date = 20200704}}
>  {{        AND from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') = 20200704}}
>  {{        AND b.date = 20200704}}
>  {{GROUP BY}}
>  {{        from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd'),}}
>  {{        b.game_id,}}
>  {{        b.game_name}}
>  {{ORDER BY}}
>  {{        uv DESC}}
>  {{LIMIT 200;}}
>   
>  The predicates(a.date = 20200704, b.date = 20200704) are unable to push down to join op, make the optimizer unable to prune partitions, which may result  to a full scan on tables gamesdk_userprofile and game_info_all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)