You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Zhihua Deng (Jira)" <ji...@apache.org> on 2020/07/22 03:47:00 UTC
[jira] [Updated] (HIVE-23893) Extract deterministic conditions for
pdd when the predicate contains non-deterministic function
[ https://issues.apache.org/jira/browse/HIVE-23893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhihua Deng updated HIVE-23893:
-------------------------------
Description:
Taken the following query for example, assume unix_timestamp is non-deterministic before version 1.3.0:
{{SELECT}}
{{ from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') AS ft,}}
{{ b.game_id AS game_id,}}
{{ b.game_name AS game_name,}}
{{ count(DISTINCT a.sha1_imei) uv}}
{{FROM}}
{{ gamesdk_userprofile a}}
{{ JOIN game_info_all b ON a.appid = b.dev_app_id}}
{{WHERE}}
{{ a.date = 20200704}}
{{ AND from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') = 20200704}}
{{ AND b.date = 20200704}}
{{GROUP BY}}
{{ from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd'),}}
{{ b.game_id,}}
{{ b.game_name}}
{{ORDER BY}}
{{ uv DESC}}
{{LIMIT 200;}}
The predicates(a.date = 20200704, b.date = 20200704) are unable to push down to join op, make the optimizer unable to prune partitions, which may result to a full scan on tables gamesdk_userprofile and game_info_all.
was:
Taken the following query for example, assume unix_timestamp is non-deterministic before version 1.3.0:
{{SELECT}}
{{ from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') AS ft,}}
{{ b.game_id AS game_id,}}
{{ b.game_name AS game_name,}}
{{ count(DISTINCT a.sha1_imei) uv}}
{{FROM}}
{{ gamesdk_userprofile a}}
{{ JOIN game_info_all b ON a.appid = b.dev_app_id}}
{{WHERE}}
{{ a.date = 20200704}}
{{ AND from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') = 20200704}}
{{ AND b.date = 20200704}}
{{GROUP BY}}
{{ from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd'),}}
{{ b.game_id,}}
{{ b.game_name}}
{{ORDER BY}}
{{ uv DESC}}
{{LIMIT 200;}}
The predicates(a.date = 20200704, b.date = 20200704) are unable to push down to join op, make the optimizer unable to prune partitions, which may result to a full scan on tables gamesdk_userprofile and game_info_all.
{{}}
> Extract deterministic conditions for pdd when the predicate contains non-deterministic function
> -----------------------------------------------------------------------------------------------
>
> Key: HIVE-23893
> URL: https://issues.apache.org/jira/browse/HIVE-23893
> Project: Hive
> Issue Type: Improvement
> Components: Logical Optimizer
> Reporter: Zhihua Deng
> Priority: Major
>
> Taken the following query for example, assume unix_timestamp is non-deterministic before version 1.3.0:
>
> {{SELECT}}
> {{ from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') AS ft,}}
> {{ b.game_id AS game_id,}}
> {{ b.game_name AS game_name,}}
> {{ count(DISTINCT a.sha1_imei) uv}}
> {{FROM}}
> {{ gamesdk_userprofile a}}
> {{ JOIN game_info_all b ON a.appid = b.dev_app_id}}
> {{WHERE}}
> {{ a.date = 20200704}}
> {{ AND from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') = 20200704}}
> {{ AND b.date = 20200704}}
> {{GROUP BY}}
> {{ from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd'),}}
> {{ b.game_id,}}
> {{ b.game_name}}
> {{ORDER BY}}
> {{ uv DESC}}
> {{LIMIT 200;}}
>
> The predicates(a.date = 20200704, b.date = 20200704) are unable to push down to join op, make the optimizer unable to prune partitions, which may result to a full scan on tables gamesdk_userprofile and game_info_all.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)