Posted to issues@hive.apache.org by "Alexander Pivovarov (JIRA)" <ji...@apache.org> on 2017/02/08 05:29:41 UTC

[jira] [Commented] (HIVE-9988) Evaluating UDF before query is run

    [ https://issues.apache.org/jira/browse/HIVE-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857435#comment-15857435 ] 

Alexander Pivovarov commented on HIVE-9988:
-------------------------------------------

You can assign the expression to a variable before the query is evaluated and then use the variable in the WHERE clause:
{code}
set dt=from_unixtime(unix_timestamp(),'yyyyMMdd');

select * from A where dt=${hiveconf:dt};
{code}
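
A variant of the same idea, sketched under the assumption that the query is launched from a shell with the Hive CLI: compute the date with the shell's date command and pass it in with --hiveconf, so the WHERE clause receives a plain literal, as in the datestamp=20150316 case from the issue description. Table A and column dt come from the snippet above; printing the command instead of running it is only so the sketch works without a Hive installation.

```shell
#!/bin/sh
# Compute today's date as yyyyMMdd in the shell, so Hive receives a plain literal.
dt=$(date +%Y%m%d)

# Single quotes keep the shell from expanding ${hiveconf:dt}; Hive substitutes it.
# Table A and partition column dt are taken from the snippet above.
query='select * from A where dt=${hiveconf:dt};'

# Print the command rather than run it (assumption: no Hive on this machine).
# Run the printed line, or replace printf with a direct hive call, to execute it.
printf "hive --hiveconf dt=%s -e '%s'\n" "$dt" "$query"
```

Because the value is fixed before Hive compiles the query, the filter compares the partition column against a constant rather than a function call.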

> Evaluating UDF before query is run
> ----------------------------------
>
>                 Key: HIVE-9988
>                 URL: https://issues.apache.org/jira/browse/HIVE-9988
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ådne Brunborg
>
> When a UDF is used on a partition column in Hive, all partitions are scanned before the UDF is resolved.
> If the UDF could be evaluated before the query is run, this would greatly improve performance in cases like this.
> Example: the table is partitioned by datestamp (bigint).
> The following WHERE clause touches all 82 partitions:
> {{WHERE datestamp=cast(from_unixtime(unix_timestamp(),'yyyyMMdd') as bigint)}}
> {{15/03/16 09:21:53 INFO mapred.FileInputFormat: Total input paths to process : 82}}
> …whereas the following only touches the one partition:
> {{WHERE datestamp=20150316}}
> {{15/03/16 09:23:06 INFO input.FileInputFormat: Total input paths to process : 1}}