You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "zhaowei (JIRA)" <ji...@apache.org> on 2011/02/14 05:44:58 UTC

[jira] Created: (HIVE-1986) partition pruner do not take effect for non-deterministic UDF

partition pruner do not take effect for non-deterministic UDF
-------------------------------------------------------------

                 Key: HIVE-1986
                 URL: https://issues.apache.org/jira/browse/HIVE-1986
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.6.0, 0.5.0, 0.4.1, 0.7.0
         Environment: trunk-src,hive default configure
            Reporter: zhaowei
             Fix For: 0.7.0


hive udf can be deterministic or non-deterministic,but for non-deterministic udf such as rand and unix_timestamp,ppr do not take effect.
and for unix_timestamp with para, for example unix_timestamp('2010-01-01'),I think it is deterministic.

case :

hive -hiveconf hive.root.logger=DEBUG,console

create kv_part(key int,value string) partitioned by(ds string);
alter table kv_part add partition (ds=2010) partition (ds=2011) partition (ds=2012);

create kv2(key int,value string) partitioned by(ds string);
alter table kv2 add partition (ds=2013) partition (ds=2014) partition (ds=2015);

explain select * from kv_part join kv2 on(kv_part.key=kv2.key) where kv_part.ds=2011 and rand() > 0.5


rand() is non-deterministic ,so kv_part.ds=2011 no not filter the partition ds=2010,ds=2012
.....
11/02/14 12:22:32 DEBUG lazy.LazySimpleSerDe: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe initialized with: columnNames=[key, value] columnTypes=[int, string] separator=[[B@1ac9683] nullstring=\N lastColumnTakesRest=false
11/02/14 12:22:32 INFO hive.log: DDL: struct kv_part { i32 key, string value}
11/02/14 12:22:32 DEBUG optimizer.GenMapRedUtils: Information added for path hdfs://172.25.38.253:54310/user/hive/warehouse/kv_part/ds=2010
11/02/14 12:22:32 DEBUG optimizer.GenMapRedUtils: Information added for path hdfs://172.25.38.253:54310/user/hive/warehouse/kv_part/ds=2011
11/02/14 12:22:32 DEBUG optimizer.GenMapRedUtils: Information added for path hdfs://172.25.38.253:54310/user/hive/warehouse/kv_part/ds=2012
11/02/14 12:22:32 INFO parse.SemanticAnalyzer: Completed plan generation
.....

explain select * from kv_part join kv2 on(kv_part.key=kv2.key) where kv_part.ds=2011 and sin(kv2.key) < 0.5;

sin() is deterministic,so ppr work ok
.....
11/02/14 12:25:22 DEBUG optimizer.GenMapRedUtils: Information added for path hdfs://172.25.38.253:54310/user/hive/warehouse/kv_part/ds=2011
....

And user should get the deterministic info for UDF from wiki,or we shoud add this info to describe function


-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira