Posted to user@hive.apache.org by Andrey Pankov <ap...@iponweb.net> on 2009/11/04 17:49:21 UTC

questions regarding hive partitions

Hi guys,

I recently started investigating Hive and have a few questions so far.

1). I have a table A partitioned by column x INT. The table has
several partitions, say x=1, x=2, x=10. When I run the query "select max(x)
from A", Hive launches a map-reduce job and tries to find column x inside
the data files. Would it be possible to make Hive smarter, so that it just
runs a simple query against the metastore to get the highest partition
value when the query touches only "meta" (partition) columns, whose values
are available in the metastore only?
The next one is, I guess, correct behavior, but it could also be made
more sophisticated. When I run "select x from A" it launches
map-reduce, which is correct. As I understand it, in this case Hive returns
each partition's value of x once for every row in that partition. But again,
"select distinct x from A" launches map-reduce...
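
To make the idea concrete: the partition list can already be read from the metastore with SHOW PARTITIONS. This is only a minimal sketch, assuming the table A described above. Picking the highest value out of the result would still have to happen on the client side.

```sql
-- Lists the partition names recorded in the metastore, one per line
-- (for the table above: x=1, x=2, x=10).
-- This is a metadata-only statement; as far as I can tell it does not
-- start a map-reduce job.
SHOW PARTITIONS A;
```

The names come back as strings, so x=10 has to be compared with x=2 numerically, not lexically.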

2). It would be very useful if Hive could use a UDF or the value of a
column in the dataset when creating a new partition. For example:
     insert overwrite table B partition(ds=A.foo) select A.foo, A.bar from A;
     insert overwrite table B partition(x=unix_timestamp()) select A.foo, A.bar from A;
So far I haven't found a way to do this.
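
For reference, later Hive releases (0.6.0 onward, to my knowledge) cover the first case with dynamic partition inserts. The sketch below assumes such a release and reuses the tables A and B from the example above. The two settings are the ones I know of for enabling the feature, and whether they are needed depends on the defaults of the installed version.

```sql
-- Assumption: Hive 0.6.0 or later, where dynamic partitioning exists.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column ds gets no value in the PARTITION clause;
-- it is filled per row from the last column of the SELECT list.
INSERT OVERWRITE TABLE B PARTITION (ds)
SELECT A.foo, A.bar, A.foo AS ds FROM A;
```

Each distinct value of A.foo ends up in its own partition of B, so a column with many distinct values creates many partitions.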

Thanks guys!

-- 
Andrey Pankov