Posted to user@hive.apache.org by Prasad Chakka <pc...@facebook.com> on 2009/11/04 17:53:42 UTC

Re: [hive-users] questions regarding hive partitions

Hi Andrey,

hive-users@hadoop.apache.org is the appropriate list for this question.

Both would be useful improvements, but neither is supported yet.

For 1) you can use 'show partitions <tblname>' which will print out the sorted list of partitions.
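
As a rough illustration of that workaround, the listing can be post-processed on the client side to pick out the highest value. The sketch below is only an assumption about how one might do it: the helper name max_partition_value is hypothetical, and it assumes each output line has the form "x=1" (or "a=1/x=2" for multi-level partitions) with integer values. It compares values numerically rather than relying on the listing order, in case names are ordered as strings (where "x=10" would come before "x=2").

```python
def max_partition_value(show_partitions_output, column="x"):
    """Return the largest integer value of `column` found in the text
    printed by 'show partitions <tblname>', or None if there is none.

    Hypothetical helper: assumes lines such as "x=10" or "ds=2009-11-04/x=10".
    """
    values = []
    for line in show_partitions_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Multi-level partitions are assumed to be separated by "/".
        for part in line.split("/"):
            name, sep, value = part.partition("=")
            if sep and name == column:
                values.append(int(value))
    return max(values) if values else None


# Sample listing for the table A described in the question.
sample = "x=1\nx=10\nx=2\n"
print(max_partition_value(sample))  # prints 10
```

The input text could come from something like hive -e 'show partitions A', captured by whatever script drives the query.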

There is no workaround for 2), but this feature is very useful and has been requested for some time. Not sure if anyone is working on it, but contributions are welcome :)

Prasad

________________________________
From: Andrey Pankov <ap...@iponweb.net>
Date: Wed, 4 Nov 2009 07:51:03 -0800
To: <hi...@publists.facebook.com>
Subject: [hive-users] questions regarding hive partitions

Hi guys,

I recently started to investigate Hive, and so far I have several questions.

1). I have a table A partitioned by column x INT. The table has
several partitions, say x=1, x=2, x=10. When I run the query "select max(x)
from A", Hive runs a map-reduce job that tries to find column x inside the
data files. Would it be possible to make Hive smarter, so that it just runs
a simple query against the metastore to get the highest partition value
when I'm only interested in one (or several) "meta" columns that are
available in the metastore only?
The next one is, I guess, correct behavior, but it could also be made
more sophisticated. When I run "select x from A", Hive runs
map-reduce, which is correct: as I understand it, Hive returns the
value of x once for each row of each partition. But again,
"select distinct x from A" fires map-reduce...

2). It would be very useful if Hive could use a UDF or the value of a column
inside the dataset when creating a new partition. In particular:

      insert overwrite table B partition(ds=A.foo) select A.foo, A.bar from A;
      insert overwrite table B partition(x=unix_timestamp()) select A.foo, A.bar from A;

So far I haven't found a way to do this.

Thanks guys!

--
Andrey Pankov
_______________________________________________
hive-users mailing list
hive-users@publists.facebook.com
http://publists.facebook.com/mailman/listinfo/hive-users

Re: [hive-users] questions regarding hive partitions

Posted by Andrey Pankov <ap...@iponweb.net>.

Thanks Prasad

-- 
Andrey Pankov