You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Paul Yang <py...@facebook.com> on 2010/07/01 01:38:36 UTC

RE: Best way to create a view with index from explode ?

1. In the current implementation, explode does have that contract.

2. WHERE is allowed, but there was a bug with predicate push down that caused a NPE to be thrown. There's a patch for this at https://issues.apache.org/jira/browse/HIVE-1056 or you can add 'set hive.optimize.ppd=false;' before you run your queries.

3. Yes, you would need to override this method.

4. If you want to output two columns, check out GenericUDTFExplode2 in contrib for an example on how this can be done.


From: Viraj Bhat [mailto:viraj@yahoo-inc.com]
Sent: Tuesday, June 29, 2010 5:06 PM
To: hive-user@hadoop.apache.org
Subject: RE: Best way to create a view with index from explode ?

Hi Paul,
 Thanks for your inputs. I was looking at the Generic explode UDTF in your piggybank in the contrib. directory.
Here are questions:
1)     Does the explode have a contract that if we take in an array of maps we return only the maps, not maps and another column (index) with it.
2)     Second do UDTF's have some restriction that I cannot have a WHERE clause or select other columns. I want to write something like this:
select  bcookie, Myexplode(info) as (newcol, index) from myoldtable where bcookie is not null;
3)     For adding the additional index column, do I need to override the following method  [[public void process(Object[] o) throws HiveException ]]
4)     Do you have other examples of writing UDTF's
Any insights would be appreciated.
Viraj

________________________________
From: Paul Yang [mailto:pyang@facebook.com]
Sent: Tuesday, June 29, 2010 3:01 PM
To: hive-user@hadoop.apache.org
Subject: RE: Best way to create a view with index from explode ?

Yes, the UDTF explode() is probably the easiest way to get multiple rows from an array. Another way would be to write your own custom script and use the transform clause. To generate the index of the info, you'll need the field() UDF, but there is currently an issue with it:

https://issues.apache.org/jira/browse/HIVE-1385


From: Viraj Bhat [mailto:viraj@yahoo-inc.com]
Sent: Tuesday, June 29, 2010 2:06 PM
To: hive-user@hadoop.apache.org
Subject: Best way to create a view with index from explode ?

Hi all,
I have an input table which has 3 columns one of which is an array list,

bcookie, id, info [Schema: string, string, arraylist<map<string,string>>]

Here is a sample row

1245, 1, {[myid#id2, myage#100], [myid#id3, myage#101], [myid#id3, myage#102]}

I wanted to explode the arraylist column and create an additional column which represents the index of the info

bcookie, id, info, index
1245, 1, [myid#id2, myage#100], 0
1245, 1, [myid#id3, myage#101], 1
1245, 1, [myid#id3, myage#102] , 2

I was thinking of the lines of:
SELECT * FROM srctable LATERAL VIEW explode(info)... ;

Is UDTF the only way to achieve this?
Thanks Viraj