Posted to user@hive.apache.org by Sunita Arvind <su...@gmail.com> on 2014/01/30 00:08:42 UTC

Simple UDF to return array

Hello Experts,

I am trying to write a UDF to parse a log line and return the output as an
array, so that I can then use LATERAL VIEW explode to make it into columns.
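For reference, explode() on an array produces one output row per element rather than extra columns. The plain-Java sketch below only models that behavior and uses no Hive classes. The class name is hypothetical, and the sample values are taken from the log entry in this message:

```java
import java.util.ArrayList;
import java.util.List;

class ExplodeSketch {
    // Models explode(array): every element of every input array becomes its own output row.
    static List<String> explode(List<List<String>> arrays) {
        List<String> rows = new ArrayList<>();
        for (List<String> array : arrays) {
            rows.addAll(array);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> input = List.of(
            List.of("24-JUN-2012 05:00:42", "establish", "0"));
        // One input array with three elements yields three rows, not three columns.
        for (String row : explode(input)) {
            System.out.println(row);
        }
    }
}
```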

This is how a typical log entry looks:

24-JUN-2012 05:00:42 * (CONNECT_DATA=(SERVICE_NAME=abcd.efg.hij.com)(failover_mode=(type=select)(method=basic))(CID=(PROGRAM=sqlplus)(HOST=xyz)(USER=u1))(SERVER=dedicated)(INSTANCE_NAME=aaa)) * (ADDRESS=(PROTOCOL=tcp)(HOST=9.9.9.9)(PORT=60000)) * establish * abcd.efg.hij.com * 0
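This is a minimal sketch of splitting an entry like this into fields. It is not the attached LogParser. It assumes that fields are always separated by " * " and that the client address is the (HOST=...) value inside the ADDRESS section. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineSketch {
    // Assumption: fields are delimited by " * " as in the sample entry.
    private static final String DELIMITER = " \\* ";
    // Assumption: the client IP is the first HOST value after "(ADDRESS=".
    private static final Pattern ADDRESS_HOST =
        Pattern.compile("\\(ADDRESS=.*?\\(HOST=([^)]*)\\)");

    // Returns a fresh list per line, so values from earlier lines never leak in.
    static List<String> parse(String line) {
        return new ArrayList<>(Arrays.asList(line.split(DELIMITER)));
    }

    // Returns the HOST inside the ADDRESS section, or null if it is absent.
    static String addressHost(String line) {
        Matcher m = ADDRESS_HOST.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "24-JUN-2012 05:00:42 * (CONNECT_DATA=(SERVICE_NAME=abcd.efg.hij.com)"
            + "(CID=(PROGRAM=sqlplus)(HOST=xyz)(USER=u1))) * "
            + "(ADDRESS=(PROTOCOL=tcp)(HOST=9.9.9.9)(PORT=60000)) * establish * "
            + "abcd.efg.hij.com * 0";
        System.out.println(parse(line));
        System.out.println(addressHost(line));
    }
}
```

The regex starts at "(ADDRESS=" so that the HOST inside CONNECT_DATA (xyz) is skipped.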

Attached is my LogParser class, which is the UDF. Excerpts below:

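import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class LogParser extends UDF {
  int current_index = 0;

  public ArrayList<String> evaluate(Text input) {
    // Create the list per call: Hive reuses one UDF instance for many rows,
    // so a list kept in a field would accumulate values across rows.
    ArrayList<String> record = new ArrayList<>();
    // ......
    String logdate = null;
    // ...
    logdate = getDate(line);
    record.add(logdate);
    return record;
  }
}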
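
Hive typically reuses a single UDF instance for many rows. If the result list is held in a field and never cleared, it keeps growing from row to row. This plain-Java illustration (no Hive classes, hypothetical class names) contrasts a shared field with a list created inside evaluate():

```java
import java.util.ArrayList;
import java.util.List;

class SharedListParser {
    // One list shared by every call, like a field-level `record`.
    private final List<String> record = new ArrayList<>();

    List<String> evaluate(String logdate) {
        record.add(logdate);
        return record;
    }
}

class FreshListParser {
    List<String> evaluate(String logdate) {
        // A new list for each row.
        List<String> record = new ArrayList<>();
        record.add(logdate);
        return record;
    }
}

class ReuseDemo {
    public static void main(String[] args) {
        SharedListParser shared = new SharedListParser();
        FreshListParser fresh = new FreshListParser();
        for (String date : new String[] {"24-JUN-2012", "25-JUN-2012"}) {
            // The shared result grows on every call; the fresh one holds only the current row.
            System.out.println(shared.evaluate(date) + " vs " + fresh.evaluate(date));
        }
    }
}
```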


I've tried changing the return type to ArrayList<Text>, Object, etc., but I
get an error like this whenever I try to use the UDF:

select explode(strparse(record)) as newcols from logdump limit 1;

OK
converting to local hdfs://tlbd-ns/user/TestUser1/LogParserStrArr.jar
Added /tmp/3c583384-0592-41a3-ad9e-b12d2207df7b_resources/LogParserStrArr.jar to class path
Added resource: /tmp/3c583384-0592-41a3-ad9e-b12d2207df7b_resources/LogParserStrArr.jar
OK
FAILED: UDFArgumentException explode() takes an array or a map as a parameter

I also tried casting the result to an array, and that fails as well.

I'd appreciate help from the community. I am considering writing a generic
UDF, but this is a simple requirement, and I would like to use a simple UDF
if I can.

regards
Sunita

Re: Simple UDF to return array

Posted by Sunita Arvind <su...@gmail.com>.
Thanks Roberto. Will try that out.

regards
Sunita


On Thu, Jan 30, 2014 at 10:14 AM, Roberto Congiu
<ro...@openx.com> wrote:

> Hi Sunita,
> yes, it's definitely possible and you should use Generic UDFs.
> I wrote one UDF that takes n arrays (each one with the same number of
> elements) and returns an array of structs which is usually used in a
> lateral view.
>
> A good article on how to write a generic UDF is this one:
> http://www.baynote.com/2012/11/a-word-from-the-engineers/
>
>
> --
> ----------------------------------------------------------
> Good judgement comes with experience.
> Experience comes with bad judgement.
> ----------------------------------------------------------
> Roberto Congiu - Data Engineer - OpenX
> tel: +1 626 466 1141
>

Re: Simple UDF to return array

Posted by Roberto Congiu <ro...@openx.com>.
Hi Sunita,
yes, it's definitely possible and you should use Generic UDFs.
I wrote one UDF that takes n arrays (each one with the same number of
elements) and returns an array of structs which is usually used in a
lateral view.
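The core of that idea can be sketched without Hive's ObjectInspector plumbing: n equally sized arrays become one list of "structs", where struct i holds the i-th element of every input array. This is a plain-Java sketch, not Roberto's UDF. Structs are modeled as lists, and the class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class ZipArraysSketch {
    // Struct i collects the i-th element of each input array.
    static List<List<String>> zip(List<List<String>> arrays) {
        List<List<String>> structs = new ArrayList<>();
        if (arrays.isEmpty()) {
            return structs;
        }
        int length = arrays.get(0).size();
        for (List<String> array : arrays) {
            if (array.size() != length) {
                throw new IllegalArgumentException("all arrays must have the same number of elements");
            }
        }
        for (int i = 0; i < length; i++) {
            List<String> struct = new ArrayList<>();
            for (List<String> array : arrays) {
                struct.add(array.get(i));
            }
            structs.add(struct);
        }
        return structs;
    }

    public static void main(String[] args) {
        List<List<String>> arrays = List.of(
            List.of("a1", "a2"),
            List.of("b1", "b2"));
        System.out.println(zip(arrays)); // prints [[a1, b1], [a2, b2]]
    }
}
```

Exploding the resulting array in a lateral view then gives one row per struct, with the i-th values of all inputs side by side.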

A good article on how to write a generic UDF is this one:
http://www.baynote.com/2012/11/a-word-from-the-engineers/
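On Sunita's question of why generic vs. simple matters when the returned object is the same: as far as I understand it, Hive's simple-UDF bridge infers the result type by reflecting on the declared return type of evaluate(). A generic UDF instead states its result type explicitly by returning an ObjectInspector from initialize(). The plain-Java sketch below only shows what reflection can and cannot see in the two cases. The method names are hypothetical, and no Hive classes are involved:

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;

class ReturnTypeSketch {
    // Two hypothetical evaluate() variants that return the same runtime object.
    public ArrayList<String> evaluateTyped(String line) {
        ArrayList<String> out = new ArrayList<>();
        out.add(line);
        return out;
    }

    public Object evaluateUntyped(String line) {
        return evaluateTyped(line);
    }

    // Describes what reflection reveals about a method's declared return type.
    static String describe(String methodName) {
        try {
            Method m = ReturnTypeSketch.class.getMethod(methodName, String.class);
            Type t = m.getGenericReturnType();
            if (t instanceof ParameterizedType) {
                ParameterizedType pt = (ParameterizedType) t;
                return pt.getRawType().getTypeName() + " of "
                    + pt.getActualTypeArguments()[0].getTypeName();
            }
            return "unknown element type (" + t.getTypeName() + ")";
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("evaluateTyped"));   // prints java.util.ArrayList of java.lang.String
        System.out.println(describe("evaluateUntyped")); // prints unknown element type (java.lang.Object)
    }
}
```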


On Thu, Jan 30, 2014 at 7:06 AM, Sunita Arvind <su...@gmail.com> wrote:

> Can someone please tell me whether this is doable? Is a generic UDF the
> only option? How would using a generic vs. a simple UDF make any difference,
> since I would be returning the same object either way?
>
> Thank you
> Sunita


-- 
----------------------------------------------------------
Good judgement comes with experience.
Experience comes with bad judgement.
----------------------------------------------------------
Roberto Congiu - Data Engineer - OpenX
tel: +1 626 466 1141

Fwd: Simple UDF to return array

Posted by Sunita Arvind <su...@gmail.com>.
Can someone please tell me whether this is doable? Is a generic UDF the
only option? How would using a generic vs. a simple UDF make any difference,
since I would be returning the same object either way?

Thank you
Sunita

---------- Forwarded message ----------
From: *Sunita Arvind* <su...@gmail.com>
Date: Wednesday, January 29, 2014
Subject: Simple UDF to return array
To: "user@hive.apache.org" <us...@hive.apache.org>

