Posted to user@hive.apache.org by Anirudh Paramshetti <an...@gmail.com> on 2016/02/02 15:29:14 UTC

GenericUDF

Hi,

I have written a custom UDF in Java extending the GenericUDF class. I have
some print statements in the constructor and the initialize method, so as to
understand the number of calls made to them. From what I have read about
GenericUDF, I was expecting the constructor and initialize method to be
called once per UDF instance. But what I found was that the constructor was
called thrice (once while creating the temporary function and twice while
using it in the Hive query) and the initialize method was called
twice (while using it in the Hive query).

UDF output:

hive> create temporary function replace as 'package.name.GenericNullReplacement';
Inside constructor of GenericNullReplacement

hive> select replace(column_name, 0.01) from dummy_table;
Inside constructor of GenericNullReplacement
Inside constructor of GenericNullReplacement
Inside initialize() method of GenericNullReplacement
Inside initialize() method of GenericNullReplacement
1.23
4.56
4.56
0.01
4.56
9.56

It would be great if someone could explain to me what is happening here.


Thanks and Regards,
Anirudh Paramshetti

Re: GenericUDF

Posted by Jason Dere <jd...@hortonworks.com>.
Same flow - initialize() is called on the GenericUDF very soon after construction, by the same methods that created the UDF.

Take a look at TypeCheckProcFactory or ExprNodeGenericFuncDesc.



Re: GenericUDF

Posted by Anirudh Paramshetti <an...@gmail.com>.
Thanks Jason for your inputs.

I believe you are talking about the number of instances created, which
explains why the constructor was called thrice. But I'm still unclear about
the two calls made to the initialize method when I use the temporary
function in the query. Can you shed some more light on the call flow to the
initialize method?

Regards,
Anirudh Paramshetti




Re: GenericUDF

Posted by Jason Dere <jd...@hortonworks.com>.
- Created once when registering the function with the FunctionRegistry.

- The UDF is copied from the version in the registry during query compilation.

- The query plan is serialized, then deserialized by the tasks during query execution, which constructs another instance of the UDF.
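
The call counts described in this thread can be reproduced with a small toy model. This is only a sketch under stated assumptions, not Hive's actual code: CountingUdf, ToyRegistry, compile() and deserializeInTask() are hypothetical stand-ins with no Hive dependency. It models one instance created at registration, one copy made during query compilation, and one instance rebuilt when a task deserializes the plan, with initialize() called on the last two shortly after construction. It counts calls only and does not try to reproduce the exact order of the print statements.

```java
import java.util.HashMap;
import java.util.Map;

public class UdfLifecycleSketch {

    /** Hypothetical stand-in for a GenericUDF; it only counts lifecycle calls. */
    static class CountingUdf {
        static int constructorCalls = 0;
        static int initializeCalls = 0;

        CountingUdf() {
            constructorCalls++;
        }

        void initialize() {
            initializeCalls++;
        }
    }

    /** Toy registry (an assumption for illustration, not Hive's FunctionRegistry). */
    static class ToyRegistry {
        private final Map<String, CountingUdf> functions = new HashMap<>();

        // Registration step: an instance is created and stored, but not initialized.
        void register(String name) {
            functions.put(name, new CountingUdf());
        }

        // Compilation step: a fresh copy is made and initialized right away.
        CountingUdf compile(String name) {
            if (!functions.containsKey(name)) {
                throw new IllegalArgumentException("Unknown function: " + name);
            }
            CountingUdf copy = new CountingUdf();
            copy.initialize();
            return copy;
        }
    }

    // Execution step: rebuilding the plan in a task constructs another instance,
    // which is initialized again before rows are processed.
    static CountingUdf deserializeInTask(CountingUdf planned) {
        CountingUdf rebuilt = new CountingUdf();
        rebuilt.initialize();
        return rebuilt;
    }

    /** Runs the three steps and returns {constructorCalls, initializeCalls}. */
    static int[] simulate() {
        CountingUdf.constructorCalls = 0;
        CountingUdf.initializeCalls = 0;

        ToyRegistry registry = new ToyRegistry();
        registry.register("replace");                       // create temporary function
        CountingUdf planned = registry.compile("replace");  // query compilation
        deserializeInTask(planned);                         // query execution

        return new int[] {CountingUdf.constructorCalls, CountingUdf.initializeCalls};
    }

    public static void main(String[] args) {
        int[] counts = simulate();
        System.out.println("constructor calls: " + counts[0]);
        System.out.println("initialize() calls: " + counts[1]);
    }
}
```

Running main prints 3 constructor calls and 2 initialize() calls, matching the counts in the original question: one construction at registration, and one construction plus one initialize() each for compilation and execution.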


