Posted to user@hive.apache.org by Sanjay Subramanian <Sa...@wizecommerce.com> on 2013/07/23 21:12:46 UTC

Calling same UDF multiple times in a SELECT query

Hi

We are using hive-exec-0.9.0-cdh4.1.2 in production.

I need to use the output of a UDF to assign values to two columns in a SELECT query.

Example

SELECT
     a,
     IF(fooUdf(a) < -1  , -1, fooUdf(a)) as b,
     IF(fooUdf(a) < -1  , fooUdf(a), 0) as c
FROM
     my_hive_table


Will fooUdf be called four times per row, or once?

This matters because in our case the UDF calls a web service, and I don't want to make that many calls to the service.

Thanks

sanjay



CONFIDENTIALITY NOTICE
======================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.

Re: Calling same UDF multiple times in a SELECT query

Posted by Navis류승우 <na...@nexr.com>.
It will be called 4 times, regardless of how the UDF is annotated, if you are
using a released version of Hive.

https://issues.apache.org/jira/browse/HIVE-4209, which will be
included in 0.12.0, will make that a single UDF call by caching the result.


Re: Calling same UDF multiple times in a SELECT query

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Thanks Jan

I will modify my UDF and test it.

I want to make sure I understand what you wrote here:
"The obvious condition is that it must always return the identical result when called with same parameters."

If I can make sure that the call to the web service succeeds, it will always return the same output for a given set of inputs:

F(x1,y1) ----> will always equal ----> z1

That's what you mean, right?

sanjay
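If fooUdf meets the condition above (a successful call always returns the same result for the same input), one workaround on 0.9 that does not depend on the optimizer is to cache successful responses inside the UDF. The Java sketch below is a hypothetical helper: ServiceResultCache is not a Hive class, and it assumes the web-service call can be wrapped as a java.util.function.Function that throws (or returns null) on failure. Failures are never cached, so a failed key is retried on its next use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for a web-service-backed UDF such as fooUdf.
// Assumption: a successful call is deterministic (F(x1, y1) -> z1), so its
// result may be reused; failed calls are not cached and get retried.
final class ServiceResultCache<K, V> {
    private final int maxEntries;          // arbitrary size cap (tuning knob)
    private final Function<K, V> service;  // stands in for the web-service call
    private final Map<K, V> cache;

    ServiceResultCache(int maxEntries, Function<K, V> service) {
        this.maxEntries = maxEntries;
        this.service = service;
        // accessOrder = true: iteration order is least-recently-used first,
        // so removeEldestEntry evicts the LRU entry once the cap is exceeded.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > ServiceResultCache.this.maxEntries;
            }
        };
    }

    // synchronized because one instance may be shared (e.g. via a static field)
    synchronized V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;                 // hit: no web-service call
        }
        V fresh = service.apply(key);      // an exception propagates; nothing cached
        if (fresh != null) {
            cache.put(key, fresh);         // null is treated as failure: not cached
        }
        return fresh;
    }
}
```

Inside the UDF, one instance would be kept in a static field so that every fooUdf(a) call site running in the same task JVM shares it. Making it static is a hedge against Hive possibly creating a separate UDF object per call site, which this thread does not confirm either way.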

Re: Calling same UDF multiple times in a SELECT query

Posted by Jan Dolinár <do...@gmail.com>.
Hi,

If you use the following annotation, Hive should be able to optimize it to a single call:

 @UDFType(deterministic = true)

The obvious condition is that it must always return the identical result
when called with the same parameters.

A little more on this can be found in Mark Grover's post at
http://mark.thegrovers.ca/1/post/2012/06/how-to-write-a-hive-udf.html.

Regards,
Jan

Re: Calling same UDF multiple times in a SELECT query

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Hi Nitin

Thanks
Yes, I did actually try a nested query, but it spawns reducers, which I did not want. I wanted to keep it to one SELECT so that only mappers run, and then I can use several mappers to call the web service.

Thanks

sanjay

Re: Calling same UDF multiple times in a SELECT query

Posted by Nitin Pawar <ni...@gmail.com>.
Function return values are not stored for repeated use of the same call (as per my
understanding).

I know you may have already thought about another approach, such as:

select a,
       if(call < -1, -1, call) as b,
       if(call < -1, call, 0) as c
from (select a, fooUdf(a) as call from my_hive_table) t
-- 
Nitin Pawar
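Nitin's rewrite evaluates fooUdf(a) once per row in the inner query and then reuses that value in both IF expressions. The Java sketch below models only that per-row logic, not Hive's execution. RowProjection and the counting stub are hypothetical names, and fooUdf is assumed to return a non-null int.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical per-row model of the subquery rewrite: fooUdf(a) is evaluated
// once (the inner SELECT's `call` column) and that value feeds both IFs.
final class RowProjection {
    final int a, b, c;

    private RowProjection(int a, int b, int c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    static RowProjection project(int a, IntUnaryOperator fooUdf) {
        int call = fooUdf.applyAsInt(a);  // single evaluation per row
        int b = call < -1 ? -1 : call;    // if(call < -1, -1, call) as b
        int c = call < -1 ? call : 0;     // if(call < -1, call, 0) as c
        return new RowProjection(a, b, c);
    }
}
```

With a stub that counts invocations, two rows produce exactly two calls, one per row.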