Posted to user@hive.apache.org by Sanjay Subramanian <Sa...@wizecommerce.com> on 2013/07/23 21:12:46 UTC
Calling same UDF multiple times in a SELECT query
Hi
We are using version hive-exec-0.9.0-cdh4.1.2 in production.
I need to check and use the output of a UDF to assign values to 2 columns in a SELECT query.
Example
SELECT
a,
IF(fooUdf(a) < -1 , -1, fooUdf(a)) as b,
IF(fooUdf(a) < -1 , fooUdf(a), 0) as c
FROM
my_hive_table
So will fooUdf be called 4 times? Or once?
This matters because in our case the UDF calls a web service, and I don't want that many calls to the service.
Thanks
sanjay
CONFIDENTIALITY NOTICE
======================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.
Re: Calling same UDF multiple times in a SELECT query
Posted by Navis류승우 <na...@nexr.com>.
It will be called 4 times regardless of how you annotate the UDF, if you
are using a released version of Hive.
https://issues.apache.org/jira/browse/HIVE-4209 , which will be
included in 0.12.0, will reduce that to a single UDF call by caching the result.
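On releases without HIVE-4209, one possible workaround is to cache results inside the UDF's own code so that repeated evaluations of the same input reach the web service only once. The following is a minimal sketch of that idea, not how Hive or HIVE-4209 implements it. `CachedLookup`, `Backend` and `fetch` are hypothetical names, and the sketch has no Hive dependency: a real UDF's evaluate method would delegate to it. It assumes the service is deterministic for a given input, and it uses a static map on the assumption that separate call sites in the same task JVM should share one cache.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper, not part of Hive: memoizes an expensive lookup per input.
class CachedLookup {

    // Stand-in for the web service call made by the UDF (assumed deterministic).
    interface Backend {
        int fetch(String key);
    }

    // Static so that every caller in the same JVM shares one cache.
    // Unbounded here for brevity; a real cache should limit its size.
    private static final ConcurrentMap<String, Integer> CACHE =
            new ConcurrentHashMap<String, Integer>();

    // Returns the cached result for key, calling the backend only on a miss.
    static Integer evaluate(String key, Backend backend) {
        if (key == null) {
            return null; // mirror the usual SQL convention: NULL in, NULL out
        }
        Integer cached = CACHE.get(key);
        if (cached != null) {
            return cached;
        }
        // If the backend throws, nothing is stored, so a failed call is retried later.
        int fresh = backend.fetch(key);
        CACHE.put(key, fresh);
        return fresh;
    }
}
```

With this in place, the four evaluations of fooUdf(a) for one row would cost at most one service call per distinct value of a within a task, provided failed calls throw rather than return a placeholder value.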
Re: Calling same UDF multiple times in a SELECT query
Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Thanks Jan
I will modify my UDF and test it out.
I want to make sure I understand your words here:
"The obvious condition is that it must always return the identical result when called with same parameters."
If I can make sure that a call to the web service is successful, it will always return the same output for a given set of inputs:
F(x1, y1) ----> will always equal ----> z1
That's what you mean, right?
sanjay
Re: Calling same UDF multiple times in a SELECT query
Posted by Jan Dolinár <do...@gmail.com>.
Hi,
If you use this annotation, Hive should be able to optimize it to a single call:
@UDFType(deterministic = true)
The obvious condition is that it must always return the identical result
when called with same parameters.
A little more on this can be found in Mark Grover's post at
http://mark.thegrovers.ca/1/post/2012/06/how-to-write-a-hive-udf.html.
Regards,
Jan
Re: Calling same UDF multiple times in a SELECT query
Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Hi Nitin
Thanks
Yes, I did actually try a nested query, but it spawns reducers that I did not want. I wanted to keep it to one SELECT so that only mappers run, and then I can have several mappers call the web service.
Thanks
sanjay
Re: Calling same UDF multiple times in a SELECT query
Posted by Nitin Pawar <ni...@gmail.com>.
Function return values are not stored for repeated use (as per my
understanding).
I know you may have already thought about another approach, such as:
select a, if(call < -1, -1, call) as b
from (select a, fooudf(a) as call from my_hive_table) t
--
Nitin Pawar