Posted to user@hive.apache.org by Shaun Clowes <sc...@atlassian.com> on 2013/03/11 04:12:59 UTC

UDFs and Thread Safety?

Hi All,

Could anyone describe what the required thread safety for a UDF is? I
understand that one is instantiated for each use of the function in an
expression, but can there be multiple threads executing the methods of a
single UDF object at once?

Thanks,
Shaun

Re: UDFs and Thread Safety?

Posted by Dean Wampler <de...@thinkbiganalytics.com>.
Hadoop tasks use a single thread, so there won't be multiple threads
accessing the UDF.

However, there's a flip side of thread safety if your UDF maintains state:
is it receiving all the data it should, or is the data being sharded over
multiple processes in a way that defeats the UDF? My favorite example is a
moving average calculator (like you might use in finance). Most
full-featured SQL dialects have window functions for this purpose.

Suppose I'm averaging over the last 50 closing prices for a given financial
instrument. To do this I cache the last 50 I've seen in the UDF as each
record is passed to me (keeping the data for each instrument properly
separated). If some records go to one mapper task and other records go to a
different mapper task, then at least some of my averages will be wrong due
to missing data.
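
The failure mode described above can be sketched in plain Java, with no Hive
dependency. This is only an illustration under stated assumptions: the class
MovingAverage and its evaluate method are hypothetical stand-ins for a
stateful UDF, not Hive API, and the window is shrunk from 50 to 2 so the
arithmetic is easy to follow.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Demo driver: compares one task seeing every record with two tasks
// that each see only half of the records.
class MovingAverageDemo {
    public static void main(String[] args) {
        double[] closes = {10, 20, 30, 50};

        // Case 1: a single task receives every record, in order.
        MovingAverage single = new MovingAverage(2);
        double whole = 0;
        for (double c : closes) {
            whole = single.evaluate("XYZ", c);
        }
        System.out.println("one task:  " + whole);

        // Case 2: records alternate between two tasks, each holding its
        // own instance (and therefore its own partial history).
        MovingAverage taskA = new MovingAverage(2);
        MovingAverage taskB = new MovingAverage(2);
        taskA.evaluate("XYZ", 10);
        taskB.evaluate("XYZ", 20);
        taskA.evaluate("XYZ", 30);
        double split = taskB.evaluate("XYZ", 50);
        System.out.println("two tasks: " + split);
    }
}

// Hypothetical stand-in for a stateful UDF (assumption, not a Hive class).
// It is not synchronized: it relies on being called from a single thread.
class MovingAverage {
    private final int window;
    // Most recent prices per instrument, oldest first.
    private final Map<String, Deque<Double>> recent = new HashMap<>();
    // Running sum of the prices currently held per instrument.
    private final Map<String, Double> sums = new HashMap<>();

    MovingAverage(int window) {
        this.window = window;
    }

    // Records one closing price and returns the average over at most
    // the last `window` prices seen by THIS instance for that instrument.
    double evaluate(String instrument, double close) {
        Deque<Double> prices =
                recent.computeIfAbsent(instrument, k -> new ArrayDeque<>());
        prices.addLast(close);
        double sum = sums.getOrDefault(instrument, 0.0) + close;
        if (prices.size() > window) {
            sum -= prices.removeFirst();
        }
        sums.put(instrument, sum);
        return sum / prices.size();
    }
}
```

With all four prices in one task, the final average is over 30 and 50. When
the same records are split across two tasks, the second task averages 20 and
50 instead, so the result differs even though no thread-safety bug is
involved.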

dean


-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330

Re: UDFs and Thread Safety?

Posted by Nagarjuna Kanamarlapudi <na...@gmail.com>.
Yes, in a map-only query your UDF will be executed on the mapper side.

I don't know how you can make your UDF thread safe. What I do is set the number of reducers to 1 and make sure that I write a query that has both a map and a reduce phase.
The UDF is then executed in the reduce phase, which suffices for my requirement.