Posted to dev@drill.apache.org by Paul Rogers <pa...@yahoo.com> on 2019/08/08 16:55:10 UTC

Re: [QUESTION]: Caching UDFs

Hi Charles,

In general, we cannot know if a function is deterministic. Your function might be rand(seed, max). It might do a JDBC lookup or a REST call. Drill can't know unless we add some way to declare that a function is deterministic, maybe a @Deterministic annotation.
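
To make the annotation idea concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: the @Deterministic annotation, the two stand-in function classes, and the DeterminismCheck helper are illustrations only and are not part of Drill's UDF API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotation: the function author promises that the
// same arguments always produce the same result.
@Retention(RetentionPolicy.RUNTIME) // must be visible to reflection at runtime
@Target(ElementType.TYPE)
@interface Deterministic {}

// Stand-in for a UDF whose author declares it deterministic.
@Deterministic
class FooFunction {}

// Stand-in for a UDF such as rand(seed, max): no annotation, so the
// engine must assume it is not safe to cache.
class RandFunction {}

class DeterminismCheck {
  // Returns true only if the author opted in; the default is "unknown",
  // which has to be treated as non-deterministic.
  static boolean isDeterministic(Class<?> udfClass) {
    return udfClass.isAnnotationPresent(Deterministic.class);
  }
}
```

The point of the opt-in design is that an unannotated function is never cached by accident: DeterminismCheck.isDeterministic(RandFunction.class) is false simply because nobody made the promise.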

That said, you can build in caching inside the function. Should your cache be separate from mine for security reasons? Should the cache be shared across execution threads on a given node? Local to a single minor fragment?

Aggregates are an example of functions that have internal state; perhaps the idea can be extended to a function-specific results cache.
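
A function-local results cache could look roughly like the sketch below. It is plain Java, not Drill UDF code: the MemoizedFunction class, its maxEntries bound, and the misses counter are all made up for illustration, and it assumes the wrapped function is deterministic. It is not thread-safe, so it corresponds to the "local to a single minor fragment" choice; a cache shared across threads would need synchronization or a concurrent map.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical wrapper that remembers results of a two-argument function.
class MemoizedFunction<A, B, R> {
  private final BiFunction<A, B, R> fn;
  private final Map<List<Object>, R> cache;
  private int misses = 0; // how many times the real function ran

  MemoizedFunction(BiFunction<A, B, R> fn, final int maxEntries) {
    this.fn = fn;
    // Access-ordered LinkedHashMap: once the bound is exceeded, the
    // least recently used entry is evicted, so memory stays limited.
    this.cache = new LinkedHashMap<List<Object>, R>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<List<Object>, R> eldest) {
        return size() > maxEntries;
      }
    };
  }

  R apply(A x, B y) {
    // The argument pair is the cache key; List supplies equals/hashCode.
    List<Object> key = Arrays.<Object>asList(x, y);
    if (cache.containsKey(key)) {
      return cache.get(key); // hit: skip the expensive call
    }
    misses++;
    R result = fn.apply(x, y);
    cache.put(key, result);
    return result;
  }

  int misses() {
    return misses;
  }
}
```

For example, wrapping a function as new MemoizedFunction<>((x, y) -> x * y, 1000) and calling apply(3, 4) twice runs the underlying function only once.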

Thanks,
- Paul

 

    On Thursday, August 8, 2019, 09:46:12 AM PDT, Charles Givre <ch...@gtkcyber.com> wrote:  
 
 Hello Drill Devs,
I have a question about UDFs. Let's say you have a non-trivial UDF called foo(x,y) which returns some value. Assuming that if the arguments are the same, the function foo() will return the same result, does Drill have any optimizations to prevent running the non-trivial function?
I was thinking that it might make sense to cache the arguments and results in memory and before the function is executed, check the cache to see if they're there. If they are, return the cached results, and if not, execute the function. I was thinking that for some functions, like date/time functions, we might want to include something in the code to ensure that the results do not get cached.
Thoughts?

Charles S. Givre CISSP
Data Scientist, Co-Founder GTK Cyber LLC
charles.givre@gtkcyber.com
Mobile: (443) 762-3286