Posted to dev@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2021/04/05 19:36:00 UTC

[jira] [Created] (CALCITE-4564) Initialization context for non-static user-defined functions (UDFs)

Julian Hyde created CALCITE-4564:
------------------------------------

             Summary: Initialization context for non-static user-defined functions (UDFs)
                 Key: CALCITE-4564
                 URL: https://issues.apache.org/jira/browse/CALCITE-4564
             Project: Calcite
          Issue Type: Bug
            Reporter: Julian Hyde


I propose to allow user-defined functions (UDFs) to read from an initialization context during construction. The initialization context would be a new Java {{interface UdfInitializer}} that provides, among other things, a type factory and the values of those arguments to the function call that are literals.

The purpose of this feature is to allow functions to do more work at initialization time and less work on each invocation. Suppose I wanted to write a UDF {{regexMatch(pattern, string)}} that matches Java regular expressions. If {{pattern}} is a literal, I would like to create an instance of the function object that calls {{Pattern.compile(pattern)}} in its constructor and stores the resulting {{Pattern}} object as a field. Each invocation of the function can use that {{Pattern}} object, and does not have to pay the cost of compilation.

In order to use this feature, a UDF class would have a public constructor with a single argument that is a {{UdfInitializer}}. The method that invokes the function, conventionally called {{eval}}, must be non-static.
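
As a minimal sketch of what such a UDF might look like: the proposal does not define the methods of {{UdfInitializer}}, so the single method {{literalOperand(int)}} below is a hypothetical stand-in, as are the class names {{RegexMatchFunction}} and {{RegexMatchDemo}}. The sketch falls back to compiling on each call when the pattern operand is not a literal.

```java
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical shape of the proposed interface; the real methods are not yet defined. */
interface UdfInitializer {
  /** Assumed accessor: the value of the operand at {@code ordinal} if it is a literal. */
  Optional<Object> literalOperand(int ordinal);
}

/** Sketch of regexMatch(pattern, string) that compiles a literal pattern only once. */
class RegexMatchFunction {
  /** Compiled pattern, or null if the pattern operand was not a literal. */
  private final Pattern precompiled;

  /** Public single-argument constructor, as the proposal requires. */
  public RegexMatchFunction(UdfInitializer init) {
    this.precompiled = init.literalOperand(0)
        .map(v -> Pattern.compile((String) v))
        .orElse(null);
  }

  /** Non-static eval method, invoked once per row. */
  public boolean eval(String pattern, String s) {
    Pattern p = precompiled != null ? precompiled : Pattern.compile(pattern);
    return p.matcher(s).matches();
  }
}

class RegexMatchDemo {
  public static void main(String[] args) {
    // Stand-in for what the code generator would supply: operand 0 is the literal "a+b".
    UdfInitializer init = ordinal -> {
      if (ordinal == 0) {
        return Optional.<Object>of("a+b");
      }
      return Optional.empty();
    };
    RegexMatchFunction f = new RegexMatchFunction(init);
    System.out.println(f.eval("a+b", "aaab")); // prints true
    System.out.println(f.eval("a+b", "abc"));  // prints false
  }
}
```

With this shape, the cost of {{Pattern.compile}} is paid in the constructor when the pattern is a literal, and {{eval}} only runs the matcher.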

This feature is optional. A UDF that has a public constructor with zero arguments (which is the current contract for non-static UDFs) will continue to work. [class MyPlusFunction|https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/test/java/org/apache/calcite/util/Smalls.java#L429] is an example of this kind of UDF.
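
One way the two constructor contracts could coexist is sketched below. This is an illustration only, not Calcite code: {{UdfInstantiator}}, {{LegacyPlus}} and {{ContextAwarePlus}} are hypothetical names, and {{UdfInitializer}} is reduced to an empty stand-in. The instantiator prefers a public constructor that takes a {{UdfInitializer}} and otherwise falls back to the public zero-argument constructor.

```java
/** Empty stand-in for the proposed interface (its methods are not yet defined). */
interface UdfInitializer {
}

/** UDF under the current contract: public zero-argument constructor. */
class LegacyPlus {
  public LegacyPlus() {
  }

  public int eval(int x, int y) {
    return x + y;
  }
}

/** UDF under the proposed contract: public constructor taking a UdfInitializer. */
class ContextAwarePlus {
  final boolean sawContext;

  public ContextAwarePlus(UdfInitializer init) {
    this.sawContext = init != null;
  }

  public int eval(int x, int y) {
    return x + y;
  }
}

/** Hypothetical helper that picks whichever constructor the UDF class offers. */
class UdfInstantiator {
  static <T> T instantiate(Class<T> udfClass, UdfInitializer init) {
    try {
      try {
        // Prefer the new single-argument constructor if the class declares one.
        return udfClass.getConstructor(UdfInitializer.class).newInstance(init);
      } catch (NoSuchMethodException e) {
        // Otherwise fall back to the existing zero-argument contract.
        return udfClass.getConstructor().newInstance();
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(
          "Cannot instantiate UDF " + udfClass.getName(), e);
    }
  }
}

class UdfInstantiatorDemo {
  public static void main(String[] args) {
    UdfInitializer init = new UdfInitializer() { };
    LegacyPlus legacy = UdfInstantiator.instantiate(LegacyPlus.class, init);
    ContextAwarePlus aware = UdfInstantiator.instantiate(ContextAwarePlus.class, init);
    System.out.println(legacy.eval(1, 2));  // prints 3
    System.out.println(aware.sawContext);   // prints true
  }
}
```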

This feature would apply to all UDFs, including table functions (i.e. those whose arguments are tables or which return tables) and aggregate functions.

The initialization context would not affect the type derivation aspects of the function. The return type, operand types, and so forth will already have been derived at validation time, and type derivation is complete well before any code is generated or executed. If you want to control type derivation, you should create your own sub-class of {{SqlOperator}}, as today.

There are some implementation challenges:
* The code generator will need to generate an instance of {{UdfInitializer}} for each UDF call that occurs in the query. Some data structures that are readily available at validation time (e.g. {{RexCall}}) are not easily re-created at run time, so we should be conservative about what information is available via {{UdfInitializer}}.
* The code generator must ensure that those instances are constructed exactly once during the execution of the query; those instances should not be variables in the {{execute}} method, but should instead be fields, or perhaps static fields, in the generated class.
* This functionality needs to work through both the interpreter ({{Bindable}} convention) and generated code ({{Enumerable}} convention).
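
The "constructed exactly once" requirement in the second point can be illustrated with a hand-written stand-in for generated code. This is a sketch under assumptions, not what Calcite's code generator emits: {{GeneratedQuery}}, {{CountingUdf}} and the shape of {{execute}} are all hypothetical. The UDF instance is held in a field of the generated class, so processing many rows does not construct it again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical UDF that counts how many times it has been constructed. */
class CountingUdf {
  static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

  public CountingUdf() {
    CONSTRUCTIONS.incrementAndGet();
  }

  public int eval(int x) {
    return x + 1;
  }
}

/** Hand-written stand-in for a generated class; one instance per query execution. */
class GeneratedQuery {
  // Field, not a local variable in execute(): constructed once per query instance.
  private final CountingUdf udf0 = new CountingUdf();

  List<Integer> execute(List<Integer> rows) {
    List<Integer> out = new ArrayList<>();
    for (int row : rows) {
      // Every row reuses the same UDF instance.
      out.add(udf0.eval(row));
    }
    return out;
  }
}

class GeneratedQueryDemo {
  public static void main(String[] args) {
    GeneratedQuery query = new GeneratedQuery();
    System.out.println(query.execute(Arrays.asList(1, 2, 3))); // prints [2, 3, 4]
    System.out.println(CountingUdf.CONSTRUCTIONS.get());       // prints 1
  }
}
```

Had {{udf0}} been created inside the loop in {{execute}}, the counter would reach 3 for the three rows, which is the per-invocation cost the proposal aims to avoid.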
