Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2016/06/18 21:51:05 UTC
[jira] [Commented] (DRILL-4726) Dynamic UDFs support

    [ https://issues.apache.org/jira/browse/DRILL-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338215#comment-15338215 ] 

Paul Rogers commented on DRILL-4726:
------------------------------------

I added many comments to the design doc and on the user mailing list, mostly of the form "have we considered problem X?" Here are a few references for possible solutions.

Hive is one good model. Hive adds code using the Hive CLI (equivalent to Drill's sqlline client). It seems that code can be added globally or per session. (Per-session would be handy in Drill for data exploration that needs a one-off function, without the need to install the code for all users.)

ADD JAR your.jar;

Adds a jar to the class path. Something similar in Drill would allow adding any number of jar files, including dependencies. Then:

CREATE TEMPORARY FUNCTION myFunc AS 'com.yoyodyne.MyFunc';

This does three things. First, it maps a SQL name to the class that implements the function. Second, because each function is registered by name, it allows any number of functions in the same jar. Third, it establishes a life cycle for the function (in the above example, the function is used only for this one session).
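
To make the name mapping and life cycle concrete, here is a minimal Java sketch of a registry that holds temporary functions per session and falls back to globally registered ones. This is not Hive's or Drill's actual code; the class SessionFunctionRegistry, its method names, and the session id strings are all hypothetical, and the registry stores only class names rather than loading anything.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: maps SQL function names to implementing class names.
// Temporary functions are visible only to the session that created them.
class SessionFunctionRegistry {
    // SQL name -> class name, visible to every session.
    private final Map<String, String> globalFunctions = new ConcurrentHashMap<>();
    // Session id -> (SQL name -> class name).
    private final Map<String, Map<String, String>> sessionFunctions = new ConcurrentHashMap<>();

    // Analogous to CREATE FUNCTION: register for all sessions.
    void registerGlobal(String sqlName, String className) {
        globalFunctions.put(normalize(sqlName), className);
    }

    // Analogous to CREATE TEMPORARY FUNCTION: register for one session only.
    void registerTemporary(String sessionId, String sqlName, String className) {
        sessionFunctions
            .computeIfAbsent(sessionId, id -> new ConcurrentHashMap<>())
            .put(normalize(sqlName), className);
    }

    // Session-scoped functions shadow global ones with the same name.
    Optional<String> resolve(String sessionId, String sqlName) {
        String key = normalize(sqlName);
        Map<String, String> temp = sessionFunctions.get(sessionId);
        if (temp != null && temp.containsKey(key)) {
            return Optional.of(temp.get(key));
        }
        return Optional.ofNullable(globalFunctions.get(key));
    }

    // End of life cycle: dropping the session drops its temporary functions.
    void closeSession(String sessionId) {
        sessionFunctions.remove(sessionId);
    }

    // Assumption for this sketch: function names are case-insensitive.
    private static String normalize(String sqlName) {
        return sqlName.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        SessionFunctionRegistry registry = new SessionFunctionRegistry();
        registry.registerTemporary("session-1", "myFunc", "com.yoyodyne.MyFunc");
        System.out.println(registry.resolve("session-1", "myFunc"));
        System.out.println(registry.resolve("session-2", "myFunc"));
        registry.closeSession("session-1");
        System.out.println(registry.resolve("session-1", "myFunc"));
    }
}
```

In this sketch, a second session never sees the temporary function, and closing the session removes it. A real design would also need to decide how the mapping reaches every Drillbit, which this sketch ignores.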

Hive also provides:

LIST JARS;

To identify what is on the class path. This command implies that Hive has a separate class path (class loader) for user jars, since the command does not list Hive's own jars and dependencies. For Drill, we'd want to list jars added by the user, not the many, many jars that ship with Drill.
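
One way to get that separation in Java is a child class loader that holds only the user's jars, with the product's own class loader as its parent. The sketch below assumes this approach; I have not checked how Hive actually does it. UserJarLoader and its methods are hypothetical names, while URLClassLoader, addURL and getURLs are standard JDK APIs.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a class loader that holds only user-added jars.
// The product's own jars stay on the parent loader and are never listed here.
class UserJarLoader extends URLClassLoader {

    UserJarLoader(ClassLoader parent) {
        // Start with an empty search path; user jars are added later.
        super(new URL[0], parent);
    }

    // Analogous to ADD JAR: append one jar to this loader's search path.
    void addJar(Path jar) {
        try {
            // addURL is protected in URLClassLoader, so it is exposed via this subclass.
            addURL(jar.toUri().toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a usable jar path: " + jar, e);
        }
    }

    // Analogous to LIST JARS: getURLs() returns only this loader's own
    // search path, not the parent's.
    List<String> listJars() {
        List<String> jars = new ArrayList<>();
        for (URL url : getURLs()) {
            jars.add(url.toString());
        }
        return jars;
    }

    public static void main(String[] args) {
        UserJarLoader loader = new UserJarLoader(UserJarLoader.class.getClassLoader());
        loader.addJar(Paths.get("your.jar"));
        for (String jar : loader.listJars()) {
            System.out.println(jar);
        }
    }
}
```

Note that URLClassLoader has no way to remove a URL once added, so removing a jar under this approach would likely mean discarding the loader and building a new one from the remaining jars.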

Other details:

DROP TEMPORARY FUNCTION myFunc;
DELETE JAR your.jar;

To remove a function (1) or jar. Interestingly, the jar commands are implemented by the CLI tool, the function commands by the Hive engine.

Hive uses the idea of Resources (2) to distribute the code for each job. Hive supports other resources, such as files and archives (3). Hive executes a query using MapReduce, which makes it easier to distribute code for each job. With Drill we'd need to invent a different mechanism.

Of course, all of this may be well beyond the scope of this JIRA, but it does provide a perspective on what other products do.

(1) https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/ReloadFunction
(2) https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveResources
(3) https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Commands

> Dynamic UDFs support
> --------------------
>
>                 Key: DRILL-4726
>                 URL: https://issues.apache.org/jira/browse/DRILL-4726
>             Project: Apache Drill
>          Issue Type: New Feature
>    Affects Versions: 1.6.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>             Fix For: Future
>
>
> Allow registering UDFs without restarting Drillbits.
> Design is described in document below:
> https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit


