Posted to issues@flink.apache.org by "Fabian Hueske (JIRA)" <ji...@apache.org> on 2017/02/15 08:14:41 UTC

[jira] [Commented] (FLINK-5802) Flink SQL calling Hive User-Defined Functions

    [ https://issues.apache.org/jira/browse/FLINK-5802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867437#comment-15867437 ] 

Fabian Hueske commented on FLINK-5802:
--------------------------------------

Thanks for opening this issue [~clarkyzl].
I agree, supporting Hive UDFs would be a great feature for the Table API.

Since there are several ways to achieve this, we should discuss the design first.
Off the top of my head, I can think of two approaches:

1. Native support by extending the internals of the Table API. This would mean that we have Hive-specific code to register functions and integrate them with code generation. Depending on the interface of the Hive UDFs, it might even mean that we have to generate a different physical execution plan. It also means that we would have a dependency on Hive in the Table API, which I personally would like to avoid.
2. Support by wrapping. For this, we would implement Table API UDFs that internally wrap Hive UDFs. Since the wrappers are treated like regular Table API UDFs, we do not need to change the internals of the Table API (see the sketch below). On the other hand, this only works if the interfaces of Table API UDFs and Hive UDFs are compatible (if they are not, we would probably need different execution plans). Another advantage is that the wrappers could live in a separate Maven module, which also avoids a hard Hive dependency in flink-table.

I would opt for the second approach because it has fewer implications for the internals of the Table API. If we figure out that this approach does not work for some Hive UDFs, we have to decide whether it is worth supporting them or not.
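
To make the second approach more concrete, here is a rough sketch of what such a wrapper could look like for a simple (non-generic) Hive UDF. This is only a sketch: it assumes an {{evaluate}} method that takes and returns a String, the {{HiveSimpleUdfWrapper}} name is made up, and a real implementation would have to resolve arbitrary signatures and convert types (e.g. via Hive's ObjectInspectors for GenericUDFs):

{code}
import org.apache.flink.table.functions.ScalarFunction
import org.apache.hadoop.hive.ql.exec.UDF

// Hypothetical wrapper that exposes a simple Hive UDF (one String argument,
// String result) as a regular Table API scalar function.
class HiveSimpleUdfWrapper(hiveUdfClass: Class[_ <: UDF]) extends ScalarFunction {

  // Instantiated lazily so that the wrapper itself stays serializable;
  // only the Class object is shipped to the workers.
  @transient private lazy val hiveUdf: UDF = hiveUdfClass.newInstance()
  @transient private lazy val evalMethod =
    hiveUdfClass.getMethod("evaluate", classOf[String])

  // Flink resolves the eval() methods of a ScalarFunction via reflection,
  // similar to how Hive resolves the evaluate() methods of simple UDFs.
  def eval(arg: String): String =
    evalMethod.invoke(hiveUdf, arg).asInstanceOf[String]
}
{code}

The wrapper would then be registered like any other scalar function, e.g. {{tableEnv.registerFunction("my_hive_udf", new HiveSimpleUdfWrapper(classOf[MyHiveUdf]))}}, where {{MyHiveUdf}} is a placeholder for an existing Hive UDF class.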

> Flink SQL calling Hive User-Defined Functions
> ---------------------------------------------
>
>                 Key: FLINK-5802
>                 URL: https://issues.apache.org/jira/browse/FLINK-5802
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: Zhuoluo Yang
>              Labels: features
>
> It's important to be able to call Hive UDFs from Flink SQL. A great many UDFs have been written for Hive over the last ten years.
> Reusing these Hive UDFs is really important: it will reduce the cost of migration and bring more users to Flink.
> Spark SQL already supports this feature.
> https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_spark-guide/content/calling-udfs.html
> The Hive UDFs here include both built-in UDFs and customized UDFs. As much business logic has been written in customized UDFs, they are more important than the built-in ones.
> Generally, there are three kinds of UDFs in Hive: UDF, UDTF and UDAF.
> Here is the document of the Spark SQL: http://spark.apache.org/docs/latest/sql-programming-guide.html#compatibility-with-apache-hive 
> Spark code:
> https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
> https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala


