Posted to issues@spark.apache.org by "Vinod KC (JIRA)" <ji...@apache.org> on 2018/09/03 05:03:00 UTC
[jira] [Updated] (SPARK-25301) When a view uses a UDF from a non-default database, Spark analyzer throws AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-25301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod KC updated SPARK-25301:
-----------------------------
Description:
When a Hive view uses a UDF from a non-default database, the Spark analyzer throws an AnalysisException.
Steps to simulate this issue
-----------------------------
In Hive
--------
1) CREATE DATABASE d100;
2) create function d100.udf100 as 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper'; // Note: udf100 is created in d100
3) create view d100.v100 as select *d100.udf100*(name) from default.emp; // Note: table default.emp has two columns 'name' and 'address'
4) select * from d100.v100; // querying the view d100.v100 gives the correct result
In Spark
-------------
1) spark.sql("select * from d100.v100").show
throws
```
org.apache.spark.sql.AnalysisException: Undefined function: 'd100.udf100'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'
```
This happens because, while parsing the view's SQL statement, 'select `d100.udf100`(`emp`.`name`) from `default`.`emp`', the Spark parser fails to split the database name from the UDF name; the function registry therefore tries to load the whole string 'd100.udf100' as a function name from the 'default' database.
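The root cause can be illustrated with a minimal, hypothetical sketch (this is not Spark's actual FunctionRegistry or parser code; the object and method names below are made up for illustration). Because Hive stores the call as the single back-quoted token `d100.udf100`, a resolver that receives it as one identifier part falls back to the current database:

```scala
// Hypothetical sketch of qualified-function-name resolution.
// A one-part identifier is looked up in the current database; a
// two-part identifier carries its own database qualifier.
object FunctionNameResolution {
  def resolve(parts: Seq[String], currentDb: String): (String, String) =
    parts match {
      case Seq(db, fn) => (db, fn)          // properly qualified, e.g. d100.udf100
      case Seq(fn)     => (currentDb, fn)   // unqualified: current database wins
    }

  def main(args: Array[String]): Unit = {
    // The view text `d100.udf100` arrives as a single token, so the
    // whole string is treated as a function name in 'default':
    println(resolve(Seq("d100.udf100"), "default")) // -> (default,d100.udf100)
    // What a fix must produce: split the qualifier before lookup.
    println(resolve(Seq("d100", "udf100"), "default")) // -> (d100,udf100)
  }
}
```

Under this reading, the fix is to split the dotted token into (database, function) before consulting the registry, rather than treating it as an unqualified name.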
was:
When a Hive view uses a UDF from a non-default database, the Spark analyzer throws an AnalysisException.
Steps to simulate this issue
-----------------------------
In Hive
--------
1) CREATE DATABASE d100;
2) ADD JAR /usr/udf/masking.jar // masking.jar has a custom udf class 'com.uzx.udf.Masking'
3) create function d100.udf100 as "com.uzx.udf.Masking"; // Note: udf100 is created in d100
4) create view d100.v100 as select *d100.udf100*(name) from default.emp; // Note: table default.emp has two columns 'name' and 'address'
5) select * from d100.v100; // querying the view d100.v100 gives the correct result
In Spark
-------------
1) spark.sql("select * from d100.v100").show
throws
```
org.apache.spark.sql.AnalysisException: Undefined function: 'd100.udf100'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'
```
This happens because, while parsing the view's SQL statement, 'select `d100.udf100`(`emp`.`name`) from `default`.`emp`', the Spark parser fails to split the database name from the UDF name; the function registry therefore tries to load the whole string 'd100.udf100' as a function name from the 'default' database.
> When a view uses a UDF from a non-default database, Spark analyzer throws AnalysisException
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-25301
> URL: https://issues.apache.org/jira/browse/SPARK-25301
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Vinod KC
> Priority: Minor
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)