Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:37:40 UTC

[jira] [Resolved] (SPARK-15418) SparkSQL does not support using a UDAF in a CREATE VIEW clause

     [ https://issues.apache.org/jira/browse/SPARK-15418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-15418.
----------------------------------
    Resolution: Incomplete

> SparkSQL does not support using a UDAF in a CREATE VIEW clause
> --------------------------------------------------------------
>
>                 Key: SPARK-15418
>                 URL: https://issues.apache.org/jira/browse/SPARK-15418
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Hanbo Wang
>            Priority: Major
>              Labels: bulk-closed, spark, sparksql
>
> I am using AWS EMR + Spark 1.6.1 + Hive 1.0.0
> I have this UDAF, https://github.com/scribd/hive-udaf-maxrow/blob/master/src/com/scribd/hive/udaf/GenericUDAFMaxRow.java, and have added it to Spark's classpath.
> I registered it in Spark with sqlContext.sql("CREATE TEMPORARY FUNCTION maxrow AS 'some.cool.package.hive.udf.GenericUDAFMaxRow'").
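> For reference, a minimal sketch of that registration step as run from the spark-shell (some.cool.package is the same placeholder class name as above, and this assumes the UDAF jar is already on the driver and executor classpath):
> {code}
> // spark-shell on Spark 1.6.x: `sc` is predefined; a HiveContext is needed so that
> // CREATE TEMPORARY FUNCTION can resolve a Hive UDAF class (on EMR the shell's
> // sqlContext is already a HiveContext).
> import org.apache.spark.sql.hive.HiveContext
>
> val sqlContext = new HiveContext(sc)
>
> // Register the Hive UDAF under the name "maxrow"
> sqlContext.sql(
>   "CREATE TEMPORARY FUNCTION maxrow AS 'some.cool.package.hive.udf.GenericUDAFMaxRow'")
>
> // Hypothetical sanity check: the function resolves and runs in a plain SELECT
> sqlContext.sql("SELECT a.A, maxrow(a.C, a.D) AS m FROM table_1 a GROUP BY a.A").show()
> {code}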
> However, when I call it in the following CREATE VIEW query:
> {code}
> CREATE VIEW VIEW_1 AS
>       SELECT
>         a.A,
>         a.B,
>         maxrow ( a.C,
>                  a.D,
>                  a.E,
>                  a.F,
>                  a.G,
>                  a.H,
>                  a.I
>             ) as m
>         FROM
>             table_1 a
>         JOIN
>             table_2 b
>         ON
>                 b.Z = a.D
>             AND b.Y  = a.C
>         JOIN dummy_table
>         GROUP BY
>             a.A,
>             a.B
> {code}
> It fails with the following error:
> {code}
> 16/05/18 19:49:14 WARN RowResolver: Duplicate column info for a.A was overwritten in RowResolver map: _col0: string by _col0: string
> 16/05/18 19:49:14 WARN RowResolver: Duplicate column info for a.B was overwritten in RowResolver map: _col1: bigint by _col1: bigint
> 16/05/18 19:49:14 ERROR Driver: FAILED: SemanticException [Error 10002]: Line 16:32 Invalid column reference 'C'
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 16:32 Invalid column reference 'C'
>                 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:10643)
>                 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:10591)
>                 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3656)
> {code}
> Running the same SELECT without the CREATE VIEW wrapper works fine.
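> Since the plain SELECT works, one possible workaround sketch (untested, and simplified to drop the dummy_table join) is to run the SELECT as a DataFrame and register the result as a temporary table under the intended view name, instead of creating a Hive view:
> {code}
> // The aggregation itself succeeds as a plain query, so run it into a DataFrame...
> val df = sqlContext.sql("""
>   SELECT a.A, a.B,
>          maxrow(a.C, a.D, a.E, a.F, a.G, a.H, a.I) AS m
>   FROM table_1 a
>   JOIN table_2 b
>     ON b.Z = a.D AND b.Y = a.C
>   GROUP BY a.A, a.B
> """)
>
> // ...and expose it under the view's name so downstream SQL can reference it
> df.registerTempTable("VIEW_1")
> sqlContext.sql("SELECT * FROM VIEW_1 LIMIT 10").show()
> {code}
> Unlike a Hive view, the temporary table is session-scoped, so this is only a stopgap until the CREATE VIEW path works.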



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org