Posted to user@spark.apache.org by vi...@socialinfra.net on 2014/07/25 17:06:48 UTC

Support for Percentile and Variance Aggregation functions in Spark with HiveContext

Hi all,

I am using Spark 1.0.0 with CDH 5.1.0.

I want to aggregate the data in a raw table using a simple query like below

SELECT MIN(field1), MAX(field2), AVG(field3), PERCENTILE(field4),
year, month, day FROM raw_data_table GROUP BY year, month, day

MIN, MAX and AVG functions work fine for me, but with PERCENTILE, I get an
error as shown below.
Exception in thread "main" java.lang.RuntimeException: No handler for udf class org.apache.hadoop.hive.ql.udf.UDAFPercentile
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.hive.HiveFunctionRegistry$.lookupFunction(hiveUdfs.scala:69)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$4$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:115)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$4$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:113)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
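
For reference, the query is issued through HiveContext roughly as follows.
This is a simplified sketch only: the object name, the field types and the
0.5 fraction argument are illustrative (Hive's percentile() expects the
desired fraction as a second argument), not taken from the actual job.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  object PercentileRepro {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("percentile-repro"))
      val hiveContext = new HiveContext(sc)

      // MIN, MAX and AVG resolve fine; PERCENTILE fails during analysis with
      // "No handler for udf class ...UDAFPercentile".
      val results = hiveContext.hql(
        """SELECT MIN(field1), MAX(field2), AVG(field3),
          |       PERCENTILE(field4, 0.5), year, month, day
          |FROM raw_data_table
          |GROUP BY year, month, day""".stripMargin)

      results.collect().foreach(println)
    }
  }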
I have read in the documentation that, with HiveContext, Spark SQL supports
all of the UDFs supported in Hive.

I would like to know whether there is anything else I need to do to use
PERCENTILE with Spark SQL, or whether there are still limitations in Spark
SQL with respect to UDFs and UDAFs in the version I am using.

Thanks and regards
Vinay Kashyap

Re: Support for Percentile and Variance Aggregation functions in Spark with HiveContext

Posted by Michael Armbrust <mi...@databricks.com>.
Hmm, in general we try to support all the UDAFs, but this one must be using
a different base class that we don't have a wrapper for.  JIRA here:
https://issues.apache.org/jira/browse/SPARK-2693
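
A possible interim workaround, until such a wrapper exists, is to skip the
Hive UDAF and compute the percentile directly over the RDD. The sketch below
is rough and untested; it assumes year/month/day are ints, field4 is a
double, and that a hiveContext is already in scope. The column positions and
types are guesses, not taken from the thread.

  import org.apache.spark.SparkContext._  // pair-RDD implicits (groupByKey, mapValues)

  // Pull the raw values and do the grouping ourselves.
  val rows = hiveContext.hql(
    "SELECT year, month, day, field4 FROM raw_data_table")

  val medianPerDay = rows
    .map(r => ((r.getInt(0), r.getInt(1), r.getInt(2)), r.getDouble(3)))
    .groupByKey()
    .mapValues { vs =>
      val sorted = vs.toArray.sorted
      sorted(sorted.length / 2)  // crude 50th percentile, no interpolation
    }

  medianPerDay.collect().foreach(println)

Note that this materialises every group's values on a single executor, so it
is only reasonable when each (year, month, day) group is small.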


On Fri, Jul 25, 2014 at 8:06 AM, <vi...@socialinfra.net> wrote:

>
> Hi all,
>
> I am using Spark 1.0.0 with CDH 5.1.0.
>
> I want to aggregate the data in a raw table using a simple query like below
>
> *SELECT MIN(field1), MAX(field2), AVG(field3), PERCENTILE(field4),
> year,month,day FROM  raw_data_table  GROUP BY year, month, day*
>
> MIN, MAX and AVG functions work fine for me, but with PERCENTILE, I get an
> error as shown below.
>
> Exception in thread "main" java.lang.RuntimeException: No handler for udf
> class org.apache.hadoop.hive.ql.udf.UDAFPercentile
>         at scala.sys.package$.error(package.scala:27)
>         at
> org.apache.spark.sql.hive.HiveFunctionRegistry$.lookupFunction(hiveUdfs.scala:69)
>         at
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$4$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:115)
>         at
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$4$$anonfun$applyOrElse$3.applyOrElse(Analyzer.scala:113)
>         at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>
> I have read in the documentation that with HiveContext Spark SQL supports
> all the UDFs supported in Hive.
>
> I want to know if there is anything else I need to follow to use
> Percentile with Spark SQL..?? Or .. Are there any limitations still in
> Spark SQL with respect to UDFs and UDAFs in the version I am using..??
>
>
>
>
>
> Thanks and regards
>
> Vinay Kashyap
>