You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2014/11/06 04:45:33 UTC

[jira] [Commented] (SPARK-4263) PERCENTILE is not working

    [ https://issues.apache.org/jira/browse/SPARK-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199723#comment-14199723 ] 

Yin Huai commented on SPARK-4263:
---------------------------------

I think you cannot use percentile for string type column. Also, check https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inAggregateFunctions(UDAF).

> PERCENTILE is not working
> -------------------------
>
>                 Key: SPARK-4263
>                 URL: https://issues.apache.org/jira/browse/SPARK-4263
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Cheng Hao
>            Priority: Minor
>
> When query "select percentile(key, array(0, 0.5,1)) from src", it will throws exception like:
> {panel}
> org.apache.hadoop.hive.ql.exec.NoMatchingMethodException: No matching method for class org.apache.hadoop.hive.ql.udf.UDAFPercentile with (string, array<double>). Possible choices: _FUNC_(bigint, array<double>)  _FUNC_(bigint, double)  
> 	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getMethodInternal(FunctionRegistry.java:1213)
> 	at org.apache.hadoop.hive.ql.exec.DefaultUDAFEvaluatorResolver.getEvaluatorClass(DefaultUDAFEvaluatorResolver.java:84)
> 	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge.getEvaluator(GenericUDAFBridge.java:56)
> 	at org.apache.spark.sql.hive.HiveUdaf.objectInspector$lzycompute(hiveUdfs.scala:234)
> 	at org.apache.spark.sql.hive.HiveUdaf.objectInspector(hiveUdfs.scala:233)
> 	at org.apache.spark.sql.hive.HiveUdaf.dataType(hiveUdfs.scala:241)
> 	at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:104)
> 	at org.apache.spark.sql.catalyst.plans.logical.Aggregate$$anonfun$output$6.apply(basicOperators.scala:143)
> 	at org.apache.spark.sql.catalyst.plans.logical.Aggregate$$anonfun$output$6.apply(basicOperators.scala:143)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> 	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> 	at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> 	at org.apache.spark.sql.catalyst.plans.logical.Aggregate.output(basicOperators.scala:143)
> 	at org.apache.spark.sql.catalyst.plans.logical.Limit.output(basicOperators.scala:147)
> 	at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$unapply$1.apply(patterns.scala:61)
> 	at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$unapply$1.apply(patterns.scala:61)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.sql.catalyst.planning.PhysicalOperation$.unapply(patterns.scala:61)
> 	at org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:34)
> 	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> 	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> 	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> 	at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
> 	at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
> 	at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
> 	at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
> 	at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
> 	at org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:425)
> 	at org.apache.spark.sql.hive.thriftserver.AbstractSparkSQLDriver.run(AbstractSparkSQLDriver.scala:59)
> 	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:276)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
> 	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:211)
> 	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
> {panel}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org