Posted to user@spark.apache.org by Andrés Ivaldi <ia...@gmail.com> on 2017/06/13 18:52:35 UTC

UDF percentile_approx

Hello, I'm trying to use percentile_approx in my SQL query, but it looks like
the Spark context can't find the function.

I'm using it like this:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.DataFrameStatFunctions

val e = expr("percentile_approx(Cantidadcon0234514)")
df.agg(e).show()

and the exception is:

org.apache.spark.sql.AnalysisException: Undefined function:
'percentile_approx'. This function is neither a registered temporary
function nor a permanent function registered

I've also tried with callUDF.

Regards.

-- 
Ing. Ivaldi Andres

Re: UDF percentile_approx

Posted by Andrés Ivaldi <ia...@gmail.com>.

Hello,
Riccardo, I was able to make it run. The problem is that HiveContext no longer
exists in Spark 2.0.2, as far as I can see. However, the method
enableHiveSupport adds the Hive functionality to SparkSession. To enable
this, the spark-hive_2.11 dependency is needed.

This is not well explained in the Spark API docs, which only say that SQLContext
and HiveContext are now part of SparkSession:

"SparkSession is now the new entry point of Spark that replaces the old
SQLContext and HiveContext. Note that the old SQLContext and HiveContext
are kept for backward compatibility. A new catalog interface is accessible
from SparkSession - existing API on databases and tables access such as
listTables, createExternalTable, dropTempView, cacheTable are moved here."

I think it would be a good idea to document enableHiveSupport as well.
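For readers following along, the setup described above might look roughly like the sketch below. It is only an illustration, not the poster's actual code: the object name, app name, local master and sample data are assumptions, and it presumes the spark-hive artifact for your Spark and Scala version is on the classpath (for example an sbt line such as libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.2").

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical object name, used only for this illustration.
object HiveSupportSketch {

  // Builds a SparkSession with Hive functionality enabled.
  // enableHiveSupport() fails if the Hive classes (spark-hive) are missing.
  def buildSession(): SparkSession =
    SparkSession.builder()
      .appName("percentile-approx-with-hive") // assumed app name
      .master("local[*]")                     // assumption: local test run
      .enableHiveSupport()
      .getOrCreate()

  // Computes an approximate median of a small sample column named "a".
  def approxMedian(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(1.0, 8.0).toDF("a").selectExpr("percentile_approx(a, 0.5)")
  }

  def main(args: Array[String]): Unit = {
    val spark = buildSession()
    approxMedian(spark).show()
    spark.stop()
  }
}
```

With Hive support enabled, the function should resolve instead of raising the "Undefined function" AnalysisException reported at the start of the thread.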

Thanks,

On Wed, Jun 14, 2017 at 5:13 AM, Takeshi Yamamuro <li...@gmail.com>
wrote:

> You can use the function without Hive; you can try:
>
> scala> Seq(1.0, 8.0).toDF("a").selectExpr("percentile_approx(a, 0.5)").show
> +------------------------------------------------+
> |percentile_approx(a, CAST(0.5 AS DOUBLE), 10000)|
> +------------------------------------------------+
> |                                             8.0|
> +------------------------------------------------+
>
> // maropu

-- 
Ing. Ivaldi Andres

Re: UDF percentile_approx

Posted by Takeshi Yamamuro <li...@gmail.com>.

You can use the function without Hive; you can try:

scala> Seq(1.0, 8.0).toDF("a").selectExpr("percentile_approx(a, 0.5)").show
+------------------------------------------------+
|percentile_approx(a, CAST(0.5 AS DOUBLE), 10000)|
+------------------------------------------------+
|                                             8.0|
+------------------------------------------------+


// maropu
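A note on the snippet from the original question: it calls percentile_approx with the column only, while the function also takes a percentage argument, as in the selectExpr example above. A minimal sketch of the same idea through agg and expr follows. The object name, the output alias and the sample values are assumptions made for illustration, and on the Spark 2.0.2 build mentioned elsewhere in this thread the session apparently also needs enableHiveSupport() for the function to resolve.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

// Hypothetical object name, used only for this illustration.
object PercentileAggSketch {

  // Aggregates the whole DataFrame to one row holding the approximate median.
  // The second argument (0.5) is the percentage the original snippet left out.
  def approxMedian(df: DataFrame, column: String): DataFrame =
    df.agg(expr(s"percentile_approx($column, 0.5)").as("approx_median"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("percentile-approx-agg") // assumed app name
      .master("local[*]")               // assumption: local test run
      // .enableHiveSupport()           // assumption: may be required on Spark 2.0.x
      .getOrCreate()
    import spark.implicits._

    // Made-up sample values under the column name used in the question.
    val df = Seq(1.0, 8.0, 3.0).toDF("Cantidadcon0234514")
    approxMedian(df, "Cantidadcon0234514").show()
    spark.stop()
  }
}
```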



On Wed, Jun 14, 2017 at 5:04 PM, Riccardo Ferrari <fe...@gmail.com>
wrote:

> Hi Andres,
>
> I can't find the reference, but the last time I searched I found that
> 'percentile_approx' is only available via the Hive context. You should register
> a temp table and use it from there.
>
> Best,


-- 
---
Takeshi Yamamuro

Re: UDF percentile_approx

Posted by Riccardo Ferrari <fe...@gmail.com>.

Hi Andres,

I can't find the reference, but the last time I searched I found that
'percentile_approx' is only available via the Hive context. You should register
a temp table and use it from there.

Best,
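
The temp-table route suggested above could be sketched as follows. This is an illustration under assumptions rather than the poster's code: the object name, the view name "measurements" and the sample data are hypothetical, createOrReplaceTempView is used as the Spark 2.x counterpart of registering a temp table, and whether the query resolves without Hive support depends on the Spark version in use.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical object name, used only for this illustration.
object TempViewSketch {

  // Registers the DataFrame as a temporary view and queries it with SQL.
  // Expects the DataFrame to have a numeric column named "a".
  def approxMedianViaSql(spark: SparkSession, df: DataFrame): DataFrame = {
    df.createOrReplaceTempView("measurements") // hypothetical view name
    spark.sql("SELECT percentile_approx(a, 0.5) AS approx_median FROM measurements")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("percentile-approx-temp-view") // assumed app name
      .master("local[*]")                     // assumption: local test run
      // .enableHiveSupport()                 // assumption: may be required on Spark 2.0.x
      .getOrCreate()
    import spark.implicits._

    val df = Seq(1.0, 8.0, 3.0).toDF("a") // made-up sample values
    approxMedianViaSql(spark, df).show()
    spark.stop()
  }
}
```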
