Posted to user@spark.apache.org by unk1102 <um...@gmail.com> on 2015/10/02 13:25:26 UTC

How to use registered Hive UDF in Spark DataFrame?

Hi, I have registered my Hive UDF using the following code:

hiveContext.udf().register("MyUDF", new UDF1<String, String>() {
    public String call(String o) throws Exception {
        // bla bla
        return o;
    }
}, DataTypes.StringType);

Now I want to use the above MyUDF on a DataFrame. How do we use it? I know how
to use it in SQL, and that works fine:

hiveContext.sql("select MyUDF('test') from myTable");

My hiveContext.sql() query involves a group by on multiple columns, so for
scaling purposes I am trying to convert this query to the DataFrame API:

dataframe.select("col1","col2","coln").groupBy("col1","col2","coln").count();

Can we do the following: dataframe.select(MyUDF("col1"))? Please guide.
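Putting the two pieces together, a minimal sketch of registering the UDF and then calling it through the DataFrame API via callUDF might look like this (assuming the Spark 1.5-era Java API; the table and column names are illustrative):

```java
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.sql.types.DataTypes;

public class MyUDFSketch {
    static DataFrame run(HiveContext hiveContext) {
        // Register once; the UDF is then visible to both SQL and callUDF.
        hiveContext.udf().register("MyUDF", new UDF1<String, String>() {
            public String call(String s) throws Exception {
                return s; // placeholder for the real logic
            }
        }, DataTypes.StringType);

        // Equivalent of: select MyUDF(col1) from myTable
        DataFrame df = hiveContext.table("myTable");
        return df.select(callUDF("MyUDF", col("col1")));
    }
}
```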



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-use-registered-Hive-UDF-in-Spark-DataFrame-tp24907.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: How to use registered Hive UDF in Spark DataFrame?

Posted by Umesh Kacha <um...@gmail.com>.
Hi, I tried to use callUDF in the following way, but it throws an exception
saying it can't recognize MyUDF even though I registered it:

List<Column> colList = new ArrayList<Column>();
colList.add(col("myColumn").as("modifiedColumn"));
// I need to do this because the following call won't accept just one col();
// it needs a Seq<Column>
Seq<Column> colSeq = JavaConversions.asScalaBuffer(colList);
DataFrame resultFrame =
    sourceFrame.select(callUDF("MyUDF").toString(), colSeq);

The above call fails, saying it can't resolve 'MyUDF myColumn as
modifiedColumn' among the given columns.
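For comparison, a sketch of the form the select overloads expect: the input column goes inside callUDF and the alias goes on the resulting Column, so no Seq<Column> is needed (assuming Spark 1.5+, where select is a varargs method callable from Java; the column names are illustrative):

```java
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.DataFrame;

public class SelectWithUDF {
    static DataFrame run(DataFrame sourceFrame) {
        // Pass the input column to callUDF and alias the result;
        // select(Column...) accepts the Column directly.
        return sourceFrame.select(
                callUDF("MyUDF", col("myColumn")).as("modifiedColumn"));
    }
}
```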

On Sat, Oct 3, 2015 at 2:36 AM, Michael Armbrust <mi...@databricks.com>
wrote:

> callUDF("MyUDF", col("col1")).as("name")
>
> or
>
> callUDF("MyUDF", col("col1")).alias("name")

Re: How to use registered Hive UDF in Spark DataFrame?

Posted by Michael Armbrust <mi...@databricks.com>.
callUDF("MyUDF", col("col1")).as("name")

or

callUDF("MyUDF", col("col1")).alias("name")
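In context, the aliased call combined with the multi-column group-by from the original question might be sketched like this (column names illustrative; assumes Spark 1.5+):

```java
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.DataFrame;

public class AliasAndGroupBy {
    static DataFrame run(DataFrame df) {
        // Equivalent of:
        //   select MyUDF(col1) as mytest, col2 from myTable
        //   group by mytest, col2
        return df.select(callUDF("MyUDF", col("col1")).as("mytest"),
                         col("col2"))
                 .groupBy("mytest", "col2")
                 .count();
    }
}
```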

On Fri, Oct 2, 2015 at 3:29 PM, Umesh Kacha <um...@gmail.com> wrote:

> Hi Michael,
>
> Thanks much. How do we give alias name for resultant columns? For e.g.
> when using
>
> hiveContext.sql("select MyUDF('test') as mytest from myTable");
>
> how do we do that in DataFrame callUDF
>
> callUDF("MyUDF", col("col1"))???

Re: How to use registered Hive UDF in Spark DataFrame?

Posted by Umesh Kacha <um...@gmail.com>.
Hi Michael,

Thanks very much. How do we give an alias to the resultant columns? For
example, when using

hiveContext.sql("select MyUDF('test') as mytest from myTable");

how do we do that with callUDF on a DataFrame?

callUDF("MyUDF", col("col1"))?

On Fri, Oct 2, 2015 at 8:23 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> import static org.apache.spark.sql.functions.*;
>
> callUDF("MyUDF", col("col1"), col("col2"))

Re: How to use registered Hive UDF in Spark DataFrame?

Posted by Michael Armbrust <mi...@databricks.com>.
import static org.apache.spark.sql.functions.*;

callUDF("MyUDF", col("col1"), col("col2"))

On Fri, Oct 2, 2015 at 6:25 AM, unk1102 <um...@gmail.com> wrote:
