Posted to user@spark.apache.org by Everett Anderson <ev...@nuna.com.INVALID> on 2016/07/21 17:10:40 UTC

Programmatic use of UDFs from Java

Hi,

In the Java Spark DataFrames API, you can create a UDF, register it, and
then call it by its string name, using the convenience UDF classes in
org.apache.spark.sql.api.java
<https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/api/java/package-summary.html>.

Example

UDF1<String, Long> testUdf1 = new UDF1<String, Long>() { ... };

sqlContext.udf().register("testfn", testUdf1, DataTypes.LongType);

DataFrame df2 = df.withColumn("new_col", functions.callUDF("testfn",
    df.col("old_col")));

However, I'd like to avoid registering these by name, if possible, since I
have many of them and would need to deal with name conflicts.
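
One possible workaround for the name-conflict concern, sketched here only as an illustration, is to stop choosing names by hand. A small helper generates a unique name for each UDF instance, performs the registration through a callback, and returns the name for later use with callUDF. The class UniqueUdfNames, its register method, and the "prefix_N" naming scheme are hypothetical and not part of Spark. The generated names are only unique per helper instance, so this assumes one helper per SQLContext.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper (not part of Spark): assigns each UDF object a generated,
// prefixed name the first time it is seen and hands (name, udf) to a
// caller-supplied registration callback, so no UDF name is written by hand.
public class UniqueUdfNames {
    private final String prefix;
    private long counter = 0;
    // Identity map: two distinct UDF instances always get distinct names,
    // even if their classes override equals().
    private final Map<Object, String> assigned = new IdentityHashMap<>();

    public UniqueUdfNames(String prefix) {
        this.prefix = prefix;
    }

    // Returns the name already assigned to this instance; otherwise generates a
    // new name, invokes the registrar exactly once with it, and remembers it.
    public synchronized <T> String register(T udf, BiConsumer<String, ? super T> registrar) {
        String existing = assigned.get(udf);
        if (existing != null) {
            return existing;
        }
        String name = prefix + "_" + (++counter);
        registrar.accept(name, udf);
        assigned.put(udf, name);
        return name;
    }
}
```

With Spark, the callback would presumably look something like (n, f) -> sqlContext.udf().register(n, f, DataTypes.LongType), and the returned name is what gets passed to functions.callUDF(name, df.col("old_col")). This still registers by name under the hood. It only removes the need to pick and coordinate names manually.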

There are also udf() methods, apparently from the Scala API, that don't
require registering everything by name first; for example:
<https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#udf(scala.Function1,%20scala.reflect.api.TypeTags.TypeTag,%20scala.reflect.api.TypeTags.TypeTag)>

However, using those methods from Java would require interacting with
Scala's scala.reflect.api.TypeTags.TypeTag. I'm having a hard time figuring
out how to create a TypeTag from Java.
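
Some background on the TypeTag requirement: the Scala udf() variants take TypeTags so that Spark can derive the argument and return types from the function's static types. Java generic type arguments, by contrast, are erased and usually cannot be recovered at run time, which is why the Java register() call asks for DataTypes.LongType explicitly. The stdlib-only sketch below illustrates this. ErasureDemo is a hypothetical name, and java.util.function.Function stands in for a UDF interface. It shows the one common exception: an anonymous class records its parameterized superinterface in the class file, while a lambda does not. It does not produce a TypeTag.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustration only: which generic type arguments survive erasure at run time.
public class ErasureDemo {

    // Returns the type arguments of the first parameterized interface that
    // fn's class declares, e.g. [java.lang.String, java.lang.Long], or an
    // empty list if only the raw interface is visible at run time.
    public static List<String> declaredTypeArgs(Object fn) {
        List<String> names = new ArrayList<>();
        for (Type iface : fn.getClass().getGenericInterfaces()) {
            if (iface instanceof ParameterizedType) {
                for (Type arg : ((ParameterizedType) iface).getActualTypeArguments()) {
                    names.add(arg.getTypeName());
                }
                break;
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Anonymous class: the compiler records Function<String, Long> in its signature.
        Function<String, Long> anon = new Function<String, Long>() {
            @Override
            public Long apply(String s) {
                return (long) s.length();
            }
        };
        // Lambda: the generated class implements only the raw Function interface.
        Function<String, Long> lambda = s -> (long) s.length();

        System.out.println(declaredTypeArgs(anon));   // prints [java.lang.String, java.lang.Long]
        System.out.println(declaredTypeArgs(lambda)); // prints [] on OpenJDK
    }
}
```

Because even this trick fails for lambdas, an explicit return type such as DataTypes.LongType is the more robust contract for a Java-facing API.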

Does anyone have an example of using the udf() methods from Java?

Thanks!

- Everett

Re: Programmatic use of UDFs from Java

Posted by Everett Anderson <ev...@nuna.com.INVALID>.
Thanks for the pointer, Bryan! Sounds like I was on the right track in
terms of what's available for now.

(And Gourav -- I'm certainly interested in migrating to Scala, but our team
is mostly Java, Python, and R based right now!)


(In reply to Bryan Cutler <cu...@gmail.com>, Thu, Jul 21, 2016 at 11:00 PM.)

Re: Programmatic use of UDFs from Java

Posted by Bryan Cutler <cu...@gmail.com>.
Everett, I had the same question today and came across this old thread.
Not sure if there has been any more recent work to support this.
http://apache-spark-developers-list.1001551.n3.nabble.com/Using-UDFs-in-Java-without-registration-td12497.html


(In reply to Everett Anderson <everett@nuna.com.invalid>, Thu, Jul 21, 2016 at 10:10 AM.)

Re: Programmatic use of UDFs from Java

Posted by Gourav Sengupta <go...@gmail.com>.
JAVA seriously?????

(In reply to Everett Anderson <ev...@nuna.com.invalid>, Thu, Jul 21, 2016 at 6:10 PM.)
