Posted to user@spark.apache.org by raghukiran <ra...@gmail.com> on 2016/01/13 20:58:30 UTC

SQL UDF problem (with re to types)

While registering and using SQL UDFs, I am running into the following
problem:

UDF registered:

// Register a UDF that takes a Double and returns a String.
ctx.udf().register("Test", new UDF1<Double, String>() {
    private static final long serialVersionUID = -8231917155671435931L;

    @Override
    public String call(Double x) throws Exception {
        return "testing";
    }
}, DataTypes.StringType);

Usage:

query = "SELECT Test(82.4)";
result = sqlCtx.sql(query).first();
System.out.println(result.toString());

Problem: a ClassCastException is thrown:

Caused by: java.lang.ClassCastException: java.math.BigDecimal cannot be cast
to java.lang.Double

This problem occurs with Spark v1.5.2 and 1.6.0.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SQL-UDF-problem-with-re-to-types-tp25968.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: SQL UDF problem (with re to types)

Posted by Ted Yu <yu...@gmail.com>.
While reading a book on Java 8, I saw a reference to the following
w.r.t. declaration-site variance:

https://bugs.openjdk.java.net/browse/JDK-8043488

The above reportedly targets Java 9.

FYI

On Thu, Jan 14, 2016 at 12:33 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> I don't believe that Java 8 got rid of erasure. In fact I think it's
> actually worse when you use Java 8 lambdas.

Re: SQL UDF problem (with re to types)

Posted by Michael Armbrust <mi...@databricks.com>.
I don't believe that Java 8 got rid of erasure. In fact I think it's
actually worse when you use Java 8 lambdas.
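
For illustration, a minimal sketch of the lambda form (assuming the same
ctx and Test example from this thread); unlike an anonymous inner class,
a lambda records no generic interface information in the class file, so
the parameter type cannot be recovered by reflection at runtime:

// Minimal sketch, assuming the ctx and Test example from this thread.
// The cast satisfies javac only; at runtime the lambda exposes no
// generic type information, so Spark still hands call() whatever
// object the analyzer produced (here, a BigDecimal).
ctx.udf().register("Test",
        (UDF1<Double, String>) x -> "testing",
        DataTypes.StringType);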

On Thu, Jan 14, 2016 at 10:54 AM, Raghu Ganti <ra...@gmail.com> wrote:

> Would this go away if the Spark source were compiled against Java 1.8
> (since the problem of type erasure is solved through proper generics
> implementation in Java 1.8)?

Re: SQL UDF problem (with re to types)

Posted by Raghu Ganti <ra...@gmail.com>.
Would this go away if the Spark source were compiled against Java 1.8
(since the problem of type erasure is solved through proper generics
implementation in Java 1.8)?

On Thu, Jan 14, 2016 at 1:42 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> We automatically convert types for UDFs defined in Scala, but we can't do
> it in Java because the types are erased by the compiler. If you want to
> use double, you should cast before calling the UDF.

Re: SQL UDF problem (with re to types)

Posted by Michael Armbrust <mi...@databricks.com>.
We automatically convert types for UDFs defined in Scala, but we can't do
it in Java because the types are erased by the compiler. If you want to
use double, you should cast before calling the UDF.
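
A minimal sketch of that workaround, reusing the Test UDF and sqlCtx from
the example quoted below; the explicit CAST makes the argument DoubleType
before the UDF sees it:

// Minimal sketch, assuming the Test UDF and sqlCtx from this thread.
// Spark parses 82.4 as a decimal literal, so cast it to DOUBLE for a
// UDF1<Double, String> to receive a java.lang.Double.
String query = "SELECT Test(CAST(82.4 AS DOUBLE))";
Row result = sqlCtx.sql(query).first();
System.out.println(result.toString());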

On Wed, Jan 13, 2016 at 8:10 PM, Raghu Ganti <ra...@gmail.com> wrote:

> So, when I try BigDecimal, it works. But shouldn't it parse based on
> what the UDF defines? Am I missing something here?

Re: SQL UDF problem (with re to types)

Posted by Raghu Ganti <ra...@gmail.com>.
So, when I try BigDecimal, it works. But shouldn't it parse based on
what the UDF defines? Am I missing something here?

On Wed, Jan 13, 2016 at 4:57 PM, Ted Yu <yu...@gmail.com> wrote:

> Please take a look
> at sql/hive/src/test/java/org/apache/spark/sql/hive/aggregate/MyDoubleSum.java,
> which shows a UserDefinedAggregateFunction that works on a DoubleType column.
>
> sql/hive/src/test/java/org/apache/spark/sql/hive/JavaDataFrameSuite.java
> shows how it is registered.

Re: SQL UDF problem (with re to types)

Posted by Ted Yu <yu...@gmail.com>.
Please take a look
at sql/hive/src/test/java/org/apache/spark/sql/hive/aggregate/MyDoubleSum.java,
which shows a UserDefinedAggregateFunction that works on a DoubleType column.

sql/hive/src/test/java/org/apache/spark/sql/hive/JavaDataFrameSuite.java
shows how it is registered.
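
A minimal sketch of the registration pattern those tests use (the
function name "mydoublesum" and table name "testData" are assumed here,
not taken from the suite):

// Minimal sketch, assuming MyDoubleSum from the Spark test sources is
// on the classpath. A UserDefinedAggregateFunction declares its input
// schema explicitly, so no generic type information is needed at runtime.
sqlContext.udf().register("mydoublesum", new MyDoubleSum());
sqlContext.sql("SELECT mydoublesum(value) FROM testData").show();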

Cheers

On Wed, Jan 13, 2016 at 11:58 AM, raghukiran <ra...@gmail.com> wrote:

> While registering and using SQL UDFs, I am running into the following
> problem: a ClassCastException is thrown:
>
> Caused by: java.lang.ClassCastException: java.math.BigDecimal cannot be
> cast to java.lang.Double
>
> This problem occurs with Spark v1.5.2 and 1.6.0.

Re: SQL UDF problem (with re to types)

Posted by Ted Yu <yu...@gmail.com>.
Looks like a BigDecimal was passed to your call() method.

Can you modify your UDF to see if using BigDecimal works?
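
A minimal sketch of that change, keeping your registration but switching
the type parameter to BigDecimal, which matches the DecimalType Spark
infers for the literal 82.4:

// Minimal sketch: the same UDF declared over java.math.BigDecimal,
// matching the DecimalType Spark infers for the literal 82.4.
ctx.udf().register("Test", new UDF1<java.math.BigDecimal, String>() {
    private static final long serialVersionUID = 1L;

    @Override
    public String call(java.math.BigDecimal x) throws Exception {
        return "testing";
    }
}, DataTypes.StringType);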

Cheers

On Wed, Jan 13, 2016 at 11:58 AM, raghukiran <ra...@gmail.com> wrote:

> While registering and using SQL UDFs, I am running into the following
> problem: a ClassCastException is thrown:
>
> Caused by: java.lang.ClassCastException: java.math.BigDecimal cannot be
> cast to java.lang.Double
>
> This problem occurs with Spark v1.5.2 and 1.6.0.