Posted to user@spark.apache.org by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/02/13 18:55:36 UTC

using udf to convert Oracle number column in Data Frame

Hi,

 

 

Unfortunately, Oracle table columns defined as NUMBER result in overflow when loaded into a Spark DataFrame.

 

An alternative seems to be to create a UDF that maps the column to Double:

 

val toDouble = udf((d: java.math.BigDecimal) => d.toString.toDouble)

 

 

This is the DataFrame I have defined to fetch one column from the Oracle table:

 

  val s = sqlContext.load("jdbc",
    Map("url" -> "jdbc:oracle:thin:@rhes564:1521:mydb",
        "dbtable" -> "(select PROD_ID from sh.sales)",
        "user" -> "sh",
        "password" -> "xxxxx"))

 

This works as expected:

 

scala> s.count

res13: Long = 918843

 

Now the question is: how do I apply the toDouble UDF to column PROD_ID? Do
I need to create a temporary table?

 

 

Thanks

 

Mich Talebzadeh

 

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Technology
Ltd, its subsidiaries or their employees, unless expressly so stated. It is
the responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Technology Ltd, its subsidiaries nor their
employees accept any responsibility.

RE: using udf to convert Oracle number column in Data Frame

Posted by Mich Talebzadeh <mi...@peridale.co.uk>.

Hi Ted,

Thanks for this. In my experience, built-in functions, where they exist, are always faster and more efficient than UDFs. For example, a UDF I wrote to compute standard deviation in Oracle (I needed this one for Oracle TimesTen IMDB) turned out not to be as quick as Oracle's own STDDEV() function.

 

Unfortunately, all columns defined as NUMBER, NUMBER(10,2), etc. cause overflow in Spark. However, they map fine in Hive using BigInt or NUMERIC(10,2).

 

So basically, in the JDBC query I used Oracle's TO_CHAR() function to convert these columns into strings, and it seems to be OK, as TO_CHAR() is a built-in Oracle function and not a UDF.
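A minimal sketch of the TO_CHAR() approach described above, assuming the same connection details as the original post and the Spark 1.x sqlContext.load call used there. The object name OracleToChar and the helpers load and withDouble are hypothetical, introduced only for illustration; the exact query used in the thread is not shown, so the subquery here is an assumption.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

object OracleToChar {
  // Subquery pushed down to Oracle. TO_CHAR runs inside the database,
  // so Spark receives PROD_ID as a string instead of an Oracle NUMBER.
  val query = "(SELECT TO_CHAR(PROD_ID) AS PROD_ID FROM sh.sales)"

  // Same load call as in the original post (deprecated in later 1.x
  // releases in favour of sqlContext.read.format("jdbc")).
  def load(sqlContext: SQLContext, password: String): DataFrame =
    sqlContext.load("jdbc", Map(
      "url" -> "jdbc:oracle:thin:@rhes564:1521:mydb",
      "dbtable" -> query,
      "user" -> "sh",
      "password" -> password))

  // Optional: turn the string column back into a numeric one on the
  // Spark side with the built-in cast rather than a UDF.
  def withDouble(df: DataFrame): DataFrame =
    df.select(df("PROD_ID").cast("double").as("PROD_ID"))
}
```

Calling OracleToChar.load(sqlContext, "xxxxx") in the shell should give a DataFrame with a single string column; whether to cast it back to a number afterwards depends on how the values are used downstream.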

 

 

Thanks again

 

 

Dr Mich Talebzadeh

 


 

 

From: Ted Yu [mailto:yuzhihong@gmail.com] 
Sent: 13 February 2016 18:36
To: Mich Talebzadeh <mi...@peridale.co.uk>
Cc: user <us...@spark.apache.org>
Subject: Re: using udf to convert Oracle number column in Data Frame

 

Please take a look at sql/core/src/main/scala/org/apache/spark/sql/functions.scala :

  def udf(f: AnyRef, dataType: DataType): UserDefinedFunction = {
    UserDefinedFunction(f, dataType, None)

And sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala :

  test("udf") {
    val foo = udf((a: Int, b: String) => a.toString + b)

    checkAnswer(
      // SELECT *, foo(key, value) FROM testData
      testData.select($"*", foo('key, 'value)).limit(3),

Cheers

 


Re: using udf to convert Oracle number column in Data Frame

Posted by Ted Yu <yu...@gmail.com>.

Please take a look at sql/core/src/main/scala/org/apache/spark/sql/functions.scala :

  def udf(f: AnyRef, dataType: DataType): UserDefinedFunction = {
    UserDefinedFunction(f, dataType, None)

And sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala :

  test("udf") {
    val foo = udf((a: Int, b: String) => a.toString + b)

    checkAnswer(
      // SELECT *, foo(key, value) FROM testData
      testData.select($"*", foo('key, 'value)).limit(3),

Cheers
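Following the pattern in the test above, a UDF is applied directly to a column inside select, so no temporary table should be needed for the DataFrame API. The sketch below applies the toDouble UDF from the original question to PROD_ID. The names ProdIdToDouble, bigDecimalToDouble and convert are hypothetical, and the thread does not confirm whether converting after the load avoids the overflow.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.udf

object ProdIdToDouble {
  // Plain conversion function from the original question, kept separate
  // so it can be checked without a Spark context. It does not handle
  // NULLs: a null BigDecimal would throw a NullPointerException.
  val bigDecimalToDouble: java.math.BigDecimal => Double =
    d => d.toString.toDouble

  // Wrap the function as a Spark SQL UDF.
  val toDouble = udf(bigDecimalToDouble)

  // Apply the UDF to PROD_ID, keeping the column name.
  def convert(s: DataFrame): DataFrame =
    s.select(toDouble(s("PROD_ID")).as("PROD_ID"))
}
```

In the shell this would be used as ProdIdToDouble.convert(s), where s is the DataFrame loaded from Oracle. A temporary table only comes into play when calling the function from SQL text, in which case the function would be registered with sqlContext.udf.register and the DataFrame with registerTempTable.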
