Posted to dev@spark.apache.org by "Shkurenko, Alex" <as...@enova.com> on 2015/08/14 19:30:00 UTC

SparkR DataFrame fails to return data of Decimal type

Got an issue similar to https://issues.apache.org/jira/browse/SPARK-8897,
but with the Decimal data type coming from a Postgres DB:

// Set up SparkR

>Sys.setenv(SPARK_HOME="/Users/ashkurenko/work/git_repos/spark")
>Sys.setenv(SPARKR_SUBMIT_ARGS="--driver-class-path ~/Downloads/postgresql-9.4-1201.jdbc4.jar sparkr-shell")
>.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
>library(SparkR)
>sc <- sparkR.init(master="local")

// Connect to a Postgres DB via JDBC
>sqlContext <- sparkRSQL.init(sc)
>sql(sqlContext, "
    CREATE TEMPORARY TABLE mytable
    USING org.apache.spark.sql.jdbc
    OPTIONS (url 'jdbc:postgresql://servername:5432/dbname'
    ,dbtable 'mydbtable'
)
")

// Try pulling a Decimal column from a table
>myDataFrame <- sql(sqlContext, "select a_decimal_column from mytable")

// The schema shows up fine

>show(myDataFrame)

DataFrame[a_decimal_column:decimal(10,0)]

>schema(myDataFrame)

StructType
|-name = "a_decimal_column", type = "DecimalType(10,0)", nullable = TRUE

// ... but pulling data fails:

>localDF <- collect(myDataFrame)

Error in as.data.frame.default(x[[i]], optional = TRUE) :
  cannot coerce class ""jobj"" to a data.frame
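
As far as I can tell, this happens because SerDe.scala has no case for
java.math.BigDecimal, so the value falls through to the default branch and is
shipped to R as an opaque "jobj" reference, which as.data.frame() cannot
coerce. Until a fix lands, one workaround is to cast the column to double on
the JVM side so it serializes as a plain double (table and column names are
the placeholders from above):

// Workaround: cast the decimal to double before collecting
>castedDF <- sql(sqlContext, "select cast(a_decimal_column as double) as a_decimal_column from mytable")
>localDF <- collect(castedDF)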


-------
Proposed fix:

diff --git a/core/src/main/scala/org/apache/spark/api/r/SerDe.scala b/core/src/main/scala/org/apache/spark/api/r/SerDe.scala
index d5b4260..b77ae2a 100644
--- a/core/src/main/scala/org/apache/spark/api/r/SerDe.scala
+++ b/core/src/main/scala/org/apache/spark/api/r/SerDe.scala
@@ -219,6 +219,9 @@ private[spark] object SerDe {
         case "float" | "java.lang.Float" =>
           writeType(dos, "double")
           writeDouble(dos, value.asInstanceOf[Float].toDouble)
+        case "decimal" | "java.math.BigDecimal" =>
+           writeType(dos, "double")
+           writeDouble(dos,
scala.math.BigDecimal(value.asInstanceOf[java.math.BigDecimal]).toDouble)
         case "double" | "java.lang.Double" =>
           writeType(dos, "double")
           writeDouble(dos, value.asInstanceOf[Double])
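
Note the patch maps decimals onto R doubles, so decimals wider than roughly
15 significant digits would lose precision; for a decimal(10,0) column every
value fits exactly. With a patched build (local rebuild assumed), the same
collect should come back as a plain numeric column:

>localDF <- collect(myDataFrame)
>class(localDF$a_decimal_column)
[1] "numeric"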

Thanks,
Alex

Re: SparkR DataFrame fails to return data of Decimal type

Posted by "Shkurenko, Alex" <as...@enova.com>.
Created https://issues.apache.org/jira/browse/SPARK-9982; working on the PR.

On Fri, Aug 14, 2015 at 12:43 PM, Shivaram Venkataraman <shivaram@eecs.berkeley.edu> wrote:

> Thanks for the catch. Could you send a PR with this diff?

Re: SparkR DataFrame fails to return data of Decimal type

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
Thanks for the catch. Could you send a PR with this diff?

