Posted to user@spark.apache.org by Justin Pihony <ju...@gmail.com> on 2015/03/18 16:20:48 UTC

Did DataFrames break basic SQLContext?

I started to play with 1.3.0 and found that there are a lot of breaking
changes. Previously, I could do the following:

    case class Foo(x: Int)
    val rdd = sc.parallelize(List(Foo(1)))
    import sqlContext._
    rdd.registerTempTable("foo")

Now, I am not able to directly use my RDD object and have it implicitly
become a DataFrame. Instead, it is implicitly converted to a
DataFrameHolder, from which I can write:

    rdd.toDF.registerTempTable("foo")
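
For reference, the complete sequence looks like this (a sketch assuming a
stock 1.3.0 spark-shell, where sc and sqlContext come predefined; the final
query is just illustrative):

    case class Foo(x: Int)
    val rdd = sc.parallelize(List(Foo(1)))
    import sqlContext.implicits._         // brings the RDD -> DataFrameHolder implicit into scope
    rdd.toDF.registerTempTable("foo")     // explicit toDF, then register as before
    sqlContext.sql("SELECT x FROM foo").show()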

But that is kind of a pain in comparison. The other problem for me is that
I keep getting a SQLException:

    java.sql.SQLException: Failed to start database 'metastore_db' with
    class loader sun.misc.Launcher$AppClassLoader@10393e97, see the next
    exception for details.

This seems to introduce a dependency on Hive, where previously (1.2.0)
there was no such dependency. I can open tickets for these, but wanted to
ask here first... maybe I am doing something wrong?

Thanks,
Justin





Re: Did DataFrames break basic SQLContext?

Posted by Justin Pihony <ju...@gmail.com>.
It appears that the metastore_db problem is related to
https://issues.apache.org/jira/browse/SPARK-4758. I had another shell open
that was stuck. This is probably a bug, though?

    import sqlContext.implicits._
    case class Foo(x: Int)
    val rdd = sc.parallelize(List(Foo(1)))
    rdd.toDF

results in a frozen shell after this line:

    INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on
    mysql: Lexical error at line 1, column 5.  Encountered: "@" (64),
    after : "".

which locks the internally created metastore_db.



Re: Did DataFrames break basic SQLContext?

Posted by Nick Pentreath <ni...@gmail.com>.
To answer your first question: yes, 1.3.0 did break backward compatibility
with the change from SchemaRDD -> DataFrame. Spark SQL was an alpha
component, so API-breaking changes could happen. It is no longer an alpha
component as of 1.3.0, so this will not be the case in the future.

Adding toDF should hopefully not be too much of an effort.
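
(A before/after sketch of the mechanical change, assuming the usual
spark-shell bindings of sc and sqlContext and the rdd from the original
example:)

    // 1.2.x: importing sqlContext._ let the RDD register directly
    //   import sqlContext._
    //   rdd.registerTempTable("foo")

    // 1.3.0: import the implicits object and convert explicitly
    import sqlContext.implicits._
    rdd.toDF.registerTempTable("foo")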

For the second point - I have also seen these exceptions when upgrading jobs to 1.3.0, but they don't fail my jobs. I'm not sure what the cause is; it would be good to understand this.

—
Sent from Mailbox


Re: Did DataFrames break basic SQLContext?

Posted by Michael Armbrust <mi...@databricks.com>.
>
> Now, I am not able to directly use my RDD object and have it implicitly
> become a DataFrame. Instead, it is implicitly converted to a
> DataFrameHolder, from which I can write:
>
>     rdd.toDF.registerTempTable("foo")
>

The rationale here was that we added a lot of methods to DataFrame and made
the implicits more powerful, but that increased the likelihood of
accidental application of the implicit.  I personally have had to explain
the accidental application of implicits (and the confusing compiler
messages that can result) to beginners so many times that we decided to
remove the subtle conversion from RDD to DataFrame and instead make it an
explicit method call.
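
To illustrate the pattern, here is a simplified, self-contained sketch. The
names FakeDataFrame, FakeDataFrameHolder, and seqToHolder are stand-ins for
the real Spark classes, which carry much more machinery:

    import scala.language.implicitConversions

    case class Foo(x: Int)

    // Stand-in for DataFrame, which has many methods that we do not
    // want applied by accident through an implicit conversion.
    case class FakeDataFrame(rows: Seq[Any])

    // The implicit conversion targets only this thin holder, which
    // exposes a single method; getting at the DataFrame's full API
    // still requires the explicit toDF call.
    class FakeDataFrameHolder(rows: Seq[Any]) {
      def toDF: FakeDataFrame = FakeDataFrame(rows)
    }

    implicit def seqToHolder(rows: Seq[Any]): FakeDataFrameHolder =
      new FakeDataFrameHolder(rows)

    // The implicit fires quietly, but the conversion itself stays explicit:
    val df: FakeDataFrame = Seq(Foo(1)).toDF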