Posted to issues@spark.apache.org by "Tor Myklebust (JIRA)" <ji...@apache.org> on 2015/01/29 17:38:34 UTC

[jira] [Comment Edited] (SPARK-5472) Add support for reading from and writing to a JDBC database

    [ https://issues.apache.org/jira/browse/SPARK-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297105#comment-14297105 ] 

Tor Myklebust edited comment on SPARK-5472 at 1/29/15 4:37 PM:
---------------------------------------------------------------

Not sure what you mean by "essentially" here.  JdbcRDD certainly lets you pull information out of a database, and from there you can munge the results into whatever form you need.  Part of the point here is to eliminate, or at least drastically reduce, the need for that manual munging.

JdbcRDD gives you an RDD of Array[Object] or, if you supply a function that maps ResultSet rows to objects of a class you choose, an RDD of that class.  It doesn't natively produce Spark SQL DataFrames.  To get a DataFrame, you need an RDD of Row objects plus their schema; much of the work in this issue is mapping between the external database's types and Spark SQL's types.
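
For concreteness, a minimal sketch of that manual workflow, assuming a 1.3-style createDataFrame API, a SparkContext {{sc}}, an SQLContext {{sqlContext}}, and a made-up Postgres table people(id int, name text); the connection URL and bounds are placeholders:

{code:scala}
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.rdd.JdbcRDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// JdbcRDD runs one bounded query per partition; the two '?' placeholders
// are filled with each partition's slice of [lowerBound, upperBound].
val peopleRdd = new JdbcRDD(
  sc,
  () => DriverManager.getConnection("jdbc:postgresql://dbhost/dbname"),
  "SELECT id, name FROM people WHERE id >= ? AND id <= ?",
  lowerBound = 1L, upperBound = 1000000L, numPartitions = 10,
  mapRow = (rs: ResultSet) => (rs.getInt("id"), rs.getString("name")))

// Getting a DataFrame means hand-writing both the Row mapping and the
// schema, i.e. redoing the database-to-Spark-SQL type mapping yourself.
val schema = StructType(Seq(
  StructField("id", IntegerType),
  StructField("name", StringType)))
val people = sqlContext.createDataFrame(
  peopleRdd.map { case (id, name) => Row(id, name) }, schema)
{code}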

JdbcRDD also doesn't expose itself as a Spark SQL data source; you can't say "CREATE TABLE foo USING something" with some options and get a table named foo that really lives in an external database.
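
To illustrate the gap, here is roughly what the data-source route could look like once this issue is done; the provider name and option keys below are placeholders, not a settled design:

{code:scala}
// Hypothetical: register an external Postgres table as a Spark SQL table.
sqlContext.sql("""
  CREATE TEMPORARY TABLE foo
  USING org.apache.spark.sql.jdbc
  OPTIONS (
    url "jdbc:postgresql://dbhost/dbname",
    dbtable "public.foo"
  )""")

// Queries against foo would then be answered from the external database.
sqlContext.sql("SELECT count(*) FROM foo").collect()
{code}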


was (Author: tmyklebu):
Not sure what you mean by "essentially" here.

JdbcRDD gives you an RDD of Array[Object] or, if you supply a function that maps ResultSet rows to objects of a class you choose, an RDD of that class.  It doesn't natively produce Spark SQL DataFrames.  To get a DataFrame, you need an RDD of Row objects plus their schema; much of the work in this issue is mapping between the external database's types and Spark SQL's types.

JdbcRDD also doesn't expose itself as a Spark SQL data source; you can't say "CREATE TABLE foo USING something" with some options and get a table named foo that really lives in an external database.

> Add support for reading from and writing to a JDBC database
> -----------------------------------------------------------
>
>                 Key: SPARK-5472
>                 URL: https://issues.apache.org/jira/browse/SPARK-5472
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Tor Myklebust
>            Priority: Minor
>
> It would be nice to be able to make a table in a JDBC database appear as a table in Spark SQL.  This would let users, for instance, perform a JOIN between a DataFrame in Spark SQL and a table in a Postgres database.
> It might also be nice to be able to go the other direction and save a DataFrame to a database, for instance in an ETL job.
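
A rough sketch of the use case described in the issue, using illustrative load/save entry points (none of these names are committed API; "events" stands in for some table already registered in Spark SQL):

{code:scala}
import org.apache.spark.sql.SaveMode

// Hypothetical: expose a Postgres table as a DataFrame...
val users = sqlContext.load("jdbc", Map(
  "url"     -> "jdbc:postgresql://dbhost/dbname",
  "dbtable" -> "public.users"))

// ...and join it against a DataFrame already living in Spark.
val events = sqlContext.table("events")
val enriched = events.join(users, events("userId") === users("id"))

// Going the other direction: persist the result at the end of an ETL job.
enriched.save("jdbc", SaveMode.Append, Map(
  "url"     -> "jdbc:postgresql://dbhost/dbname",
  "dbtable" -> "public.enriched_events"))
{code}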



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org