Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2014/11/03 23:31:33 UTC

[jira] [Commented] (SPARK-2710) Build SchemaRDD from a JdbcRDD with MetaData (no hard-coded case class)

    [ https://issues.apache.org/jira/browse/SPARK-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195240#comment-14195240 ] 

Michael Armbrust commented on SPARK-2710:
-----------------------------------------

Now that it's been merged, it would be great if this feature could be implemented using the DataSource API.

> Build SchemaRDD from a JdbcRDD with MetaData (no hard-coded case class)
> -----------------------------------------------------------------------
>
>                 Key: SPARK-2710
>                 URL: https://issues.apache.org/jira/browse/SPARK-2710
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>            Reporter: Teng Qiu
>
> Spark SQL can take Parquet files or JSON files as a table directly (without giving a case class to define the schema).
> As a SQL component, it should also be able to take a ResultSet from an RDBMS easily.
> I found that there is a JdbcRDD in core: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala
> so I want to make a small change in this file to allow SQLContext to read the metadata from the PreparedStatement (reading the metadata does not require actually executing the query).
> Then, in Spark SQL, SQLContext can create a SchemaRDD from a JdbcRDD and its metadata.
> In the future, maybe we can add a feature to sql-shell, so that users can use spark-thrift-server to join tables from different sources,
> such as:
> {code}
> CREATE TABLE jdbc_tbl1 AS JDBC "connectionString" "username" "password" "initQuery" "bound" ...
> CREATE TABLE parquet_files AS PARQUET "hdfs://tmp/parquet_table/"
> SELECT parquet_files.colX, jdbc_tbl1.colY
>   FROM parquet_files
>   JOIN jdbc_tbl1
>     ON (parquet_files.id = jdbc_tbl1.id)
> {code}
> I think such a feature would be useful, similar to what Facebook's Presto engine offers.
> Oh, and there is a small bug in JdbcRDD:
> in compute(), the close() method has
> {code}
> if (null != conn && ! stmt.isClosed()) conn.close()
> {code}
> should be
> {code}
> if (null != conn && ! conn.isClosed()) conn.close()
> {code}
> Just a small typo :)
> but as written, this close() method will never actually close conn...
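The metadata-driven schema inference proposed above can be sketched roughly as follows. This is a hypothetical illustration, not the actual JdbcRDD or SchemaRDD code: the `Column` case class and the type mapping are invented for the sketch, and it assumes a JDBC driver that populates `PreparedStatement.getMetaData` before execution (most do, but it is driver-dependent).

```scala
import java.sql.{Connection, ResultSetMetaData, Types}

// Hypothetical sketch: derive column names and types from a
// PreparedStatement's metadata without executing the query.
object JdbcSchemaSketch {
  case class Column(name: String, sparkType: String, nullable: Boolean)

  def inferSchema(conn: Connection, query: String): Seq[Column] = {
    val stmt = conn.prepareStatement(query)
    try {
      // On most JDBC drivers, getMetaData describes the result set
      // of a prepared statement without running the query.
      val md: ResultSetMetaData = stmt.getMetaData
      (1 to md.getColumnCount).map { i =>
        val t = md.getColumnType(i) match {
          case Types.INTEGER                => "IntegerType"
          case Types.BIGINT                 => "LongType"
          case Types.DOUBLE | Types.FLOAT   => "DoubleType"
          case Types.VARCHAR | Types.CHAR   => "StringType"
          case _                            => "StringType" // illustrative fallback
        }
        Column(md.getColumnName(i), t,
          md.isNullable(i) != ResultSetMetaData.columnNoNulls)
      }
    } finally {
      // Note the guard pattern from the bug report: each resource's
      // own isClosed() must be checked before closing it.
      if (stmt != null && !stmt.isClosed()) stmt.close()
    }
  }
}
```

A SQLContext could then turn such a column list into a schema and pair it with the JdbcRDD's rows to build a SchemaRDD.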



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org