Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/10/22 20:27:27 UTC

[jira] [Assigned] (SPARK-11261) Provide a more flexible alternative to Jdbc RDD

     [ https://issues.apache.org/jira/browse/SPARK-11261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11261:
------------------------------------

    Assignee:     (was: Apache Spark)

> Provide a more flexible alternative to Jdbc RDD
> -----------------------------------------------
>
>                 Key: SPARK-11261
>                 URL: https://issues.apache.org/jira/browse/SPARK-11261
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Richard Marscher
>
> The existing JdbcRDD covers only a limited set of use cases because it requires the query to be expressed with inclusive lower- and upper-bound predicates, for example: "select title, author from books where ? <= id and id <= ?"
> However, many use cases cannot be expressed this way, or become much less efficient when forced into it.
> For example, we have a MySQL table partitioned on a partition key. We have no range values to look up; instead, we want all entries matching a predicate, with Spark running one query per Spark partition against each logical partition of the MySQL table, for example: "select * from devices where partition_id = ? and app_id = 'abcd'".
> Another use case is looking up a distinct set of identifiers that do not fall within an ordered range, for example: "select * from users where user_id in (?,?,?,?,?,?,?)". The number of identifiers may be large, dynamic, or both.
> Solution:
> Instead of addressing each use case separately with a new RDD type, provide an alternative, general-purpose RDD that gives the user direct control over how the query is partitioned in Spark and how its placeholders are filled in.
> The user should be able to control which placeholder values are available on each partition of the RDD and how they are bound to the PreparedStatement. Ideally, it should support a variable number of placeholder values, such as binding a set of values for an IN clause or similar constructs.
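
For contrast, the bound-based approach that JdbcRDD is built around can be sketched in plain Python (this is not Spark code). The sketch assumes the inclusive range [lower, upper] is split into roughly equal contiguous slices, one per partition, and that each (start, end) pair fills the two `?` placeholders of the books query; `range_partitions` is a hypothetical name. Nothing in this scheme helps when the work is not a contiguous id range.

```python
def range_partitions(lower, upper, num_partitions):
    """Split the inclusive range [lower, upper] into num_partitions contiguous (start, end) pairs."""
    length = upper - lower + 1  # bounds are inclusive, hence the + 1
    bounds = []
    for i in range(num_partitions):
        start = lower + (i * length) // num_partitions
        end = lower + ((i + 1) * length) // num_partitions - 1
        bounds.append((start, end))
    return bounds


SQL = "select title, author from books where ? <= id and id <= ?"

# Each (start, end) pair would be bound to the two placeholders by one partition.
for start, end in range_partitions(1, 10, 3):
    print(SQL, (start, end))
```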

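The partition-key use case can be illustrated with Python's built-in `sqlite3` module standing in for MySQL, and a plain dictionary comprehension standing in for Spark's per-partition tasks. The table contents and the `read_partition` helper are hypothetical. The point is that each task binds a single `partition_id` value, so no lower or upper bound is involved.

```python
import sqlite3

QUERY = "select * from devices where partition_id = ? and app_id = 'abcd'"


def read_partition(conn, partition_id):
    """What one Spark partition would do: run QUERY with its own partition_id bound."""
    return conn.execute(QUERY, (partition_id,)).fetchall()


# Tiny in-memory stand-in for the partitioned MySQL table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("create table devices (partition_id integer, app_id text, device text)")
conn.executemany(
    "insert into devices values (?, ?, ?)",
    [(0, "abcd", "d1"), (0, "zzzz", "d2"), (1, "abcd", "d3"), (2, "abcd", "d4")],
)

# One Spark partition per logical MySQL partition.
results = {pid: read_partition(conn, pid) for pid in [0, 1, 2]}
print(results)
```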

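For the IN-clause use case, the number of `?` placeholders has to match the number of identifiers each partition receives. The following minimal sketch again uses `sqlite3` as a stand-in database. The hypothetical `in_clause_queries` helper splits the identifiers into chunks (assumed here to be one chunk per partition) and generates matching SQL text for each chunk. Chunking can also keep each statement under whatever bound-parameter limit the target database imposes.

```python
import sqlite3


def in_clause_queries(ids, chunk_size):
    """Yield (sql, params) pairs, one per chunk, with exactly one ? per identifier."""
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        placeholders = ",".join(["?"] * len(chunk))
        yield f"select * from users where user_id in ({placeholders})", tuple(chunk)


conn = sqlite3.connect(":memory:")
conn.execute("create table users (user_id integer, name text)")
conn.executemany("insert into users values (?, ?)", [(i, f"user{i}") for i in range(20)])

# Each (sql, params) pair would be handled by a separate Spark partition.
wanted = [5, 9, 2, 7, 11]
rows = []
for sql, params in in_clause_queries(wanted, chunk_size=2):
    rows.extend(conn.execute(sql, params).fetchall())
```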

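The following is one possible shape for the proposed general RDD, sketched in Python rather than Spark's Scala API. Every partition carries an arbitrary user value, and a user-supplied binder turns that value into SQL text plus placeholder values. Names such as `QueryPartition`, `Binder`, and `read_partitions` are hypothetical, and `sqlite3` parameter sequences stand in for calls on a `java.sql.PreparedStatement`. Letting the binder return the SQL text, not just the values, is one way to support a different number of placeholders on each partition, as an IN clause requires.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Tuple


@dataclass(frozen=True)
class QueryPartition:
    """Hypothetical: one Spark partition's worth of work; params can be any user value."""
    index: int
    params: Any


# Hypothetical user hook: partition params -> (SQL text, values for its placeholders).
Binder = Callable[[Any], Tuple[str, Sequence[Any]]]


def read_partitions(conn, partitions, bind: Binder):
    """Sequential stand-in for the proposed RDD: run one bound query per partition."""
    out = {}
    for part in partitions:
        sql, values = bind(part.params)
        out[part.index] = conn.execute(sql, values).fetchall()
    return out


conn = sqlite3.connect(":memory:")
conn.execute("create table devices (partition_id integer, app_id text, device text)")
conn.executemany("insert into devices values (?, ?, ?)",
                 [(0, "abcd", "d1"), (1, "abcd", "d2"), (1, "zzzz", "d3")])

# Use case 1: a single scalar value per partition.
by_pid = read_partitions(
    conn,
    [QueryPartition(i, pid) for i, pid in enumerate([0, 1])],
    lambda pid: ("select device from devices where partition_id = ? and app_id = 'abcd'", [pid]),
)


# Use case 2: a variable-length list per partition, expanded into an IN clause.
def bind_in(devices):
    placeholders = ",".join(["?"] * len(devices))
    return f"select device from devices where device in ({placeholders})", list(devices)


by_set = read_partitions(conn, [QueryPartition(0, ["d1", "d3"]), QueryPartition(1, ["d2"])], bind_in)
```

In an actual Spark implementation, the partition list would come from the RDD's `getPartitions`, and the loop body would run inside `compute` on an executor with its own connection. That part is outside this sketch.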
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)