Posted to dev@spark.apache.org by "javacaoyu@163.com" <ja...@163.com> on 2022/09/19 11:16:40 UTC

[DISCUSS] Support RDD using JDBC data source in PySpark

Hi guys:

When using PySpark, I want to get data from a MySQL database, so I would like to use JdbcRDD. But JdbcRDD is not supported in PySpark.

For some reasons, I can't use the DataFrame API and can only use the RDD API, even though I know the DataFrame API can read from a JDBC source fairly well.


So I want to implement functionality that lets an RDD read data from a JDBC source in PySpark.
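For context, the Scala JdbcRDD reads in parallel by splitting a numeric key range into per-partition bounds and running the same parameterized query with each pair of bounds. Below is a minimal Python sketch of that idea; the function names are my own, not an existing Spark API, and stdlib sqlite3 stands in for a MySQL/JDBC connection:

```python
import sqlite3

def partition_bounds(lower, upper, num_partitions):
    """Split the inclusive key range [lower, upper] into contiguous,
    non-overlapping (start, end) pairs, one per partition."""
    length = upper - lower + 1
    return [
        (lower + (i * length) // num_partitions,
         lower + ((i + 1) * length) // num_partitions - 1)
        for i in range(num_partitions)
    ]

# Build a tiny table with ids 1..10 (stand-in for a MySQL table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, "row%d" % i) for i in range(1, 11)])

# Each "partition" runs the same parameterized query with its own bounds;
# on a real cluster every task would open its own database connection.
sql = "SELECT id, v FROM t WHERE id >= ? AND id <= ?"
partitions = [conn.execute(sql, b).fetchall()
              for b in partition_bounds(1, 10, 3)]
print([len(p) for p in partitions])  # → [3, 3, 4]
```

The bounds split the key range with no gaps or overlaps, so the union of the partition reads is exactly the full table; this is the behavior a PySpark counterpart would need to preserve.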

But I don't know whether this is necessary for PySpark, so I would like to discuss it here.

If it is necessary for PySpark, I want to contribute it to Spark. I would like to create a JIRA ticket and hope it can be assigned to me.
I am a big data engineer and like contributing to open source. I have already submitted two PRs to Apache Flink (FLINK-26609, FLINK-26728), and both were merged/closed.
So I think that if I can get the JIRA ticket, I can implement this fairly well.



Thanks.






javacaoyu@163.com