Posted to dev@spark.apache.org by "javacaoyu@163.com" <ja...@163.com> on 2022/09/19 11:16:40 UTC
[DISCUSS] Support RDD using JDBC data source in PySpark
Hi guys:
When using PySpark, I want to read data from a MySQL database, so I would like to use JdbcRDD, but it is not supported in PySpark.
For some reasons I cannot use the DataFrame API and can only use the RDD (DataStream) API, even though I know the DataFrame API can read from JDBC sources fairly well.
So I want to implement the functionality to read data from a JDBC source via RDDs in PySpark.
But I don't know whether this is necessary for PySpark, so I'd like to discuss it here.
If it is necessary, I would like to contribute it to Spark: I want to create a JIRA ticket and hope it can be assigned to me.
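For context, the existing Scala JdbcRDD splits a numeric key range (lowerBound, upperBound) into numPartitions contiguous sub-ranges and runs one bounded SQL query per partition. A minimal Python sketch of that bound-splitting logic (the function name is my own illustration, not an existing PySpark API):

```python
def jdbc_partition_bounds(lower_bound, upper_bound, num_partitions):
    """Split the inclusive key range [lower_bound, upper_bound] into
    num_partitions contiguous sub-ranges, mirroring the partitioning
    scheme used by Scala's JdbcRDD."""
    length = upper_bound - lower_bound + 1
    bounds = []
    for i in range(num_partitions):
        # Integer division spreads any remainder across the partitions.
        start = lower_bound + (i * length) // num_partitions
        end = lower_bound + ((i + 1) * length) // num_partitions - 1
        bounds.append((start, end))
    return bounds

# Each (start, end) pair would be bound to the two '?' placeholders of a
# user-supplied query such as
#   "SELECT * FROM t WHERE id >= ? AND id <= ?"
# so that each partition fetches its own slice of the table.
print(jdbc_partition_bounds(1, 100, 4))  # [(1, 25), (26, 50), (51, 75), (76, 100)]
```

A PySpark version could compute these bounds on the driver and open one JDBC/DB-API connection per partition inside mapPartitions.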
I am a big data engineer and I like contributing to open source. I have already submitted two PRs to Apache Flink (FLINK-26609, FLINK-26728), and both were merged/closed.
So I think that if I get the JIRA ticket, I can implement this fairly well.
Thanks.
javacaoyu@163.com