Posted to dev@spark.apache.org by Shiv Prashant Sood <sh...@gmail.com> on 2019/07/12 22:33:23 UTC

JDBC connector for DataSourceV2

Can someone please help me understand the current status of the DataSource V2
based JDBC connector? I see connectors for various file formats in master,
but can't find a JDBC implementation or a related JIRA.

The DataSourceV2 APIs look to me to be in good shape to attempt a JDBC
connector for the READ/WRITE path.

Thanks & Regards,
Shiv

Re: JDBC connector for DataSourceV2

Posted by Ryan Blue <rb...@netflix.com.INVALID>.

Sounds great! Ping me on the review, I think this will be really valuable.

On Fri, Jul 12, 2019 at 6:51 PM Xianyin Xin <xi...@alibaba-inc.com>
wrote:

> If there’s nobody working on that, I’d like to contribute.
>
>
>
> Loop in @Gengliang Wang.
>
>
>
> Xianyin


-- 
Ryan Blue
Software Engineer
Netflix

Re: JDBC connector for DataSourceV2

Posted by Shiv Prashant Sood <sh...@gmail.com>.

All,

A first draft of a DataSourceV2 based JDBC connector is now available
(PR#25211 <https://github.com/apache/spark/pull/25211>). The goal was an MVP
implementation with support for batch read/write.
I am looking forward to your review comments to help guide the direction. Note
that I am still understanding/addressing some issues. The plan, status, and
issues are captured in the Readme.md
<https://github.com/shivsood/spark/blob/dsv2_jdbc/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/Readme.md>

Summary of changes
- V2 connector changes are under
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc.
The implementation heavily reuses the infrastructure provided by JdbcUtils.
- The JdbcUtils file (sql/core/../datasources/jdbc/JdbcUtils.scala) is
refactored (for a few functions) to suit V2 needs.
- E2E test cases are in
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala

[image: image.png]

Current status (refer to Readme.md
<https://github.com/shivsood/spark/blob/dsv2_jdbc/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/Readme.md>
for details)
[image: image.png]


Regards,
Shiv

On Mon, Jul 15, 2019 at 5:16 PM Priyanka Gomatam <
Priyanka.Gomatam@microsoft.com> wrote:

> I would have thought one of the most important goals would be pushing down
> limits since V2 supports it.
>
>
>
> I am also interested in collaborating. Thanks!
>
>
>
> Priyanka Gomatam

RE: JDBC connector for DataSourceV2

Posted by Priyanka Gomatam <Pr...@microsoft.com.INVALID>.

I would have thought one of the most important goals would be pushing down limits since V2 supports it.

I am also interested in collaborating. Thanks!

Priyanka Gomatam
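
To make the limit pushdown point concrete, here is a minimal sketch. It is not Spark code and does not use any DataSourceV2 interface: Python's built-in sqlite3 module stands in for a JDBC database, and `build_scan_query` and `scan` are hypothetical helpers invented for this illustration. It contrasts fetching every row and trimming on the client with folding the limit into the query that is sent to the database.

```python
import sqlite3


def build_scan_query(table, columns, limit=None):
    """Build the SELECT a connector could send to the database.

    Hypothetical helper for illustration only; identifiers are assumed
    to be trusted, so no quoting or escaping is attempted.
    """
    query = f"SELECT {', '.join(columns)} FROM {table}"
    if limit is not None:
        # Pushed-down limit: the database stops after `limit` rows,
        # so fewer rows have to cross the network.
        query += f" LIMIT {int(limit)}"
    return query


def scan(conn, table, columns, limit=None, push_down=True):
    """Return up to `limit` rows, with or without pushing the limit down."""
    if push_down:
        return conn.execute(build_scan_query(table, columns, limit)).fetchall()
    # Without pushdown: fetch everything, then trim on the client side.
    rows = conn.execute(build_scan_query(table, columns)).fetchall()
    return rows if limit is None else rows[:limit]


# In-memory database used as a stand-in for a remote JDBC source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1000)],
)

pushed = scan(conn, "people", ["id", "name"], limit=5, push_down=True)
trimmed = scan(conn, "people", ["id", "name"], limit=5, push_down=False)
print(build_scan_query("people", ["id", "name"], limit=5))
print(len(pushed), len(trimmed))
```

Both calls return five rows, but only the first asks the database for just five. The trailing LIMIT clause is not portable (SQL Server, for example, uses TOP), which is one reason a generic connector would likely need some per-database handling.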

From: Shiv Prashant Sood <sh...@gmail.com>
Sent: Monday, July 15, 2019 10:22 AM
To: Gabor Somogyi <ga...@gmail.com>
Cc: Xianyin Xin <xi...@alibaba-inc.com>; Ryan Blue <rb...@netflix.com>; gengliang.wang@databricks.com; Spark Dev List <de...@spark.apache.org>
Subject: Re: JDBC connector for DataSourceV2

Agree. Let's use SPARK-24907 <https://issues.apache.org/jira/browse/SPARK-24907> as the JIRA for this work. Thanks for resolving SPARK-28380 <https://issues.apache.org/jira/browse/SPARK-28380> as dupe of this.

Regards,
Shiv


Re: JDBC connector for DataSourceV2

Posted by Shiv Prashant Sood <sh...@gmail.com>.

Agree. Let's use SPARK-24907
<https://issues.apache.org/jira/browse/SPARK-24907> as the JIRA for this
work. Thanks for resolving SPARK-28380
<https://issues.apache.org/jira/browse/SPARK-28380> as dupe of this.

Regards,
Shiv

On Mon, Jul 15, 2019 at 1:50 AM Gabor Somogyi <ga...@gmail.com>
wrote:

> I've had a look at the jiras and seems like the intention is the same
> (correct me if I'm wrong).
> I think one is enough and the rest can be closed with duplicate.
> We should keep multiple jiras only when the intention is different.
>
> BR,
> G

Re: JDBC connector for DataSourceV2

Posted by Gabor Somogyi <ga...@gmail.com>.

I've had a look at the JIRAs and it seems like the intention is the same
(correct me if I'm wrong).
I think one is enough and the rest can be closed as duplicates.
We should keep multiple JIRAs only when the intention is different.

BR,
G


On Mon, Jul 15, 2019 at 6:01 AM Xianyin Xin <xi...@alibaba-inc.com>
wrote:

> There’s another pr https://github.com/apache/spark/pull/21861 but which
> is based the old V2 APIs.
>
>
>
> We’d better link the JIRAs, SPARK-24907
> <https://issues.apache.org/jira/browse/SPARK-24907>, SPARK-25547
> <https://issues.apache.org/jira/browse/SPARK-25547>, and SPARK-28380
> <https://issues.apache.org/jira/browse/SPARK-28380> and finalize a plan.
>
>
>
> Xianyin

Re: JDBC connector for DataSourceV2

Posted by Xianyin Xin <xi...@alibaba-inc.com>.

There’s another PR, https://github.com/apache/spark/pull/21861, but it is based on the old V2 APIs.

We’d better link the JIRAs SPARK-24907, SPARK-25547, and SPARK-28380 and finalize a plan.

Xianyin

 

From: Shiv Prashant Sood <sh...@gmail.com>
Date: Sunday, July 14, 2019 at 2:59 AM
To: Gabor Somogyi <ga...@gmail.com>
Cc: Xianyin Xin <xi...@alibaba-inc.com>, Ryan Blue <rb...@netflix.com>, <ge...@databricks.com>, Spark Dev List <de...@spark.apache.org>
Subject: Re: JDBC connector for DataSourceV2

 

To me this looks like refactoring of DS1 JDBC to enable user provided connection factories. In itself a good change, but IMO not DSV2 related.

I created a JIRA and added some goals. Please comments/add as relevant.

https://issues.apache.org/jira/browse/SPARK-28380

JIRA for DataSourceV2 API based JDBC connector.

Goals :

   - Generic connector based on JDBC that supports all databases (min bar is support for all V1 data bases).
   - Reference implementation and Interface for any specialized JDBC connectors.

Regards,

Shiv

 


Re: JDBC connector for DataSourceV2

Posted by Shiv Prashant Sood <sh...@gmail.com>.

To me this looks like a refactoring of DS1 JDBC to enable user-provided
connection factories. In itself a good change, but IMO not DSV2 related.

I created a JIRA and added some goals. Please comment/add as relevant.

https://issues.apache.org/jira/browse/SPARK-28380

JIRA for DataSourceV2 API based JDBC connector.

Goals:

   - Generic connector based on JDBC that supports all databases (min bar
   is support for all V1 databases).
   - Reference implementation and interface for any specialized JDBC
   connectors.
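
One way to read the two goals above is as a generic default plus an interface that specialized connectors override. The following is only a sketch of that shape in plain Python, under stated assumptions: the class and function names (`GenericDialect`, `SqlServerDialect`, `dialect_for`) are hypothetical and are not taken from the JIRA or from Spark. Spark's existing V1 JDBC source has a `JdbcDialect` abstraction built on a similar idea, but this sketch does not reproduce that API.

```python
class GenericDialect:
    """Default behaviour a generic connector could fall back on (hypothetical)."""

    def can_handle(self, url):
        # Accept any JDBC-style URL.
        return url.startswith("jdbc:")

    def quote_identifier(self, name):
        # ANSI SQL style: double quotes, with embedded quotes doubled.
        escaped = name.replace('"', '""')
        return f'"{escaped}"'

    def limit_query(self, query, limit):
        return f"{query} LIMIT {int(limit)}"


class SqlServerDialect(GenericDialect):
    """Specialized connector that overrides only what differs."""

    def can_handle(self, url):
        return url.startswith("jdbc:sqlserver:")

    def quote_identifier(self, name):
        # T-SQL bracket quoting, with closing brackets doubled.
        return "[" + name.replace("]", "]]") + "]"

    def limit_query(self, query, limit):
        # T-SQL uses TOP instead of a trailing LIMIT clause.
        # Assumes `query` starts with a plain "SELECT ".
        return query.replace("SELECT ", f"SELECT TOP {int(limit)} ", 1)


# Most specific dialects first; the generic one is the fallback.
DIALECTS = [SqlServerDialect(), GenericDialect()]


def dialect_for(url):
    """Pick the first registered dialect that accepts the URL."""
    for dialect in DIALECTS:
        if dialect.can_handle(url):
            return dialect
    raise ValueError(f"no dialect for {url}")


mssql = dialect_for("jdbc:sqlserver://host;databaseName=db")
other = dialect_for("jdbc:postgresql://host/db")
print(mssql.limit_query("SELECT a FROM t", 3))
print(other.limit_query("SELECT a FROM t", 3))
```

The generic dialect covers any database that follows common SQL conventions, while a specialized connector only has to override the pieces where its database differs.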


Regards,
Shiv

On Sat, Jul 13, 2019 at 2:17 AM Gabor Somogyi <ga...@gmail.com>
wrote:

> Hi Guys,
>
> Don't know what's the intention exactly here but there is such a PR:
> https://github.com/apache/spark/pull/22560
> If that's what we need maybe we can resurrect it. BTW, I'm also interested
> in...
>
> BR,
> G

Re: JDBC connector for DataSourceV2

Posted by Gabor Somogyi <ga...@gmail.com>.

Hi Guys,

I don't know exactly what the intention is here, but there is such a PR:
https://github.com/apache/spark/pull/22560
If that's what we need, maybe we can resurrect it. BTW, I'm also interested
in...

BR,
G


On Sat, Jul 13, 2019 at 4:09 AM Shiv Prashant Sood <sh...@gmail.com>
wrote:

> Thanks all. I can also contribute toward this effort.
>
> Regards,
> Shiv
>
> Sent from my iPhone

Re: JDBC connector for DataSourceV2

Posted by Shiv Prashant Sood <sh...@gmail.com>.

Thanks all. I can also contribute toward this effort.

Regards,
Shiv

Sent from my iPhone

> On Jul 12, 2019, at 6:51 PM, Xianyin Xin <xi...@alibaba-inc.com> wrote:
> 
> If there’s nobody working on that, I’d like to contribute.
>  
> Loop in @Gengliang Wang.
>  
> Xianyin

Re: JDBC connector for DataSourceV2

Posted by Xianyin Xin <xi...@alibaba-inc.com>.

If there’s nobody working on that, I’d like to contribute.

Loop in @Gengliang Wang.

Xianyin

 

From: Ryan Blue <rb...@netflix.com.INVALID>
Reply-To: <rb...@netflix.com>
Date: Saturday, July 13, 2019 at 6:54 AM
To: Shiv Prashant Sood <sh...@gmail.com>
Cc: Spark Dev List <de...@spark.apache.org>
Subject: Re: JDBC connector for DataSourceV2

 

I'm not aware of a JDBC connector effort. It would be great to have someone build one!

 


Re: JDBC connector for DataSourceV2

Posted by Ryan Blue <rb...@netflix.com.INVALID>.

I'm not aware of a JDBC connector effort. It would be great to have someone
build one!

On Fri, Jul 12, 2019 at 3:33 PM Shiv Prashant Sood <sh...@gmail.com>
wrote:

> Can someone please help understand the current Status of DataSource V2
> based JDBC connector? I see connectors for various file formats in Master,
> but can't find a JDBC implementation or related JIRA.
>
> DatasourceV2 APIs to me look in good shape to attempt a JDBC connector for
> READ/WRITE path.
>
> Thanks & Regards,
> Shiv
>


-- 
Ryan Blue
Software Engineer
Netflix