Posted to dev@spark.apache.org by Bryan Herger <br...@microfocus.com> on 2019/12/10 13:13:48 UTC

I would like to add JDBCDialect to support Vertica database

Hi, I am a Vertica support engineer, and we have open support requests around NULL values and SQL type conversion with DataFrame read/write over JDBC when connecting to a Vertica database.  The stack traces point to issues with the generic JDBCDialect in Spark SQL.

I saw that other vendors (Teradata, DB2...) have contributed a JDBCDialect class to address JDBC compatibility, so I wrote up a dialect for Vertica.

The changeset is on my fork of apache/spark at https://github.com/bryanherger/spark/commit/84d3014e4ead18146147cf299e8996c5c56b377d

I have tested this against Vertica 9.3 and found that this changeset addresses both issues reported to us (issue with NULL values - setNull() - for valid java.sql.Types, and String to VARCHAR conversion)

Is this an acceptable change?  If so, how should I go about submitting a pull request?

Thanks, Bryan Herger
Vertica Solution Engineer


---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: I would like to add JDBCDialect to support Vertica database

Posted by Jungtaek Lim <ka...@gmail.com>.
If I understand correctly, you'll just want to package your implementation
with your preferred build tool (Maven, sbt, etc.) so that it registers your
dialect implementation into JdbcDialects, then pass around the jar and let end
users load it. That handles everything automatically: they can use
VerticaDialect with no need to custom-patch Spark. That's how
third-party plugins work.
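The plugin approach works because Spark resolves a dialect by asking each registered dialect whether it can handle a given JDBC URL. As a rough illustration in plain Scala (this is a simplified stand-in, not Spark's actual code; the names mirror the shape of the real JdbcDialects.registerDialect / canHandle API):

```scala
// Simplified stand-in for Spark's JdbcDialects registry: dialects claim
// JDBC URLs via canHandle, and the most recently registered match wins.
trait MiniDialect {
  def canHandle(url: String): Boolean
}

object VerticaMiniDialect extends MiniDialect {
  // A Vertica dialect would claim jdbc:vertica: URLs.
  def canHandle(url: String): Boolean = url.startsWith("jdbc:vertica:")
}

object MiniRegistry {
  private var dialects: List[MiniDialect] = Nil

  def registerDialect(d: MiniDialect): Unit =
    dialects = d :: dialects.filterNot(_ == d)

  def unregisterDialect(d: MiniDialect): Unit =
    dialects = dialects.filterNot(_ == d)

  // Returns the first registered dialect that claims the URL, if any.
  def get(url: String): Option[MiniDialect] =
    dialects.find(_.canHandle(url))
}
```

With the real API it is the same idea: registering org.apache.spark.sql.jdbc.VerticaDialect from a jar on the classpath makes Spark pick it for jdbc:vertica: URLs, with no patch to Spark itself.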


Re: I would like to add JDBCDialect to support Vertica database

Posted by Takeshi Yamamuro <li...@gmail.com>.
Not sure either.
Can't you use Spark Packages for your scenario?
https://spark-packages.org/



-- 
---
Takeshi Yamamuro

Re: I would like to add JDBCDialect to support Vertica database

Posted by Hyukjin Kwon <gu...@gmail.com>.
I am not so sure about it either. I think it is enough to expose JDBCDialect
as an API (which it seems already is).
It brings some overhead to dev (e.g., testing and reviewing PRs related to
another third party).
Without a strong reason, such third-party integration might better exist as a
third-party library.


RE: I would like to add JDBCDialect to support Vertica database

Posted by Bryan Herger <br...@microfocus.com>.
It kind of already is.  I was able to build the VerticaDialect as a sort of plugin as follows:

1. Check out the apache/spark tree
2. Copy in VerticaDialect.scala
3. Build with “mvn -DskipTests compile”
4. Package the compiled class plus its companion object into a JAR
5. Copy the JAR to the jars folder of the Spark binary installation (optional; you can probably pass it via an extra --jars argument instead)

Then run the following test in spark-shell after creating Vertica table and sample data:

org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(org.apache.spark.sql.jdbc.VerticaDialect)
val jdbcDF = spark.read.format("jdbc").option("url", "jdbc:vertica://hpbox:5433/docker").option("dbtable", "test_alltypes").option("user", "dbadmin").option("password", "Vertica1!").load()
jdbcDF.show()
jdbcDF.write.mode("append").format("jdbc").option("url", "jdbc:vertica://hpbox:5433/docker").option("dbtable", "test_alltypes").option("user", "dbadmin").option("password", "Vertica1!").save()
JdbcDialects.unregisterDialect(org.apache.spark.sql.jdbc.VerticaDialect)

If it would be preferable to write documentation describing the above, I can do that instead.  The hard part is checking out the matching apache/spark tree and then copying to the Spark cluster – I can install the master branch and latest binary and apply patches since I have root on all my test boxes, but customers may not be able to.  Still, this provides another route to support new JDBC dialects.

BryanH


Re: I would like to add JDBCDialect to support Vertica database

Posted by Wenchen Fan <cl...@gmail.com>.
Can we make the JDBCDialect a public API that users can plugin? It looks
like an end-less job to make sure Spark JDBC source supports all databases.


Re: I would like to add JDBCDialect to support Vertica database

Posted by Xiao Li <li...@databricks.com>.
You can follow how we test the other JDBC dialects. All JDBC dialects
require the docker integration tests.
https://github.com/apache/spark/tree/master/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc



RE: I would like to add JDBCDialect to support Vertica database

Posted by Bryan Herger <br...@microfocus.com>.
Hi, to answer both questions raised:

Though Vertica is derived from Postgres, it does not recognize the type names TEXT, NVARCHAR, BYTEA, or ARRAY, and it handles DATETIME differently enough to cause issues.  The major changes are to use type names and a date format supported by Vertica.
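For concreteness, the kind of type-name rewriting involved can be sketched in plain Scala. This is illustrative only: the actual mappings live in the changeset on my fork, and the target types below (e.g. VARBINARY for BYTEA) are assumptions for the sketch, not the committed code:

```scala
// Hypothetical sketch of a dialect's type-name rewriting for Vertica.
// The real VerticaDialect overrides Spark's JDBC type mapping hooks;
// this standalone function just shows the idea.
def toVerticaTypeName(generic: String): String =
  generic.toUpperCase match {
    case "TEXT" | "NVARCHAR" => "VARCHAR"   // Vertica has no TEXT/NVARCHAR
    case "BYTEA"             => "VARBINARY" // assumed binary equivalent
    case other               => other       // names Vertica already accepts
  }
```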

For testing, I have a SQL script plus Scala and PySpark scripts, but these require a Vertica database to connect to, so automated testing on a build server wouldn’t work.  It’s possible to include my test scripts and directions to run them manually, but I’m not sure where in the repo they would go.  If automated testing is required, I can ask our engineers whether something like Mockito could be included.

Thanks, Bryan H

From: Xiao Li [mailto:lixiao@databricks.com]
Sent: Wednesday, December 11, 2019 10:13 AM
To: Sean Owen <sr...@gmail.com>
Cc: Bryan Herger <br...@microfocus.com>; dev@spark.apache.org
Subject: Re: I would like to add JDBCDialect to support Vertica database

How can the dev community test it?

Xiao

On Wed, Dec 11, 2019 at 6:52 AM Sean Owen <sr...@gmail.com>> wrote:
It's probably OK, IMHO. The overhead of another dialect is small. Are
there differences that require a new dialect? I assume so, and it might
be useful to summarize them if you open a PR.

On Tue, Dec 10, 2019 at 7:14 AM Bryan Herger
<br...@microfocus.com>> wrote:
>
> Hi, I am a Vertica support engineer, and we have open support requests around NULL values and SQL type conversion with DataFrame read/write over JDBC when connecting to a Vertica database.  The stack traces point to issues with the generic JDBCDialect in Spark-SQL.
>
> I saw that other vendors (Teradata, DB2...) have contributed a JDBCDialect class to address JDBC compatibility, so I wrote up a dialect for Vertica.
>
> The changeset is on my fork of apache/spark at https://github.com/bryanherger/spark/commit/84d3014e4ead18146147cf299e8996c5c56b377d
>
> I have tested this against Vertica 9.3 and found that this changeset addresses both issues reported to us (issue with NULL values - setNull() - for valid java.sql.Types, and String to VARCHAR conversion)
>
> Is this an acceptable change?  If so, how should I go about submitting a pull request?
>
> Thanks, Bryan Herger
> Vertica Solution Engineer
>
>


Re: I would like to add JDBCDialect to support Vertica database

Posted by Xiao Li <li...@databricks.com>.
How can the dev community test it?

Xiao

On Wed, Dec 11, 2019 at 6:52 AM Sean Owen <sr...@gmail.com> wrote:

> It's probably OK, IMHO. The overhead of another dialect is small. Are
> there differences that require a new dialect? I assume so, and it might
> be useful to summarize them if you open a PR.
>
> On Tue, Dec 10, 2019 at 7:14 AM Bryan Herger
> <br...@microfocus.com> wrote:
> >
> > Hi, I am a Vertica support engineer, and we have open support requests
> around NULL values and SQL type conversion with DataFrame read/write over
> JDBC when connecting to a Vertica database.  The stack traces point to
> issues with the generic JDBCDialect in Spark-SQL.
> >
> > I saw that other vendors (Teradata, DB2...) have contributed a
> JDBCDialect class to address JDBC compatibility, so I wrote up a dialect
> for Vertica.
> >
> > The changeset is on my fork of apache/spark at
> https://github.com/bryanherger/spark/commit/84d3014e4ead18146147cf299e8996c5c56b377d
> >
> > I have tested this against Vertica 9.3 and found that this changeset
> addresses both issues reported to us (issue with NULL values - setNull() -
> for valid java.sql.Types, and String to VARCHAR conversion)
> >
> > Is this an acceptable change?  If so, how should I go about submitting a
> pull request?
> >
> > Thanks, Bryan Herger
> > Vertica Solution Engineer
> >
> >
> >
>
>

Re: I would like to add JDBCDialect to support Vertica database

Posted by Sean Owen <sr...@gmail.com>.
It's probably OK, IMHO. The overhead of another dialect is small. Are
there differences that require a new dialect? I assume so, and it might
be useful to summarize them if you open a PR.

On Tue, Dec 10, 2019 at 7:14 AM Bryan Herger
<br...@microfocus.com> wrote:
>
> Hi, I am a Vertica support engineer, and we have open support requests around NULL values and SQL type conversion with DataFrame read/write over JDBC when connecting to a Vertica database.  The stack traces point to issues with the generic JDBCDialect in Spark-SQL.
>
> I saw that other vendors (Teradata, DB2...) have contributed a JDBCDialect class to address JDBC compatibility, so I wrote up a dialect for Vertica.
>
> The changeset is on my fork of apache/spark at https://github.com/bryanherger/spark/commit/84d3014e4ead18146147cf299e8996c5c56b377d
>
> I have tested this against Vertica 9.3 and found that this changeset addresses both issues reported to us (issue with NULL values - setNull() - for valid java.sql.Types, and String to VARCHAR conversion)
>
> Is this an acceptable change?  If so, how should I go about submitting a pull request?
>
> Thanks, Bryan Herger
> Vertica Solution Engineer
>
>
>
