Posted to user@spark.apache.org by Aaron Ilovici <ai...@wayfair.com> on 2016/05/26 15:08:26 UTC

JDBC Dialect for saving DataFrame into Vertica Table

I am attempting to write a DataFrame of Rows to Vertica via DataFrameWriter's jdbc function in the following manner:

dataframe.write().mode(SaveMode.Append).jdbc(url, table, properties);

This works when there are no NULL values in any of the Rows in my DataFrame. However, when any Row contains a NULL value, I get the following error:

ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 24)
java.sql.SQLFeatureNotSupportedException: [Vertica][JDBC](10220) Driver not capable.
    at com.vertica.exceptions.ExceptionConverter.toSQLException(Unknown Source)
    at com.vertica.jdbc.common.SPreparedStatement.checkTypeSupported(Unknown Source)
    at com.vertica.jdbc.common.SPreparedStatement.setNull(Unknown Source)

This appears to happen when Spark sets a null value on a PreparedStatement, and the Vertica driver does not support the JDBC type code it receives when the transaction executes. I see in JdbcDialects.scala that there are dialects for MySQL, Postgres, DB2, MsSQLServer, Derby, and Oracle.

1 - Would writing a dialect for Vertica alleviate the issue, by setting a 'NULL' in a type that Vertica would understand?
2 - What would be the best way to do this without a Spark patch? Scala, Java, make a jar and call 'JdbcDialects.registerDialect(VerticaDialect)' once created?
3 - Where would one find the proper mapping between Spark DataTypes and Vertica DataTypes? I don't see 'NULL' handling for any of the dialects, only the base case 'case _ => None' - is None mapped to the proper NULL type elsewhere?
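To make question 3 concrete, here is a pure-JDK sketch of the kind of Spark-to-Vertica type mapping a hypothetical VerticaDialect's getJDBCType could return. The class name, the Vertica type names, and the VARCHAR width are my assumptions for illustration, not confirmed Vertica documentation; in a real dialect this table would live inside a subclass of org.apache.spark.sql.jdbc.JdbcDialect, registered once via JdbcDialects.registerDialect(new VerticaDialect()).

```java
import java.sql.Types;
import java.util.HashMap;
import java.util.Map;

// Hypothetical Spark-type -> Vertica-type mapping that a VerticaDialect's
// getJDBCType might return. Type names and sizes are illustrative guesses.
public class VerticaTypeMapping {

    // Value: Vertica SQL type name, plus the java.sql.Types code that
    // would be passed to PreparedStatement.setNull for NULL values.
    static final Map<String, Object[]> MAPPING = new HashMap<>();
    static {
        MAPPING.put("StringType",    new Object[]{"VARCHAR(65000)",   Types.VARCHAR});
        MAPPING.put("IntegerType",   new Object[]{"INTEGER",          Types.INTEGER});
        MAPPING.put("LongType",      new Object[]{"BIGINT",           Types.BIGINT});
        MAPPING.put("DoubleType",    new Object[]{"DOUBLE PRECISION", Types.DOUBLE});
        MAPPING.put("BooleanType",   new Object[]{"BOOLEAN",          Types.BOOLEAN});
        MAPPING.put("TimestampType", new Object[]{"TIMESTAMP",        Types.TIMESTAMP});
    }

    // Returns null for unmapped types, mirroring the dialects' `case _ => None`.
    public static Object[] toVerticaType(String sparkType) {
        return MAPPING.get(sparkType);
    }

    public static void main(String[] args) {
        Object[] t = toVerticaType("IntegerType");
        System.out.println(t[0] + " / " + t[1]); // INTEGER / 4
    }
}
```

The point of carrying the java.sql.Types code alongside the type name is exactly the NULL problem above: a dialect that supplies a concrete code per column gives Spark something the driver can accept in setNull.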

My environment: Spark 1.6, Vertica Driver 7.2.2, Java 1.7
I would be happy to create a Jira and submit a pull request with the VerticaDialect once I figure this out.

Thank you for any insight on this,

AARON ILOVICI
Software Engineer
Marketing Engineering


WAYFAIR
4 Copley Place
Boston, MA 02116
(617) 532-6100 x1231
ailovici@wayfair.com



Re: JDBC Dialect for saving DataFrame into Vertica Table

Posted by Aaron Ilovici <ai...@wayfair.com>.
Mohammed,

The Spark Connector for Vertica is still in Beta, and while that is still an option, I would prefer native support from Spark. Since all data types seem to map correctly with the existing dialects except for NULL types, I imagine the work involved would be relatively minimal. I would be happy to code it out and submit a pull request, but I have a question about the dialect:


- Are NULL data types implicitly defined somewhere? I don't see NULL cases in the other dialects.

I have worked out answers to my other questions and found Java-to-Vertica data type conversions. The only piece I am missing is the NULL value, which is the reason this dialect is needed in the first place.
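To make the NULL failure mode concrete: when Spark writes a NULL it calls PreparedStatement.setNull(index, typeCode), and the stack trace above suggests the Vertica driver rejects the generic code it receives. A dialect can sidestep this by supplying each column's concrete JDBC type code instead. The helper below is a minimal pure-JDK sketch of that fallback; the method name and the VARCHAR default are my assumptions, not Spark's actual implementation.

```java
import java.sql.Types;

// Illustrates choosing a type code to pass to PreparedStatement.setNull.
// Some drivers reject the generic java.sql.Types.NULL (code 0), so a
// dialect can map every column to a concrete code instead.
public class NullTypeCode {

    public static int nullTypeFor(int columnJdbcType) {
        // Fall back to VARCHAR only when no concrete column type is known;
        // otherwise reuse the column's own JDBC type code.
        return columnJdbcType == Types.NULL ? Types.VARCHAR : columnJdbcType;
    }

    public static void main(String[] args) {
        System.out.println(nullTypeFor(Types.INTEGER)); // 4
        System.out.println(nullTypeFor(Types.NULL));    // 12
    }
}
```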

Reynold,

I agree. Vertica seems to ship an MVP at GA and then make it performant in later releases, so I doubt the performance loss would be drastic.

Thank you for any help,


AARON ILOVICI
Software Engineer
Marketing Engineering


WAYFAIR
4 Copley Place
Boston, MA 02116
(617) 532-6100 x1231
ailovici@wayfair.com






Re: JDBC Dialect for saving DataFrame into Vertica Table

Posted by Reynold Xin <rx...@databricks.com>.
It's probably a good idea to have the Vertica dialect too, since it doesn't
seem like it'd be too difficult to maintain. It will not be as performant as
the native Vertica data source, but it is much lighter weight.


RE: JDBC Dialect for saving DataFrame into Vertica Table

Posted by Mohammed Guller <mo...@glassbeam.com>.
Vertica also provides a Spark connector. It was not GA the last time I looked at it, but it is available on the Vertica community site. Have you tried using the Vertica Spark connector instead of the JDBC driver?

Mohammed
Author: Big Data Analytics with Spark <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>


