Posted to issues@spark.apache.org by "Ivan (Jira)" <ji...@apache.org> on 2021/07/15 14:44:00 UTC

[jira] [Created] (SPARK-36163) Propagate correct JDBC properties in JDBC connector provider and add "connectionProvider" option

Ivan created SPARK-36163:
----------------------------

             Summary: Propagate correct JDBC properties in JDBC connector provider and add "connectionProvider" option
                 Key: SPARK-36163
                 URL: https://issues.apache.org/jira/browse/SPARK-36163
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.2, 3.1.1, 3.1.0
            Reporter: Ivan


There are a couple of issues with JDBC connection providers. The first is a bug introduced by [https://github.com/apache/spark/commit/c3ce9701b458511255072c72b9b245036fa98653], where we pass all properties, including JDBC data source keys, to the JDBC driver, resulting in errors like {{java.sql.SQLException: Unrecognized connection property 'url'}}.

Connection properties are supposed to include only vendor properties; the url config is a JDBC data source option and should be excluded.

The fix is to replace {{jdbcOptions.asProperties.asScala.foreach}} with {{jdbcOptions.asConnectionProperties.asScala.foreach}}, which passes only the properties that a java.sql.Driver understands.
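To make the distinction concrete, here is a toy model (plain Python, not Spark's actual code; the set of data source keys below is an illustrative assumption, not Spark's real list): data-source-level options such as url must be filtered out before the remaining properties are handed to the driver.

```python
# Toy model of JDBCOptions.asProperties vs. asConnectionProperties.
# DATA_SOURCE_KEYS is an illustrative assumption, not Spark's real list.

DATA_SOURCE_KEYS = {"url", "dbtable", "query", "driver"}

def as_properties(options):
    # Everything the user passed, including Spark-level options.
    return dict(options)

def as_connection_properties(options):
    # Only vendor properties that a java.sql.Driver can understand.
    return {k: v for k, v in options.items() if k not in DATA_SOURCE_KEYS}

jdbc_options = {
    "url": "jdbc:mysql://host/db",  # Spark-level option, confuses the driver
    "dbtable": "t",                 # Spark-level option
    "user": "alice",                # vendor property
    "password": "secret",           # vendor property
}

print(as_connection_properties(jdbc_options))
# {'user': 'alice', 'password': 'secret'}
```

Passing the result of the first function to the driver is what triggers the "Unrecognized connection property 'url'" error; the second keeps only what the driver expects.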

 

I also investigated the problem with multiple providers, and I think there are a couple of oversights in the {{ConnectionProvider}} implementation. It is missing two things:
 * Any {{JdbcConnectionProvider}} should take precedence over {{BasicConnectionProvider}}. {{BasicConnectionProvider}} should only be selected if there was no match found when inferring providers that can handle JDBC url.

 * There is currently no way to select a specific provider, similar to how you can select a JDBC driver. The use case is, for example, having connection providers for two databases that handle the same URL but have slightly different semantics, and you want to pick one of them in some situations and the other elsewhere.

 ** I think the first point could be discarded when the second one is addressed.

You can technically use {{spark.sql.sources.disabledJdbcConnProviderList}} to exclude the providers you don't want, but I am not quite sure why it was done that way; it is much simpler to let users enforce the provider they want.

This ticket fixes it by adding a {{connectionProvider}} option to the JDBC data source that lets users select a particular provider when ambiguity arises.
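The selection behavior proposed above could look roughly like the following sketch (plain Python for illustration; the provider names, the "basic" fallback label, and the function shape are assumptions, not Spark's actual implementation): any specialized provider takes precedence over the basic fallback, and an explicit {{connectionProvider}} value resolves ambiguity.

```python
# Hypothetical sketch of provider selection. Provider names and the
# "basic" fallback label are illustrative assumptions.

class Provider:
    def __init__(self, name, url_prefix):
        self.name = name
        self.url_prefix = url_prefix

    def can_handle(self, url):
        return url.startswith(self.url_prefix)

def select_provider(providers, url, requested=None):
    candidates = [p for p in providers if p.can_handle(url)]
    if requested is not None:
        # Explicit selection via the proposed "connectionProvider" option.
        named = [p for p in candidates if p.name == requested]
        if not named:
            raise ValueError(f"no provider named {requested!r} handles {url}")
        return named[0]
    # Specialized providers take precedence over the basic fallback.
    specialized = [p for p in candidates if p.name != "basic"]
    if len(specialized) > 1:
        raise ValueError("ambiguous providers; pick one explicitly")
    if specialized:
        return specialized[0]
    if candidates:
        return candidates[0]  # only the basic fallback matched
    raise ValueError(f"no provider handles {url}")

providers = [
    Provider("basic", "jdbc:"),
    Provider("vendorA", "jdbc:foo:"),
    Provider("vendorB", "jdbc:foo:"),
]

# Two specialized providers match the same URL; the user disambiguates.
print(select_provider(providers, "jdbc:foo://host/db", requested="vendorA").name)
# vendorA
```

With this shape, {{spark.sql.sources.disabledJdbcConnProviderList}} is no longer the only escape hatch: when two providers claim the same URL, the user names the one they want instead of disabling the others.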



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org