Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2017/07/24 09:04:00 UTC

[jira] [Commented] (SPARK-21519) Add an option to the JDBC data source to initialize the environment of the remote database session

    [ https://issues.apache.org/jira/browse/SPARK-21519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098105#comment-16098105 ] 

Apache Spark commented on SPARK-21519:
--------------------------------------

User 'LucaCanali' has created a pull request for this issue:
https://github.com/apache/spark/pull/18724

> Add an option to the JDBC data source to initialize the environment of the remote database session
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21519
>                 URL: https://issues.apache.org/jira/browse/SPARK-21519
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.1.0, 2.1.1, 2.2.0
>            Reporter: Luca Canali
>            Priority: Minor
>
> This proposes an option for the JDBC data source, tentatively called "sessionInitStatement", that implements session initialization of the kind available, for example, in the Sqoop connector for Oracle (see https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_oraoop_oracle_session_initialization_statements). After each database session is opened to the remote DB, and before starting to read data, this option executes a custom SQL statement (or a PL/SQL block in the case of Oracle).
> Example of usage, relevant to Oracle JDBC:
> ```
> // Start the shell with the Oracle JDBC driver on the classpath:
> //   bin/spark-shell --jars ojdbc6.jar
> // PL/SQL block to run once in each newly opened database session.
> val preambleSQL="""
> begin 
>   execute immediate 'alter session set tracefile_identifier=sparkora'; 
>   execute immediate 'alter session set "_serial_direct_read"=true';
>   execute immediate 'alter session set time_zone=''+02:00''';
> end;
> """
> val df = spark.read.format("jdbc").option("url", "jdbc:oracle:thin:@ORACLEDBSERVER:1521/service_name").option("driver", "oracle.jdbc.driver.OracleDriver").option("dbtable", "(select 1, sysdate, systimestamp, current_timestamp, localtimestamp from dual)").option("user", "MYUSER").option("password", "MYPASSWORD").option("fetchsize",1000).option("sessionInitStatement", preambleSQL).load()
> df.show(5,false)
> ```
> Comments: This proposal has been developed and tested for connecting the Spark JDBC data source to Oracle databases; however, I believe it can be useful for other target DBs too, as it is quite generic.
> Note that the proposed code allows SQL to be injected into the target database. This is not a security concern as such, since it requires password authentication, but beware of the possibilities this opens for injecting user-provided SQL (and PL/SQL).
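
The PL/SQL preamble in the quoted example doubles every single quote that appears inside an `execute immediate` string literal, which is easy to get wrong by hand. A minimal sketch of a helper that builds such a block from plain statements follows. The name `plsqlInitBlock` is hypothetical and is not part of Spark or of the proposed patch; it only assembles the string that would be passed to the tentatively named "sessionInitStatement" option.

```scala
// Hypothetical helper (an assumption for illustration, not Spark API):
// wraps plain SQL statements in one PL/SQL block of "execute immediate" calls.
def plsqlInitBlock(statements: Seq[String]): String = {
  val body = statements.map { s =>
    // Inside a PL/SQL string literal, a single quote is written as two single quotes.
    val escaped = s.replace("'", "''")
    s"  execute immediate '$escaped';"
  }
  (Seq("begin") ++ body ++ Seq("end;")).mkString("\n")
}

// Rebuilds a preamble equivalent to the one in the quoted example.
val preambleSQL = plsqlInitBlock(Seq(
  "alter session set tracefile_identifier=sparkora",
  "alter session set \"_serial_direct_read\"=true",
  "alter session set time_zone='+02:00'"
))
```

The helper only escapes quotes; it does not validate or sanitize the statements, so the caution about user-provided SQL in the description still applies.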



