Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/06/18 15:59:26 UTC

[GitHub] [spark] gaborgsomogyi commented on a change in pull request #28863: [SPARK-31336][SQL] Support Oracle Kerberos login in JDBC connector

gaborgsomogyi commented on a change in pull request #28863:
URL: https://github.com/apache/spark/pull/28863#discussion_r442333450



##########
File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
##########
@@ -32,35 +32,29 @@ import org.apache.spark.tags.DockerTest
 
 /**
  * This patch was tested using the Oracle docker. Created this integration suite for the same.
- * The ojdbc6-11.2.0.2.0.jar was to be downloaded from the maven repository. Since there was
- * no jdbc jar available in the maven repository, the jar was downloaded from oracle site
- * manually and installed in the local; thus tested. So, for SparkQA test case run, the
- * ojdbc jar might be manually placed in the local maven repository(com/oracle/ojdbc6/11.2.0.2.0)
- * while Spark QA test run.
  *
  * The following would be the steps to test this
  * 1. Build Oracle database in Docker, please refer below link about how to.
  *    https://github.com/oracle/docker-images/blob/master/OracleDatabase/SingleInstance/README.md
  * 2. export ORACLE_DOCKER_IMAGE_NAME=$ORACLE_DOCKER_IMAGE_NAME
  *    Pull oracle $ORACLE_DOCKER_IMAGE_NAME image - docker pull $ORACLE_DOCKER_IMAGE_NAME
  * 3. Start docker - sudo service docker start
- * 4. Download oracle 11g driver jar and put it in maven local repo:
- *    (com/oracle/ojdbc6/11.2.0.2.0/ojdbc6-11.2.0.2.0.jar)
- * 5. The timeout and interval parameter to be increased from 60,1 to a high value for oracle test
- *    in DockerJDBCIntegrationSuite.scala (Locally tested with 200,200 and executed successfully).
- * 6. Run spark test - ./build/sbt "test-only org.apache.spark.sql.jdbc.OracleIntegrationSuite"
+ * 4. The timeout and interval parameter to be increased to a high value for oracle test in
+ *     DockerJDBCIntegrationSuite.scala (Locally tested with timeout(20.minutes), interval(1.second)
+ *     and executed successfully).
+ * 5. Run spark test - ./build/sbt "test-only org.apache.spark.sql.jdbc.OracleIntegrationSuite"
  *
- * All tests in this suite are ignored because of the dependency with the oracle jar from maven
- * repository.
+ * It has been validated with 18.4.0 Express Edition.

Review comment:
       The exact commands I executed:
   ```
   git clone https://github.com/oracle/docker-images.git
   cd docker-images/OracleDatabase/SingleInstance/dockerfiles
   ./buildDockerImage.sh -v 18.4.0 -x
   export ORACLE_DOCKER_IMAGE_NAME=oracle/database:18.4.0-xe
   # Then run OracleIntegrationSuite manually from IntelliJ
   ```
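   The suite resolves the image with `sys.env("ORACLE_DOCKER_IMAGE_NAME")`, which fails with a bare `NoSuchElementException` if the `export` step above is skipped. A minimal Scala sketch of a friendlier lookup, assuming a hypothetical helper `oracleImageName` that is not part of the suite:

```scala
// Hypothetical helper (not in OracleIntegrationSuite): resolve the Docker image
// name from ORACLE_DOCKER_IMAGE_NAME, failing with an actionable message if unset.
def oracleImageName(env: Map[String, String] = sys.env): String =
  env.getOrElse(
    "ORACLE_DOCKER_IMAGE_NAME",
    throw new IllegalStateException(
      "ORACLE_DOCKER_IMAGE_NAME is not set; build the image and export it first, " +
        "e.g. export ORACLE_DOCKER_IMAGE_NAME=oracle/database:18.4.0-xe"))

// Usage with an explicit environment map instead of the real process environment.
val image = oracleImageName(Map("ORACLE_DOCKER_IMAGE_NAME" -> "oracle/database:18.4.0-xe"))
```

   Taking the environment as a parameter (defaulting to `sys.env`) keeps the lookup testable without touching the real process environment.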
   

##########
File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
##########
@@ -32,35 +32,29 @@ import org.apache.spark.tags.DockerTest
 
 /**
  * This patch was tested using the Oracle docker. Created this integration suite for the same.
- * The ojdbc6-11.2.0.2.0.jar was to be downloaded from the maven repository. Since there was
- * no jdbc jar available in the maven repository, the jar was downloaded from oracle site
- * manually and installed in the local; thus tested. So, for SparkQA test case run, the
- * ojdbc jar might be manually placed in the local maven repository(com/oracle/ojdbc6/11.2.0.2.0)
- * while Spark QA test run.
  *
  * The following would be the steps to test this
  * 1. Build Oracle database in Docker, please refer below link about how to.
  *    https://github.com/oracle/docker-images/blob/master/OracleDatabase/SingleInstance/README.md
  * 2. export ORACLE_DOCKER_IMAGE_NAME=$ORACLE_DOCKER_IMAGE_NAME
  *    Pull oracle $ORACLE_DOCKER_IMAGE_NAME image - docker pull $ORACLE_DOCKER_IMAGE_NAME
  * 3. Start docker - sudo service docker start
- * 4. Download oracle 11g driver jar and put it in maven local repo:
- *    (com/oracle/ojdbc6/11.2.0.2.0/ojdbc6-11.2.0.2.0.jar)
- * 5. The timeout and interval parameter to be increased from 60,1 to a high value for oracle test
- *    in DockerJDBCIntegrationSuite.scala (Locally tested with 200,200 and executed successfully).
- * 6. Run spark test - ./build/sbt "test-only org.apache.spark.sql.jdbc.OracleIntegrationSuite"
+ * 4. The timeout and interval parameter to be increased to a high value for oracle test in
+ *     DockerJDBCIntegrationSuite.scala (Locally tested with timeout(20.minutes), interval(1.second)
+ *     and executed successfully).
+ * 5. Run spark test - ./build/sbt "test-only org.apache.spark.sql.jdbc.OracleIntegrationSuite"
  *
- * All tests in this suite are ignored because of the dependency with the oracle jar from maven
- * repository.
+ * It has been validated with 18.4.0 Express Edition.
  */
+
 @DockerTest
 class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSparkSession {
   import testImplicits._
 
   override val db = new DatabaseOnDocker {
     override val imageName = sys.env("ORACLE_DOCKER_IMAGE_NAME")
     override val env = Map(
-      "ORACLE_ROOT_PASSWORD" -> "oracle"
+      "ORACLE_PWD" -> "oracle"

Review comment:
       Oracle 18.4.0 Express Edition renamed this parameter: the image now expects `ORACLE_PWD` instead of `ORACLE_ROOT_PASSWORD`.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/OracleConnectionProvider.scala
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.jdbc.connection
+
+import java.security.PrivilegedExceptionAction
+import java.sql.{Connection, Driver}
+import java.util.Properties
+
+import org.apache.hadoop.security.UserGroupInformation
+
+import org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions
+
+private[sql] class OracleConnectionProvider(driver: Driver, options: JDBCOptions)

Review comment:
       The implementation is based on the Oracle JDBC client-side security guide: [client-side-security](https://docs.oracle.com/en/database/oracle/oracle-database/19/jjdbc/client-side-security.html).
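       In this provider the only driver-specific switch is `oracle.net.authentication_services=(KERBEROS5)`, returned from `getAdditionalProperties()`. A minimal Scala sketch of how such extra properties could be layered over user-supplied connection options before `Driver.connect`; the helper `connectionProperties` and its merge order are assumptions, not Spark's actual code:

```scala
import java.util.Properties

// Same setting as getAdditionalProperties() in the diff: tells the Oracle JDBC
// driver to use Kerberos for authentication.
def oracleKerberosProperties(): Properties = {
  val result = new Properties()
  result.setProperty("oracle.net.authentication_services", "(KERBEROS5)")
  result
}

// Hypothetical merge step (not Spark's actual code): copy the user's connection
// options first, then overlay the provider-specific properties on top.
def connectionProperties(userOptions: Map[String, String], extra: Properties): Properties = {
  val merged = new Properties()
  userOptions.foreach { case (k, v) => merged.setProperty(k, v) }
  extra.stringPropertyNames().forEach(k => merged.setProperty(k, extra.getProperty(k)))
  merged
}

// Usage: "defaultRowPrefetch" is just an arbitrary example option.
val props = connectionProperties(Map("defaultRowPrefetch" -> "50"), oracleKerberosProperties())
```

       Overlaying the provider properties last means a user option cannot accidentally switch Kerberos off; whether Spark uses the same precedence is not shown in this diff.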

##########
File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
##########
@@ -32,35 +32,29 @@ import org.apache.spark.tags.DockerTest
 
 /**
  * This patch was tested using the Oracle docker. Created this integration suite for the same.
- * The ojdbc6-11.2.0.2.0.jar was to be downloaded from the maven repository. Since there was
- * no jdbc jar available in the maven repository, the jar was downloaded from oracle site
- * manually and installed in the local; thus tested. So, for SparkQA test case run, the
- * ojdbc jar might be manually placed in the local maven repository(com/oracle/ojdbc6/11.2.0.2.0)
- * while Spark QA test run.
  *
  * The following would be the steps to test this
  * 1. Build Oracle database in Docker, please refer below link about how to.
  *    https://github.com/oracle/docker-images/blob/master/OracleDatabase/SingleInstance/README.md
  * 2. export ORACLE_DOCKER_IMAGE_NAME=$ORACLE_DOCKER_IMAGE_NAME
  *    Pull oracle $ORACLE_DOCKER_IMAGE_NAME image - docker pull $ORACLE_DOCKER_IMAGE_NAME
  * 3. Start docker - sudo service docker start
- * 4. Download oracle 11g driver jar and put it in maven local repo:
- *    (com/oracle/ojdbc6/11.2.0.2.0/ojdbc6-11.2.0.2.0.jar)
- * 5. The timeout and interval parameter to be increased from 60,1 to a high value for oracle test
- *    in DockerJDBCIntegrationSuite.scala (Locally tested with 200,200 and executed successfully).
- * 6. Run spark test - ./build/sbt "test-only org.apache.spark.sql.jdbc.OracleIntegrationSuite"
+ * 4. The timeout and interval parameter to be increased to a high value for oracle test in
+ *     DockerJDBCIntegrationSuite.scala (Locally tested with timeout(20.minutes), interval(1.second)

Review comment:
       Changing the timeout and interval manually feels a bit hacky, but increasing them globally would be overkill.
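       For context, `timeout(20.minutes), interval(1.second)` look like the arguments of ScalaTest's `eventually`, which keeps retrying a block (here, presumably, connecting to the freshly started container) until it succeeds or the timeout expires. A simplified, stdlib-only Scala sketch of those semantics, using a hypothetical `retryUntil` helper rather than ScalaTest itself:

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

// Hypothetical helper mimicking eventually(timeout(...), interval(...)):
// re-run `check` every `interval` until it succeeds, or rethrow the last
// failure once `timeout` has elapsed.
def retryUntil[T](timeout: FiniteDuration, interval: FiniteDuration)(check: => T): T = {
  val deadline = timeout.fromNow
  @tailrec
  def loop(): T = Try(check) match {
    case Success(value) => value
    case Failure(e) if deadline.isOverdue() => throw e
    case Failure(_) =>
      Thread.sleep(interval.toMillis)
      loop()
  }
  loop()
}

// Usage: a fake "database" that only accepts connections on the third attempt.
var attempts = 0
val status = retryUntil(timeout = 5.seconds, interval = 10.millis) {
  attempts += 1
  if (attempts < 3) throw new IllegalStateException("database not ready yet")
  "connected"
}
```

       A large timeout only costs time when the database is slow to start, since the loop returns as soon as the check passes; that is why raising it for the Oracle suite alone is cheap, while raising it for every Docker suite would mainly delay failure reports.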

##########
File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
##########
@@ -69,6 +63,7 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSpark
   }
 
   override def dataPreparation(conn: Connection): Unit = {
+    conn.setAutoCommit(false)

Review comment:
       In 18.4.0 Express Edition, auto-commit is enabled by default, so it has to be disabled explicitly here.
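       JDBC connections start in auto-commit mode, so each statement issued in `dataPreparation` would otherwise be committed on its own. A minimal Scala sketch of the explicit-transaction pattern that `setAutoCommit(false)` enables, assuming a hypothetical `withTransaction` helper and a recording `java.lang.reflect.Proxy` in place of a real Oracle connection:

```scala
import java.lang.reflect.{InvocationHandler, Method, Proxy}
import java.sql.Connection
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical helper: run `body` as one explicit transaction. Auto-commit is
// switched off first, so nothing is visible until commit(); failures roll back.
def withTransaction[T](conn: Connection)(body: Connection => T): T = {
  val previous = conn.getAutoCommit
  conn.setAutoCommit(false)
  try {
    val result = body(conn)
    conn.commit()
    result
  } catch {
    case NonFatal(e) =>
      conn.rollback()
      throw e
  } finally {
    conn.setAutoCommit(previous) // restore the caller's original mode
  }
}

// Test double instead of a real Oracle connection: a dynamic proxy that records
// calls and keeps the auto-commit flag (true by default, as in JDBC).
val calls = ArrayBuffer.empty[String]
var autoCommit = true
val fakeConn = Proxy.newProxyInstance(
  classOf[Connection].getClassLoader,
  Array[Class[_]](classOf[Connection]),
  new InvocationHandler {
    override def invoke(proxy: AnyRef, method: Method, args: Array[AnyRef]): AnyRef =
      method.getName match {
        case "getAutoCommit" => java.lang.Boolean.valueOf(autoCommit)
        case "setAutoCommit" =>
          autoCommit = args(0).asInstanceOf[java.lang.Boolean].booleanValue
          calls += s"setAutoCommit($autoCommit)"
          null
        case other =>
          calls += other // commit / rollback
          null
      }
  }
).asInstanceOf[Connection]

// Usage: the body stands in for the INSERT statements of dataPreparation.
withTransaction(fakeConn) { _ => calls += "insert"; () }
```

       The recorded sequence shows auto-commit being turned off before the work, a single commit afterwards, and the original mode restored.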

##########
File path: pom.xml
##########
@@ -984,6 +984,12 @@
         <version>8.2.2.jre8</version>
         <scope>test</scope>
       </dependency>
+      <dependency>
+        <groupId>com.oracle.database.jdbc</groupId>

Review comment:
       This is the latest version of the Oracle JDBC driver that supports JDK 8, JDK 9, and JDK 11: https://mvnrepository.com/artifact/com.oracle.database.jdbc/ojdbc8

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/OracleConnectionProvider.scala
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.jdbc.connection
+
+import java.security.PrivilegedExceptionAction
+import java.sql.{Connection, Driver}
+import java.util.Properties
+
+import org.apache.hadoop.security.UserGroupInformation
+
+import org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions
+
+private[sql] class OracleConnectionProvider(driver: Driver, options: JDBCOptions)
+    extends SecureConnectionProvider(driver, options) {
+  override val appEntry: String = "kprb5module"
+
+  override def getConnection(): Connection = {
+    setAuthenticationConfigIfNeeded()
+    UserGroupInformation.loginUserFromKeytabAndReturnUGI(options.principal, options.keytab).doAs(
+      new PrivilegedExceptionAction[Connection]() {
+        override def run(): Connection = {
+          OracleConnectionProvider.super.getConnection()
+        }
+      }
+    )
+  }
+
+  override def getAdditionalProperties(): Properties = {
+    val result = new Properties()
+    // This prop needed to turn on kerberos authentication in the JDBC driver
+    result.put("oracle.net.authentication_services", "(KERBEROS5)");
+    result
+  }
+
+  override def setAuthenticationConfigIfNeeded(): Unit = SecurityConfigurationLock.synchronized {

Review comment:
       Synchronization is important here to avoid race conditions, just as in the other providers.
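       Why the lock matters: the JAAS login configuration is JVM-wide, and `setAuthenticationConfigIfNeeded` presumably performs a check-then-act on it (install the `kprb5module` entry only if it is missing). A minimal Scala sketch of the same pattern, assuming a hypothetical global map `GlobalLoginConfig` in place of `javax.security.auth.login.Configuration`, a hypothetical `ExampleSecurityConfigLock` in place of Spark's `SecurityConfigurationLock`, and a hypothetical second entry `otherEntry` standing in for another provider:

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for the JVM-wide JAAS configuration: one global map of
// app entries that every provider reads and replaces (read-modify-write).
object GlobalLoginConfig {
  @volatile var entries: Map[String, String] = Map.empty
}

// Hypothetical stand-in for Spark's SecurityConfigurationLock: a single lock
// object shared by all providers.
object ExampleSecurityConfigLock

// Counts how many times an entry was actually installed.
val installs = new AtomicInteger(0)

def setAuthenticationConfigIfNeeded(appEntry: String, loginModule: String): Unit =
  ExampleSecurityConfigLock.synchronized {
    // Check-then-act on shared state: without the lock, two threads could both see
    // the entry as missing, and one write could silently drop the other's entry.
    if (!GlobalLoginConfig.entries.contains(appEntry)) {
      GlobalLoginConfig.entries = GlobalLoginConfig.entries + (appEntry -> loginModule)
      installs.incrementAndGet()
    }
  }

// Usage: many threads race to install two different entries at the same moment.
val pool = Executors.newFixedThreadPool(8)
val start = new CountDownLatch(1)
for (i <- 0 until 100) {
  val entry = if (i % 2 == 0) "kprb5module" else "otherEntry"
  pool.execute(new Runnable {
    override def run(): Unit = {
      start.await() // release all threads together to maximize contention
      setAuthenticationConfigIfNeeded(entry, "com.sun.security.auth.module.Krb5LoginModule")
    }
  })
}
start.countDown()
pool.shutdown()
val finished = pool.awaitTermination(10, TimeUnit.SECONDS)
```

       Because every provider goes through the same lock, each entry is installed exactly once and neither provider's write overwrites the other's.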




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


