Posted to reviews@spark.apache.org by "LuciferYang (via GitHub)" <gi...@apache.org> on 2023/03/02 15:53:06 UTC

[GitHub] [spark] LuciferYang commented on a diff in pull request #40252: [SPARK-42555][CONNECT] Add JDBC to DataFrameReader

LuciferYang commented on code in PR #40252:
URL: https://github.com/apache/spark/pull/40252#discussion_r1123323759


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala:
##########
@@ -184,6 +186,67 @@ class DataFrameReader private[sql] (sparkSession: SparkSession) extends Logging
     }
   }
 
+  /**
+   * Construct a `DataFrame` representing the database table accessible via the JDBC URL
+   * `url`, named `table`, using the given connection properties.
+   *
+   * You can find the JDBC-specific option and parameter documentation for reading tables
+   * via JDBC in
+   * <a href="https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#data-source-option">
+   *   Data Source Option</a> in the version you use.
+   *
+   * @since 3.5.0
+   */
+  def jdbc(url: String, table: String, properties: Properties): DataFrame = {
+    // properties should override settings in extraOptions.
+    this.extraOptions ++= properties.asScala
+    // explicit url and dbtable should override all
+    this.extraOptions ++= Seq("url" -> url, "dbtable" -> table)
+    format("jdbc").load()
+  }
+
+  // scalastyle:off line.size.limit
+  /**
+   * Construct a `DataFrame` representing the database table accessible via the JDBC URL
+   * `url`, named `table`. Partitions of the table are retrieved in parallel based on the
+   * parameters passed to this function.
+   *
+   * Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash
+   * your external database systems.
+   *
+   * You can find the JDBC-specific option and parameter documentation for reading tables via JDBC in
+   * <a href="https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#data-source-option">
+   *   Data Source Option</a> in the version you use.
+   *
+   * @param table Name of the table in the external database.
+   * @param columnName Alias of `partitionColumn` option. Refer to `partitionColumn` in
+   *                   <a href="https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#data-source-option">
+   *                     Data Source Option</a> in the version you use.
+   * @param connectionProperties JDBC database connection arguments, a list of arbitrary string
+   *                             tag/value pairs. Normally at least a "user" and "password"
+   *                             property should be included. "fetchsize" can be used to control
+   *                             the number of rows per fetch and "queryTimeout" can be used to
+   *                             limit the number of seconds a Statement object is allowed to
+   *                             execute.
+   * @since 3.5.0

Review Comment:
   Will this not be backported to 3.4.0?
   
   
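Editor's note: for readers following the API change above, here is a minimal usage sketch of the two `DataFrameReader.jdbc` overloads being added. It is written against the existing Spark SQL API that the Connect client mirrors; the URL, table name, credentials, and partitioning values below are placeholders and are not taken from the PR.

    import java.util.Properties
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()

    // Placeholder connection settings; substitute your own JDBC URL and credentials.
    val url = "jdbc:postgresql://localhost:5432/mydb"
    val props = new Properties()
    props.setProperty("user", "spark")
    props.setProperty("password", "secret")

    // Simple form: read the whole table through a single JDBC connection.
    val df = spark.read.jdbc(url, "public.orders", props)

    // Partitioned form (standard DataFrameReader signature, assumed to carry over here):
    // the read is split into numPartitions range queries on the numeric column "order_id".
    val partitioned = spark.read.jdbc(
      url,
      "public.orders",
      columnName = "order_id",
      lowerBound = 1L,
      upperBound = 1000000L,
      numPartitions = 8,
      connectionProperties = props)

The partitioned form issues one query per partition over disjoint ranges of `columnName` between `lowerBound` and `upperBound`, which is why the Scaladoc warns against creating too many partitions against an external database.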



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

