Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/03 17:47:54 UTC

[GitHub] [spark] huaxingao opened a new pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

huaxingao opened a new pull request #29939:
URL: https://github.com/apache/spark/pull/29939


   
   
   ### What changes were proposed in this pull request?
   Support multiple catalogs in the following use case:
   ```
   DataFrameReader.jdbc(url, "catalog.db.tbl", properties)
   DataFrameReader.jdbc(url, "catalog.db.tbl", columnName, lowerBound, upperBound, numPartitions, properties)
   DataFrameReader.jdbc(url, "catalog.db.tbl", predicates, properties)
   ```
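
   To make the table-name handling concrete, here is a minimal sketch (plain Python, outside of Spark, for illustration only) of how a multipart name such as `catalog.db.tbl` splits into catalog, namespace, and table parts. The function name and the session-catalog fallback are assumptions; Spark's real `parseMultipartIdentifier` additionally handles quoting and validation.

   ```python
   def split_identifier(name: str):
       """Split a dotted multipart identifier into (catalog, namespace, table).

       A single-part name has no catalog prefix (session catalog case);
       the first part of a multipart name is treated as the catalog.
       """
       parts = name.split(".")
       if len(parts) < 2:
           return None, [], parts[-1]  # bare table name, no catalog
       catalog, *namespace, table = parts
       return catalog, namespace, table

   print(split_identifier("catalog.db.tbl"))  # ('catalog', ['db'], 'tbl')
   ```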
   
   
   ### Why are the changes needed?
   
   Make DataFrameReader.jdbc work for DataSource V2.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   New tests.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703777702


   **[Test build #129418 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129418/testReport)** for PR 29939 at commit [`5fb2a1a`](https://github.com/apache/spark/commit/5fb2a1aedf696356124b004fff723256e232632e).




[GitHub] [spark] SparkQA removed a comment on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703777702


   **[Test build #129418 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129418/testReport)** for PR 29939 at commit [`5fb2a1a`](https://github.com/apache/spark/commit/5fb2a1aedf696356124b004fff723256e232632e).




[GitHub] [spark] SparkQA commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-704444581


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34064/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703872868


   Merged build finished. Test FAILed.




[GitHub] [spark] huaxingao commented on a change in pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #29939:
URL: https://github.com/apache/spark/pull/29939#discussion_r499959316



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -333,9 +333,26 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
     assertNoSpecifiedSchema("jdbc")
     // properties should override settings in extraOptions.
     this.extraOptions ++= properties.asScala
-    // explicit url and dbtable should override all
-    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url, JDBCOptions.JDBC_TABLE_NAME -> table)
-    format("jdbc").load()
+    // explicit url should override all
+    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url)
+
+    import sparkSession.sessionState.analyzer.NonSessionCatalogAndIdentifier
+
+    this.source = "jdbc"
+    if (table.contains(" ")) { // if table is not a table name, e.g. a SELECT statement

Review comment:
       I don't have a good way to check this. It could be a SELECT statement, and I think it could also be another construct such as a JOIN or an inline table. It seems to me that if it's a table name, it shouldn't contain a space. Please let me know if you have a better way to check.






[GitHub] [spark] AmplabJenkins commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703872868








[GitHub] [spark] SparkQA removed a comment on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703976184


   **[Test build #129431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129431/testReport)** for PR 29939 at commit [`e1b2ad6`](https://github.com/apache/spark/commit/e1b2ad65127f5a687d850eca5c057fffe3f08217).




[GitHub] [spark] SparkQA removed a comment on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703140382


   **[Test build #129379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129379/testReport)** for PR 29939 at commit [`ffe8e42`](https://github.com/apache/spark/commit/ffe8e42ac3dbf9b9a528458ea5b0a82a2c65e295).




[GitHub] [spark] cloud-fan commented on a change in pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29939:
URL: https://github.com/apache/spark/pull/29939#discussion_r501441101



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
##########
@@ -221,4 +221,21 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession {
       checkAnswer(sql("SELECT name, id FROM h2.test.abc"), Row("bob", 4))
     }
   }
+
+  test("DataFrameReader: jdbc") {
+    withTable("h2.test.abc") {
+      sql("CREATE TABLE h2.test.abc USING _ AS SELECT * FROM h2.test.people")
+      val properties = new Properties()
+      val df1 = spark.read.jdbc(url, "h2.test.abc", properties)

Review comment:
       I'm a bit confused about this. There are 3 ways to use the JDBC data source:
   1. use the `DataFrameReader/Writer` API to access JDBC tables/queries directly.
   2. register as a table, and access the table.
   3. register as a catalog, and access tables inside the catalog.
   
   `spark.read.jdbc(url, "h2.test.abc", properties)` seems like a mix of 1 and 3. What's the use case you are targeting?
   






[GitHub] [spark] cloud-fan commented on a change in pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29939:
URL: https://github.com/apache/spark/pull/29939#discussion_r501488458



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
##########
@@ -221,4 +221,21 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession {
       checkAnswer(sql("SELECT name, id FROM h2.test.abc"), Row("bob", 4))
     }
   }
+
+  test("DataFrameReader: jdbc") {
+    withTable("h2.test.abc") {
+      sql("CREATE TABLE h2.test.abc USING _ AS SELECT * FROM h2.test.people")
+      val properties = new Properties()
+      val df1 = spark.read.jdbc(url, "h2.test.abc", properties)

Review comment:
       We need to distinguish between APIs and shortcuts. For `DataFrameWriter`, it has 3 APIs: `save`, `insertInto` and `saveAsTable`. `parquet`, `json`, `jdbc`, etc. are shortcuts that eventually call `save()`.
   
   For `insertInto` and `saveAsTable`, they take Spark table name and should support multi catalogs. For `save`, it interacts with data source directly with options, and thus shouldn't support multi-catalog.
   
   For this particular test, it looks confusing as the registered JDBC catalog should already have the url config, why do we need to specify it again in `spark.read.jdbc`?






[GitHub] [spark] huaxingao commented on a change in pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #29939:
URL: https://github.com/apache/spark/pull/29939#discussion_r500460870



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
##########
@@ -46,7 +47,8 @@ class UnresolvedException[TreeType <: TreeNode[_]](tree: TreeType, function: Str
 case class UnresolvedRelation(
     multipartIdentifier: Seq[String],
     options: CaseInsensitiveStringMap = CaseInsensitiveStringMap.empty(),
-    override val isStreaming: Boolean = false)
+    override val isStreaming: Boolean = false,
+    partitions: Seq[Partition] = Seq.empty[Partition])

Review comment:
       Yes, `partitions` is only for the JDBC V2 path, but it seems to me that we can't use `options` to pass the partition info into the JDBC scan. In `JDBCTable.newScanBuilder`, `jdbcOptions` is merged with `options` in a `CaseInsensitiveStringMap`, but `partitions` is not a `JDBCOptions` entry, so it won't be passed to `JDBCScanBuilder`. I am not sure we should add partitions to `JDBCOptions`.
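
       As a hypothetical illustration of the constraint being discussed (the option key, helper names, and separator below are invented for the sketch, not Spark APIs): an options map carries only strings, so partition predicates would have to be flattened into a single option value and re-parsed on the scan side, which is why `options` is an awkward channel for `Seq[Partition]`.

   ```python
   # Sketch: round-tripping a list of partition predicates through a
   # string-to-string options map. The separator choice is arbitrary and
   # would have to be guaranteed absent from every predicate.
   SEP = "\x00"

   def encode_partitions(predicates):
       """Flatten predicate strings into one option value."""
       return SEP.join(predicates)

   def decode_partitions(option_value):
       """Recover the predicate list on the scan side."""
       return option_value.split(SEP) if option_value else []

   opts = {"partition.predicates": encode_partitions(["id < 10", "id >= 10"])}
   print(decode_partitions(opts["partition.predicates"]))  # ['id < 10', 'id >= 10']
   ```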






[GitHub] [spark] SparkQA commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703986213


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34038/
   




[GitHub] [spark] huaxingao commented on a change in pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #29939:
URL: https://github.com/apache/spark/pull/29939#discussion_r501481894



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
##########
@@ -221,4 +221,21 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession {
       checkAnswer(sql("SELECT name, id FROM h2.test.abc"), Row("bob", 4))
     }
   }
+
+  test("DataFrameReader: jdbc") {
+    withTable("h2.test.abc") {
+      sql("CREATE TABLE h2.test.abc USING _ AS SELECT * FROM h2.test.people")
+      val properties = new Properties()
+      val df1 = spark.read.jdbc(url, "h2.test.abc", properties)

Review comment:
       Thanks for taking a look at this. I did this because other `DataFrameReader/Writer` APIs have already added support for DataSource V2 multiple catalogs, for example `DataFrameWriter.insertInto` and `DataFrameWriter.saveAsTable`, so I thought all the `DataFrameReader/Writer` APIs needed to support DataSource V2 multiple catalogs as well.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
##########
@@ -221,4 +221,21 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession {
       checkAnswer(sql("SELECT name, id FROM h2.test.abc"), Row("bob", 4))
     }
   }
+
+  test("DataFrameReader: jdbc") {
+    withTable("h2.test.abc") {
+      sql("CREATE TABLE h2.test.abc USING _ AS SELECT * FROM h2.test.people")
+      val properties = new Properties()
+      val df1 = spark.read.jdbc(url, "h2.test.abc", properties)

Review comment:
       It seems to me that these `jdbc` APIs take a Spark table name and probably should support multiple catalogs as well. Otherwise, we should say explicitly in the doc that the table names in these APIs cannot be multi-catalog table names.
   I will close these two PRs.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703815860








[GitHub] [spark] SparkQA commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-704576939


   **[Test build #129457 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129457/testReport)** for PR 29939 at commit [`135eb08`](https://github.com/apache/spark/commit/135eb08a14dbee9fa5e9ae9b9748a0da43035710).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703990730








[GitHub] [spark] SparkQA commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-704454163


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34064/
   




[GitHub] [spark] maropu commented on a change in pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29939:
URL: https://github.com/apache/spark/pull/29939#discussion_r500025419



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -333,9 +333,26 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
     assertNoSpecifiedSchema("jdbc")
     // properties should override settings in extraOptions.
     this.extraOptions ++= properties.asScala
-    // explicit url and dbtable should override all
-    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url, JDBCOptions.JDBC_TABLE_NAME -> table)
-    format("jdbc").load()
+    // explicit url should override all
+    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url)
+
+    import sparkSession.sessionState.analyzer.NonSessionCatalogAndIdentifier
+
+    this.source = "jdbc"
+    if (table.contains(" ")) { // if table is not a table name, e.g. a SELECT statement

Review comment:
       hm, I see. But what if the table name has spaces, e.g., \`table  name\`? How about using `try parseMultipartIdentifier(table) catch ...` instead? It's not an elegant approach, but it looks better than the current check, I think.
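
       To make the concern concrete, here is a small backtick-aware splitter (a Python sketch for illustration, not Spark's actual parser) showing why a bare space check misclassifies a quoted name like \`table  name\`: it is a legal single-part identifier that contains spaces, so splitting must respect backtick quoting the way `parseMultipartIdentifier` does.

   ```python
   def parse_multipart(name: str):
       """Split a dotted identifier on '.', honoring backtick quoting.

       A doubled backtick inside a quoted part is an escaped literal backtick.
       """
       parts, buf, quoted = [], [], False
       i = 0
       while i < len(name):
           c = name[i]
           if c == "`":
               if quoted and i + 1 < len(name) and name[i + 1] == "`":
                   buf.append("`")  # escaped backtick inside quotes
                   i += 1
               else:
                   quoted = not quoted  # open or close a quoted part
           elif c == "." and not quoted:
               parts.append("".join(buf))
               buf = []
           else:
               buf.append(c)
           i += 1
       parts.append("".join(buf))
       return parts

   print(parse_multipart("h2.`table  name`"))  # ['h2', 'table  name']
   ```

       A plain `" " in table` check would route `h2.\`table  name\`` down the query path even though it is a valid two-part table name.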






[GitHub] [spark] SparkQA commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703140382


   **[Test build #129379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129379/testReport)** for PR 29939 at commit [`ffe8e42`](https://github.com/apache/spark/commit/ffe8e42ac3dbf9b9a528458ea5b0a82a2c65e295).




[GitHub] [spark] SparkQA commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703146789


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33987/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-704454186








[GitHub] [spark] SparkQA commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703872393


   **[Test build #129418 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129418/testReport)** for PR 29939 at commit [`5fb2a1a`](https://github.com/apache/spark/commit/5fb2a1aedf696356124b004fff723256e232632e).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-704454186








[GitHub] [spark] AmplabJenkins commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703150394








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703141982


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129379/
   Test FAILed.




[GitHub] [spark] maropu commented on a change in pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29939:
URL: https://github.com/apache/spark/pull/29939#discussion_r500025419



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -333,9 +333,26 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
     assertNoSpecifiedSchema("jdbc")
     // properties should override settings in extraOptions.
     this.extraOptions ++= properties.asScala
-    // explicit url and dbtable should override all
-    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url, JDBCOptions.JDBC_TABLE_NAME -> table)
-    format("jdbc").load()
+    // explicit url should override all
+    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url)
+
+    import sparkSession.sessionState.analyzer.NonSessionCatalogAndIdentifier
+
+    this.source = "jdbc"
+    if (table.contains(" ")) { // if table is not a table name, e.g. a SELECT statement

Review comment:
       Hm, I see. But what if the table name has spaces, e.g., \`table  name\`? How about using `try parseMultipartIdentifier(table) catch ...` instead? It is not a smart way, but I think it looks better than the current check.
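
   The fallback suggested here can be sketched in isolation. The parser below is a hypothetical, heavily simplified stand-in for Spark's `CatalystSqlParser.parseMultipartIdentifier`; only the control flow is the point: try to parse the string as a (possibly backquoted) multi-part identifier, and fall back to passing it through as a raw `dbtable` expression (e.g. a parenthesized SELECT) when parsing fails.

```scala
object TableNameFallback {
  // Hypothetical simplified identifier grammar: dot-separated parts, each
  // either a plain identifier or a backquoted one (backquoted parts may
  // contain spaces; dots inside backquotes are not handled in this sketch).
  private val part = "(?:`[^`]+`|[A-Za-z_][A-Za-z0-9_]*)"
  private val multipart = (part + "(?:\\." + part + ")*").r

  def parseMultipartIdentifier(s: String): Seq[String] = s match {
    case multipart() =>
      s.split('.').toSeq.map(_.stripPrefix("`").stripSuffix("`"))
    case _ =>
      throw new IllegalArgumentException(s"cannot parse as identifier: $s")
  }

  // Try the table string as a multi-part name first; if parsing fails,
  // keep it verbatim so it can still be sent to the source as dbtable.
  def classify(table: String): Either[Seq[String], String] =
    try Left(parseMultipartIdentifier(table))
    catch { case _: IllegalArgumentException => Right(table) }
}
```

   With this shape, a backquoted name containing spaces still parses as an identifier instead of being misclassified as a query by a `contains(" ")` check.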






[GitHub] [spark] maropu commented on a change in pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29939:
URL: https://github.com/apache/spark/pull/29939#discussion_r501423762



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
##########
@@ -46,7 +47,8 @@ class UnresolvedException[TreeType <: TreeNode[_]](tree: TreeType, function: Str
 case class UnresolvedRelation(
     multipartIdentifier: Seq[String],
     options: CaseInsensitiveStringMap = CaseInsensitiveStringMap.empty(),
-    override val isStreaming: Boolean = false)
+    override val isStreaming: Boolean = false,
+    partitions: Seq[Partition] = Seq.empty[Partition])

Review comment:
       IMHO, adding it in `JDBCOptions` looks better than adding it in `UnresolvedRelation`: the partition info is specific to JDBC scans, and having physical partition info as a class field of the logical plan is a bit weird. But it's okay to wait for comments from the other reviewers.
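
   The alternative being argued for can be shown with stripped-down stand-ins; `JDBCPartition`, `JDBCOptionsLike` and `UnresolvedRelationLike` below are hypothetical simplifications of Spark's `JDBCPartition`, `JDBCOptions` and `UnresolvedRelation`, shown only to contrast where the partition info would live.

```scala
// JDBC-specific physical detail stays with the JDBC options object...
final case class JDBCPartition(whereClause: String, idx: Int)

final case class JDBCOptionsLike(
    parameters: Map[String, String],
    partitions: Seq[JDBCPartition] = Nil)

// ...while the generic logical plan node carries no source-specific fields.
final case class UnresolvedRelationLike(
    multipartIdentifier: Seq[String],
    isStreaming: Boolean = false)

object PartitionPlacementSketch {
  // Example: a JDBC scan split into two partitions by WHERE clauses.
  def example(): JDBCOptionsLike =
    JDBCOptionsLike(
      Map("url" -> "jdbc:h2:mem:testdb", "dbtable" -> "test.people"),
      Seq(JDBCPartition("id < 10", 0), JDBCPartition("id >= 10", 1)))
}
```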






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703141980


   Merged build finished. Test FAILed.




[GitHub] [spark] huaxingao commented on a change in pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #29939:
URL: https://github.com/apache/spark/pull/29939#discussion_r501505124



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
##########
@@ -221,4 +221,21 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession {
       checkAnswer(sql("SELECT name, id FROM h2.test.abc"), Row("bob", 4))
     }
   }
+
+  test("DataFrameReader: jdbc") {
+    withTable("h2.test.abc") {
+      sql("CREATE TABLE h2.test.abc USING _ AS SELECT * FROM h2.test.people")
+      val properties = new Properties()
+      val df1 = spark.read.jdbc(url, "h2.test.abc", properties)

Review comment:
       It seems to me that these `jdbc` APIs take a Spark table name and probably should support multiple catalogs as well. Otherwise, we might want to state explicitly in the doc that the table names in these APIs cannot be multi-catalog table names.
   I will close these two PRs.






[GitHub] [spark] SparkQA commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-704042783


   **[Test build #129431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129431/testReport)** for PR 29939 at commit [`e1b2ad6`](https://github.com/apache/spark/commit/e1b2ad65127f5a687d850eca5c057fffe3f08217).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703990730








[GitHub] [spark] AmplabJenkins commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-704043485








[GitHub] [spark] huaxingao commented on a change in pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #29939:
URL: https://github.com/apache/spark/pull/29939#discussion_r500021925



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -333,9 +333,26 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
     assertNoSpecifiedSchema("jdbc")
     // properties should override settings in extraOptions.
     this.extraOptions ++= properties.asScala
-    // explicit url and dbtable should override all
-    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url, JDBCOptions.JDBC_TABLE_NAME -> table)
-    format("jdbc").load()
+    // explicit url should override all
+    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url)
+
+    import sparkSession.sessionState.analyzer.NonSessionCatalogAndIdentifier
+
+    this.source = "jdbc"
+    if (table.contains(" ")) { // if table is not a table name, e.g. a SELECT statement

Review comment:
       `JDBCWriteSuite.test("SPARK-10849: create table using user specified column type and verify on target table")` will fail. 
   ```
         val query =
           """
             |(SELECT column_name, type_name, character_maximum_length
             | FROM information_schema.columns WHERE table_name = 'DBCOLTYPETEST')
           """.stripMargin
         val rows = spark.read.jdbc(url1, query, properties).collect()
   ```
   
   Error message:
   ```
   extraneous input '(' expecting {'ADD', 'AFTER', 'ALL', 'ALTER', 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', DATABASES, 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 'DIV', 'DROP', 'ELSE', 'END', 'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 'FIELDS', 'FILTER', 'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', 'FORMAT', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNC
 TIONS', 'GLOBAL', 'GRANT', 'GROUP', 'GROUPING', 'HAVING', 'IF', 'IGNORE', 'IMPORT', 'IN', 'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 'MATCHED', 'MERGE', 'MSCK', 'NAMESPACE', 'NAMESPACES', 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 'OVERLAY', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENT', 'PIVOT', 'PLACING', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 'PROPERTIES', 'PURGE', 'QUERY', 'RANGE', 'RECORDREADER', 'RECORDWRITER', 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', 'RESET', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SELECT', 'SEMI'
 , 'SEPARATED', 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHOW', 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'TABLE', 'TABLES', 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TIME', 'TO', 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', 'TRUE', 'TRUNCATE', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'ZONE', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 2, pos 0)
   
   == SQL ==
   
   (SELECT column_name, type_name, character_maximum_length
   ^^^
    FROM information_schema.columns WHERE table_name = 'DBCOLTYPETEST')
           
   
   org.apache.spark.sql.catalyst.parser.ParseException: 
   extraneous input '(' expecting {'ADD', 'AFTER', 'ALL', 'ALTER', 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 'CASCADE', 'CASE', 'CAST', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 'CLUSTERED', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DATA', 'DATABASE', DATABASES, 'DBPROPERTIES', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 'DIV', 'DROP', 'ELSE', 'END', 'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 'FIELDS', 'FILTER', 'FILEFORMAT', 'FIRST', 'FOLLOWING', 'FOR', 'FOREIGN', 'FORMAT', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNC
 TIONS', 'GLOBAL', 'GRANT', 'GROUP', 'GROUPING', 'HAVING', 'IF', 'IGNORE', 'IMPORT', 'IN', 'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEYS', 'LAST', 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 'LIKE', 'LIMIT', 'LINES', 'LIST', 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 'MATCHED', 'MERGE', 'MSCK', 'NAMESPACE', 'NAMESPACES', 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 'OVERLAY', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENT', 'PIVOT', 'PLACING', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 'PROPERTIES', 'PURGE', 'QUERY', 'RANGE', 'RECORDREADER', 'RECORDWRITER', 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 'RENAME', 'REPAIR', 'REPLACE', 'RESET', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 'ROLLUP', 'ROW', 'ROWS', 'SCHEMA', 'SELECT', 'SEMI'
 , 'SEPARATED', 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHOW', 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'TABLE', 'TABLES', 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TIME', 'TO', 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', 'TRUE', 'TRUNCATE', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'ZONE', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 2, pos 0)
   
   == SQL ==
   
   (SELECT column_name, type_name, character_maximum_length
   ^^^
    FROM information_schema.columns WHERE table_name = 'DBCOLTYPETEST')
           
   
   	at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:263)
   	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:130)
   	at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
   	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(ParseDriver.scala:67)
   	at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:347)
   ```






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703150396


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/33987/
   Test FAILed.




[GitHub] [spark] huaxingao commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
huaxingao commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-705294537


   @cloud-fan one more to review. Thank you!




[GitHub] [spark] AmplabJenkins commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703815860








[GitHub] [spark] SparkQA commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-704421674


   **[Test build #129457 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129457/testReport)** for PR 29939 at commit [`135eb08`](https://github.com/apache/spark/commit/135eb08a14dbee9fa5e9ae9b9748a0da43035710).




[GitHub] [spark] huaxingao commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
huaxingao commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703984061


   cc @maropu 




[GitHub] [spark] SparkQA commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703976184


   **[Test build #129431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129431/testReport)** for PR 29939 at commit [`e1b2ad6`](https://github.com/apache/spark/commit/e1b2ad65127f5a687d850eca5c057fffe3f08217).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-704577708








[GitHub] [spark] SparkQA commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703150386


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33987/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29939:
URL: https://github.com/apache/spark/pull/29939#discussion_r501441101



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
##########
@@ -221,4 +221,21 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession {
       checkAnswer(sql("SELECT name, id FROM h2.test.abc"), Row("bob", 4))
     }
   }
+
+  test("DataFrameReader: jdbc") {
+    withTable("h2.test.abc") {
+      sql("CREATE TABLE h2.test.abc USING _ AS SELECT * FROM h2.test.people")
+      val properties = new Properties()
+      val df1 = spark.read.jdbc(url, "h2.test.abc", properties)

Review comment:
       I'm a bit confused about this. There are 3 ways to use JDBC data source:
   1. use `DataFrameReader/Writer` API to access JDBC tables/queries directly.
   2. register as a table, and access the table.
   3. register as a catalog, and access tables inside the catalog.
   
   `spark.read.jdbc(url, "h2.test.abc", properties)` seems like a mix of 1 and 3. What's the use case you are targeting?
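
   For concreteness, the three modes look roughly like this in user code. This is an illustrative sketch only: it assumes a live `SparkSession` bound to `spark`, a reachable JDBC `url`, and, for mode 3, Spark's built-in `JDBCTableCatalog`; the table and catalog names are placeholders, and a `driver` option may also be needed depending on the database.

```scala
// 1. DataFrameReader/Writer API: talk to the JDBC source directly via options.
val df1 = spark.read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "test.people") // table name in the remote database
  .load()

// 2. Register as a table, then access it by its Spark table name.
spark.sql(
  s"""CREATE TABLE people_jdbc
     |USING jdbc
     |OPTIONS (url '$url', dbtable 'test.people')""".stripMargin)
val df2 = spark.table("people_jdbc")

// 3. Register a catalog, then access tables inside it with multi-part names.
spark.conf.set("spark.sql.catalog.h2",
  "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog")
spark.conf.set("spark.sql.catalog.h2.url", url)
val df3 = spark.table("h2.test.people")
```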
   

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
##########
@@ -221,4 +221,21 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession {
       checkAnswer(sql("SELECT name, id FROM h2.test.abc"), Row("bob", 4))
     }
   }
+
+  test("DataFrameReader: jdbc") {
+    withTable("h2.test.abc") {
+      sql("CREATE TABLE h2.test.abc USING _ AS SELECT * FROM h2.test.people")
+      val properties = new Properties()
+      val df1 = spark.read.jdbc(url, "h2.test.abc", properties)

Review comment:
       We need to distinguish between APIs and shortcuts. `DataFrameWriter` has 3 APIs: `save`, `insertInto` and `saveAsTable`; `parquet`, `json`, `jdbc`, etc. are shortcuts that eventually call `save()`.
   
   `insertInto` and `saveAsTable` take a Spark table name and should support multiple catalogs. `save` interacts with the data source directly through options, and thus shouldn't support multiple catalogs.
   
   For this particular test, it looks confusing: the registered JDBC catalog should already have the url config, so why do we need to specify it again in `spark.read.jdbc`?

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
##########
@@ -221,4 +221,21 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession {
       checkAnswer(sql("SELECT name, id FROM h2.test.abc"), Row("bob", 4))
     }
   }
+
+  test("DataFrameReader: jdbc") {
+    withTable("h2.test.abc") {
+      sql("CREATE TABLE h2.test.abc USING _ AS SELECT * FROM h2.test.people")
+      val properties = new Properties()
+      val df1 = spark.read.jdbc(url, "h2.test.abc", properties)

Review comment:
       In the doc of `spark.read.jdbc`: `@param table Name of the table in the external database.`
   
   This is not a Spark table name, but a table name in the remote JDBC server such as MySQL.






[GitHub] [spark] huaxingao closed pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
huaxingao closed pull request #29939:
URL: https://github.com/apache/spark/pull/29939


   




[GitHub] [spark] AmplabJenkins commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-704577708








[GitHub] [spark] AmplabJenkins commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703141980










[GitHub] [spark] cloud-fan commented on a change in pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29939:
URL: https://github.com/apache/spark/pull/29939#discussion_r501552867



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
##########
@@ -221,4 +221,21 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession {
       checkAnswer(sql("SELECT name, id FROM h2.test.abc"), Row("bob", 4))
     }
   }
+
+  test("DataFrameReader: jdbc") {
+    withTable("h2.test.abc") {
+      sql("CREATE TABLE h2.test.abc USING _ AS SELECT * FROM h2.test.people")
+      val properties = new Properties()
+      val df1 = spark.read.jdbc(url, "h2.test.abc", properties)

Review comment:
       In the doc of `spark.read.jdbc`: `@param table Name of the table in the external database.`
   
   This is not a Spark table name, but a table name in the remote JDBC server, such as MySQL.
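
A short sketch of the ambiguity cloud-fan describes (the `url`, the catalog name `h2`, and the table names are taken from the test diff above; this assumes a running SparkSession `spark` and a registered `h2` catalog, so it is illustrative only):

```scala
// Sketch only: assumes a SparkSession `spark`, an H2 JDBC `url`,
// and a catalog registered under the name `h2`, as in the test above.
import java.util.Properties

val properties = new Properties()

// Plain name: "people" is resolved by the remote JDBC server itself.
val v1 = spark.read.jdbc(url, "people", properties)

// Multi-part name: with this PR, "h2.test.people" is first tried as a
// Spark catalog identifier (catalog.namespace.table) rather than being
// forwarded verbatim to the remote server.
val v2 = spark.read.jdbc(url, "h2.test.people", properties)
```

Before this change the string was always forwarded as the JDBC `dbtable` option; the PR makes the reader try to resolve it as a catalog identifier first, which is exactly the documentation mismatch this comment points out.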






[GitHub] [spark] SparkQA removed a comment on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-704421674


   **[Test build #129457 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129457/testReport)** for PR 29939 at commit [`135eb08`](https://github.com/apache/spark/commit/135eb08a14dbee9fa5e9ae9b9748a0da43035710).




[GitHub] [spark] SparkQA commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703815832


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34025/
   




[GitHub] [spark] SparkQA commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703990721


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34038/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703872878


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129418/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-704043485








[GitHub] [spark] maropu commented on a change in pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29939:
URL: https://github.com/apache/spark/pull/29939#discussion_r500021342



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
##########
@@ -46,7 +47,8 @@ class UnresolvedException[TreeType <: TreeNode[_]](tree: TreeType, function: Str
 case class UnresolvedRelation(
     multipartIdentifier: Seq[String],
     options: CaseInsensitiveStringMap = CaseInsensitiveStringMap.empty(),
-    override val isStreaming: Boolean = false)
+    override val isStreaming: Boolean = false,
+    partitions: Seq[Partition] = Seq.empty[Partition])

Review comment:
       Is `partitions` only for the JDBC V2 path? If so, could we use `options` instead for passing partition info into the JDBC scan?
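
As a point of comparison for this suggestion, column-based partition info can already be expressed through the standard JDBC data source options (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`). A sketch, assuming a SparkSession `spark` and an H2 JDBC `url`:

```scala
// Sketch of maropu's idea: instead of a new `partitions` field on
// UnresolvedRelation, carry column-based partitioning through the
// existing JDBC data source options.
val df = spark.read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "test.people")
  .option("partitionColumn", "id")  // column to split on
  .option("lowerBound", "0")        // lower bound of the partition stride
  .option("upperBound", "100")      // upper bound of the partition stride
  .option("numPartitions", "4")     // number of parallel reads
  .load()
```

Note that the `jdbc(url, table, predicates, properties)` overload has no direct option equivalent for its arbitrary per-partition predicates, which may be why this PR threads the `Partition` objects through explicitly.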






[GitHub] [spark] maropu commented on a change in pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29939:
URL: https://github.com/apache/spark/pull/29939#discussion_r500016049



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -333,9 +333,26 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
     assertNoSpecifiedSchema("jdbc")
     // properties should override settings in extraOptions.
     this.extraOptions ++= properties.asScala
-    // explicit url and dbtable should override all
-    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url, JDBCOptions.JDBC_TABLE_NAME -> table)
-    format("jdbc").load()
+    // explicit url should override all
+    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url)
+
+    import sparkSession.sessionState.analyzer.NonSessionCatalogAndIdentifier
+
+    this.source = "jdbc"
+    if (table.contains(" ")) { // if table is not a table name, e.g. a SELECT statement

Review comment:
       What if we don't have this check?
   ```
         this.source = "jdbc"
         sparkSession.sessionState.sqlParser.parseMultipartIdentifier(table) match {
           case _ @ NonSessionCatalogAndIdentifier(_, _) =>
             this.table(table)
           case _ =>
             // explicit dbtable should override all
             this.extraOptions += JDBCOptions.JDBC_TABLE_NAME -> table
             load
         }
   ```






[GitHub] [spark] MaxGekk commented on a change in pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #29939:
URL: https://github.com/apache/spark/pull/29939#discussion_r499819222



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCScanBuilder.scala
##########
@@ -36,6 +37,8 @@ case class JDBCScanBuilder(
 
   private var prunedSchema = schema
 
+  var partition = Array.empty[Partition]

Review comment:
       How about avoiding the `var` by making `partition` a field of the case class `JDBCScanBuilder` and using `.copy`?
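
The `var`-free pattern suggested here can be illustrated with a stripped-down, hypothetical stand-in for the builder (`MiniScanBuilder` and `withPartitions` are invented names for illustration, not the real API):

```scala
// Hypothetical miniature of the builder: `partitions` is a constructor
// field with a default, so callers get a fresh immutable instance via
// `.copy` instead of mutating a `var` in place.
case class MiniScanBuilder(
    table: String,
    partitions: Seq[String] = Seq.empty) {

  def withPartitions(parts: Seq[String]): MiniScanBuilder =
    copy(partitions = parts)
}

val base = MiniScanBuilder("test.people")
val partitioned = base.withPartitions(Seq("id < 50", "id >= 50"))
// `base` is untouched; only `partitioned` carries the new partitions.
```

The advantage is that a half-configured builder can never be observed with stale mutable state; each configuration step returns a new value.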

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCScanBuilder.scala
##########
@@ -64,7 +67,11 @@ case class JDBCScanBuilder(
   override def build(): Scan = {
     val resolver = session.sessionState.conf.resolver
     val timeZoneId = session.sessionState.conf.sessionLocalTimeZone
-    val parts = JDBCRelation.columnPartition(schema, resolver, timeZoneId, jdbcOptions)
-    JDBCScan(JDBCRelation(schema, parts, jdbcOptions)(session), prunedSchema, pushedFilter)
+    val part = if (partition.isEmpty) {

Review comment:
       Is there any reason for renaming `parts` -> `part`? It is an array of partitions, correct?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
##########
@@ -362,7 +363,8 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
               }
           }
 
-          val relation = DataSourceV2Relation.create(table, catalog, ident, dsOptions)
+          val relation = DataSourceV2Relation.create(table, catalog, ident, dsOptions,
+            Array.empty[Partition])

Review comment:
       ```suggestion
             val relation = DataSourceV2Relation.create(
               table,
               catalog,
               ident,
               dsOptions,
               Array.empty[Partition])
   ```

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
##########
@@ -43,7 +44,8 @@ case class DataSourceV2Relation(
     output: Seq[AttributeReference],
     catalog: Option[CatalogPlugin],
     identifier: Option[Identifier],
-    options: CaseInsensitiveStringMap)
+    options: CaseInsensitiveStringMap,
+    partitions: Array[Partition] = Array.empty[Partition])

Review comment:
       Why did you add the default value `Array.empty[Partition]`?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -333,9 +333,26 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
     assertNoSpecifiedSchema("jdbc")
     // properties should override settings in extraOptions.
     this.extraOptions ++= properties.asScala
-    // explicit url and dbtable should override all
-    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url, JDBCOptions.JDBC_TABLE_NAME -> table)
-    format("jdbc").load()
+    // explicit url should override all
+    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url)

Review comment:
       Probably, no need to construct the Seq.
   ```suggestion
       this.extraOptions += JDBCOptions.JDBC_URL -> url
   ```

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -333,9 +333,26 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
     assertNoSpecifiedSchema("jdbc")
     // properties should override settings in extraOptions.
     this.extraOptions ++= properties.asScala
-    // explicit url and dbtable should override all
-    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url, JDBCOptions.JDBC_TABLE_NAME -> table)
-    format("jdbc").load()
+    // explicit url should override all
+    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url)
+
+    import sparkSession.sessionState.analyzer.NonSessionCatalogAndIdentifier
+
+    this.source = "jdbc"
+    if (table.contains(" ")) { // if table is not a table name, e.g. a SELECT statement

Review comment:
       Is this a reliable check?
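
The doubt is easy to demonstrate with a tiny stand-in for the check (a sketch; the real dispatch would go through the SQL parser rather than this invented helper):

```scala
// The heuristic treats any string containing a space as "not a table name".
def looksLikeQuery(table: String): Boolean = table.contains(" ")

// True positive: a subquery really does contain spaces.
assert(looksLikeQuery("(SELECT * FROM people) t"))

// False positive: a quoted identifier may legally contain a space,
// yet it is still a table name, not a query.
assert(looksLikeQuery("\"my table\""))
```

So the space check misclassifies quoted identifiers that contain spaces, which is presumably why the review asks whether the check is reliable at all.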

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -333,9 +333,26 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
     assertNoSpecifiedSchema("jdbc")
     // properties should override settings in extraOptions.
     this.extraOptions ++= properties.asScala
-    // explicit url and dbtable should override all
-    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url, JDBCOptions.JDBC_TABLE_NAME -> table)
-    format("jdbc").load()
+    // explicit url should override all
+    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url)
+
+    import sparkSession.sessionState.analyzer.NonSessionCatalogAndIdentifier
+
+    this.source = "jdbc"
+    if (table.contains(" ")) { // if table is not a table name, e.g. a SELECT statement
+      // explicit dbtable should override all
+      this.extraOptions ++= Seq(JDBCOptions.JDBC_TABLE_NAME -> table)
+      load
+    } else {
+      sparkSession.sessionState.sqlParser.parseMultipartIdentifier(table) match {
+        case nameParts@NonSessionCatalogAndIdentifier(_, _) =>

Review comment:
       ```suggestion
           case nameParts @ NonSessionCatalogAndIdentifier(_, _) =>
   ```

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -957,7 +959,8 @@ class Analyzer(
                 Some(StreamingRelationV2(None, table.name, table, options,
                   table.schema.toAttributes, Some(catalog), Some(ident), None))
               } else {
-                Some(DataSourceV2Relation.create(table, Some(catalog), Some(ident), options))
+                Some(DataSourceV2Relation.create(table, Some(catalog), Some(ident), options,
+                  partitions))

Review comment:
       According to the style guide:
   ```suggestion
                   Some(DataSourceV2Relation.create(
                     table,
                     Some(catalog),
                     Some(ident),
                     options,
                     partitions))
   ```

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -333,9 +333,26 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
     assertNoSpecifiedSchema("jdbc")
     // properties should override settings in extraOptions.
     this.extraOptions ++= properties.asScala
-    // explicit url and dbtable should override all
-    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url, JDBCOptions.JDBC_TABLE_NAME -> table)
-    format("jdbc").load()
+    // explicit url should override all
+    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url)
+
+    import sparkSession.sessionState.analyzer.NonSessionCatalogAndIdentifier
+
+    this.source = "jdbc"
+    if (table.contains(" ")) { // if table is not a table name, e.g. a SELECT statement
+      // explicit dbtable should override all
+      this.extraOptions ++= Seq(JDBCOptions.JDBC_TABLE_NAME -> table)
+      load
+    } else {
+      sparkSession.sessionState.sqlParser.parseMultipartIdentifier(table) match {
+        case nameParts@NonSessionCatalogAndIdentifier(_, _) =>

Review comment:
       `nameParts` is not actually used. Maybe:
   ```suggestion
           case _ @ NonSessionCatalogAndIdentifier(_, _) =>
   ```

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -333,9 +333,26 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
     assertNoSpecifiedSchema("jdbc")
     // properties should override settings in extraOptions.
     this.extraOptions ++= properties.asScala
-    // explicit url and dbtable should override all
-    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url, JDBCOptions.JDBC_TABLE_NAME -> table)
-    format("jdbc").load()
+    // explicit url should override all
+    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url)
+
+    import sparkSession.sessionState.analyzer.NonSessionCatalogAndIdentifier
+
+    this.source = "jdbc"
+    if (table.contains(" ")) { // if table is not a table name, e.g. a SELECT statement
+      // explicit dbtable should override all
+      this.extraOptions ++= Seq(JDBCOptions.JDBC_TABLE_NAME -> table)
+      load
+    } else {
+      sparkSession.sessionState.sqlParser.parseMultipartIdentifier(table) match {
+        case nameParts@NonSessionCatalogAndIdentifier(_, _) =>
+          this.table(table)
+        case _ =>
+          // explicit dbtable should override all
+          this.extraOptions ++= Seq(JDBCOptions.JDBC_TABLE_NAME -> table)

Review comment:
       ```suggestion
             this.extraOptions += JDBCOptions.JDBC_TABLE_NAME -> table
   ```

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -333,9 +333,26 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
     assertNoSpecifiedSchema("jdbc")
     // properties should override settings in extraOptions.
     this.extraOptions ++= properties.asScala
-    // explicit url and dbtable should override all
-    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url, JDBCOptions.JDBC_TABLE_NAME -> table)
-    format("jdbc").load()
+    // explicit url should override all
+    this.extraOptions ++= Seq(JDBCOptions.JDBC_URL -> url)
+
+    import sparkSession.sessionState.analyzer.NonSessionCatalogAndIdentifier
+
+    this.source = "jdbc"
+    if (table.contains(" ")) { // if table is not a table name, e.g. a SELECT statement
+      // explicit dbtable should override all
+      this.extraOptions ++= Seq(JDBCOptions.JDBC_TABLE_NAME -> table)

Review comment:
       ```suggestion
         this.extraOptions += JDBCOptions.JDBC_TABLE_NAME -> table
   ```






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703150394


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703805531


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34025/
   




[GitHub] [spark] SparkQA commented on pull request #29939: [SPARK-33062][SQL] Make DataFrameReader.jdbc work for DataSource V2

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29939:
URL: https://github.com/apache/spark/pull/29939#issuecomment-703141973


   **[Test build #129379 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129379/testReport)** for PR 29939 at commit [`ffe8e42`](https://github.com/apache/spark/commit/ffe8e42ac3dbf9b9a528458ea5b0a82a2c65e295).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

