You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/11/23 12:09:45 UTC

[GitHub] [spark] peter-toth opened a new pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

peter-toth opened a new pull request #34693:
URL: https://github.com/apache/spark/pull/34693


   ### What changes were proposed in this pull request?
   CTE queries are not supported with MSSQL server via JDBC as MSSQL server doesn't support statements with nested CTEs. When Spark builds the final query, that it will send via JDBC to the server, it wraps the original query (`options.tableOrQuery`) into parentheses in `JDBCRDD.resolveTable()` and `JDBCRDD.compute()`.
   Unfortunately, it is non-trivial to split an arbitrary query it into "with" and "regular" query clauses in `MsSqlServerDialect`. So instead, I'm proposing a new general JDBC option "withClause" that users can use if they have complex queries with CTE.
   
   ### Why are the changes needed?
   To support CTE queries with MSSQL.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, CTE queries are supported form now.
   
   ### How was this patch tested?
   Added new integration UTs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-976848358


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145548/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-976848358


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145548/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
peter-toth commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-976828400


   Hmm, failures in `ExpressionsSchemaSuite` look unrelated...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-984478397


   > Unfortunately, it is non-trivial to split an arbitrary query it into "with" and "regular" clauses in MsSqlServerDialect.
   Could you please show us some example queries where the split is non-trivial?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
peter-toth commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977671500


   > > This change also seem to work with MSSQL's temp table syntax:
   > > ```
   > > val withClause = "(SELECT * INTO #TempTable FROM (SELECT * FROM tbl WHERE x > 10) t)"
   > > val query = "SELECT * FROM #TempTable"
   > > val df = spark.read.format("jdbc")
   > >   .option("url", jdbcUrl)
   > >   .option("withClause", withClause)
   > >   .option("query", query)
   > >   .load()
   > > ```
   > 
   > Since it also works with temp table syntax, do you think it would be a good idea to include it the title of this PR and modify the title to "Support CTE and TempTable queries with MSSQL JDBC"?
   
   Thanks, make sense. I've added a new test in: https://github.com/apache/spark/pull/34693/commits/dea730a12503328b98375cf4d4590444db31d090


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-1065990017


   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] closed pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #34693:
URL: https://github.com/apache/spark/pull/34693


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977679549


   **[Test build #145572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145572/testReport)** for PR 34693 at commit [`b53ef47`](https://github.com/apache/spark/commit/b53ef475b6349e351e74635c1b1e5d4a923398fb).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977715174


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50045/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-976453455


   **[Test build #145548 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145548/testReport)** for PR 34693 at commit [`e2c9577`](https://github.com/apache/spark/commit/e2c957718196f8269c56fb4e3c6a8b4d1eed3074).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-976568407


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50020/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth edited a comment on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
peter-toth edited a comment on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-976728888


   This change also seem to work with MSSQL's temp table syntax:
   ```
   val withClause = "(SELECT * INTO #TempTable FROM (SELECT * FROM tbl WHERE x > 10) t)"
   val query = "SELECT * FROM #TempTable"
   val df = spark.read.format("jdbc")
     .option("url", jdbcUrl)
     .option("withClause", withClause)
     .option("query", query)
     .load()
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-976489807


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50020/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977750115


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50045/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977904347






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sumeetgajjar commented on a change in pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
sumeetgajjar commented on a change in pull request #34693:
URL: https://github.com/apache/spark/pull/34693#discussion_r755398500



##########
File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala
##########
@@ -356,4 +357,42 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationSuite {
         0, 3, 0, 0, 0, -1, -1, -1, -1, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0,
         0, 0, 0, 1, 0, 0, 0, 3))
   }
+
+  test("withClause and query  JDBC options") {
+    val expectedResult = Set(
+      (42, "fred"),
+      (17, "dave")
+    ).map { case (x, y) =>
+      Row(Integer.valueOf(x), String.valueOf(y))
+    }
+
+    val withClause = "WITH t AS (SELECT x, y FROM tbl)"
+    val query = "SELECT * FROM t WHERE x > 10"
+    val df = spark.read.format("jdbc")
+      .option("url", jdbcUrl)
+      .option("withClause", withClause)
+      .option("query", query)
+      .load()
+    assert(df.collect.toSet === expectedResult)
+
+    df.explain(true)
+  }
+
+  test("withClause and dbtable JDBC options") {
+    val expectedResult = Set(
+      (42, "fred"),
+      (17, "dave")
+    ).map { case (x, y) =>
+      Row(Integer.valueOf(x), String.valueOf(y))
+    }
+
+    val withClause = "WITH t AS (SELECT x, y FROM tbl WHERE x > 10)"
+    val dbtable = "t"
+    val df = spark.read.format("jdbc")
+      .option("url", jdbcUrl)
+      .option("withClause", withClause)
+      .option("dbtable", dbtable)
+      .load()
+    assert(df.collect.toSet === expectedResult)
+  }

Review comment:
       Since it already works with temp table syntax, should we also add the corresponding UTs?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
##########
@@ -325,6 +325,6 @@ private[sql] case class JDBCRelation(
   override def toString: String = {
     val partitioningInfo = if (parts.nonEmpty) s" [numPartitions=${parts.length}]" else ""
     // credentials should not be included in the plan output, table information is sufficient.
-    s"JDBCRelation(${jdbcOptions.tableOrQuery})" + partitioningInfo
+    s"JDBCRelation(${jdbcOptions.withClause}${jdbcOptions.tableOrQuery})" + partitioningInfo

Review comment:
       Since `partitioningInfo` is also a string, should we create the final string as 
   `s"JDBCRelation(${jdbcOptions.withClause}${jdbcOptions.tableOrQuery})$partitioningInfo"`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] KevinAppelBofa edited a comment on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
KevinAppelBofa edited a comment on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-984676447


   @attilapiros Finding in the query where the WHEN piece stops, then where the SELECT begins is where I found the place to split.  In the test query 
   `query2 = """
   WITH DummyCTE AS
   (
   SELECT 1 as DummyCOL
   )
   SELECT *
   FROM DummyCTE
   """`
   
   This splits into
   `withClause = """
   WITH DummyCTE AS
   (
   SELECT 1 as DummyCOL
   )
   
   """
   query = """
   SELECT *
   FROM DummyCTE
   """`
   
   In the actual query we are running is more complex and is a bunch of chained WHEN together, in that one I did the same approach and where the actual WHEN part ends to stick that into the whenClause and then the rest into the query
   
   This same technique works for the temp table query, to split it up where the part generating the temp table goes into the whenClause and the rest goes into the query
   
   `query3 = """
   (SELECT *
   INTO #Temp1a
   FROM
   (SELECT @@VERSION as version) data
   )
   
   (SELECT *
   FROM
   #Temp1a)
   """`
   
   Turns into
   `withClause = """
   (SELECT *
   INTO #Temp1a
   FROM
   (SELECT @@VERSION as version) data
   )
   """
   
   query = """
   (SELECT *
   FROM
   #Temp1a)
   """`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] KevinAppelBofa commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
KevinAppelBofa commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-981788128


   @peter-toth I was able to get the CTE query to work using the split method you have, this was a little trial and error to find the right place to split and is producing the same results using the other method that uses the useRawQuery option and appears to be a similar run time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977760966


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50044/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-976453455


   **[Test build #145548 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145548/testReport)** for PR 34693 at commit [`e2c9577`](https://github.com/apache/spark/commit/e2c957718196f8269c56fb4e3c6a8b4d1eed3074).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-976827223


   **[Test build #145548 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145548/testReport)** for PR 34693 at commit [`e2c9577`](https://github.com/apache/spark/commit/e2c957718196f8269c56fb4e3c6a8b4d1eed3074).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977672214


   **[Test build #145571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145571/testReport)** for PR 34693 at commit [`dea730a`](https://github.com/apache/spark/commit/dea730a12503328b98375cf4d4590444db31d090).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] KevinAppelBofa edited a comment on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
KevinAppelBofa edited a comment on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-984676447


   @attilapiros Finding in the query where the WHEN piece stops, then where the SELECT begins is where I found the place to split.  In the test query 
   ``` python 
   query2 = """
   WITH DummyCTE AS
   (
   SELECT 1 as DummyCOL
   )
   SELECT *
   FROM DummyCTE
   """
   ```
   
   This splits into
   ``` python 
   withClause = """
   WITH DummyCTE AS
   (
   SELECT 1 as DummyCOL
   )
   
   """
   query = """
   SELECT *
   FROM DummyCTE
   """
   ```
   
   In the actual query we are running is more complex and is a bunch of chained WHEN together, in that one I did the same approach and where the actual WHEN part ends to stick that into the whenClause and then the rest into the query
   
   This same technique works for the temp table query, to split it up where the part generating the temp table goes into the whenClause and the rest goes into the query
   
   ``` python 
   query3 = """
   (SELECT *
   INTO #Temp1a
   FROM
   (SELECT @@VERSION as version) data
   )
   
   (SELECT *
   FROM
   #Temp1a)
   """
   ```
   
   Turns into
   ``` python 
   withClause = """
   (SELECT *
   INTO #Temp1a
   FROM
   (SELECT @@VERSION as version) data
   )
   """
   
   query = """
   (SELECT *
   FROM
   #Temp1a)
   """
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-984829159


   > @attilapiros Finding in the query where the WITH piece stops, then where the SELECT begins is where I found the place to split.
   
   Thanks, I see that. But still it is really hard to automate it. So I think what Peter come up with is best we have right now.
   
   I was thinking about how to avoid `SELECT * FROM $table WHERE 1=0`. One of my idea was just replacing all the `SELECT` (ignoring case) with `SELECT top(0)` as that could be done even in string literals as it does not change the schema. But if `top` was already used somewhere then this ugly hack fails. And this is just one part of the problem to get the schema without running the query. The other one in (in `JDBCRDD.compute()`) is even harder to crack where the partitioning and pushed down group by is handled.
   
   So based on this LGTM.
   
   cc @viirya, @HyukjinKwon 
   
   
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977672214


   **[Test build #145571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145571/testReport)** for PR 34693 at commit [`dea730a`](https://github.com/apache/spark/commit/dea730a12503328b98375cf4d4590444db31d090).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977713681


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50044/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977679549


   **[Test build #145572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145572/testReport)** for PR 34693 at commit [`b53ef47`](https://github.com/apache/spark/commit/b53ef475b6349e351e74635c1b1e5d4a923398fb).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
peter-toth commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-976728888


   This change also seem to work with MSSQL's temp table syntax:
   ```
   val withClause = "(SELECT * INTO #TempTable FROM (SELECT * FROM tbl) t)"
   val query = "SELECT * FROM #TempTable"
   val df = spark.read.format("jdbc")
     .option("url", jdbcUrl)
     .option("withClause", withClause)
     .option("query", query)
     .load()
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sumeetgajjar commented on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
sumeetgajjar commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-976972982


   > This change also seem to work with MSSQL's temp table syntax:
   > 
   > ```
   > val withClause = "(SELECT * INTO #TempTable FROM (SELECT * FROM tbl WHERE x > 10) t)"
   > val query = "SELECT * FROM #TempTable"
   > val df = spark.read.format("jdbc")
   >   .option("url", jdbcUrl)
   >   .option("withClause", withClause)
   >   .option("query", query)
   >   .load()
   > ```
   
   Since it also works with temp table syntax, do you think it would be a good idea to include it the title of this PR and modify the title to "Support CTE and TempTable queries with MSSQL JDBC"?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on a change in pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #34693:
URL: https://github.com/apache/spark/pull/34693#discussion_r755830269



##########
File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala
##########
@@ -356,4 +357,42 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationSuite {
         0, 3, 0, 0, 0, -1, -1, -1, -1, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0,
         0, 0, 0, 1, 0, 0, 0, 3))
   }
+
+  test("withClause and query  JDBC options") {
+    val expectedResult = Set(
+      (42, "fred"),
+      (17, "dave")
+    ).map { case (x, y) =>
+      Row(Integer.valueOf(x), String.valueOf(y))
+    }
+
+    val withClause = "WITH t AS (SELECT x, y FROM tbl)"
+    val query = "SELECT * FROM t WHERE x > 10"
+    val df = spark.read.format("jdbc")
+      .option("url", jdbcUrl)
+      .option("withClause", withClause)
+      .option("query", query)
+      .load()
+    assert(df.collect.toSet === expectedResult)
+
+    df.explain(true)
+  }
+
+  test("withClause and dbtable JDBC options") {
+    val expectedResult = Set(
+      (42, "fred"),
+      (17, "dave")
+    ).map { case (x, y) =>
+      Row(Integer.valueOf(x), String.valueOf(y))
+    }
+
+    val withClause = "WITH t AS (SELECT x, y FROM tbl WHERE x > 10)"
+    val dbtable = "t"
+    val df = spark.read.format("jdbc")
+      .option("url", jdbcUrl)
+      .option("withClause", withClause)
+      .option("dbtable", dbtable)
+      .load()
+    assert(df.collect.toSet === expectedResult)
+  }

Review comment:
       Ok. Added in https://github.com/apache/spark/commit/dea730a12503328b98375cf4d4590444db31d090




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] KevinAppelBofa commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
KevinAppelBofa commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-981708494


   @peter-toth thank you for working on this, I was able to get a spark 3.3.0-snapshot compiled and test the changes you made.  I ran both the sample queries first and those were able to work, then I ran the temp table query and this is also working; that one was easy to split into the withClause and query.  I am running into an issue though getting the CTE query to run, i have tried to split this up a few ways but I keep getting the same error which is below.  I'm going to try to add a logwarning to dump out the query it is trying to run to get the schema and see if I can get that to run directly in the sql server.  This was the issue I ran into originally I was able to get the test CTE to work and doing a $CTEQUERY where 1=0; was working but in this more complex CTE I can't find a spot where to add the 1=0 to get a schema back only.
   
   `py4j.protocol.Py4JJavaError: An error occurred while calling o85.load.
   : com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword 'WITH'.
           at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
           at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1632)
           at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:602)
           at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:524)
           at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7418)
           at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3272)
           at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:247)
           at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:222)
           at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:446)
           at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:69)
           at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:59)
           at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:240)
           at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
           at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
           at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:227)
           at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:209)
           at scala.Option.getOrElse(Option.scala:189)
           at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:209)
           at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:170)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
           at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
           at py4j.Gateway.invoke(Gateway.java:282)
           at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
           at py4j.commands.CallCommand.execute(CallCommand.java:79)
           at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
           at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
           at java.lang.Thread.run(Thread.java:748)`
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-984520584


   > We could use ANTLR to parse the query just like we use it to parse Spark SQL statements but I'm not sure it is worth it...
   
   I agree that would be too much here and without a complex parsing logic we might fail at the splitting
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-976570563


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50020/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34693: [SPARK-37259][SQL] Support CTE queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-976570563


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50020/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on a change in pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
peter-toth commented on a change in pull request #34693:
URL: https://github.com/apache/spark/pull/34693#discussion_r755836451



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
##########
@@ -325,6 +325,6 @@ private[sql] case class JDBCRelation(
   override def toString: String = {
     val partitioningInfo = if (parts.nonEmpty) s" [numPartitions=${parts.length}]" else ""
     // credentials should not be included in the plan output, table information is sufficient.
-    s"JDBCRelation(${jdbcOptions.tableOrQuery})" + partitioningInfo
+    s"JDBCRelation(${jdbcOptions.withClause}${jdbcOptions.tableOrQuery})" + partitioningInfo

Review comment:
       Fixed in https://github.com/apache/spark/pull/34693/commits/b53ef475b6349e351e74635c1b1e5d4a923398fb




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977898727


   **[Test build #145571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145571/testReport)** for PR 34693 at commit [`dea730a`](https://github.com/apache/spark/commit/dea730a12503328b98375cf4d4590444db31d090).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977764281






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977764284






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth edited a comment on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
peter-toth edited a comment on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977671500


   > > This change also seem to work with MSSQL's temp table syntax:
   > > ```
   > > val withClause = "(SELECT * INTO #TempTable FROM (SELECT * FROM tbl WHERE x > 10) t)"
   > > val query = "SELECT * FROM #TempTable"
   > > val df = spark.read.format("jdbc")
   > >   .option("url", jdbcUrl)
   > >   .option("withClause", withClause)
   > >   .option("query", query)
   > >   .load()
   > > ```
   > 
   > Since it also works with temp table syntax, do you think it would be a good idea to include it the title of this PR and modify the title to "Support CTE and TempTable queries with MSSQL JDBC"?
   
   Thanks, make sense. I've modified PR title and description and added a new test in: https://github.com/apache/spark/pull/34693/commits/dea730a12503328b98375cf4d4590444db31d090


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth edited a comment on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
peter-toth edited a comment on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977671500


   > > This change also seem to work with MSSQL's temp table syntax:
   > > ```
   > > val withClause = "(SELECT * INTO #TempTable FROM (SELECT * FROM tbl WHERE x > 10) t)"
   > > val query = "SELECT * FROM #TempTable"
   > > val df = spark.read.format("jdbc")
   > >   .option("url", jdbcUrl)
   > >   .option("withClause", withClause)
   > >   .option("query", query)
   > >   .load()
   > > ```
   > 
   > Since it also works with temp table syntax, do you think it would be a good idea to include it the title of this PR and modify the title to "Support CTE and TempTable queries with MSSQL JDBC"?
   
   Thanks, make sense. I've modified the title and description and added a new test in: https://github.com/apache/spark/pull/34693/commits/dea730a12503328b98375cf4d4590444db31d090


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977904347






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth edited a comment on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
peter-toth edited a comment on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977671500


   > > This change also seem to work with MSSQL's temp table syntax:
   > > ```
   > > val withClause = "(SELECT * INTO #TempTable FROM (SELECT * FROM tbl WHERE x > 10) t)"
   > > val query = "SELECT * FROM #TempTable"
   > > val df = spark.read.format("jdbc")
   > >   .option("url", jdbcUrl)
   > >   .option("withClause", withClause)
   > >   .option("query", query)
   > >   .load()
   > > ```
   > 
   > Since it also works with temp table syntax, do you think it would be a good idea to include it the title of this PR and modify the title to "Support CTE and TempTable queries with MSSQL JDBC"?
   
   Thanks, makes sense. I've modified PR title and description and added a new test in: https://github.com/apache/spark/pull/34693/commits/dea730a12503328b98375cf4d4590444db31d090
   
   Actually, I wonder if it makes sense to rename `withClause` to a more general `queryPrefix` option.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-977890443


   **[Test build #145572 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145572/testReport)** for PR 34693 at commit [`b53ef47`](https://github.com/apache/spark/commit/b53ef475b6349e351e74635c1b1e5d4a923398fb).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
peter-toth commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-984501867


   > > Unfortunately, it is non-trivial to split an arbitrary query it into "with" and "regular" clauses in MsSqlServerDialect.
   > 
   > Could you please show us some example queries where the split is non-trivial?
   
   For example:
   ```
     WITH t AS (SELECT x FROM tbl), t2 AS (SELECT y FROM tbl2) SELECT * FROM t WHERE x > SELECT max(y) FROM t2 
   ```
   You need to split this to `WITH t AS (SELECT x FROM tbl), t2 AS (SELECT y FROM tbl2)` and `SELECT * FROM t WHERE x > SELECT max(y) FROM t2`. Obviously there can be more brackets in the query and some might not be paired/closed if they appear in strings like: `t2 AS (SELECT y, ')' AS dummy FROM tbl2)`. We could use ANTLR to parse the query just like we use it to parse Spark SQL statements but I'm not sure it is worth it...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros edited a comment on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
attilapiros edited a comment on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-984478397


   > Unfortunately, it is non-trivial to split an arbitrary query it into "with" and "regular" clauses in MsSqlServerDialect.
   
   Could you please show us some example queries where the split is non-trivial?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] KevinAppelBofa edited a comment on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
KevinAppelBofa edited a comment on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-984676447


   @attilapiros Finding in the query where the WHEN piece stops, then where the SELECT begins is where I found the place to split.  In the test query 
   ``` python 
   query2 = """
   WITH DummyCTE AS
   (
   SELECT 1 as DummyCOL
   )
   SELECT *
   FROM DummyCTE
   """
   ```
   
   This splits into
   `withClause = """
   WITH DummyCTE AS
   (
   SELECT 1 as DummyCOL
   )
   
   """
   query = """
   SELECT *
   FROM DummyCTE
   """`
   
   In the actual query we are running is more complex and is a bunch of chained WHEN together, in that one I did the same approach and where the actual WHEN part ends to stick that into the whenClause and then the rest into the query
   
   This same technique works for the temp table query, to split it up where the part generating the temp table goes into the whenClause and the rest goes into the query
   
   `query3 = """
   (SELECT *
   INTO #Temp1a
   FROM
   (SELECT @@VERSION as version) data
   )
   
   (SELECT *
   FROM
   #Temp1a)
   """`
   
   Turns into
   `withClause = """
   (SELECT *
   INTO #Temp1a
   FROM
   (SELECT @@VERSION as version) data
   )
   """
   
   query = """
   (SELECT *
   FROM
   #Temp1a)
   """`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] KevinAppelBofa commented on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
KevinAppelBofa commented on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-984676447


   @attilapiros Finding in the query where the WHEN piece stops, then where the SELECT begins is where I found the place to split.  In the test query 
   `query2 = """
   WITH DummyCTE AS
   (
   SELECT 1 as DummyCOL
   )
   SELECT *
   FROM DummyCTE
   """`
   
   This splits into
   `withClause = """
   WITH DummyCTE AS
   (
   SELECT 1 as DummyCOL
   )
   """
   query = """
   SELECT *
   FROM DummyCTE
   """`
   
   In the actual query we are running is more complex and is a bunch of chained WHEN together, in that one I did the same approach and where the actual WHEN part ends to stick that into the whenClause and then the rest into the query
   
   This same technique works for the temp table query, to split it up where the part generating the temp table goes into the whenClause and the rest goes into the query
   
   `query3 = """
   (SELECT *
   INTO #Temp1a
   FROM
   (SELECT @@VERSION as version) data
   )
   
   (SELECT *
   FROM
   #Temp1a)
   """`
   
   Turns into
   `withClause = """
   (SELECT *
   INTO #Temp1a
   FROM
   (SELECT @@VERSION as version) data
   )
   """
   query = """
   (SELECT *
   FROM
   #Temp1a)
   """`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] KevinAppelBofa edited a comment on pull request #34693: [SPARK-37259][SQL] Support CTE and TempTable queries with MSSQL JDBC

Posted by GitBox <gi...@apache.org>.
KevinAppelBofa edited a comment on pull request #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-984676447


   @attilapiros Finding in the query where the WITH piece stops, then where the SELECT begins is where I found the place to split.  In the test query 
   ``` python 
   query2 = """
   WITH DummyCTE AS
   (
   SELECT 1 as DummyCOL
   )
   SELECT *
   FROM DummyCTE
   """
   ```
   
   This splits into
   ``` python 
   withClause = """
   WITH DummyCTE AS
   (
   SELECT 1 as DummyCOL
   )
   
   """
   query = """
   SELECT *
   FROM DummyCTE
   """
   ```
   
   In the actual query we are running is more complex and is a bunch of chained WITH together, in that one I did the same approach and where the actual WITH part ends to stick that into the withClause and then the rest into the query
   
   This same technique works for the temp table query, to split it up where the part generating the temp table goes into the withClause and the rest goes into the query
   
   ``` python 
   query3 = """
   (SELECT *
   INTO #Temp1a
   FROM
   (SELECT @@VERSION as version) data
   )
   
   (SELECT *
   FROM
   #Temp1a)
   """
   ```
   
   Turns into
   ``` python 
   withClause = """
   (SELECT *
   INTO #Temp1a
   FROM
   (SELECT @@VERSION as version) data
   )
   """
   
   query = """
   (SELECT *
   FROM
   #Temp1a)
   """
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org