You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "beliefer (via GitHub)" <gi...@apache.org> on 2023/02/13 06:24:31 UTC

[GitHub] [spark] beliefer opened a new pull request, #39990: [WIP][SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

beliefer opened a new pull request, #39990:
URL: https://github.com/apache/spark/pull/39990

   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #39990:
URL: https://github.com/apache/spark/pull/39990#issuecomment-1450189049

   > Can you split this PR into two? Instead of solving two problems, it would be better to have a separate PR for each.
   
   Yeah. It's better to split them. I will create another PR to do it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

Posted by "sadikovi (via GitHub)" <gi...@apache.org>.
sadikovi commented on PR #39990:
URL: https://github.com/apache/spark/pull/39990#issuecomment-1447143805

   Can you split this PR into two? Instead of solving two problems, it would be better to have a separate PR for each.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #39990:
URL: https://github.com/apache/spark/pull/39990#discussion_r1130954327


##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala:
##########
@@ -410,6 +410,16 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu
     assert(sorts.isEmpty)
   }
 
+  private def offsetPushed(df: DataFrame, offset: Int): Boolean = {

Review Comment:
   `checkOffsetPushed`



##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala:
##########
@@ -410,6 +410,16 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu
     assert(sorts.isEmpty)
   }
 
+  private def offsetPushed(df: DataFrame, offset: Int): Boolean = {

Review Comment:
   we should rename `limitPushed` as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #39990:
URL: https://github.com/apache/spark/pull/39990#discussion_r1130957560


##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala:
##########
@@ -410,6 +410,16 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu
     assert(sorts.isEmpty)
   }
 
+  private def offsetPushed(df: DataFrame, offset: Int): Boolean = {

Review Comment:
   nvm, this two functions return a boolean. We can rename them if we update them to do the check directly instead of returning boolean.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #39990:
URL: https://github.com/apache/spark/pull/39990#discussion_r1130961053


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala:
##########
@@ -181,22 +181,42 @@ private case object OracleDialect extends JdbcDialect {
     if (limit > 0) s"WHERE rownum <= $limit" else ""
   }
 
+  override def getOffsetClause(offset: Integer): String = {
+    // Oracle doesn't support OFFSET clause.
+    // We can use rownum > n to skip some rows in the result set.
+    // Note: rn is an alias of rownum.
+    if (offset > 0) s"WHERE rn > $offset" else ""
+  }
+
   class OracleSQLQueryBuilder(dialect: JdbcDialect, options: JDBCOptions)
     extends JdbcSQLQueryBuilder(dialect, options) {
 
-    // TODO[SPARK-42289]: DS V2 pushdown could let JDBC dialect decide to push down offset
     override def build(): String = {
       val selectStmt = s"SELECT $columnList FROM ${options.tableOrQuery} $tableSampleClause" +
         s" $whereClause $groupByClause $orderByClause"
-      if (limit > 0) {
-        val limitClause = dialect.getLimitClause(limit)
-        options.prepareQuery + s"SELECT tab.* FROM ($selectStmt) tab $limitClause"
+      val finalSelectStmt = if (limit > 0) {

Review Comment:
   is this a bug fix?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #39990:
URL: https://github.com/apache/spark/pull/39990#discussion_r1130957560


##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala:
##########
@@ -410,6 +410,16 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu
     assert(sorts.isEmpty)
   }
 
+  private def offsetPushed(df: DataFrame, offset: Int): Boolean = {

Review Comment:
   nvm, this two functions return a boolean. We can rename them if we update them to do the check directly instead of returning boolean, e.g. passing `-1` means pushdown should not happen.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #39990:
URL: https://github.com/apache/spark/pull/39990#issuecomment-1445670336

   ping @huaxingao cc @cloud-fan @sadikovi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer closed pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer closed pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.
URL: https://github.com/apache/spark/pull/39990


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #39990:
URL: https://github.com/apache/spark/pull/39990#issuecomment-1465922089

   https://github.com/apache/spark/pull/40396 used to replace this one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org