You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/12 06:36:20 UTC

[GitHub] [spark] beliefer opened a new pull request, #36519: [SPARK-39159][SQL] Add new Dataset API for Offset

beliefer opened a new pull request, #36519:
URL: https://github.com/apache/spark/pull/36519

   ### What changes were proposed in this pull request?
   Currently, Spark added `Offset` operator.
   This PR try to add `offset` API into `Dataset`.
   
   
   ### Why are the changes needed?
   `offset` API is very useful and construct test case more easily.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   New feature.
   
   
   ### How was this patch tested?
   New tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #36519: [SPARK-39159][SQL] Add new Dataset API for Offset

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #36519: [SPARK-39159][SQL] Add new Dataset API for Offset
URL: https://github.com/apache/spark/pull/36519


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36519: [SPARK-39159][SQL] Add new Dataset API for Offset

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36519:
URL: https://github.com/apache/spark/pull/36519#discussion_r871440865


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala:
##########
@@ -605,6 +605,20 @@ class DataFrameSuite extends QueryTest
     )
   }
 
+  test("offset") {
+    checkAnswer(
+      testData.offset(90),
+      testData.collect().drop(90).toSeq)
+
+    checkAnswer(
+      arrayData.toDF().offset(99),
+      arrayData.collect().drop(99).map(r => Row.fromSeq(r.productIterator.toSeq)))
+
+    checkAnswer(
+      mapData.toDF().offset(99),
+      mapData.collect().drop(99).map(r => Row.fromSeq(r.productIterator.toSeq)))

Review Comment:
   can we test a few combinations? like limit followed by offset, and offset followed by limit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #36519: [SPARK-39159][SQL] Add new Dataset API for Offset

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36519:
URL: https://github.com/apache/spark/pull/36519#discussion_r871986222


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala:
##########
@@ -605,6 +605,20 @@ class DataFrameSuite extends QueryTest
     )
   }
 
+  test("offset") {
+    checkAnswer(
+      testData.offset(90),
+      testData.collect().drop(90).toSeq)
+
+    checkAnswer(
+      arrayData.toDF().offset(99),
+      arrayData.collect().drop(99).map(r => Row.fromSeq(r.productIterator.toSeq)))
+
+    checkAnswer(
+      mapData.toDF().offset(99),
+      mapData.collect().drop(99).map(r => Row.fromSeq(r.productIterator.toSeq)))

Review Comment:
   OK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #36519: [SPARK-39159][SQL] Add new Dataset API for Offset

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #36519:
URL: https://github.com/apache/spark/pull/36519#issuecomment-1124742630

   ping @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #36519: [SPARK-39159][SQL] Add new Dataset API for Offset

Posted by GitBox <gi...@apache.org>.
beliefer commented on code in PR #36519:
URL: https://github.com/apache/spark/pull/36519#discussion_r880091005


##########
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala:
##########
@@ -2102,6 +2102,16 @@ class Dataset[T] private[sql](
     Limit(Literal(n), logicalPlan)
   }
 
+  /**
+   * Returns a new Dataset by skipping the first `m` rows.

Review Comment:
   Yeah. this is a minor issue. I will fix it by the way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jerqi commented on a diff in pull request #36519: [SPARK-39159][SQL] Add new Dataset API for Offset

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #36519:
URL: https://github.com/apache/spark/pull/36519#discussion_r879981480


##########
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala:
##########
@@ -2102,6 +2102,16 @@ class Dataset[T] private[sql](
     Limit(Literal(n), logicalPlan)
   }
 
+  /**
+   * Returns a new Dataset by skipping the first `m` rows.

Review Comment:
    a typo? may be 'n'?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #36519: [SPARK-39159][SQL] Add new Dataset API for Offset

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #36519:
URL: https://github.com/apache/spark/pull/36519#issuecomment-1125776300

   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #36519: [SPARK-39159][SQL] Add new Dataset API for Offset

Posted by GitBox <gi...@apache.org>.
beliefer commented on PR #36519:
URL: https://github.com/apache/spark/pull/36519#issuecomment-1125800326

   @cloud-fan Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org