You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "beliefer (via GitHub)" <gi...@apache.org> on 2023/04/13 13:37:40 UTC

[GitHub] [spark] beliefer opened a new pull request, #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

beliefer opened a new pull request, #40782:
URL: https://github.com/apache/spark/pull/40782

   ### What changes were proposed in this pull request?
   Operations on `LocalRelation` can mostly be done locally (without sending RPCs).
   We should leverage this.
   
   
   ### Why are the changes needed?
   Avoid sending RPCs for `LocalRelation`.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   New feature.
   
   
   ### How was this patch tested?
   Exists test cases.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on a diff in pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #40782:
URL: https://github.com/apache/spark/pull/40782#discussion_r1169388839


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -439,7 +440,23 @@ class SparkSession private[sql] (
   }
 
   private[sql] def execute[T](plan: proto.Plan, encoder: AgnosticEncoder[T]): SparkResult[T] = {
-    val value = client.execute(plan)
+    val value = if (plan.hasRoot && plan.getRoot.hasLocalRelation) {
+      // Short circuit local relation RPCs
+      val localRelation = plan.getRoot.getLocalRelation
+      val response = proto.ExecutePlanResponse.newBuilder()
+      val reader = new ArrowStreamReader(localRelation.getData.newInput(), allocator)
+      reader.loadNextBatch()
+      val batch = proto.ExecutePlanResponse.ArrowBatch
+        .newBuilder()
+        .setRowCount(reader.getVectorSchemaRoot.getRowCount)

Review Comment:
   Do we need to set row count? You can just modify SparkResult to make the assert a bit more lenient.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40782:
URL: https://github.com/apache/spark/pull/40782#discussion_r1166348527


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -439,8 +439,19 @@ class SparkSession private[sql] (
   }
 
   private[sql] def execute[T](plan: proto.Plan, encoder: AgnosticEncoder[T]): SparkResult[T] = {
-    val value = client.execute(plan)
-    val result = new SparkResult(value, allocator, encoder)
+    val result = if (plan.hasRoot && plan.getRoot.hasLocalRelation) {
+      // Short circuit local relation RPCs
+      val localRelation = plan.getRoot.getLocalRelation
+      new SparkResult(
+        Seq.empty[proto.ExecutePlanResponse].iterator.asJava,

Review Comment:
   It seems a good idea.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40782:
URL: https://github.com/apache/spark/pull/40782#issuecomment-1552613772

   Because there are difference suggestion from @hvanhovell and @ueshin, I don't know how to continue this job.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40782:
URL: https://github.com/apache/spark/pull/40782#issuecomment-1591190171

   ping @hvanhovell cc @ueshin 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on code in PR #40782:
URL: https://github.com/apache/spark/pull/40782#discussion_r1166244987


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -439,8 +439,19 @@ class SparkSession private[sql] (
   }
 
   private[sql] def execute[T](plan: proto.Plan, encoder: AgnosticEncoder[T]): SparkResult[T] = {
-    val value = client.execute(plan)
-    val result = new SparkResult(value, allocator, encoder)
+    val result = if (plan.hasRoot && plan.getRoot.hasLocalRelation) {
+      // Short circuit local relation RPCs
+      val localRelation = plan.getRoot.getLocalRelation
+      new SparkResult(
+        Seq.empty[proto.ExecutePlanResponse].iterator.asJava,

Review Comment:
   I am wondering if you can populate the data from the LocalRelation into an instance of `proto.ExecutePlanResponse` thus we do not need to branch the code in `SparkResult`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-42669][CONNECT] Short circuit local relation RPCs [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #40782:
URL: https://github.com/apache/spark/pull/40782#issuecomment-2091944074

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40782:
URL: https://github.com/apache/spark/pull/40782#issuecomment-1507999037

   > So this happens when
   > 
   It happens when the root plan is `LocalRelation`.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-42669][CONNECT] Short circuit local relation RPCs [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #40782:
URL: https://github.com/apache/spark/pull/40782#issuecomment-1905067467

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on PR #40782:
URL: https://github.com/apache/spark/pull/40782#issuecomment-1507878936

   So this happens when 
   ```
   val df = createDataFrame(...)
   df.collect()
   ```
   ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40782:
URL: https://github.com/apache/spark/pull/40782#issuecomment-1554336985

   @ueshin @hvanhovell Recently, https://github.com/apache/spark/pull/41064 added the rowCount statistics to `LocalRelation`. In this PR, @ueshin also suggested to add the row count as optional to `LocalRelation` message.
   So I think this is a chance to add optional row count to `LocalRelation` message.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-42669][CONNECT] Short circuit local relation RPCs [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40782:
URL: https://github.com/apache/spark/pull/40782#issuecomment-1762631600

   @hvanhovell Do we need this change?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40782:
URL: https://github.com/apache/spark/pull/40782#issuecomment-1507797626

   The failure GA is unrelated to this PR>
   ping @hvanhovell @zhengruifeng cc @dongjoon-hyun @amaliujia 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40782:
URL: https://github.com/apache/spark/pull/40782#issuecomment-1513048262

   @hvanhovell The failed GA is unrelated to this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #40782:
URL: https://github.com/apache/spark/pull/40782#issuecomment-1732142935

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40782:
URL: https://github.com/apache/spark/pull/40782#discussion_r1166348527


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -439,8 +439,19 @@ class SparkSession private[sql] (
   }
 
   private[sql] def execute[T](plan: proto.Plan, encoder: AgnosticEncoder[T]): SparkResult[T] = {
-    val value = client.execute(plan)
-    val result = new SparkResult(value, allocator, encoder)
+    val result = if (plan.hasRoot && plan.getRoot.hasLocalRelation) {
+      // Short circuit local relation RPCs
+      val localRelation = plan.getRoot.getLocalRelation
+      new SparkResult(
+        Seq.empty[proto.ExecutePlanResponse].iterator.asJava,

Review Comment:
   SGTM.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on a diff in pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #40782:
URL: https://github.com/apache/spark/pull/40782#discussion_r1169389077


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -439,7 +440,23 @@ class SparkSession private[sql] (
   }
 
   private[sql] def execute[T](plan: proto.Plan, encoder: AgnosticEncoder[T]): SparkResult[T] = {
-    val value = client.execute(plan)
+    val value = if (plan.hasRoot && plan.getRoot.hasLocalRelation) {
+      // Short circuit local relation RPCs
+      val localRelation = plan.getRoot.getLocalRelation
+      val response = proto.ExecutePlanResponse.newBuilder()
+      val reader = new ArrowStreamReader(localRelation.getData.newInput(), allocator)
+      reader.loadNextBatch()

Review Comment:
   I think we need to close the reader, otherwise the vector root is never cleaned up (memory leak).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40782:
URL: https://github.com/apache/spark/pull/40782#discussion_r1169541014


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -439,7 +440,23 @@ class SparkSession private[sql] (
   }
 
   private[sql] def execute[T](plan: proto.Plan, encoder: AgnosticEncoder[T]): SparkResult[T] = {
-    val value = client.execute(plan)
+    val value = if (plan.hasRoot && plan.getRoot.hasLocalRelation) {
+      // Short circuit local relation RPCs
+      val localRelation = plan.getRoot.getLocalRelation
+      val response = proto.ExecutePlanResponse.newBuilder()
+      val reader = new ArrowStreamReader(localRelation.getData.newInput(), allocator)
+      reader.loadNextBatch()

Review Comment:
   Sorry. I forgot it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40782:
URL: https://github.com/apache/spark/pull/40782#issuecomment-1560418014

   cc @HyukjinKwon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ueshin commented on a diff in pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on code in PR #40782:
URL: https://github.com/apache/spark/pull/40782#discussion_r1174158327


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala:
##########
@@ -80,7 +80,10 @@ private[sql] class SparkResult[T](
           }
           while (reader.loadNextBatch()) {
             val rowCount = root.getRowCount
-            assert(root.getRowCount == response.getArrowBatch.getRowCount) // HUH!
+            assert(
+              response.getIsLocalBuilt ||
+                root.getRowCount == response.getArrowBatch.getRowCount

Review Comment:
   I guess adding the row count as optional to `LocalRelation` message is more useful.
   The server can check if the expected number of rows are provided, and we can reuse it here to set the row count.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40782:
URL: https://github.com/apache/spark/pull/40782#discussion_r1174493003


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala:
##########
@@ -80,7 +80,10 @@ private[sql] class SparkResult[T](
           }
           while (reader.loadNextBatch()) {
             val rowCount = root.getRowCount
-            assert(root.getRowCount == response.getArrowBatch.getRowCount) // HUH!
+            assert(
+              response.getIsLocalBuilt ||
+                root.getRowCount == response.getArrowBatch.getRowCount

Review Comment:
   @hvanhovell 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40782:
URL: https://github.com/apache/spark/pull/40782#discussion_r1173418320


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala:
##########
@@ -80,7 +80,10 @@ private[sql] class SparkResult[T](
           }
           while (reader.loadNextBatch()) {
             val rowCount = root.getRowCount
-            assert(root.getRowCount == response.getArrowBatch.getRowCount) // HUH!
+            assert(
+              response.getIsLocalBuilt ||
+                root.getRowCount == response.getArrowBatch.getRowCount

Review Comment:
   ```
   Error: Field "1" on message "ArrowBatch" moved from outside to inside a oneof.
   [8](https://github.com/beliefer/spark/actions/runs/4761585011/jobs/8462987474#step:5:9)
   Error: buf found 1 breaking changes.
   ```
   
   It seems 1 breaking change.



##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala:
##########
@@ -80,7 +80,10 @@ private[sql] class SparkResult[T](
           }
           while (reader.loadNextBatch()) {
             val rowCount = root.getRowCount
-            assert(root.getRowCount == response.getArrowBatch.getRowCount) // HUH!
+            assert(
+              response.getIsLocalBuilt ||
+                root.getRowCount == response.getArrowBatch.getRowCount

Review Comment:
   ```
   Error: Field "1" on message "ArrowBatch" moved from outside to inside a oneof.
   [8](https://github.com/beliefer/spark/actions/runs/4761585011/jobs/8462987474#step:5:9)
   Error: buf found 1 breaking changes.
   ```
   
   It seems 1 breaking change. @hvanhovell 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on a diff in pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #40782:
URL: https://github.com/apache/spark/pull/40782#discussion_r1173246221


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -439,7 +439,20 @@ class SparkSession private[sql] (
   }
 
   private[sql] def execute[T](plan: proto.Plan, encoder: AgnosticEncoder[T]): SparkResult[T] = {
-    val value = client.execute(plan)
+    val value = if (plan.hasRoot && plan.getRoot.hasLocalRelation) {
+      // Short circuit local relation RPCs
+      val localRelation = plan.getRoot.getLocalRelation
+      val response = proto.ExecutePlanResponse.newBuilder()
+      val batch = proto.ExecutePlanResponse.ArrowBatch
+        .newBuilder()
+        .setData(localRelation.getData)
+        .build()
+      response.setArrowBatch(batch).setIsLocalBuilt(true)

Review Comment:
   Why do we need this? I don't think we should change the protocol for this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] closed pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs
URL: https://github.com/apache/spark/pull/40782


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ueshin commented on a diff in pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on code in PR #40782:
URL: https://github.com/apache/spark/pull/40782#discussion_r1183173048


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala:
##########
@@ -80,7 +80,10 @@ private[sql] class SparkResult[T](
           }
           while (reader.loadNextBatch()) {
             val rowCount = root.getRowCount
-            assert(root.getRowCount == response.getArrowBatch.getRowCount) // HUH!
+            assert(
+              response.getIsLocalBuilt ||
+                root.getRowCount == response.getArrowBatch.getRowCount

Review Comment:
   @hvanhovell What do you think about my idea if making the row count optional causes the breaking change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #40782:
URL: https://github.com/apache/spark/pull/40782#issuecomment-1507802914

   sorry, `test_parity_torch_distributor` became flaky, I am going to disable related cases for now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40782:
URL: https://github.com/apache/spark/pull/40782#discussion_r1189342837


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala:
##########
@@ -80,7 +80,10 @@ private[sql] class SparkResult[T](
           }
           while (reader.loadNextBatch()) {
             val rowCount = root.getRowCount
-            assert(root.getRowCount == response.getArrowBatch.getRowCount) // HUH!
+            assert(
+              response.getIsLocalBuilt ||
+                root.getRowCount == response.getArrowBatch.getRowCount

Review Comment:
   @hvanhovell Could you take a close look ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a diff in pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #40782:
URL: https://github.com/apache/spark/pull/40782#discussion_r1173273281


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala:
##########
@@ -80,7 +80,10 @@ private[sql] class SparkResult[T](
           }
           while (reader.loadNextBatch()) {
             val rowCount = root.getRowCount
-            assert(root.getRowCount == response.getArrowBatch.getRowCount) // HUH!
+            assert(
+              response.getIsLocalBuilt ||
+                root.getRowCount == response.getArrowBatch.getRowCount

Review Comment:
   You means change the protocol that make the row count optional ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on a diff in pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #40782:
URL: https://github.com/apache/spark/pull/40782#discussion_r1173246465


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala:
##########
@@ -80,7 +80,10 @@ private[sql] class SparkResult[T](
           }
           while (reader.loadNextBatch()) {
             val rowCount = root.getRowCount
-            assert(root.getRowCount == response.getArrowBatch.getRowCount) // HUH!
+            assert(
+              response.getIsLocalBuilt ||
+                root.getRowCount == response.getArrowBatch.getRowCount

Review Comment:
   I'd just make the row count optional. It is kinda weird it is in the protocol anyway.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org