You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "HyukjinKwon (via GitHub)" <gi...@apache.org> on 2023/11/22 08:54:43 UTC

[PR] [SPARK-46042][CONNECT] Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite [spark]

HyukjinKwon opened a new pull request, #43955:
URL: https://github.com/apache/spark/pull/43955

   ### What changes were proposed in this pull request?
   
   This PR fixes the request stream iterator within `ExecutePlanResponseReattachableIterator` explicitly lazy. 
   
   ### Why are the changes needed?
   
   In order to preserve the previous behaviour at gRPC 1.56.0. After upgrading gRPC from 1.56.0 to 1.59.3, it makes the first request when the stream is created, but our code here assumes that no request is made before that.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, the upgrade (SPARK-46039) has not been released out yet.
   
   ### How was this patch tested?
   
   Fixed unittests. I manually debugged via IDE.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Np.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46042][CONNECT] Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43955:
URL: https://github.com/apache/spark/pull/43955#issuecomment-1822476780

   All tests passed. Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46042][CONNECT] Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #43955: [SPARK-46042][CONNECT] Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite 
URL: https://github.com/apache/spark/pull/43955


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46042][CONNECT] Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43955:
URL: https://github.com/apache/spark/pull/43955#issuecomment-1822350285

   TBH, this is rather a band-aid fix .. I failed to which change exactly has been made in gRPC (I looked through the release notes but could not find such mention). I manually debugged the server/client side via IDE, and found out that this at least fixes the problem.
   
   cc @juliuszsompolski @hvanhovell @dongjoon-hyun FYI.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46042][CONNECT] Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite [spark]

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #43955:
URL: https://github.com/apache/spark/pull/43955#issuecomment-1823022933

   This is using `io.grpc.stub.ClientCalls.blockingServerStreamingCall` and I see that https://github.com/grpc/grpc-java/pull/10496 is the only relevant diff between grpc 1.56 and 1.59, though I don't understand yet how this resulted in this behaviour change... I created https://github.com/grpc/grpc-java/issues/10697


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46042][CONNECT] Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite [spark]

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on code in PR #43955:
URL: https://github.com/apache/spark/pull/43955#discussion_r1401915625


##########
connector/connect/server/src/test/scala/org/apache/spark/sql/connect/service/SparkConnectServiceE2ESuite.scala:
##########
@@ -77,10 +77,10 @@ class SparkConnectServiceE2ESuite extends SparkConnectServerTest {
 
       // query3 has not been submitted before, so it should now fail with SESSION_CLOSED
       // TODO(SPARK-46042) Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite

Review Comment:
   nit: this TODO is leftover after this PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46042][CONNECT] Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43955:
URL: https://github.com/apache/spark/pull/43955#discussion_r1401744439


##########
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala:
##########
@@ -101,7 +101,20 @@ class ExecutePlanResponseReattachableIterator(
   // throw error on first iter.hasNext() or iter.next()
   // Visible for testing.
   private[connect] var iter: Option[java.util.Iterator[proto.ExecutePlanResponse]] =
-    Some(rawBlockingStub.executePlan(initialRequest))
+    Some(makeLazyIter(rawBlockingStub.executePlan(initialRequest)))
+
+  // Creates a request that contains the query and returns a stream of `ExecutePlanResponse`.
+  // After upgrading gRPC from 1.56.0 to 1.59.3, it makes the first request when
+  // the stream is created, but here the code here assumes that no request is made before
+  // that, see also SPARK-46042
+  private def makeLazyIter(f: => java.util.Iterator[proto.ExecutePlanResponse])
+      : java.util.Iterator[proto.ExecutePlanResponse] = {
+    new java.util.Iterator[proto.ExecutePlanResponse] {
+      private lazy val internalIter = f
+      override def hasNext: Boolean = internalIter.hasNext
+      override def next(): proto.ExecutePlanResponse = internalIter.next
+    }
+  }

Review Comment:
   Wow, thank you for the quick fix!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46042][CONNECT] Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43955:
URL: https://github.com/apache/spark/pull/43955#issuecomment-1822938318

   I didn't check Python side actually .. I'll probably check tmr


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46042][CONNECT] Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43955:
URL: https://github.com/apache/spark/pull/43955#issuecomment-1822353835

   Build: https://github.com/HyukjinKwon/spark/actions/runs/6954927792/job/18922718206


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46042][CONNECT] Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43955:
URL: https://github.com/apache/spark/pull/43955#discussion_r1401950399


##########
connector/connect/server/src/test/scala/org/apache/spark/sql/connect/service/SparkConnectServiceE2ESuite.scala:
##########
@@ -77,10 +77,10 @@ class SparkConnectServiceE2ESuite extends SparkConnectServerTest {
 
       // query3 has not been submitted before, so it should now fail with SESSION_CLOSED
       // TODO(SPARK-46042) Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite

Review Comment:
   Oops



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46042][CONNECT] Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43955:
URL: https://github.com/apache/spark/pull/43955#issuecomment-1822650297

   I'm totally open to other changes. Just made a quick fix to keep the previous behaviour ...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46042][CONNECT] Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite [spark]

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on code in PR #43955:
URL: https://github.com/apache/spark/pull/43955#discussion_r1401923866


##########
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala:
##########
@@ -101,7 +101,20 @@ class ExecutePlanResponseReattachableIterator(
   // throw error on first iter.hasNext() or iter.next()
   // Visible for testing.
   private[connect] var iter: Option[java.util.Iterator[proto.ExecutePlanResponse]] =
-    Some(rawBlockingStub.executePlan(initialRequest))
+    Some(makeLazyIter(rawBlockingStub.executePlan(initialRequest)))
+
+  // Creates a request that contains the query and returns a stream of `ExecutePlanResponse`.
+  // After upgrading gRPC from 1.56.0 to 1.59.3, it makes the first request when
+  // the stream is created, but here the code here assumes that no request is made before
+  // that, see also SPARK-46042
+  private def makeLazyIter(f: => java.util.Iterator[proto.ExecutePlanResponse])
+      : java.util.Iterator[proto.ExecutePlanResponse] = {
+    new java.util.Iterator[proto.ExecutePlanResponse] {
+      private lazy val internalIter = f
+      override def hasNext: Boolean = internalIter.hasNext
+      override def next(): proto.ExecutePlanResponse = internalIter.next
+    }
+  }

Review Comment:
   I think it's an unnecessary hack to preserve a behaviour that wasn't really desired in the first place.
   As written above, I'd rather just embrace the change. The only public API impact is on `Dataset.toLocalIterator` now eagerly submitting query to the server.
   
   There is an existing difference in Spark Connect, and non-Spark Connect `toLocalIterator`.
   In non Spark Connect, `Dataset.toLocalIterator` will proceed to lazily execute the query, task by task (each task submitted as a separate Spark Job) as the iterator is consumed and more data is needed; this will be blocking every time a new task is needed, as it will only be triggered when the data from it is needed.
   In Spark Connect, the whole query will be submitted and executed just like normally. The change to do it when `Dataset.toLocalIterator` is first called as opposed when the iterator is opened for the first time is not significant compared to the existing difference outlined above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46042][CONNECT] Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite [spark]

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #43955:
URL: https://github.com/apache/spark/pull/43955#issuecomment-1822846660

   Looking into it further, I take it back. Actually, our error retrying depends on this lazy behaviour. So it either have to be further reworked, or it has to also be fixed in `GrpcRetryHandler`...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46042][CONNECT] Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite [spark]

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #43955:
URL: https://github.com/apache/spark/pull/43955#issuecomment-1822605607

   > In order to preserve the previous behaviour at gRPC 1.56.0. After upgrading gRPC from 1.56.0 to 1.59.3, it makes the first request when the stream is created, but our code here assumes that no request is made before that.
   
   Whoa, that's a rather big change to drop in that small of an gRPC upgrade.
   
   FWIW, I really disliked the previous behaviour, and, on the other hand, many tests have to do
   ```
   iter.hasNext() // open the iterator
   ```
   because of it. I think it would be more logical if just calling `execute` sent the execution to start on the server.
   
   FWIW, I would be in favor of allowing this behaviour change, and just removing that piece of test case as not needed anymore.
   The only public API affected by it would be `Dataset.toLocalIterator`, because others open and consume the iterator immediately anyway.
   For `Dataset.toLocalIterator`, I believe that the behaviour change is a good one. Before, the query wouldn't even be submitted to the server until someone calls `next` or `hasNext` on the resulting iterator. I think that it is better that the query is submitted for execution immediately.
   
   WDYT @HyukjinKwon @LuciferYang @dongjoon-hyun ?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org