You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "juliuszsompolski (via GitHub)" <gi...@apache.org> on 2023/09/04 16:57:27 UTC

[GitHub] [spark] juliuszsompolski opened a new pull request, #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

juliuszsompolski opened a new pull request, #42806:
URL: https://github.com/apache/spark/pull/42806

   ### What changes were proposed in this pull request?
   
   After an "INVALID_HANDLE.OPERATION_NOT_FOUND" error, client would realize that the failure in ReattachExecute was because the initial ExecutePlan didn't reach the server. It would then call another ExecutePlan, and it will throw a RetryException to let the retry logic handle retrying. However, the retry logic would then immediately send a ReattachExecute, and the client will want to use the iterator of the reattach.
   
   However, on the server the ExecutePlan and ReattachExecute could race with each other:
   * ExecutePlan didn't reach executeHolder.runGrpcResponseSender(responseSender) in SparkConnectExecutePlanHandler yet.
   * ReattachExecute races around and reaches executeHolder.runGrpcResponseSender(responseSender) in SparkConnectReattachExecuteHandler first.
   * When ExecutePlan reaches executeHolder.runGrpcResponseSender(responseSender), and executionObserver.attachConsumer(this) is called in ExecuteGrpcResponseSender of ExecutePlan, it will kick out the ExecuteGrpcResponseSender of ReattachExecute.
   
   So even though ReattachExecute came later, it will get interrupted by the earlier ExecutePlan and finish with a INVALID_CURSOR.DISCONNECTED error.
   
   After this change, such a race between ExecutePlan / ReattachExecute can still happens, but the client should no longer send these requests in such quick succession.
   
   ### Why are the changes needed?
   
   While deemed unlikely in the ticket, this turned out to happen in practice in testing.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Integration testing.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] juliuszsompolski commented on pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #42806:
URL: https://github.com/apache/spark/pull/42806#issuecomment-1709958121

   huh, I don't know how I missed it when I was checking and commenting https://github.com/apache/spark/pull/42806#issuecomment-1706640901 ...
   Thanks again!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] juliuszsompolski commented on pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #42806:
URL: https://github.com/apache/spark/pull/42806#issuecomment-1705547797

   @hvanhovell @HyukjinKwon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] juliuszsompolski commented on pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #42806:
URL: https://github.com/apache/spark/pull/42806#issuecomment-1706640901

   https://github.com/juliuszsompolski/apache-spark/actions/runs/6076122424/job/16483638602
   This module timed out. All connect related tests finished successfuly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #42806:
URL: https://github.com/apache/spark/pull/42806#issuecomment-1707679262

   Merged to master and branch-3.5.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #42806:
URL: https://github.com/apache/spark/pull/42806#issuecomment-1707812707

   Yeah, let me make a quick followup.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] juliuszsompolski commented on pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #42806:
URL: https://github.com/apache/spark/pull/42806#issuecomment-1708001498

   Thank you @HyukjinKwon .
   Does CI linting not catch this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #42806:
URL: https://github.com/apache/spark/pull/42806#issuecomment-1707809729

   Isn't this error related to your changes?
   ```
   starting mypy annotations test...
   annotations failed mypy checks:
   python/pyspark/sql/connect/client/reattach.py:149: error: Incompatible types in assignment (expression has type "None", variable has type "Iterator[ExecutePlanResponse]")  [assignment]
   python/pyspark/sql/connect/client/reattach.py:254: error: Incompatible types in assignment (expression has type "None", variable has type "Iterator[ExecutePlanResponse]")  [assignment]
   python/pyspark/sql/connect/client/reattach.py:[25](https://github.com/apache/spark/actions/runs/6093065151/job/16532169212#step:19:26)8: error: Incompatible types in assignment (expression has type "None", variable has type "Iterator[ExecutePlanResponse]")  [assignment]
   Found 3 errors in 1 file (checked 703 source files)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute
URL: https://github.com/apache/spark/pull/42806


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #42806:
URL: https://github.com/apache/spark/pull/42806#issuecomment-1709284803

   Actually it did :-) https://github.com/juliuszsompolski/apache-spark/actions/runs/6076122424/job/16483638163


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #42806:
URL: https://github.com/apache/spark/pull/42806#issuecomment-1707816856

   https://github.com/apache/spark/pull/42830


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org