You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "cdkrot (via GitHub)" <gi...@apache.org> on 2023/10/10 16:01:52 UTC

[PR] [SPARK-45486][CONNECT] Make add_artifact request idempotent [spark]

cdkrot opened a new pull request, #43314:
URL: https://github.com/apache/spark/pull/43314

   ### What changes were proposed in this pull request?
   
   1. Make add_artifact request idempotent i.e. subsequent requests will succeed if the same content is provided. This makes retrying more safe.
   2. Fix existing error handling mechanism:
   
   Before the update the error looks like that
   ```
   >>> spark.addArtifact("tmp.py", pyfile=True)
   >>> spark.addArtifact("tmp.py", pyfile=True) # fails
   2023-10-09 15:55:30,352 82873 DEBUG __iter__ Will retry call after 60014.279746934524 ms sleep (error: <_InactiveRpcError of RPC that terminated with:
           status = StatusCode.UNKNOWN
           details = ""
           debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"", grpc_status:2, created_time:"2023-10-09T15:55:30.351541+02:00"}"
   >)
   (this is also getting retried)
   ```
   
   Now it looks:
   
   ```
   >>> spark.addArtifact("abc.sh", file=True)
   >>> spark.addArtifact("abc.sh", file=True) # passes
   >>> # update file's content
   >>> spark.addArtifact("abc.sh", file=True) # now fails
   Traceback (most recent call last):
   [...]
   grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
           status = StatusCode.ALREADY_EXISTS
           details = "Duplicate Artifact: files/env.sh. Artifacts cannot be overwritten."
           debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"Duplicate Artifact: files/abc.sh. Artifacts cannot be overwritten.", grpc_status:6, created_time:"2023-10-10T01:25:38.231317+02:00"}"
   >
   
   ```
   
   ### Why are the changes needed?
   
   Makes retrying more robust, adds user-friendly error (see above).
   
   ### Does this PR introduce _any_ user-facing change?
   
   Mostly internal improvements
   
   
   ### How was this patch tested?
   Unit testing, testing against server
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45486][CONNECT] Make add_artifact request idempotent [spark]

Posted by "cdkrot (via GitHub)" <gi...@apache.org>.
cdkrot commented on PR #43314:
URL: https://github.com/apache/spark/pull/43314#issuecomment-1759369905

   Retriggered, thanks! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45486][CONNECT] Make add_artifact request idempotent [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43314:
URL: https://github.com/apache/spark/pull/43314#issuecomment-1758732953

   Mind retriggering the failed tests https://github.com/cdkrot/apache_spark/runs/17602108213 ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45486][CONNECT] Make add_artifact request idempotent [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43314:
URL: https://github.com/apache/spark/pull/43314#issuecomment-1763932059

   The JIRA number was wrong. Let me switch the JIRA with SPARK-45486


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45486][CONNECT] Make add_artifact request idempotent [spark]

Posted by "cdkrot (via GitHub)" <gi...@apache.org>.
cdkrot commented on PR #43314:
URL: https://github.com/apache/spark/pull/43314#issuecomment-1755753515

   cc @grundprinzip, @HyukjinKwon  -- moved those bits from somewhere else.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45486][CONNECT] Make add_artifact request idempotent [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #43314: [SPARK-45486][CONNECT] Make add_artifact request idempotent
URL: https://github.com/apache/spark/pull/43314


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45486][CONNECT] Make add_artifact request idempotent [spark]

Posted by "cdkrot (via GitHub)" <gi...@apache.org>.
cdkrot commented on code in PR #43314:
URL: https://github.com/apache/spark/pull/43314#discussion_r1358099191


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala:
##########
@@ -82,6 +83,40 @@ class SparkConnectArtifactManager(sessionHolder: SessionHolder) extends Logging
    */
   def getSparkConnectPythonIncludes: Seq[String] = pythonIncludeList.asScala.toSeq
 
+  private def areFilesEqual(path1: Path, path2: Path): Boolean = {

Review Comment:
   Added this, thanks for the feedback!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45486][CONNECT] Make add_artifact request idempotent [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #43314:
URL: https://github.com/apache/spark/pull/43314#discussion_r1356985023


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala:
##########
@@ -82,6 +83,40 @@ class SparkConnectArtifactManager(sessionHolder: SessionHolder) extends Logging
    */
   def getSparkConnectPythonIncludes: Seq[String] = pythonIncludeList.asScala.toSeq
 
+  private def areFilesEqual(path1: Path, path2: Path): Boolean = {

Review Comment:
   Can't you re-use some functions from libs like Apache Common: `IOUtils::contentEquals` or `FileUtils.contentEquals`. Or there is some reason to don't do that. Please, clarify that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45486][CONNECT] Make add_artifact request idempotent [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43314:
URL: https://github.com/apache/spark/pull/43314#issuecomment-1763552121

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45486][CONNECT] Make add_artifact request idempotent [spark]

Posted by "cdkrot (via GitHub)" <gi...@apache.org>.
cdkrot commented on code in PR #43314:
URL: https://github.com/apache/spark/pull/43314#discussion_r1358090333


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala:
##########
@@ -82,6 +83,40 @@ class SparkConnectArtifactManager(sessionHolder: SessionHolder) extends Logging
    */
   def getSparkConnectPythonIncludes: Seq[String] = pythonIncludeList.asScala.toSeq
 
+  private def areFilesEqual(path1: Path, path2: Path): Boolean = {

Review Comment:
   That sounds good, let me see if we can use it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org