You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/16 07:41:25 UTC

[GitHub] [spark] zhouyejoe opened a new pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

zhouyejoe opened a new pull request #34018:
URL: https://github.com/apache/spark/pull/34018


   
   ### What changes were proposed in this pull request?
   Remove the appAttemptId from TransportConf, and parsing through SparkEnv. Share the PR in WIP to gather initial feedback. Will add unit test.
   
   ### Why are the changes needed?
   Push based shuffle will fail if there are any attemptId set in the SparkConf
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Will test within our cluster.
   Unit test to be added.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyejoe commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
zhouyejoe commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922144142


   BTW, I tested in three modes:
   1. Local mode with Spark Pi example.
   2. Yarn client mode from Spark-shell with a large amount of shuffle.
   3. Yarn cluster mode by launching an actual application with a large amount of shuffle.
   cc @mridulm 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922228161


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47941/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922235573


   Merging to master/3.2
   @zhouyejoe thanks for the work and test!
   @Ngone51 @mridulm @venkata91 thanks for the review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711118051



##########
File path: core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala
##########
@@ -298,7 +298,7 @@ private class CSMockExternalClusterManager extends ExternalClusterManager {
     ts = mock[TaskSchedulerImpl]
     when(ts.sc).thenReturn(sc)
     when(ts.applicationId()).thenReturn("appid1")
-    when(ts.applicationAttemptId()).thenReturn(Some("attempt1"))
+    when(ts.applicationAttemptId()).thenReturn(Some("1"))

Review comment:
       This test has it as a a string and not an `int` as attempt id need not be an int for non-yarn envs.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r710125319



##########
File path: core/src/main/scala/org/apache/spark/SparkEnv.scala
##########
@@ -355,6 +355,11 @@ object SparkEnv extends Logging {
       None
     }
 
+    // Set the application attemptId in the ExternalShuffleClient is applicable.
+    // If there is no attemptId assigned, set the attemptId to -1.
+    externalShuffleClient.foreach(
+      shuffleClient => shuffleClient.setAppAttemptId(conf.getInt(config.APP_ATTEMPT_ID.key, -1)))

Review comment:
       SparkEnv is created at L460:
   https://github.com/apache/spark/blob/adbea252db0f61ed3aaa45ea20b704bad6c47408/core/src/main/scala/org/apache/spark/SparkContext.scala#L460
   
   But application attemptId is set at L586 :
   https://github.com/apache/spark/blob/adbea252db0f61ed3aaa45ea20b704bad6c47408/core/src/main/scala/org/apache/spark/SparkContext.scala#L586
   
   So, isn't `conf.getInt(config.APP_ATTEMPT_ID.key, -1)` only return -1 here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyejoe commented on a change in pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
zhouyejoe commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r710681929



##########
File path: core/src/main/scala/org/apache/spark/SparkEnv.scala
##########
@@ -355,6 +355,11 @@ object SparkEnv extends Logging {
       None
     }
 
+    // Set the application attemptId in the ExternalShuffleClient is applicable.
+    // If there is no attemptId assigned, set the attemptId to -1.
+    externalShuffleClient.foreach(
+      shuffleClient => shuffleClient.setAppAttemptId(conf.getInt(config.APP_ATTEMPT_ID.key, -1)))

Review comment:
       Driver case is fixed by directly calling the setAppAttemptId when setting the conf. Tested in our Yarn cluster, works fine in Yarn cluster mode.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyejoe commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
zhouyejoe commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r710831644



##########
File path: core/src/main/scala/org/apache/spark/SparkEnv.scala
##########
@@ -355,6 +355,11 @@ object SparkEnv extends Logging {
       None
     }
 
+    // Set the application attemptId in the ExternalShuffleClient is applicable.
+    // If there is no attemptId assigned, set the attemptId to -1.
+    externalShuffleClient.foreach(
+      shuffleClient => shuffleClient.setAppAttemptId(conf.getInt(config.APP_ATTEMPT_ID.key, -1)))

Review comment:
       Make sense. Updated. Tested again within the cluster which worked fine.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711118051



##########
File path: core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala
##########
@@ -298,7 +298,7 @@ private class CSMockExternalClusterManager extends ExternalClusterManager {
     ts = mock[TaskSchedulerImpl]
     when(ts.sc).thenReturn(sc)
     when(ts.applicationId()).thenReturn("appid1")
-    when(ts.applicationAttemptId()).thenReturn(Some("attempt1"))
+    when(ts.applicationAttemptId()).thenReturn(Some("1"))

Review comment:
       This test has it as a a string and not an `int` - as attempt id need not be an int for non-yarn envs; we have to preserve that flexibility.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922240171


   **[Test build #143433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143433/testReport)** for PR 34018 at commit [`f6e47b8`](https://github.com/apache/spark/commit/f6e47b8108e458b0057dd20554876dbf79b93e37).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] venkata91 commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
venkata91 commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711347931



##########
File path: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
##########
@@ -470,7 +470,9 @@ private[spark] object CoarseGrainedExecutorBackend extends Logging {
       driverConf.set(EXECUTOR_ID, arguments.executorId)
       val env = SparkEnv.createExecutorEnv(driverConf, arguments.executorId, arguments.bindAddress,
         arguments.hostname, arguments.cores, cfg.ioEncryptionKey, isLocal = false)
-
+      // Set the application attemptId in the BlockStoreClient is applicable.

Review comment:
       super nit: `Set the application attemptId in the BlockStoreClient is applicable.` -> `Set the application attemptId in the BlockStoreClient if available.`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm edited a comment on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm edited a comment on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-921930779






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711177485



##########
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##########
@@ -583,7 +583,10 @@ class SparkContext(config: SparkConf) extends Logging {
     _applicationId = _taskScheduler.applicationId()
     _applicationAttemptId = _taskScheduler.applicationAttemptId()
     _conf.set("spark.app.id", _applicationId)
-    _applicationAttemptId.foreach(attemptId => _conf.set(APP_ATTEMPT_ID, attemptId))
+    _applicationAttemptId.foreach { attemptId =>
+      _conf.set(APP_ATTEMPT_ID, attemptId)
+      _env.blockManager.blockStoreClient.setAppAttemptId(attemptId.toInt)

Review comment:
       Yeah..I think we missed it previously. The best way is to use String type too across protocols. I'm not sure we'd change it now.  Currently, `toInt` looks fine. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922228161


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47941/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyejoe commented on a change in pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
zhouyejoe commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r710681652



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java
##########
@@ -46,6 +46,8 @@
 
   protected volatile TransportClientFactory clientFactory;
   protected String appId;
+  // Store the application attemptId
+  protected int appAttemptId;

Review comment:
       I actually keep the appAttemptId within BlockStoreClient and move the setAppAttemptId into the BlockStoreClient, as in BlockManager it only calls from BlockStoreClient, other than ExternalBlockStoreClient.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711200437



##########
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##########
@@ -583,7 +583,10 @@ class SparkContext(config: SparkConf) extends Logging {
     _applicationId = _taskScheduler.applicationId()
     _applicationAttemptId = _taskScheduler.applicationAttemptId()
     _conf.set("spark.app.id", _applicationId)
-    _applicationAttemptId.foreach(attemptId => _conf.set(APP_ATTEMPT_ID, attemptId))
+    _applicationAttemptId.foreach { attemptId =>
+      _conf.set(APP_ATTEMPT_ID, attemptId)
+      _env.blockManager.blockStoreClient.setAppAttemptId(attemptId.toInt)

Review comment:
       Resolving conversation based on comment [here](https://github.com/apache/spark/pull/34018#issuecomment-921930779).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922208595


   Ok to test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyejoe commented on pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
zhouyejoe commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-920666163


   cc @mridulm @Ngone51 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r710748822



##########
File path: core/src/main/scala/org/apache/spark/SparkEnv.scala
##########
@@ -355,6 +355,11 @@ object SparkEnv extends Logging {
       None
     }
 
+    // Set the application attemptId in the ExternalShuffleClient is applicable.
+    // If there is no attemptId assigned, set the attemptId to -1.
+    externalShuffleClient.foreach(
+      shuffleClient => shuffleClient.setAppAttemptId(conf.getInt(config.APP_ATTEMPT_ID.key, -1)))

Review comment:
       Shall we also set appAttemptId after the executor's SparkEnv is created? Just like what driver does. e.g., we can set after:
   https://github.com/apache/spark/blob/67421d80b8935d91b86e8cd3becb211fa2abd54f/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L471-L472




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922208782


   **[Test build #143433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143433/testReport)** for PR 34018 at commit [`f6e47b8`](https://github.com/apache/spark/commit/f6e47b8108e458b0057dd20554876dbf79b93e37).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922208782


   **[Test build #143433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143433/testReport)** for PR 34018 at commit [`f6e47b8`](https://github.com/apache/spark/commit/f6e47b8108e458b0057dd20554876dbf79b93e37).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-920868190


   @zhouyejoe Thanks for the fix. The approach SGTM.
   I will cut 3.2.0 RC3 after this one is merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-921513068


   @zhouyejoe could you update the PR title and description?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922243519


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143433/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711433027



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java
##########
@@ -83,6 +87,34 @@ public void init(String appId) {
     clientFactory = context.createClientFactory(bootstraps);
   }
 
+  @Override
+  public void setAppAttemptId(String appAttemptId) {
+    super.setAppAttemptId(appAttemptId);
+    setComparableAppAttemptId(appAttemptId);
+  }
+
+  @Override
+  public String getAppAttemptId() {
+    return Integer.toString(getComparableAppAttemptId());
+  }
+
+  private void setComparableAppAttemptId(String appAttemptId) {
+    // For now, push based shuffle only supports running in YARN.
+    // Application attemptId in YARN is integer and it can be safely parsed
+    // to integer here. For the application attemptId from other cluster set up
+    // which is not numeric, it needs to generate this comparableAppAttemptId
+    // from the String typed appAttemptId through some other customized logic.
+    try {
+      this.comparableAppAttemptId = Integer.parseInt(appAttemptId);
+    } catch (NumberFormatException e) {
+      logger.warn("Push based shuffle requires comparable application attemptId", e);

Review comment:
       Add the parameter to the exception msg as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711068023



##########
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##########
@@ -583,7 +583,10 @@ class SparkContext(config: SparkConf) extends Logging {
     _applicationId = _taskScheduler.applicationId()
     _applicationAttemptId = _taskScheduler.applicationAttemptId()
     _conf.set("spark.app.id", _applicationId)
-    _applicationAttemptId.foreach(attemptId => _conf.set(APP_ATTEMPT_ID, attemptId))
+    _applicationAttemptId.foreach { attemptId =>
+      _conf.set(APP_ATTEMPT_ID, attemptId)
+      _env.blockManager.blockStoreClient.setAppAttemptId(attemptId.toInt)

Review comment:
       Thinking more, attempt id is an `int` for yarn - we dont have an expectation for other cluster managers - and we currently treat it as a String.
   +CC @Ngone51, thoughts of coercing it always to `int` in `BlockStoreClient`  and doing the `toInt` here ? (will result in exception if not int)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r710318332



##########
File path: core/src/main/scala/org/apache/spark/SparkEnv.scala
##########
@@ -355,6 +355,11 @@ object SparkEnv extends Logging {
       None
     }
 
+    // Set the application attemptId in the ExternalShuffleClient is applicable.
+    // If there is no attemptId assigned, set the attemptId to -1.
+    externalShuffleClient.foreach(
+      shuffleClient => shuffleClient.setAppAttemptId(conf.getInt(config.APP_ATTEMPT_ID.key, -1)))

Review comment:
       There are two cases here:
   * SparkEnv created in executors - for which this set is fine (which will need attempt id for block push).
   * SparkEnv created at driver - for which `APP_ATTEMPT_ID` is available much later in SparkContext initialization : we should do a `setAppAttemptId` along with setting `APP_ATTEMPT_ID` for the `externalShuffleClient` as well (which will need attempt id for finalization)
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyejoe commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
zhouyejoe commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-921532085


   > @zhouyejoe could you update the PR title and description?
   
   Updated the PR title and description.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922384561


   Thanks for reviewing and merging it @gengliangwang !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyejoe commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
zhouyejoe commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922419804


   Thanks for review @mridulm @Ngone51 @venkata91 @gengliangwang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyejoe commented on pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
zhouyejoe commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-921412059


   Tested the patch within our own cluster with YARN set up. Push based shuffle works fine in yarn cluster mode, where Yarn will provide default application attempt 1 to the job. Added unit test, leveraging the CSMockExternalBlockManager, started a SparkContext and check whether the application attemptId is being set in BlockStoreClient


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711060433



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java
##########
@@ -46,6 +46,8 @@
 
   protected volatile TransportClientFactory clientFactory;
   protected String appId;
+  // Store the application attemptId
+  protected int appAttemptId;

Review comment:
       If we are moving it here, make it `private`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r710747285



##########
File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala
##########
@@ -1325,6 +1325,17 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
       }
     }
   }
+
+  test("SPARK-36772: Store application attemptId in BlockStoreClient for push based shuffle") {
+    // Regression test for SPARK-4180

Review comment:
       SPARK-4180?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r710746433



##########
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##########
@@ -583,7 +583,10 @@ class SparkContext(config: SparkConf) extends Logging {
     _applicationId = _taskScheduler.applicationId()
     _applicationAttemptId = _taskScheduler.applicationAttemptId()
     _conf.set("spark.app.id", _applicationId)
-    _applicationAttemptId.foreach(attemptId => _conf.set(APP_ATTEMPT_ID, attemptId))
+    _applicationAttemptId.foreach(attemptId => {
+      _conf.set(APP_ATTEMPT_ID, attemptId)
+      _env.blockManager.blockStoreClient.setAppAttemptId(attemptId.toInt)
+    })

Review comment:
       ```suggestion
       _applicationAttemptId.foreach { attemptId =>
         _conf.set(APP_ATTEMPT_ID, attemptId)
         _env.blockManager.blockStoreClient.setAppAttemptId(attemptId.toInt)
       }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r710345503



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java
##########
@@ -46,6 +46,8 @@
 
   protected volatile TransportClientFactory clientFactory;
   protected String appId;
+  // Store the application attemptId
+  protected int appAttemptId;

Review comment:
       Or rather, move it to ExternalBlockStoreClient and keep it private there ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyejoe commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
zhouyejoe commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711408619



##########
File path: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
##########
@@ -470,7 +470,9 @@ private[spark] object CoarseGrainedExecutorBackend extends Logging {
       driverConf.set(EXECUTOR_ID, arguments.executorId)
       val env = SparkEnv.createExecutorEnv(driverConf, arguments.executorId, arguments.bindAddress,
         arguments.hostname, arguments.cores, cfg.ioEncryptionKey, isLocal = false)
-
+      // Set the application attemptId in the BlockStoreClient is applicable.

Review comment:
       Thanks. Updated. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang edited a comment on pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
gengliangwang edited a comment on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-920868190


   @zhouyejoe Thanks for the fix. 
   I will cut 3.2.0 RC3 after this one is merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-921955984


   I'm thinking of the potential compatibility issue in the future by using the `int` type of `appAttemptId` inside `BlockStoreClinet`. Using `int` might impede the future extension of CMs (e.g., K8s, Standalone, or something new)  if they want the arbitrary attempt id to include more info. So I think it's good to keep it as a string. However, since push-based shuffle would require a comparable number value for its functionality,  we'd still require the attempt id to at least includes one comparable number value, e.g., (xxx_attempt_id_12). Then, in the case of push-based shuffle, we can just extract the number value into `int` number, which would be compatible with the current push-based shuffle protocols. 
   @mridulm @zhouyejoe WDYT?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-920670534


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyejoe commented on a change in pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
zhouyejoe commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r710781414



##########
File path: core/src/main/scala/org/apache/spark/SparkEnv.scala
##########
@@ -355,6 +355,11 @@ object SparkEnv extends Logging {
       None
     }
 
+    // Set the application attemptId in the ExternalShuffleClient is applicable.
+    // If there is no attemptId assigned, set the attemptId to -1.
+    externalShuffleClient.foreach(
+      shuffleClient => shuffleClient.setAppAttemptId(conf.getInt(config.APP_ATTEMPT_ID.key, -1)))

Review comment:
       I don't think we need to change the CoarseGrainedExecutorBackend to set the attemptId. The _conf.set(APP_ATTEMPT_ID, attemptId) in SparkContext happens before the Executor starts.
   Executor will fetch the Spark confs from Driver, and Driver will respond the confs. 
   The [SparkProperties here is lazy val](https://github.com/apache/spark/blob/917d7dad4dcdbeac5094899fa9b7fffc67376cec/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L132), so it will include all the conf changes made in Driver.
   Then in Executor, after it [gets the Spark confs](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L451) from Driver, it will[ create the SparkEnv after that](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L471).
   Then the SparkEnv will have the attemptId in the conf.
   The test in our Yarn cluster runs fine, so Executor should have the attemptId set, otherwise, the push blocks from Executor will fail.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyejoe commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
zhouyejoe commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711408258



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java
##########
@@ -46,6 +46,8 @@
 
   protected volatile TransportClientFactory clientFactory;
   protected String appId;
+  // Store the application attemptId
+  protected int appAttemptId;

Review comment:
       Moved to private.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang closed pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
gengliangwang closed pull request #34018:
URL: https://github.com/apache/spark/pull/34018


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r710318332



##########
File path: core/src/main/scala/org/apache/spark/SparkEnv.scala
##########
@@ -355,6 +355,11 @@ object SparkEnv extends Logging {
       None
     }
 
+    // Set the application attemptId in the ExternalShuffleClient is applicable.
+    // If there is no attemptId assigned, set the attemptId to -1.
+    externalShuffleClient.foreach(
+      shuffleClient => shuffleClient.setAppAttemptId(conf.getInt(config.APP_ATTEMPT_ID.key, -1)))

Review comment:
       Yes, we should not set it here - but along with existing `APP_ATTEMPT_ID` set.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922282272


   Late lgtm. Thanks @zhouyejoe 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r710853053



##########
File path: core/src/main/scala/org/apache/spark/SparkEnv.scala
##########
@@ -355,6 +355,11 @@ object SparkEnv extends Logging {
       None
     }
 
+    // Set the application attemptId in the ExternalShuffleClient is applicable.
+    // If there is no attemptId assigned, set the attemptId to -1.
+    externalShuffleClient.foreach(
+      shuffleClient => shuffleClient.setAppAttemptId(conf.getInt(config.APP_ATTEMPT_ID.key, -1)))

Review comment:
       Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711068023



##########
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##########
@@ -583,7 +583,10 @@ class SparkContext(config: SparkConf) extends Logging {
     _applicationId = _taskScheduler.applicationId()
     _applicationAttemptId = _taskScheduler.applicationAttemptId()
     _conf.set("spark.app.id", _applicationId)
-    _applicationAttemptId.foreach(attemptId => _conf.set(APP_ATTEMPT_ID, attemptId))
+    _applicationAttemptId.foreach { attemptId =>
+      _conf.set(APP_ATTEMPT_ID, attemptId)
+      _env.blockManager.blockStoreClient.setAppAttemptId(attemptId.toInt)

Review comment:
       Thinking more, attempt id is an `int` for yarn - we dont have an expectation for other cluster managers - and we currently treat it as a String.
   +CC @Ngone51, thoughts of coercing it always to `int` in `BlockStoreClient`  and doing the `toInt` here ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyejoe commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
zhouyejoe commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711408419



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java
##########
@@ -146,7 +146,7 @@ public void pushBlocks(
               assert inputListener instanceof BlockPushingListener :
                 "Expecting a BlockPushingListener, but got " + inputListener.getClass();
               TransportClient client = clientFactory.createClient(host, port);
-              new OneForOneBlockPusher(client, appId, transportConf.appAttemptId(), inputBlockId,
+              new OneForOneBlockPusher(client, appId, appAttemptId, inputBlockId,

Review comment:
       Changed to use comparableAppAttemptId, and since it is private var, so I don't choose to use the getAppAttemptId() here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711432954



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java
##########
@@ -83,6 +87,34 @@ public void init(String appId) {
     clientFactory = context.createClientFactory(bootstraps);
   }
 
+  @Override
+  public void setAppAttemptId(String appAttemptId) {
+    super.setAppAttemptId(appAttemptId);
+    setComparableAppAttemptId(appAttemptId);
+  }
+
+  @Override
+  public String getAppAttemptId() {
+    return Integer.toString(getComparableAppAttemptId());
+  }
+
+  private void setComparableAppAttemptId(String appAttemptId) {
+    // For now, push based shuffle only supports running in YARN.
+    // Application attemptId in YARN is integer and it can be safely parsed
+    // to integer here. For the application attemptId from other cluster set up
+    // which is not numeric, it needs to generate this comparableAppAttemptId
+    // from the String typed appAttemptId through some other customized logic.
+    try {
+      this.comparableAppAttemptId = Integer.parseInt(appAttemptId);
+    } catch (NumberFormatException e) {
+      logger.warn("Push based shuffle requires comparable application attemptId", e);
+    }
+  }
+
+  private int getComparableAppAttemptId() {
+    return this.comparableAppAttemptId;

Review comment:
       nit: Given it is private field and private method, we can omit the method.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922227605


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47941/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-921930779


   Discussed with @Ngone51.
   @zhouyejoe can we make the following change please:
   * Move toInt into setAttemptId (which takes String as param).
   * try/catch the Integer.parseInt and log warning in case it is not able to parse attempt id to string.
     * If successful, set attempt id to that value - else leave it unchanged (defaulting to -1).
   * Let us keep the attempt id in BlockStoreClient itself.
   
   Thoughts ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyejoe commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
zhouyejoe commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922142969


   I am also thinking about the same.
   
   > Discussed with @Ngone51.
   > @zhouyejoe can we make the following change please:
   > 
   > * Move toInt from SparkContext into setAttemptId (which will take it as a `String` param).
   > * try/catch the `Integer.parseInt` and log warning in case it is not able to parse attempt id to string.
   >   
   >   * If successful, set attempt id to that value - else leave it unchanged (defaulting to -1).
   > * Let us keep the attempt id in BlockStoreClient itself.
   > 
   > Thoughts ?
   
   Update: 
   1. Changed the AppAttemptId type in BlockStoreClient from Int to String. Add integer comparableAppAttemptId in ExternalBlockStoreClient, and override the getAppAttemptId and setAppAttemptId there.
   2. Try catch the String to Integer parse in setComparableAppAttemptId.
   3. The default value of comparableAppAttemptId is -1.
   4. Only setAppAttemptId in Driver or Executor when Push based shuffle is enabled.
   5. Minor change in PushBasedExternalClusterManager to return 1 for the TaskScheduler.applicationAttemptId()


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711432954



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java
##########
@@ -83,6 +87,34 @@ public void init(String appId) {
     clientFactory = context.createClientFactory(bootstraps);
   }
 
+  @Override
+  public void setAppAttemptId(String appAttemptId) {
+    super.setAppAttemptId(appAttemptId);
+    setComparableAppAttemptId(appAttemptId);
+  }
+
+  @Override
+  public String getAppAttemptId() {
+    return Integer.toString(getComparableAppAttemptId());
+  }
+
+  private void setComparableAppAttemptId(String appAttemptId) {
+    // For now, push based shuffle only supports running in YARN.
+    // Application attemptId in YARN is integer and it can be safely parsed
+    // to integer here. For the application attemptId from other cluster set up
+    // which is not numeric, it needs to generate this comparableAppAttemptId
+    // from the String typed appAttemptId through some other customized logic.
+    try {
+      this.comparableAppAttemptId = Integer.parseInt(appAttemptId);
+    } catch (NumberFormatException e) {
+      logger.warn("Push based shuffle requires comparable application attemptId", e);
+    }
+  }
+
+  private int getComparableAppAttemptId() {
+    return this.comparableAppAttemptId;

Review comment:
       nit: Given it is private field, we can omit the method.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyejoe commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
zhouyejoe commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922207989


   Updated. Last run PR test failed on some unrelated unit test.
   Jenkins, test this please.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922226623


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47941/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922243519


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143433/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711432454



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java
##########
@@ -83,6 +87,34 @@ public void init(String appId) {
     clientFactory = context.createClientFactory(bootstraps);
   }
 
+  @Override
+  public void setAppAttemptId(String appAttemptId) {
+    super.setAppAttemptId(appAttemptId);
+    setComparableAppAttemptId(appAttemptId);
+  }
+
+  @Override
+  public String getAppAttemptId() {
+    return Integer.toString(getComparableAppAttemptId());

Review comment:
       We dont need to override this

##########
File path: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
##########
@@ -470,7 +470,13 @@ private[spark] object CoarseGrainedExecutorBackend extends Logging {
       driverConf.set(EXECUTOR_ID, arguments.executorId)
       val env = SparkEnv.createExecutorEnv(driverConf, arguments.executorId, arguments.bindAddress,
         arguments.hostname, arguments.cores, cfg.ioEncryptionKey, isLocal = false)
-
+      // Set the application attemptId in the BlockStoreClient if available.
+      if (Utils.isPushBasedShuffleEnabled(env.conf)) {

Review comment:
       Same here: always set, if check not required

##########
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##########
@@ -583,7 +583,12 @@ class SparkContext(config: SparkConf) extends Logging {
     _applicationId = _taskScheduler.applicationId()
     _applicationAttemptId = _taskScheduler.applicationAttemptId()
     _conf.set("spark.app.id", _applicationId)
-    _applicationAttemptId.foreach(attemptId => _conf.set(APP_ATTEMPT_ID, attemptId))
+    if (Utils.isPushBasedShuffleEnabled(_conf)) {

Review comment:
       The `if` check is not required.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyejoe commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
zhouyejoe commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711408461



##########
File path: core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala
##########
@@ -298,7 +298,7 @@ private class CSMockExternalClusterManager extends ExternalClusterManager {
     ts = mock[TaskSchedulerImpl]
     when(ts.sc).thenReturn(sc)
     when(ts.applicationId()).thenReturn("appid1")
-    when(ts.applicationAttemptId()).thenReturn(Some("attempt1"))
+    when(ts.applicationAttemptId()).thenReturn(Some("1"))

Review comment:
       Reverted this change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r710793117



##########
File path: core/src/main/scala/org/apache/spark/SparkEnv.scala
##########
@@ -355,6 +355,11 @@ object SparkEnv extends Logging {
       None
     }
 
+    // Set the application attemptId in the ExternalShuffleClient is applicable.
+    // If there is no attemptId assigned, set the attemptId to -1.
+    externalShuffleClient.foreach(
+      shuffleClient => shuffleClient.setAppAttemptId(conf.getInt(config.APP_ATTEMPT_ID.key, -1)))

Review comment:
       I know the executor can get the right attempted. My concern is that keep setting it here is wrong in terms of the driver's view. People could be confused by setting 2 times.  Besides, setting after SparkEnv created for the executor makes the behavior be consistent.  
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r710344992



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java
##########
@@ -46,6 +46,8 @@
 
   protected volatile TransportClientFactory clientFactory;
   protected String appId;
+  // Store the application attemptId
+  protected int appAttemptId;

Review comment:
       `protected` -> `private` ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-920670534


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyejoe commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
zhouyejoe commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-921955509


   > Discussed with @Ngone51.
   > @zhouyejoe can we make the following change please:
   > 
   > * Move toInt from SparkContext into setAttemptId (which will take it as a `String` param).
   > * try/catch the `Integer.parseInt` and log warning in case it is not able to parse attempt id to string.
   >   
   >   * If successful, set attempt id to that value - else leave it unchanged (defaulting to -1).
   > * Let us keep the attempt id in BlockStoreClient itself.
   > 
   > Thoughts ?
   
   SGTM. Will update accordingly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [WIP][SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r710318332



##########
File path: core/src/main/scala/org/apache/spark/SparkEnv.scala
##########
@@ -355,6 +355,11 @@ object SparkEnv extends Logging {
       None
     }
 
+    // Set the application attemptId in the ExternalShuffleClient is applicable.
+    // If there is no attemptId assigned, set the attemptId to -1.
+    externalShuffleClient.foreach(
+      shuffleClient => shuffleClient.setAppAttemptId(conf.getInt(config.APP_ATTEMPT_ID.key, -1)))

Review comment:
       There are two cases here:
   * SparkEnv created in executors - for which this set is fine (which will need attempt id for block push).
   * SparkEnv created at driver - for which `APP_ATTEMPT_ID` is available much later in SparkContext initialization : we should do a `setAppAttemptId` along with setting `APP_ATTEMPT_ID` for the `externalShuffleClient` as well.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711060623



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java
##########
@@ -146,7 +146,7 @@ public void pushBlocks(
               assert inputListener instanceof BlockPushingListener :
                 "Expecting a BlockPushingListener, but got " + inputListener.getClass();
               TransportClient client = clientFactory.createClient(host, port);
-              new OneForOneBlockPusher(client, appId, transportConf.appAttemptId(), inputBlockId,
+              new OneForOneBlockPusher(client, appId, appAttemptId, inputBlockId,

Review comment:
       use `getAppAttemptId`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34018:
URL: https://github.com/apache/spark/pull/34018#discussion_r711118051



##########
File path: core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala
##########
@@ -298,7 +298,7 @@ private class CSMockExternalClusterManager extends ExternalClusterManager {
     ts = mock[TaskSchedulerImpl]
     when(ts.sc).thenReturn(sc)
     when(ts.applicationId()).thenReturn("appid1")
-    when(ts.applicationAttemptId()).thenReturn(Some("attempt1"))
+    when(ts.applicationAttemptId()).thenReturn(Some("1"))

Review comment:
       This test has it as a a string and not an `int` as attempt id need not be an int for non-yarn envs; we have to preserve that flexibility.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org