Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/31 14:56:51 UTC

[GitHub] [spark] holdenk commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

holdenk commented on pull request #29211:
URL: https://github.com/apache/spark/pull/29211#issuecomment-667163174


   We don’t reject tasks sent to us, and an executor can start
   decommissioning before the driver knows about it, so it is possible
   (although unlikely) to get a new task after we’ve started decommissioning.
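
   To make the race concrete, here is a minimal sketch of the kind of check
   the timestamp allows. It is illustrative only and is not the code in this
   PR: DecommissionShutdownSketch, MigrationInfo, canShutdown and
   lastTaskRunningTimeNs are hypothetical names, and MigrationInfo simply
   models the (Long, Boolean) pair returned by lastMigrationInfo().

```scala
// Hypothetical, self-contained model; none of these names are Spark APIs.
object DecommissionShutdownSketch {

  /** Models the (Long, Boolean) pair returned by lastMigrationInfo(). */
  final case class MigrationInfo(lastMigrationTimeNs: Long, allBlocksMigrated: Boolean)

  /**
   * Decides whether the executor may exit.
   *
   * @param numRunningTasks       tasks running at the moment of the check
   * @param lastTaskRunningTimeNs last time a check observed a running task
   * @param info                  snapshot reported by the migration code
   */
  def canShutdown(
      numRunningTasks: Int,
      lastTaskRunningTimeNs: Long,
      info: MigrationInfo): Boolean = {
    // The "all migrated" flag is only trusted if it was computed after the
    // last time a task was seen running; a late task may have created blocks
    // that an earlier migration pass never saw.
    numRunningTasks == 0 &&
      info.allBlocksMigrated &&
      info.lastMigrationTimeNs > lastTaskRunningTimeNs
  }
}
```

   With this shape, a task that shows up late pushes lastTaskRunningTimeNs
   past the reported migration time, so the flag is ignored until a later
   migration pass reports a newer time.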
   
   
   
   On Fri, Jul 31, 2020 at 6:58 AM Attila Zsolt Piros <no...@github.com>
   wrote:
   
   > *@attilapiros* commented on this pull request.
   > ------------------------------
   >
   > In
   > core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala
   > <https://github.com/apache/spark/pull/29211#discussion_r463627379>:
   >
   > > @@ -327,4 +354,28 @@ private[storage] class BlockManagerDecommissioner(
   >      }
   >      logInfo("Stopped storage decommissioner")
   >    }
   > +
   > +  /*
   > +   *  Returns the last migration time and a boolean for if all blocks have been migrated.
   > +   *  If there are any tasks running since that time the boolean may be incorrect.
   > +   */
   > +  private[storage] def lastMigrationInfo(): (Long, Boolean) = {
   > +    if (stopped || (stoppedRDD && stoppedShuffle)) {
   > +      (System.nanoTime(), true)
   > +    } else {
   > +      // Chose the min of the running times.
   > +      val lastMigrationTime = if (
   > +        conf.get(config.STORAGE_DECOMMISSION_SHUFFLE_BLOCKS_ENABLED) &&
   > +        conf.get(config.STORAGE_DECOMMISSION_RDD_BLOCKS_ENABLED)) {
   > +        Math.min(lastRDDMigrationTime, lastShuffleMigrationTime)
   >
   > Now I am starting to get this part. Can we try to simplify this? :)
   >
   > On a decommissioned executor no new tasks will be started, right?
   > (Theoretically one task could be started if the scheduler has not yet
   > processed the DecommissionExecutor message, but let's set this corner
   > case aside, as the initial 1 sec sleep more or less handles
   > it.)
   >
   > So once executor.numRunningTasks reaches 0, it stays 0. As I remember,
   > caching is part of running a task, so no new cached blocks will be created
   > when numRunningTasks == 0.
   >
   > So would it work if we first waited for numRunningTasks == 0 in a sleep
   > check loop and then checked the migration-finished flag without
   > using the time part of lastMigrationInfo?
   >
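
   For comparison, the two-phase wait suggested in the quoted comment could
   look roughly like the sketch below. TwoPhaseWaitSketch, pollUntil and
   readyToExit are hypothetical names, and the task count and migration flag
   are passed in as plain functions rather than read from a real executor.

```scala
// Hypothetical sketch of the quoted suggestion; not Spark code.
object TwoPhaseWaitSketch {

  /**
   * Evaluates `condition` at least once, sleeping `sleepMs` between attempts,
   * and gives up after `maxPolls` evaluations. Returns true if it ever held.
   */
  @annotation.tailrec
  def pollUntil(condition: () => Boolean, sleepMs: Long, maxPolls: Int): Boolean = {
    if (condition()) {
      true
    } else if (maxPolls <= 1) {
      false
    } else {
      Thread.sleep(sleepMs)
      pollUntil(condition, sleepMs, maxPolls - 1)
    }
  }

  /**
   * Phase 1: wait until no tasks are running.
   * Phase 2: wait for the migration-finished flag, with no timestamp involved.
   */
  def readyToExit(
      numRunningTasks: () => Int,
      allBlocksMigrated: () => Boolean,
      sleepMs: Long,
      maxPolls: Int): Boolean = {
    pollUntil(() => numRunningTasks() == 0, sleepMs, maxPolls) &&
      pollUntil(allBlocksMigrated, sleepMs, maxPolls)
  }
}
```

   Note that phase 2 never looks at the task count again, so this version
   relies on the assumption that the count stays at 0 once it gets there. A
   task accepted after phase 1 would go unnoticed.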
   

