Posted to reviews@mesos.apache.org by GitBox <gi...@apache.org> on 2021/09/11 09:20:05 UTC

[GitHub] [mesos] asekretenko commented on a change in pull request #408: Fixed an agent crash in case of duplicate task ID.

asekretenko commented on a change in pull request #408:
URL: https://github.com/apache/mesos/pull/408#discussion_r706588131



##########
File path: src/slave/slave.cpp
##########
@@ -3191,7 +3191,17 @@ void Slave::__run(
     if (taskGroup.isNone() && task->has_command()) {
       // We are dealing with command task; a new command executor will be
       // launched.
-      CHECK(executor == nullptr);
+      // It is possible for an executor with this ID to already exist, if the
+      // TaskID was re-used - see MESOS-9657. If this happens, we have no
+      // choice but to drop the task.
+      if (executor != nullptr) {
+        sendTaskDroppedUpdate(
+            TaskStatus::REASON_TASK_INVALID,
+            "Master wants to launch executor, but one already exists "

Review comment:
       It looks like in some (or many) cases it is the framework that is responsible for creating this situation? Perhaps a message that does not explicitly attribute the error to the master would be better, such as "Cannot reuse an already existing executor for a command task"?

   Or do we have a similar check in the master (unreliable, since the master is not the source of truth about executors), so that this only happens when the master is not aware that an executor with this ID already exists?
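A minimal, hypothetical C++ sketch of the situation the diff handles, assuming (for illustration only) that a command task's executor ID is derived from its task ID, so re-using a TaskID collides with an executor the agent still tracks. `AgentModel`, `TaskLaunch` and `launch` are made-up names, not Mesos classes; the drop reason uses the neutral wording suggested in the comment rather than blaming the master.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified launch request; not the real Mesos TaskInfo.
struct TaskLaunch {
  std::string taskId;
  bool isCommandTask;       // true: the agent would start a command executor
  std::string executorId;   // only used for tasks with a custom executor
};

// Hypothetical model of an agent's per-framework executor bookkeeping.
class AgentModel {
 public:
  // Returns a drop reason if the task cannot be launched, std::nullopt otherwise.
  std::optional<std::string> launch(const TaskLaunch& task) {
    // Assumption: a command executor's ID is taken from the task ID.
    const std::string id = task.isCommandTask ? task.taskId : task.executorId;

    // A command task needs a fresh executor; if one with this ID is still
    // tracked (e.g. the TaskID was re-used), drop the task instead of crashing.
    if (task.isCommandTask && executors_.count(id) > 0) {
      return std::string(
          "Cannot reuse an already existing executor for a command task");
    }

    // Custom executors may legitimately be reused; emplace() keeps any
    // existing entry unchanged.
    executors_.emplace(id, task.taskId);
    return std::nullopt;
  }

  std::size_t executorCount() const { return executors_.size(); }

 private:
  std::map<std::string, std::string> executors_;  // executor ID -> first task ID
};
```

Keeping the reason neutral fits the point that the agent, not the master, knows which executors actually exist.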




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@mesos.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org