Posted to issues@mesos.apache.org by "James Peach (JIRA)" <ji...@apache.org> on 2018/01/12 18:24:00 UTC

[jira] [Created] (MESOS-8440) `network/ports` isolator kills legitimate tasks on recovery.

James Peach created MESOS-8440:
----------------------------------

             Summary: `network/ports` isolator kills legitimate tasks on recovery.
                 Key: MESOS-8440
                 URL: https://issues.apache.org/jira/browse/MESOS-8440
             Project: Mesos
          Issue Type: Bug
          Components: containerization
    Affects Versions: 1.5.0
            Reporter: James Peach
            Assignee: James Peach


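At recovery time, the containerizer passes the {{network/ports}} isolator all of the container's resources *except* the ports, so the ports check races against the subsequent resources update. If the check runs first, a task listening on a port it was legitimately allocated appears to be using an unallocated port and is killed. The root cause is that only the executor resources are provided at recovery time, whereas at update time the isolator gets the whole container resources as calculated by {{Executor::allocatedResources()}}. In the log below, recovery records an empty port set for the container, and the allocated port (31446) only appears after the executor re-registers.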

{noformat}
I0112 08:22:23.930830 28937 linux_launcher.cpp:300] Recovered container 80a2d9dc-0492-4af5-a131-05f1cd66d672
I0112 08:22:23.931637 28933 ports.cpp:398] recovering container executor_info {
  executor_id {
    value: "fff42f68-4aed-4ca6-a62f-71b7166bbd7a"
  }
  resources {
    name: "cpus"
    type: SCALAR
    scalar {
      value: 0.1
    }
    allocation_info {
      role: "*"
    }
  }
  resources {
    name: "mem"
    type: SCALAR
    scalar {
      value: 32
    }
    allocation_info {
      role: "*"
    }
  }
  command {
    value: "/home/jpeach/src/mesos/build/src/mesos-executor"
    shell: false
    arguments: "mesos-executor"
    arguments: "--launcher_dir=/home/jpeach/src/mesos/build/src"
  }
  framework_id {
    value: "4ad59c30-7b1e-4991-bda2-e7f9275d3693-0000"
  }
  name: "Command Executor (Task: fff42f68-4aed-4ca6-a62f-71b7166bbd7a) (Command: sh -c \'nc -k -l 31446\')"
  source: "fff42f68-4aed-4ca6-a62f-71b7166bbd7a"
}
container_id {
  value: "80a2d9dc-0492-4af5-a131-05f1cd66d672"
}
pid: 28955
directory: "/tmp/NetworkPortsIsolatorTest_ROOT_NC_RecoverGoodTask_eTlVKl/slaves/4ad59c30-7b1e-4991-bda2-e7f9275d3693-S0/frameworks/4ad59c30-7b1e-4991-bda2-e7f9275d3693-0000/executors/fff42f68-4aed-4ca6-a62f-71b7166bbd7a/runs/80a2d9dc-0492-4af5-a131-05f1cd66d672"
I0112 08:22:23.932137 28933 ports.cpp:530] Updated ports to [] for container 80a2d9dc-0492-4af5-a131-05f1cd66d672
I0112 08:22:23.932982 28937 provisioner.cpp:493] Provisioner recovery complete
I0112 08:22:23.933924 28928 slave.cpp:6581] Sending reconnect request to executor 'fff42f68-4aed-4ca6-a62f-71b7166bbd7a' of framework 4ad59c30-7b1e-4991-bda2-e7f9275d3693-0000 at executor(1)@17.228.224.108:42187
I0112 08:22:23.934587 28957 exec.cpp:282] Received reconnect request from agent 4ad59c30-7b1e-4991-bda2-e7f9275d3693-S0
I0112 08:22:23.935724 28931 slave.cpp:4426] Received re-registration message from executor 'fff42f68-4aed-4ca6-a62f-71b7166bbd7a' of framework 4ad59c30-7b1e-4991-bda2-e7f9275d3693-0000
I0112 08:22:23.936646 28967 exec.cpp:259] Executor re-registered on agent 4ad59c30-7b1e-4991-bda2-e7f9275d3693-S0
I0112 08:22:23.936820 28929 ports.cpp:530] Updated ports to [31446-31446] for container 80a2d9dc-0492-4af5-a131-05f1cd66d672
{noformat}
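
The ordering in the log above can be modeled with a small, self-contained sketch. It is not the isolator's actual code: {{ContainerState}}, {{recover}}, {{update}}, {{violatesAllocation}} and {{checkFiresBeforeUpdate}} are hypothetical names chosen for illustration, and ports are reduced to a plain set of integers. It only shows why a check that runs between recovery and the first resources update would flag a task that is behaving correctly.

```cpp
#include <cassert>
#include <set>

// Hypothetical, simplified model of the race; these names are illustrative
// and are not the Mesos isolator API.
using Ports = std::set<int>;

struct ContainerState {
  Ports allocated;  // Ports the isolator currently believes are allocated.
  Ports listening;  // Ports the task is actually listening on.
};

// Recovery sees only the executor resources (cpus and mem in the log),
// so the allocated port set starts out empty.
ContainerState recover(const Ports& listening) {
  return ContainerState{Ports{}, listening};
}

// The later resources update carries the full container resources,
// including the task's ports.
void update(ContainerState& state, const Ports& allocated) {
  state.allocated = allocated;
}

// The ports check flags a container that listens on any port outside
// its allocation.
bool violatesAllocation(const ContainerState& state) {
  for (int port : state.listening) {
    if (state.allocated.count(port) == 0) {
      return true;
    }
  }
  return false;
}

// Replays the ordering from the log: one check after recovery and one
// after the update. Returns true when only the first check fires.
bool checkFiresBeforeUpdate() {
  ContainerState state = recover({31446});
  const bool before = violatesAllocation(state);
  update(state, {31446});
  const bool after = violatesAllocation(state);
  return before && !after;
}
```

In this model, calling {{checkFiresBeforeUpdate()}} reproduces the window between the two "Updated ports" log lines: the container is listening on 31446 throughout, but only the second check sees that port as allocated.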


