Posted to issues@mesos.apache.org by "Deshi Xiao (JIRA)" <ji...@apache.org> on 2017/04/01 08:49:41 UTC
[jira] [Comment Edited] (MESOS-7210) MESOS HTTP checks doesn't work
when mesos runs with --docker_mesos_image ( pid namespace mismatch )
[ https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950633#comment-15950633 ]
Deshi Xiao edited comment on MESOS-7210 at 4/1/17 8:49 AM:
-----------------------------------------------------------
Sorry, this was my misunderstanding. When --docker_mesos_image is set, the Docker executor is spawned in a new container of its own. That executor container should be started with --pid=host so that it shares the host PID namespace.
============
This is difficult to fix because the Mesos agent is wrapped in a container. The only workaround is to manually add --pid=host to the mesos-agent container, so that the PIDs the agent sees match the PIDs of the processes inside the containers. This is not a Mesos fault. We would rather suggest that users run the Mesos agent under systemd instead of in a container, which benefits both developers and users.
was (Author: xds2000):
This is difficult to fix because the Mesos agent is wrapped in a container. The only workaround is to manually add --pid=host to the mesos-agent container, so that the PIDs the agent sees match the PIDs of the processes inside the containers. This is not a Mesos fault. We would rather suggest that users run the Mesos agent under systemd instead of in a container, which benefits both developers and users.
> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )
> ---------------------------------------------------------------------------------------------------
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
> Issue Type: Bug
> Components: docker
> Affects Versions: 1.1.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers spawned by marathon 1.4.1
> Reporter: Wojciech Sielski
> Assignee: haosdent
>
> When running mesos-slave with option "docker_mesos_image" like:
> {code}
> --master=zk://standalone:2181/mesos --containerizers=docker,mesos --executor_registration_timeout=5mins --hostname=standalone --ip=0.0.0.0 --docker_stop_timeout=5secs --gc_delay=1days --docker_socket=/var/run/docker.sock --no-systemd_enable_support --work_dir=/tmp/mesos --docker_mesos_image=panteras/paas-in-a-box:0.4.0
> {code}
> from the container that was started with option "pid: host" like:
> {code}
> net: host
> privileged: true
> pid: host
> {code}
> and an example Marathon job that uses MESOS_HTTP checks, like:
> {code}
> {
>   "id": "python-example-stable",
>   "cmd": "python3 -m http.server 8080",
>   "mem": 16,
>   "cpus": 0.1,
>   "instances": 2,
>   "container": {
>     "type": "DOCKER",
>     "docker": {
>       "image": "python:alpine",
>       "network": "BRIDGE",
>       "portMappings": [
>         { "containerPort": 8080, "hostPort": 0, "protocol": "tcp" }
>       ]
>     }
>   },
>   "env": {
>     "SERVICE_NAME": "python"
>   },
>   "healthChecks": [
>     {
>       "path": "/",
>       "portIndex": 0,
>       "protocol": "MESOS_HTTP",
>       "gracePeriodSeconds": 30,
>       "intervalSeconds": 10,
>       "timeoutSeconds": 30,
>       "maxConsecutiveFailures": 3
>     }
>   ]
> }
> {code}
> I see errors like:
> {code}
> F0306 07:41:58.844293 35 health_checker.cpp:94] Failed to enter the net namespace of task (pid: '13527'): Pid 13527 does not exist
> *** Check failure stack trace: ***
> @ 0x7f51770b0c1d google::LogMessage::Fail()
> @ 0x7f51770b29d0 google::LogMessage::SendToLog()
> @ 0x7f51770b0803 google::LogMessage::Flush()
> @ 0x7f51770b33f9 google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f517647ce46 _ZNSt17_Function_handlerIFivEZN5mesos8internal6health14cloneWithSetnsERKSt8functionIS0_E6OptionIiERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaISG_EEEUlvE_E9_M_invokeERKSt9_Any_data
> @ 0x7f517647bf2b mesos::internal::health::cloneWithSetns()
> @ 0x7f517648374b std::_Function_handler<>::_M_invoke()
> @ 0x7f5177068167 process::internal::cloneChild()
> @ 0x7f5177065c32 process::subprocess()
> @ 0x7f5176481a9d mesos::internal::health::HealthCheckerProcess::_httpHealthCheck()
> @ 0x7f51764831f7 mesos::internal::health::HealthCheckerProcess::_healthCheck()
> @ 0x7f517701f38c process::ProcessBase::visit()
> @ 0x7f517702c8b3 process::ProcessManager::resume()
> @ 0x7f517702fb77 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f51754ddc80 (unknown)
> @ 0x7f5174cf06ba start_thread
> @ 0x7f5174a2682d (unknown)
> I0306 07:41:59.077986 9 health_checker.cpp:199] Ignoring failure as health check still in grace period
> {code}
> It looks like the docker_mesos_image option causes the newly started Mesos job to run without the "pid: host" option that the parent container was started with, so it gets its own PID namespace. As a result, it does not matter whether the parent container was started with "pid: host" or not: the job will never be able to find the PID.
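
The PID namespace mismatch described above can be checked by hand from inside the container that runs the health check. The following is only a minimal sketch, assuming a Linux procfs layout (/proc/<pid>/ns/pid); the helper names pid_visible, pid_namespace and shares_pid_namespace are hypothetical and are not part of Mesos.

```python
import os


def pid_visible(pid, proc_root="/proc"):
    """Return True if `pid` has an entry under proc_root, i.e. the PID is
    visible from the PID namespace this procfs belongs to."""
    return os.path.isdir(os.path.join(proc_root, str(pid)))


def pid_namespace(pid, proc_root="/proc"):
    """Return the PID namespace identifier of `pid` (a string such as
    'pid:[4026531836]'), or None if the link cannot be read."""
    try:
        return os.readlink(os.path.join(proc_root, str(pid), "ns", "pid"))
    except OSError:
        return None


def shares_pid_namespace(pid_a, pid_b, proc_root="/proc"):
    """Return True only if both PIDs are readable and report the same
    PID namespace identifier."""
    ns_a = pid_namespace(pid_a, proc_root)
    ns_b = pid_namespace(pid_b, proc_root)
    return ns_a is not None and ns_a == ns_b
```

For example, calling pid_visible(13527) inside the executor container would be expected to return False in the situation reported above, since the log says that PID 13527 does not exist there. Reading the ns/pid link of another process may require sufficient privileges.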