Posted to user@mesos.apache.org by Jay Taylor <ou...@gmail.com> on 2015/10/07 03:49:12 UTC

Can health-checks be run by Mesos for docker tasks?

Does Mesos support health checks for docker image tasks?  Mesos seems to be
ignoring the TaskInfo.HealthCheck field for me.

Example TaskInfo JSON received back from Mesos:

{
  "name":"hello-app.web.v3",
  "task_id":{
    "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
  },
  "slave_id":{
    "value":"20150924-210922-1608624320-5050-1792-S1"
  },
  "resources":[
    {
      "name":"cpus",
      "type":0,
      "scalar":{
        "value":0.1
      }
    },
    {
      "name":"mem",
      "type":0,
      "scalar":{
        "value":256
      }
    },
    {
      "name":"ports",
      "type":1,
      "ranges":{
        "range":[
          {
            "begin":31002,
            "end":31002
          }
        ]
      }
    }
  ],
  "command":{
    "container":{
      "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
    },
    "shell":false
  },
  "container":{
    "type":1,
    "docker":{
      "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
      "network":2,
      "port_mappings":[
        {
          "host_port":31002,
          "container_port":8000,
          "protocol":"tcp"
        }
      ],
      "privileged":false,
      "parameters":[],
      "force_pull_image":false
    }
  },
  "health_check":{
    "delay_seconds":5,
    "interval_seconds":10,
    "timeout_seconds":10,
    "consecutive_failures":3,
    "grace_period_seconds":0,
    "command":{
      "shell":true,
      "value":"sleep 5",
      "user":"root"
    }
  }
}

I have searched all machines and containers to see if they ever run the
command (in this case `sleep 5`), but have not found any indication that it
is being executed.

In the Mesos source code, the health checks are invoked from
src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this
mean that health checks are only supported for custom executors and not for
docker tasks?

What I am trying to accomplish is to have the 0/non-zero exit-status of a
health-check command translate to task health.
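
To make that goal concrete, here is a minimal Python sketch. It is not Mesos code, and the names `run_check` and `exceeds_failure_limit` are hypothetical. It assumes a POSIX shell, treats exit status 0 as healthy and anything else (including a timeout) as unhealthy, and only reports a problem after `consecutive_failures` unhealthy results in a row, mirroring the fields in the health_check block above:

```python
import subprocess


def run_check(command: str, timeout_seconds: float) -> bool:
    """Run a shell command; exit status 0 means healthy, anything else unhealthy."""
    try:
        result = subprocess.run(command, shell=True, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # A check that exceeds its timeout is counted as a failure.
        return False
    return result.returncode == 0


def exceeds_failure_limit(results, consecutive_failures: int) -> bool:
    """Return True once `consecutive_failures` unhealthy results occur in a row."""
    streak = 0
    for healthy in results:
        # Any healthy result resets the streak of failures.
        streak = 0 if healthy else streak + 1
        if streak >= consecutive_failures:
            return True
    return False
```

With `consecutive_failures` set to 3, a single passing check in between failures resets the count, so only three failures in a row would mark the task as unhealthy.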

Thanks!
Jay

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
(sending this again because my last transmission appears to have come from
the wrong email address and been rejected by the list, sorry for the noise!)

Hi Marco et al.,

My reply is inline below.



On Mon, Oct 12, 2015 at 2:27 PM, Marco Massenzio <ma...@mesosphere.io>
wrote:

>
> On Mon, Oct 12, 2015 at 11:26 PM, Marco Massenzio <ma...@mesosphere.io>
> wrote:
>
>> Are those the stdout logs of the Agent? I ask because I don't see
>> --launcher_dir set. However, if I look at one that is running off the
>> same 0.24.1 package, this is what I see:
>>
>> I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup:
>> --appc_store_dir="/tmp/mesos/store/appc"
>> --attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5"
>> --cgroups_cpu_enable_pids_and_tids_count="false"
>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>> --container_disk_watch_interval="15secs" --containerizers="docker,mesos"
>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>> --enforce_container_disk_quota="false"
>> --executor_registration_timeout="1mins"
>> --executor_shutdown_grace_period="5secs"
>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>> --hadoop_home="" --help="false" --initialize_driver_logging="true"
>> --ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem"
>> --launcher_dir="/usr/libexec/mesos"
>> --log_dir="/var/local/mesos/logs/agent" --logbufsecs="0"
>> --logging_level="INFO" --master="zk://192.168.33.1:2181/mesos/vagrant"
>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>> --registration_backoff_factor="1secs"
>> --resource_monitoring_interval="1secs"
>> --resources="ports:[9000-10000];ephemeral_ports:[32768-57344]"
>> --revocable_cpu_low_priority="true"
>> --sandbox_directory="/var/local/sandbox" --strict="true"
>> --switch_user="true" --version="false" --work_dir="/var/local/mesos/agent"
>> (this is run off the Vagrantfile at [0] in case you want to reproduce).
>> That agent is not run via the init command, though, I execute it manually
>> via the `run-agent.sh` in the same directory.
>>
>> I don't really think this matters, but I assume you also restarted the
>> agent after making the config changes?
>> (and, for your own sanity - you can double check the version by looking
>> at the very head of the logs).
>>
>
Yes I definitely restarted all mesos processes after config changes; in
fact I've become quite adept at this cycle ;)

Here's equivalent info to what you posted from one of the slaves' INFO log
in my cluster:

Log file created at: 2015/10/12 20:22:58
> Running on machine: mesos-worker2a
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> I1012 20:22:58.469779  9605 logging.cpp:172] INFO level logging started!
> I1012 20:22:58.470006  9605 main.cpp:185] Build: 2015-09-25 19:13:24 by
> root
> I1012 20:22:58.470023  9605 main.cpp:187] Version: 0.24.1
> I1012 20:22:58.470031  9605 main.cpp:190] Git tag: 0.24.1
> I1012 20:22:58.470039  9605 main.cpp:194] Git SHA:
> 44873806c2bb55da37e9adbece938274d8cd7c48
> I1012 20:22:58.470221  9605 containerizer.cpp:143] Using isolation:
> posix/cpu,posix/mem,filesystem/posix
> I1012 20:22:58.573750  9605 main.cpp:272] Starting Mesos slave
> I1012 20:22:58.574662  9621 slave.cpp:190] Slave started on 1)@
> 192.168.225.59:5050
> I1012 20:22:58.574695  9621 slave.cpp:191] Flags at startup:
> --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5"
> --cgroups_cpu_enable_pids_and_tids_count="false"
> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
> --cgroups_limit_swap="false" --cgroups_root="mesos"
> --container_disk_watch_interval="15secs" --containerizers="mesos,docker"
> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
> --enforce_container_disk_quota="false"
> --executor_registration_timeout="5mins"
> --executor_shutdown_grace_period="5secs"
> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
> --hadoop_home="" --help="false" --hostname="mesos-worker2a"
> --initialize_driver_logging="true" --ip="192.168.225.59"
> --isolation="posix/cpu,posix/mem" --*launcher_dir="/usr/libexec/mesos"* --log_dir="/var/log/mesos"
> --logbufsecs="0" --logging_level="INFO"
> --master="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
> --perf_interval="1mins" --port="5050" --qos_correction_interval_min="0ns"
> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
> --registration_backoff_factor="1secs"
> --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true"
> --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
> --switch_user="true" --version="false" --work_dir="/tmp/mesos"


The launcher dir is picked up by the mesos-slave process.  We can also see
the cmdline flag is picked up from /etc/mesos-slave like this:

mesos-worker2a:~$ ps -ef | grep mesos
> root      9605     1  1 20:22 ?        00:01:18 /usr/sbin/mesos-slave --ip=192.168.225.59 --log_dir=/var/log/mesos --*launcher_dir=/usr/libexec/mesos*
> root      9612  9605  0 20:22 ?        00:00:00 logger -p user.info -t mesos-slave[9605]
> root      9613  9605  0 20:22 ?        00:00:00 logger -p user.err -t mesos-slave[9605]
> vagrant   9951  6010  0 21:36 pts/0    00:00:00 grep --color=auto mesos


What I keep coming back to is the fact that the MESOS_LAUNCHER_DIR env var
does not seem to get picked up here:
https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L573-L576

  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
  string path =
    envPath.isSome() ? envPath.get()
                     : os::realpath(Path(argv[0]).dirname()).get();


And argv[0] (which contains the slave work dir) is the path we see in the
task's stdout.
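
The fallback in that snippet can be modelled with a small Python sketch. This is not the Mesos implementation; `resolve_launcher_dir` and `health_check_binary` are hypothetical names. One possible reading, and only an assumption on my part: if argv[0] is a bare name such as "mesos-docker-executor", its dirname is "." and the real path of "." is the process's current working directory, which would be the sandbox if the executor is started from there.

```python
import os


def resolve_launcher_dir(argv0: str, environ: dict) -> str:
    """Prefer MESOS_LAUNCHER_DIR; otherwise use the real directory of argv[0]."""
    env_path = environ.get("MESOS_LAUNCHER_DIR")
    if env_path is not None:
        return env_path
    # A bare program name has no directory part; POSIX dirname yields ".".
    directory = os.path.dirname(argv0) or "."
    return os.path.realpath(directory)


def health_check_binary(argv0: str, environ: dict) -> str:
    """Path where the health-check binary would be looked up under this model."""
    return os.path.join(resolve_launcher_dir(argv0, environ), "mesos-health-check")
```

Under this model the env var always wins when it is present, so an executor that never sees MESOS_LAUNCHER_DIR depends entirely on what argv[0] looks like.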

I'm still having trouble understanding how flags defined in
mesos::internal::slave::Flags::Flags (src/slave/flags.[ch]pp) are
propagated or expanded to MESOS_<flag_in_caps> environment variables.  Can
you confirm if such a mechanism exists and if so where it is?

Otherwise, if my understanding is correct and such a mechanism doesn't
exist:

How can the requisite MESOS_LAUNCHER_DIR env var be available when
docker/executor.cpp (a child process of mesos-slave) attempts to read it?

The lack of such a mechanism would explain the behavior I'm currently
observing.
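
To spell out the direction of the mapping in question, here is a hedged Python sketch. It is a model of the behavior described in this thread, not the stout/Mesos code, and both function names are hypothetical. Environment variables with the MESOS_ prefix are read into flags when the process starts, while a child process only sees the variables that are explicitly placed in its environment:

```python
def load_flags_from_env(environ: dict, prefix: str = "MESOS_") -> dict:
    """Map PREFIX_SOME_FLAG=value to {"some_flag": value}; ignore other variables."""
    flags = {}
    for key, value in environ.items():
        if key.startswith(prefix):
            flags[key[len(prefix):].lower()] = value
    return flags


def child_environment(task_variables: dict) -> dict:
    """Assumed model: the child gets only the variables passed in explicitly."""
    return dict(task_variables)
```

In this model, `load_flags_from_env({"MESOS_LAUNCHER_DIR": "/usr/libexec/mesos"})` sets `launcher_dir` for the parent, but nothing writes that flag back out as MESOS_LAUNCHER_DIR for the child.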

Thanks!
Jay


>>
>> [0] http://github.com/massenz/zk-mesos
>>
>> --
>> *Marco Massenzio*
>> Distributed Systems Engineer
>> http://codetrips.com
>>
>> On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <ou...@gmail.com> wrote:
>>
>>> Hi Haosdent and Mesos friends,
>>>
>>> I've rebuilt the cluster from scratch and installed mesos 0.24.1 from
>>> the mesosphere apt repo:
>>>
>>> $ dpkg -l | grep mesos
>>> ii  mesos                               0.24.1-0.2.35.ubuntu1404
>>>    amd64        Cluster resource manager with efficient resource isolation
>>>
>>> Then added the `launcher_dir` flag to /etc/mesos-slave/launcher_dir on
>>> the slaves:
>>>
>>> mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir
>>> /usr/libexec/mesos
>>>
>>> And yet the task health-checks are still being launched from the sandbox
>>> directory like before!
>>>
>>> I've also tested setting the MESOS_LAUNCHER_DIR env var and get the
>>> identical result (just as before on the cluster where many versions of
>>> mesos had been installed):
>>>
>>> STDOUT:
>>>
>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>> --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>> --stop_timeout="0ns"
>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>> --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>> --stop_timeout="0ns"
>>>> Registered docker executor on mesos-worker1a
>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>> Launching health check process:
>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check
>>>> --executor=(1)@192.168.225.58:48912
>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>> 127.0.0.1:8000
>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>> Health check process launched at pid: 11253
>>>
>>>
>>>
>>> STDERR:
>>>
>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>> --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>> --stop_timeout="0ns"
>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>> --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>> --stop_timeout="0ns"
>>>> Registered docker executor on mesos-worker1a
>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>> *Launching health check process:
>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check*
>>>> --executor=(1)@192.168.225.58:48912
>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>> 127.0.0.1:8000
>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>> Health check process launched at pid: 11253
>>>
>>>
>>> Any ideas on where to go from here?  Is there any additional information
>>> I can provide?
>>>
>>> Thanks as always,
>>> Jay
>>>
>>>
>>> On Thu, Oct 8, 2015 at 9:23 PM, haosdent <ha...@gmail.com> wrote:
>>>
>>>> Flags sent to the executor from the containerizer are stringified and
>>>> become command-line parameters when the executor is launched.
>>>>
>>>> You could see this in
>>>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
>>>>
>>>> But for launcher_dir, the executor gets it from `argv[0]`, as you
>>>> mentioned above.
>>>> ```
>>>>   string path =
>>>>     envPath.isSome() ? envPath.get()
>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>
>>>> ```
>>>> So I want to figure out why your argv[0] would become the sandbox dir,
>>>> not "/usr/libexec/mesos".
>>>>
>>>> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <ou...@gmail.com>
>>>> wrote:
>>>>
>>>>> I see.  And then how are the flags sent to the executor?
>>>>>
>>>>>
>>>>>
>>>>> On Oct 8, 2015, at 8:56 PM, haosdent <ha...@gmail.com> wrote:
>>>>>
>>>>> Yes. The related code is located in
>>>>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>>>>>
>>>>> In fact, environment variables that start with MESOS_ are loaded as
>>>>> flag values.
>>>>>
>>>>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
>>>>>
>>>>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <ou...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> One question for you haosdent-
>>>>>>
>>>>>> You mentioned that the flags.launcher_dir should propagate to the
>>>>>> docker executor all the way up the chain.  Can you show me where this logic
>>>>>> is in the codebase?  I didn't see where that was happening and would like
>>>>>> to understand the mechanism.
>>>>>>
>>>>>> Thanks!
>>>>>> Jay
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>
>>>>>> Maybe tomorrow I will build a fresh cluster from scratch to see if
>>>>>> the broken behavior experienced today still persists.
>>>>>>
>>>>>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>
>>>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting
>>>>>> flags.launcher_dir, which is where mesos-docker-executor
>>>>>> and mesos-health-check are looked up. Although the env var itself is not
>>>>>> propagated, MESOS_LAUNCHER_DIR still works because flags.launcher_dir is
>>>>>> taken from it.
>>>>>>
>>>>>> For example, I ran
>>>>>> ```
>>>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>>>> ```
>>>>>> before starting mesos-slave. When I then launch the slave, I can find
>>>>>> this line in the slave log:
>>>>>> ```
>>>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>>>>>> xxxxx  --launcher_dir="/tmp"
>>>>>> ```
>>>>>>
>>>>>> From your log, I am not sure why your MESOS_LAUNCHER_DIR became the
>>>>>> sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in one of
>>>>>> your other scripts?
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>>>>>>
>>>>>>> I just tried setting both the env var and flag on the slaves, and
>>>>>>> have determined that the env var is not present when it is checked in
>>>>>>> src/docker/executor.cpp @ line 573:
>>>>>>>
>>>>>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>>   string path =
>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" <<
>>>>>>>> (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>>>>>
>>>>>>>
>>>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly
>>>>>>> propagated along up to the point of mesos-slave launch):
>>>>>>>
>>>>>>> $ cat /etc/default/mesos-slave
>>>>>>>> export
>>>>>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>>>> export MESOS_PORT="5050"
>>>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>>>>
>>>>>>>
>>>>>>> TASK OUTPUT:
>>>>>>>
>>>>>>>
>>>>>>>> *MESOS_LAUNCHER_DIR: envpath.isSome()->no*
>>>>>>>> *MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'*
>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>> Launching health check process:
>>>>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>>>>>> --executor=(1)@192.168.225.59:44523
>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>>>>>> sh -c \" \/bin\/bash
>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>> Health check process launched at pid: 2519
>>>>>>>
>>>>>>>
>>>>>>> The env var is not propagated when the docker executor is launched
>>>>>>> in src/slave/containerizer/docker.cpp around line 903:
>>>>>>>
>>>>>>>   vector<string> argv;
>>>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>>>   // Construct the mesos-docker-executor using the "name" we gave
>>>>>>>> the
>>>>>>>>   // container (to distinguish it from Docker containers not created
>>>>>>>>   // by Mesos).
>>>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>>>>>       argv,
>>>>>>>>       Subprocess::PIPE(),
>>>>>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>>>>>       environment,
>>>>>>>>       lambda::bind(&setup, container->directory));
>>>>>>>
>>>>>>>
>>>>>>> A little way above, we can see the environment is set up with the
>>>>>>> env vars defined for the container's task.
>>>>>>>
>>>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>>>>
>>>>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>>>            container->executor.command().environment().variables())
>>>>>>>> {
>>>>>>>>     environment[variable.name()] = variable.value();
>>>>>>>>   }
>>>>>>>
>>>>>>>
>>>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>
>>>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>>>> 0.24.1 should work.
>>>>>>>>
>>>>>>>> >Do any of you know which host the path
>>>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>>>>>>> failing.
>>>>>>>>
>>>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before?
>>>>>>>> We get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or else
>>>>>>>> use the same directory as mesos-docker-executor.
>>>>>>>>
>>>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Maybe I spoke too soon.
>>>>>>>>>
>>>>>>>>> Now the checks are attempting to run, however the STDERR is not
>>>>>>>>> looking good.  I've added some debugging to the error message output to
>>>>>>>>> show the path, argv, and envp variables:
>>>>>>>>>
>>>>>>>>> STDOUT:
>>>>>>>>>
>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>> Starting task
>>>>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>> Launching health check process:
>>>>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>>>>>> sh -c \" exit 1
>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>> Health check process launched at pid: 3012
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> STDERR:
>>>>>>>>>
>>>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on
>>>>>>>>>> slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>>>> memory limited without swap.
>>>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in
>>>>>>>>>> childMain
>>>>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>> envp=''): No such file or directory
>>>>>>>>>> *** Aborted at 1444270649 (unix time) try "date -d @1444270649" if you are using GNU date ***
>>>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700)
>>>>>>>>>> from PID 3012; stack trace: ***
>>>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>>>> @ 0x41921c _Abort()
>>>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>>>> @ 0x43cc9c
>>>>>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>>>> @ 0x7f4a39d92827
>>>>>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>> should exist on? It definitely doesn't exist on the slave, hence
>>>>>>>>> execution failing.
>>>>>>>>>
>>>>>>>>> This is with current master, git hash
>>>>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>>>>
>>>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Jay
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Update:
>>>>>>>>>>
>>>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile
>>>>>>>>>> and package the latest master (0.26.x) and deployed it to the cluster, and
>>>>>>>>>> now health checks are working as advertised in both Marathon and my own
>>>>>>>>>> framework!  Not sure what was going on with health-checks in 0.24.0..
>>>>>>>>>>
>>>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Jay
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Haosdent,
>>>>>>>>>>>
>>>>>>>>>>> Can you share your Marathon POST request that results in Mesos
>>>>>>>>>>> executing the health checks?
>>>>>>>>>>>
>>>>>>>>>>> Since we can reference the Marathon framework, I've been doing
>>>>>>>>>>> some digging around.
>>>>>>>>>>>
>>>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>>>>
>>>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>>>>
>>>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's
>>>>>>>>>>> dependencies
>>>>>>>>>>>
>>>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to
>>>>>>>>>>> /tmp/X both in the TaskFactory and right before the task is sent to
>>>>>>>>>>> Mesos via driver.launchTasks:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>>>>
>>>>>>>>>>> $ git diff
>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>>>>
>>>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>          CreatedTask(
>>>>>>>>>>>>            taskInfo,
>>>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>>>>
>>>>>>>>>>> $ git diff
>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") {
>>>>>>>>>>>> driver =>
>>>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>>>> +      var i = 0
>>>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() +
>>>>>>>>>>>> "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>> +      }
>>>>>>>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>>>>>>>> taskInfos.asJava)
>>>>>>>>>>>>      }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Then I built and deployed the hacked Marathon and restarted the
>>>>>>>>>>> marathon service.
>>>>>>>>>>>
>>>>>>>>>>> Next I created the app via the Marathon API ("hello app" is a
>>>>>>>>>>> container with a simple hello-world ruby app running on
>>>>>>>>>>> 0.0.0.0:8000)
>>>>>>>>>>>
>>>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST
>>>>>>>>>>>> -H'Content-Type: application/json' -d'
>>>>>>>>>>>> {
>>>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>>>   "apps": [
>>>>>>>>>>>>     {
>>>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>>>       "container": {
>>>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>>>         "docker": {
>>>>>>>>>>>>           "image":
>>>>>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>>>           "portMappings": [
>>>>>>>>>>>>             {
>>>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>>>             }
>>>>>>>>>>>>           ]
>>>>>>>>>>>>         }
>>>>>>>>>>>>       },
>>>>>>>>>>>>       "env": {
>>>>>>>>>>>>
>>>>>>>>>>>>       },
>>>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>>>         }
>>>>>>>>>>>>       ],
>>>>>>>>>>>>       "instances": 1,
>>>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>>>       "mem": 512
>>>>>>>>>>>>     }
>>>>>>>>>>>>   ]
>>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> $ ls /tmp/
>>>>>>>>>>>>
>>>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>
>>>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Do they match?
>>>>>>>>>>>
>>>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Yes, so I am confident this is the information being sent across
>>>>>>>>>>> the wire to Mesos.
>>>>>>>>>>>
>>>>>>>>>>> Do they contain any health-check information?
>>>>>>>>>>>
>>>>>>>>>>> $ cat
>>>>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>> {
>>>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>
>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>   },
>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>   },
>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>     {
>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>         "value":1.0
>>>>>>>>>>>>       },
>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>     },
>>>>>>>>>>>>     {
>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>         "value":512.0
>>>>>>>>>>>>       },
>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>     },
>>>>>>>>>>>>     {
>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>           {
>>>>>>>>>>>>             "begin":31641,
>>>>>>>>>>>>             "end":31641
>>>>>>>>>>>>           }
>>>>>>>>>>>>         ]
>>>>>>>>>>>>       },
>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>     }
>>>>>>>>>>>>   ],
>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>     "environment":{
>>>>>>>>>>>>       "variables":[
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>         },
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>>>         },
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>>>         },
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>>>
>>>>>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>>>         },
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>>>
>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>         },
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>         },
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>         },
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>>>         },
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>         }
>>>>>>>>>>>>       ]
>>>>>>>>>>>>     },
>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>   },
>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>
>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>         }
>>>>>>>>>>>>       ],
>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>     }
>>>>>>>>>>>>   }
>>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> No, I don't see anything about any health check.
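The same check can be scripted instead of read by eye. This is a minimal sketch, assuming the dumps are plain JSON as printed by `JsonFormat.printToString` above; the helper name `find_health_check` and the two trimmed sample strings are hypothetical stand-ins, not the full dumps from this thread.

```python
import json

def find_health_check(task_info_json: str):
    """Return the health_check dict from a TaskInfo JSON dump, or None."""
    task_info = json.loads(task_info_json)
    return task_info.get("health_check")

# Trimmed-down stand-ins for the two kinds of dumps seen in this thread.
without_check = '{"name": "web-v11.app-81-1-hello-app", "command": {"shell": false}}'
with_check = '{"name": "hello-app.web.v3", "health_check": {"command": {"value": "sleep 5"}}}'

print(find_health_check(without_check))  # None: the field is absent
print(find_health_check(with_check))
```

Running it over every `/tmp/task*` file would show at a glance whether any TaskInfo carried a `health_check` field.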
>>>>>>>>>>>
>>>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>>>>
>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>>> Starting task
>>>>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> And STDERR:
>>>>>>>>>>>
>>>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered
>>>>>>>>>>>> on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>>>>>> memory limited without swap.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>>>>
>>>>>>>>>>> Any ideas of other things to try or what I could be missing?
>>>>>>>>>>> Can't say either way about the Mesos health-check system working or not if
>>>>>>>>>>> Marathon won't put the health-check into the task it sends to Mesos.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for all your help!
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Jay
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Maybe you could post your executor stdout/stderr so that we
>>>>>>>>>>>> could know whether the health check is running or not.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Marathon also uses the Mesos health check. When I use a health
>>>>>>>>>>>>> check, I see log lines like these in the executor stdout.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>>>> ```
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <
>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am using my own framework, and the full task info I'm using
>>>>>>>>>>>>>> is posted earlier in this thread.  Do you happen to know if Marathon uses
>>>>>>>>>>>>>> Mesos's health checks for its health check system?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, the health check is launched from its definition in
>>>>>>>>>>>>>> TaskInfo. Do you launch your task through Marathon? I could test it on my
>>>>>>>>>>>>>> side.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <
>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Precisely, and there are none of those statements.  Are you
>>>>>>>>>>>>>>> or others confident health-checks are part of the code path when defined
>>>>>>>>>>>>>>> via task info for docker container tasks?  Going through the code, I wasn't
>>>>>>>>>>>>>>> able to find the linkage for anything other than health-checks triggered
>>>>>>>>>>>>>>> through a custom executor.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With that being said it is a pretty good sized code base and
>>>>>>>>>>>>>>> I'm not very familiar with it, so my analysis this far has by no means been
>>>>>>>>>>>>>>> exhaustive.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When the health check launches, there should be a log line like
>>>>>>>>>>>>>>> this in your executor stdout:
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <
>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output in
>>>>>>>>>>>>>>>> the logs with the string "health" or "Health" if the health-check were
>>>>>>>>>>>>>>>> active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an
>>>>>>>>>>>>>>>> unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <
>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <
>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this backport; let
>>>>>>>>>>>>>>>>>>> me double check.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll
>>>>>>>>>>>>>>>>>>>> look there :)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to
>>>>>>>>>>>>>>>>>>>>> test it out?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks
>>>>>>>>>>>>>>>>>>>>>> that's in master but not yet released. It will run docker exec with the
>>>>>>>>>>>>>>>>>>>>>> command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image
>>>>>>>>>>>>>>>>>>>>>> tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if
>>>>>>>>>>>>>>>>>>>>>> they ever run the command (in this case `sleep 5`), but have not found any
>>>>>>>>>>>>>>>>>>>>>> indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked
>>>>>>>>>>>>>>>>>>>>>> from src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does
>>>>>>>>>>>>>>>>>>>>>> this mean that health-checks are only supported for custom executors and
>>>>>>>>>>>>>>>>>>>>>> not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the
>>>>>>>>>>>>>>>>>>>>>> 0/non-zero exit-status of a health-check command translate to task health.
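The mapping asked for here (exit status 0 means healthy, anything else counts as a failure) can be sketched as follows. This is an illustration under stated assumptions, not the Mesos implementation: the function names are hypothetical, the command is run with a local `sh -c` rather than inside a container, and only the `timeout_seconds` and `consecutive_failures` names are borrowed from the `health_check` JSON above.

```python
import subprocess

def run_health_check(command: str, timeout_seconds: float = 10.0) -> bool:
    """Run `command` through the shell; exit status 0 means healthy."""
    try:
        result = subprocess.run(["sh", "-c", command], timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return False  # a check that times out counts as a failure
    return result.returncode == 0

def is_task_unhealthy(results, consecutive_failures: int = 3) -> bool:
    """True once `consecutive_failures` checks in a row have failed."""
    streak = 0
    for healthy in results:
        streak = 0 if healthy else streak + 1
        if streak >= consecutive_failures:
            return True
    return False
```

With this shape, `sleep 5` always reports healthy, which is why swapping in `exit 1` is the quicker way to see whether the check runs at all.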
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>> Jay

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
I've rebuilt with your patch and confirmed that all previously failing
health-check configurations now work:

[OK] Using launcher_dir flag
[OK] Using MESOS_LAUNCHER_DIR environment variable
[OK] Not setting the flag or variable, health-checks now launch fine!

Thanks Haosdent et al.!

Best,
Jay

On Thu, Oct 15, 2015 at 10:59 PM, haosdent <ha...@gmail.com> wrote:

> Yes, you can see my patch in
> https://issues.apache.org/jira/browse/MESOS-3738 . I also uploaded patches
> for other versions as attachments. You can download one and use "patch -p1
> < MESOS-3738-xxx.patch" to try again locally.
>
> On Fri, Oct 16, 2015 at 1:51 PM, Jay Taylor <ou...@gmail.com> wrote:
>
>> Hey Haosdent,
>>
>> Thanks for following up!  Glad to hear that others have reproduced the
>> issue and it's not just me.
>>
>> It's too bad that both the launcher dir flag and argv[0] are broken and
>> not caught by unit-tests.  With that being said I completely understand and
>> empathize with the devs about how these kinds of things happen.
>>
>> Thanks again for all your help!
>>
>> Best,
>> Jay
>>
>> Btw health checks still do not work for me even if I set the
>> MESOS_LAUNCHER_DIR env var to /usr/libexec/mesos.  Have you tried using it
>> for health checks with a docker container with the latest HEAD of the
>> master branch?  If so, was the var picked up by the health checker for you?
>>
>> I've resorted to using a hacked build with the path hard coded for the
>> time being.
>>
>>
>>
>> On Oct 15, 2015, at 10:29 PM, haosdent <ha...@gmail.com> wrote:
>>
>> Hi Jay, I have to apologize. When I built the docker image for you, I
>> found the problem with launcher_dir.
>> https://issues.apache.org/jira/browse/MESOS-3738
>>
>> On Tue, Oct 13, 2015 at 10:12 AM, Jay Taylor <ou...@gmail.com> wrote:
>>
>>> Sure, I'm game.
>>>
>>> On Mon, Oct 12, 2015 at 7:11 PM, haosdent <ha...@gmail.com> wrote:
>>>
>>>> I think I could provide you a docker image later to run mesos master
>>>> and agent, so that we could debug this problem and find the cause more
>>>> easily.
>>>> On Oct 13, 2015 6:46 AM, "Jay Taylor" <ou...@gmail.com> wrote:
>>>>
>>>>> Ah ha, I see now that the permissions are fine - just needed to click
>>>>> "Create" instead of the arrow.  Oh JIRA.. :)
>>>>>
>>>>> On Mon, Oct 12, 2015 at 3:26 PM, Jay Taylor <ja...@jaytaylor.com> wrote:
>>>>>
>>>>>> Hi Marco,
>>>>>>
>>>>>> What a relief!
>>>>>>
>>>>>> I'd love to file the JIRA ticket for this, but I don't think my
>>>>>> account has permissions over on
>>>>>> https://issues.apache.org/jira/browse/MESOS.  I am "jaytaylor" over
>>>>>> there.  Please let me know if you can help with that and we can get the
>>>>>> ball rolling on this.
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 12, 2015 at 3:14 PM, Marco Massenzio <marco@mesosphere.io
>>>>>> > wrote:
>>>>>>
>>>>>>> Jay:
>>>>>>>
>>>>>>> you hit the nail on the head: the direction is definitely one-way
>>>>>>> (from MESOS_ENV var to Flag) and we don't reflect --flag back into the
>>>>>>> MESOS_FLAG env var.
>>>>>>> Others more familiar with the matter may correct me, but it looks
>>>>>>> like you have uncovered a bug in the executor code: could you please file a
>>>>>>> Jira for us to look into?
>>>>>>>
>>>>>>> It seems to me that, at present, the only workaround would be for
>>>>>>> you to set the MESOS_LAUNCHER_DIR env var, as the flag won't be picked
>>>>>>> up by the executor.
>>>>>>>
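The one-way direction described here can be illustrated with a small model. This is a sketch only: `load_flags` is a hypothetical helper, not Mesos code, and letting a `--flag` override the matching `MESOS_` variable is an assumption made for the example. The point it shows is that a value supplied as a flag lives in the parent's parsed flags and never appears in the environment a child process inherits.

```python
def load_flags(argv, env, prefix="MESOS_"):
    """Build a flag dict: env vars seed the flags, --flags override them."""
    flags = {}
    for name, value in env.items():
        if name.startswith(prefix):
            flags[name[len(prefix):].lower()] = value
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            flags[name] = value
    return flags

# Agent started with the flag only; its environment has no MESOS_LAUNCHER_DIR.
agent_env = {"PATH": "/usr/bin"}
agent_flags = load_flags(["--launcher_dir=/usr/libexec/mesos"], agent_env)

# A child process inherits the environment, not the parsed flags.
child_env = dict(agent_env)

print(agent_flags["launcher_dir"])        # /usr/libexec/mesos
print("MESOS_LAUNCHER_DIR" in child_env)  # False
```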
>>>>>>>
>>>>>>> --
>>>>>>> *Marco Massenzio*
>>>>>>> Distributed Systems Engineer
>>>>>>> http://codetrips.com
>>>>>>>
>>>>>>> On Mon, Oct 12, 2015 at 11:44 PM, Jay Taylor <ja...@jaytaylor.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Marco,
>>>>>>>>
>>>>>>>> My reply is inline below-
>>>>>>>>
>>>>>>>> On Mon, Oct 12, 2015 at 2:27 PM, Marco Massenzio <
>>>>>>>> marco@mesosphere.io> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Oct 12, 2015 at 11:26 PM, Marco Massenzio <
>>>>>>>>> marco@mesosphere.io> wrote:
>>>>>>>>>
>>>>>>>>>> Are those the stdout logs of the Agent? Because I don't see
>>>>>>>>>> --launcher_dir set; however, if I look into one that is running off the
>>>>>>>>>> same 0.24.1 package, this is what I see:
>>>>>>>>>>
>>>>>>>>>> I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup:
>>>>>>>>>> --appc_store_dir="/tmp/mesos/store/appc"
>>>>>>>>>> --attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5"
>>>>>>>>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>>>>>>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>>>>>>>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>>>>>>>>> --container_disk_watch_interval="15secs" --containerizers="docker,mesos"
>>>>>>>>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>>>>>>>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>>>>>>>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>>>>>>>>> --enforce_container_disk_quota="false"
>>>>>>>>>> --executor_registration_timeout="1mins"
>>>>>>>>>> --executor_shutdown_grace_period="5secs"
>>>>>>>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>>>>>>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>>>>>>>>> --hadoop_home="" --help="false" --initialize_driver_logging="true"
>>>>>>>>>> --ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem"
>>>>>>>>>> --launcher_dir="/usr/libexec/mesos"
>>>>>>>>>> --log_dir="/var/local/mesos/logs/agent" --logbufsecs="0"
>>>>>>>>>> --logging_level="INFO" --master="zk://
>>>>>>>>>> 192.168.33.1:2181/mesos/vagrant"
>>>>>>>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>>>>>>>>> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
>>>>>>>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>>>>>>>>> --registration_backoff_factor="1secs"
>>>>>>>>>> --resource_monitoring_interval="1secs"
>>>>>>>>>> --resources="ports:[9000-10000];ephemeral_ports:[32768-57344]"
>>>>>>>>>> --revocable_cpu_low_priority="true"
>>>>>>>>>> --sandbox_directory="/var/local/sandbox" --strict="true"
>>>>>>>>>> --switch_user="true" --version="false" --work_dir="/var/local/mesos/agent"
>>>>>>>>>>
>>>>>>>>> (this is run off the Vagrantfile at [0] in case you want to
>>>>>>>>>> reproduce).
>>>>>>>>>> That agent is not run via the init command, though, I execute it
>>>>>>>>>> manually via the `run-agent.sh` in the same directory.
>>>>>>>>>>
>>>>>>>>>> I don't really think this matters, but I assume you also
>>>>>>>>>> restarted the agent after making the config changes?
>>>>>>>>>> (and, for your own sanity - you can double check the version by
>>>>>>>>>> looking at the very head of the logs).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>> Yes I definitely restarted all mesos processes after config changes
>>>>>>>> :)
>>>>>>>>
>>>>>>>> Here is info equivalent to what you posted, from the INFO log of one
>>>>>>>> of the slaves:
>>>>>>>>
>>>>>>>> Log file created at: 2015/10/12 20:22:58
>>>>>>>>> Running on machine: mesos-worker2a
>>>>>>>>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>>>>>>>>> I1012 20:22:58.469779  9605 logging.cpp:172] INFO level logging
>>>>>>>>> started!
>>>>>>>>> I1012 20:22:58.470006  9605 main.cpp:185] Build: 2015-09-25
>>>>>>>>> 19:13:24 by root
>>>>>>>>> I1012 20:22:58.470023  9605 main.cpp:187] Version: 0.24.1
>>>>>>>>> I1012 20:22:58.470031  9605 main.cpp:190] Git tag: 0.24.1
>>>>>>>>> I1012 20:22:58.470039  9605 main.cpp:194] Git SHA:
>>>>>>>>> 44873806c2bb55da37e9adbece938274d8cd7c48
>>>>>>>>> I1012 20:22:58.470221  9605 containerizer.cpp:143] Using
>>>>>>>>> isolation: posix/cpu,posix/mem,filesystem/posix
>>>>>>>>> I1012 20:22:58.573750  9605 main.cpp:272] Starting Mesos slave
>>>>>>>>> I1012 20:22:58.574662  9621 slave.cpp:190] Slave started on 1)@
>>>>>>>>> 192.168.225.59:5050
>>>>>>>>> I1012 20:22:58.574695  9621 slave.cpp:191] Flags at startup:
>>>>>>>>> --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5"
>>>>>>>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>>>>>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>>>>>>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>>>>>>>> --container_disk_watch_interval="15secs" --containerizers="mesos,docker"
>>>>>>>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>>>>>>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>>>>>>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>>>>>>>> --enforce_container_disk_quota="false"
>>>>>>>>> --executor_registration_timeout="5mins"
>>>>>>>>> --executor_shutdown_grace_period="5secs"
>>>>>>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>>>>>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>>>>>>>> --hadoop_home="" --help="false"
>>>>>>>>> --hostname="mesos-worker2a-hobart.gigawatt.io"
>>>>>>>>> --initialize_driver_logging="true" --ip="192.168.225.59"
>>>>>>>>> --isolation="posix/cpu,posix/mem"
>>>>>>>>> *--launcher_dir="/usr/libexec/mesos"* --log_dir="/var/log/mesos"
>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>> --master="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>>>>>>>> --perf_interval="1mins" --port="5050" --qos_correction_interval_min="0ns"
>>>>>>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>>>>>>>> --registration_backoff_factor="1secs"
>>>>>>>>> --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true"
>>>>>>>>> --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
>>>>>>>>> --switch_user="true" --version="false" --work_dir="/tmp/mesos"
>>>>>>>>
>>>>>>>>
>>>>>>>> The launcher dir is picked up by the mesos-slave process.  We can
>>>>>>>> also see the cmdline flag is picked up from /etc/mesos-slave like this:
>>>>>>>>
>>>>>>>> mesos-worker2a:~$ ps -ef | grep mesos
>>>>>>>>> root      9605     1  1 20:22 ?        00:01:18 /usr/sbin/mesos-slave --ip=192.168.225.59 --log_dir=/var/log/mesos *--launcher_dir=/usr/libexec/mesos*
>>>>>>>>> root      9612  9605  0 20:22 ?        00:00:00 logger -p user.info -t mesos-slave[9605]
>>>>>>>>> root      9613  9605  0 20:22 ?        00:00:00 logger -p user.err -t mesos-slave[9605]
>>>>>>>>> vagrant   9951  6010  0 21:36 pts/0    00:00:00 grep --color=auto mesos
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> What I keep coming back to is the fact that the MESOS_LAUNCHER_DIR
>>>>>>>> env var does not seem to get picked up here:
>>>>>>>> https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L573-L576
>>>>>>>> :
>>>>>>>>
>>>>>>>>   const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>>>   string path =
>>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>
>>>>>>>>
>>>>>>>> And argv[0] (which contains the slave work dir) is the path we see
>>>>>>>> in the task's stdout.
>>>>>>>>
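For what it's worth, here is a minimal standalone sketch of the fallback quoted above. It is not the actual Mesos code: `dirnameOf` and `resolveLauncherDir` are hypothetical names, and it assumes the directory helper follows POSIX dirname(3) semantics, where a bare program name yields ".". Under that assumption, an argv[0] with no directory component resolves to the process's current working directory whenever the env var is unset. That would be consistent with the sandbox path in the logs, if the executor runs with the sandbox as its cwd.

```cpp
#include <cassert>
#include <limits.h>
#include <stdlib.h>
#include <string>

// Hypothetical helper: directory part of a path, "." when there is no slash
// (assumed to match POSIX dirname(3) for the simple cases used here).
std::string dirnameOf(const std::string& path) {
  const std::string::size_type slash = path.find_last_of('/');
  if (slash == std::string::npos) return ".";
  if (slash == 0) return "/";
  return path.substr(0, slash);
}

// Hypothetical helper mirroring the quoted logic: prefer the environment
// value when present, otherwise canonicalize dirname(argv[0]).
std::string resolveLauncherDir(const char* envValue, const std::string& argv0) {
  assert(!argv0.empty());
  if (envValue != nullptr) return std::string(envValue);

  const std::string dir = dirnameOf(argv0);
  char resolved[PATH_MAX];
  // A relative directory such as "." is resolved against the current
  // working directory of the calling process.
  if (realpath(dir.c_str(), resolved) == nullptr) return dir;
  return std::string(resolved);
}
```

In a real program this would be called as `resolveLauncherDir(std::getenv("MESOS_LAUNCHER_DIR"), argv[0])`. With the env var unset and an argv[0] of just "mesos-docker-executor", the result is whatever directory the process was started in.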
>>>>>>>> I'm still having trouble understanding how flags defined in
>>>>>>>> mesos::internal::slave::Flags::Flags (src/slave/flags.[ch]pp) are
>>>>>>>> propagated or expanded to MESOS_<flag_in_caps> environment variables.  Can
>>>>>>>> you confirm if such a mechanism exists and if so where it is?
>>>>>>>>
>>>>>>>> Otherwise, if my understanding is correct and such a mechanism
>>>>>>>> doesn't exist:
>>>>>>>>
>>>>>>>> How can the requisite MESOS_LAUNCHER_DIR env var be available when
>>>>>>>> docker/executor.cpp (a child process of mesos-slave) attempts to read it?
>>>>>>>>
>>>>>>>> The lack of such a mechanism would explain the behavior I'm
>>>>>>>> currently observing.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Jay
>>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [0] http://github.com/massenz/zk-mesos
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Marco Massenzio*
>>>>>>>>>> Distributed Systems Engineer
>>>>>>>>>> http://codetrips.com
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <outtatime@gmail.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Haosdent and Mesos friends,
>>>>>>>>>>>
>>>>>>>>>>> I've rebuilt the cluster from scratch and installed mesos 0.24.1
>>>>>>>>>>> from the mesosphere apt repo:
>>>>>>>>>>>
>>>>>>>>>>> $ dpkg -l | grep mesos
>>>>>>>>>>> ii  mesos                               0.24.1-0.2.35.ubuntu1404
>>>>>>>>>>>            amd64        Cluster resource manager with efficient resource
>>>>>>>>>>> isolation
>>>>>>>>>>>
>>>>>>>>>>> Then added the `launcher_dir' flag to
>>>>>>>>>>> /etc/mesos-slave/launcher_dir on the slaves:
>>>>>>>>>>>
>>>>>>>>>>> mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir
>>>>>>>>>>> /usr/libexec/mesos
>>>>>>>>>>>
>>>>>>>>>>> And yet the task health-checks are still being launched from the
>>>>>>>>>>> sandbox directory like before!
>>>>>>>>>>>
>>>>>>>>>>> I've also tested setting the MESOS_LAUNCHER_DIR env var and get
>>>>>>>>>>> the identical result (just as before on the cluster where many versions of
>>>>>>>>>>> mesos had been installed):
>>>>>>>>>>>
>>>>>>>>>>> STDOUT:
>>>>>>>>>>>
>>>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>>> Starting task
>>>>>>>>>>>> hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check
>>>>>>>>>>>> --executor=(1)@192.168.225.58:48912
>>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>>>>>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>>>>>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>>>>>>>>>> 127.0.0.1:8000
>>>>>>>>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>>>>>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>>>>> Health check process launched at pid: 11253
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> STDERR:
>>>>>>>>>>>
>>>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>>> Starting task
>>>>>>>>>>>> hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>>>>> *Launching health check process:
>>>>>>>>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check*
>>>>>>>>>>>> --executor=(1)@192.168.225.58:48912
>>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>>>>>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>>>>>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>>>>>>>>>> 127.0.0.1:8000
>>>>>>>>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>>>>>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>>>>> Health check process launched at pid: 11253
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Any ideas on where to go from here?  Is there any additional
>>>>>>>>>>> information I can provide?
>>>>>>>>>>>
>>>>>>>>>>> Thanks as always,
>>>>>>>>>>> Jay
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 8, 2015 at 9:23 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Flags sent to the executor from the containerizer are
>>>>>>>>>>>> stringified and become command-line parameters when the executor is launched.
>>>>>>>>>>>>
>>>>>>>>>>>> You can see this in
>>>>>>>>>>>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
>>>>>>>>>>>>
>>>>>>>>>>>> But for launcher_dir, the executor gets it from `argv[0]`, as
>>>>>>>>>>>> you mentioned above.
>>>>>>>>>>>> ```
>>>>>>>>>>>>   string path =
>>>>>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>>>>>                      :
>>>>>>>>>>>> os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>>>>>
>>>>>>>>>>>> ```
>>>>>>>>>>>> So I want to figure out why your argv[0] resolves to the sandbox
>>>>>>>>>>>> dir rather than "/usr/libexec/mesos".
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <
>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I see.  And then how are the flags sent to the executor?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 8, 2015, at 8:56 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes. The related code is located in
>>>>>>>>>>>>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>>>>>>>>>>>>>
>>>>>>>>>>>>> In fact, environment variables that start with MESOS_ are loaded
>>>>>>>>>>>>> as flag values.
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
>>>>>>>>>>>>>
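As a rough illustration of the prefix convention described here, the sketch below maps environment entries to flag names by stripping the prefix and lowercasing the remainder. It is a simplified stand-in and not the stout implementation; `flagsFromEnvironment` is a hypothetical name. Note that this kind of loading happens in the process that parses the flags. On its own it puts nothing into the environment of a child process.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: collect entries whose name starts with `prefix`
// and turn e.g. MESOS_LAUNCHER_DIR into the flag name "launcher_dir".
std::map<std::string, std::string> flagsFromEnvironment(
    const std::map<std::string, std::string>& environment,
    const std::string& prefix) {
  assert(!prefix.empty());
  std::map<std::string, std::string> flags;
  for (const auto& entry : environment) {
    const std::string& name = entry.first;
    // Skip variables that do not carry the prefix.
    if (name.compare(0, prefix.size(), prefix) != 0) continue;

    std::string flag = name.substr(prefix.size());
    for (char& c : flag) {
      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    flags[flag] = entry.second;
  }
  return flags;
}
```

Given an environment containing MESOS_LAUNCHER_DIR, MESOS_PORT and PATH, only the first two would come back, keyed as "launcher_dir" and "port".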
>>>>>>>>>>>>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <
>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> One question for you haosdent-
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You mentioned that the flags.launcher_dir should propagate to
>>>>>>>>>>>>>> the docker executor all the way up the chain.  Can you show me where this
>>>>>>>>>>>>>> logic is in the codebase?  I didn't see where that was happening and would
>>>>>>>>>>>>>> like to understand the mechanism.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe tomorrow I will build a fresh cluster from scratch to
>>>>>>>>>>>>>> see if the broken behavior experienced today still persists.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting
>>>>>>>>>>>>>> flags.launcher_dir, and the slave then looks for mesos-docker-executor
>>>>>>>>>>>>>> and mesos-health-check under that dir. Although the env var itself is not
>>>>>>>>>>>>>> propagated, MESOS_LAUNCHER_DIR still works
>>>>>>>>>>>>>> because flags.launcher_dir is derived from it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For example, I ran
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> before starting mesos-slave. When I then launched the slave, I could
>>>>>>>>>>>>>> find this line in the slave log:
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>>>>>>>>>>>>>> xxxxx  --launcher_dir="/tmp"
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And from your log, I'm not sure why your MESOS_LAUNCHER_DIR
>>>>>>>>>>>>>> becomes the sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in your
>>>>>>>>>>>>>> other scripts?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <
>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir
>>>>>>>>>>>>>>> before.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I just tried setting both the env var and flag on the
>>>>>>>>>>>>>>> slaves, and have determined that the env var is not present when it is
>>>>>>>>>>>>>>> being checked in src/docker/executor.cpp at line 573:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  const Option<string> envPath =
>>>>>>>>>>>>>>>> os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>>>>>>>>>>   string path =
>>>>>>>>>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>>>>>>>>>                      :
>>>>>>>>>>>>>>>> os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" <<
>>>>>>>>>>>>>>>> (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" <<
>>>>>>>>>>>>>>>> endl;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is
>>>>>>>>>>>>>>> correctly propagated up to the point of mesos-slave launch):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> $ cat /etc/default/mesos-slave
>>>>>>>>>>>>>>>> export
>>>>>>>>>>>>>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>>>>>>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>>>>>>>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>>>>>>>>>>>> export MESOS_PORT="5050"
>>>>>>>>>>>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> TASK OUTPUT:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *MESOS_LAUNCHER_DIR: envpath.isSome()->no*
>>>>>>>>>>>>>>>> *MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'*
>>>>>>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>> hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>>>>>>>>>>>>>> --executor=(1)@192.168.225.59:44523
>>>>>>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>>>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>>>>>>>>>>>>>> sh -c \" \/bin\/bash
>>>>>>>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>>>>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>>>>>>>> Health check process launched at pid: 2519
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The env var is not propagated when the docker executor is
>>>>>>>>>>>>>>> launched in src/slave/containerizer/docker.cpp around line 903:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   vector<string> argv;
>>>>>>>>>>>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>>>>>>>>>>>   // Construct the mesos-docker-executor using the "name"
>>>>>>>>>>>>>>>> we gave the
>>>>>>>>>>>>>>>>   // container (to distinguish it from Docker containers
>>>>>>>>>>>>>>>> not created
>>>>>>>>>>>>>>>>   // by Mesos).
>>>>>>>>>>>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>>>>>>>>>>>       path::join(flags.launcher_dir,
>>>>>>>>>>>>>>>> "mesos-docker-executor"),
>>>>>>>>>>>>>>>>       argv,
>>>>>>>>>>>>>>>>       Subprocess::PIPE(),
>>>>>>>>>>>>>>>>       Subprocess::PATH(path::join(container->directory,
>>>>>>>>>>>>>>>> "stdout")),
>>>>>>>>>>>>>>>>       Subprocess::PATH(path::join(container->directory,
>>>>>>>>>>>>>>>> "stderr")),
>>>>>>>>>>>>>>>>       dockerFlags(flags, container->name(),
>>>>>>>>>>>>>>>> container->directory),
>>>>>>>>>>>>>>>>       environment,
>>>>>>>>>>>>>>>>       lambda::bind(&setup, container->directory));
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A little way above, we can see the environment is set up with
>>>>>>>>>>>>>>> the env vars defined by the container's task.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>>>>>>>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  container->executor.command().environment().variables()) {
>>>>>>>>>>>>>>>>     environment[variable.name()] = variable.value();
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>
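To make the gap concrete, here is a small sketch of what passing the launcher directory through explicitly could look like. This is purely illustrative and is not taken from the Mesos codebase. `Environment` and `buildExecutorEnvironment` are hypothetical names, and letting the slave-side value override a task-declared one is an assumption made for the example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical alias for the name -> value map handed to the child process.
typedef std::map<std::string, std::string> Environment;

// Hypothetical helper: start from the variables declared for the executor
// and add MESOS_LAUNCHER_DIR so the child can read it from its environment.
Environment buildExecutorEnvironment(
    const Environment& executorVariables,
    const std::string& launcherDir) {
  assert(!launcherDir.empty());
  Environment environment = executorVariables;
  // Assumption: the slave's configured directory takes precedence over any
  // value with the same name that a task happened to declare.
  environment["MESOS_LAUNCHER_DIR"] = launcherDir;
  return environment;
}
```

With only the task-declared variables in the map, a lookup of MESOS_LAUNCHER_DIR in the child finds nothing, which matches the "no" printed by the debug output in this thread.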
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <haosdent@gmail.com
>>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>>>>>>>>>>>> 0.24.1 should work.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >Do any of you know which host the path
>>>>>>>>>>>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>>>>>>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>>>>>>>>>>>>>>> failing.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly
>>>>>>>>>>>>>>>> before? We get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or
>>>>>>>>>>>>>>>> else from the same dir as mesos-docker-executor.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <
>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Maybe I spoke too soon.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Now the checks are attempting to run, however the STDERR
>>>>>>>>>>>>>>>>> is not looking good.  I've added some debugging to the error message output
>>>>>>>>>>>>>>>>> to show the path, argv, and envp variables:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> STDOUT:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>>>>>>>>>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>>>>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>>>>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>>>>>>>>>>>>>> sh -c \" exit 1
>>>>>>>>>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>>>>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>>>>>>>> Health check process launched at pid: 3012
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> STDERR:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>>>>>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor
>>>>>>>>>>>>>>>>>> registered on slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit
>>>>>>>>>>>>>>>>>> capabilities, memory limited without swap.
>>>>>>>>>>>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in
>>>>>>>>>>>>>>>>>> childMain
>>>>>>>>>>>>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>>>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>>>>>>>>> envp=''): No such file or directory
>>>>>>>>>>>>>>>>>> *** Aborted at 1444270649 (unix time) try "date -d @1444270649" if you are using GNU date ***
>>>>>>>>>>>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID
>>>>>>>>>>>>>>>>>> 0x7f4a2f9f6700) from PID 3012; stack trace: ***
>>>>>>>>>>>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>>>>>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>>>>>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>>>>>>>>>>>> @ 0x41921c _Abort()
>>>>>>>>>>>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>>>>>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>>>>>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>>>>>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>>>>>>>>>>>> @ 0x43cc9c
>>>>>>>>>>>>>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>>>>>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>>>>>>>>>>>> @ 0x7f4a39d92827
>>>>>>>>>>>>>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>>>>>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>>>>>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>>>>>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>>>>>>>>> should exist on? It definitely doesn't exist on the
>>>>>>>>>>>>>>>>> slave, hence execution failing.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is with current master, git hash
>>>>>>>>>>>>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>>>>>>>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>>>>>>>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <
>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Update:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to
>>>>>>>>>>>>>>>>>> compile and package the latest master (0.26.x) and deployed it to the
>>>>>>>>>>>>>>>>>> cluster, and now health checks are working as advertised in both Marathon
>>>>>>>>>>>>>>>>>> and my own framework!  Not sure what was going on with health-checks in
>>>>>>>>>>>>>>>>>> 0.24.0..
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Haosdent,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Can you share your Marathon POST request that results in
>>>>>>>>>>>>>>>>>>> Mesos executing the health checks?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Since we can reference the Marathon framework, I've been
>>>>>>>>>>>>>>>>>>> doing some digging around.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's
>>>>>>>>>>>>>>>>>>> dependencies
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as
>>>>>>>>>>>>>>>>>>> JSON to /tmp/X both in the TaskFactory and right before the task is
>>>>>>>>>>>>>>>>>>> sent to Mesos via driver.launchTasks:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>>>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>>>>>>>>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>>>>>>>>>>>> +        import
>>>>>>>>>>>>>>>>>>>> com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>>>>>>>>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>>>>>>>>          CreatedTask(
>>>>>>>>>>>>>>>>>>>>            taskInfo,
>>>>>>>>>>>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>>>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class
>>>>>>>>>>>>>>>>>>>> TaskLauncherImpl(
>>>>>>>>>>>>>>>>>>>>    override def launchTasks(offerID: OfferID,
>>>>>>>>>>>>>>>>>>>> taskInfos: Seq[TaskInfo]): Boolean = {
>>>>>>>>>>>>>>>>>>>>      val launched =
>>>>>>>>>>>>>>>>>>>> withDriver(s"launchTasks($offerID)") { driver =>
>>>>>>>>>>>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>>>>>>>>>>>> +      var i = 0
>>>>>>>>>>>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>>>>>>>>>>>> +        import
>>>>>>>>>>>>>>>>>>>> com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" +
>>>>>>>>>>>>>>>>>>>> i.toString() + "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new
>>>>>>>>>>>>>>>>>>>> FileWriter(file))
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>  bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>  driver.launchTasks(Collections.singleton(offerID), taskInfos.asJava)
>>>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Then I built and deployed the hacked Marathon and
>>>>>>>>>>>>>>>>>>> restarted the marathon service.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Next I created the app via the Marathon API ("hello app"
>>>>>>>>>>>>>>>>>>> is a container with a simple hello-world ruby app running on
>>>>>>>>>>>>>>>>>>> 0.0.0.0:8000)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST
>>>>>>>>>>>>>>>>>>>> -H'Content-Type: application/json' -d'
>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>>>>>>>>>>>   "apps": [
>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>>>>>>>>>>>       "container": {
>>>>>>>>>>>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>>>>>>>>>>>         "docker": {
>>>>>>>>>>>>>>>>>>>>           "image":
>>>>>>>>>>>>>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>>>>>>>>>>>           "portMappings": [
>>>>>>>>>>>>>>>>>>>>             {
>>>>>>>>>>>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>>>>>>>           ]
>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>>>       "env": {
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>       "instances": 1,
>>>>>>>>>>>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>>>>>>>>>>>       "mem": 512
>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>   ]
>>>>>>>>>>>>>>>>>>>> }'
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> $ ls /tmp/
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Do they match?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>>>>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>>>>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, so I am confident this is the information being
>>>>>>>>>>>>>>>>>>> sent across the wire to Mesos.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Do they contain any health-check information?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> $ cat
>>>>>>>>>>>>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>         "value":1.0
>>>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>         "value":512.0
>>>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>             "begin":31641,
>>>>>>>>>>>>>>>>>>>>             "end":31641
>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>     "environment":{
>>>>>>>>>>>>>>>>>>>>       "variables":[
>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>       ]
>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> No, I don't see anything about any health check.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> And STDERR:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>>>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor
>>>>>>>>>>>>>>>>>>>> registered on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit
>>>>>>>>>>>>>>>>>>>> capabilities, memory limited without swap.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Any ideas of other things to try or what I could be
>>>>>>>>>>>>>>>>>>> missing?  Can't say either way about the Mesos health-check system working
>>>>>>>>>>>>>>>>>>> or not if Marathon won't put the health-check into the task it sends to
>>>>>>>>>>>>>>>>>>> Mesos.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for all your help!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <
>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Maybe you could post your executor stdout/stderr so
>>>>>>>>>>>>>>>>>>>> that we can tell whether the health check is running or not.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <
>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Marathon also uses the Mesos health check. When I use a
>>>>>>>>>>>>>>>>>>>>> health check, I see log output like this in the executor stdout.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>>>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I am using my own framework, and the full task info
>>>>>>>>>>>>>>>>>>>>>> I'm using is posted earlier in this thread.  Do you happen to know if
>>>>>>>>>>>>>>>>>>>>>> Marathon uses Mesos's health checks for its health check system?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <
>>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Yes, the health check is launched through its definition in
>>>>>>>>>>>>>>>>>>>>>> TaskInfo. Do you launch your task through Marathon? I could test it on my
>>>>>>>>>>>>>>>>>>>>>> side.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Precisely, and there are none of those statements.
>>>>>>>>>>>>>>>>>>>>>>> Are you or others confident health-checks are part of the code path when
>>>>>>>>>>>>>>>>>>>>>>> defined via task info for docker container tasks?  Going through the code,
>>>>>>>>>>>>>>>>>>>>>>> I wasn't able to find the linkage for anything other than health-checks
>>>>>>>>>>>>>>>>>>>>>>> triggered through a custom executor.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> With that being said it is a pretty good sized code
>>>>>>>>>>>>>>>>>>>>>>> base and I'm not very familiar with it, so my analysis this far has by no
>>>>>>>>>>>>>>>>>>>>>>> means been exhaustive.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <
>>>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> When the health check launches, there is a log line like
>>>>>>>>>>>>>>>>>>>>>>> this in your executor stdout:
>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be
>>>>>>>>>>>>>>>>>>>>>>>> output in the logs with the string "health" or "Health" if the health-check
>>>>>>>>>>>>>>>>>>>>>>>> were active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <
>>>>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see
>>>>>>>>>>>>>>>>>>>>>>>> whether an unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <
>>>>>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and
>>>>>>>>>>>>>>>>>>>>>>>>>> 0.24.1:
>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <
>>>>>>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this
>>>>>>>>>>>>>>>>>>>>>>>>>>> backport, let me double check.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in
>>>>>>>>>>>>>>>>>>>>>>>>>>>> master.  I'll look there :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> checkout to test it out?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tasks that's in master but not yet released. It will run docker exec with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> image tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Mesos:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to see if they ever run the command (in this case `sleep 5`), but have not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> found any indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> invoked from src/launcher/executor.cpp CommandExecutorProcess::launchTask.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does this mean that health-checks are only supported for custom executors
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0/non-zero exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>>
>
>
> --
> Best Regards,
> Haosdent Huang
>

Re: Can health-checks be run by Mesos for docker tasks?

Posted by haosdent <ha...@gmail.com>.
Yes, you can see my patch at
https://issues.apache.org/jira/browse/MESOS-3738 . I also uploaded patches for
other versions as attachments. You can download the one you need and apply it
locally with "patch -p1 < MESOS-3738-xxx.patch", then try again.
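
For anyone following along, here is a rough standalone sketch of the
launcher-dir lookup discussed in the quoted messages below. It is not the
Mesos code: resolveLauncherDir and dirnameOf are hypothetical names made up
for illustration, and unlike the real snippet it does not call os::realpath.
It only shows the order of precedence: the MESOS_LAUNCHER_DIR env var first,
then the directory of argv[0].

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of Mesos): directory portion of a path.
// Unlike os::realpath in the real code, this does not resolve symlinks.
std::string dirnameOf(const std::string& path) {
  const std::size_t slash = path.find_last_of('/');
  if (slash == std::string::npos) return ".";  // bare file name
  if (slash == 0) return "/";                  // file directly under root
  return path.substr(0, slash);
}

// Hypothetical helper (not part of Mesos): where to look for helper
// binaries such as mesos-health-check.
std::string resolveLauncherDir(const std::string& argv0) {
  // 1. An explicitly exported MESOS_LAUNCHER_DIR wins.
  if (const char* env = std::getenv("MESOS_LAUNCHER_DIR")) {
    return env;
  }
  // 2. Otherwise fall back to the directory of the running binary. The
  //    agent's --launcher_dir flag never enters this lookup.
  return dirnameOf(argv0);
}
```

With the env var unset, an argv[0] that lives under the slave work dir yields
a directory under the work dir, which matches the path Jay saw in the task
stdout.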

On Fri, Oct 16, 2015 at 1:51 PM, Jay Taylor <ou...@gmail.com> wrote:

> Hey Haosdent,
>
> Thanks for following up!  Glad to hear that others have reproduced the
> issue and it's not just me.
>
> It's too bad that both the launcher dir flag and argv[0] are broken and
> not caught by unit-tests.  With that being said I completely understand and
> empathize with the devs about how these kinds of things happen.
>
> Thanks again for all your help!
>
> Best,
> Jay
>
> Btw health checks still do not work for me even if I set the
> MESOS_LAUNCHER_DIR env var to /usr/libexec/mesos.  Have you tried using it
> for health checks with a docker container with the latest HEAD of the
> master branch?  If so, was the var picked up by the health checker for you?
>
> I've resorted to using a hacked build with the path hard coded for the
> time being.
>
>
>
> On Oct 15, 2015, at 10:29 PM, haosdent <ha...@gmail.com> wrote:
>
> Hi Jay, I owe you an apology. While building the docker image for you, I
> found a problem with launcher_dir:
> https://issues.apache.org/jira/browse/MESOS-3738
>
> On Tue, Oct 13, 2015 at 10:12 AM, Jay Taylor <ou...@gmail.com> wrote:
>
>> Sure, I'm game.
>>
>> On Mon, Oct 12, 2015 at 7:11 PM, haosdent <ha...@gmail.com> wrote:
>>
>>> I think I could provide you a docker image later to run mesos master and
>>> agent, so that we could debug this problem and find the cause more easily.
>>> On Oct 13, 2015 6:46 AM, "Jay Taylor" <ou...@gmail.com> wrote:
>>>
>>>> Ah ha, I see now that the permissions are fine - just needed to click
>>>> "Create" instead of the arrow.  Oh JIRA.. :)
>>>>
>>>> On Mon, Oct 12, 2015 at 3:26 PM, Jay Taylor <ja...@jaytaylor.com> wrote:
>>>>
>>>>> Hi Marco,
>>>>>
>>>>> What a relief!
>>>>>
>>>>> I'd love to file the JIRA ticket for this, but I don't think my
>>>>> account has permissions over on
>>>>> https://issues.apache.org/jira/browse/MESOS.  I am "jaytaylor" over
>>>>> there.  Please let me know if you can help with that and we can get the
>>>>> ball rolling on this.
>>>>>
>>>>>
>>>>> On Mon, Oct 12, 2015 at 3:14 PM, Marco Massenzio <ma...@mesosphere.io>
>>>>> wrote:
>>>>>
>>>>>> Jay:
>>>>>>
>>>>>> you hit the nail on the head: the direction is definitely one-way
>>>>>> (from MESOS_ENV var to Flag) and we don't reflect --flag back into the
>>>>>> MESOS_FLAG env var.
>>>>>> Others more familiar with the matter may correct me, but it looks
>>>>>> like you have uncovered a bug in the executor code: could you please file a
>>>>>> Jira for us to look into?
>>>>>>
>>>>>> It seems to me that, at present, the only workaround would be for you
>>>>>> to set the MESOS_LAUNCHER_DIR env var, as the flag won't be picked up by
>>>>>> the executor.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Marco Massenzio*
>>>>>> Distributed Systems Engineer
>>>>>> http://codetrips.com
>>>>>>
>>>>>> On Mon, Oct 12, 2015 at 11:44 PM, Jay Taylor <ja...@jaytaylor.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Marco,
>>>>>>>
>>>>>>> My reply is inline below-
>>>>>>>
>>>>>>> On Mon, Oct 12, 2015 at 2:27 PM, Marco Massenzio <
>>>>>>> marco@mesosphere.io> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 12, 2015 at 11:26 PM, Marco Massenzio <
>>>>>>>> marco@mesosphere.io> wrote:
>>>>>>>>
>>>>>>>>> Are those the stdout logs of the Agent? Because I don't see the
>>>>>>>>> --launcher-dir set, however, if I look into one that is running off the
>>>>>>>>> same 0.24.1 package, this is what I see:
>>>>>>>>>
>>>>>>>>> I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup:
>>>>>>>>> --appc_store_dir="/tmp/mesos/store/appc"
>>>>>>>>> --attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5"
>>>>>>>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>>>>>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>>>>>>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>>>>>>>> --container_disk_watch_interval="15secs" --containerizers="docker,mesos"
>>>>>>>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>>>>>>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>>>>>>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>>>>>>>> --enforce_container_disk_quota="false"
>>>>>>>>> --executor_registration_timeout="1mins"
>>>>>>>>> --executor_shutdown_grace_period="5secs"
>>>>>>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>>>>>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>>>>>>>> --hadoop_home="" --help="false" --initialize_driver_logging="true"
>>>>>>>>> --ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem"
>>>>>>>>> --launcher_dir="/usr/libexec/mesos"
>>>>>>>>> --log_dir="/var/local/mesos/logs/agent" --logbufsecs="0"
>>>>>>>>> --logging_level="INFO" --master="zk://
>>>>>>>>> 192.168.33.1:2181/mesos/vagrant"
>>>>>>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>>>>>>>> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
>>>>>>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>>>>>>>> --registration_backoff_factor="1secs"
>>>>>>>>> --resource_monitoring_interval="1secs"
>>>>>>>>> --resources="ports:[9000-10000];ephemeral_ports:[32768-57344]"
>>>>>>>>> --revocable_cpu_low_priority="true"
>>>>>>>>> --sandbox_directory="/var/local/sandbox" --strict="true"
>>>>>>>>> --switch_user="true" --version="false" --work_dir="/var/local/mesos/agent"
>>>>>>>>>
>>>>>>>> (this is run off the Vagrantfile at [0] in case you want to
>>>>>>>>> reproduce).
>>>>>>>>> That agent is not run via the init command, though, I execute it
>>>>>>>>> manually via the `run-agent.sh` in the same directory.
>>>>>>>>>
>>>>>>>>> I don't really think this matters, but I assume you also restarted
>>>>>>>>> the agent after making the config changes?
>>>>>>>>> (and, for your own sanity - you can double check the version by
>>>>>>>>> looking at the very head of the logs).
>>>>>>>>>
>>>>>>>>
>>>>>>> Yes I definitely restarted all mesos processes after config changes
>>>>>>> :)
>>>>>>>
>>>>>>> Here is info equivalent to what you posted, from one of the slaves'
>>>>>>> INFO logs:
>>>>>>>
>>>>>>> Log file created at: 2015/10/12 20:22:58
>>>>>>>> Running on machine: mesos-worker2a
>>>>>>>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>>>>>>>> I1012 20:22:58.469779  9605 logging.cpp:172] INFO level logging
>>>>>>>> started!
>>>>>>>> I1012 20:22:58.470006  9605 main.cpp:185] Build: 2015-09-25
>>>>>>>> 19:13:24 by root
>>>>>>>> I1012 20:22:58.470023  9605 main.cpp:187] Version: 0.24.1
>>>>>>>> I1012 20:22:58.470031  9605 main.cpp:190] Git tag: 0.24.1
>>>>>>>> I1012 20:22:58.470039  9605 main.cpp:194] Git SHA:
>>>>>>>> 44873806c2bb55da37e9adbece938274d8cd7c48
>>>>>>>> I1012 20:22:58.470221  9605 containerizer.cpp:143] Using isolation:
>>>>>>>> posix/cpu,posix/mem,filesystem/posix
>>>>>>>> I1012 20:22:58.573750  9605 main.cpp:272] Starting Mesos slave
>>>>>>>> I1012 20:22:58.574662  9621 slave.cpp:190] Slave started on 1)@
>>>>>>>> 192.168.225.59:5050
>>>>>>>> I1012 20:22:58.574695  9621 slave.cpp:191] Flags at startup:
>>>>>>>> --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5"
>>>>>>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>>>>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>>>>>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>>>>>>> --container_disk_watch_interval="15secs" --containerizers="mesos,docker"
>>>>>>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>>>>>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>>>>>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>>>>>>> --enforce_container_disk_quota="false"
>>>>>>>> --executor_registration_timeout="5mins"
>>>>>>>> --executor_shutdown_grace_period="5secs"
>>>>>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>>>>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>>>>>>> --hadoop_home="" --help="false" --hostname="
>>>>>>>> mesos-worker2a-hobart.gigawatt.io"
>>>>>>>> --initialize_driver_logging="true" --ip="192.168.225.59"
>>>>>>>> --isolation="posix/cpu,posix/mem" --
>>>>>>>> *launcher_dir="/usr/libexec/mesos"* --log_dir="/var/log/mesos"
>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>> --master="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>>>>>>> --perf_interval="1mins" --port="5050" --qos_correction_interval_min="0ns"
>>>>>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>>>>>>> --registration_backoff_factor="1secs"
>>>>>>>> --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true"
>>>>>>>> --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
>>>>>>>> --switch_user="true" --version="false" --work_dir="/tmp/mesos"
>>>>>>>
>>>>>>>
>>>>>>> The launcher dir is picked up by the mesos-slave process.  We can
>>>>>>> also see the cmdline flag is picked up from /etc/mesos-slave like this:
>>>>>>>
>>>>>>> mesos-worker2a:~$ ps -ef | grep mesos
>>>>>>>> root      9605     1  1 20:22 ?        00:01:18
>>>>>>>> /usr/sbin/mesos-slave --ip=192.168.225.59 --log_dir=/var/log/mesos --
>>>>>>>> *launcher_dir=/usr/libexec/mesos*
>>>>>>>> root      9612  9605  0 20:22 ?        00:00:00 logger -p user.info
>>>>>>>> -t mesos-slave[9605]
>>>>>>>> root      9613  9605  0 20:22 ?        00:00:00 logger -p user.err
>>>>>>>> -t mesos-slave[9605]
>>>>>>>> vagrant   9951  6010  0 21:36 pts/0    00:00:00 grep --color=auto
>>>>>>>> mesos
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> What I keep coming back to is the fact that the MESOS_LAUNCHER_DIR
>>>>>>> env var does not seem to get picked up here:
>>>>>>> https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L573-L576
>>>>>>> :
>>>>>>>
>>>>>>>   const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>>   string path =
>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>
>>>>>>>
>>>>>>> And argv[0] (which contains the slave work dir) is the path we see
>>>>>>> in the task's stdout.
>>>>>>>
>>>>>>> I'm still having trouble understanding how flags defined in
>>>>>>> mesos::internal::slave::Flags::Flags (src/slave/flags.[ch]pp) are
>>>>>>> propagated or expanded to MESOS_<flag_in_caps> environment variables.  Can
>>>>>>> you confirm if such a mechanism exists and if so where it is?
>>>>>>>
>>>>>>> Otherwise, if my understanding is correct and such a mechanism
>>>>>>> doesn't exist:
>>>>>>>
>>>>>>> How can the requisite MESOS_LAUNCHER_DIR env var be available when
>>>>>>> docker/executor.cpp (a child process of mesos-slave) attempts to read it?
>>>>>>>
>>>>>>> The lack of such a mechanism would explain the behavior I'm
>>>>>>> currently observing.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Jay
>>>>>>>
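The fallback discussed above can be sketched in isolation. This is a minimal illustration, not the Mesos implementation: `dirnameOf` and `resolveLauncherDir` are hypothetical names, and the sketch skips the `os::realpath` call the original makes. It shows one thing worth checking: when argv[0] is a bare program name, its dirname is ".", which realpath would then resolve against the process's working directory.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of Mesos): returns the directory part of a
// path, "." when the path contains no slash, and "/" for entries in the root.
std::string dirnameOf(const std::string& path) {
  const std::string::size_type slash = path.find_last_of('/');
  if (slash == std::string::npos) return ".";
  if (slash == 0) return "/";
  return path.substr(0, slash);
}

// Mirrors the quoted fallback: prefer MESOS_LAUNCHER_DIR when it is set,
// otherwise use the directory of argv[0]. Unlike the original, this sketch
// does not canonicalize the result with realpath().
std::string resolveLauncherDir(const char* argv0) {
  const char* env = std::getenv("MESOS_LAUNCHER_DIR");
  return env != nullptr ? std::string(env) : dirnameOf(argv0);
}
```

With the env var unset, `resolveLauncherDir("mesos-docker-executor")` yields ".", so the result depends entirely on where the executor process happens to be running.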
>>>>>>>
>>>>>>>>>
>>>>>>>>> [0] http://github.com/massenz/zk-mesos
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Marco Massenzio*
>>>>>>>>> Distributed Systems Engineer
>>>>>>>>> http://codetrips.com
>>>>>>>>>
>>>>>>>>> On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Haosdent and Mesos friends,
>>>>>>>>>>
>>>>>>>>>> I've rebuilt the cluster from scratch and installed mesos 0.24.1
>>>>>>>>>> from the mesosphere apt repo:
>>>>>>>>>>
>>>>>>>>>> $ dpkg -l | grep mesos
>>>>>>>>>> ii  mesos                               0.24.1-0.2.35.ubuntu1404
>>>>>>>>>>            amd64        Cluster resource manager with efficient resource
>>>>>>>>>> isolation
>>>>>>>>>>
>>>>>>>>>> Then added the `launcher_dir' flag to
>>>>>>>>>> /etc/mesos-slave/launcher_dir on the slaves:
>>>>>>>>>>
>>>>>>>>>> mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir
>>>>>>>>>> /usr/libexec/mesos
>>>>>>>>>>
>>>>>>>>>> And yet the task health-checks are still being launched from the
>>>>>>>>>> sandbox directory like before!
>>>>>>>>>>
>>>>>>>>>> I've also tested setting the MESOS_LAUNCHER_DIR env var and get
>>>>>>>>>> the identical result (just as before on the cluster where many versions of
>>>>>>>>>> mesos had been installed):
>>>>>>>>>>
>>>>>>>>>> STDOUT:
>>>>>>>>>>
>>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>> Starting task
>>>>>>>>>>> hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>>>> Launching health check process:
>>>>>>>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check
>>>>>>>>>>> --executor=(1)@192.168.225.58:48912
>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>>>>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>>>>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>>>>>>>>> 127.0.0.1:8000
>>>>>>>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>>>>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>>>> Health check process launched at pid: 11253
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> STDERR:
>>>>>>>>>>
>>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>> Starting task
>>>>>>>>>>> hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>>>> *Launching health check process:
>>>>>>>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check*
>>>>>>>>>>> --executor=(1)@192.168.225.58:48912
>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>>>>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>>>>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>>>>>>>>> 127.0.0.1:8000
>>>>>>>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>>>>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>>>> Health check process launched at pid: 11253
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Any ideas on where to go from here?  Is there any additional
>>>>>>>>>> information I can provide?
>>>>>>>>>>
>>>>>>>>>> Thanks as always,
>>>>>>>>>> Jay
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 8, 2015 at 9:23 PM, haosdent <ha...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Flags sent from the containerizer to the executor are
>>>>>>>>>>> stringified and passed as command-line parameters when the
>>>>>>>>>>> executor is launched.
>>>>>>>>>>>
>>>>>>>>>>> You could see this in
>>>>>>>>>>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
>>>>>>>>>>>
>>>>>>>>>>> But for launcher_dir, the executor gets it from `argv[0]`, as you
>>>>>>>>>>> mentioned above.
>>>>>>>>>>> ```
>>>>>>>>>>>   string path =
>>>>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>>>>                      :
>>>>>>>>>>> os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>>>>
>>>>>>>>>>> ```
>>>>>>>>>>> So I want to figure out why your argv[0] would become the sandbox
>>>>>>>>>>> dir, not "/usr/libexec/mesos".
>>>>>>>>>>>
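To make the "stringify" step concrete, here is a small sketch of turning a set of flag values into `--name="value"` arguments, the shape visible in the executor output pasted in this thread. `stringifyFlags` is a hypothetical name chosen for illustration, and the map-based representation is an assumption; it is not how Mesos stores its flags.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: renders each name/value pair as --name="value".
// std::map iterates in key order, so the output is sorted by flag name.
std::vector<std::string> stringifyFlags(
    const std::map<std::string, std::string>& flags) {
  std::vector<std::string> args;
  for (const auto& flag : flags) {
    args.push_back("--" + flag.first + "=\"" + flag.second + "\"");
  }
  return args;
}
```

Note that only values placed in the map end up on the command line, so a setting that is never added there has to reach the child process some other way.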
>>>>>>>>>>> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <outtatime@gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I see.  And then how are the flags sent to the executor?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Oct 8, 2015, at 8:56 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Yes. The related code is located in
>>>>>>>>>>>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>>>>>>>>>>>>
>>>>>>>>>>>> In fact, environment variables that start with MESOS_ are loaded
>>>>>>>>>>>> as flag values.
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
>>>>>>>>>>>>
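As a rough illustration of prefix-based loading, the sketch below maps every variable whose name starts with a given prefix to a lower-cased flag name. `flagsFromEnvironment` is a hypothetical helper, and taking the environment as a map (rather than reading the real process environment) is an assumption made so the example stays self-contained; see the linked flags.hpp for what stout actually does.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: for each variable named <prefix><NAME>, strips the
// prefix, lower-cases the remainder, and records it as a flag value.
// Variables without the prefix (or with nothing after it) are ignored.
std::map<std::string, std::string> flagsFromEnvironment(
    const std::map<std::string, std::string>& environment,
    const std::string& prefix) {
  std::map<std::string, std::string> flags;
  for (const auto& variable : environment) {
    const std::string& name = variable.first;
    if (name.size() <= prefix.size()) continue;
    if (name.compare(0, prefix.size(), prefix) != 0) continue;
    std::string flag = name.substr(prefix.size());
    for (char& c : flag) {
      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    flags[flag] = variable.second;
  }
  return flags;
}
```

Under this scheme MESOS_LAUNCHER_DIR=/tmp becomes the flag launcher_dir with value /tmp in the process that reads it; nothing here hands the variable on to child processes.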
>>>>>>>>>>>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <
>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> One question for you haosdent-
>>>>>>>>>>>>>
>>>>>>>>>>>>> You mentioned that the flags.launcher_dir should propagate to
>>>>>>>>>>>>> the docker executor all the way up the chain.  Can you show me where this
>>>>>>>>>>>>> logic is in the codebase?  I didn't see where that was happening and would
>>>>>>>>>>>>> like to understand the mechanism.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe tomorrow I will build a fresh cluster from scratch to
>>>>>>>>>>>>> see if the broken behavior experienced today still persists.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting
>>>>>>>>>>>>> flags.launcher_dir, and mesos-docker-executor and
>>>>>>>>>>>>> mesos-health-check are then looked up under that dir. Although the
>>>>>>>>>>>>> env var itself is not propagated, MESOS_LAUNCHER_DIR still works
>>>>>>>>>>>>> because flags.launcher_dir is derived from it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For example, I ran
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> before starting mesos-slave. When I then launched the slave, I
>>>>>>>>>>>>> found this line in the slave log:
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>>>>>>>>>>>>> xxxxx  --launcher_dir="/tmp"
>>>>>>>>>>>>> ```
>>>>>>>>>>>>>
>>>>>>>>>>>>> And from your log, I'm not sure why your MESOS_LAUNCHER_DIR
>>>>>>>>>>>>> became the sandbox dir. Is it because MESOS_LAUNCHER_DIR is
>>>>>>>>>>>>> overridden in your other scripts?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <
>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir
>>>>>>>>>>>>>> before.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I just tried setting both the env var and flag on the slaves,
>>>>>>>>>>>>>> and have determined that the env var is not present when it is
>>>>>>>>>>>>>> checked in src/docker/executor.cpp @ line 573:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  const Option<string> envPath =
>>>>>>>>>>>>>>> os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>>>>>>>>>   string path =
>>>>>>>>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>>>>>>>>                      :
>>>>>>>>>>>>>>> os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" <<
>>>>>>>>>>>>>>> (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" <<
>>>>>>>>>>>>>>> endl;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is
>>>>>>>>>>>>>> correctly propagated along up to the point of mesos-slave launch):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ cat /etc/default/mesos-slave
>>>>>>>>>>>>>>> export
>>>>>>>>>>>>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>>>>>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>>>>>>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>>>>>>>>>>> export MESOS_PORT="5050"
>>>>>>>>>>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> TASK OUTPUT:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *MESOS_LAUNCHER_DIR: envpath.isSome()->no*
>>>>>>>>>>>>>>> *MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'*
>>>>>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>> hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>>>>>>>>>>>>> --executor=(1)@192.168.225.59:44523
>>>>>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>>>>>>>>>>>>> sh -c \" \/bin\/bash
>>>>>>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>>>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>>>>>>> Health check process launched at pid: 2519
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The env var is not propagated when the docker executor is
>>>>>>>>>>>>>> launched in src/slave/containerizer/docker.cpp around line 903:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   vector<string> argv;
>>>>>>>>>>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>>>>>>>>>>   // Construct the mesos-docker-executor using the "name" we
>>>>>>>>>>>>>>> gave the
>>>>>>>>>>>>>>>   // container (to distinguish it from Docker containers not
>>>>>>>>>>>>>>> created
>>>>>>>>>>>>>>>   // by Mesos).
>>>>>>>>>>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>>>>>>>>>>       path::join(flags.launcher_dir,
>>>>>>>>>>>>>>> "mesos-docker-executor"),
>>>>>>>>>>>>>>>       argv,
>>>>>>>>>>>>>>>       Subprocess::PIPE(),
>>>>>>>>>>>>>>>       Subprocess::PATH(path::join(container->directory,
>>>>>>>>>>>>>>> "stdout")),
>>>>>>>>>>>>>>>       Subprocess::PATH(path::join(container->directory,
>>>>>>>>>>>>>>> "stderr")),
>>>>>>>>>>>>>>>       dockerFlags(flags, container->name(),
>>>>>>>>>>>>>>> container->directory),
>>>>>>>>>>>>>>>       environment,
>>>>>>>>>>>>>>>       lambda::bind(&setup, container->directory));
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A little ways above we can see the environment is set up with
>>>>>>>>>>>>>> the container task's defined env vars.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>>>>>>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  container->executor.command().environment().variables()) {
>>>>>>>>>>>>>>>     environment[variable.name()] = variable.value();
>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>>>>>>>>>>> 0.24.1 should work.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> >Do any of you know which host the path
>>>>>>>>>>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>>>>>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>>>>>>>>>>>>>> failing.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly
>>>>>>>>>>>>>>> before? We get mesos-health-check from
>>>>>>>>>>>>>>> MESOS_LAUNCHER_DIR/--launcher_dir, or use the same dir as
>>>>>>>>>>>>>>> mesos-docker-executor.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <
>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Maybe I spoke too soon.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Now the checks are attempting to run, however the STDERR is
>>>>>>>>>>>>>>>> not looking good.  I've added some debugging to the error message output to
>>>>>>>>>>>>>>>> show the path, argv, and envp variables:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> STDOUT:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>>>>>>>>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>>>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>>>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>>>>>>>>>>>>> sh -c \" exit 1
>>>>>>>>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>>>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>>>>>>> Health check process launched at pid: 3012
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> STDERR:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>>>>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor
>>>>>>>>>>>>>>>>> registered on slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit
>>>>>>>>>>>>>>>>> capabilities, memory limited without swap.
>>>>>>>>>>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in
>>>>>>>>>>>>>>>>> childMain
>>>>>>>>>>>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>>>>>>>> envp=''): No such file or directory
>>>>>>>>>>>>>>>>> *** Aborted at 1444270649 (unix time)
>>>>>>>>>>>>>>>>> try "date -d @1444270649" if you are using GNU date ***
>>>>>>>>>>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID
>>>>>>>>>>>>>>>>> 0x7f4a2f9f6700) from PID 3012; stack trace: ***
>>>>>>>>>>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>>>>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>>>>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>>>>>>>>>>> @ 0x41921c _Abort()
>>>>>>>>>>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>>>>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>>>>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>>>>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>>>>>>>>>>> @ 0x43cc9c
>>>>>>>>>>>>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>>>>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>>>>>>>>>>> @ 0x7f4a39d92827
>>>>>>>>>>>>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>>>>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>>>>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>>>>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>>>>>>>> should exist on? It definitely doesn't exist on the slave,
>>>>>>>>>>>>>>>> hence execution failing.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is with current master, git hash
>>>>>>>>>>>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>>>>>>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>>>>>>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <
>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Update:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to
>>>>>>>>>>>>>>>>> compile and package the latest master (0.26.x) and deployed it to the
>>>>>>>>>>>>>>>>> cluster, and now health checks are working as advertised in both Marathon
>>>>>>>>>>>>>>>>> and my own framework!  Not sure what was going on with health-checks in
>>>>>>>>>>>>>>>>> 0.24.0.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <
>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Haosdent,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Can you share your Marathon POST request that results in
>>>>>>>>>>>>>>>>>> Mesos executing the health checks?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Since we can reference the Marathon framework, I've been
>>>>>>>>>>>>>>>>>> doing some digging around.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's
>>>>>>>>>>>>>>>>>> dependencies
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as
>>>>>>>>>>>>>>>>>> JSON to /tmp/X in both the TaskFactory as well as right before the task is
>>>>>>>>>>>>>>>>>> sent to Mesos via driver.launchTasks:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>>>>>>>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>>>>>>>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>>>>>>>          CreatedTask(
>>>>>>>>>>>>>>>>>>>            taskInfo,
>>>>>>>>>>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class
>>>>>>>>>>>>>>>>>>> TaskLauncherImpl(
>>>>>>>>>>>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>>>>>>>>>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>>>>>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)")
>>>>>>>>>>>>>>>>>>> { driver =>
>>>>>>>>>>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>>>>>>>>>>> +      var i = 0
>>>>>>>>>>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" +
>>>>>>>>>>>>>>>>>>> i.toString() + "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new
>>>>>>>>>>>>>>>>>>> FileWriter(file))
>>>>>>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  driver.launchTasks(Collections.singleton(offerID), taskInfos.asJava)
>>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Then I built and deployed the hacked Marathon and
>>>>>>>>>>>>>>>>>> restarted the marathon service.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Next I created the app via the Marathon API ("hello app"
>>>>>>>>>>>>>>>>>> is a container with a simple hello-world ruby app running on
>>>>>>>>>>>>>>>>>> 0.0.0.0:8000)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST
>>>>>>>>>>>>>>>>>>> -H'Content-Type: application/json' -d'
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>>>>>>>>>>   "apps": [
>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>>>>>>>>>>       "container": {
>>>>>>>>>>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>>>>>>>>>>         "docker": {
>>>>>>>>>>>>>>>>>>>           "image":
>>>>>>>>>>>>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>>>>>>>>>>           "portMappings": [
>>>>>>>>>>>>>>>>>>>             {
>>>>>>>>>>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>>>>>>           ]
>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>>       "env": {
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>       "instances": 1,
>>>>>>>>>>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>>>>>>>>>>       "mem": 512
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>   ]
>>>>>>>>>>>>>>>>>>> }'
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> $ ls /tmp/
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Do they match?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>>>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>>>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, so I am confident this is the information being sent
>>>>>>>>>>>>>>>>>> across the wire to Mesos.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Do they contain any health-check information?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> $ cat
>>>>>>>>>>>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>         "value":1.0
>>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>         "value":512.0
>>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>             "begin":31641,
>>>>>>>>>>>>>>>>>>>             "end":31641
>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>     "environment":{
>>>>>>>>>>>>>>>>>>>       "variables":[
>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>       ]
>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> No, I don't see anything about any health check.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> And STDERR:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor
>>>>>>>>>>>>>>>>>>> registered on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit
>>>>>>>>>>>>>>>>>>> capabilities, memory limited without swap.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Any ideas of other things to try or what I could be
>>>>>>>>>>>>>>>>>> missing?  Can't say either way about the Mesos health-check system working
>>>>>>>>>>>>>>>>>> or not if Marathon won't put the health-check into the task it sends to
>>>>>>>>>>>>>>>>>> Mesos.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for all your help!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <
>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Maybe you could post your executor stdout/stderr so that
>>>>>>>>>>>>>>>>>>> we can tell whether the health check is running or not.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <
>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Marathon also uses the Mesos health check. When I use a health
>>>>>>>>>>>>>>>>>>>> check, I can see log output like this in the executor stdout.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I am using my own framework, and the full task info
>>>>>>>>>>>>>>>>>>>>> I'm using is posted earlier in this thread.  Do you happen to know if
>>>>>>>>>>>>>>>>>>>>> Marathon uses Mesos's health checks for its health check system?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <
>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yes, the health check is launched through its definition in
>>>>>>>>>>>>>>>>>>>>> TaskInfo. Do you launch your task through Marathon? I could test it on my
>>>>>>>>>>>>>>>>>>>>> side.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Precisely, and there are none of those statements.
>>>>>>>>>>>>>>>>>>>>>> Are you or others confident health-checks are part of the code path when
>>>>>>>>>>>>>>>>>>>>>> defined via task info for docker container tasks?  Going through the code,
>>>>>>>>>>>>>>>>>>>>>> I wasn't able to find the linkage for anything other than health-checks
>>>>>>>>>>>>>>>>>>>>>> triggered through a custom executor.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> With that being said it is a pretty good sized code
>>>>>>>>>>>>>>>>>>>>>> base and I'm not very familiar with it, so my analysis this far has by no
>>>>>>>>>>>>>>>>>>>>>> means been exhaustive.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <
>>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> When the health check launches, there will be a log line like
>>>>>>>>>>>>>>>>>>>>>> this in your executor stdout:
>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be
>>>>>>>>>>>>>>>>>>>>>>> output in the logs with the string "health" or "Health" if the health-check
>>>>>>>>>>>>>>>>>>>>>>> were active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <
>>>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see
>>>>>>>>>>>>>>>>>>>>>>> whether an unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <
>>>>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <
>>>>>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this
>>>>>>>>>>>>>>>>>>>>>>>>>> backport, let me double check.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in
>>>>>>>>>>>>>>>>>>>>>>>>>>> master.  I'll look there :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout
>>>>>>>>>>>>>>>>>>>>>>>>>>>> to test it out?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tasks that's in master but not yet released. It will run docker exec with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> image tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> see if they ever run the command (in this case `sleep 5`), but have not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> found any indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> invoked from src/launcher/executor.cpp CommandExecutorProcess::launchTask.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does this mean that health-checks are only supported for custom executors
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0/non-zero exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>
>
> --
> Best Regards,
> Haosdent Huang
>
>


-- 
Best Regards,
Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
Hey Haosdent,

Thanks for following up!  Glad to hear that others have reproduced the issue and it's not just me.

It's too bad that both the launcher dir flag and argv[0] are broken and not caught by unit-tests.  With that being said I completely understand and empathize with the devs about how these kinds of things happen.

Thanks again for all your help!

Best,
Jay

Btw health checks still do not work for me even if I set the MESOS_LAUNCHER_DIR env var to /usr/libexec/mesos.  Have you tried using it for health checks with a docker container with the latest HEAD of the master branch?  If so, was the var picked up by the health checker for you?

I've resorted to using a hacked build with the path hard coded for the time being.
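
For anyone following along, the lookup order discussed in this thread (use MESOS_LAUNCHER_DIR when it is set, otherwise fall back to the directory of argv[0]) can be sketched in standalone C++17 roughly as below. This is only an illustration and not the Mesos source: `resolveLauncherDir` and `readEnv` are made-up names, and the sketch leaves out the `os::realpath` call used in the real code, so symlinks are not resolved.

```cpp
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <string>

// Hypothetical helper (not a Mesos API): read an environment variable,
// returning std::nullopt when it is not set.
std::optional<std::string> readEnv(const char* name) {
  const char* value = std::getenv(name);
  if (value == nullptr) {
    return std::nullopt;
  }
  return std::string(value);
}

// Hypothetical helper (not a Mesos API): choose the directory in which to
// look for helper binaries. The environment value wins when present;
// otherwise the directory part of argv[0] is used. Unlike the real code,
// no realpath() is applied here.
std::string resolveLauncherDir(const std::optional<std::string>& envValue,
                               const std::string& argv0) {
  if (envValue.has_value()) {
    return *envValue;
  }
  return std::filesystem::path(argv0).parent_path().string();
}
```

Calling `resolveLauncherDir(readEnv("MESOS_LAUNCHER_DIR"), argv[0])` from `main` follows the same order. If the variable never reaches the executor's environment, the result is simply wherever argv[0] lives, which would match the sandbox path showing up in the task stdout.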



> On Oct 15, 2015, at 10:29 PM, haosdent <ha...@gmail.com> wrote:
> 
> Hi Jay, I have to apologize. While building the docker image for you, I found the problem with launcher_dir: https://issues.apache.org/jira/browse/MESOS-3738 
> 
>> On Tue, Oct 13, 2015 at 10:12 AM, Jay Taylor <ou...@gmail.com> wrote:
>> Sure, I'm game.
>> 
>>> On Mon, Oct 12, 2015 at 7:11 PM, haosdent <ha...@gmail.com> wrote:
>>> I think I could provide you a docker image later to run mesos master and agent, so that we could debug this problem and find the cause more easily.
>>> 
>>>> On Oct 13, 2015 6:46 AM, "Jay Taylor" <ou...@gmail.com> wrote:
>>>> Ah ha, I see now that the permissions are fine - just needed to click "Create" instead of the arrow.  Oh JIRA.. :)
>>>> 
>>>>> On Mon, Oct 12, 2015 at 3:26 PM, Jay Taylor <ja...@jaytaylor.com> wrote:
>>>>> Hi Marco,
>>>>> 
>>>>> What a relief!
>>>>> 
>>>>> I'd love to file the JIRA ticket for this, but I don't think my account has permissions over on https://issues.apache.org/jira/browse/MESOS.  I am "jaytaylor" over there.  Please let me know if you can help with that and we can get the ball rolling on this.
>>>>> 
>>>>> 
>>>>>> On Mon, Oct 12, 2015 at 3:14 PM, Marco Massenzio <ma...@mesosphere.io> wrote:
>>>>>> Jay:
>>>>>> 
>>>>>> you hit the nail on the head: the direction is definitely one-way (from MESOS_ENV var to Flag) and we don't reflect --flag back into the MESOS_FLAG env var.
>>>>>> Others more familiar with the matter may correct me, but it looks like you have uncovered a bug in the executor code: could you please file a Jira for us to look into?
>>>>>> 
>>>>>> It seems to me that, at present, the only workaround for you would be to set the MESOS_LAUNCHER_DIR env var, as the flag won't be picked up by the executor.
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Marco Massenzio
>>>>>> Distributed Systems Engineer
>>>>>> http://codetrips.com
>>>>>> 
>>>>>>> On Mon, Oct 12, 2015 at 11:44 PM, Jay Taylor <ja...@jaytaylor.com> wrote:
>>>>>>> Hi Marco,
>>>>>>> 
>>>>>>> My reply is inline below-
>>>>>>> 
>>>>>>>> On Mon, Oct 12, 2015 at 2:27 PM, Marco Massenzio <ma...@mesosphere.io> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Mon, Oct 12, 2015 at 11:26 PM, Marco Massenzio <ma...@mesosphere.io> wrote:
>>>>>>>>> Are those the stdout logs of the Agent? Because I don't see the --launcher-dir set, however, if I look into one that is running off the same 0.24.1 package, this is what I see:
>>>>>>>>> 
>>>>>>>>> I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup: --appc_store_dir="/tmp/mesos/store/appc" --attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --initialize_driver_logging="true" --ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem" 
>>>>>>>>> --launcher_dir="/usr/libexec/mesos" 
>>>>>>>>> --log_dir="/var/local/mesos/logs/agent" --logbufsecs="0" --logging_level="INFO" --master="zk://192.168.33.1:2181/mesos/vagrant" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --resource_monitoring_interval="1secs" --resources="ports:[9000-10000];ephemeral_ports:[32768-57344]" --revocable_cpu_low_priority="true" --sandbox_directory="/var/local/sandbox" --strict="true" --switch_user="true" --version="false" --work_dir="/var/local/mesos/agent"
>>>>>>>>> 
>>>>>>>>> (this is run off the Vagrantfile at [0] in case you want to reproduce).
>>>>>>>>> That agent is not run via the init command, though, I execute it manually via the `run-agent.sh` in the same directory.
>>>>>>>>> 
>>>>>>>>> I don't really think this matters, but I assume you also restarted the agent after making the config changes?
>>>>>>>>> (and, for your own sanity - you can double check the version by looking at the very head of the logs).
>>>>>>> 
>>>>>>> Yes I definitely restarted all mesos processes after config changes :)
>>>>>>> 
>>>>>>> Here is info equivalent to what you posted, from one of the slaves' INFO logs:
>>>>>>> 
>>>>>>>> Log file created at: 2015/10/12 20:22:58
>>>>>>>> Running on machine: mesos-worker2a
>>>>>>>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>>>>>>>> I1012 20:22:58.469779  9605 logging.cpp:172] INFO level logging started!
>>>>>>>> I1012 20:22:58.470006  9605 main.cpp:185] Build: 2015-09-25 19:13:24 by root
>>>>>>>> I1012 20:22:58.470023  9605 main.cpp:187] Version: 0.24.1
>>>>>>>> I1012 20:22:58.470031  9605 main.cpp:190] Git tag: 0.24.1
>>>>>>>> I1012 20:22:58.470039  9605 main.cpp:194] Git SHA: 44873806c2bb55da37e9adbece938274d8cd7c48
>>>>>>>> I1012 20:22:58.470221  9605 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix
>>>>>>>> I1012 20:22:58.573750  9605 main.cpp:272] Starting Mesos slave
>>>>>>>> I1012 20:22:58.574662  9621 slave.cpp:190] Slave started on 1)@192.168.225.59:5050
>>>>>>>> I1012 20:22:58.574695  9621 slave.cpp:191] Flags at startup: --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos,docker" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --enforce_container_disk_quota="false" --executor_registration_timeout="5mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname="mesos-worker2a-hobart.gigawatt.io" --initialize_driver_logging="true" --ip="192.168.225.59" --isolation="posix/cpu,posix/mem" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5050" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --version="false" --work_dir="/tmp/mesos"
>>>>>>> 
>>>>>>> 
>>>>>>> The launcher dir is picked up by the mesos-slave process.  We can also see the cmdline flag is picked up from /etc/mesos-slave like this:
>>>>>>> 
>>>>>>>> mesos-worker2a:~$ ps -ef | grep mesos
>>>>>>>> root      9605     1  1 20:22 ?        00:01:18 /usr/sbin/mesos-slave --ip=192.168.225.59 --log_dir=/var/log/mesos --launcher_dir=/usr/libexec/mesos
>>>>>>>> root      9612  9605  0 20:22 ?        00:00:00 logger -p user.info -t mesos-slave[9605]
>>>>>>>> root      9613  9605  0 20:22 ?        00:00:00 logger -p user.err -t mesos-slave[9605]
>>>>>>>> vagrant   9951  6010  0 21:36 pts/0    00:00:00 grep --color=auto mesos
>>>>>>> 
>>>>>>> 
>>>>>>>  
>>>>>>> What I keep coming back to is the fact that the MESOS_LAUNCHER_DIR env var does not seem to get picked up here: https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L573-L576:
>>>>>>> 
>>>>>>>>   const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>>   string path =
>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>> 
>>>>>>> And argv[0] (which contains the slave work dir) is the path we see in the task's stdout.
>>>>>>> 
>>>>>>> I'm still having trouble understanding how flags defined in mesos::internal::slave::Flags::Flags (src/slave/flags.[ch]pp) are propagated or expanded to MESOS_<flag_in_caps> environment variables.  Can you confirm if such a mechanism exists and if so where it is?
>>>>>>> 
>>>>>>> Otherwise, if my understanding is correct and such a mechanism doesn't exist:
>>>>>>> 
>>>>>>> How can the requisite MESOS_LAUNCHER_DIR env var be available when docker/executor.cpp (a child process of mesos-slave) attempts to read it?
>>>>>>> 
>>>>>>> The lack of such a mechanism would explain the behavior I'm currently observing.
>>>>>>> 
>>>>>>> Thanks!
>>>>>>> Jay
>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> [0] http://github.com/massenz/zk-mesos 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Marco Massenzio
>>>>>>>>> Distributed Systems Engineer
>>>>>>>>> http://codetrips.com
>>>>>>>>> 
>>>>>>>>>> On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>> Hi Haosdent and Mesos friends,
>>>>>>>>>> 
>>>>>>>>>> I've rebuilt the cluster from scratch and installed mesos 0.24.1 from the mesosphere apt repo:
>>>>>>>>>> 
>>>>>>>>>> $ dpkg -l | grep mesos
>>>>>>>>>> ii  mesos                               0.24.1-0.2.35.ubuntu1404            amd64        Cluster resource manager with efficient resource isolation
>>>>>>>>>> 
>>>>>>>>>> Then added the `launcher_dir' flag to /etc/mesos-slave/launcher_dir on the slaves:
>>>>>>>>>> 
>>>>>>>>>> mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir
>>>>>>>>>> /usr/libexec/mesos
>>>>>>>>>> 
>>>>>>>>>> And yet the task health-checks are still being launched from the sandbox directory like before!
>>>>>>>>>> 
>>>>>>>>>> I've also tested setting the MESOS_LAUNCHER_DIR env var and get the identical result (just as before on the cluster where many versions of mesos had been installed):
>>>>>>>>>> 
>>>>>>>>>> STDOUT:
>>>>>>>>>> 
>>>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" --docker="docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" --stop_timeout="0ns"
>>>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" --docker="docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" --stop_timeout="0ns"
>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>>>> Launching health check process: /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check --executor=(1)@192.168.225.58:48912 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/127.0.0.1:8000 \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1} --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>>>> Health check process launched at pid: 11253
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> STDERR:
>>>>>>>>>> 
>>>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" --docker="docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" --stop_timeout="0ns"
>>>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" --docker="docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" --stop_timeout="0ns"
>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>>>> Launching health check process: /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check --executor=(1)@192.168.225.58:48912 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/127.0.0.1:8000 \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1} --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>>>> Health check process launched at pid: 11253
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Any ideas on where to go from here?  Is there any additional information I can provide?
>>>>>>>>>> 
>>>>>>>>>> Thanks as always,
>>>>>>>>>> Jay
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Thu, Oct 8, 2015 at 9:23 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>> For flags sent to the executor from the containerizer, each flag is stringified and becomes a command-line parameter when the executor is launched.
>>>>>>>>>>> 
>>>>>>>>>>> You could see this in https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
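
As a rough illustration of "stringify and become a command line parameter", here is a small Python sketch (not Mesos code). `stringify_flags` is a hypothetical helper; the `--name="value"` shape and the alphabetical ordering are only inferred from the flag lines printed in the task stdout in this thread.

```python
def stringify_flags(flags):
    """Render a flag mapping as sorted --name="value" arguments."""
    return ['--{}="{}"'.format(name, value)
            for name, value in sorted(flags.items())]
```

For example, `stringify_flags({"docker": "docker", "stop_timeout": "0ns"})` yields `['--docker="docker"', '--stop_timeout="0ns"']`. Note that a flag passed this way reaches the child as an argument, not as an environment variable.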
>>>>>>>>>>> 
>>>>>>>>>>> But for launcher_dir, the executor gets it from `argv[0]`, as you mentioned above.
>>>>>>>>>>> ```
>>>>>>>>>>>   string path =
>>>>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>>>> 
>>>>>>>>>>> ```
>>>>>>>>>>> So I want to figure out why your argv[0] would resolve to the sandbox dir, not "/usr/libexec/mesos".
>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>> I see.  And then how are the flags sent to the executor?
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Oct 8, 2015, at 8:56 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes. The related code is located in https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In fact, environment variables that start with MESOS_ are loaded as flag values.
>>>>>>>>>>>>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
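
The idea of loading MESOS_-prefixed environment variables as flags can be sketched as follows. This is a simplified Python illustration, not the stout implementation linked above; `flags_from_environment` is a hypothetical name, and details such as precedence against command-line flags or handling of unknown names are deliberately left out.

```python
def flags_from_environment(environ, prefix="MESOS_"):
    """Collect PREFIX_* variables as lower-cased flag names."""
    flags = {}
    for name, value in environ.items():
        if name.startswith(prefix):
            # MESOS_LAUNCHER_DIR -> launcher_dir
            flags[name[len(prefix):].lower()] = value
    return flags
```

This only describes how the process that reads its own environment (here, mesos-slave) would obtain `launcher_dir`. It says nothing about whether the variable is passed on to child processes.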
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>> One question for you haosdent-
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> You mentioned that the flags.launcher_dir should propagate to the docker executor all the way up the chain.  Can you show me where this logic is in the codebase?  I didn't see where that was happening and would like to understand the mechanism.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Maybe tomorrow I will build a fresh cluster from scratch to see if the broken behavior experienced today still persists.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir, and mesos-docker-executor and mesos-health-check are then looked up under that dir. Although the env var is not propagated, MESOS_LAUNCHER_DIR still works because flags.launcher_dir is set from it.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> For example, I ran
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>> before starting mesos-slave. So when I launch the slave, I can find this line in the slave log:
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup: xxxxx  --launcher_dir="/tmp"
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> And from your log, I am not sure why your MESOS_LAUNCHER_DIR becomes the sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in one of your other scripts?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I just tried setting both the env var and flag on the slaves, and have determined that the env var is not present when it is checked in src/docker/executor.cpp at line 573:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>>>>>>>>>>>>   string path =
>>>>>>>>>>>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>>>>>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>>>>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly propagated along up to the point of mesos-slave launch):
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> $ cat /etc/default/mesos-slave
>>>>>>>>>>>>>>>>>> export MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>>>>>>>>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>>>>>>>>>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>>>>>>>>>>>>>> export MESOS_PORT="5050"
>>>>>>>>>>>>>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> TASK OUTPUT:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> MESOS_LAUNCHER_DIR: envpath.isSome()->no
>>>>>>>>>>>>>>>>>> MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
>>>>>>>>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>>>>>>>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>>>>>>>>>> Launching health check process: /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check --executor=(1)@192.168.225.59:44523 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad sh -c \" \/bin\/bash \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>>>>>>>>>> Health check process launched at pid: 2519
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> The env var is not propagated when the docker executor is launched in src/slave/containerizer/docker.cpp around line 903:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>   vector<string> argv;
>>>>>>>>>>>>>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>>>>>>>>>>>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>>>>>>>>>>>>>>>>   // container (to distinguish it from Docker containers not created
>>>>>>>>>>>>>>>>>>   // by Mesos).
>>>>>>>>>>>>>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>>>>>>>>>>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>>>>>>>>>>>>>>>       argv,
>>>>>>>>>>>>>>>>>>       Subprocess::PIPE(),
>>>>>>>>>>>>>>>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>>>>>>>>>>>>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>>>>>>>>>>>>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>>>>>>>>>>>>>>>       environment,
>>>>>>>>>>>>>>>>>>       lambda::bind(&setup, container->directory));
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> A little way above, we can see the environment is set up with the env vars defined by the container task.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>>>>>>>>>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>>>>>>>>>>>>>            container->executor.command().environment().variables()) {
>>>>>>>>>>>>>>>>>>     environment[variable.name()] = variable.value();
>>>>>>>>>>>>>>>>>>   }
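
The two snippets above suggest that the executor is started with an explicitly built environment map rather than a copy of the slave's own environment. The general effect can be reproduced with a short Python sketch. This is an analogy using Python's `subprocess` module, not the libprocess `subprocess()` call quoted above, and `child_sees` is a hypothetical helper.

```python
import subprocess
import sys


def child_sees(name, child_env):
    """Return the value of `name` as seen by a child started with `child_env`."""
    code = "import os; print(os.environ.get({!r}, ''))".format(name)
    # Passing env= replaces the child's environment entirely; nothing from
    # the parent is inherited unless it is present in child_env.
    result = subprocess.run([sys.executable, "-c", code], env=child_env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Under this model, a variable exported for the parent only reaches the child if the parent copies it into the map it passes down.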
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>>>>>>>>>>>>>> 0.24.1 should work.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> >Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" should exist on? It definitely doesn't exist on the slave, hence execution failing.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or else use the same dir as mesos-docker-executor.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Maybe I spoke too soon.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Now the checks are attempting to run, however the STDERR is not looking good.  I've added some debugging to the error message output to show the path, argv, and envp variables:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> STDOUT:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --stop_timeout="0ns"
>>>>>>>>>>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --stop_timeout="0ns"
>>>>>>>>>>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>>>>>>>>>>> Starting task app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>>>>>>>>>> Launching health check process: /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check --executor=(1)@192.168.225.59:43917 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc sh -c \" exit 1 \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>>>>>>>>>> Health check process launched at pid: 3012
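
Judging only from the `health_check_json` value in the STDOUT above, the original health-check command ("exit 1") appears to be wrapped so that it runs inside the task's container via `docker exec`. A minimal Python sketch of that wrapping, with the hypothetical helper name `wrap_for_docker_exec` and no attempt at quoting or escaping the inner command:

```python
def wrap_for_docker_exec(docker, container_name, command):
    """Build the wrapped command string in the shape seen in the log output."""
    return '{} exec {} sh -c " {} "'.format(docker, container_name, command)
```

For instance, `wrap_for_docker_exec("docker", "mesos-S1.abc", "exit 1")` gives `docker exec mesos-S1.abc sh -c " exit 1 "`. The container name here is a made-up placeholder.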
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> STDERR:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>>>>>>>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
>>>>>>>>>>>>>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', envp=''): No such file or directory*** Aborted at 1444270649 (unix time) try "date -d @1444270649" if you are using GNU date ***
>>>>>>>>>>>>>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>>>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from PID 3012; stack trace: ***
>>>>>>>>>>>>>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>>>>>>>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>>>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>>>>>>>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>>>>>>>>>>>>>> @ 0x41921c _Abort()
>>>>>>>>>>>>>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>>>>>>>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>>>>>>>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>>>>>>>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>>>>>>>>>>>>>> @ 0x43cc9c mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>>>>>>>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>>>>>>>>>>>>>> @ 0x7f4a39d92827 _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>>>>>>>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>>>>>>>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>>>>>>>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" should exist on? It definitely doesn't exist on the slave, hence execution failing.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> This is with current master, git hash 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>>>>>>>>>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>>>>>>>>>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> Update:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile and package the latest master (0.26.x) and deployed it to the cluster, and now health checks are working as advertised in both Marathon and my own framework!  Not sure what was going on with health-checks in 0.24.0..
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> Hi Haosdent,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Can you share your Marathon POST request that results in Mesos executing the health checks?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Since we can reference the Marathon framework, I've been doing some digging around.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X in both the TaskFactory as well as right before the task is sent to Mesos via driver.launchTasks:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> $ git diff src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>>>>>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId, config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>>>>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>>>>>>>>>>          CreatedTask(
>>>>>>>>>>>>>>>>>>>>>>            taskInfo,
>>>>>>>>>>>>>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> $ git diff src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>>>>>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>>>>>>>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos: Seq[TaskInfo]): Boolean = {
>>>>>>>>>>>>>>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>>>>>>>>>>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>>>>>>>>>>>>>> +      var i = 0
>>>>>>>>>>>>>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>>>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>>>>>>>>>        driver.launchTasks(Collections.singleton(offerID), taskInfos.asJava)
>>>>>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Then I built and deployed the hacked Marathon and restarted the marathon service.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Next I created the app via the Marathon API ("hello app" is a container with a simple hello-world ruby app running on 0.0.0.0:8000)
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type: application/json' -d'
>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>>>>>>>>>>>>>   "apps": [
>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>>>>>>>>>>>>>       "container": {
>>>>>>>>>>>>>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>>>>>>>>>>>>>         "docker": {
>>>>>>>>>>>>>>>>>>>>>>           "image": "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>>>>>>>>>>>>>           "portMappings": [
>>>>>>>>>>>>>>>>>>>>>>             {
>>>>>>>>>>>>>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>>>>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>>>>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>>>>>>>>>           ]
>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>>>>>       "env": {
>>>>>>>>>>>>>>>>>>>>>>         
>>>>>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>>>>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>>>>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>>>>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>>>>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>>>>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>       "instances": 1,
>>>>>>>>>>>>>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>>>>>>>>>>>>>       "mem": 512
>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>   ]
>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> $ ls /tmp/
>>>>>>>>>>>>>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Do they match?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Yes, so I am confident this is the information being sent across the wire to Mesos.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Do they contain any health-check information?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> $ cat /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>     "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>         "value":1.0
>>>>>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>         "value":512.0
>>>>>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>             "begin":31641,
>>>>>>>>>>>>>>>>>>>>>>             "end":31641
>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>     "environment":{
>>>>>>>>>>>>>>>>>>>>>>       "variables":[
>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>>>>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>>>>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>>>>>>>>>>>>>           "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>>>>>>>>>>>>>           "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>>>>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>       ]
>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> No, I don't see anything about any health check.
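
The same check can be done mechanically instead of by eye. A small Python sketch, assuming the dump is a single JSON object and that the field appears under the top-level key "health_check" (as in the TaskInfo JSON at the start of this thread); `has_health_check` is a hypothetical name.

```python
import json


def has_health_check(task_info_json):
    """True if the dumped TaskInfo has a top-level "health_check" key."""
    return "health_check" in json.loads(task_info_json)
```

It could be applied to the contents of one of the /tmp/taskjson1-* files, e.g. `has_health_check(open(path).read())`.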
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" --docker="docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" --stop_timeout="0ns"
>>>>>>>>>>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" --docker="docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" --stop_timeout="0ns"
>>>>>>>>>>>>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>>>>>>>>>>>>> Starting task app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> And STDERR:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>>>>>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Any ideas of other things to try or what I could be missing?  Can't say either way about the Mesos health-check system working or not if Marathon won't put the health-check into the task it sends to Mesos.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks for all your help!
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> Maybe you could post your executor stdout/stderr so that we can tell whether the health check is running or not.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>> Marathon also uses the Mesos health check. When I use a health check, I can see log lines like this in the executor stdout.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>>>>>>>>>>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>>>>>>>>>>>>>> Launching health check process: /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>>>>>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>>>>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> I am using my own framework, and the full task info I'm using is posted earlier in this thread.  Do you happen to know if Marathon uses Mesos's health checks for its health check system?
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Yes, the health check is launched through its definition in TaskInfo. Do you launch your task through Marathon? I could test it on my side.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> Precisely, and there are none of those statements.  Are you or others confident health-checks are part of the code path when defined via task info for docker container tasks?  Going through the code, I wasn't able to find the linkage for anything other than health-checks triggered through a custom executor.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> With that being said it is a pretty good sized code base and I'm not very familiar with it, so my analysis this far has by no means been exhaustive.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> When the health check launches, there will be a log line like this in your executor stdout:
>>>>>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output in the logs with the string "health" or "Health" if the health-check were active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1: https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0 https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contain this backport, let me double check.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look there :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <ja...@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it out?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <ti...@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jay, 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks that's in master but not yet released. It will run docker exec with the command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if they ever run the command (in this case `sleep 5`), but have not found any indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this mean that health-checks are only supported for custom executors and not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by haosdent <ha...@gmail.com>.
Hi Jay, I owe you an apology. While building the docker image for you,
I found the problem with launcher_dir:
https://issues.apache.org/jira/browse/MESOS-3738
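
For anyone following along, here is a minimal Python sketch of the lookup at
the center of this thread. It mirrors the C++ lines from
src/docker/executor.cpp quoted further down. The function names and the
example paths are hypothetical, chosen only for illustration; they are not
Mesos APIs.

```python
import os


def resolve_launcher_dir(argv0, environ):
    """Return the directory in which helper binaries are looked up.

    Mirrors the quoted C++ logic: prefer MESOS_LAUNCHER_DIR from the
    environment, otherwise fall back to the directory of argv[0].
    """
    env_path = environ.get("MESOS_LAUNCHER_DIR")
    if env_path is not None:
        return env_path
    # Fallback: resolve the directory that contains the running binary.
    return os.path.realpath(os.path.dirname(argv0))


def health_check_binary(argv0, environ):
    """Build the path of the health check helper (illustrative only)."""
    return os.path.join(resolve_launcher_dir(argv0, environ),
                        "mesos-health-check")


# Hypothetical argv[0] that lives inside a task sandbox directory.
sandbox_argv0 = "/tmp/mesos/slaves/S0/runs/abc/mesos-docker-executor"

# Env var visible to the executor: the configured directory wins.
with_env = health_check_binary(
    sandbox_argv0, {"MESOS_LAUNCHER_DIR": "/usr/libexec/mesos"})

# Env var missing from the executor's environment: the sandbox dir is used.
without_env = health_check_binary(sandbox_argv0, {})
```

If the variable never reaches the executor's environment, the fallback
resolves to the directory of argv[0], which is consistent with the sandbox
path shown in the quoted "Launching health check process" lines.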

On Tue, Oct 13, 2015 at 10:12 AM, Jay Taylor <ou...@gmail.com> wrote:

> Sure, I'm game.
>
> On Mon, Oct 12, 2015 at 7:11 PM, haosdent <ha...@gmail.com> wrote:
>
>> I think I could provide you with a docker image later to run the mesos master and
>> agent, so that we can debug this problem and find the cause more easily.
>> On Oct 13, 2015 6:46 AM, "Jay Taylor" <ou...@gmail.com> wrote:
>>
>>> Ah ha, I see now that the permissions are fine - just needed to click
>>> "Create" instead of the arrow.  Oh JIRA.. :)
>>>
>>> On Mon, Oct 12, 2015 at 3:26 PM, Jay Taylor <ja...@jaytaylor.com> wrote:
>>>
>>>> Hi Marco,
>>>>
>>>> What a relief!
>>>>
>>>> I'd love to file the JIRA ticket for this, but I don't think my account
>>>> has permissions over on https://issues.apache.org/jira/browse/MESOS.
>>>> I am "jaytaylor" over there.  Please let me know if you can help with that
>>>> and we can get the ball rolling on this.
>>>>
>>>>
>>>> On Mon, Oct 12, 2015 at 3:14 PM, Marco Massenzio <ma...@mesosphere.io>
>>>> wrote:
>>>>
>>>>> Jay:
>>>>>
>>>>> you hit the nail on the head: the direction is definitely one-way
>>>>> (from MESOS_ENV var to Flag) and we don't reflect --flag back into the
>>>>> MESOS_FLAG env var.
>>>>> Others more familiar with the matter may correct me, but it looks like
>>>>> you have uncovered a bug in the executor code: could you please file a Jira
>>>>> for us to look into?
>>>>>
>>>>> It seems to me that, at present, the only workaround would be for you
>>>>> to set the MESOS_LAUNCHER_DIR env var, as the flag won't be picked up by
>>>>> the executor.
>>>>>
>>>>>
>>>>> --
>>>>> *Marco Massenzio*
>>>>> Distributed Systems Engineer
>>>>> http://codetrips.com
>>>>>
>>>>> On Mon, Oct 12, 2015 at 11:44 PM, Jay Taylor <ja...@jaytaylor.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Marco,
>>>>>>
>>>>>> My reply is inline below-
>>>>>>
>>>>>> On Mon, Oct 12, 2015 at 2:27 PM, Marco Massenzio <marco@mesosphere.io
>>>>>> > wrote:
>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 12, 2015 at 11:26 PM, Marco Massenzio <
>>>>>>> marco@mesosphere.io> wrote:
>>>>>>>
>>>>>>>> Are those the stdout logs of the Agent? Because I don't see the
>>>>>>>> --launcher-dir set, however, if I look into one that is running off the
>>>>>>>> same 0.24.1 package, this is what I see:
>>>>>>>>
>>>>>>>> I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup:
>>>>>>>> --appc_store_dir="/tmp/mesos/store/appc"
>>>>>>>> --attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5"
>>>>>>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>>>>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>>>>>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>>>>>>> --container_disk_watch_interval="15secs" --containerizers="docker,mesos"
>>>>>>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>>>>>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>>>>>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>>>>>>> --enforce_container_disk_quota="false"
>>>>>>>> --executor_registration_timeout="1mins"
>>>>>>>> --executor_shutdown_grace_period="5secs"
>>>>>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>>>>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>>>>>>> --hadoop_home="" --help="false" --initialize_driver_logging="true"
>>>>>>>> --ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem"
>>>>>>>> --launcher_dir="/usr/libexec/mesos"
>>>>>>>> --log_dir="/var/local/mesos/logs/agent" --logbufsecs="0"
>>>>>>>> --logging_level="INFO" --master="zk://
>>>>>>>> 192.168.33.1:2181/mesos/vagrant"
>>>>>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>>>>>>> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
>>>>>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>>>>>>> --registration_backoff_factor="1secs"
>>>>>>>> --resource_monitoring_interval="1secs"
>>>>>>>> --resources="ports:[9000-10000];ephemeral_ports:[32768-57344]"
>>>>>>>> --revocable_cpu_low_priority="true"
>>>>>>>> --sandbox_directory="/var/local/sandbox" --strict="true"
>>>>>>>> --switch_user="true" --version="false" --work_dir="/var/local/mesos/agent"
>>>>>>>>
>>>>>>> (this is run off the Vagrantfile at [0] in case you want to
>>>>>>>> reproduce).
>>>>>>>> That agent is not run via the init command, though, I execute it
>>>>>>>> manually via the `run-agent.sh` in the same directory.
>>>>>>>>
>>>>>>>> I don't really think this matters, but I assume you also restarted
>>>>>>>> the agent after making the config changes?
>>>>>>>> (and, for your own sanity - you can double check the version by
>>>>>>>> looking at the very head of the logs).
>>>>>>>>
>>>>>>>
>>>>>> Yes I definitely restarted all mesos processes after config changes :)
>>>>>>
>>>>>> Here's info equivalent to what you posted, from one of the slaves' INFO
>>>>>> logs:
>>>>>>
>>>>>> Log file created at: 2015/10/12 20:22:58
>>>>>>> Running on machine: mesos-worker2a
>>>>>>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>>>>>>> I1012 20:22:58.469779  9605 logging.cpp:172] INFO level logging
>>>>>>> started!
>>>>>>> I1012 20:22:58.470006  9605 main.cpp:185] Build: 2015-09-25 19:13:24
>>>>>>> by root
>>>>>>> I1012 20:22:58.470023  9605 main.cpp:187] Version: 0.24.1
>>>>>>> I1012 20:22:58.470031  9605 main.cpp:190] Git tag: 0.24.1
>>>>>>> I1012 20:22:58.470039  9605 main.cpp:194] Git SHA:
>>>>>>> 44873806c2bb55da37e9adbece938274d8cd7c48
>>>>>>> I1012 20:22:58.470221  9605 containerizer.cpp:143] Using isolation:
>>>>>>> posix/cpu,posix/mem,filesystem/posix
>>>>>>> I1012 20:22:58.573750  9605 main.cpp:272] Starting Mesos slave
>>>>>>> I1012 20:22:58.574662  9621 slave.cpp:190] Slave started on 1)@
>>>>>>> 192.168.225.59:5050
>>>>>>> I1012 20:22:58.574695  9621 slave.cpp:191] Flags at startup:
>>>>>>> --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5"
>>>>>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>>>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>>>>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>>>>>> --container_disk_watch_interval="15secs" --containerizers="mesos,docker"
>>>>>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>>>>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>>>>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>>>>>> --enforce_container_disk_quota="false"
>>>>>>> --executor_registration_timeout="5mins"
>>>>>>> --executor_shutdown_grace_period="5secs"
>>>>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>>>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>>>>>> --hadoop_home="" --help="false"
>>>>>>> --hostname="mesos-worker2a-hobart.gigawatt.io"
>>>>>>> --initialize_driver_logging="true" --ip="192.168.225.59"
>>>>>>> --isolation="posix/cpu,posix/mem"
>>>>>>> --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos"
>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>> --master="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>>>>>> --perf_interval="1mins" --port="5050" --qos_correction_interval_min="0ns"
>>>>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>>>>>> --registration_backoff_factor="1secs"
>>>>>>> --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true"
>>>>>>> --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
>>>>>>> --switch_user="true" --version="false" --work_dir="/tmp/mesos"
>>>>>>
>>>>>>
>>>>>> The launcher dir is picked up by the mesos-slave process.  We can
>>>>>> also see the cmdline flag is picked up from /etc/mesos-slave like this:
>>>>>>
>>>>>> mesos-worker2a:~$ ps -ef | grep mesos
>>>>>>> root      9605     1  1 20:22 ?        00:01:18 /usr/sbin/mesos-slave --ip=192.168.225.59 --log_dir=/var/log/mesos --launcher_dir=/usr/libexec/mesos
>>>>>>> root      9612  9605  0 20:22 ?        00:00:00 logger -p user.info -t mesos-slave[9605]
>>>>>>> root      9613  9605  0 20:22 ?        00:00:00 logger -p user.err -t mesos-slave[9605]
>>>>>>> vagrant   9951  6010  0 21:36 pts/0    00:00:00 grep --color=auto mesos
>>>>>>
>>>>>>
>>>>>>
>>>>>> What I keep coming back to is the fact that the MESOS_LAUNCHER_DIR
>>>>>> env var does not seem to get picked up here:
>>>>>> https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L573-L576
>>>>>> :
>>>>>>
>>>>>>   const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>   string path =
>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>
>>>>>>
>>>>>> And argv[0] (which contains the slave work dir) is the path we see in
>>>>>> the tasks stdout.
>>>>>>
>>>>>> I'm still having trouble understanding how flags defined in
>>>>>> mesos::internal::slave::Flags::Flags (src/slave/flags.[ch]pp) are
>>>>>> propagated or expanded to MESOS_<flag_in_caps> environment variables.  Can
>>>>>> you confirm if such a mechanism exists and if so where it is?
>>>>>>
>>>>>> Otherwise, if my understanding is correct and such a mechanism
>>>>>> doesn't exist:
>>>>>>
>>>>>> How can the requisite MESOS_LAUNCHER_DIR env var be available when
>>>>>> docker/executor.cpp (a child process of mesos-slave) attempts to read it?
>>>>>>
>>>>>> The lack of such a mechanism would explain the behavior I'm currently
>>>>>> observing.
>>>>>>
>>>>>> Thanks!
>>>>>> Jay
>>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> [0] http://github.com/massenz/zk-mesos
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Marco Massenzio*
>>>>>>>> Distributed Systems Engineer
>>>>>>>> http://codetrips.com
>>>>>>>>
>>>>>>>> On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Haosdent and Mesos friends,
>>>>>>>>>
>>>>>>>>> I've rebuilt the cluster from scratch and installed mesos 0.24.1
>>>>>>>>> from the mesosphere apt repo:
>>>>>>>>>
>>>>>>>>> $ dpkg -l | grep mesos
>>>>>>>>> ii  mesos                               0.24.1-0.2.35.ubuntu1404
>>>>>>>>>          amd64        Cluster resource manager with efficient resource
>>>>>>>>> isolation
>>>>>>>>>
>>>>>>>>> Then added the `launcher_dir' flag to
>>>>>>>>> /etc/mesos-slave/launcher_dir on the slaves:
>>>>>>>>>
>>>>>>>>> mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir
>>>>>>>>> /usr/libexec/mesos
>>>>>>>>>
>>>>>>>>> And yet the task health-checks are still being launched from the
>>>>>>>>> sandbox directory like before!
>>>>>>>>>
>>>>>>>>> I've also tested setting the MESOS_LAUNCHER_DIR env var and get
>>>>>>>>> the identical result (just as before on the cluster where many versions of
>>>>>>>>> mesos had been installed):
>>>>>>>>>
>>>>>>>>> STDOUT:
>>>>>>>>>
>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>> Starting task
>>>>>>>>>> hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>>> Launching health check process:
>>>>>>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check
>>>>>>>>>> --executor=(1)@192.168.225.58:48912
>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>>>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>>>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>>>>>>>> 127.0.0.1:8000
>>>>>>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>>>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>>> Health check process launched at pid: 11253
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> STDERR:
>>>>>>>>>
>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>> Starting task
>>>>>>>>>> hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>>> *Launching health check process:
>>>>>>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check*
>>>>>>>>>> --executor=(1)@192.168.225.58:48912
>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>>>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>>>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>>>>>>>> 127.0.0.1:8000
>>>>>>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>>>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>>> Health check process launched at pid: 11253
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Any ideas on where to go from here?  Is there any additional
>>>>>>>>> information I can provide?
>>>>>>>>>
>>>>>>>>> Thanks as always,
>>>>>>>>> Jay
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Oct 8, 2015 at 9:23 PM, haosdent <ha...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> For flags sent to the executor from the containerizer, each flag is
>>>>>>>>>> stringified and becomes a command line parameter when the executor is launched.
>>>>>>>>>>
>>>>>>>>>> You could see this in
>>>>>>>>>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
>>>>>>>>>>
>>>>>>>>>> But for launcher_dir, the executor gets it from `argv[0]`, as you
>>>>>>>>>> mentioned above.
>>>>>>>>>> ```
>>>>>>>>>>   string path =
>>>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>> So I want to figure out why your argv[0] ends up being the sandbox
>>>>>>>>>> dir, not "/usr/libexec/mesos".
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I see.  And then how are the flags sent to the executor?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Oct 8, 2015, at 8:56 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Yes. The related code is located in
>>>>>>>>>>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>>>>>>>>>>>
>>>>>>>>>>> In fact, environment variables that start with MESOS_ are loaded as
>>>>>>>>>>> flags.
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
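
As a rough illustration of the prefix rule described above (a Python sketch, not the stout implementation; `flags_from_environment` is a hypothetical name, and lowercasing the remainder of the name is an assumption based on MESOS_LAUNCHER_DIR mapping to --launcher_dir):

```python
def flags_from_environment(environ, prefix="MESOS_"):
    """Collect variables with the given prefix as lowercase flag names."""
    flags = {}
    for name, value in environ.items():
        if name.startswith(prefix):
            # MESOS_LAUNCHER_DIR -> launcher_dir
            flags[name[len(prefix):].lower()] = value
    return flags


print(flags_from_environment({"MESOS_LAUNCHER_DIR": "/tmp", "HOME": "/root"}))
```

The real loader presumably also validates the names against the declared flags; that part is left out here.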
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <outtatime@gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> One question for you haosdent-
>>>>>>>>>>>>
>>>>>>>>>>>> You mentioned that the flags.launcher_dir should propagate to
>>>>>>>>>>>> the docker executor all the way up the chain.  Can you show me where this
>>>>>>>>>>>> logic is in the codebase?  I didn't see where that was happening and would
>>>>>>>>>>>> like to understand the mechanism.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Jay
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe tomorrow I will build a fresh cluster from scratch to see
>>>>>>>>>>>> if the broken behavior experienced today still persists.
>>>>>>>>>>>>
>>>>>>>>>>>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting
>>>>>>>>>>>> flags.launcher_dir, and mesos-docker-executor and mesos-health-check
>>>>>>>>>>>> are then looked up under that dir. Although the env var itself is not
>>>>>>>>>>>> propagated, MESOS_LAUNCHER_DIR still takes effect because
>>>>>>>>>>>> flags.launcher_dir is derived from it.
>>>>>>>>>>>>
>>>>>>>>>>>> For example, I ran
>>>>>>>>>>>> ```
>>>>>>>>>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>>>>>>>>>> ```
>>>>>>>>>>>> before starting mesos-slave, so when I launch the slave I can find
>>>>>>>>>>>> this line in the slave log:
>>>>>>>>>>>> ```
>>>>>>>>>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>>>>>>>>>>>> xxxxx  --launcher_dir="/tmp"
>>>>>>>>>>>> ```
>>>>>>>>>>>>
>>>>>>>>>>>> From your log, I am not sure why your MESOS_LAUNCHER_DIR
>>>>>>>>>>>> became the sandbox dir. Is MESOS_LAUNCHER_DIR perhaps overridden in
>>>>>>>>>>>> one of your other scripts?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <outtatime@gmail.com
>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir
>>>>>>>>>>>>> before.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I just tried setting both the env var and flag on the slaves,
>>>>>>>>>>>>> and have determined that the env var is not present when it is
>>>>>>>>>>>>> checked in src/docker/executor.cpp at line 573:
>>>>>>>>>>>>>
>>>>>>>>>>>>>  const Option<string> envPath =
>>>>>>>>>>>>>> os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>>>>>>>>   string path =
>>>>>>>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>>>>>>>                      :
>>>>>>>>>>>>>> os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" <<
>>>>>>>>>>>>>> (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is
>>>>>>>>>>>>> correctly propagated along up to the point of mesos-slave launch):
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ cat /etc/default/mesos-slave
>>>>>>>>>>>>>> export
>>>>>>>>>>>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>>>>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>>>>>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>>>>>>>>>> export MESOS_PORT="5050"
>>>>>>>>>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> TASK OUTPUT:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> *MESOS_LAUNCHER_DIR: envpath.isSome()->no*
>>>>>>>>>>>>>> *MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'*
>>>>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>> hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>>>>>>>>>>>> --executor=(1)@192.168.225.59:44523
>>>>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>>>>>>>>>>>> sh -c \" \/bin\/bash
>>>>>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>>>>>> Health check process launched at pid: 2519
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The env var is not propagated when the docker executor is
>>>>>>>>>>>>> launched in src/slave/containerizer/docker.cpp around line 903:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   vector<string> argv;
>>>>>>>>>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>>>>>>>>>   // Construct the mesos-docker-executor using the "name" we
>>>>>>>>>>>>>> gave the
>>>>>>>>>>>>>>   // container (to distinguish it from Docker containers not
>>>>>>>>>>>>>> created
>>>>>>>>>>>>>>   // by Mesos).
>>>>>>>>>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>>>>>>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>>>>>>>>>>>       argv,
>>>>>>>>>>>>>>       Subprocess::PIPE(),
>>>>>>>>>>>>>>       Subprocess::PATH(path::join(container->directory,
>>>>>>>>>>>>>> "stdout")),
>>>>>>>>>>>>>>       Subprocess::PATH(path::join(container->directory,
>>>>>>>>>>>>>> "stderr")),
>>>>>>>>>>>>>>       dockerFlags(flags, container->name(),
>>>>>>>>>>>>>> container->directory),
>>>>>>>>>>>>>>       environment,
>>>>>>>>>>>>>>       lambda::bind(&setup, container->directory));
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> A little way above, we can see the environment is set up with the
>>>>>>>>>>>>> env vars defined by the container task.
>>>>>>>>>>>>>
>>>>>>>>>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>>>>>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  container->executor.command().environment().variables()) {
>>>>>>>>>>>>>>     environment[variable.name()] = variable.value();
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>>>>>>>>>> 0.24.1 should work.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> >Do any of you know which host the path
>>>>>>>>>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>>>>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>>>>>>>>>>>>> failing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly
>>>>>>>>>>>>>> before? We locate mesos-health-check via MESOS_LAUNCHER_DIR/--launcher_dir,
>>>>>>>>>>>>>> or else use the same dir as mesos-docker-executor.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <
>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Maybe I spoke too soon.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now the checks are attempting to run, however the STDERR is
>>>>>>>>>>>>>>> not looking good.  I've added some debugging to the error message output to
>>>>>>>>>>>>>>> show the path, argv, and envp variables:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> STDOUT:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>>>>>>>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>>>>>>>>>>>> sh -c \" exit 1
>>>>>>>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>>>>>> Health check process launched at pid: 3012
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> STDERR:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>>>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor
>>>>>>>>>>>>>>>> registered on slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit
>>>>>>>>>>>>>>>> capabilities, memory limited without swap.
>>>>>>>>>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in
>>>>>>>>>>>>>>>> childMain
>>>>>>>>>>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>>>>>>> envp=''): No such file or directory*** Aborted at 1444270649 (unix time)
>>>>>>>>>>>>>>>> try "date -d @1444270649" if you are using GNU date ***
>>>>>>>>>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID
>>>>>>>>>>>>>>>> 0x7f4a2f9f6700) from PID 3012; stack trace: ***
>>>>>>>>>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>>>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>>>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>>>>>>>>>> @ 0x41921c _Abort()
>>>>>>>>>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>>>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>>>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>>>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>>>>>>>>>> @ 0x43cc9c
>>>>>>>>>>>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>>>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>>>>>>>>>> @ 0x7f4a39d92827
>>>>>>>>>>>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>>>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>>>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>>>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>>>>>>> should exist on? It definitely doesn't exist on the slave,
>>>>>>>>>>>>>>> hence execution failing.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is with current master, git hash
>>>>>>>>>>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>>>>>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>>>>>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <
>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Update:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to
>>>>>>>>>>>>>>>> compile and package the latest master (0.26.x) and deployed it to the
>>>>>>>>>>>>>>>> cluster, and now health checks are working as advertised in both Marathon
>>>>>>>>>>>>>>>> and my own framework!  Not sure what was going on with health-checks in
>>>>>>>>>>>>>>>> 0.24.0..
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <
>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Haosdent,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can you share your Marathon POST request that results in
>>>>>>>>>>>>>>>>> Mesos executing the health checks?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Since we can reference the Marathon framework, I've been
>>>>>>>>>>>>>>>>> doing some digging around.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's
>>>>>>>>>>>>>>>>> dependencies
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as
>>>>>>>>>>>>>>>>> JSON to /tmp/X both in the TaskFactory and right before the task is
>>>>>>>>>>>>>>>>> sent to Mesos via driver.launchTasks:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>>>>>>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>>>>>>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>>>>>>          CreatedTask(
>>>>>>>>>>>>>>>>>>            taskInfo,
>>>>>>>>>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class
>>>>>>>>>>>>>>>>>> TaskLauncherImpl(
>>>>>>>>>>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>>>>>>>>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>>>>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)")
>>>>>>>>>>>>>>>>>> { driver =>
>>>>>>>>>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>>>>>>>>>> +      var i = 0
>>>>>>>>>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" +
>>>>>>>>>>>>>>>>>> i.toString() + "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>>>>>>>>>>>>>> taskInfos.asJava)
>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Then I built and deployed the hacked Marathon and
>>>>>>>>>>>>>>>>> restarted the marathon service.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Next I created the app via the Marathon API ("hello app"
>>>>>>>>>>>>>>>>> is a container with a simple hello-world ruby app running on
>>>>>>>>>>>>>>>>> 0.0.0.0:8000)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST
>>>>>>>>>>>>>>>>>> -H'Content-Type: application/json' -d'
>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>>>>>>>>>   "apps": [
>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>>>>>>>>>       "container": {
>>>>>>>>>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>>>>>>>>>         "docker": {
>>>>>>>>>>>>>>>>>>           "image":
>>>>>>>>>>>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>>>>>>>>>           "portMappings": [
>>>>>>>>>>>>>>>>>>             {
>>>>>>>>>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>>>>>           ]
>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>       "env": {
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>       "instances": 1,
>>>>>>>>>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>>>>>>>>>       "mem": 512
>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>   ]
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> $ ls /tmp/
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Do they match?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes, so I am confident this is the information being sent
>>>>>>>>>>>>>>>>> across the wire to Mesos.
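
The same comparison as the md5sum check above can be scripted. This is a small sketch using only the Python standard library; the helper names are hypothetical and the paths would be the two dump files listed above.

```python
import hashlib


def md5_of_file(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(path_a, path_b):
    """True when both files have the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Matching digests only show that the two dumps are byte-identical, i.e. that nothing changed between the TaskFactory and driver.launchTasks.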
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Do they contain any health-check information?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> $ cat
>>>>>>>>>>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>         "value":1.0
>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>         "value":512.0
>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>             "begin":31641,
>>>>>>>>>>>>>>>>>>             "end":31641
>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>     "environment":{
>>>>>>>>>>>>>>>>>>       "variables":[
>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>       ]
>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> No, I don't see anything about any health check.
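
Instead of reading the dump by eye, the check can be done programmatically. A minimal sketch, assuming the dump is a single JSON object and that a health check would appear under the top-level `health_check` key, as in the TaskInfo posted at the start of this thread; the function name and the sample strings are hypothetical.

```python
import json


def health_check_of(task_info_json):
    """Return the health_check section of a TaskInfo JSON dump, or None."""
    task_info = json.loads(task_info_json)
    return task_info.get("health_check")


# Hypothetical, heavily trimmed examples.
without_check = '{"name": "web-v11.app-81-1-hello-app", "command": {"shell": false}}'
with_check = '{"name": "hello-app.web.v3", "health_check": {"interval_seconds": 10}}'
print(health_check_of(without_check))
print(health_check_of(with_check))
```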
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> And STDERR:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor
>>>>>>>>>>>>>>>>>> registered on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit
>>>>>>>>>>>>>>>>>> capabilities, memory limited without swap.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any ideas of other things to try or what I could be
>>>>>>>>>>>>>>>>> missing?  Can't say either way about the Mesos health-check system working
>>>>>>>>>>>>>>>>> or not if Marathon won't put the health-check into the task it sends to
>>>>>>>>>>>>>>>>> Mesos.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for all your help!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <
>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Maybe you could post your executor stdout/stderr so that
>>>>>>>>>>>>>>>>>> we can tell whether the health check is running or not.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <
>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Marathon also uses the Mesos health check. When I use a health
>>>>>>>>>>>>>>>>>>> check, I see log output like this in the executor stdout.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I am using my own framework, and the full task info I'm
>>>>>>>>>>>>>>>>>>>> using is posted earlier in this thread.  Do you happen to know if Marathon
>>>>>>>>>>>>>>>>>>>> uses Mesos's health checks for its health check system?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <
>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes, the health check is launched from its definition in
>>>>>>>>>>>>>>>>>>>> TaskInfo. Do you launch your task through Marathon? I could test it on my
>>>>>>>>>>>>>>>>>>>> side.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Precisely, and there are none of those statements.
>>>>>>>>>>>>>>>>>>>>> Are you or others confident health-checks are part of the code path when
>>>>>>>>>>>>>>>>>>>>> defined via task info for docker container tasks?  Going through the code,
>>>>>>>>>>>>>>>>>>>>> I wasn't able to find the linkage for anything other than health-checks
>>>>>>>>>>>>>>>>>>>>> triggered through a custom executor.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> That said, it is a pretty good-sized code
>>>>>>>>>>>>>>>>>>>>> base and I'm not very familiar with it, so my analysis thus far has by no
>>>>>>>>>>>>>>>>>>>>> means been exhaustive.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <
>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> When the health check launches, there will be a log line like
>>>>>>>>>>>>>>>>>>>>> this in your executor stdout
>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be
>>>>>>>>>>>>>>>>>>>>>> output in the logs with the string "health" or "Health" if the health-check
>>>>>>>>>>>>>>>>>>>>>> were active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <
>>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see
>>>>>>>>>>>>>>>>>>>>>> whether an unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <
>>>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <
>>>>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this
>>>>>>>>>>>>>>>>>>>>>>>>> backport; let me double check.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.
>>>>>>>>>>>>>>>>>>>>>>>>>> I'll look there :)
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout
>>>>>>>>>>>>>>>>>>>>>>>>>>> to test it out?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker
>>>>>>>>>>>>>>>>>>>>>>>>>>>> tasks that's in master but not yet released. It will run docker exec with
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker
>>>>>>>>>>>>>>>>>>>>>>>>>>>> image tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for
>>>>>>>>>>>>>>>>>>>>>>>>>>>> me.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to
>>>>>>>>>>>>>>>>>>>>>>>>>>>> see if they ever run the command (in this case `sleep 5`), but have not
>>>>>>>>>>>>>>>>>>>>>>>>>>>> found any indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are
>>>>>>>>>>>>>>>>>>>>>>>>>>>> invoked from src/launcher/executor.cpp CommandExecutorProcess::launchTask.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does this mean that health-checks are only supported for custom executors
>>>>>>>>>>>>>>>>>>>>>>>>>>>> and not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 0/non-zero exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>>>>>>> Haosdent Huang


-- 
Best Regards,
Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
Sure, I'm game.

On Mon, Oct 12, 2015 at 7:11 PM, haosdent <ha...@gmail.com> wrote:

> I think I could provide you with a docker image later to run the mesos master
> and agent, so that we can debug this problem and find the cause more easily.
> On Oct 13, 2015 6:46 AM, "Jay Taylor" <ou...@gmail.com> wrote:
>
>> Ah ha, I see now that the permissions are fine - just needed to click
>> "Create" instead of the arrow.  Oh JIRA.. :)
>>
>> On Mon, Oct 12, 2015 at 3:26 PM, Jay Taylor <ja...@jaytaylor.com> wrote:
>>
>>> Hi Marco,
>>>
>>> What a relief!
>>>
>>> I'd love to file the JIRA ticket for this, but I don't think my account
>>> has permissions over on https://issues.apache.org/jira/browse/MESOS.  I
>>> am "jaytaylor" over there.  Please let me know if you can help with that
>>> and we can get the ball rolling on this.
>>>
>>>
>>> On Mon, Oct 12, 2015 at 3:14 PM, Marco Massenzio <ma...@mesosphere.io>
>>> wrote:
>>>
>>>> Jay:
>>>>
>>>> you hit the nail on the head: the direction is definitely one-way (from
>>>> MESOS_ENV var to Flag) and we don't reflect --flag back into the MESOS_FLAG
>>>> env var.
>>>> Others more familiar with the matter may correct me, but it looks like
>>>> you have uncovered a bug in the executor code: could you please file a Jira
>>>> for us to look into?
>>>>
>>>> It seems to me that, at present, the only workaround would be for you
>>>> to set the MESOS_LAUNCHER_DIR env var, as the flag won't be picked up by
>>>> the executor.
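
A minimal sketch of the one-way direction described above (MESOS_* env var to flag, never flag back to env var), using only the C++ standard library. `flagsFromEnvironment` is a hypothetical name for illustration and not the stout API; the lower-casing of the flag name is an assumption about how MESOS_LAUNCHER_DIR maps to launcher_dir.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper (not the stout implementation): every variable whose
// name starts with `prefix` becomes a flag, e.g. with prefix "MESOS_" the
// variable MESOS_LAUNCHER_DIR becomes the flag launcher_dir.
// The environment is taken by const reference: nothing is written back,
// which is the one-way behaviour discussed in this thread.
std::map<std::string, std::string> flagsFromEnvironment(
    const std::map<std::string, std::string>& environment,
    const std::string& prefix) {
  assert(!prefix.empty());

  std::map<std::string, std::string> flags;
  for (const auto& entry : environment) {
    const std::string& name = entry.first;

    // Skip variables that do not carry the prefix.
    if (name.compare(0, prefix.size(), prefix) != 0) continue;

    // Strip the prefix and lower-case the rest to get the flag name.
    std::string flag = name.substr(prefix.size());
    if (flag.empty()) continue;
    for (char& c : flag) {
      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }

    flags[flag] = entry.second;
  }
  return flags;
}
```

Because only the returned map is populated, a child process that reads its own environment sees MESOS_LAUNCHER_DIR only if the parent exported it explicitly; passing --launcher_dir on the command line would not create it.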
>>>>
>>>>
>>>> --
>>>> *Marco Massenzio*
>>>> Distributed Systems Engineer
>>>> http://codetrips.com
>>>>
>>>> On Mon, Oct 12, 2015 at 11:44 PM, Jay Taylor <ja...@jaytaylor.com> wrote:
>>>>
>>>>> Hi Marco,
>>>>>
>>>>> My reply is inline below-
>>>>>
>>>>> On Mon, Oct 12, 2015 at 2:27 PM, Marco Massenzio <ma...@mesosphere.io>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On Mon, Oct 12, 2015 at 11:26 PM, Marco Massenzio <
>>>>>> marco@mesosphere.io> wrote:
>>>>>>
>>>>>>> Are those the stdout logs of the Agent? Because I don't see the
>>>>>>> --launcher-dir set, however, if I look into one that is running off the
>>>>>>> same 0.24.1 package, this is what I see:
>>>>>>>
>>>>>>> I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup:
>>>>>>> --appc_store_dir="/tmp/mesos/store/appc"
>>>>>>> --attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5"
>>>>>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>>>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>>>>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>>>>>> --container_disk_watch_interval="15secs" --containerizers="docker,mesos"
>>>>>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>>>>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>>>>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>>>>>> --enforce_container_disk_quota="false"
>>>>>>> --executor_registration_timeout="1mins"
>>>>>>> --executor_shutdown_grace_period="5secs"
>>>>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>>>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>>>>>> --hadoop_home="" --help="false" --initialize_driver_logging="true"
>>>>>>> --ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem"
>>>>>>> --launcher_dir="/usr/libexec/mesos"
>>>>>>> --log_dir="/var/local/mesos/logs/agent" --logbufsecs="0"
>>>>>>> --logging_level="INFO" --master="zk://192.168.33.1:2181/mesos/vagrant"
>>>>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>>>>>> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
>>>>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>>>>>> --registration_backoff_factor="1secs"
>>>>>>> --resource_monitoring_interval="1secs"
>>>>>>> --resources="ports:[9000-10000];ephemeral_ports:[32768-57344]"
>>>>>>> --revocable_cpu_low_priority="true"
>>>>>>> --sandbox_directory="/var/local/sandbox" --strict="true"
>>>>>>> --switch_user="true" --version="false" --work_dir="/var/local/mesos/agent"
>>>>>>>
>>>>>> (this is run off the Vagrantfile at [0] in case you want to
>>>>>>> reproduce).
>>>>>>> That agent is not run via the init command, though, I execute it
>>>>>>> manually via the `run-agent.sh` in the same directory.
>>>>>>>
>>>>>>> I don't really think this matters, but I assume you also restarted
>>>>>>> the agent after making the config changes?
>>>>>>> (and, for your own sanity - you can double check the version by
>>>>>>> looking at the very head of the logs).
>>>>>>>
>>>>>>
>>>>> Yes I definitely restarted all mesos processes after config changes :)
>>>>>
>>>>> Here's info equivalent to what you posted, from one of the slaves' INFO
>>>>> logs:
>>>>>
>>>>> Log file created at: 2015/10/12 20:22:58
>>>>>> Running on machine: mesos-worker2a
>>>>>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>>>>>> I1012 20:22:58.469779  9605 logging.cpp:172] INFO level logging
>>>>>> started!
>>>>>> I1012 20:22:58.470006  9605 main.cpp:185] Build: 2015-09-25 19:13:24
>>>>>> by root
>>>>>> I1012 20:22:58.470023  9605 main.cpp:187] Version: 0.24.1
>>>>>> I1012 20:22:58.470031  9605 main.cpp:190] Git tag: 0.24.1
>>>>>> I1012 20:22:58.470039  9605 main.cpp:194] Git SHA:
>>>>>> 44873806c2bb55da37e9adbece938274d8cd7c48
>>>>>> I1012 20:22:58.470221  9605 containerizer.cpp:143] Using isolation:
>>>>>> posix/cpu,posix/mem,filesystem/posix
>>>>>> I1012 20:22:58.573750  9605 main.cpp:272] Starting Mesos slave
>>>>>> I1012 20:22:58.574662  9621 slave.cpp:190] Slave started on 1)@192.168.225.59:5050
>>>>>> I1012 20:22:58.574695  9621 slave.cpp:191] Flags at startup:
>>>>>> --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5"
>>>>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>>>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>>>>> --container_disk_watch_interval="15secs" --containerizers="mesos,docker"
>>>>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>>>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>>>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>>>>> --enforce_container_disk_quota="false"
>>>>>> --executor_registration_timeout="5mins"
>>>>>> --executor_shutdown_grace_period="5secs"
>>>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>>>>> --hadoop_home="" --help="false"
>>>>>> --hostname="mesos-worker2a-hobart.gigawatt.io"
>>>>>> --initialize_driver_logging="true" --ip="192.168.225.59"
>>>>>> --isolation="posix/cpu,posix/mem"
>>>>>> *--launcher_dir="/usr/libexec/mesos"* --log_dir="/var/log/mesos"
>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>> --master="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>>>>> --perf_interval="1mins" --port="5050" --qos_correction_interval_min="0ns"
>>>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>>>>> --registration_backoff_factor="1secs"
>>>>>> --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true"
>>>>>> --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
>>>>>> --switch_user="true" --version="false" --work_dir="/tmp/mesos"
>>>>>
>>>>>
>>>>> The launcher dir is picked up by the mesos-slave process.  We can also
>>>>> see the cmdline flag is picked up from /etc/mesos-slave like this:
>>>>>
>>>>> mesos-worker2a:~$ ps -ef | grep mesos
>>>>>> root      9605     1  1 20:22 ?        00:01:18 /usr/sbin/mesos-slave
>>>>>> --ip=192.168.225.59 --log_dir=/var/log/mesos
>>>>>> *--launcher_dir=/usr/libexec/mesos*
>>>>>> root      9612  9605  0 20:22 ?        00:00:00 logger -p user.info -t mesos-slave[9605]
>>>>>> root      9613  9605  0 20:22 ?        00:00:00 logger -p user.err -t
>>>>>> mesos-slave[9605]
>>>>>> vagrant   9951  6010  0 21:36 pts/0    00:00:00 grep --color=auto
>>>>>> mesos
>>>>>
>>>>>
>>>>>
>>>>> What I keep coming back to is the fact that the MESOS_LAUNCHER_DIR env
>>>>> var does not seem to get picked up here:
>>>>> https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L573-L576
>>>>>
>>>>>>   const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>   string path =
>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>
>>>>>
>>>>> And argv[0] (which contains the slave work dir) is the path we see in
>>>>> the task's stdout.
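
To make the fallback in the quoted snippet concrete, here is a minimal standalone sketch of the same decision using only the C++ standard library. `resolveLauncherDir` and `dirnameOf` are hypothetical names, not Mesos functions, and the sketch skips the os::realpath resolution that the real code performs.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: directory part of a path, "/a/b/exe" -> "/a/b".
// Returns "." when the path contains no slash at all.
std::string dirnameOf(const std::string& path) {
  const std::string::size_type slash = path.find_last_of('/');
  if (slash == std::string::npos) return ".";
  if (slash == 0) return "/";
  return path.substr(0, slash);
}

// Hypothetical stand-in for the lookup in src/docker/executor.cpp:
// prefer the value of MESOS_LAUNCHER_DIR when it is set (non-null),
// otherwise fall back to the directory the executor binary was started from.
std::string resolveLauncherDir(const char* envValue, const std::string& argv0) {
  assert(!argv0.empty());
  if (envValue != nullptr) return envValue;
  return dirnameOf(argv0);
}
```

In a real program the first argument would be std::getenv("MESOS_LAUNCHER_DIR"), which returns a null pointer when the variable is absent. In that case the result is whatever directory argv[0] points into, which matches the sandbox path that shows up in the "Launching health check process" lines.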
>>>>>
>>>>> I'm still having trouble understanding how flags defined in
>>>>> mesos::internal::slave::Flags::Flags (src/slave/flags.[ch]pp) are
>>>>> propagated or expanded to MESOS_<flag_in_caps> environment variables.  Can
>>>>> you confirm if such a mechanism exists and if so where it is?
>>>>>
>>>>> Otherwise, if my understanding is correct and such a mechanism doesn't
>>>>> exist:
>>>>>
>>>>> How can the requisite MESOS_LAUNCHER_DIR env var be available when
>>>>> docker/executor.cpp (a child process of mesos-slave) attempts to read it?
>>>>>
>>>>> The lack of such a mechanism would explain the behavior I'm currently
>>>>> observing.
>>>>>
>>>>> Thanks!
>>>>> Jay
>>>>>
>>>>>
>>>>>>>
>>>>>>> [0] http://github.com/massenz/zk-mesos
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Marco Massenzio*
>>>>>>> Distributed Systems Engineer
>>>>>>> http://codetrips.com
>>>>>>>
>>>>>>> On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <ou...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Haosdent and Mesos friends,
>>>>>>>>
>>>>>>>> I've rebuilt the cluster from scratch and installed mesos 0.24.1
>>>>>>>> from the mesosphere apt repo:
>>>>>>>>
>>>>>>>> $ dpkg -l | grep mesos
>>>>>>>> ii  mesos                               0.24.1-0.2.35.ubuntu1404
>>>>>>>>          amd64        Cluster resource manager with efficient resource
>>>>>>>> isolation
>>>>>>>>
>>>>>>>> Then added the `launcher_dir' flag to /etc/mesos-slave/launcher_dir
>>>>>>>> on the slaves:
>>>>>>>>
>>>>>>>> mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir
>>>>>>>> /usr/libexec/mesos
>>>>>>>>
>>>>>>>> And yet the task health-checks are still being launched from the
>>>>>>>> sandbox directory like before!
>>>>>>>>
>>>>>>>> I've also tested setting the MESOS_LAUNCHER_DIR env var and get the
>>>>>>>> identical result (just as before on the cluster where many versions of
>>>>>>>> mesos had been installed):
>>>>>>>>
>>>>>>>> STDOUT:
>>>>>>>>
>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>> Launching health check process:
>>>>>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check
>>>>>>>>> --executor=(1)@192.168.225.58:48912
>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/127.0.0.1:8000
>>>>>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>> Health check process launched at pid: 11253
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> STDERR:
>>>>>>>>
>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>> *Launching health check process:
>>>>>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check*
>>>>>>>>> --executor=(1)@192.168.225.58:48912
>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/127.0.0.1:8000
>>>>>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>>> Health check process launched at pid: 11253
>>>>>>>>
>>>>>>>>
>>>>>>>> Any ideas on where to go from here?  Is there any additional
>>>>>>>> information I can provide?
>>>>>>>>
>>>>>>>> Thanks as always,
>>>>>>>> Jay
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Oct 8, 2015 at 9:23 PM, haosdent <ha...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> For flags sent to the executor from the containerizer, each flag is
>>>>>>>>> stringified and becomes a command-line parameter when the executor is launched.
>>>>>>>>>
>>>>>>>>> You could see this in
>>>>>>>>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
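
As a rough illustration of the stringify step mentioned above, the sketch below turns a map of flag names and values into arguments of the form --name="value", the form seen in the task stdout elsewhere in this thread. `stringifyFlags` is a hypothetical name and this is not the Mesos implementation; whether the quotes are actually passed on the command line or only added when the flags are logged is an assumption here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: renders each flag as one argument, --name="value".
// std::map iterates in key order, so the output order is alphabetical.
std::vector<std::string> stringifyFlags(
    const std::map<std::string, std::string>& flags) {
  std::vector<std::string> args;
  for (const auto& flag : flags) {
    assert(!flag.first.empty());
    args.push_back("--" + flag.first + "=\"" + flag.second + "\"");
  }
  return args;
}
```

Only the flags placed in the map are handed to the child this way. A flag that is left out, as launcher_dir appears to be in the executor output in this thread, has to reach the executor through some other channel such as the environment.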
>>>>>>>>>
>>>>>>>>> But for launcher_dir, the executor gets it from `argv[0]`, as you
>>>>>>>>> mentioned above.
>>>>>>>>> ```
>>>>>>>>>   string path =
>>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> So I want to figure out why your argv[0] would become the sandbox dir,
>>>>>>>>> not "/usr/libexec/mesos".
>>>>>>>>>
>>>>>>>>> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I see.  And then how are the flags sent to the executor?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Oct 8, 2015, at 8:56 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Yes. The related code is located in
>>>>>>>>>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>>>>>>>>>>
>>>>>>>>>> In fact, environment variables that start with MESOS_ are loaded as
>>>>>>>>>> flag values.
>>>>>>>>>>
>>>>>>>>>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> One question for you haosdent-
>>>>>>>>>>>
>>>>>>>>>>> You mentioned that the flags.launcher_dir should propagate to
>>>>>>>>>>> the docker executor all the way up the chain.  Can you show me where this
>>>>>>>>>>> logic is in the codebase?  I didn't see where that was happening and would
>>>>>>>>>>> like to understand the mechanism.
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> Jay
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Maybe tomorrow I will build a fresh cluster from scratch to see
>>>>>>>>>>> if the broken behavior experienced today still persists.
>>>>>>>>>>>
>>>>>>>>>>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting
>>>>>>>>>>> flags.launcher_dir, and mesos-docker-executor and
>>>>>>>>>>> mesos-health-check are then looked up under that dir. Although the
>>>>>>>>>>> env var is not propagated, MESOS_LAUNCHER_DIR still works because
>>>>>>>>>>> flags.launcher_dir is taken from it.
>>>>>>>>>>>
>>>>>>>>>>> For example, I ran
>>>>>>>>>>> ```
>>>>>>>>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>>>>>>>>> ```
>>>>>>>>>>> before starting mesos-slave. When I launched the slave, I found
>>>>>>>>>>> this line in the slave log:
>>>>>>>>>>> ```
>>>>>>>>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>>>>>>>>>>> xxxxx  --launcher_dir="/tmp"
>>>>>>>>>>> ```
>>>>>>>>>>>
>>>>>>>>>>> From your log, I am not sure why your MESOS_LAUNCHER_DIR became the
>>>>>>>>>>> sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in one
>>>>>>>>>>> of your other scripts?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir
>>>>>>>>>>>> before.
>>>>>>>>>>>>
>>>>>>>>>>>> I just tried setting both the env var and the flag on the slaves,
>>>>>>>>>>>> and have determined that the env var is not present when it is
>>>>>>>>>>>> checked in src/docker/executor.cpp @ line 573:
>>>>>>>>>>>>
>>>>>>>>>>>>>   const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>>>>>>>   string path =
>>>>>>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->"
>>>>>>>>>>>>>        << (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is
>>>>>>>>>>>> correctly propagated along up to the point of mesos-slave launch):
>>>>>>>>>>>>
>>>>>>>>>>>> $ cat /etc/default/mesos-slave
>>>>>>>>>>>>> export
>>>>>>>>>>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>>>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>>>>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>>>>>>>>> export MESOS_PORT="5050"
>>>>>>>>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> TASK OUTPUT:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> MESOS_LAUNCHER_DIR: envpath.isSome()->no
>>>>>>>>>>>>> MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
>>>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>> hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>>>>>>>>>>> --executor=(1)@192.168.225.59:44523
>>>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>>>>>>>>>>> sh -c \" \/bin\/bash
>>>>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>>>>> Health check process launched at pid: 2519
>>>>>>>>>>>>
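
An editorial sketch of the lookup quoted above, written in Python rather than the C++ of src/docker/executor.cpp. It assumes the fallback behaves like dirname(3) followed by realpath, so an argv[0] with no directory part resolves to the current working directory. The function name `resolve_launcher_dir`, its parameters, and the shortened sandbox path are hypothetical, not Mesos APIs. Under that assumption, an unset MESOS_LAUNCHER_DIR combined with the bare "mesos-docker-executor" argv[0] shown in the containerizer excerpt of this message would yield the sandbox directory, if the executor's working directory is the sandbox. That would match the task output above.

```python
import os.path


def resolve_launcher_dir(argv0, env, cwd):
    """Mimic the quoted lookup: env var first, else the directory of argv[0]."""
    env_path = env.get("MESOS_LAUNCHER_DIR")
    if env_path is not None:
        return env_path
    # dirname("mesos-docker-executor") is empty, i.e. the current directory.
    dirname = os.path.dirname(argv0) or "."
    # Stand-in for realpath: resolve against an explicit cwd so the sketch
    # stays deterministic and never touches the filesystem.
    return os.path.normpath(os.path.join(cwd, dirname))


sandbox = "/tmp/mesos/slaves/S1/runs/example"  # shortened, made-up sandbox path
unset_result = resolve_launcher_dir("mesos-docker-executor", {}, sandbox)
set_result = resolve_launcher_dir(
    "mesos-docker-executor",
    {"MESOS_LAUNCHER_DIR": "/usr/libexec/mesos"},
    sandbox,
)
print(unset_result)
print(set_result)
```

With the variable unset the sketch returns the sandbox path, and with it set it returns /usr/libexec/mesos, which is the directory where mesos-health-check would then be looked up.
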
>>>>>>>>>>>>
>>>>>>>>>>>> The env var is not propagated when the docker executor is
>>>>>>>>>>>> launched in src/slave/containerizer/docker.cpp around line 903:
>>>>>>>>>>>>
>>>>>>>>>>>>>   vector<string> argv;
>>>>>>>>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>>>>>>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>>>>>>>>>>>   // container (to distinguish it from Docker containers not created
>>>>>>>>>>>>>   // by Mesos).
>>>>>>>>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>>>>>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>>>>>>>>>>       argv,
>>>>>>>>>>>>>       Subprocess::PIPE(),
>>>>>>>>>>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>>>>>>>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>>>>>>>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>>>>>>>>>>       environment,
>>>>>>>>>>>>>       lambda::bind(&setup, container->directory));
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> A little way above, we can see that the environment is set up with
>>>>>>>>>>>> the env vars defined by the container's task.
>>>>>>>>>>>>
>>>>>>>>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>>>>>>>>>
>>>>>>>>>>>>>   // Include any environment variables from ExecutorInfo.
>>>>>>>>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>>>>>>>>            container->executor.command().environment().variables()) {
>>>>>>>>>>>>>     environment[variable.name()] = variable.value();
>>>>>>>>>>>>>   }
>>>>>>>>>>>>
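
A minimal Python sketch of why the variable could go missing, assuming (as the excerpt above suggests) that the executor's environment is built from the task's own variables and nothing is inherited from the slave process. `build_executor_env` and its `passthrough` parameter are hypothetical names, included only to illustrate one possible shape of a fix; they are not part of Mesos.

```python
def build_executor_env(task_variables, slave_env, passthrough=()):
    """Build the child environment from task variables, optionally copying
    a few named variables through from the slave's own environment."""
    env = {var["name"]: var["value"] for var in task_variables}
    for name in passthrough:
        if name in slave_env:
            # Task-defined values win over inherited ones.
            env.setdefault(name, slave_env[name])
    return env


slave_env = {"MESOS_LAUNCHER_DIR": "/usr/libexec/mesos", "MESOS_PORT": "5050"}
task_variables = [{"name": "PORT", "value": "31641"}]

without_passthrough = build_executor_env(task_variables, slave_env)
with_passthrough = build_executor_env(
    task_variables, slave_env, passthrough=("MESOS_LAUNCHER_DIR",)
)
print("MESOS_LAUNCHER_DIR" in without_passthrough)
print(with_passthrough["MESOS_LAUNCHER_DIR"])
```

In the first call the executor never sees MESOS_LAUNCHER_DIR, which is the behaviour described in this message. The second call shows the variable surviving once it is explicitly passed through.
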
>>>>>>>>>>>>
>>>>>>>>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>>>>>>>>> 0.24.1 should work.
>>>>>>>>>>>>>
>>>>>>>>>>>>> >Do any of you know which host the path
>>>>>>>>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>>>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>>>>>>>>>>>> failing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly
>>>>>>>>>>>>> before? We look for mesos-health-check in MESOS_LAUNCHER_DIR/--launcher_dir,
>>>>>>>>>>>>> or else in the same directory as mesos-docker-executor.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <
>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe I spoke too soon.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now the checks are attempting to run, however the STDERR is
>>>>>>>>>>>>>> not looking good.  I've added some debugging to the error message output to
>>>>>>>>>>>>>> show the path, argv, and envp variables:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> STDOUT:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>>>>>>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>>>>>>>>>>> sh -c \" exit 1
>>>>>>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>>>>> Health check process launched at pid: 3012
>>>>>>>>>>>>>>
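
The health_check_json in the output above shows the user's command ("exit 1") wrapped in a docker exec call against the task's container. A rough Python sketch of that wrapping follows. The exact quoting and whitespace are an assumption read off the log, not taken from the Mesos source, and `wrap_health_command` and the container name are hypothetical.

```python
import json


def wrap_health_command(container_name, command):
    """Wrap a health-check command so it runs inside the named container."""
    return 'docker exec {} sh -c " {} "'.format(container_name, command)


wrapped = wrap_health_command("mesos-S1.example-run", "exit 1")
health_check = {
    "command": {"shell": True, "value": wrapped},
    "consecutive_failures": 3,
}
print(json.dumps(health_check, sort_keys=True))
```

The point of the wrapping is that the check runs inside the container's filesystem and network namespace rather than on the slave host.
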
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> STDERR:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered
>>>>>>>>>>>>>>> on slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit
>>>>>>>>>>>>>>> capabilities, memory limited without swap.
>>>>>>>>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in
>>>>>>>>>>>>>>> childMain
>>>>>>>>>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>>>>>> envp=''): No such file or directory
>>>>>>>>>>>>>>> *** Aborted at 1444270649 (unix time) try "date -d @1444270649" if you are using GNU date ***
>>>>>>>>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID
>>>>>>>>>>>>>>> 0x7f4a2f9f6700) from PID 3012; stack trace: ***
>>>>>>>>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>>>>>>>>> @ 0x41921c _Abort()
>>>>>>>>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>>>>>>>>> @ 0x43cc9c
>>>>>>>>>>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>>>>>>>>> @ 0x7f4a39d92827
>>>>>>>>>>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>>>>>> should exist on? It definitely doesn't exist on the slave,
>>>>>>>>>>>>>> hence execution failing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is with current master, git hash
>>>>>>>>>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>>>>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>>>>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <
>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Update:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to
>>>>>>>>>>>>>>> compile and package the latest master (0.26.x) and deployed it to the
>>>>>>>>>>>>>>> cluster, and now health checks are working as advertised in both Marathon
>>>>>>>>>>>>>>> and my own framework!  Not sure what was going on with health-checks in
>>>>>>>>>>>>>>> 0.24.0..
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <
>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Haosdent,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you share your Marathon POST request that results in
>>>>>>>>>>>>>>>> Mesos executing the health checks?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Since we can reference the Marathon framework, I've been
>>>>>>>>>>>>>>>> doing some digging around.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's
>>>>>>>>>>>>>>>> dependencies
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as
>>>>>>>>>>>>>>>> JSON to /tmp/X both in the TaskFactory and right before the task is
>>>>>>>>>>>>>>>> sent to Mesos via driver.launchTasks:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>>>>>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>>>>>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>>>>>          CreatedTask(
>>>>>>>>>>>>>>>>>            taskInfo,
>>>>>>>>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class
>>>>>>>>>>>>>>>>> TaskLauncherImpl(
>>>>>>>>>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>>>>>>>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>>>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") {
>>>>>>>>>>>>>>>>> driver =>
>>>>>>>>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>>>>>>>>> +      var i = 0
>>>>>>>>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" +
>>>>>>>>>>>>>>>>> i.toString() + "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>>>>>>>>>>>>> taskInfos.asJava)
>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>
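
The two diffs above apply the same debugging technique: serialize each TaskInfo to JSON and write it to a file named after the task id. A small Python sketch of that idea, using plain dicts in place of protobuf messages; `dump_task_info`, the example task, and the temporary directory (standing in for /tmp) are all illustrative assumptions.

```python
import json
import os
import tempfile


def dump_task_info(task_info, directory, prefix):
    """Write one TaskInfo-like dict as a JSON line, named after its task id."""
    task_id = task_info["task_id"]["value"]
    path = os.path.join(directory, prefix + task_id)
    with open(path, "w") as handle:
        json.dump(task_info, handle, sort_keys=True)
        handle.write("\n")
    return path


# Made-up task; the two prefixes mirror the two dump points in the diffs.
example = {"name": "web-v11", "task_id": {"value": "example-task.0001"}}
out_dir = tempfile.mkdtemp()
first = dump_task_info(example, out_dir, "taskjson1-")
second = dump_task_info(example, out_dir, "taskJson2-0-")
print(os.path.basename(first), os.path.basename(second))
```

Dumping at both points makes it possible to tell whether the task is altered between construction and launch.
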
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Then I built and deployed the hacked Marathon and restarted
>>>>>>>>>>>>>>>> the marathon service.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Next I created the app via the Marathon API ("hello app" is
>>>>>>>>>>>>>>>> a container with a simple hello-world ruby app running on
>>>>>>>>>>>>>>>> 0.0.0.0:8000)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST
>>>>>>>>>>>>>>>>> -H'Content-Type: application/json' -d'
>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>>>>>>>>   "apps": [
>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>>>>>>>>       "container": {
>>>>>>>>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>>>>>>>>         "docker": {
>>>>>>>>>>>>>>>>>           "image":
>>>>>>>>>>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>>>>>>>>           "portMappings": [
>>>>>>>>>>>>>>>>>             {
>>>>>>>>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>>>>           ]
>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>       "env": {
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>       "instances": 1,
>>>>>>>>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>>>>>>>>       "mem": 512
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>   ]
>>>>>>>>>>>>>>>>> }'
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> $ ls /tmp/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Do they match?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, so I am confident this is the information being sent
>>>>>>>>>>>>>>>> across the wire to Mesos.
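
The md5sum comparison above can be reproduced in a few lines of Python with the standard hashlib module. This is only a sketch: `md5_of`, `same_content`, and `write_temp` are hypothetical helper names, and the temporary files stand in for the two /tmp/task* dumps.

```python
import hashlib
import os
import tempfile


def md5_of(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files have identical MD5 digests."""
    return md5_of(path_a) == md5_of(path_b)


def write_temp(data):
    """Write bytes to a fresh temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


dump_a = write_temp(b'{"name":"web-v11"}\n')
dump_b = write_temp(b'{"name":"web-v11"}\n')
dump_c = write_temp(b'{"name":"web-v12"}\n')
print(same_content(dump_a, dump_b), same_content(dump_a, dump_c))
```
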
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Do they contain any health-check information?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> $ cat
>>>>>>>>>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>         "value":1.0
>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>         "value":512.0
>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>             "begin":31641,
>>>>>>>>>>>>>>>>>             "end":31641
>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>     "environment":{
>>>>>>>>>>>>>>>>>       "variables":[
>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>       ]
>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No, I don't see anything about any health check.
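
Rather than reading the dump by eye, the same check can be scripted. A minimal Python sketch, assuming the dump is a single JSON object and that a health check would appear under the top-level "health_check" key, as it does in the TaskInfo at the start of this thread; `has_health_check` and the sample strings are hypothetical.

```python
import json


def has_health_check(task_info_json):
    """True when the serialized TaskInfo carries a top-level health_check."""
    task_info = json.loads(task_info_json)
    return "health_check" in task_info


marathon_dump = '{"name": "web-v11.app-81-1-hello-app", "container": {"type": "DOCKER"}}'
own_framework_dump = '{"name": "hello-app.web.v3", "health_check": {"interval_seconds": 10}}'
print(has_health_check(marathon_dump), has_health_check(own_framework_dump))
```
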
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> And STDERR:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor
>>>>>>>>>>>>>>>>> registered on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit
>>>>>>>>>>>>>>>>> capabilities, memory limited without swap.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Any ideas of other things to try or what I could be
>>>>>>>>>>>>>>>> missing?  Can't say either way about the Mesos health-check system working
>>>>>>>>>>>>>>>> or not if Marathon won't put the health-check into the task it sends to
>>>>>>>>>>>>>>>> Mesos.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for all your help!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <
>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Maybe you could post your executor stdout/stderr so that
>>>>>>>>>>>>>>>>> we can tell whether the health check is running or not.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <
>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Marathon also uses the Mesos health check. When I use a health
>>>>>>>>>>>>>>>>>> check, I see log lines like this in the executor stdout.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I am using my own framework, and the full task info I'm
>>>>>>>>>>>>>>>>>>> using is posted earlier in this thread.  Do you happen to know if Marathon
>>>>>>>>>>>>>>>>>>> uses Mesos's health checks for its health check system?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, the health check is launched from its definition in
>>>>>>>>>>>>>>>>>>> the TaskInfo. Do you launch your task through Marathon? I could test it on
>>>>>>>>>>>>>>>>>>> my side.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Precisely, and there are none of those statements.  Are
>>>>>>>>>>>>>>>>>>>> you or others confident health-checks are part of the code path when
>>>>>>>>>>>>>>>>>>>> defined via task info for docker container tasks?  Going through the code,
>>>>>>>>>>>>>>>>>>>> I wasn't able to find the linkage for anything other than health-checks
>>>>>>>>>>>>>>>>>>>> triggered through a custom executor.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> That said, it is a pretty good-sized code
>>>>>>>>>>>>>>>>>>>> base and I'm not very familiar with it, so my analysis thus far has by no
>>>>>>>>>>>>>>>>>>>> means been exhaustive.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <
>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> When the health check launches, there will be a log line like this
>>>>>>>>>>>>>>>>>>>> in your executor stdout:
>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be
>>>>>>>>>>>>>>>>>>>>> output in the logs with the string "health" or "Health" if the health-check
>>>>>>>>>>>>>>>>>>>>> were active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <
>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see
>>>>>>>>>>>>>>>>>>>>> whether an unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <
>>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <
>>>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this
>>>>>>>>>>>>>>>>>>>>>>>> backport; let me double check.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.
>>>>>>>>>>>>>>>>>>>>>>>>> I'll look there :)
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout
>>>>>>>>>>>>>>>>>>>>>>>>>> to test it out?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker
>>>>>>>>>>>>>>>>>>>>>>>>>>> tasks that's in master but not yet released. It will run docker exec with
>>>>>>>>>>>>>>>>>>>>>>>>>>> the command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>

Re: Can health-checks be run by Mesos for docker tasks?

Posted by haosdent <ha...@gmail.com>.
I think I could provide you with a Docker image later to run the Mesos master
and agent, so that we can debug this problem and find the cause more easily.
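
For reference while debugging, here is a minimal standalone C++ sketch of the
lookup quoted further down from src/docker/executor.cpp: use
MESOS_LAUNCHER_DIR when it is set, otherwise fall back to the directory of
argv[0]. The names `dirnameOf` and `resolveLauncherDir` are made up for
illustration, and unlike the real code this does a purely lexical dirname
instead of calling os::realpath, so treat it as an approximation rather than
what Mesos actually runs.

```cpp
#include <cstdlib>
#include <string>

// Directory part of a path, e.g. "/a/b/c" -> "/a/b". Purely lexical:
// symlinks are not resolved (the quoted Mesos code uses os::realpath).
std::string dirnameOf(const std::string& path) {
  const std::size_t slash = path.find_last_of('/');
  if (slash == std::string::npos) return ".";  // No directory component.
  if (slash == 0) return "/";                  // Entry directly under root.
  return path.substr(0, slash);
}

// Hypothetical helper: prefer the env value when it is present (non-null),
// otherwise use the directory that argv[0] points into.
std::string resolveLauncherDir(const char* envPath, const std::string& argv0) {
  return envPath != nullptr ? std::string(envPath) : dirnameOf(argv0);
}
```

In a real process the first argument would be
`std::getenv("MESOS_LAUNCHER_DIR")`. When that returns null and argv[0] lives
in the sandbox, the result is the sandbox directory, which would be consistent
with the "Launching health check process" path seen in the task output.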
On Oct 13, 2015 6:46 AM, "Jay Taylor" <ou...@gmail.com> wrote:

> Ah ha, I see now that the permissions are fine - just needed to click
> "Create" instead of the arrow.  Oh JIRA.. :)
>
> On Mon, Oct 12, 2015 at 3:26 PM, Jay Taylor <ja...@jaytaylor.com> wrote:
>
>> Hi Marco,
>>
>> What a relief!
>>
>> I'd love to file the JIRA ticket for this, but I don't think my account
>> has permissions over on https://issues.apache.org/jira/browse/MESOS.  I
>> am "jaytaylor" over there.  Please let me know if you can help with that
>> and we can get the ball rolling on this.
>>
>>
>> On Mon, Oct 12, 2015 at 3:14 PM, Marco Massenzio <ma...@mesosphere.io>
>> wrote:
>>
>>> Jay:
>>>
>>> you hit the nail on the head: the direction is definitely one-way (from
>>> MESOS_ENV var to Flag) and we don't reflect --flag back into the MESOS_FLAG
>>> env var.
>>> Others more familiar with the matter may correct me, but it looks like
>>> you have uncovered a bug in the executor code: could you please file a Jira
>>> for us to look into?
>>>
>>> It seems to me that, at present, the only workaround would be for you to
>>> set the MESOS_LAUNCHER_DIR env var, as the flag won't be picked up by the
>>> executor.
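
The one-way direction described above (MESOS_-prefixed env var to flag, never
flag back to env var) might be sketched like this. It is only an illustration
under the assumption that loading works by stripping the prefix and
lowercasing the rest of the name; `flagsFromEnvironment` is a hypothetical
helper, not the stout API.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Collects variables whose names start with `prefix` and turns each into a
// flag name by stripping the prefix and lowercasing the remainder.
// Nothing here writes back to the environment: the mapping is one-way.
std::map<std::string, std::string> flagsFromEnvironment(
    const std::map<std::string, std::string>& environment,
    const std::string& prefix) {
  std::map<std::string, std::string> flags;
  for (const auto& entry : environment) {
    const std::string& name = entry.first;
    if (name.size() <= prefix.size() ||
        name.compare(0, prefix.size(), prefix) != 0) {
      continue;  // Not a prefixed variable, so it never becomes a flag.
    }
    std::string flag = name.substr(prefix.size());
    std::transform(flag.begin(), flag.end(), flag.begin(), [](unsigned char c) {
      return static_cast<char>(std::tolower(c));
    });
    flags[flag] = entry.second;
  }
  return flags;
}
```

Under these assumptions, MESOS_LAUNCHER_DIR in the agent's environment ends up
as the `launcher_dir` flag, while a `--launcher_dir` given on the command line
leaves the environment untouched.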
>>>
>>>
>>> --
>>> *Marco Massenzio*
>>> Distributed Systems Engineer
>>> http://codetrips.com
>>>
>>> On Mon, Oct 12, 2015 at 11:44 PM, Jay Taylor <ja...@jaytaylor.com> wrote:
>>>
>>>> Hi Marco,
>>>>
>>>> My reply is inline below-
>>>>
>>>> On Mon, Oct 12, 2015 at 2:27 PM, Marco Massenzio <ma...@mesosphere.io>
>>>> wrote:
>>>>
>>>>>
>>>>> On Mon, Oct 12, 2015 at 11:26 PM, Marco Massenzio <marco@mesosphere.io
>>>>> > wrote:
>>>>>
>>>>>> Are those the stdout logs of the Agent? I ask because I don't see
>>>>>> --launcher_dir set; however, if I look into one that is running off the
>>>>>> same 0.24.1 package, this is what I see:
>>>>>>
>>>>>> I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup:
>>>>>> --appc_store_dir="/tmp/mesos/store/appc"
>>>>>> --attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5"
>>>>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>>>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>>>>> --container_disk_watch_interval="15secs" --containerizers="docker,mesos"
>>>>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>>>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>>>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>>>>> --enforce_container_disk_quota="false"
>>>>>> --executor_registration_timeout="1mins"
>>>>>> --executor_shutdown_grace_period="5secs"
>>>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>>>>> --hadoop_home="" --help="false" --initialize_driver_logging="true"
>>>>>> --ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem"
>>>>>> --launcher_dir="/usr/libexec/mesos"
>>>>>> --log_dir="/var/local/mesos/logs/agent" --logbufsecs="0"
>>>>>> --logging_level="INFO" --master="zk://192.168.33.1:2181/mesos/vagrant"
>>>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>>>>> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
>>>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>>>>> --registration_backoff_factor="1secs"
>>>>>> --resource_monitoring_interval="1secs"
>>>>>> --resources="ports:[9000-10000];ephemeral_ports:[32768-57344]"
>>>>>> --revocable_cpu_low_priority="true"
>>>>>> --sandbox_directory="/var/local/sandbox" --strict="true"
>>>>>> --switch_user="true" --version="false" --work_dir="/var/local/mesos/agent"
>>>>>>
>>>>> (this is run off the Vagrantfile at [0] in case you want to reproduce).
>>>>>> That agent is not run via the init command, though, I execute it
>>>>>> manually via the `run-agent.sh` in the same directory.
>>>>>>
>>>>>> I don't really think this matters, but I assume you also restarted
>>>>>> the agent after making the config changes?
>>>>>> (and, for your own sanity - you can double check the version by
>>>>>> looking at the very head of the logs).
>>>>>>
>>>>>
>>>> Yes I definitely restarted all mesos processes after config changes :)
>>>>
>>>> Here's the equivalent info to what you posted, from the INFO log of one
>>>> of the slaves:
>>>>
>>>> Log file created at: 2015/10/12 20:22:58
>>>>> Running on machine: mesos-worker2a
>>>>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>>>>> I1012 20:22:58.469779  9605 logging.cpp:172] INFO level logging started!
>>>>> I1012 20:22:58.470006  9605 main.cpp:185] Build: 2015-09-25 19:13:24 by root
>>>>> I1012 20:22:58.470023  9605 main.cpp:187] Version: 0.24.1
>>>>> I1012 20:22:58.470031  9605 main.cpp:190] Git tag: 0.24.1
>>>>> I1012 20:22:58.470039  9605 main.cpp:194] Git SHA: 44873806c2bb55da37e9adbece938274d8cd7c48
>>>>> I1012 20:22:58.470221  9605 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix
>>>>> I1012 20:22:58.573750  9605 main.cpp:272] Starting Mesos slave
>>>>> I1012 20:22:58.574662  9621 slave.cpp:190] Slave started on 1)@192.168.225.59:5050
>>>>> I1012 20:22:58.574695  9621 slave.cpp:191] Flags at startup:
>>>>> --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5"
>>>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>>>> --container_disk_watch_interval="15secs" --containerizers="mesos,docker"
>>>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>>>> --enforce_container_disk_quota="false"
>>>>> --executor_registration_timeout="5mins"
>>>>> --executor_shutdown_grace_period="5secs"
>>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>>>> --hadoop_home="" --help="false"
>>>>> --hostname="mesos-worker2a-hobart.gigawatt.io"
>>>>> --initialize_driver_logging="true"
>>>>> --ip="192.168.225.59" --isolation="posix/cpu,posix/mem"
>>>>> *--launcher_dir="/usr/libexec/mesos"* --log_dir="/var/log/mesos"
>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>> --master="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>>>> --perf_interval="1mins" --port="5050" --qos_correction_interval_min="0ns"
>>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>>>> --registration_backoff_factor="1secs"
>>>>> --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true"
>>>>> --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
>>>>> --switch_user="true" --version="false" --work_dir="/tmp/mesos"
>>>>
>>>>
>>>> The launcher dir is picked up by the mesos-slave process.  We can also
>>>> see the cmdline flag is picked up from /etc/mesos-slave like this:
>>>>
>>>> mesos-worker2a:~$ ps -ef | grep mesos
>>>>> root      9605     1  1 20:22 ?        00:01:18 /usr/sbin/mesos-slave --ip=192.168.225.59 --log_dir=/var/log/mesos *--launcher_dir=/usr/libexec/mesos*
>>>>> root      9612  9605  0 20:22 ?        00:00:00 logger -p user.info -t mesos-slave[9605]
>>>>> root      9613  9605  0 20:22 ?        00:00:00 logger -p user.err -t mesos-slave[9605]
>>>>> vagrant   9951  6010  0 21:36 pts/0    00:00:00 grep --color=auto mesos
>>>>
>>>>
>>>>
>>>> What I keep coming back to is the fact that the MESOS_LAUNCHER_DIR env
>>>> var does not seem to get picked up here:
>>>> https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L573-L576
>>>>
>>>>   const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>   string path =
>>>>>     envPath.isSome() ? envPath.get()
>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>
>>>>
>>>> And argv[0] (which contains the slave work dir) is the path we see in
>>>> the task's stdout.
>>>>
>>>> I'm still having trouble understanding how flags defined in
>>>> mesos::internal::slave::Flags::Flags (src/slave/flags.[ch]pp) are
>>>> propagated or expanded to MESOS_<flag_in_caps> environment variables.  Can
>>>> you confirm if such a mechanism exists and if so where it is?
>>>>
>>>> Otherwise, if my understanding is correct and such a mechanism doesn't
>>>> exist:
>>>>
>>>> How can the requisite MESOS_LAUNCHER_DIR env var be available when
>>>> docker/executor.cpp (a child process of mesos-slave) attempts to read it?
>>>>
>>>> The lack of such a mechanism would explain the behavior I'm currently
>>>> observing.
>>>>
>>>> Thanks!
>>>> Jay
>>>>
>>>>
>>>>>>
>>>>>> [0] http://github.com/massenz/zk-mesos
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Marco Massenzio*
>>>>>> Distributed Systems Engineer
>>>>>> http://codetrips.com
>>>>>>
>>>>>> On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <ou...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Haosdent and Mesos friends,
>>>>>>>
>>>>>>> I've rebuilt the cluster from scratch and installed mesos 0.24.1
>>>>>>> from the mesosphere apt repo:
>>>>>>>
>>>>>>> $ dpkg -l | grep mesos
>>>>>>> ii  mesos                               0.24.1-0.2.35.ubuntu1404
>>>>>>>        amd64        Cluster resource manager with efficient resource
>>>>>>> isolation
>>>>>>>
>>>>>>> Then added the `launcher_dir' flag to /etc/mesos-slave/launcher_dir
>>>>>>> on the slaves:
>>>>>>>
>>>>>>> mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir
>>>>>>> /usr/libexec/mesos
>>>>>>>
>>>>>>> And yet the task health-checks are still being launched from the
>>>>>>> sandbox directory like before!
>>>>>>>
>>>>>>> I've also tested setting the MESOS_LAUNCHER_DIR env var and get the
>>>>>>> identical result (just as before on the cluster where many versions of
>>>>>>> mesos had been installed):
>>>>>>>
>>>>>>> STDOUT:
>>>>>>>
>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>> --stop_timeout="0ns"
>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>> --stop_timeout="0ns"
>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>> Launching health check process:
>>>>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check
>>>>>>>> --executor=(1)@192.168.225.58:48912
>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>>>>>> 127.0.0.1:8000
>>>>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>> Health check process launched at pid: 11253
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> STDERR:
>>>>>>>
>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>> --stop_timeout="0ns"
>>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>>> --stop_timeout="0ns"
>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>> *Launching health check process:
>>>>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check*
>>>>>>>> --executor=(1)@192.168.225.58:48912
>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>>>>>> 127.0.0.1:8000
>>>>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>>> Health check process launched at pid: 11253
>>>>>>>
>>>>>>>
>>>>>>> Any ideas on where to go from here?  Is there any additional
>>>>>>> information I can provide?
>>>>>>>
>>>>>>> Thanks as always,
>>>>>>> Jay
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Oct 8, 2015 at 9:23 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>
>>>>>>>> For flags sent to the executor from the containerizer, each flag is
>>>>>>>> stringified and becomes a command-line parameter when the executor is
>>>>>>>> launched.
>>>>>>>>
>>>>>>>> You could see this in
>>>>>>>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
>>>>>>>>
>>>>>>>> But for launcher_dir, the executor gets it from `argv[0]`, as you
>>>>>>>> mentioned above.
>>>>>>>> ```
>>>>>>>>   string path =
>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>
>>>>>>>> ```
>>>>>>>> So I want to figure out why your argv[0] became the sandbox dir,
>>>>>>>> not "/usr/libexec/mesos".
>>>>>>>>
>>>>>>>> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I see.  And then how are the flags sent to the executor?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Oct 8, 2015, at 8:56 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Yes. The related code is located in
>>>>>>>>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>>>>>>>>>
>>>>>>>>> In fact, environment variables that start with MESOS_ are loaded as
>>>>>>>>> flag values.
>>>>>>>>>
>>>>>>>>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
>>>>>>>>>
>>>>>>>>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> One question for you haosdent-
>>>>>>>>>>
>>>>>>>>>> You mentioned that the flags.launcher_dir should propagate to the
>>>>>>>>>> docker executor all the way up the chain.  Can you show me where this logic
>>>>>>>>>> is in the codebase?  I didn't see where that was happening and would like
>>>>>>>>>> to understand the mechanism.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> Jay
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Maybe tomorrow I will build a fresh cluster from scratch to see
>>>>>>>>>> if the broken behavior experienced today still persists.
>>>>>>>>>>
>>>>>>>>>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting
>>>>>>>>>> flags.launcher_dir, and mesos-docker-executor and mesos-health-check
>>>>>>>>>> are then looked up under that dir. Although the env var is not
>>>>>>>>>> propagated, MESOS_LAUNCHER_DIR still works because flags.launcher_dir
>>>>>>>>>> is derived from it.
>>>>>>>>>>
>>>>>>>>>> For example, I ran
>>>>>>>>>> ```
>>>>>>>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>>>>>>>> ```
>>>>>>>>>> before starting mesos-slave, so when I launch the slave I can find
>>>>>>>>>> this line in the slave log:
>>>>>>>>>> ```
>>>>>>>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>>>>>>>>>> xxxxx  --launcher_dir="/tmp"
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>> And from your log, I am not sure why your MESOS_LAUNCHER_DIR became
>>>>>>>>>> the sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in
>>>>>>>>>> your other scripts?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>>>>>>>>>>
>>>>>>>>>>> I just tried setting both the env var and flag on the slaves,
>>>>>>>>>>> and have determined that the env var is not present when it is
>>>>>>>>>>> checked in src/docker/executor.cpp at line 573:
>>>>>>>>>>>
>>>>>>>>>>>>   const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>>>>>>   string path =
>>>>>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->"
>>>>>>>>>>>>        << (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is
>>>>>>>>>>> correctly propagated along up to the point of mesos-slave launch):
>>>>>>>>>>>
>>>>>>>>>>> $ cat /etc/default/mesos-slave
>>>>>>>>>>>> export
>>>>>>>>>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>>>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>>>>>>>> export MESOS_PORT="5050"
>>>>>>>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> TASK OUTPUT:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> *MESOS_LAUNCHER_DIR: envpath.isSome()->no**MESOS_LAUNCHER_DIR:
>>>>>>>>>>>> path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'*
>>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>>> Starting task
>>>>>>>>>>>> hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>>>>>>>>>> --executor=(1)@192.168.225.59:44523
>>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>>>>>>>>>> sh -c \" \/bin\/bash
>>>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>>>> Health check process launched at pid: 2519
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The env var is not propagated when the docker executor is
>>>>>>>>>>> launched in src/slave/containerizer/docker.cpp around line 903:
>>>>>>>>>>>
>>>>>>>>>>>>   vector<string> argv;
>>>>>>>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>>>>>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>>>>>>>>>>   // container (to distinguish it from Docker containers not created
>>>>>>>>>>>>   // by Mesos).
>>>>>>>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>>>>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>>>>>>>>>       argv,
>>>>>>>>>>>>       Subprocess::PIPE(),
>>>>>>>>>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>>>>>>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>>>>>>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>>>>>>>>>       environment,
>>>>>>>>>>>>       lambda::bind(&setup, container->directory));
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> A little way above, we can see the environment is set up with the
>>>>>>>>>>> container task's defined env vars.
>>>>>>>>>>>
>>>>>>>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>>>>>>>>
>>>>>>>>>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>>>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>>>>>>>            container->executor.command().environment().variables()) {
>>>>>>>>>>>>     environment[variable.name()] = variable.value();
>>>>>>>>>>>>   }
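
As a rough sketch of why the child only sees what is put into that map: going
by the snippet above, the environment handed to the subprocess is built
explicitly, so MESOS_LAUNCHER_DIR is present only if something adds it. The
helper below is hypothetical (`executorEnvironment` here is an illustrative
name, not a reference to any Mesos function) and shows one conceivable way to
add it; it is not a claim about how Mesos does or should fix this.

```cpp
#include <map>
#include <string>

// Hypothetical helper: start from the task-defined variables and add
// MESOS_LAUNCHER_DIR so a child process could read it via getenv.
// Assumption: a task-defined MESOS_LAUNCHER_DIR, if any, is overwritten.
std::map<std::string, std::string> executorEnvironment(
    const std::map<std::string, std::string>& taskVariables,
    const std::string& launcherDir) {
  std::map<std::string, std::string> environment = taskVariables;
  environment["MESOS_LAUNCHER_DIR"] = launcherDir;
  return environment;
}
```

Without the extra assignment, the returned map would hold only the task's own
variables, which would match the "envpath.isSome()->no" line in the task
output.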
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>>>>>>>> 0.24.1 should work.
>>>>>>>>>>>>
>>>>>>>>>>>> >Do any of you know which host the path
>>>>>>>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>>>>>>>>>>> failing.
>>>>>>>>>>>>
>>>>>>>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly
>>>>>>>>>>>> before? We get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir,
>>>>>>>>>>>> or else use the same dir as mesos-docker-executor.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <
>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe I spoke too soon.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now the checks are attempting to run, however the STDERR is
>>>>>>>>>>>>> not looking good.  I've added some debugging to the error message output to
>>>>>>>>>>>>> show the path, argv, and envp variables:
>>>>>>>>>>>>>
>>>>>>>>>>>>> STDOUT:
>>>>>>>>>>>>>
>>>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>>>>>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>>>>>>>>>> sh -c \" exit 1
>>>>>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>>>> Health check process launched at pid: 3012
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> STDERR:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered
>>>>>>>>>>>>>> on slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit
>>>>>>>>>>>>>> capabilities, memory limited without swap.
>>>>>>>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in
>>>>>>>>>>>>>> childMain
>>>>>>>>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>>>>> envp=''): No such file or directory*** Aborted at 1444270649 (unix time)
>>>>>>>>>>>>>> try "date -d @1444270649" if you are using GNU date ***
>>>>>>>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID
>>>>>>>>>>>>>> 0x7f4a2f9f6700) from PID 3012; stack trace: ***
>>>>>>>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>>>>>>>> @ 0x41921c _Abort()
>>>>>>>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>>>>>>>> @ 0x43cc9c
>>>>>>>>>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>>>>>>>> @ 0x7f4a39d92827
>>>>>>>>>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>>>>> should exist on? It definitely doesn't exist on the slave,
>>>>>>>>>>>>> hence execution failing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is with current master, git hash
>>>>>>>>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>>>>>>>>
>>>>>>>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>>>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>>>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <
>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Update:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to
>>>>>>>>>>>>>> compile and package the latest master (0.26.x) and deployed it to the
>>>>>>>>>>>>>> cluster, and now health checks are working as advertised in both Marathon
>>>>>>>>>>>>>> and my own framework!  Not sure what was going on with health-checks in
>>>>>>>>>>>>>> 0.24.0..
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <
>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Haosdent,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you share your Marathon POST request that results in
>>>>>>>>>>>>>>> Mesos executing the health checks?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Since we can reference the Marathon framework, I've been
>>>>>>>>>>>>>>> doing some digging around.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's
>>>>>>>>>>>>>>> dependencies
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON
>>>>>>>>>>>>>>> to /tmp/X both in the TaskFactory and right before the task is sent
>>>>>>>>>>>>>>> to Mesos via driver.launchTasks:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>>>>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>>>>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>>>>          CreatedTask(
>>>>>>>>>>>>>>>>            taskInfo,
>>>>>>>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>>>>>>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") {
>>>>>>>>>>>>>>>> driver =>
>>>>>>>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>>>>>>>> +      var i = 0
>>>>>>>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" +
>>>>>>>>>>>>>>>> i.toString() + "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>>>>>>>>>>>> taskInfos.asJava)
>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then I built and deployed the hacked Marathon and restarted
>>>>>>>>>>>>>>> the marathon service.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Next I created the app via the Marathon API ("hello app" is
>>>>>>>>>>>>>>> a container with a simple hello-world ruby app running on
>>>>>>>>>>>>>>> 0.0.0.0:8000)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST
>>>>>>>>>>>>>>>> -H'Content-Type: application/json' -d'
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>>>>>>>   "apps": [
>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>>>>>>>       "container": {
>>>>>>>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>>>>>>>         "docker": {
>>>>>>>>>>>>>>>>           "image":
>>>>>>>>>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>>>>>>>           "portMappings": [
>>>>>>>>>>>>>>>>             {
>>>>>>>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>>>           ]
>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>       "env": {
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>       "instances": 1,
>>>>>>>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>>>>>>>       "mem": 512
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>   ]
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> $ ls /tmp/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do they match?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, so I am confident this is the information being sent
>>>>>>>>>>>>>>> across the wire to Mesos.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do they contain any health-check information?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> $ cat
>>>>>>>>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>         "value":1.0
>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>         "value":512.0
>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>             "begin":31641,
>>>>>>>>>>>>>>>>             "end":31641
>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>     "environment":{
>>>>>>>>>>>>>>>>       "variables":[
>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>       ]
>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> No, I don't see anything about any health check.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And STDERR:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor
>>>>>>>>>>>>>>>> registered on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit
>>>>>>>>>>>>>>>> capabilities, memory limited without swap.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any ideas of other things to try or what I could be
>>>>>>>>>>>>>>> missing?  Can't say either way about the Mesos health-check system working
>>>>>>>>>>>>>>> or not if Marathon won't put the health-check into the task it sends to
>>>>>>>>>>>>>>> Mesos.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for all your help!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <
>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Maybe you could post your executor stdout/stderr so that we
>>>>>>>>>>>>>>>> can tell whether the health check is running or not.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <
>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Marathon also uses the Mesos health check. When I use a health
>>>>>>>>>>>>>>>>> check, I can see log output like this in the executor stdout.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <
>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I am using my own framework, and the full task info I'm
>>>>>>>>>>>>>>>>>> using is posted earlier in this thread.  Do you happen to know if Marathon
>>>>>>>>>>>>>>>>>> uses Mesos's health checks for its health check system?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, the health check is launched through its definition in
>>>>>>>>>>>>>>>>>> TaskInfo. Do you launch your task through Marathon? I could test it on my
>>>>>>>>>>>>>>>>>> side.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Precisely, and there are none of those statements.  Are
>>>>>>>>>>>>>>>>>>> you or others confident health-checks are part of the code path when
>>>>>>>>>>>>>>>>>>> defined via task info for docker container tasks?  Going through the code,
>>>>>>>>>>>>>>>>>>> I wasn't able to find the linkage for anything other than health-checks
>>>>>>>>>>>>>>>>>>> triggered through a custom executor.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> With that being said it is a pretty good sized code base
>>>>>>>>>>>>>>>>>>> and I'm not very familiar with it, so my analysis this far has by no means
>>>>>>>>>>>>>>>>>>> been exhaustive.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> When the health check launches, there will be a log line like
>>>>>>>>>>>>>>>>>>> this in your executor stdout:
>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output
>>>>>>>>>>>>>>>>>>>> in the logs with the string "health" or "Health" if the health-check were
>>>>>>>>>>>>>>>>>>>> active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <
>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see
>>>>>>>>>>>>>>>>>>>> whether an unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <
>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <
>>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this backport,
>>>>>>>>>>>>>>>>>>>>>>> let me double check.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.
>>>>>>>>>>>>>>>>>>>>>>>> I'll look there :)
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to
>>>>>>>>>>>>>>>>>>>>>>>>> test it out?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker
>>>>>>>>>>>>>>>>>>>>>>>>>> tasks that's in master but not yet released. It will run docker exec with
>>>>>>>>>>>>>>>>>>>>>>>>>> the command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image
>>>>>>>>>>>>>>>>>>>>>>>>>> tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to
>>>>>>>>>>>>>>>>>>>>>>>>>> see if they ever run the command (in this case `sleep 5`), but have not
>>>>>>>>>>>>>>>>>>>>>>>>>> found any indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are
>>>>>>>>>>>>>>>>>>>>>>>>>> invoked from src/launcher/executor.cpp CommandExecutorProcess::launchTask.
>>>>>>>>>>>>>>>>>>>>>>>>>> Does this mean that health-checks are only supported for custom executors
>>>>>>>>>>>>>>>>>>>>>>>>>> and not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the
>>>>>>>>>>>>>>>>>>>>>>>>>> 0/non-zero exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>>>>> Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
Ah ha, I see now that the permissions are fine - just needed to click
"Create" instead of the arrow.  Oh JIRA.. :)

On Mon, Oct 12, 2015 at 3:26 PM, Jay Taylor <ja...@jaytaylor.com> wrote:

> Hi Marco,
>
> What a relief!
>
> I'd love to file the JIRA ticket for this, but I don't think my account
> has permissions over on https://issues.apache.org/jira/browse/MESOS.  I
> am "jaytaylor" over there.  Please let me know if you can help with that
> and we can get the ball rolling on this.
>
>
> On Mon, Oct 12, 2015 at 3:14 PM, Marco Massenzio <ma...@mesosphere.io>
> wrote:
>
>> Jay:
>>
>> you hit the nail on the head: the direction is definitely one-way (from
>> MESOS_ENV var to Flag) and we don't reflect --flag back into the MESOS_FLAG
>> env var.
>> Others more familiar with the matter may correct me, but it looks like
>> you have uncovered a bug in the executor code: could you please file a Jira
>> for us to look into?
>>
>> It seems to me that, at present, the only workaround would be for you
>> to set the MESOS_LAUNCHER_DIR env var, as the flag won't be picked up by
>> the executor.
>>
>>
>> --
>> *Marco Massenzio*
>> Distributed Systems Engineer
>> http://codetrips.com
>>
>> On Mon, Oct 12, 2015 at 11:44 PM, Jay Taylor <ja...@jaytaylor.com> wrote:
>>
>>> Hi Marco,
>>>
>>> My reply is inline below-
>>>
>>> On Mon, Oct 12, 2015 at 2:27 PM, Marco Massenzio <ma...@mesosphere.io>
>>> wrote:
>>>
>>>>
>>>> On Mon, Oct 12, 2015 at 11:26 PM, Marco Massenzio <ma...@mesosphere.io>
>>>> wrote:
>>>>
>>>>> Are those the stdout logs of the Agent? Because I don't see the
>>>>> --launcher-dir set, however, if I look into one that is running off the
>>>>> same 0.24.1 package, this is what I see:
>>>>>
>>>>> I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup:
>>>>> --appc_store_dir="/tmp/mesos/store/appc"
>>>>> --attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5"
>>>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>>>> --container_disk_watch_interval="15secs" --containerizers="docker,mesos"
>>>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>>>> --enforce_container_disk_quota="false"
>>>>> --executor_registration_timeout="1mins"
>>>>> --executor_shutdown_grace_period="5secs"
>>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>>>> --hadoop_home="" --help="false" --initialize_driver_logging="true"
>>>>> --ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem"
>>>>> --launcher_dir="/usr/libexec/mesos"
>>>>> --log_dir="/var/local/mesos/logs/agent" --logbufsecs="0"
>>>>> --logging_level="INFO" --master="zk://192.168.33.1:2181/mesos/vagrant"
>>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>>>> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
>>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>>>> --registration_backoff_factor="1secs"
>>>>> --resource_monitoring_interval="1secs"
>>>>> --resources="ports:[9000-10000];ephemeral_ports:[32768-57344]"
>>>>> --revocable_cpu_low_priority="true"
>>>>> --sandbox_directory="/var/local/sandbox" --strict="true"
>>>>> --switch_user="true" --version="false" --work_dir="/var/local/mesos/agent"
>>>>>
>>>>> (this is run off the Vagrantfile at [0] in case you want to reproduce).
>>>>> That agent is not run via the init command, though, I execute it
>>>>> manually via the `run-agent.sh` in the same directory.
>>>>>
>>>>> I don't really think this matters, but I assume you also restarted the
>>>>> agent after making the config changes?
>>>>> (and, for your own sanity - you can double check the version by
>>>>> looking at the very head of the logs).
>>>>>
>>>>
>>> Yes I definitely restarted all mesos processes after config changes :)
>>>
>>> Here's info equivalent to what you posted, from one of the slaves' INFO
>>> logs:
>>>
>>> Log file created at: 2015/10/12 20:22:58
>>>> Running on machine: mesos-worker2a
>>>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>>>> I1012 20:22:58.469779  9605 logging.cpp:172] INFO level logging started!
>>>> I1012 20:22:58.470006  9605 main.cpp:185] Build: 2015-09-25 19:13:24 by
>>>> root
>>>> I1012 20:22:58.470023  9605 main.cpp:187] Version: 0.24.1
>>>> I1012 20:22:58.470031  9605 main.cpp:190] Git tag: 0.24.1
>>>> I1012 20:22:58.470039  9605 main.cpp:194] Git SHA:
>>>> 44873806c2bb55da37e9adbece938274d8cd7c48
>>>> I1012 20:22:58.470221  9605 containerizer.cpp:143] Using isolation:
>>>> posix/cpu,posix/mem,filesystem/posix
>>>> I1012 20:22:58.573750  9605 main.cpp:272] Starting Mesos slave
>>>> I1012 20:22:58.574662  9621 slave.cpp:190] Slave started on 1)@
>>>> 192.168.225.59:5050
>>>> I1012 20:22:58.574695  9621 slave.cpp:191] Flags at startup:
>>>> --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5"
>>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>>> --container_disk_watch_interval="15secs" --containerizers="mesos,docker"
>>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>>> --enforce_container_disk_quota="false"
>>>> --executor_registration_timeout="5mins"
>>>> --executor_shutdown_grace_period="5secs"
>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>>> --hadoop_home="" --help="false" --hostname="
>>>> mesos-worker2a-hobart.gigawatt.io" --initialize_driver_logging="true"
>>>> --ip="192.168.225.59" --isolation="posix/cpu,posix/mem" --
>>>> *launcher_dir="/usr/libexec/mesos"* --log_dir="/var/log/mesos"
>>>> --logbufsecs="0" --logging_level="INFO"
>>>> --master="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>>> --perf_interval="1mins" --port="5050" --qos_correction_interval_min="0ns"
>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>>> --registration_backoff_factor="1secs"
>>>> --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true"
>>>> --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
>>>> --switch_user="true" --version="false" --work_dir="/tmp/mesos"
>>>
>>>
>>> The launcher dir is picked up by the mesos-slave process.  We can also
>>> see the cmdline flag is picked up from /etc/mesos-slave like this:
>>>
>>> mesos-worker2a:~$ ps -ef | grep mesos
>>>> root      9605     1  1 20:22 ?        00:01:18 /usr/sbin/mesos-slave
>>>> --ip=192.168.225.59 --log_dir=/var/log/mesos
>>>> --*launcher_dir=/usr/libexec/mesos*
>>>> root      9612  9605  0 20:22 ?        00:00:00 logger -p user.info -t mesos-slave[9605]
>>>> root      9613  9605  0 20:22 ?        00:00:00 logger -p user.err -t
>>>> mesos-slave[9605]
>>>> vagrant   9951  6010  0 21:36 pts/0    00:00:00 grep --color=auto mesos
>>>
>>>
>>>
>>> What I keep coming back to is the fact that the MESOS_LAUNCHER_DIR env
>>> var does not seem to get picked up here:
>>> https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L573-L576
>>> :
>>>
>>>   const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>   string path =
>>>>     envPath.isSome() ? envPath.get()
>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>
>>>
>>> And argv[0] (which contains the slave work dir) is the path we see in
>>> the task's stdout.
>>>
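To make the fallback in the quoted snippet concrete, here is a minimal standalone sketch of the same decision: use the environment value when it is present, otherwise take the directory part of argv[0]. It assumes C++17. `resolveLauncherDir` is a hypothetical name, not a Mesos function, and it uses `std::filesystem` in place of stout's `os::realpath` and `Path`, so unlike the original it does not resolve symlinks or relative paths.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper, not part of Mesos: mirrors the
// "env var if set, else dirname(argv[0])" choice from the quoted snippet.
// `envValue` is what std::getenv("MESOS_LAUNCHER_DIR") would return
// (nullptr when the variable is not set in this process).
std::string resolveLauncherDir(const char* envValue, const std::string& argv0)
{
  if (envValue != nullptr) {
    return std::string(envValue);
  }

  // Fall back to the directory containing the executable name.
  const std::string dir =
    std::filesystem::path(argv0).parent_path().string();

  // A bare program name has no directory part; treat that as ".".
  return dir.empty() ? std::string(".") : dir;
}
```

A caller would pass `std::getenv("MESOS_LAUNCHER_DIR")` and `argv[0]`. If argv[0] is a bare name such as "mesos-docker-executor", the directory part is ".", which a realpath call would resolve to the process's current working directory. That may be how the sandbox path ends up in the health check command, though nothing in this thread confirms it.
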
>>> I'm still having trouble understanding how flags defined in
>>> mesos::internal::slave::Flags::Flags (src/slave/flags.[ch]pp) are
>>> propagated or expanded to MESOS_<flag_in_caps> environment variables.  Can
>>> you confirm if such a mechanism exists and if so where it is?
>>>
>>> Otherwise, if my understanding is correct and such a mechanism doesn't
>>> exist:
>>>
>>> How can the requisite MESOS_LAUNCHER_DIR env var be available when
>>> docker/executor.cpp (a child process of mesos-slave) attempts to read it?
>>>
>>> The lack of such a mechanism would explain the behavior I'm currently
>>> observing.
>>>
>>> Thanks!
>>> Jay
>>>
>>>
>>>>>
>>>>> [0] http://github.com/massenz/zk-mesos
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Marco Massenzio*
>>>>> Distributed Systems Engineer
>>>>> http://codetrips.com
>>>>>
>>>>> On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <ou...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Haosdent and Mesos friends,
>>>>>>
>>>>>> I've rebuilt the cluster from scratch and installed mesos 0.24.1 from
>>>>>> the mesosphere apt repo:
>>>>>>
>>>>>> $ dpkg -l | grep mesos
>>>>>> ii  mesos                               0.24.1-0.2.35.ubuntu1404
>>>>>>        amd64        Cluster resource manager with efficient resource
>>>>>> isolation
>>>>>>
>>>>>> Then added the `launcher_dir' flag to /etc/mesos-slave/launcher_dir
>>>>>> on the slaves:
>>>>>>
>>>>>> mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir
>>>>>> /usr/libexec/mesos
>>>>>>
>>>>>> And yet the task health-checks are still being launched from the
>>>>>> sandbox directory like before!
>>>>>>
>>>>>> I've also tested setting the MESOS_LAUNCHER_DIR env var and get the
>>>>>> identical result (just as before on the cluster where many versions of
>>>>>> mesos had been installed):
>>>>>>
>>>>>> STDOUT:
>>>>>>
>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>> --stop_timeout="0ns"
>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>> --stop_timeout="0ns"
>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>> Launching health check process:
>>>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check
>>>>>>> --executor=(1)@192.168.225.58:48912
>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>>>>> 127.0.0.1:8000
>>>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>> Health check process launched at pid: 11253
>>>>>>
>>>>>>
>>>>>>
>>>>>> STDERR:
>>>>>>
>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>> --stop_timeout="0ns"
>>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>>> --stop_timeout="0ns"
>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>> *Launching health check process:
>>>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check*
>>>>>>> --executor=(1)@192.168.225.58:48912
>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>>>>> 127.0.0.1:8000
>>>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>>> Health check process launched at pid: 11253
>>>>>>
>>>>>>
>>>>>> Any ideas on where to go from here?  Is there any additional
>>>>>> information I can provide?
>>>>>>
>>>>>> Thanks as always,
>>>>>> Jay
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 8, 2015 at 9:23 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>
>>>>>>> For a flag sent to the executor from the containerizer, the flag is
>>>>>>> stringified and becomes a command line parameter when the executor is
>>>>>>> launched.
>>>>>>>
>>>>>>> You could see this in
>>>>>>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
>>>>>>>
>>>>>>> But for launcher_dir, the executor gets it from `argv[0]`, as you
>>>>>>> mentioned above.
>>>>>>> ```
>>>>>>>   string path =
>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>
>>>>>>> ```
>>>>>>> So I want to figure out why your argv[0] would become the sandbox dir,
>>>>>>> not "/usr/libexec/mesos".
>>>>>>>
>>>>>>> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <ou...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I see.  And then how are the flags sent to the executor?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 8, 2015, at 8:56 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Yes. The related code is located in
>>>>>>>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>>>>>>>>
>>>>>>>> In fact, environment variables starting with MESOS_ are loaded as
>>>>>>>> flag values.
>>>>>>>>
>>>>>>>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
>>>>>>>>
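As a rough illustration of the env-to-flag direction described above, the sketch below collects flag values from environment entries that start with a given prefix. It is not the stout implementation: `flagsFromEnvironment` is a hypothetical name, the entries are passed in as plain "NAME=value" strings, and lower-casing the remainder of the name is an assumption about how MESOS_LAUNCHER_DIR maps to launcher_dir.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not Mesos or stout code): builds a flag map from
// "NAME=value" environment entries whose name starts with `prefix`.
// The prefix is stripped and the rest of the name is lower-cased
// (an assumed convention), so "MESOS_PORT=5050" yields {"port", "5050"}.
std::map<std::string, std::string> flagsFromEnvironment(
    const std::vector<std::string>& entries,
    const std::string& prefix)
{
  std::map<std::string, std::string> flags;

  for (const std::string& entry : entries) {
    const std::size_t eq = entry.find('=');

    // Skip malformed entries and entries without the prefix.
    if (eq == std::string::npos ||
        entry.compare(0, prefix.size(), prefix) != 0) {
      continue;
    }

    std::string name = entry.substr(prefix.size(), eq - prefix.size());
    std::transform(
        name.begin(), name.end(), name.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    flags[name] = entry.substr(eq + 1);
  }

  return flags;
}
```

Note that the mapping only goes one way here: values flow from the environment into the flag map, and nothing writes a flag back into the environment of the current process or of any child it launches.
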
>>>>>>>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> One question for you haosdent-
>>>>>>>>>
>>>>>>>>> You mentioned that the flags.launcher_dir should propagate to the
>>>>>>>>> docker executor all the way up the chain.  Can you show me where this logic
>>>>>>>>> is in the codebase?  I didn't see where that was happening and would like
>>>>>>>>> to understand the mechanism.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Jay
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Maybe tomorrow I will build a fresh cluster from scratch to see if
>>>>>>>>> the broken behavior experienced today still persists.
>>>>>>>>>
>>>>>>>>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting
>>>>>>>>> flags.launcher_dir, and mesos-docker-executor and mesos-health-check
>>>>>>>>> are then looked up under that dir. Although the env var itself is not
>>>>>>>>> propagated, MESOS_LAUNCHER_DIR still works because flags.launcher_dir
>>>>>>>>> is taken from it.
>>>>>>>>>
>>>>>>>>> For example, I ran
>>>>>>>>> ```
>>>>>>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>>>>>>> ```
>>>>>>>>> before starting mesos-slave. When I then launch the slave, I can find
>>>>>>>>> this line in the slave log:
>>>>>>>>> ```
>>>>>>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>>>>>>>>> xxxxx  --launcher_dir="/tmp"
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> And from your log, I am not sure why your MESOS_LAUNCHER_DIR became
>>>>>>>>> the sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in one
>>>>>>>>> of your other scripts?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>>>>>>>>>
>>>>>>>>>> I just tried setting both the env var and flag on the slaves, and
>>>>>>>>>> have determined that the env var is not present when it is checked in
>>>>>>>>>> src/docker/executor.cpp @ line 573:
>>>>>>>>>>
>>>>>>>>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>>>>>   string path =
>>>>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>>>>                      :
>>>>>>>>>>> os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" <<
>>>>>>>>>>> (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly
>>>>>>>>>> propagated along up to the point of mesos-slave launch):
>>>>>>>>>>
>>>>>>>>>> $ cat /etc/default/mesos-slave
>>>>>>>>>>> export
>>>>>>>>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>>>>>>> export MESOS_PORT="5050"
>>>>>>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> TASK OUTPUT:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> *MESOS_LAUNCHER_DIR: envpath.isSome()->no*
>>>>>>>>>>> *MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'*
>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>> Starting task
>>>>>>>>>>> hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>>> Launching health check process:
>>>>>>>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>>>>>>>>> --executor=(1)@192.168.225.59:44523
>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>>>>>>>>> sh -c \" \/bin\/bash
>>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>>> Health check process launched at pid: 2519
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The env var is not propagated when the docker executor is
>>>>>>>>>> launched in src/slave/containerizer/docker.cpp around line 903:
>>>>>>>>>>
>>>>>>>>>>   vector<string> argv;
>>>>>>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>>>>>>   // Construct the mesos-docker-executor using the "name" we
>>>>>>>>>>> gave the
>>>>>>>>>>>   // container (to distinguish it from Docker containers not
>>>>>>>>>>> created
>>>>>>>>>>>   // by Mesos).
>>>>>>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>>>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>>>>>>>>       argv,
>>>>>>>>>>>       Subprocess::PIPE(),
>>>>>>>>>>>       Subprocess::PATH(path::join(container->directory,
>>>>>>>>>>> "stdout")),
>>>>>>>>>>>       Subprocess::PATH(path::join(container->directory,
>>>>>>>>>>> "stderr")),
>>>>>>>>>>>       dockerFlags(flags, container->name(),
>>>>>>>>>>> container->directory),
>>>>>>>>>>>       environment,
>>>>>>>>>>>       lambda::bind(&setup, container->directory));
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> A little ways above, we can see the environment is set up with the
>>>>>>>>>> env vars defined by the container task.
>>>>>>>>>>
>>>>>>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>>>>>>>
>>>>>>>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>>>>>>
>>>>>>>>>>>  container->executor.command().environment().variables()) {
>>>>>>>>>>>     environment[variable.name()] = variable.value();
>>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>>>>>>> 0.24.1 should work.
>>>>>>>>>>>
>>>>>>>>>>> >Do any of you know which host the path
>>>>>>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>>>>>>>>>> failing.
>>>>>>>>>>>
>>>>>>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly
>>>>>>>>>>> before? We get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir,
>>>>>>>>>>> or else use the same dir as mesos-docker-executor.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <outtatime@gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Maybe I spoke too soon.
>>>>>>>>>>>>
>>>>>>>>>>>> Now the checks are attempting to run, however the STDERR is not
>>>>>>>>>>>> looking good.  I've added some debugging to the error message output to
>>>>>>>>>>>> show the path, argv, and envp variables:
>>>>>>>>>>>>
>>>>>>>>>>>> STDOUT:
>>>>>>>>>>>>
>>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>>>>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>>>>>>>>> sh -c \" exit 1
>>>>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>>> Health check process launched at pid: 3012
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> STDERR:
>>>>>>>>>>>>
>>>>>>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered
>>>>>>>>>>>>> on slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>>>>>>> memory limited without swap.
>>>>>>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in
>>>>>>>>>>>>> childMain
>>>>>>>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>>>> envp=''): No such file or directory
>>>>>>>>>>>>> *** Aborted at 1444270649 (unix time)
>>>>>>>>>>>>> try "date -d @1444270649" if you are using GNU date ***
>>>>>>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700)
>>>>>>>>>>>>> from PID 3012; stack trace: ***
>>>>>>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>>>>>>> @ 0x41921c _Abort()
>>>>>>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>>>>>>> @ 0x43cc9c
>>>>>>>>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>>>>>>> @ 0x7f4a39d92827
>>>>>>>>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>>>> should exist on? It definitely doesn't exist on the slave,
>>>>>>>>>>>> hence execution failing.
>>>>>>>>>>>>
>>>>>>>>>>>> This is with current master, git hash
>>>>>>>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>>>>>>>
>>>>>>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Jay
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <outtatime@gmail.com
>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Update:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to
>>>>>>>>>>>>> compile and package the latest master (0.26.x) and deployed it to the
>>>>>>>>>>>>> cluster, and now health checks are working as advertised in both Marathon
>>>>>>>>>>>>> and my own framework!  Not sure what was going on with health-checks in
>>>>>>>>>>>>> 0.24.0..
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <
>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Haosdent,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you share your Marathon POST request that results in
>>>>>>>>>>>>>> Mesos executing the health checks?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since we can reference the Marathon framework, I've been
>>>>>>>>>>>>>> doing some digging around.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's
>>>>>>>>>>>>>> dependencies
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON
>>>>>>>>>>>>>> to /tmp/X both in the TaskFactory and right before the task is sent
>>>>>>>>>>>>>> to Mesos via driver.launchTasks:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>>>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>>>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>>>          CreatedTask(
>>>>>>>>>>>>>>>            taskInfo,
>>>>>>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>>>>>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") {
>>>>>>>>>>>>>>> driver =>
>>>>>>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>>>>>>> +      var i = 0
>>>>>>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" +
>>>>>>>>>>>>>>> i.toString() + "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>>>>>>>>>>> taskInfos.asJava)
>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then I built and deployed the hacked Marathon and restarted
>>>>>>>>>>>>>> the marathon service.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Next I created the app via the Marathon API ("hello app" is a
>>>>>>>>>>>>>> container with a simple hello-world ruby app running on
>>>>>>>>>>>>>> 0.0.0.0:8000)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST
>>>>>>>>>>>>>>> -H'Content-Type: application/json' -d'
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>>>>>>   "apps": [
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>>>>>>       "container": {
>>>>>>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>>>>>>         "docker": {
>>>>>>>>>>>>>>>           "image":
>>>>>>>>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>>>>>>           "portMappings": [
>>>>>>>>>>>>>>>             {
>>>>>>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>>           ]
>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>       "env": {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>       "instances": 1,
>>>>>>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>>>>>>       "mem": 512
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>   ]
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ ls /tmp/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do they match?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, so I am confident this is the information being sent
>>>>>>>>>>>>>> across the wire to Mesos.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do they contain any health-check information?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ cat
>>>>>>>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>         "value":1.0
>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>         "value":512.0
>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>             "begin":31641,
>>>>>>>>>>>>>>>             "end":31641
>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>     "environment":{
>>>>>>>>>>>>>>>       "variables":[
>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>       ]
>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> No, I don't see anything about any health check.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And STDERR:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor
>>>>>>>>>>>>>>> registered on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit
>>>>>>>>>>>>>>> capabilities, memory limited without swap.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any ideas of other things to try or what I could be missing?
>>>>>>>>>>>>>> Can't say either way about the Mesos health-check system working or not if
>>>>>>>>>>>>>> Marathon won't put the health-check into the task it sends to Mesos.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for all your help!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <haosdent@gmail.com
>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Maybe you could post your executor stdout/stderr so that we
>>>>>>>>>>>>>>> can tell whether the health check is running or not.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <haosdent@gmail.com
>>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Marathon also uses Mesos health checks. When I use a health
>>>>>>>>>>>>>>>> check, I see log output like this in the executor stdout.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <
>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am using my own framework, and the full task info I'm
>>>>>>>>>>>>>>>>> using is posted earlier in this thread.  Do you happen to know if Marathon
>>>>>>>>>>>>>>>>> uses Mesos's health checks for its health check system?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes, the health check is launched through its definition in
>>>>>>>>>>>>>>>>> TaskInfo. Do you launch your task through Marathon? I could test it on my
>>>>>>>>>>>>>>>>> side.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <
>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Precisely, and there are none of those statements.  Are
>>>>>>>>>>>>>>>>>> you or others confident health-checks are part of the code path when
>>>>>>>>>>>>>>>>>> defined via task info for docker container tasks?  Going through the code,
>>>>>>>>>>>>>>>>>> I wasn't able to find the linkage for anything other than health-checks
>>>>>>>>>>>>>>>>>> triggered through a custom executor.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> With that being said it is a pretty good sized code base
>>>>>>>>>>>>>>>>>> and I'm not very familiar with it, so my analysis this far has by no means
>>>>>>>>>>>>>>>>>> been exhaustive.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> When the health check launches, there will be a log line like this
>>>>>>>>>>>>>>>>>> in your executor stdout:
>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output
>>>>>>>>>>>>>>>>>>> in the logs with the string "health" or "Health" if the health-check were
>>>>>>>>>>>>>>>>>>> active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see
>>>>>>>>>>>>>>>>>>> whether an unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <
>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <
>>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> As I remember, 0.23.1 and 0.24.1 contain this backport;
>>>>>>>>>>>>>>>>>>>>>> let me double check.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.
>>>>>>>>>>>>>>>>>>>>>>> I'll look there :)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to
>>>>>>>>>>>>>>>>>>>>>>>> test it out?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker
>>>>>>>>>>>>>>>>>>>>>>>>> tasks that's in master but not yet released. It will run docker exec with
>>>>>>>>>>>>>>>>>>>>>>>>> the command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image
>>>>>>>>>>>>>>>>>>>>>>>>> tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see
>>>>>>>>>>>>>>>>>>>>>>>>> if they ever run the command (in this case `sleep 5`), but have not found
>>>>>>>>>>>>>>>>>>>>>>>>> any indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are
>>>>>>>>>>>>>>>>>>>>>>>>> invoked from src/launcher/executor.cpp CommandExecutorProcess::launchTask.
>>>>>>>>>>>>>>>>>>>>>>>>> Does this mean that health-checks are only supported for custom executors
>>>>>>>>>>>>>>>>>>>>>>>>> and not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the
>>>>>>>>>>>>>>>>>>>>>>>>> 0/non-zero exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>>>> Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ja...@jaytaylor.com>.
Hi Marco,

What a relief!

I'd love to file the JIRA ticket for this, but I don't think my account has
permissions over on https://issues.apache.org/jira/browse/MESOS.  I
am "jaytaylor" over there.  Please let me know if you can help with that
and we can get the ball rolling on this.


On Mon, Oct 12, 2015 at 3:14 PM, Marco Massenzio <ma...@mesosphere.io>
wrote:

> Jay:
>
> you hit the nail on the head: the direction is definitely one-way (from
> MESOS_ENV var to Flag) and we don't reflect --flag back into the MESOS_FLAG
> env var.
> Others more familiar with the matter may correct me, but it looks like you
> have uncovered a bug in the executor code: could you please file a Jira for
> us to look into?
>
> It seems to me that, at present, the only workaround would be for you
> to set the MESOS_LAUNCHER_DIR env var, as the flag won't be picked up by
> the executor.
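
A simplified sketch of that one-way direction, assuming env vars carrying a
MESOS_ prefix are mapped to lower-cased flag names. `flagsFromEnvironment` is
a hypothetical helper written for illustration, not the actual Mesos flag
loading code:

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper (not a Mesos API): turns environment entries such as
// MESOS_LAUNCHER_DIR=/x into flag values such as launcher_dir=/x.
std::map<std::string, std::string> flagsFromEnvironment(
    const std::map<std::string, std::string>& env,
    const std::string& prefix) {
  std::map<std::string, std::string> flags;
  for (const auto& entry : env) {
    const std::string& key = entry.first;
    // Skip variables that do not start with the prefix.
    if (key.compare(0, prefix.size(), prefix) != 0) continue;
    std::string name = key.substr(prefix.size());
    if (name.empty()) continue;
    // Lower-case the remainder to obtain the flag name.
    for (char& c : name) {
      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    flags[name] = entry.second;
  }
  return flags;
}
```

Nothing in this direction writes a parsed --launcher_dir value back into the
environment, so a child process that only consults MESOS_LAUNCHER_DIR would
not see it.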
>
>
> --
> *Marco Massenzio*
> Distributed Systems Engineer
> http://codetrips.com
>
> On Mon, Oct 12, 2015 at 11:44 PM, Jay Taylor <ja...@jaytaylor.com> wrote:
>
>> Hi Marco,
>>
>> My reply is inline below-
>>
>> On Mon, Oct 12, 2015 at 2:27 PM, Marco Massenzio <ma...@mesosphere.io>
>> wrote:
>>
>>>
>>> On Mon, Oct 12, 2015 at 11:26 PM, Marco Massenzio <ma...@mesosphere.io>
>>> wrote:
>>>
>>>> Are those the stdout logs of the Agent? I ask because I don't see
>>>> --launcher_dir set; however, if I look at an agent that is running off
>>>> the same 0.24.1 package, this is what I see:
>>>>
>>>> I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup:
>>>> --appc_store_dir="/tmp/mesos/store/appc"
>>>> --attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5"
>>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>>> --container_disk_watch_interval="15secs" --containerizers="docker,mesos"
>>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>>> --enforce_container_disk_quota="false"
>>>> --executor_registration_timeout="1mins"
>>>> --executor_shutdown_grace_period="5secs"
>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>>> --hadoop_home="" --help="false" --initialize_driver_logging="true"
>>>> --ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem"
>>>> --launcher_dir="/usr/libexec/mesos"
>>>> --log_dir="/var/local/mesos/logs/agent" --logbufsecs="0"
>>>> --logging_level="INFO" --master="zk://192.168.33.1:2181/mesos/vagrant"
>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>>> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>>> --registration_backoff_factor="1secs"
>>>> --resource_monitoring_interval="1secs"
>>>> --resources="ports:[9000-10000];ephemeral_ports:[32768-57344]"
>>>> --revocable_cpu_low_priority="true"
>>>> --sandbox_directory="/var/local/sandbox" --strict="true"
>>>> --switch_user="true" --version="false" --work_dir="/var/local/mesos/agent"
>>>>
>>> (this is run off the Vagrantfile at [0] in case you want to reproduce).
>>>> That agent is not run via the init command, though, I execute it
>>>> manually via the `run-agent.sh` in the same directory.
>>>>
>>>> I don't really think this matters, but I assume you also restarted the
>>>> agent after making the config changes?
>>>> (and, for your own sanity - you can double check the version by looking
>>>> at the very head of the logs).
>>>>
>>>
>> Yes I definitely restarted all mesos processes after config changes :)
>>
>> Here is info equivalent to what you posted, from one of the slaves' INFO logs:
>>
>> Log file created at: 2015/10/12 20:22:58
>>> Running on machine: mesos-worker2a
>>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>>> I1012 20:22:58.469779  9605 logging.cpp:172] INFO level logging started!
>>> I1012 20:22:58.470006  9605 main.cpp:185] Build: 2015-09-25 19:13:24 by
>>> root
>>> I1012 20:22:58.470023  9605 main.cpp:187] Version: 0.24.1
>>> I1012 20:22:58.470031  9605 main.cpp:190] Git tag: 0.24.1
>>> I1012 20:22:58.470039  9605 main.cpp:194] Git SHA:
>>> 44873806c2bb55da37e9adbece938274d8cd7c48
>>> I1012 20:22:58.470221  9605 containerizer.cpp:143] Using isolation:
>>> posix/cpu,posix/mem,filesystem/posix
>>> I1012 20:22:58.573750  9605 main.cpp:272] Starting Mesos slave
>>> I1012 20:22:58.574662  9621 slave.cpp:190] Slave started on 1)@
>>> 192.168.225.59:5050
>>> I1012 20:22:58.574695  9621 slave.cpp:191] Flags at startup:
>>> --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5"
>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>> --container_disk_watch_interval="15secs" --containerizers="mesos,docker"
>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>> --enforce_container_disk_quota="false"
>>> --executor_registration_timeout="5mins"
>>> --executor_shutdown_grace_period="5secs"
>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>> --hadoop_home="" --help="false"
>>> --hostname="mesos-worker2a-hobart.gigawatt.io" --initialize_driver_logging="true"
>>> --ip="192.168.225.59" --isolation="posix/cpu,posix/mem"
>>> --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos"
>>> --logbufsecs="0" --logging_level="INFO"
>>> --master="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>> --perf_interval="1mins" --port="5050" --qos_correction_interval_min="0ns"
>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>> --registration_backoff_factor="1secs"
>>> --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true"
>>> --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
>>> --switch_user="true" --version="false" --work_dir="/tmp/mesos"
>>
>>
>> The launcher dir is picked up by the mesos-slave process.  We can also
>> see the cmdline flag is picked up from /etc/mesos-slave like this:
>>
>> mesos-worker2a:~$ ps -ef | grep mesos
>>> root      9605     1  1 20:22 ?        00:01:18 /usr/sbin/mesos-slave
>>> --ip=192.168.225.59 --log_dir=/var/log/mesos --launcher_dir=/usr/libexec/mesos
>>> root      9612  9605  0 20:22 ?        00:00:00 logger -p user.info -t mesos-slave[9605]
>>> root      9613  9605  0 20:22 ?        00:00:00 logger -p user.err -t
>>> mesos-slave[9605]
>>> vagrant   9951  6010  0 21:36 pts/0    00:00:00 grep --color=auto mesos
>>
>>
>>
>> What I keep coming back to is the fact that the MESOS_LAUNCHER_DIR env
>> var does not seem to get picked up here:
>> https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L573-L576
>>
>>   const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>   string path =
>>>     envPath.isSome() ? envPath.get()
>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>
>>
>> And argv[0] (which contains the slave work dir) is the path we see in the
>> task's stdout.
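A minimal Python sketch of the lookup quoted above, for illustration only; the real logic is the C++ in src/docker/executor.cpp, and the function name `resolve_launcher_dir` is hypothetical. It shows why a missing MESOS_LAUNCHER_DIR matters: if argv[0] is a bare program name with no directory component (as the `argv.push_back("mesos-docker-executor")` call quoted elsewhere in this thread suggests), the fallback resolves to the current working directory, which would plausibly be the sandbox.

```python
import os


def resolve_launcher_dir(argv0, environ):
    """Pick the directory that should contain mesos-health-check.

    Mirrors the quoted C++: prefer MESOS_LAUNCHER_DIR from the environment,
    otherwise fall back to the real path of argv[0]'s directory.
    """
    env_path = environ.get("MESOS_LAUNCHER_DIR")
    if env_path is not None:
        return env_path
    # A bare program name has an empty dirname; treat it as "." so the
    # fallback becomes the current working directory.
    return os.path.realpath(os.path.dirname(argv0) or ".")
```

With `{"MESOS_LAUNCHER_DIR": "/usr/libexec/mesos"}` the function returns that directory regardless of argv0; with an empty environment and an argv0 of `mesos-docker-executor` it returns the working directory of the calling process.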
>>
>> I'm still having trouble understanding how flags defined in
>> mesos::internal::slave::Flags::Flags (src/slave/flags.[ch]pp) are
>> propagated or expanded to MESOS_<flag_in_caps> environment variables.  Can
>> you confirm if such a mechanism exists and if so where it is?
>>
>> Otherwise, if my understanding is correct and such a mechanism doesn't
>> exist:
>>
>> How can the requisite MESOS_LAUNCHER_DIR env var be available when
>> docker/executor.cpp (a child process of mesos-slave) attempts to read it?
>>
>> The lack of such a mechanism would explain the behavior I'm currently
>> observing.
>>
>> Thanks!
>> Jay
>>
>>
>>>>
>>>> [0] http://github.com/massenz/zk-mesos
>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *Marco Massenzio*
>>>> Distributed Systems Engineer
>>>> http://codetrips.com
>>>>
>>>> On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <ou...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Haosdent and Mesos friends,
>>>>>
>>>>> I've rebuilt the cluster from scratch and installed mesos 0.24.1 from
>>>>> the mesosphere apt repo:
>>>>>
>>>>> $ dpkg -l | grep mesos
>>>>> ii  mesos                               0.24.1-0.2.35.ubuntu1404
>>>>>      amd64        Cluster resource manager with efficient resource isolation
>>>>>
>>>>> Then added the `launcher_dir' flag to /etc/mesos-slave/launcher_dir on
>>>>> the slaves:
>>>>>
>>>>> mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir
>>>>> /usr/libexec/mesos
>>>>>
>>>>> And yet the task health-checks are still being launched from the
>>>>> sandbox directory like before!
>>>>>
>>>>> I've also tested setting the MESOS_LAUNCHER_DIR env var and get the
>>>>> identical result (just as before on the cluster where many versions of
>>>>> mesos had been installed):
>>>>>
>>>>> STDOUT:
>>>>>
>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>> --stop_timeout="0ns"
>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>> --stop_timeout="0ns"
>>>>>> Registered docker executor on mesos-worker1a
>>>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>> Launching health check process:
>>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check
>>>>>> --executor=(1)@192.168.225.58:48912
>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>>>> 127.0.0.1:8000
>>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>> Health check process launched at pid: 11253
>>>>>
>>>>>
>>>>>
>>>>> STDERR:
>>>>>
>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>> --stop_timeout="0ns"
>>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>>> --stop_timeout="0ns"
>>>>>> Registered docker executor on mesos-worker1a
>>>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>> *Launching health check process:
>>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check*
>>>>>> --executor=(1)@192.168.225.58:48912
>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>>>> 127.0.0.1:8000
>>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>>> Health check process launched at pid: 11253
>>>>>
>>>>>
>>>>> Any ideas on where to go from here?  Is there any additional
>>>>> information I can provide?
>>>>>
>>>>> Thanks as always,
>>>>> Jay
>>>>>
>>>>>
>>>>> On Thu, Oct 8, 2015 at 9:23 PM, haosdent <ha...@gmail.com> wrote:
>>>>>
>>>>>> For flags sent to the executor from the containerizer, each flag is
>>>>>> stringified and becomes a command-line parameter when the executor is launched.
>>>>>>
>>>>>> You can see this in
>>>>>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
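As a rough illustration of that stringification (a Python sketch, not the Mesos implementation), each name/value pair becomes one `--name="value"` token, matching the format visible in the task stdout in this thread. `stringify_flags` is a hypothetical name, and sorting by flag name is an assumption made only to keep the output deterministic.

```python
def stringify_flags(flags):
    """Turn a mapping of flag names to values into command-line tokens."""
    # One token per flag, in the --name="value" form seen in the task logs.
    return ['--{}="{}"'.format(name, value) for name, value in sorted(flags.items())]
```

For example, `stringify_flags({"docker": "docker", "stop_timeout": "0ns"})` yields the two tokens `--docker="docker"` and `--stop_timeout="0ns"`.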
>>>>>>
>>>>>> But for launcher_dir, the executor gets it from `argv[0]`, as you
>>>>>> mentioned above.
>>>>>> ```
>>>>>>   string path =
>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>
>>>>>> ```
>>>>>> So I want to figure out why your argv[0] would become the sandbox dir,
>>>>>> not "/usr/libexec/mesos".
>>>>>>
>>>>>> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <ou...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I see.  And then how are the flags sent to the executor?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Oct 8, 2015, at 8:56 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>
>>>>>>> Yes. The related code is located in
>>>>>>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>>>>>>>
>>>>>>> In fact, environment variables that start with MESOS_ are loaded as
>>>>>>> flag values.
>>>>>>>
>>>>>>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
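To make that concrete, here is a small Python sketch of prefix-based loading, under the assumption that the prefix is stripped and the remainder lower-cased to form the flag name. `flags_from_environ` is a hypothetical helper, not a Mesos or stout API. Note that it only describes how a process reads its own environment when parsing flags; it says nothing about what a child process inherits.

```python
def flags_from_environ(environ, prefix="MESOS_"):
    """Collect flag values from environment variables sharing a prefix."""
    flags = {}
    for name, value in environ.items():
        if name.startswith(prefix):
            # MESOS_LAUNCHER_DIR -> launcher_dir (assumed naming rule).
            flags[name[len(prefix):].lower()] = value
    return flags
```

Given an environment containing `MESOS_LAUNCHER_DIR=/usr/libexec/mesos` and `PATH=/bin`, only `launcher_dir` ends up in the result.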
>>>>>>>
>>>>>>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <ou...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> One question for you haosdent-
>>>>>>>>
>>>>>>>> You mentioned that the flags.launcher_dir should propagate to the
>>>>>>>> docker executor all the way up the chain.  Can you show me where this logic
>>>>>>>> is in the codebase?  I didn't see where that was happening and would like
>>>>>>>> to understand the mechanism.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Jay
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Maybe tomorrow I will build a fresh cluster from scratch to see if
>>>>>>>> the broken behavior experienced today still persists.
>>>>>>>>
>>>>>>>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting
>>>>>>>> flags.launcher_dir, which is then used to find mesos-docker-executor
>>>>>>>> and mesos-health-check under this dir. Although the env var is not
>>>>>>>> propagated, MESOS_LAUNCHER_DIR still works because flags.launcher_dir is
>>>>>>>> derived from it.
>>>>>>>>
>>>>>>>> For example, I ran
>>>>>>>> ```
>>>>>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>>>>>> ```
>>>>>>>> before starting mesos-slave, so when I launch the slave I can find this
>>>>>>>> line in the slave log:
>>>>>>>> ```
>>>>>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>>>>>>>> xxxxx  --launcher_dir="/tmp"
>>>>>>>> ```
>>>>>>>>
>>>>>>>> From your log, I am not sure why your MESOS_LAUNCHER_DIR becomes the
>>>>>>>> sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in your other
>>>>>>>> scripts?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>>>>>>>>
>>>>>>>>> I just tried setting both the env var and flag on the slaves, and
>>>>>>>>> have determined that the env var is not present when it is checked in
>>>>>>>>> src/docker/executor.cpp @ line 573:
>>>>>>>>>
>>>>>>>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>>>>   string path =
>>>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->"
>>>>>>>>>>        << (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly
>>>>>>>>> propagated along up to the point of mesos-slave launch):
>>>>>>>>>
>>>>>>>>> $ cat /etc/default/mesos-slave
>>>>>>>>>> export
>>>>>>>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>>>>>> export MESOS_PORT="5050"
>>>>>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> TASK OUTPUT:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> *MESOS_LAUNCHER_DIR: envpath.isSome()->no*
>>>>>>>>>> *MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'*
>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>> Starting task
>>>>>>>>>> hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>> Launching health check process:
>>>>>>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>>>>>>>> --executor=(1)@192.168.225.59:44523
>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>>>>>>>> sh -c \" \/bin\/bash
>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>>> Health check process launched at pid: 2519
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The env var is not propagated when the docker executor is launched
>>>>>>>>> in src/slave/containerizer/docker.cpp around line 903:
>>>>>>>>>
>>>>>>>>>   vector<string> argv;
>>>>>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>>>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>>>>>>>>   // container (to distinguish it from Docker containers not created
>>>>>>>>>>   // by Mesos).
>>>>>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>>>>>>>       argv,
>>>>>>>>>>       Subprocess::PIPE(),
>>>>>>>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>>>>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>>>>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>>>>>>>       environment,
>>>>>>>>>>       lambda::bind(&setup, container->directory));
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> A little way above, we can see that the environment is set up with the
>>>>>>>>> env vars defined by the container task.
>>>>>>>>>
>>>>>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>>>>>>
>>>>>>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>>>>>            container->executor.command().environment().variables()) {
>>>>>>>>>>     environment[variable.name()] = variable.value();
>>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>>>>>> 0.24.1 should work.
>>>>>>>>>>
>>>>>>>>>> >Do any of you know which host the path
>>>>>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>>>>>>>>> failing.
>>>>>>>>>>
>>>>>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly
>>>>>>>>>> before? We locate mesos-health-check via MESOS_LAUNCHER_DIR/--launcher_dir,
>>>>>>>>>> or else use the same dir as mesos-docker-executor.
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Maybe I spoke too soon.
>>>>>>>>>>>
>>>>>>>>>>> Now the checks are attempting to run, however the STDERR is not
>>>>>>>>>>> looking good.  I've added some debugging to the error message output to
>>>>>>>>>>> show the path, argv, and envp variables:
>>>>>>>>>>>
>>>>>>>>>>> STDOUT:
>>>>>>>>>>>
>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>>> Starting task
>>>>>>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>>>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>>>>>>>> sh -c \" exit 1
>>>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>>> Health check process launched at pid: 3012
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> STDERR:
>>>>>>>>>>>
>>>>>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on
>>>>>>>>>>>> slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>>>>>> memory limited without swap.
>>>>>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in
>>>>>>>>>>>> childMain
>>>>>>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>>> envp=''): No such file or directory
>>>>>>>>>>>> *** Aborted at 1444270649 (unix time) try "date -d @1444270649" if you are using GNU date ***
>>>>>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700)
>>>>>>>>>>>> from PID 3012; stack trace: ***
>>>>>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>>>>>> @ 0x41921c _Abort()
>>>>>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>>>>>> @ 0x43cc9c mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>>>>>> @ 0x7f4a39d92827 _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>>> should exist on? It definitely doesn't exist on the slave,
>>>>>>>>>>> hence execution failing.
>>>>>>>>>>>
>>>>>>>>>>> This is with current master, git hash
>>>>>>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>>>>>>
>>>>>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Jay
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Update:
>>>>>>>>>>>>
>>>>>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile
>>>>>>>>>>>> and package the latest master (0.26.x) and deployed it to the cluster, and
>>>>>>>>>>>> now health checks are working as advertised in both Marathon and my own
>>>>>>>>>>>> framework!  Not sure what was going on with health-checks in 0.24.0.
>>>>>>>>>>>>
>>>>>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Jay
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <
>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Haosdent,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you share your Marathon POST request that results in Mesos
>>>>>>>>>>>>> executing the health checks?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Since we can reference the Marathon framework, I've been doing
>>>>>>>>>>>>> some digging around.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>>>>>>
>>>>>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's
>>>>>>>>>>>>> dependencies
>>>>>>>>>>>>>
>>>>>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON
>>>>>>>>>>>>> to /tmp/X in both the TaskFactory as well as right before the task is sent
>>>>>>>>>>>>> to Mesos via driver.launchTasks:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId, config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>>          CreatedTask(
>>>>>>>>>>>>>>            taskInfo,
>>>>>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos: Seq[TaskInfo]): Boolean = {
>>>>>>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>>>>>> +      var i = 0
>>>>>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>        driver.launchTasks(Collections.singleton(offerID), taskInfos.asJava)
>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then I built and deployed the hacked Marathon and restarted
>>>>>>>>>>>>> the marathon service.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Next I created the app via the Marathon API ("hello app" is a
>>>>>>>>>>>>> container with a simple hello-world ruby app running on
>>>>>>>>>>>>> 0.0.0.0:8000)
>>>>>>>>>>>>>
>>>>>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type: application/json' -d'
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>>>>>   "apps": [
>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>>>>>       "container": {
>>>>>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>>>>>         "docker": {
>>>>>>>>>>>>>>           "image": "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>>>>>           "portMappings": [
>>>>>>>>>>>>>>             {
>>>>>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>           ]
>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>       "env": {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>       "instances": 1,
>>>>>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>>>>>       "mem": 512
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>   ]
>>>>>>>>>>>>>> }'
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ ls /tmp/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do they match?
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, so I am confident this is the information being sent
>>>>>>>>>>>>> across the wire to Mesos.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do they contain any health-check information?
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ cat /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>     "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>         "value":1.0
>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>         "value":512.0
>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>             "begin":31641,
>>>>>>>>>>>>>>             "end":31641
>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>     "environment":{
>>>>>>>>>>>>>>       "variables":[
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>>>>>           "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>>>>>           "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>       ]
>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> No, I don't see anything about any health check.
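A quick way to make this check less error-prone than reading the dump by eye is to parse it. The Python sketch below assumes the dump is the protobuf-as-JSON text written by the hack above and that the field appears under the key `health_check`, as in the TaskInfo JSON at the start of this thread; `has_health_check` is a hypothetical helper.

```python
import json


def has_health_check(task_info_json):
    """Report whether a dumped TaskInfo carries a top-level health_check."""
    task_info = json.loads(task_info_json)
    # Only a JSON object can hold the field; anything else counts as absent.
    return isinstance(task_info, dict) and "health_check" in task_info
```

Feeding it the contents of one of the /tmp/taskjson1-* files would return False for the dump shown above, since that TaskInfo has no `health_check` key.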
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>>>>>>
>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> And STDERR:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered
>>>>>>>>>>>>>> on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit
>>>>>>>>>>>>>> capabilities, memory limited without swap.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any ideas of other things to try or what I could be missing?
>>>>>>>>>>>>> Can't say either way about the Mesos health-check system working or not if
>>>>>>>>>>>>> Marathon won't put the health-check into the task it sends to Mesos.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for all your help!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe you could post your executor stdout/stderr so that we
>>>>>>>>>>>>>> can tell whether the health check is running or not.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Marathon also uses the Mesos health check. When I use a health
>>>>>>>>>>>>>>> check, I see log lines like this in the executor stdout.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <
>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am using my own framework, and the full task info I'm
>>>>>>>>>>>>>>>> using is posted earlier in this thread.  Do you happen to know if Marathon
>>>>>>>>>>>>>>>> uses Mesos's health checks for its health check system?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, the health check is launched from its definition in
>>>>>>>>>>>>>>>> TaskInfo. Do you launch your task through Marathon? I could test it on my
>>>>>>>>>>>>>>>> side.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <
>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Precisely, and there are none of those statements.  Are
>>>>>>>>>>>>>>>>> you or others confident health-checks are part of the code path when
>>>>>>>>>>>>>>>>> defined via task info for docker container tasks?  Going through the code,
>>>>>>>>>>>>>>>>> I wasn't able to find the linkage for anything other than health-checks
>>>>>>>>>>>>>>>>> triggered through a custom executor.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> With that being said it is a pretty good sized code base
>>>>>>>>>>>>>>>>> and I'm not very familiar with it, so my analysis this far has by no means
>>>>>>>>>>>>>>>>> been exhaustive.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> When the health check launches, there will be a log line like this in
>>>>>>>>>>>>>>>>> your executor stdout:
>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <
>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output
>>>>>>>>>>>>>>>>>> in the logs with the string "health" or "Health" if the health-check were
>>>>>>>>>>>>>>>>>> active?  None of my master or slave logs contain the string.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see
>>>>>>>>>>>>>>>>>> whether an unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <
>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <
>>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this backport;
>>>>>>>>>>>>>>>>>>>>> let me double check.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.
>>>>>>>>>>>>>>>>>>>>>> I'll look there :)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to
>>>>>>>>>>>>>>>>>>>>>>> test it out?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks
>>>>>>>>>>>>>>>>>>>>>>>> that's in master but not yet released. It will run docker exec with the
>>>>>>>>>>>>>>>>>>>>>>>> command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image
>>>>>>>>>>>>>>>>>>>>>>>> tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see
>>>>>>>>>>>>>>>>>>>>>>>> if they ever run the command (in this case `sleep 5`), but have not found
>>>>>>>>>>>>>>>>>>>>>>>> any indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked
>>>>>>>>>>>>>>>>>>>>>>>> from src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does
>>>>>>>>>>>>>>>>>>>>>>>> this mean that health-checks are only supported for custom executors and
>>>>>>>>>>>>>>>>>>>>>>>> not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the
>>>>>>>>>>>>>>>>>>>>>>>> 0/non-zero exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Marco Massenzio <ma...@mesosphere.io>.
Jay:

you hit the nail on the head: the direction is definitely one-way (from
MESOS_ENV var to Flag) and we don't reflect --flag back into the MESOS_FLAG
env var.
Others more familiar with the matter may correct me, but it looks like you
have uncovered a bug in the executor code: could you please file a Jira for
us to look into?
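
To illustrate the one-way direction described above, here is a minimal sketch. It is not the actual stout flags code: the helper name `flagsFromEnvironment` and the map-based environment are assumptions made for illustration. It only shows that a variable such as MESOS_LAUNCHER_DIR can be turned into a `launcher_dir` flag value, while nothing writes a flag back into the environment:

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper (not part of Mesos or stout): derive flag values from
// environment-style entries whose names start with `prefix`.
std::map<std::string, std::string> flagsFromEnvironment(
    const std::map<std::string, std::string>& environment,
    const std::string& prefix)
{
  assert(!prefix.empty());

  std::map<std::string, std::string> flags;
  for (const auto& entry : environment) {
    const std::string& name = entry.first;

    // Skip variables that do not carry the prefix, or that are only the prefix.
    if (name.size() <= prefix.size() ||
        name.compare(0, prefix.size(), prefix) != 0) {
      continue;
    }

    // Strip the prefix and lowercase the rest:
    // MESOS_LAUNCHER_DIR becomes launcher_dir.
    std::string flag = name.substr(prefix.size());
    for (char& c : flag) {
      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    flags[flag] = entry.second;
  }

  // Note that the environment map is read-only here: a flag passed on the
  // command line never shows up as an environment variable for child processes.
  return flags;
}
```

With an environment containing MESOS_LAUNCHER_DIR and PATH, calling `flagsFromEnvironment(env, "MESOS_")` yields a single `launcher_dir` entry; PATH is ignored.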

It seems to me that, at present, the only workaround would be for you to
set the MESOS_LAUNCHER_DIR env var, as the flag won't be picked up by the
executor.
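
As a rough sketch of why the env var matters, the lookup quoted further down from src/docker/executor.cpp can be approximated as follows. The function names `resolveLauncherDir` and `healthCheckBinary` are hypothetical, and unlike the real code this version does not call os::realpath, so symlinks are not resolved:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the executor's lookup: prefer the value of
// MESOS_LAUNCHER_DIR (passed in as envValue, nullptr when unset), otherwise
// fall back to the directory part of argv[0].
std::string resolveLauncherDir(const char* envValue, const std::string& argv0)
{
  assert(!argv0.empty());

  if (envValue != nullptr) {
    return envValue;
  }

  const std::string::size_type slash = argv0.find_last_of('/');
  if (slash == std::string::npos) {
    return ".";  // No directory component in argv[0].
  }
  if (slash == 0) {
    return "/";  // Binary sits directly under the root directory.
  }
  return argv0.substr(0, slash);
}

// The health check binary is then looked up under the resolved directory.
std::string healthCheckBinary(const char* envValue, const std::string& argv0)
{
  return resolveLauncherDir(envValue, argv0) + "/mesos-health-check";
}
```

In an executor's main() this would be called as `resolveLauncherDir(std::getenv("MESOS_LAUNCHER_DIR"), argv[0])` (with `<cstdlib>` included). When the variable is unset in the executor's environment, the result depends entirely on argv[0], which matches the sandbox path seen in the task output.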


--
*Marco Massenzio*
Distributed Systems Engineer
http://codetrips.com

On Mon, Oct 12, 2015 at 11:44 PM, Jay Taylor <ja...@jaytaylor.com> wrote:

> Hi Marco,
>
> My reply is inline below-
>
> On Mon, Oct 12, 2015 at 2:27 PM, Marco Massenzio <ma...@mesosphere.io>
> wrote:
>
>>
>> On Mon, Oct 12, 2015 at 11:26 PM, Marco Massenzio <ma...@mesosphere.io>
>> wrote:
>>
>>> Are those the stdout logs of the Agent? I ask because I don't see
>>> --launcher_dir set there; however, if I look at one that is running off the
>>> same 0.24.1 package, this is what I see:
>>>
>>> I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup:
>>> --appc_store_dir="/tmp/mesos/store/appc"
>>> --attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5"
>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>> --container_disk_watch_interval="15secs" --containerizers="docker,mesos"
>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>> --enforce_container_disk_quota="false"
>>> --executor_registration_timeout="1mins"
>>> --executor_shutdown_grace_period="5secs"
>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>> --hadoop_home="" --help="false" --initialize_driver_logging="true"
>>> --ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem"
>>> --launcher_dir="/usr/libexec/mesos"
>>> --log_dir="/var/local/mesos/logs/agent" --logbufsecs="0"
>>> --logging_level="INFO" --master="zk://192.168.33.1:2181/mesos/vagrant"
>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>> --registration_backoff_factor="1secs"
>>> --resource_monitoring_interval="1secs"
>>> --resources="ports:[9000-10000];ephemeral_ports:[32768-57344]"
>>> --revocable_cpu_low_priority="true"
>>> --sandbox_directory="/var/local/sandbox" --strict="true"
>>> --switch_user="true" --version="false" --work_dir="/var/local/mesos/agent"
>>>
>> (this is run off the Vagrantfile at [0] in case you want to reproduce).
>>> That agent is not run via the init command, though, I execute it
>>> manually via the `run-agent.sh` in the same directory.
>>>
>>> I don't really think this matters, but I assume you also restarted the
>>> agent after making the config changes?
>>> (and, for your own sanity - you can double check the version by looking
>>> at the very head of the logs).
>>>
>>
> Yes I definitely restarted all mesos processes after config changes :)
>
> Here's info equivalent to what you posted, from one of the slaves' INFO logs:
>
> Log file created at: 2015/10/12 20:22:58
>> Running on machine: mesos-worker2a
>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>> I1012 20:22:58.469779  9605 logging.cpp:172] INFO level logging started!
>> I1012 20:22:58.470006  9605 main.cpp:185] Build: 2015-09-25 19:13:24 by
>> root
>> I1012 20:22:58.470023  9605 main.cpp:187] Version: 0.24.1
>> I1012 20:22:58.470031  9605 main.cpp:190] Git tag: 0.24.1
>> I1012 20:22:58.470039  9605 main.cpp:194] Git SHA:
>> 44873806c2bb55da37e9adbece938274d8cd7c48
>> I1012 20:22:58.470221  9605 containerizer.cpp:143] Using isolation:
>> posix/cpu,posix/mem,filesystem/posix
>> I1012 20:22:58.573750  9605 main.cpp:272] Starting Mesos slave
>> I1012 20:22:58.574662  9621 slave.cpp:190] Slave started on 1)@
>> 192.168.225.59:5050
>> I1012 20:22:58.574695  9621 slave.cpp:191] Flags at startup:
>> --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5"
>> --cgroups_cpu_enable_pids_and_tids_count="false"
>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>> --container_disk_watch_interval="15secs" --containerizers="mesos,docker"
>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>> --enforce_container_disk_quota="false"
>> --executor_registration_timeout="5mins"
>> --executor_shutdown_grace_period="5secs"
>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>> --hadoop_home="" --help="false" --hostname="
>> mesos-worker2a-hobart.gigawatt.io" --initialize_driver_logging="true"
>> --ip="192.168.225.59" --isolation="posix/cpu,posix/mem"
>> *--launcher_dir="/usr/libexec/mesos"* --log_dir="/var/log/mesos"
>> --logbufsecs="0" --logging_level="INFO"
>> --master="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>> --perf_interval="1mins" --port="5050" --qos_correction_interval_min="0ns"
>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>> --registration_backoff_factor="1secs"
>> --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true"
>> --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
>> --switch_user="true" --version="false" --work_dir="/tmp/mesos"
>
>
> The launcher dir is picked up by the mesos-slave process.  We can also see
> the cmdline flag is picked up from /etc/mesos-slave like this:
>
> mesos-worker2a:~$ ps -ef | grep mesos
>> root      9605     1  1 20:22 ?        00:01:18 /usr/sbin/mesos-slave
>> --ip=192.168.225.59 --log_dir=/var/log/mesos
>> *--launcher_dir=/usr/libexec/mesos*
>> root      9612  9605  0 20:22 ?        00:00:00 logger -p user.info -t mesos-slave[9605]
>> root      9613  9605  0 20:22 ?        00:00:00 logger -p user.err -t
>> mesos-slave[9605]
>> vagrant   9951  6010  0 21:36 pts/0    00:00:00 grep --color=auto mesos
>
>
>
> What I keep coming back to is the fact that the MESOS_LAUNCHER_DIR env var
> does not seem to get picked up here:
> https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L573-L576
> :
>
>   const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>   string path =
>>     envPath.isSome() ? envPath.get()
>>                      : os::realpath(Path(argv[0]).dirname()).get();
>
>
> And argv[0] (which contains the slave work dir) is the path we see in the
> task's stdout.
>
> I'm still having trouble understanding how flags defined in
> mesos::internal::slave::Flags::Flags (src/slave/flags.[ch]pp) are
> propagated or expanded to MESOS_<flag_in_caps> environment variables.  Can
> you confirm if such a mechanism exists and if so where it is?
>
> Otherwise, if my understanding is correct and such a mechanism doesn't
> exist:
>
> How can the requisite MESOS_LAUNCHER_DIR env var be available when
> docker/executor.cpp (a child process of mesos-slave) attempts to read it?
>
> The lack of such a mechanism would explain the behavior I'm currently
> observing.
>
> Thanks!
> Jay
>
>
>>>
>>> [0] http://github.com/massenz/zk-mesos
>>
>>>
>>>
>>>
>>>
>>> --
>>> *Marco Massenzio*
>>> Distributed Systems Engineer
>>> http://codetrips.com
>>>
>>> On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <ou...@gmail.com>
>>> wrote:
>>>
>>>> Hi Haosdent and Mesos friends,
>>>>
>>>> I've rebuilt the cluster from scratch and installed mesos 0.24.1 from
>>>> the mesosphere apt repo:
>>>>
>>>> $ dpkg -l | grep mesos
>>>> ii  mesos                               0.24.1-0.2.35.ubuntu1404
>>>>      amd64        Cluster resource manager with efficient resource isolation
>>>>
>>>> Then added the `launcher_dir` flag to /etc/mesos-slave/launcher_dir on
>>>> the slaves:
>>>>
>>>> mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir
>>>> /usr/libexec/mesos
>>>>
>>>> And yet the task health-checks are still being launched from the
>>>> sandbox directory like before!
>>>>
>>>> I've also tested setting the MESOS_LAUNCHER_DIR env var and get the
>>>> identical result (just as before on the cluster where many versions of
>>>> mesos had been installed):
>>>>
>>>> STDOUT:
>>>>
>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>> --stop_timeout="0ns"
>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>> --stop_timeout="0ns"
>>>>> Registered docker executor on mesos-worker1a
>>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>> Launching health check process:
>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check
>>>>> --executor=(1)@192.168.225.58:48912
>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>>> 127.0.0.1:8000
>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>> Health check process launched at pid: 11253
>>>>
>>>>
>>>>
>>>> STDERR:
>>>>
>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>> --stop_timeout="0ns"
>>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>>> --stop_timeout="0ns"
>>>>> Registered docker executor on mesos-worker1a
>>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>> *Launching health check process:
>>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check*
>>>>> --executor=(1)@192.168.225.58:48912
>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>>> 127.0.0.1:8000
>>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>>> Health check process launched at pid: 11253
>>>>
>>>>
>>>> Any ideas on where to go from here?  Is there any additional
>>>> information I can provide?
>>>>
>>>> Thanks as always,
>>>> Jay
>>>>
>>>>
>>>> On Thu, Oct 8, 2015 at 9:23 PM, haosdent <ha...@gmail.com> wrote:
>>>>
>>>>> For flags sent to the executor from the containerizer, each flag is
>>>>> stringified and becomes a command-line parameter when the executor is launched.
>>>>>
>>>>> You could see this in
>>>>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
>>>>>
>>>>> But for launcher_dir, the executor gets it from `argv[0]`, as you
>>>>> mentioned above.
>>>>> ```
>>>>>   string path =
>>>>>     envPath.isSome() ? envPath.get()
>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>
>>>>> ```
>>>>> So I want to figure out why your argv[0] would be in the sandbox dir, not
>>>>> "/usr/libexec/mesos".
>>>>>
>>>>> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <ou...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I see.  And then how are the flags sent to the executor?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Oct 8, 2015, at 8:56 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>
>>>>>> Yes. The related code is located in
>>>>>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>>>>>>
>>>>>> In fact, environment variables that start with MESOS_ are loaded as flag
>>>>>> values.
>>>>>>
>>>>>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
>>>>>>
>>>>>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <ou...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> One question for you haosdent-
>>>>>>>
>>>>>>> You mentioned that the flags.launcher_dir should propagate to the
>>>>>>> docker executor all the way up the chain.  Can you show me where this logic
>>>>>>> is in the codebase?  I didn't see where that was happening and would like
>>>>>>> to understand the mechanism.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Jay
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>
>>>>>>> Maybe tomorrow I will build a fresh cluster from scratch to see if
>>>>>>> the broken behavior experienced today still persists.
>>>>>>>
>>>>>>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>
>>>>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting
>>>>>>> flags.launcher_dir, and mesos-docker-executor and mesos-health-check are
>>>>>>> then looked up under this dir. Although the env var itself is not propagated,
>>>>>>> MESOS_LAUNCHER_DIR still works because flags.launcher_dir is
>>>>>>> taken from it.
>>>>>>>
>>>>>>> For example, I ran
>>>>>>> ```
>>>>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>>>>> ```
>>>>>>> before starting mesos-slave. When I then launched the slave, I found this
>>>>>>> line in the slave log:
>>>>>>> ```
>>>>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>>>>>>> xxxxx  --launcher_dir="/tmp"
>>>>>>> ```
>>>>>>>
>>>>>>> From your log, I'm not sure why your MESOS_LAUNCHER_DIR became the
>>>>>>> sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in one of your
>>>>>>> other scripts?
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>>>>>>>
>>>>>>>> I just tried setting both the env var and flag on the slaves, and
>>>>>>>> have determined that the env var is not present when it is checked in
>>>>>>>> src/docker/executor.cpp @ line 573:
>>>>>>>>
>>>>>>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>>>   string path =
>>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" <<
>>>>>>>>> (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>>>>>>
>>>>>>>>
>>>>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly
>>>>>>>> propagated along up to the point of mesos-slave launch):
>>>>>>>>
>>>>>>>> $ cat /etc/default/mesos-slave
>>>>>>>>> export MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>>>>> export MESOS_PORT="5050"
>>>>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>>>>>
>>>>>>>>
>>>>>>>> TASK OUTPUT:
>>>>>>>>
>>>>>>>>
>>>>>>>>> MESOS_LAUNCHER_DIR: envpath.isSome()->no
>>>>>>>>> MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>> Launching health check process:
>>>>>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>>>>>>> --executor=(1)@192.168.225.59:44523
>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>>>>>>> sh -c \" \/bin\/bash
>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>>> Health check process launched at pid: 2519
>>>>>>>>
>>>>>>>>
>>>>>>>> The env var is not propagated when the docker executor is launched
>>>>>>>> in src/slave/containerizer/docker.cpp around line 903:
>>>>>>>>
>>>>>>>>   vector<string> argv;
>>>>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>>>>>>>   // container (to distinguish it from Docker containers not created
>>>>>>>>>   // by Mesos).
>>>>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>>>>>>       argv,
>>>>>>>>>       Subprocess::PIPE(),
>>>>>>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>>>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>>>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>>>>>>       environment,
>>>>>>>>>       lambda::bind(&setup, container->directory));
>>>>>>>>
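One possible reading of the output above, offered as an assumption rather than a confirmed diagnosis: the executor is started with argv[0] set to the bare name "mesos-docker-executor", so when the env var is absent the dirname fallback yields "." and realpath resolves that to the working directory, which would be the sandbox. The sketch below models only that lookup order; `dirnameOf` and `resolveLauncherDir` are hypothetical names, and the working directory is passed in explicitly instead of being queried from the OS.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified dirname(): a path without any '/' has the dirname ".".
std::string dirnameOf(const std::string& path) {
  const std::size_t slash = path.find_last_of('/');
  if (slash == std::string::npos) return ".";
  if (slash == 0) return "/";
  return path.substr(0, slash);
}

// Hypothetical model of the lookup in the quoted executor.cpp excerpt:
// prefer the env var (nullptr when unset, as std::getenv reports it),
// otherwise fall back to the directory of argv[0] resolved against `cwd`.
// Unlike realpath, this does not follow symlinks or normalize "..".
std::string resolveLauncherDir(const char* envValue,
                               const std::string& argv0,
                               const std::string& cwd) {
  assert(!argv0.empty());

  if (envValue != nullptr) return envValue;

  const std::string dir = dirnameOf(argv0);
  if (dir == ".") return cwd;      // bare executable name
  if (dir[0] == '/') return dir;   // already absolute
  return cwd + "/" + dir;          // relative to the working directory
}
```

With the env var unset and argv[0] equal to "mesos-docker-executor", the result is simply `cwd`, which would explain a health check path that points into the sandbox.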
>>>>>>>>
>>>>>>>> A little way above, we can see the environment is set up with the
>>>>>>>> container task's defined env vars.
>>>>>>>>
>>>>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>>>>>
>>>>>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>>>>            container->executor.command().environment().variables()) {
>>>>>>>>>     environment[variable.name()] = variable.value();
>>>>>>>>>   }
>>>>>>>>
>>>>>>>>
>>>>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>>>>>
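If the goal were to make the variable visible to the executor, one hypothetical approach would be to add it to the environment map before the subprocess call. The sketch below is not the Mesos code and not necessarily the fix the project adopted; `buildExecutorEnvironment`, `Variable`, and the parameters are illustrative names only.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for Environment::Variable: a (name, value) pair.
typedef std::pair<std::string, std::string> Variable;

// Hypothetical helper: copy the executor's variables, then add the slave's
// launcher directory so the child process can read MESOS_LAUNCHER_DIR.
std::map<std::string, std::string> buildExecutorEnvironment(
    const std::vector<Variable>& executorVariables,
    const std::string& launcherDir) {
  assert(!launcherDir.empty());

  std::map<std::string, std::string> environment;
  for (const Variable& variable : executorVariables) {
    environment[variable.first] = variable.second;
  }

  // Assigned last, so the slave's value overrides a task-supplied one.
  environment["MESOS_LAUNCHER_DIR"] = launcherDir;
  return environment;
}
```

Whether the slave's value should override a task-supplied one is a design choice; this sketch lets the slave win.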
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>>>>> 0.24.1 should work.
>>>>>>>>>
>>>>>>>>> >Do any of you know which host the path
>>>>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>>>>>>>> failing.
>>>>>>>>>
>>>>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before?
>>>>>>>>> We look up mesos-health-check under MESOS_LAUNCHER_DIR/--launcher_dir, or
>>>>>>>>> otherwise use the same dir as mesos-docker-executor.
>>>>>>>>>
>>>>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Maybe I spoke too soon.
>>>>>>>>>>
>>>>>>>>>> Now the checks are attempting to run, however the STDERR is not
>>>>>>>>>> looking good.  I've added some debugging to the error message output to
>>>>>>>>>> show the path, argv, and envp variables:
>>>>>>>>>>
>>>>>>>>>> STDOUT:
>>>>>>>>>>
>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>>> Starting task
>>>>>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>> Launching health check process:
>>>>>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>>>>>>> sh -c \" exit 1
>>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>>> Health check process launched at pid: 3012
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> STDERR:
>>>>>>>>>>
>>>>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on
>>>>>>>>>>> slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>>>>> memory limited without swap.
>>>>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in
>>>>>>>>>>> childMain
>>>>>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>>> envp=''): No such file or directory
>>>>>>>>>>> *** Aborted at 1444270649 (unix time) try "date -d @1444270649" if you
>>>>>>>>>>> are using GNU date ***
>>>>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700)
>>>>>>>>>>> from PID 3012; stack trace: ***
>>>>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>>>>> @ 0x41921c _Abort()
>>>>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>>>>> @ 0x43cc9c
>>>>>>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>>>>> @ 0x7f4a39d92827
>>>>>>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>>> should exist on? It definitely doesn't exist on the slave, hence
>>>>>>>>>> execution failing.
>>>>>>>>>>
>>>>>>>>>> This is with current master, git hash
>>>>>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>>>>>
>>>>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Jay
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Update:
>>>>>>>>>>>
>>>>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile
>>>>>>>>>>> and package the latest master (0.26.x) and deployed it to the cluster, and
>>>>>>>>>>> now health checks are working as advertised in both Marathon and my own
>>>>>>>>>>> framework!  Not sure what was going on with health-checks in 0.24.0..
>>>>>>>>>>>
>>>>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Jay
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <outtatime@gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Haosdent,
>>>>>>>>>>>>
>>>>>>>>>>>> Can you share your Marathon POST request that results in Mesos
>>>>>>>>>>>> executing the health checks?
>>>>>>>>>>>>
>>>>>>>>>>>> Since we can reference the Marathon framework, I've been doing
>>>>>>>>>>>> some digging around.
>>>>>>>>>>>>
>>>>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>>>>>
>>>>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>>>>>
>>>>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's
>>>>>>>>>>>> dependencies
>>>>>>>>>>>>
>>>>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to
>>>>>>>>>>>> /tmp/X in both the TaskFactory as well as right before the task is sent to
>>>>>>>>>>>> Mesos via driver.launchTasks:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>>>>>
>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>>>>>
>>>>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>>          CreatedTask(
>>>>>>>>>>>>>            taskInfo,
>>>>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>>>>>
>>>>>>>>>>>> $ git diff
>>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>>>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") {
>>>>>>>>>>>>> driver =>
>>>>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>>>>> +      var i = 0
>>>>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString()
>>>>>>>>>>>>> + "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>>>>>>>>> taskInfos.asJava)
>>>>>>>>>>>>>      }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Then I built and deployed the hacked Marathon and restarted the
>>>>>>>>>>>> marathon service.
>>>>>>>>>>>>
>>>>>>>>>>>> Next I created the app via the Marathon API ("hello app" is a
>>>>>>>>>>>> container with a simple hello-world ruby app running on
>>>>>>>>>>>> 0.0.0.0:8000)
>>>>>>>>>>>>
>>>>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST
>>>>>>>>>>>>> -H'Content-Type: application/json' -d'
>>>>>>>>>>>>> {
>>>>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>>>>   "apps": [
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>>>>       "container": {
>>>>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>>>>         "docker": {
>>>>>>>>>>>>>           "image":
>>>>>>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>>>>           "portMappings": [
>>>>>>>>>>>>>             {
>>>>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>>>>             }
>>>>>>>>>>>>>           ]
>>>>>>>>>>>>>         }
>>>>>>>>>>>>>       },
>>>>>>>>>>>>>       "env": {
>>>>>>>>>>>>>
>>>>>>>>>>>>>       },
>>>>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>>>>         {
>>>>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>>>>         }
>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>       "instances": 1,
>>>>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>>>>       "mem": 512
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>   ]
>>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> $ ls /tmp/
>>>>>>>>>>>>>
>>>>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>>
>>>>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Do they match?
>>>>>>>>>>>>
>>>>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, so I am confident this is the information being sent
>>>>>>>>>>>> across the wire to Mesos.
>>>>>>>>>>>>
>>>>>>>>>>>> Do they contain any health-check information?
>>>>>>>>>>>>
>>>>>>>>>>>> $ cat
>>>>>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>> {
>>>>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>
>>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>   },
>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>   },
>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>         "value":1.0
>>>>>>>>>>>>>       },
>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>     },
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>         "value":512.0
>>>>>>>>>>>>>       },
>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>     },
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>           {
>>>>>>>>>>>>>             "begin":31641,
>>>>>>>>>>>>>             "end":31641
>>>>>>>>>>>>>           }
>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>       },
>>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>     "environment":{
>>>>>>>>>>>>>       "variables":[
>>>>>>>>>>>>>         {
>>>>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>         },
>>>>>>>>>>>>>         {
>>>>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>>>>         },
>>>>>>>>>>>>>         {
>>>>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>>>>         },
>>>>>>>>>>>>>         {
>>>>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>>>>
>>>>>>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>>>>         },
>>>>>>>>>>>>>         {
>>>>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>>>>
>>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>>         },
>>>>>>>>>>>>>         {
>>>>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>         },
>>>>>>>>>>>>>         {
>>>>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>         },
>>>>>>>>>>>>>         {
>>>>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>>>>         },
>>>>>>>>>>>>>         {
>>>>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>>         }
>>>>>>>>>>>>>       ]
>>>>>>>>>>>>>     },
>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>   },
>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>
>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>         {
>>>>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>         }
>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>   }
>>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> No, I don't see anything about any health check.
>>>>>>>>>>>>
>>>>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>>>>>
>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> And STDERR:
>>>>>>>>>>>>
>>>>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered
>>>>>>>>>>>>> on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>>>>>>> memory limited without swap.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>>>>>
>>>>>>>>>>>> Any ideas of other things to try or what I could be missing?
>>>>>>>>>>>> Can't say either way about the Mesos health-check system working or not if
>>>>>>>>>>>> Marathon won't put the health-check into the task it sends to Mesos.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for all your help!
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Jay
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe you could post your executor stdout/stderr so that we
>>>>>>>>>>>>> can tell whether the health check is running or not.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Marathon also uses the Mesos health check. When I use a health
>>>>>>>>>>>>>> check, I can see log output like this in the executor stdout.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <
>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am using my own framework, and the full task info I'm
>>>>>>>>>>>>>>> using is posted earlier in this thread.  Do you happen to know if Marathon
>>>>>>>>>>>>>>> uses Mesos's health checks for its health check system?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, the health check is launched through its definition in the
>>>>>>>>>>>>>>> TaskInfo. Do you launch your task through Marathon? I could test it on my
>>>>>>>>>>>>>>> side.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <
>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Precisely, and there are none of those statements.  Are you
>>>>>>>>>>>>>>>> or others confident health-checks are part of the code path when defined
>>>>>>>>>>>>>>>> via task info for docker container tasks?  Going through the code, I wasn't
>>>>>>>>>>>>>>>> able to find the linkage for anything other than health-checks triggered
>>>>>>>>>>>>>>>> through a custom executor.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> With that being said it is a pretty good sized code base
>>>>>>>>>>>>>>>> and I'm not very familiar with it, so my analysis this far has by no means
>>>>>>>>>>>>>>>> been exhaustive.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> When the health check launches, there will be a log line like this in
>>>>>>>>>>>>>>>> your executor stdout
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <
>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output in
>>>>>>>>>>>>>>>>> the logs with the string "health" or "Health" if the health-check were
>>>>>>>>>>>>>>>>> active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an
>>>>>>>>>>>>>>>>> unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <
>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1
>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <
>>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contain this backport,
>>>>>>>>>>>>>>>>>>>> let me double check.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll
>>>>>>>>>>>>>>>>>>>>> look there :)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to
>>>>>>>>>>>>>>>>>>>>>> test it out?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks
>>>>>>>>>>>>>>>>>>>>>>> that's in master but not yet released. It will run docker exec with the
>>>>>>>>>>>>>>>>>>>>>>> command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>>>>
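The docker exec wrapping mentioned above can be sketched as follows. The exact shape is inferred from the health_check_json values printed in the executor logs elsewhere in this thread (a command of the form `docker exec <container> sh -c " <command> "`); `wrapHealthCheckCommand` is a hypothetical name, not a Mesos function, and the sketch does no escaping, so a command containing double quotes would need extra handling.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: wrap a health check command so that it runs inside
// the task's Docker container instead of on the host.
std::string wrapHealthCheckCommand(const std::string& docker,
                                   const std::string& containerName,
                                   const std::string& command) {
  assert(!containerName.empty());

  // Mirrors the spacing seen in the logged value: sh -c " <command> "
  return docker + " exec " + containerName + " sh -c \" " + command + " \"";
}
```

For example, wrapping "exit 1" for a container named mesos-abc with the docker binary "docker" gives `docker exec mesos-abc sh -c " exit 1 "`.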
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image
>>>>>>>>>>>>>>>>>>>>>>> tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see
>>>>>>>>>>>>>>>>>>>>>>> if they ever run the command (in this case `sleep 5`), but have not found
>>>>>>>>>>>>>>>>>>>>>>> any indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked
>>>>>>>>>>>>>>>>>>>>>>> from src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does
>>>>>>>>>>>>>>>>>>>>>>> this mean that health-checks are only supported for custom executors and
>>>>>>>>>>>>>>>>>>>>>>> not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the
>>>>>>>>>>>>>>>>>>>>>>> 0/non-zero exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>>>>>

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ja...@jaytaylor.com>.
Hi Marco,

My reply is inline below-

On Mon, Oct 12, 2015 at 2:27 PM, Marco Massenzio <ma...@mesosphere.io>
wrote:

>
> On Mon, Oct 12, 2015 at 11:26 PM, Marco Massenzio <ma...@mesosphere.io>
> wrote:
>
>> Are those the stdout logs of the Agent? Because I don't see the
>> --launcher-dir set, however, if I look into one that is running off the
>> same 0.24.1 package, this is what I see:
>>
>> I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup:
>> --appc_store_dir="/tmp/mesos/store/appc"
>> --attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5"
>> --cgroups_cpu_enable_pids_and_tids_count="false"
>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>> --container_disk_watch_interval="15secs" --containerizers="docker,mesos"
>> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>> --enforce_container_disk_quota="false"
>> --executor_registration_timeout="1mins"
>> --executor_shutdown_grace_period="5secs"
>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>> --hadoop_home="" --help="false" --initialize_driver_logging="true"
>> --ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem"
>> --launcher_dir="/usr/libexec/mesos"
>> --log_dir="/var/local/mesos/logs/agent" --logbufsecs="0"
>> --logging_level="INFO" --master="zk://192.168.33.1:2181/mesos/vagrant"
>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>> --registration_backoff_factor="1secs"
>> --resource_monitoring_interval="1secs"
>> --resources="ports:[9000-10000];ephemeral_ports:[32768-57344]"
>> --revocable_cpu_low_priority="true"
>> --sandbox_directory="/var/local/sandbox" --strict="true"
>> --switch_user="true" --version="false" --work_dir="/var/local/mesos/agent"
>>
> (this is run off the Vagrantfile at [0] in case you want to reproduce).
>> That agent is not run via the init command, though, I execute it manually
>> via the `run-agent.sh` in the same directory.
>>
>> I don't really think this matters, but I assume you also restarted the
>> agent after making the config changes?
>> (and, for your own sanity - you can double check the version by looking
>> at the very head of the logs).
>>
>
Yes I definitely restarted all mesos processes after config changes :)

Here's info equivalent to what you posted, from one of the slaves' INFO logs:

Log file created at: 2015/10/12 20:22:58
> Running on machine: mesos-worker2a
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> I1012 20:22:58.469779  9605 logging.cpp:172] INFO level logging started!
> I1012 20:22:58.470006  9605 main.cpp:185] Build: 2015-09-25 19:13:24 by
> root
> I1012 20:22:58.470023  9605 main.cpp:187] Version: 0.24.1
> I1012 20:22:58.470031  9605 main.cpp:190] Git tag: 0.24.1
> I1012 20:22:58.470039  9605 main.cpp:194] Git SHA:
> 44873806c2bb55da37e9adbece938274d8cd7c48
> I1012 20:22:58.470221  9605 containerizer.cpp:143] Using isolation:
> posix/cpu,posix/mem,filesystem/posix
> I1012 20:22:58.573750  9605 main.cpp:272] Starting Mesos slave
> I1012 20:22:58.574662  9621 slave.cpp:190] Slave started on 1)@
> 192.168.225.59:5050
> I1012 20:22:58.574695  9621 slave.cpp:191] Flags at startup:
> --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5"
> --cgroups_cpu_enable_pids_and_tids_count="false"
> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
> --cgroups_limit_swap="false" --cgroups_root="mesos"
> --container_disk_watch_interval="15secs" --containerizers="mesos,docker"
> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
> --enforce_container_disk_quota="false"
> --executor_registration_timeout="5mins"
> --executor_shutdown_grace_period="5secs"
> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
> --hadoop_home="" --help="false"
> --hostname="mesos-worker2a-hobart.gigawatt.io"
> --initialize_driver_logging="true"
> --ip="192.168.225.59" --isolation="posix/cpu,posix/mem"
> *--launcher_dir="/usr/libexec/mesos"* --log_dir="/var/log/mesos"
> --logbufsecs="0" --logging_level="INFO"
> --master="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
> --perf_interval="1mins" --port="5050" --qos_correction_interval_min="0ns"
> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
> --registration_backoff_factor="1secs"
> --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true"
> --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
> --switch_user="true" --version="false" --work_dir="/tmp/mesos"


The launcher dir is picked up by the mesos-slave process.  We can also see
the cmdline flag is picked up from /etc/mesos-slave like this:

mesos-worker2a:~$ ps -ef | grep mesos
> root      9605     1  1 20:22 ?        00:01:18 /usr/sbin/mesos-slave
> --ip=192.168.225.59 --log_dir=/var/log/mesos
> *--launcher_dir=/usr/libexec/mesos*
> root      9612  9605  0 20:22 ?        00:00:00 logger -p user.info -t mesos-slave[9605]
> root      9613  9605  0 20:22 ?        00:00:00 logger -p user.err -t
> mesos-slave[9605]
> vagrant   9951  6010  0 21:36 pts/0    00:00:00 grep --color=auto mesos



What I keep coming back to is the fact that the MESOS_LAUNCHER_DIR env var
does not seem to get picked up here:
https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L573-L576

  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>   string path =
>     envPath.isSome() ? envPath.get()
>                      : os::realpath(Path(argv[0]).dirname()).get();


And argv[0] (which contains the slave work dir) is the path we see in the
task's stdout.
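
To make the fallback concrete, here is a minimal standalone sketch of the same
decision in plain C++. It is not the Mesos code: dirnameOf, resolveLauncherDir
and healthCheckPath are hypothetical names, and unlike os::realpath() this
version does not resolve symlinks or relative paths. It only shows that when
the env var is absent, the health-check binary is looked up next to whatever
argv[0] points at.

```cpp
#include <string>

// Hypothetical helper: the directory part of a path. No symlink or
// relative-path resolution is done here (os::realpath would do that).
std::string dirnameOf(const std::string& path) {
  const std::string::size_type slash = path.find_last_of('/');
  if (slash == std::string::npos) return ".";  // bare program name
  if (slash == 0) return "/";                  // file directly under root
  return path.substr(0, slash);
}

// envValue is whatever getenv("MESOS_LAUNCHER_DIR") returned
// (nullptr when the variable is not set in this process).
std::string resolveLauncherDir(const char* envValue,
                               const std::string& argv0) {
  return envValue != nullptr ? std::string(envValue) : dirnameOf(argv0);
}

// The health-check binary is then expected inside that directory.
std::string healthCheckPath(const char* envValue,
                            const std::string& argv0) {
  return resolveLauncherDir(envValue, argv0) + "/mesos-health-check";
}
```

With the env var unset and argv[0] located under the sandbox, the result is
<sandbox>/mesos-health-check, which has the same shape as the path in the
task's stdout. If argv[0] were only a bare program name, the directory part
would be ".", which a realpath call would resolve against the working
directory; whether that is what happens here is only a guess.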

I'm still having trouble understanding how flags defined in
mesos::internal::slave::Flags::Flags (src/slave/flags.[ch]pp) are
propagated or expanded to MESOS_<flag_in_caps> environment variables.  Can
you confirm if such a mechanism exists and if so where it is?
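
For reference, the opposite direction (MESOS_-prefixed environment variables
being loaded as flag values, as described further down in this thread) can be
sketched roughly as follows. This is an illustration only, not stout's
implementation; flagsFromEnvironment is a made-up name, and the environment is
passed in as a map so the example stays self-contained.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical sketch: turn PREFIX_SOME_NAME=value entries into
// some_name -> value flag entries. Variables without the prefix are ignored.
std::map<std::string, std::string> flagsFromEnvironment(
    const std::map<std::string, std::string>& env,
    const std::string& prefix) {
  std::map<std::string, std::string> flags;
  for (const auto& entry : env) {
    const std::string& name = entry.first;
    // Skip names that do not start with the prefix or are only the prefix.
    if (name.size() <= prefix.size() ||
        name.compare(0, prefix.size(), prefix) != 0) {
      continue;
    }
    // Strip the prefix and lowercase the rest to get the flag name.
    std::string flag = name.substr(prefix.size());
    for (char& c : flag) {
      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    flags[flag] = entry.second;
  }
  return flags;
}
```

Note that this only covers env var to flag within a single process. It says
nothing about a flag being turned back into an env var for a child process,
which is the part in question here.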

Otherwise, if my understanding is correct and such a mechanism doesn't
exist:

How can the requisite MESOS_LAUNCHER_DIR env var be available when
docker/executor.cpp (a child process of mesos-slave) attempts to read it?

The lack of such a mechanism would explain the behavior I'm currently
observing.
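
If no such mechanism exists, one way the variable could reach the child is for
the parent to add it explicitly to the environment it hands to the executor.
The sketch below only illustrates that idea under that assumption;
buildExecutorEnvironment is a hypothetical helper, not a Mesos function, and
this is not a claim about how Mesos does or should handle it.

```cpp
#include <map>
#include <string>

// Hypothetical sketch: start from the task-defined variables and, if the
// parent knows a launcher dir, export it to the child as MESOS_LAUNCHER_DIR.
std::map<std::string, std::string> buildExecutorEnvironment(
    const std::map<std::string, std::string>& executorVariables,
    const std::string& launcherDir) {
  std::map<std::string, std::string> environment = executorVariables;
  if (!launcherDir.empty()) {
    // Assumption: the parent's setting should win over a task-defined value.
    environment["MESOS_LAUNCHER_DIR"] = launcherDir;
  }
  return environment;
}
```

With an empty launcherDir the child sees only the task-defined variables, so a
getenv("MESOS_LAUNCHER_DIR") in the child would come back empty, which is
consistent with the behavior described above.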

Thanks!
Jay


>>
>> [0] http://github.com/massenz/zk-mesos
>
>>
>>
>>
>>
>> --
>> *Marco Massenzio*
>> Distributed Systems Engineer
>> http://codetrips.com
>>
>> On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <ou...@gmail.com> wrote:
>>
>>> Hi Haosdent and Mesos friends,
>>>
>>> I've rebuilt the cluster from scratch and installed mesos 0.24.1 from
>>> the mesosphere apt repo:
>>>
>>> $ dpkg -l | grep mesos
>>> ii  mesos                               0.24.1-0.2.35.ubuntu1404
>>>    amd64        Cluster resource manager with efficient resource isolation
>>>
>>> Then added the `launcher_dir` flag to /etc/mesos-slave/launcher_dir on
>>> the slaves:
>>>
>>> mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir
>>> /usr/libexec/mesos
>>>
>>> And yet the task health-checks are still being launched from the sandbox
>>> directory like before!
>>>
>>> I've also tested setting the MESOS_LAUNCHER_DIR env var and get the
>>> identical result (just as before on the cluster where many versions of
>>> mesos had been installed):
>>>
>>> STDOUT:
>>>
>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>> --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>> --stop_timeout="0ns"
>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>> --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>> --stop_timeout="0ns"
>>>> Registered docker executor on mesos-worker1a
>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>> Launching health check process:
>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check
>>>> --executor=(1)@192.168.225.58:48912
>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>> 127.0.0.1:8000
>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>> Health check process launched at pid: 11253
>>>
>>>
>>>
>>> STDERR:
>>>
>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>> --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>> --stop_timeout="0ns"
>>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>> --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>>> --stop_timeout="0ns"
>>>> Registered docker executor on mesos-worker1a
>>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>> *Launching health check process:
>>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check*
>>>> --executor=(1)@192.168.225.58:48912
>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>>> 127.0.0.1:8000
>>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>>> Health check process launched at pid: 11253
>>>
>>>
>>> Any ideas on where to go from here?  Is there any additional information
>>> I can provide?
>>>
>>> Thanks as always,
>>> Jay
>>>
>>>
>>> On Thu, Oct 8, 2015 at 9:23 PM, haosdent <ha...@gmail.com> wrote:
>>>
>>>> Flags sent from the containerizer to the executor are stringified and
>>>> passed as command-line parameters when the executor is launched.
>>>>
>>>> You can see this in
>>>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
>>>>
>>>> But for launcher_dir, the executor gets it from `argv[0]`, as you
>>>> mentioned above.
>>>> ```
>>>>   string path =
>>>>     envPath.isSome() ? envPath.get()
>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>
>>>> ```
>>>> So I want to figure out why your argv[0] would be the sandbox dir, not
>>>> "/usr/libexec/mesos".
>>>>
>>>> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <ou...@gmail.com>
>>>> wrote:
>>>>
>>>>> I see.  And then how are the flags sent to the executor?
>>>>>
>>>>>
>>>>>
>>>>> On Oct 8, 2015, at 8:56 PM, haosdent <ha...@gmail.com> wrote:
>>>>>
>>>>> Yes. The related code is located in
>>>>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>>>>>
>>>>> In fact, environment variables that start with MESOS_ are loaded as
>>>>> flag values.
>>>>>
>>>>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
>>>>>
>>>>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <ou...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> One question for you haosdent-
>>>>>>
>>>>>> You mentioned that the flags.launcher_dir should propagate to the
>>>>>> docker executor all the way up the chain.  Can you show me where this logic
>>>>>> is in the codebase?  I didn't see where that was happening and would like
>>>>>> to understand the mechanism.
>>>>>>
>>>>>> Thanks!
>>>>>> Jay
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>
>>>>>> Maybe tomorrow I will build a fresh cluster from scratch to see if
>>>>>> the broken behavior experienced today still persists.
>>>>>>
>>>>>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>
>>>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting
>>>>>> flags.launcher_dir, and mesos-docker-executor and mesos-health-check
>>>>>> are then looked up under that dir. Although the env var is not
>>>>>> propagated, MESOS_LAUNCHER_DIR still works because flags.launcher_dir
>>>>>> is derived from it.
>>>>>>
>>>>>> For example, I ran
>>>>>> ```
>>>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>>>> ```
>>>>>> before starting mesos-slave, so when I launch the slave I can find
>>>>>> this line in the slave log
>>>>>> ```
>>>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>>>>>> xxxxx  --launcher_dir="/tmp"
>>>>>> ```
>>>>>>
>>>>>> And from your log, I'm not sure why your MESOS_LAUNCHER_DIR becomes
>>>>>> the sandbox dir. Is MESOS_LAUNCHER_DIR overridden in one of your
>>>>>> other scripts?
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>>>>>>
>>>>>>> I just tried setting both the env var and flag on the slaves, and
>>>>>>> have determined that the env var is not present when it is checked in
>>>>>>> src/docker/executor.cpp @ line 573:
>>>>>>>
>>>>>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>>   string path =
>>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" <<
>>>>>>>> (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>>>>>
>>>>>>>
>>>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly
>>>>>>> propagated along up to the point of mesos-slave launch):
>>>>>>>
>>>>>>> $ cat /etc/default/mesos-slave
>>>>>>>> export
>>>>>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>>>> export MESOS_PORT="5050"
>>>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>>>>
>>>>>>>
>>>>>>> TASK OUTPUT:
>>>>>>>
>>>>>>>
>>>>>>>> *MESOS_LAUNCHER_DIR: envpath.isSome()->no*
>>>>>>>> *MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'*
>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>> Launching health check process:
>>>>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>>>>>> --executor=(1)@192.168.225.59:44523
>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>>>>>> sh -c \" \/bin\/bash
>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>>> Health check process launched at pid: 2519
>>>>>>>
>>>>>>>
>>>>>>> The env var is not propagated when the docker executor is launched
>>>>>>> in src/slave/containerizer/docker.cpp around line 903:
>>>>>>>
>>>>>>>   vector<string> argv;
>>>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>>>   // Construct the mesos-docker-executor using the "name" we gave
>>>>>>>> the
>>>>>>>>   // container (to distinguish it from Docker containers not created
>>>>>>>>   // by Mesos).
>>>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>>>>>       argv,
>>>>>>>>       Subprocess::PIPE(),
>>>>>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>>>>>       environment,
>>>>>>>>       lambda::bind(&setup, container->directory));
>>>>>>>
>>>>>>>
>>>>>>> A little way above, we can see the environment is set up with the
>>>>>>> env vars defined for the container task.
>>>>>>>
>>>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>>>>
>>>>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>>>            container->executor.command().environment().variables())
>>>>>>>> {
>>>>>>>>     environment[variable.name()] = variable.value();
>>>>>>>>   }
>>>>>>>
>>>>>>>
>>>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>
>>>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>>>> 0.24.1 should work.
>>>>>>>>
>>>>>>>> >Do any of you know which host the path
>>>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>>>>>>> failing.
>>>>>>>>
>>>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before?
>>>>>>>> We look up mesos-health-check in MESOS_LAUNCHER_DIR/--launcher_dir,
>>>>>>>> or otherwise in the same dir as mesos-docker-executor.
>>>>>>>>
>>>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Maybe I spoke too soon.
>>>>>>>>>
>>>>>>>>> Now the checks are attempting to run, however the STDERR is not
>>>>>>>>> looking good.  I've added some debugging to the error message output to
>>>>>>>>> show the path, argv, and envp variables:
>>>>>>>>>
>>>>>>>>> STDOUT:
>>>>>>>>>
>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>>> Starting task
>>>>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>> Launching health check process:
>>>>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>>>>>> sh -c \" exit 1
>>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>>> Health check process launched at pid: 3012
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> STDERR:
>>>>>>>>>
>>>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on
>>>>>>>>>> slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>>>> memory limited without swap.
>>>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in
>>>>>>>>>> childMain
>>>>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>>> envp=''): No such file or directory
>>>>>>>>>> *** Aborted at 1444270649 (unix time)
>>>>>>>>>> try "date -d @1444270649" if you are using GNU date ***
>>>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700)
>>>>>>>>>> from PID 3012; stack trace: ***
>>>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>>>> @ 0x41921c _Abort()
>>>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>>>> @ 0x43cc9c
>>>>>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>>>> @ 0x7f4a39d92827
>>>>>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>>> should exist on? It definitely doesn't exist on the slave, hence
>>>>>>>>> execution failing.
>>>>>>>>>
>>>>>>>>> This is with current master, git hash
>>>>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>>>>
>>>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Jay
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Update:
>>>>>>>>>>
>>>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile
>>>>>>>>>> and package the latest master (0.26.x) and deployed it to the cluster, and
>>>>>>>>>> now health checks are working as advertised in both Marathon and my own
>>>>>>>>>> framework!  Not sure what was going on with health-checks in 0.24.0.
>>>>>>>>>>
>>>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Jay
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Haosdent,
>>>>>>>>>>>
>>>>>>>>>>> Can you share your Marathon POST request that results in Mesos
>>>>>>>>>>> executing the health checks?
>>>>>>>>>>>
>>>>>>>>>>> Since we can reference the Marathon framework, I've been doing
>>>>>>>>>>> some digging around.
>>>>>>>>>>>
>>>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>>>>
>>>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>>>>
>>>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's
>>>>>>>>>>> dependencies
>>>>>>>>>>>
>>>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to
>>>>>>>>>>> /tmp/X both in the TaskFactory and right before the task is sent to
>>>>>>>>>>> Mesos via driver.launchTasks:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>>>>
>>>>>>>>>>> $ git diff
>>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>>>>
>>>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>>          CreatedTask(
>>>>>>>>>>>>            taskInfo,
>>>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>>>>
>>>>>>>>>>> $ git diff
>>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") {
>>>>>>>>>>>> driver =>
>>>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>>>> +      var i = 0
>>>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>>> +        import java.io._
>>>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() +
>>>>>>>>>>>> "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>> +      }
>>>>>>>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>>>>>>>> taskInfos.asJava)
>>>>>>>>>>>>      }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Then I built and deployed the hacked Marathon and restarted the
>>>>>>>>>>> marathon service.
>>>>>>>>>>>
>>>>>>>>>>> Next I created the app via the Marathon API ("hello app" is a
>>>>>>>>>>> container with a simple hello-world ruby app running on
>>>>>>>>>>> 0.0.0.0:8000)
>>>>>>>>>>>
>>>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST
>>>>>>>>>>>> -H'Content-Type: application/json' -d'
>>>>>>>>>>>> {
>>>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>>>   "apps": [
>>>>>>>>>>>>     {
>>>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>>>       "container": {
>>>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>>>         "docker": {
>>>>>>>>>>>>           "image":
>>>>>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>>>           "portMappings": [
>>>>>>>>>>>>             {
>>>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>>>             }
>>>>>>>>>>>>           ]
>>>>>>>>>>>>         }
>>>>>>>>>>>>       },
>>>>>>>>>>>>       "env": {
>>>>>>>>>>>>
>>>>>>>>>>>>       },
>>>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>>>         }
>>>>>>>>>>>>       ],
>>>>>>>>>>>>       "instances": 1,
>>>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>>>       "mem": 512
>>>>>>>>>>>>     }
>>>>>>>>>>>>   ]
>>>>>>>>>>>> }'
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> $ ls /tmp/
>>>>>>>>>>>>
>>>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>>
>>>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Do they match?
>>>>>>>>>>>
>>>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Yes, so I am confident this is the information being sent across
>>>>>>>>>>> the wire to Mesos.
>>>>>>>>>>>
>>>>>>>>>>> Do they contain any health-check information?
>>>>>>>>>>>
>>>>>>>>>>> $ cat
>>>>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>> {
>>>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>
>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>   },
>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>   },
>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>     {
>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>         "value":1.0
>>>>>>>>>>>>       },
>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>     },
>>>>>>>>>>>>     {
>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>         "value":512.0
>>>>>>>>>>>>       },
>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>     },
>>>>>>>>>>>>     {
>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>           {
>>>>>>>>>>>>             "begin":31641,
>>>>>>>>>>>>             "end":31641
>>>>>>>>>>>>           }
>>>>>>>>>>>>         ]
>>>>>>>>>>>>       },
>>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>>     }
>>>>>>>>>>>>   ],
>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>     "environment":{
>>>>>>>>>>>>       "variables":[
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>         },
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>>>         },
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>>>         },
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>>>
>>>>>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>>>         },
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>>>
>>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>>         },
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>         },
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>         },
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>>>         },
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>>         }
>>>>>>>>>>>>       ]
>>>>>>>>>>>>     },
>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>   },
>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>
>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>         {
>>>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>         }
>>>>>>>>>>>>       ],
>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>     }
>>>>>>>>>>>>   }
>>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> No, I don't see anything about any health check.
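The same check can be scripted instead of done by eye. A minimal Python sketch, assuming the dumped file holds a single TaskInfo JSON object and that the health check, when present, appears under the top-level "health_check" key as in the TaskInfo posted at the start of this thread. has_health_check is a hypothetical helper name, not part of Mesos or Marathon:

```python
import json


def has_health_check(task_info_json):
    """Return True if the TaskInfo JSON text carries a health_check field."""
    task_info = json.loads(task_info_json)
    # Assumption: the field is serialized under the protobuf field name.
    return "health_check" in task_info


# Usage with two tiny hand-written TaskInfo fragments (illustrative only).
with_check = '{"name": "a", "health_check": {"interval_seconds": 10}}'
without_check = '{"name": "b", "command": {"shell": false}}'
print(has_health_check(with_check), has_health_check(without_check))
```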
>>>>>>>>>>>
>>>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>>>>
>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>>> Starting task
>>>>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> And STDERR:
>>>>>>>>>>>
>>>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered
>>>>>>>>>>>> on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>>>>>> memory limited without swap.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>>>>
>>>>>>>>>>> Any ideas of other things to try or what I could be missing?
>>>>>>>>>>> Can't say either way about the Mesos health-check system working or not if
>>>>>>>>>>> Marathon won't put the health-check into the task it sends to Mesos.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for all your help!
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Jay
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Maybe you could post your executor stdout/stderr so that we
>>>>>>>>>>>> can tell whether the health check is running or not.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Marathon also uses the Mesos health check. When I use a health check,
>>>>>>>>>>>>> I see log output like this in the executor stdout.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>>>> ```
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <
>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am using my own framework, and the full task info I'm using
>>>>>>>>>>>>>> is posted earlier in this thread.  Do you happen to know if Marathon uses
>>>>>>>>>>>>>> Mesos's health checks for its health check system?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, the health check is launched from its definition in the
>>>>>>>>>>>>>> TaskInfo. Do you launch your task through Marathon? I could test it on
>>>>>>>>>>>>>> my side.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <
>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Precisely, and there are none of those statements.  Are you
>>>>>>>>>>>>>>> or others confident health-checks are part of the code path when defined
>>>>>>>>>>>>>>> via task info for docker container tasks?  Going through the code, I wasn't
>>>>>>>>>>>>>>> able to find the linkage for anything other than health-checks triggered
>>>>>>>>>>>>>>> through a custom executor.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With that being said it is a pretty good sized code base and
>>>>>>>>>>>>>>> I'm not very familiar with it, so my analysis this far has by no means been
>>>>>>>>>>>>>>> exhaustive.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When the health check launches, there will be a log line like
>>>>>>>>>>>>>>> this in your executor stdout:
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <
>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output in
>>>>>>>>>>>>>>>> the logs with the string "health" or "Health" if the health-check were
>>>>>>>>>>>>>>>> active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether
>>>>>>>>>>>>>>>> an unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <
>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <
>>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this backport; let
>>>>>>>>>>>>>>>>>>> me double check.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll
>>>>>>>>>>>>>>>>>>>> look there :)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to
>>>>>>>>>>>>>>>>>>>>> test it out?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks
>>>>>>>>>>>>>>>>>>>>>> that's in master but not yet released. It will run docker exec with the
>>>>>>>>>>>>>>>>>>>>>> command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image
>>>>>>>>>>>>>>>>>>>>>> tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if
>>>>>>>>>>>>>>>>>>>>>> they ever run the command (in this case `sleep 5`), but have not found any
>>>>>>>>>>>>>>>>>>>>>> indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked
>>>>>>>>>>>>>>>>>>>>>> from src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does
>>>>>>>>>>>>>>>>>>>>>> this mean that health-checks are only supported for custom executors and
>>>>>>>>>>>>>>>>>>>>>> not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the
>>>>>>>>>>>>>>>>>>>>>> 0/non-zero exit-status of a health-check command translate to task health.
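As an illustration of that mapping only (not of how Mesos implements it), here is a minimal Python sketch. It assumes the check is a shell command, that exit status 0 means healthy, and that a command exceeding its timeout counts as unhealthy; the timeout handling is my assumption, and is_healthy is a hypothetical name:

```python
import subprocess


def is_healthy(command, timeout_seconds):
    """Run a shell command and map its exit status to a health verdict."""
    try:
        completed = subprocess.run(command, shell=True, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Assumption: a check that does not finish in time is a failure.
        return False
    # Exit status 0 is healthy, anything else is unhealthy.
    return completed.returncode == 0


# Usage: "exit 1" always fails, which makes it a handy test command.
print(is_healthy("exit 0", 5), is_healthy("exit 1", 5))
```

A scheduler would typically only act after several such failures in a row, which is what the consecutive_failures field in the TaskInfo above expresses.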
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>> Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Marco Massenzio <ma...@mesosphere.io>.
On Mon, Oct 12, 2015 at 11:26 PM, Marco Massenzio <ma...@mesosphere.io>
wrote:

> Are those the stdout logs of the Agent? I ask because I don't see
> --launcher_dir set in them; however, if I look at an agent that is running
> off the same 0.24.1 package, this is what I see:
>
> I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup:
> --appc_store_dir="/tmp/mesos/store/appc"
> --attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5"
> --cgroups_cpu_enable_pids_and_tids_count="false"
> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
> --cgroups_limit_swap="false" --cgroups_root="mesos"
> --container_disk_watch_interval="15secs" --containerizers="docker,mesos"
> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
> --enforce_container_disk_quota="false"
> --executor_registration_timeout="1mins"
> --executor_shutdown_grace_period="5secs"
> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
> --hadoop_home="" --help="false" --initialize_driver_logging="true"
> --ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem"
> --launcher_dir="/usr/libexec/mesos"
> --log_dir="/var/local/mesos/logs/agent" --logbufsecs="0"
> --logging_level="INFO" --master="zk://192.168.33.1:2181/mesos/vagrant"
> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
> --registration_backoff_factor="1secs"
> --resource_monitoring_interval="1secs"
> --resources="ports:[9000-10000];ephemeral_ports:[32768-57344]"
> --revocable_cpu_low_priority="true"
> --sandbox_directory="/var/local/sandbox" --strict="true"
> --switch_user="true" --version="false" --work_dir="/var/local/mesos/agent"
> (this is run off the Vagrantfile at [0] in case you want to reproduce).
> That agent is not run via the init command, though, I execute it manually
> via the `run-agent.sh` in the same directory.
>
> I don't really think this matters, but I assume you also restarted the
> agent after making the config changes?
> (and, for your own sanity - you can double check the version by looking at
> the very head of the logs).
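For anyone who wants to script that check, a minimal Python sketch follows. It assumes the flags appear in the log as --name="value" pairs, as in the "Flags at startup" line quoted above; parse_startup_flags is a hypothetical helper, not a Mesos tool:

```python
import re

# Matches pairs such as --launcher_dir="/usr/libexec/mesos".
FLAG_RE = re.compile(r'--([A-Za-z0-9_]+)="([^"]*)"')


def parse_startup_flags(log_text):
    """Collect every --name="value" pair found in the log text into a dict."""
    return dict(FLAG_RE.findall(log_text))


# Usage with a shortened startup line (illustrative, not a full agent log).
line = ('I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup: '
        '--frameworks_home="" --launcher_dir="/usr/libexec/mesos" --port="5051"')
flags = parse_startup_flags(line)
print(flags.get("launcher_dir", "<launcher_dir not set>"))
```

Because the pattern is applied to the whole text, it also works when the mail client has wrapped the flags across several lines.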
>
>
> [0] http://github.com/massenz/zk-mesos

>
>
>
>
> --
> *Marco Massenzio*
> Distributed Systems Engineer
> http://codetrips.com
>
> On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <ou...@gmail.com> wrote:
>
>> Hi Haosdent and Mesos friends,
>>
>> I've rebuilt the cluster from scratch and installed mesos 0.24.1 from the
>> mesosphere apt repo:
>>
>> $ dpkg -l | grep mesos
>> ii  mesos                               0.24.1-0.2.35.ubuntu1404
>>    amd64        Cluster resource manager with efficient resource isolation
>>
>> Then added the `launcher_dir` flag to /etc/mesos-slave/launcher_dir on
>> the slaves:
>>
>> mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir
>> /usr/libexec/mesos
>>
>> And yet the task health-checks are still being launched from the sandbox
>> directory like before!
>>
>> I've also tested setting the MESOS_LAUNCHER_DIR env var and get the
>> identical result (just as before on the cluster where many versions of
>> mesos had been installed):
>>
>> STDOUT:
>>
>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>> --logbufsecs="0" --logging_level="INFO"
>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>> --stop_timeout="0ns"
>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>> --logbufsecs="0" --logging_level="INFO"
>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>> --stop_timeout="0ns"
>>> Registered docker executor on mesos-worker1a
>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>> Launching health check process:
>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check
>>> --executor=(1)@192.168.225.58:48912
>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>> 127.0.0.1:8000
>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>> Health check process launched at pid: 11253
>>
>>
>>
>> STDERR:
>>
>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>> --logbufsecs="0" --logging_level="INFO"
>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>> --stop_timeout="0ns"
>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>> --logbufsecs="0" --logging_level="INFO"
>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>> --stop_timeout="0ns"
>>> Registered docker executor on mesos-worker1a
>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>> *Launching health check process:
>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check*
>>> --executor=(1)@192.168.225.58:48912
>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>> 127.0.0.1:8000
>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>> Health check process launched at pid: 11253
>>
>>
>> Any ideas on where to go from here?  Is there any additional information
>> I can provide?
>>
>> Thanks as always,
>> Jay
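To compare where the health check binary is launched from with the configured launcher_dir, a small Python sketch can pull the path out of the executor output. It assumes the stdout contains a "Launching health check process:" line followed by the binary path, as in the output above; health_check_binary_dir is a hypothetical helper and the sandbox path in the usage is shortened for illustration:

```python
import os
import re


def health_check_binary_dir(executor_stdout):
    """Return the directory of the launched health check binary, or None."""
    match = re.search(r"Launching health check process:\s*(\S+)", executor_stdout)
    if match is None:
        return None  # no health check was launched according to this output
    return os.path.dirname(match.group(1))


# Usage: flag a mismatch between the observed and the expected directory.
stdout = ("Starting task hello-app\n"
          "Launching health check process:\n"
          "/tmp/mesos/slaves/S0/runs/abc/mesos-health-check --executor=(1)@host:1\n")
expected = "/usr/libexec/mesos"
observed = health_check_binary_dir(stdout)
if observed != expected:
    print("health check launched from", observed, "instead of", expected)
```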
>>
>>
>> On Thu, Oct 8, 2015 at 9:23 PM, haosdent <ha...@gmail.com> wrote:
>>
>>> Flags sent to the executor from the containerizer are stringified and
>>> become command line parameters when the executor is launched.
>>>
>>> You could see this in
>>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
>>>
>>> But for launcher_dir, the executor gets it from `argv[0]`, as you
>>> mentioned above.
>>> ```
>>>   string path =
>>>     envPath.isSome() ? envPath.get()
>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>
>>> ```
>>> So I want to figure out why your argv[0] would be the sandbox dir, not
>>> "/usr/libexec/mesos".
>>>
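
The lookup in the snippet above can be sketched as follows. This is an approximation under stated assumptions, not the Mesos implementation: `posix_dirname` assumes the C++ `Path::dirname` behaves like POSIX dirname(3) and returns "." for a name without a slash, and the sketch joins paths lexically instead of calling a real `realpath`, so symlinks are not resolved. The function names and the `cwd` parameter are hypothetical.

```python
import posixpath


def posix_dirname(path):
    """Approximate POSIX dirname(3): a bare file name yields "."."""
    head = posixpath.dirname(path)
    return head if head else "."


def resolve_launcher_dir(argv0, environ, cwd):
    """Prefer MESOS_LAUNCHER_DIR from the environment, else use argv[0]'s directory.

    `cwd` stands in for the process working directory so the example stays
    deterministic; symlinks are not resolved here.
    """
    env_path = environ.get("MESOS_LAUNCHER_DIR")
    if env_path is not None:
        return env_path
    return posixpath.normpath(posixpath.join(cwd, posix_dirname(argv0)))


if __name__ == "__main__":
    sandbox = "/tmp/mesos/slaves/S1/runs/example"  # placeholder sandbox path
    print(resolve_launcher_dir("mesos-docker-executor", {}, sandbox))
    print(resolve_launcher_dir("/usr/libexec/mesos/mesos-docker-executor", {}, sandbox))
```

Under these assumptions, a bare `argv[0]` with no environment variable set resolves to the working directory, while an absolute `argv[0]` resolves to the directory containing the binary.
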
>>> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>
>>>> I see.  And then how are the flags sent to the executor?
>>>>
>>>>
>>>>
>>>> On Oct 8, 2015, at 8:56 PM, haosdent <ha...@gmail.com> wrote:
>>>>
>>>> Yes. The related code is located in
>>>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>>>>
>>>> In fact, environment variables that start with MESOS_ are loaded as
>>>> flag values.
>>>>
>>>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
>>>>
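
A rough sketch of the prefix-based loading described above, assuming the prefix is stripped and the remainder lowercased to form the flag name, and that only names matching a declared flag are kept. The helper `flags_from_environment` and its parameters are hypothetical; the actual behavior is defined in the flags.hpp file linked above.

```python
def flags_from_environment(environ, known_flags, prefix="MESOS_"):
    """Map PREFIX_NAME=value environment entries to {name: value} flag values.

    Illustrative only: names are lowercased after the prefix is removed, and
    entries that do not correspond to a declared flag are skipped.
    """
    loaded = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue
        name = key[len(prefix):].lower()
        if name in known_flags:
            loaded[name] = value
    return loaded


if __name__ == "__main__":
    env = {
        "MESOS_LAUNCHER_DIR": "/usr/libexec/mesos",
        "MESOS_PORT": "5050",
        "HOME": "/root",
    }
    print(flags_from_environment(env, {"launcher_dir", "port"}))
```
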
>>>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <ou...@gmail.com>
>>>> wrote:
>>>>
>>>>> One question for you haosdent-
>>>>>
>>>>> You mentioned that the flags.launcher_dir should propagate to the
>>>>> docker executor all the way up the chain.  Can you show me where this logic
>>>>> is in the codebase?  I didn't see where that was happening and would like
>>>>> to understand the mechanism.
>>>>>
>>>>> Thanks!
>>>>> Jay
>>>>>
>>>>>
>>>>>
>>>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>
>>>>> Maybe tomorrow I will build a fresh cluster from scratch to see if the
>>>>> broken behavior experienced today still persists.
>>>>>
>>>>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
>>>>>
>>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting
>>>>> flags.launcher_dir, which is where mesos-docker-executor
>>>>> and mesos-health-check are looked up. Although the env var is not
>>>>> propagated, MESOS_LAUNCHER_DIR still works because flags.launcher_dir
>>>>> is derived from it.
>>>>>
>>>>> For example, I ran
>>>>> ```
>>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>>> ```
>>>>> before starting mesos-slave. When I then launched the slave, I could find
>>>>> this line in the slave log:
>>>>> ```
>>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>>>>> xxxxx  --launcher_dir="/tmp"
>>>>> ```
>>>>>
>>>>> From your log, I am not sure why your MESOS_LAUNCHER_DIR became the
>>>>> sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in your other
>>>>> scripts?
>>>>>
>>>>>
>>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>>>>>
>>>>>> I just tried setting both the env var and flag on the slaves, and
>>>>>> have determined that the env var is not present when it is checked in
>>>>>> src/docker/executor.cpp @ line 573:
>>>>>>
>>>>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>   string path =
>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" <<
>>>>>>> (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>>>>
>>>>>>
>>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly
>>>>>> propagated along up to the point of mesos-slave launch):
>>>>>>
>>>>>> $ cat /etc/default/mesos-slave
>>>>>>> export
>>>>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>>> export MESOS_PORT="5050"
>>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>>>
>>>>>>
>>>>>> TASK OUTPUT:
>>>>>>
>>>>>>
>>>>>>> *MESOS_LAUNCHER_DIR: envpath.isSome()->no**MESOS_LAUNCHER_DIR:
>>>>>>> path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'*
>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>> Launching health check process:
>>>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>>>>> --executor=(1)@192.168.225.59:44523
>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>>>>> sh -c \" \/bin\/bash
>>>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>> Health check process launched at pid: 2519
>>>>>>
>>>>>>
>>>>>> The env var is not propagated when the docker executor is launched
>>>>>> in src/slave/containerizer/docker.cpp around line 903:
>>>>>>
>>>>>>   vector<string> argv;
>>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>>>>>   // container (to distinguish it from Docker containers not created
>>>>>>>   // by Mesos).
>>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>>>>       argv,
>>>>>>>       Subprocess::PIPE(),
>>>>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>>>>       environment,
>>>>>>>       lambda::bind(&setup, container->directory));
>>>>>>
>>>>>>
>>>>>> A little way above, we can see the environment is set up with the
>>>>>> env vars defined for the container task.
>>>>>>
>>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>>>
>>>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>>            container->executor.command().environment().variables()) {
>>>>>>>     environment[variable.name()] = variable.value();
>>>>>>>   }
>>>>>>
>>>>>>
>>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>
>>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>>> 0.24.1 should work.
>>>>>>>
>>>>>>> >Do any of you know which host the path
>>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>>>>>> failing.
>>>>>>>
>>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before?
>>>>>>> We get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or else
>>>>>>> use the same dir as mesos-docker-executor.
>>>>>>>
>>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Maybe I spoke too soon.
>>>>>>>>
>>>>>>>> Now the checks are attempting to run, however the STDERR is not
>>>>>>>> looking good.  I've added some debugging to the error message output to
>>>>>>>> show the path, argv, and envp variables:
>>>>>>>>
>>>>>>>> STDOUT:
>>>>>>>>
>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>> Starting task
>>>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>> Launching health check process:
>>>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>>>>> sh -c \" exit 1
>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>> Health check process launched at pid: 3012
>>>>>>>>
>>>>>>>>
>>>>>>>> STDERR:
>>>>>>>>
>>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on
>>>>>>>>> slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>>> memory limited without swap.
>>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in
>>>>>>>>> childMain
>>>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>> envp=''): No such file or directory*** Aborted at 1444270649 (unix time)
>>>>>>>>> try "date -d @1444270649" if you are using GNU date ***
>>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700)
>>>>>>>>> from PID 3012; stack trace: ***
>>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>>> @ 0x41921c _Abort()
>>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>>> @ 0x43cc9c
>>>>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>>> @ 0x7f4a39d92827
>>>>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>>>
>>>>>>>>
>>>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>> should exist on? It definitely doesn't exist on the slave, hence
>>>>>>>> execution failing.
>>>>>>>>
>>>>>>>> This is with current master, git hash
>>>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>>>
>>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>>>
>>>>>>>>
>>>>>>>> -Jay
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Update:
>>>>>>>>>
>>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile
>>>>>>>>> and package the latest master (0.26.x) and deployed it to the cluster, and
>>>>>>>>> now health checks are working as advertised in both Marathon and my own
>>>>>>>>> framework!  Not sure what was going on with health-checks in 0.24.0..
>>>>>>>>>
>>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Jay
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Haosdent,
>>>>>>>>>>
>>>>>>>>>> Can you share your Marathon POST request that results in Mesos
>>>>>>>>>> executing the health checks?
>>>>>>>>>>
>>>>>>>>>> Since we can reference the Marathon framework, I've been doing
>>>>>>>>>> some digging around.
>>>>>>>>>>
>>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>>>
>>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>>>
>>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's
>>>>>>>>>> dependencies
>>>>>>>>>>
>>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to
>>>>>>>>>> /tmp/X both in the TaskFactory and right before the task is sent to
>>>>>>>>>> Mesos via driver.launchTasks:
>>>>>>>>>>
>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>>>
>>>>>>>>>> $ git diff
>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>>>
>>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>> +        import java.io._
>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>          CreatedTask(
>>>>>>>>>>>            taskInfo,
>>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>>>
>>>>>>>>>> $ git diff
>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") {
>>>>>>>>>>> driver =>
>>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>>> +      var i = 0
>>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>> +        import java.io._
>>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() +
>>>>>>>>>>> "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>> +        bw.close()
>>>>>>>>>>> +      }
>>>>>>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>>>>>>> taskInfos.asJava)
>>>>>>>>>>>      }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Then I built and deployed the hacked Marathon and restarted the
>>>>>>>>>> marathon service.
>>>>>>>>>>
>>>>>>>>>> Next I created the app via the Marathon API ("hello app" is a
>>>>>>>>>> container with a simple hello-world ruby app running on
>>>>>>>>>> 0.0.0.0:8000)
>>>>>>>>>>
>>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST
>>>>>>>>>>> -H'Content-Type: application/json' -d'
>>>>>>>>>>> {
>>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>>   "apps": [
>>>>>>>>>>>     {
>>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>>       "container": {
>>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>>         "docker": {
>>>>>>>>>>>           "image":
>>>>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>>           "portMappings": [
>>>>>>>>>>>             {
>>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>>             }
>>>>>>>>>>>           ]
>>>>>>>>>>>         }
>>>>>>>>>>>       },
>>>>>>>>>>>       "env": {
>>>>>>>>>>>
>>>>>>>>>>>       },
>>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>>         {
>>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>>         }
>>>>>>>>>>>       ],
>>>>>>>>>>>       "instances": 1,
>>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>>       "mem": 512
>>>>>>>>>>>     }
>>>>>>>>>>>   ]
>>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> $ ls /tmp/
>>>>>>>>>>>
>>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>>
>>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Do they match?
>>>>>>>>>>
>>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yes, so I am confident this is the information being sent across
>>>>>>>>>> the wire to Mesos.
>>>>>>>>>>
>>>>>>>>>> Do they contain any health-check information?
>>>>>>>>>>
>>>>>>>>>> $ cat
>>>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>> {
>>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>
>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>   },
>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>   },
>>>>>>>>>>>   "resources":[
>>>>>>>>>>>     {
>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>         "value":1.0
>>>>>>>>>>>       },
>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>     },
>>>>>>>>>>>     {
>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>         "value":512.0
>>>>>>>>>>>       },
>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>     },
>>>>>>>>>>>     {
>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>         "range":[
>>>>>>>>>>>           {
>>>>>>>>>>>             "begin":31641,
>>>>>>>>>>>             "end":31641
>>>>>>>>>>>           }
>>>>>>>>>>>         ]
>>>>>>>>>>>       },
>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>     }
>>>>>>>>>>>   ],
>>>>>>>>>>>   "command":{
>>>>>>>>>>>     "environment":{
>>>>>>>>>>>       "variables":[
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>         },
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>>         },
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>>         },
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>>
>>>>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>>         },
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>>
>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>         },
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>         },
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>         },
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>>         },
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>         }
>>>>>>>>>>>       ]
>>>>>>>>>>>     },
>>>>>>>>>>>     "shell":false
>>>>>>>>>>>   },
>>>>>>>>>>>   "container":{
>>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>>     "docker":{
>>>>>>>>>>>
>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>         {
>>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>         }
>>>>>>>>>>>       ],
>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>     }
>>>>>>>>>>>   }
>>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> No, I don't see anything about any health check.
>>>>>>>>>>
>>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>>>
>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>> Starting task
>>>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> And STDERR:
>>>>>>>>>>
>>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on
>>>>>>>>>>> slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>>>>> memory limited without swap.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>>>
>>>>>>>>>> Any ideas of other things to try or what I could be missing?
>>>>>>>>>> Can't say either way about the Mesos health-check system working or not if
>>>>>>>>>> Marathon won't put the health-check into the task it sends to Mesos.
>>>>>>>>>>
>>>>>>>>>> Thanks for all your help!
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Jay
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Maybe you could post your executor stdout/stderr so that we
>>>>>>>>>>> can tell whether the health check is running or not.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Marathon also uses the Mesos health check. When I use a health check,
>>>>>>>>>>>> I can see log output like this in the executor stdout.
>>>>>>>>>>>>
>>>>>>>>>>>> ```
>>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>>> Starting task
>>>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>>> ```
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <
>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I am using my own framework, and the full task info I'm using
>>>>>>>>>>>>> is posted earlier in this thread.  Do you happen to know if Marathon uses
>>>>>>>>>>>>> Mesos's health checks for its health check system?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, the health check is launched through its definition in
>>>>>>>>>>>>> TaskInfo. Do you launch your task through Marathon? I could test it on my
>>>>>>>>>>>>> side.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <
>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Precisely, and there are none of those statements.  Are you
>>>>>>>>>>>>>> or others confident health-checks are part of the code path when defined
>>>>>>>>>>>>>> via task info for docker container tasks?  Going through the code, I wasn't
>>>>>>>>>>>>>> able to find the linkage for anything other than health-checks triggered
>>>>>>>>>>>>>> through a custom executor.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With that being said it is a pretty good sized code base and
>>>>>>>>>>>>>> I'm not very familiar with it, so my analysis this far has by no means been
>>>>>>>>>>>>>> exhaustive.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When the health check launches, there will be a log line like this in
>>>>>>>>>>>>>> your executor stdout
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <
>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output in
>>>>>>>>>>>>>>> the logs with the string "health" or "Health" if the health-check were
>>>>>>>>>>>>>>> active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an
>>>>>>>>>>>>>>> unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <
>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <
>>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this backport; let
>>>>>>>>>>>>>>>>>> me double check.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll
>>>>>>>>>>>>>>>>>>> look there :)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test
>>>>>>>>>>>>>>>>>>>> it out?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks
>>>>>>>>>>>>>>>>>>>>> that's in master but not yet released. It will run docker exec with the
>>>>>>>>>>>>>>>>>>>>> command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>>
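
As a rough illustration of the "docker exec" wrapping mentioned above, the sketch below builds a command string in the shape that appears in the `--health_check_json` executor output quoted elsewhere in this thread. The function name `wrap_in_docker_exec` is hypothetical, the container name in the example is a placeholder, and no quoting or escaping of the inner command is attempted, which a real implementation would need to handle.

```python
def wrap_in_docker_exec(container_name, command):
    """Wrap a health-check command so it runs inside a named Docker container.

    Mirrors the 'docker exec <name> sh -c " <command> "' shape seen in the
    executor log; illustrative only, with no escaping of `command`.
    """
    return 'docker exec {} sh -c " {} "'.format(container_name, command)


if __name__ == "__main__":
    # Placeholder container name; real names are generated by Mesos.
    print(wrap_in_docker_exec("mesos-example-container", "exit 1"))
```
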
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image
>>>>>>>>>>>>>>>>>>>>> tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if
>>>>>>>>>>>>>>>>>>>>> they ever run the command (in this case `sleep 5`), but have not found any
>>>>>>>>>>>>>>>>>>>>> indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked
>>>>>>>>>>>>>>>>>>>>> from src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does
>>>>>>>>>>>>>>>>>>>>> this mean that health-checks are only supported for custom executors and
>>>>>>>>>>>>>>>>>>>>> not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the
>>>>>>>>>>>>>>>>>>>>> 0/non-zero exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>>>

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Marco Massenzio <ma...@mesosphere.io>.
Are those the stdout logs of the Agent? I ask because I don't see
--launcher_dir set in them. If I look at the logs of an agent running off
the same 0.24.1 package, this is what I see:

I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup:
--appc_store_dir="/tmp/mesos/store/appc"
--attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5"
--cgroups_cpu_enable_pids_and_tids_count="false"
--cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
--cgroups_limit_swap="false" --cgroups_root="mesos"
--container_disk_watch_interval="15secs" --containerizers="docker,mesos"
--default_role="*" --disk_watch_interval="1mins" --docker="docker"
--docker_kill_orphans="true" --docker_remove_delay="6hrs"
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
--enforce_container_disk_quota="false"
--executor_registration_timeout="1mins"
--executor_shutdown_grace_period="5secs"
--fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
--frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
--hadoop_home="" --help="false" --initialize_driver_logging="true"
--ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem"
--launcher_dir="/usr/libexec/mesos"
--log_dir="/var/local/mesos/logs/agent" --logbufsecs="0"
--logging_level="INFO" --master="zk://192.168.33.1:2181/mesos/vagrant"
--oversubscribed_resources_interval="15secs" --perf_duration="10secs"
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
--quiet="false" --recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="1secs"
--resource_monitoring_interval="1secs"
--resources="ports:[9000-10000];ephemeral_ports:[32768-57344]"
--revocable_cpu_low_priority="true"
--sandbox_directory="/var/local/sandbox" --strict="true"
--switch_user="true" --version="false" --work_dir="/var/local/mesos/agent"
(this is run off the Vagrantfile at [0] in case you want to reproduce).
That agent is not run via the init command, though; I start it manually
via `run-agent.sh` in the same directory.

I don't really think this matters, but I assume you also restarted the
agent after making the config changes?
(and, for your own sanity - you can double check the version by looking at
the very head of the logs).
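
If it helps with the comparison, here is a rough sketch (in Python, an
illustration only, not anything shipped with Mesos) that pulls the flags out
of a "Flags at startup:" log entry so you can see what the agent actually got
for --launcher_dir. The name `parse_startup_flags` and the shortened sample
text are made up for the example, and it assumes the flags are printed as
--name="value" pairs, as in the log above.

```python
import re

# Matches --name="value" pairs as they appear in the "Flags at startup:" entry.
FLAG_RE = re.compile(r'--([A-Za-z0-9_]+)="([^"]*)"')


def parse_startup_flags(log_text):
    """Return a dict of flag name -> value found after 'Flags at startup:'."""
    marker = "Flags at startup:"
    start = log_text.find(marker)
    if start == -1:
        # No startup entry in this text, so nothing to report.
        return {}
    return dict(FLAG_RE.findall(log_text[start + len(marker):]))


# Hypothetical, shortened sample in the same shape as the agent log.
sample = (
    'I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup:\n'
    '--isolation="cgroups/cpu,cgroups/mem"\n'
    '--launcher_dir="/usr/libexec/mesos"\n'
    '--log_dir="/var/local/mesos/logs/agent" --logbufsecs="0"\n'
)

flags = parse_startup_flags(sample)
print(flags.get("launcher_dir", "<not set>"))
```

Running the same function over the real agent log should make a missing or
unexpected launcher_dir obvious.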






--
*Marco Massenzio*
Distributed Systems Engineer
http://codetrips.com

On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <ou...@gmail.com> wrote:

> Hi Haosdent and Mesos friends,
>
> I've rebuilt the cluster from scratch and installed mesos 0.24.1 from the
> mesosphere apt repo:
>
> $ dpkg -l | grep mesos
> ii  mesos                               0.24.1-0.2.35.ubuntu1404
>  amd64        Cluster resource manager with efficient resource isolation
>
> Then added the `launcher_dir` flag to /etc/mesos-slave/launcher_dir on the
> slaves:
>
> mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir
> /usr/libexec/mesos
>
> And yet the task health-checks are still being launched from the sandbox
> directory like before!
>
> I've also tested setting the MESOS_LAUNCHER_DIR env var and get the
> identical result (just as before on the cluster where many versions of
> mesos had been installed):
>
> STDOUT:
>
> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>> --docker="docker" --help="false" --initialize_driver_logging="true"
>> --logbufsecs="0" --logging_level="INFO"
>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>> --stop_timeout="0ns"
>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>> --docker="docker" --help="false" --initialize_driver_logging="true"
>> --logbufsecs="0" --logging_level="INFO"
>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>> --stop_timeout="0ns"
>> Registered docker executor on mesos-worker1a
>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>> Launching health check process:
>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check
>> --executor=(1)@192.168.225.58:48912
>> --health_check_json={"command":{"shell":true,"value":"docker exec
>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>> 127.0.0.1:8000
>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>> Health check process launched at pid: 11253
>
>
>
> STDERR:
>
> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>> --docker="docker" --help="false" --initialize_driver_logging="true"
>> --logbufsecs="0" --logging_level="INFO"
>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>> --stop_timeout="0ns"
>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>> --docker="docker" --help="false" --initialize_driver_logging="true"
>> --logbufsecs="0" --logging_level="INFO"
>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>> --stop_timeout="0ns"
>> Registered docker executor on mesos-worker1a
>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>> *Launching health check process:
>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check*
>> --executor=(1)@192.168.225.58:48912
>> --health_check_json={"command":{"shell":true,"value":"docker exec
>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>> 127.0.0.1:8000
>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>> Health check process launched at pid: 11253
>
>
> Any ideas on where to go from here?  Is there any additional information I
> can provide?
>
> Thanks as always,
> Jay
>
>
> On Thu, Oct 8, 2015 at 9:23 PM, haosdent <ha...@gmail.com> wrote:
>
>> Flags sent from the containerizer to the executor are stringified and
>> passed as command-line parameters when the executor is launched.
>>
>> You could see this in
>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
>>
>> But for launcher_dir, the executor gets it from `argv[0]`, as you
>> mentioned above.
>> ```
>>   string path =
>>     envPath.isSome() ? envPath.get()
>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>
>> ```
>> So I want to figure out why your argv[0] resolves to the sandbox dir
>> rather than "/usr/libexec/mesos".
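
To make the fallback concrete, here is a small Python sketch of the same
lookup. It is not the Mesos code itself, and `resolve_launcher_dir` is a
made-up name. It assumes that the dirname of a bare program name is "." (as
with POSIX dirname) and that the executor's working directory is its sandbox;
neither assumption is confirmed in this thread. Under those assumptions, an
argv[0] of just "mesos-docker-executor" (as pushed in the docker.cpp snippet
quoted further down) would resolve to the sandbox, which is one possible
explanation for the path seen in the logs.

```python
import os
import tempfile


def resolve_launcher_dir(argv0, environ):
    """Mimic the lookup: MESOS_LAUNCHER_DIR if set, else the directory of argv[0]."""
    env_path = environ.get("MESOS_LAUNCHER_DIR")
    if env_path is not None:
        return env_path
    # POSIX dirname("name") is "."; os.path.dirname returns "" for a bare name.
    dirname = os.path.dirname(argv0) or "."
    return os.path.realpath(dirname)


# Stand-in for the sandbox: a temporary directory used as the working directory.
sandbox = os.path.realpath(tempfile.mkdtemp())
previous_cwd = os.getcwd()
os.chdir(sandbox)
try:
    from_env = resolve_launcher_dir(
        "mesos-docker-executor", {"MESOS_LAUNCHER_DIR": "/usr/libexec/mesos"})
    from_bare_argv0 = resolve_launcher_dir("mesos-docker-executor", {})
    from_full_argv0 = resolve_launcher_dir(
        "/usr/libexec/mesos/mesos-docker-executor", {})
finally:
    os.chdir(previous_cwd)
    os.rmdir(sandbox)

print(from_env)         # the env var wins when it is present
print(from_bare_argv0)  # the working directory, i.e. the stand-in sandbox
print(from_full_argv0)  # the directory part of a full argv[0]
```
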
>>
>> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <ou...@gmail.com> wrote:
>>
>>> I see.  And then how are the flags sent to the executor?
>>>
>>>
>>>
>>> On Oct 8, 2015, at 8:56 PM, haosdent <ha...@gmail.com> wrote:
>>>
>>> Yes. The related code is located in
>>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>>>
>>> In fact, environment variables that start with MESOS_ are loaded as
>>> flag values.
>>>
>>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
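
A rough Python sketch of that idea: take every environment variable with the
MESOS_ prefix, strip the prefix, and lower-case the rest to get a flag name.
The lower-casing is an assumption inferred from the MESOS_LAUNCHER_DIR /
--launcher_dir pairing in this thread. This only illustrates the mapping; it
is not stout's implementation, and `flags_from_environment` is a made-up
helper.

```python
def flags_from_environment(environ, prefix="MESOS_"):
    """Map prefixed environment variables to lower-case flag names."""
    flags = {}
    for name, value in environ.items():
        if name.startswith(prefix):
            # Assumption: flag name is the remainder of the variable, lower-cased.
            flags[name[len(prefix):].lower()] = value
    return flags


# Hypothetical environment, modelled on /etc/default/mesos-slave in this thread.
env = {
    "MESOS_LAUNCHER_DIR": "/usr/libexec/mesos",
    "MESOS_CONTAINERIZERS": "mesos,docker",
    "PATH": "/usr/bin:/bin",
}

for flag, value in sorted(flags_from_environment(env).items()):
    print('--%s="%s"' % (flag, value))
```
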
>>>
>>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>
>>>> One question for you haosdent-
>>>>
>>>> You mentioned that the flags.launcher_dir should propagate to the
>>>> docker executor all the way up the chain.  Can you show me where this logic
>>>> is in the codebase?  I didn't see where that was happening and would like
>>>> to understand the mechanism.
>>>>
>>>> Thanks!
>>>> Jay
>>>>
>>>>
>>>>
>>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>
>>>> Maybe tomorrow I will build a fresh cluster from scratch to see if the
>>>> broken behavior experienced today still persists.
>>>>
>>>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
>>>>
>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir,
>>>> and mesos-docker-executor and mesos-health-check are then looked up under
>>>> that dir. Although the env var itself is not propagated, MESOS_LAUNCHER_DIR
>>>> still works because flags.launcher_dir is derived from it.
>>>>
>>>> For example, I ran
>>>> ```
>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>> ```
>>>> before starting mesos-slave. When I launched the slave, I could find
>>>> this line in the slave log:
>>>> ```
>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>>>> xxxxx  --launcher_dir="/tmp"
>>>> ```
>>>>
>>>> From your log, I am not sure why your MESOS_LAUNCHER_DIR became the
>>>> sandbox dir. Is MESOS_LAUNCHER_DIR overridden in one of your other
>>>> scripts?
>>>>
>>>>
>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>
>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>>>>
>>>>> I just tried setting both the env var and flag on the slaves, and have
>>>>> determined that the env var is not present when it is checked in
>>>>> src/docker/executor.cpp at line 573:
>>>>>
>>>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>   string path =
>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" <<
>>>>>> (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>>>
>>>>>
>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly
>>>>> propagated along up to the point of mesos-slave launch):
>>>>>
>>>>> $ cat /etc/default/mesos-slave
>>>>>> export
>>>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>> export MESOS_PORT="5050"
>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>>
>>>>>
>>>>> TASK OUTPUT:
>>>>>
>>>>>
>>>>>> *MESOS_LAUNCHER_DIR: envpath.isSome()->no**MESOS_LAUNCHER_DIR:
>>>>>> path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'*
>>>>>> Registered docker executor on mesos-worker2a
>>>>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>> Launching health check process:
>>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>>>> --executor=(1)@192.168.225.59:44523
>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>>>> sh -c \" \/bin\/bash
>>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>> Health check process launched at pid: 2519
>>>>>
>>>>>
>>>>> The env var is not propagated when the docker executor is launched
>>>>> in src/slave/containerizer/docker.cpp around line 903:
>>>>>
>>>>>   vector<string> argv;
>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>>>>   // container (to distinguish it from Docker containers not created
>>>>>>   // by Mesos).
>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>>>       argv,
>>>>>>       Subprocess::PIPE(),
>>>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>>>       environment,
>>>>>>       lambda::bind(&setup, container->directory));
>>>>>
>>>>>
>>>>> A little way above, we can see the environment is set up with the
>>>>> env vars defined by the container task.
>>>>>
>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>>
>>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>            container->executor.command().environment().variables()) {
>>>>>>     environment[variable.name()] = variable.value();
>>>>>>   }
>>>>>
>>>>>
>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>>
>>>>>
>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com> wrote:
>>>>>
>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>> 0.24.1 should work.
>>>>>>
>>>>>> >Do any of you know which host the path
>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>>>>> failing.
>>>>>>
>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We
>>>>>> look up mesos-health-check in MESOS_LAUNCHER_DIR/--launcher_dir, or
>>>>>> otherwise use the same dir as mesos-docker-executor.
>>>>>>
>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Maybe I spoke too soon.
>>>>>>>
>>>>>>> Now the checks are attempting to run, however the STDERR is not
>>>>>>> looking good.  I've added some debugging to the error message output to
>>>>>>> show the path, argv, and envp variables:
>>>>>>>
>>>>>>> STDOUT:
>>>>>>>
>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>> --stop_timeout="0ns"
>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>> --stop_timeout="0ns"
>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>> Starting task
>>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>> Launching health check process:
>>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>>>> sh -c \" exit 1
>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>> Health check process launched at pid: 3012
>>>>>>>
>>>>>>>
>>>>>>> STDERR:
>>>>>>>
>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on
>>>>>>>> slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>> memory limited without swap.
>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain
>>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>> envp=''): No such file or directory*** Aborted at 1444270649 (unix time)
>>>>>>>> try "date -d @1444270649" if you are using GNU date ***
>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from
>>>>>>>> PID 3012; stack trace: ***
>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>> @ 0x41921c _Abort()
>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>> @ 0x43cc9c
>>>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>> @ 0x7f4a39d92827
>>>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>>
>>>>>>>
>>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>> should exist on? It definitely doesn't exist on the slave, hence
>>>>>>> execution failing.
>>>>>>>
>>>>>>> This is with current master, git hash
>>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>>
>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>>
>>>>>>>
>>>>>>> -Jay
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Update:
>>>>>>>>
>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile and
>>>>>>>> package the latest master (0.26.x) and deployed it to the cluster, and now
>>>>>>>> health checks are working as advertised in both Marathon and my own
>>>>>>>> framework!  Not sure what was going on with health-checks in 0.24.0.
>>>>>>>>
>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Jay
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Haosdent,
>>>>>>>>>
>>>>>>>>> Can you share your Marathon POST request that results in Mesos
>>>>>>>>> executing the health checks?
>>>>>>>>>
>>>>>>>>> Since we can reference the Marathon framework, I've been doing
>>>>>>>>> some digging around.
>>>>>>>>>
>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>>
>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>>
>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>>>>>>>
>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to
>>>>>>>>> /tmp/X both in the TaskFactory and right before the task is sent to
>>>>>>>>> Mesos via driver.launchTasks:
>>>>>>>>>
>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>>
>>>>>>>>> $ git diff
>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>>
>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>> +        import java.io._
>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>> +        bw.close()
>>>>>>>>>>          CreatedTask(
>>>>>>>>>>            taskInfo,
>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>>
>>>>>>>>> $ git diff
>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver
>>>>>>>>>> =>
>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>> +      var i = 0
>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>> +        import java.io._
>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() +
>>>>>>>>>> "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>> +        bw.close()
>>>>>>>>>> +      }
>>>>>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>>>>>> taskInfos.asJava)
>>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Then I built and deployed the hacked Marathon and restarted the
>>>>>>>>> marathon service.
>>>>>>>>>
>>>>>>>>> Next I created the app via the Marathon API ("hello app" is a
>>>>>>>>> container with a simple hello-world ruby app running on
>>>>>>>>> 0.0.0.0:8000)
>>>>>>>>>
>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST
>>>>>>>>>> -H'Content-Type: application/json' -d'
>>>>>>>>>> {
>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>   "apps": [
>>>>>>>>>>     {
>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>       "container": {
>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>         "docker": {
>>>>>>>>>>           "image": "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>           "portMappings": [
>>>>>>>>>>             {
>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>             }
>>>>>>>>>>           ]
>>>>>>>>>>         }
>>>>>>>>>>       },
>>>>>>>>>>       "env": {
>>>>>>>>>>
>>>>>>>>>>       },
>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>         {
>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>         }
>>>>>>>>>>       ],
>>>>>>>>>>       "instances": 1,
>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>       "mem": 512
>>>>>>>>>>     }
>>>>>>>>>>   ]
>>>>>>>>>> }'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> $ ls /tmp/
>>>>>>>>>>
>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>
>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Do they match?
>>>>>>>>>
>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, so I am confident this is the information being sent across
>>>>>>>>> the wire to Mesos.
>>>>>>>>>
>>>>>>>>> Do they contain any health-check information?
>>>>>>>>>
>>>>>>>>> $ cat
>>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>> {
>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>   "task_id":{
>>>>>>>>>>     "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>   },
>>>>>>>>>>   "slave_id":{
>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>   },
>>>>>>>>>>   "resources":[
>>>>>>>>>>     {
>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>       "scalar":{
>>>>>>>>>>         "value":1.0
>>>>>>>>>>       },
>>>>>>>>>>       "role":"*"
>>>>>>>>>>     },
>>>>>>>>>>     {
>>>>>>>>>>       "name":"mem",
>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>       "scalar":{
>>>>>>>>>>         "value":512.0
>>>>>>>>>>       },
>>>>>>>>>>       "role":"*"
>>>>>>>>>>     },
>>>>>>>>>>     {
>>>>>>>>>>       "name":"ports",
>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>       "ranges":{
>>>>>>>>>>         "range":[
>>>>>>>>>>           {
>>>>>>>>>>             "begin":31641,
>>>>>>>>>>             "end":31641
>>>>>>>>>>           }
>>>>>>>>>>         ]
>>>>>>>>>>       },
>>>>>>>>>>       "role":"*"
>>>>>>>>>>     }
>>>>>>>>>>   ],
>>>>>>>>>>   "command":{
>>>>>>>>>>     "environment":{
>>>>>>>>>>       "variables":[
>>>>>>>>>>         {
>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>           "value":"31641"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>
>>>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>
>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>           "value":"31641"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>           "value":"31641"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>           "value":"31641"
>>>>>>>>>>         }
>>>>>>>>>>       ]
>>>>>>>>>>     },
>>>>>>>>>>     "shell":false
>>>>>>>>>>   },
>>>>>>>>>>   "container":{
>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>     "docker":{
>>>>>>>>>>
>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>         {
>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>         }
>>>>>>>>>>       ],
>>>>>>>>>>       "privileged":false,
>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>     }
>>>>>>>>>>   }
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> No, I don't see anything about any health check.
>>>>>>>>>
>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>>
>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>> Starting task
>>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> And STDERR:
>>>>>>>>>
>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on
>>>>>>>>>> slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>>>> memory limited without swap.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>>
>>>>>>>>> Any ideas of other things to try or what I could be missing?
>>>>>>>>> Can't say either way about the Mesos health-check system working or not if
>>>>>>>>> Marathon won't put the health-check into the task it sends to Mesos.
>>>>>>>>>
>>>>>>>>> Thanks for all your help!
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Jay
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Maybe you could post your executor stdout/stderr so that we can
>>>>>>>>>> tell whether the health check is running or not.
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Marathon also uses the Mesos health check. When I use a health check, I
>>>>>>>>>>> see log lines like this in the executor stdout.
>>>>>>>>>>>
>>>>>>>>>>> ```
>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>> Starting task
>>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>> Launching health check process:
>>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>> ```
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <outtatime@gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I am using my own framework, and the full task info I'm using
>>>>>>>>>>>> is posted earlier in this thread.  Do you happen to know if Marathon uses
>>>>>>>>>>>> Mesos's health checks for its health check system?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, the health check is launched through its definition in TaskInfo.
>>>>>>>>>>>> Do you launch your task through Marathon? I could test it on my side.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <
>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Precisely, and there are none of those statements.  Are you or
>>>>>>>>>>>>> others confident health-checks are part of the code path when defined via
>>>>>>>>>>>>> task info for docker container tasks?  Going through the code, I wasn't
>>>>>>>>>>>>> able to find the linkage for anything other than health-checks triggered
>>>>>>>>>>>>> through a custom executor.
>>>>>>>>>>>>>
>>>>>>>>>>>>> With that being said, it is a pretty good-sized code base and
>>>>>>>>>>>>> I'm not very familiar with it, so my analysis thus far has by no means been
>>>>>>>>>>>>> exhaustive.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> When the health check launches, there will be a log line like this in
>>>>>>>>>>>>> your executor stdout:
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>> ```
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <
>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output in
>>>>>>>>>>>>>> the logs with the string "health" or "Health" if the health-check were
>>>>>>>>>>>>>> active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether
>>>>>>>>>>>>>> an unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <haosdent@gmail.com
>>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <
>>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 containing this backport; let
>>>>>>>>>>>>>>>>> me double-check.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll
>>>>>>>>>>>>>>>>>> look there :)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test
>>>>>>>>>>>>>>>>>>> it out?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks
>>>>>>>>>>>>>>>>>>>> that's in master but not yet released. It will run docker exec with the
>>>>>>>>>>>>>>>>>>>> command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image
>>>>>>>>>>>>>>>>>>>> tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if
>>>>>>>>>>>>>>>>>>>> they ever run the command (in this case `sleep 5`), but have not found any
>>>>>>>>>>>>>>>>>>>> indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked
>>>>>>>>>>>>>>>>>>>> from src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does
>>>>>>>>>>>>>>>>>>>> this mean that health-checks are only supported for custom executors and
>>>>>>>>>>>>>>>>>>>> not for docker tasks?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the
>>>>>>>>>>>>>>>>>>>> 0/non-zero exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
Hi Haosdent and Mesos friends,

I've rebuilt the cluster from scratch and installed Mesos 0.24.1 from the
Mesosphere apt repo:

$ dpkg -l | grep mesos
ii  mesos                               0.24.1-0.2.35.ubuntu1404
 amd64        Cluster resource manager with efficient resource isolation

Then I added the `launcher_dir` flag via /etc/mesos-slave/launcher_dir on the
slaves:

mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir
/usr/libexec/mesos

And yet the task health-checks are still being launched from the sandbox
directory like before!
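
A possible explanation, offered as a sketch rather than a confirmed
diagnosis: the snippet from src/docker/executor.cpp quoted further down
picks the launcher directory from MESOS_LAUNCHER_DIR when it is set, and
otherwise from the directory part of argv[0]. If argv[0] is the bare name
"mesos-docker-executor" (no directory part) and the executor's working
directory is the sandbox, that fallback would resolve to the sandbox. The
helper names below are hypothetical, and plain string handling stands in
for Path::dirname and os::realpath:

```cpp
#include <string>

// Hypothetical stand-in for Path(argv0).dirname(): returns the directory
// part of a path, or "." when the path contains no slash.
std::string dirnameOf(const std::string& path) {
  const std::string::size_type slash = path.find_last_of('/');
  if (slash == std::string::npos) return ".";
  if (slash == 0) return "/";
  return path.substr(0, slash);
}

// Hypothetical sketch of the lookup order: prefer the environment value
// when present, otherwise resolve dirname(argv0) against the working
// directory. `envValue` is nullptr when MESOS_LAUNCHER_DIR is not set,
// which is how std::getenv reports a missing variable.
std::string resolveLauncherDir(const char* envValue,
                               const std::string& argv0,
                               const std::string& cwd) {
  if (envValue != nullptr) return envValue;
  const std::string dir = dirnameOf(argv0);
  if (dir == ".") return cwd;        // bare executable name
  if (dir[0] == '/') return dir;     // already absolute
  return cwd + "/" + dir;            // relative to the working directory
}
```

Under those assumptions, calling resolveLauncherDir(nullptr,
"mesos-docker-executor", sandbox) returns the sandbox path, which would be
consistent with the "Launching health check process" line in the output
below.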

I've also tested setting the MESOS_LAUNCHER_DIR env var and get the
identical result (just as before on the cluster where many versions of
mesos had been installed):

STDOUT:

--container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
> --docker="docker" --help="false" --initialize_driver_logging="true"
> --logbufsecs="0" --logging_level="INFO"
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
> --stop_timeout="0ns"
> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
> --docker="docker" --help="false" --initialize_driver_logging="true"
> --logbufsecs="0" --logging_level="INFO"
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
> --stop_timeout="0ns"
> Registered docker executor on mesos-worker1a
> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
> Launching health check process:
> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check
> --executor=(1)@192.168.225.58:48912
> --health_check_json={"command":{"shell":true,"value":"docker exec
> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
> 127.0.0.1:8000
> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
> Health check process launched at pid: 11253



STDERR:

--container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
> --docker="docker" --help="false" --initialize_driver_logging="true"
> --logbufsecs="0" --logging_level="INFO"
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
> --stop_timeout="0ns"
> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
> --docker="docker" --help="false" --initialize_driver_logging="true"
> --logbufsecs="0" --logging_level="INFO"
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
> --stop_timeout="0ns"
> Registered docker executor on mesos-worker1a
> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
> *Launching health check process:
> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check*
> --executor=(1)@192.168.225.58:48912
> --health_check_json={"command":{"shell":true,"value":"docker exec
> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
> 127.0.0.1:8000
> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
> Health check process launched at pid: 11253


Any ideas on where to go from here?  Is there any additional information I
can provide?
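
To make the propagation question concrete, here is a minimal sketch of one
reading of the containerizer code quoted further down: the executor's
environment is built from the task's own variables, so a variable exported
only for mesos-slave would not reach the executor unless it is copied
explicitly. All names are hypothetical, std::map stands in for the real
types, and the second function is only one conceivable change, not
something Mesos is known to do:

```cpp
#include <map>
#include <string>

using Env = std::map<std::string, std::string>;

// Hypothetical: build the child environment from the task's variables
// only, mirroring the foreach loop over ExecutorInfo variables.
Env buildExecutorEnv(const Env& taskVariables) {
  Env environment;
  for (const auto& variable : taskVariables) {
    environment[variable.first] = variable.second;
  }
  return environment;
}

// Hypothetical variant that additionally forwards the slave's launcher
// directory so the executor can see MESOS_LAUNCHER_DIR.
Env buildExecutorEnvWithLauncherDir(const Env& taskVariables,
                                    const std::string& launcherDir) {
  Env environment = buildExecutorEnv(taskVariables);
  environment["MESOS_LAUNCHER_DIR"] = launcherDir;
  return environment;
}
```

With the first function, the result never contains MESOS_LAUNCHER_DIR
unless the task itself defines it, which would match the
"envpath.isSome()->no" debug output quoted further down.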

Thanks as always,
Jay


On Thu, Oct 8, 2015 at 9:23 PM, haosdent <ha...@gmail.com> wrote:

> Flags sent to the executor from the containerizer are stringified and
> become command-line parameters when the executor is launched.
>
> You could see this in
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
>
> But for launcher_dir, the executor gets it from `argv[0]`, as you mentioned
> above.
> ```
>   string path =
>     envPath.isSome() ? envPath.get()
>                      : os::realpath(Path(argv[0]).dirname()).get();
>
> ```
> So I want to figure out why your argv[0] would resolve to the sandbox dir,
> not "/usr/libexec/mesos".
>
> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <ou...@gmail.com> wrote:
>
>> I see.  And then how are the flags sent to the executor?
>>
>>
>>
>> On Oct 8, 2015, at 8:56 PM, haosdent <ha...@gmail.com> wrote:
>>
>> Yes. The related code is located in
>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>>
>> In fact, environment variables starting with MESOS_ are loaded as flag
>> values.
>>
>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
>>
>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <ou...@gmail.com> wrote:
>>
>>> One question for you haosdent-
>>>
>>> You mentioned that the flags.launcher_dir should propagate to the docker
>>> executor all the way up the chain.  Can you show me where this logic is in
>>> the codebase?  I didn't see where that was happening and would like to
>>> understand the mechanism.
>>>
>>> Thanks!
>>> Jay
>>>
>>>
>>>
>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>
>>> Maybe tomorrow I will build a fresh cluster from scratch to see if the
>>> broken behavior experienced today still persists.
>>>
>>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
>>>
>>> As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir,
>>> and mesos-docker-executor and mesos-health-check are looked up under that
>>> dir. Although the env var itself is not propagated, MESOS_LAUNCHER_DIR still
>>> works because flags.launcher_dir is taken from it.
>>>
>>> For example, I ran
>>> ```
>>> export MESOS_LAUNCHER_DIR=/tmp
>>> ```
>>> before starting mesos-slave. So when I launch the slave, I can find this line
>>> in the slave log:
>>> ```
>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>>> xxxxx  --launcher_dir="/tmp"
>>> ```
>>>
>>> And from your log, I am not sure why your MESOS_LAUNCHER_DIR became the
>>> sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in your other
>>> scripts?
>>>
>>>
>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>
>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>>>
>>>> I just tried setting both the env var and flag on the slaves, and have
>>>> determined that the env var is not present when it is checked in
>>>> src/docker/executor.cpp @ line 573:
>>>>
>>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>   string path =
>>>>>     envPath.isSome() ? envPath.get()
>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" <<
>>>>> (envPath.isSome() ? "yes" : "no") << endl;
>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>>
>>>>
>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly
>>>> propagated along up to the point of mesos-slave launch):
>>>>
>>>> $ cat /etc/default/mesos-slave
>>>>> export
>>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>> export MESOS_PORT="5050"
>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>
>>>>
>>>> TASK OUTPUT:
>>>>
>>>>
>>>>> *MESOS_LAUNCHER_DIR: envpath.isSome()->no**MESOS_LAUNCHER_DIR:
>>>>> path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'*
>>>>> Registered docker executor on mesos-worker2a
>>>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>> Launching health check process:
>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>>> --executor=(1)@192.168.225.59:44523
>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>>> sh -c \" \/bin\/bash
>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>> Health check process launched at pid: 2519
>>>>
>>>>
>>>> The env var is not propagated when the docker executor is launched
>>>> in src/slave/containerizer/docker.cpp around line 903:
>>>>
>>>>   vector<string> argv;
>>>>>   argv.push_back("mesos-docker-executor");
>>>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>>>   // container (to distinguish it from Docker containers not created
>>>>>   // by Mesos).
>>>>>   Try<Subprocess> s = subprocess(
>>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>>       argv,
>>>>>       Subprocess::PIPE(),
>>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>>       environment,
>>>>>       lambda::bind(&setup, container->directory));
>>>>
>>>>
>>>> A little way above, we can see the environment is set up with the
>>>> container task's defined env vars.
>>>>
>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>
>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>   foreach (const Environment::Variable& variable,
>>>>>            container->executor.command().environment().variables()) {
>>>>>     environment[variable.name()] = variable.value();
>>>>>   }
>>>>
>>>>
>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>
>>>>
>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com> wrote:
>>>>
>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>> 0.24.1 should work.
>>>>>
>>>>> >Do any of you know which host the path
>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>>>> failing.
>>>>>
>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We
>>>>> get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or else use
>>>>> the same dir as mesos-docker-executor.
>>>>>
>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Maybe I spoke too soon.
>>>>>>
>>>>>> Now the checks are attempting to run, however the STDERR is not
>>>>>> looking good.  I've added some debugging to the error message output to
>>>>>> show the path, argv, and envp variables:
>>>>>>
>>>>>> STDOUT:
>>>>>>
>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>> --stop_timeout="0ns"
>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>> --stop_timeout="0ns"
>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>> Starting task
>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>> Launching health check process:
>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>>> sh -c \" exit 1
>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>> Health check process launched at pid: 3012
>>>>>>
>>>>>>
>>>>>> STDERR:
>>>>>>
>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on
>>>>>>> slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>> memory limited without swap.
>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain
>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>> envp=''): No such file or directory
>>>>>>> *** Aborted at 1444270649 (unix time) try "date -d @1444270649" if you are using GNU date ***
>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from
>>>>>>> PID 3012; stack trace: ***
>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>> @ 0x4191e2 _Abort()
>>>>>>> @ 0x41921c _Abort()
>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>> @ 0x43cc9c
>>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>> @ 0x7f4a39d92827
>>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>
>>>>>>
>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>> should exist on? It definitely doesn't exist on the slave, which is why
>>>>>> execution is failing.
>>>>>>
>>>>>> This is with current master, git hash
>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>
>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>
>>>>>>
>>>>>> -Jay
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Update:
>>>>>>>
>>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile and
>>>>>>> package the latest master (0.26.x) and deployed it to the cluster, and now
>>>>>>> health checks are working as advertised in both Marathon and my own
>>>>>>> framework!  Not sure what was going on with health-checks in 0.24.0..
>>>>>>>
>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Jay
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Haosdent,
>>>>>>>>
>>>>>>>> Can you share your Marathon POST request that results in Mesos
>>>>>>>> executing the health checks?
>>>>>>>>
>>>>>>>> Since we can reference the Marathon framework, I've been doing some
>>>>>>>> digging around.
>>>>>>>>
>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>
>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>
>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>>>>>>
>>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to
>>>>>>>> /tmp/X both in the TaskFactory and right before the task is sent to
>>>>>>>> Mesos via driver.launchTasks:
>>>>>>>>
>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>
>>>>>>>> $ git diff
>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>
>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>> +        import java.io._
>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>> +        bw.write("\n")
>>>>>>>>> +        bw.close()
>>>>>>>>>          CreatedTask(
>>>>>>>>>            taskInfo,
>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>
>>>>>>>> $ git diff
>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver
>>>>>>>>> =>
>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>> +      var i = 0
>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>> +        import java.io._
>>>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() +
>>>>>>>>> "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>> +        bw.write("\n")
>>>>>>>>> +        bw.close()
>>>>>>>>> +      }
>>>>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>>>>> taskInfos.asJava)
>>>>>>>>>      }
>>>>>>>>
>>>>>>>>
>>>>>>>> Then I built and deployed the hacked Marathon and restarted the
>>>>>>>> marathon service.
>>>>>>>>
>>>>>>>> Next I created the app via the Marathon API ("hello app" is a
>>>>>>>> container with a simple hello-world ruby app running on
>>>>>>>> 0.0.0.0:8000)
>>>>>>>>
>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type:
>>>>>>>>> application/json' -d'
>>>>>>>>> {
>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>   "apps": [
>>>>>>>>>     {
>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>       "container": {
>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>         "docker": {
>>>>>>>>>           "image":
>>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>           "portMappings": [
>>>>>>>>>             {
>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>               "hostPort": 0,
>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>             }
>>>>>>>>>           ]
>>>>>>>>>         }
>>>>>>>>>       },
>>>>>>>>>       "env": {
>>>>>>>>>
>>>>>>>>>       },
>>>>>>>>>       "healthChecks": [
>>>>>>>>>         {
>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>         }
>>>>>>>>>       ],
>>>>>>>>>       "instances": 1,
>>>>>>>>>       "cpus": 1,
>>>>>>>>>       "mem": 512
>>>>>>>>>     }
>>>>>>>>>   ]
>>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> $ ls /tmp/
>>>>>>>>>
>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>
>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>
>>>>>>>>
>>>>>>>> Do they match?
>>>>>>>>
>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, so I am confident this is the information being sent across
>>>>>>>> the wire to Mesos.
>>>>>>>>
>>>>>>>> Do they contain any health-check information?
>>>>>>>>
>>>>>>>> $ cat
>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>> {
>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>   "task_id":{
>>>>>>>>>
>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>   },
>>>>>>>>>   "slave_id":{
>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>   },
>>>>>>>>>   "resources":[
>>>>>>>>>     {
>>>>>>>>>       "name":"cpus",
>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>       "scalar":{
>>>>>>>>>         "value":1.0
>>>>>>>>>       },
>>>>>>>>>       "role":"*"
>>>>>>>>>     },
>>>>>>>>>     {
>>>>>>>>>       "name":"mem",
>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>       "scalar":{
>>>>>>>>>         "value":512.0
>>>>>>>>>       },
>>>>>>>>>       "role":"*"
>>>>>>>>>     },
>>>>>>>>>     {
>>>>>>>>>       "name":"ports",
>>>>>>>>>       "type":"RANGES",
>>>>>>>>>       "ranges":{
>>>>>>>>>         "range":[
>>>>>>>>>           {
>>>>>>>>>             "begin":31641,
>>>>>>>>>             "end":31641
>>>>>>>>>           }
>>>>>>>>>         ]
>>>>>>>>>       },
>>>>>>>>>       "role":"*"
>>>>>>>>>     }
>>>>>>>>>   ],
>>>>>>>>>   "command":{
>>>>>>>>>     "environment":{
>>>>>>>>>       "variables":[
>>>>>>>>>         {
>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>           "value":"31641"
>>>>>>>>>         },
>>>>>>>>>         {
>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>         },
>>>>>>>>>         {
>>>>>>>>>           "name":"HOST",
>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>         },
>>>>>>>>>         {
>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>
>>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>         },
>>>>>>>>>         {
>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>
>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>         },
>>>>>>>>>         {
>>>>>>>>>           "name":"PORT",
>>>>>>>>>           "value":"31641"
>>>>>>>>>         },
>>>>>>>>>         {
>>>>>>>>>           "name":"PORTS",
>>>>>>>>>           "value":"31641"
>>>>>>>>>         },
>>>>>>>>>         {
>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>         },
>>>>>>>>>         {
>>>>>>>>>           "name":"PORT0",
>>>>>>>>>           "value":"31641"
>>>>>>>>>         }
>>>>>>>>>       ]
>>>>>>>>>     },
>>>>>>>>>     "shell":false
>>>>>>>>>   },
>>>>>>>>>   "container":{
>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>     "docker":{
>>>>>>>>>
>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>       "port_mappings":[
>>>>>>>>>         {
>>>>>>>>>           "host_port":31641,
>>>>>>>>>           "container_port":8000,
>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>         }
>>>>>>>>>       ],
>>>>>>>>>       "privileged":false,
>>>>>>>>>       "force_pull_image":false
>>>>>>>>>     }
>>>>>>>>>   }
>>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> No, I don't see anything about any health check.
>>>>>>>>
>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>
>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>> Starting task
>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>
>>>>>>>>
>>>>>>>> And STDERR:
>>>>>>>>
>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on
>>>>>>>>> slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>>> memory limited without swap.
>>>>>>>>
>>>>>>>>
>>>>>>>> Again, nothing about any health checks.
>>>>>>>>
>>>>>>>> Any ideas of other things to try or what I could be missing?  Can't
>>>>>>>> say either way about the Mesos health-check system working or not if
>>>>>>>> Marathon won't put the health-check into the task it sends to Mesos.
>>>>>>>>
>>>>>>>> Thanks for all your help!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jay
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Maybe you could post your executor stdout/stderr so that we can see
>>>>>>>>> whether the health check is running or not.
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Marathon also uses the Mesos health check. When I use a health check, I
>>>>>>>>>> can see log lines like this in the executor stdout.
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>> Starting task
>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>> Launching health check process:
>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I am using my own framework, and the full task info I'm using is
>>>>>>>>>>> posted earlier in this thread.  Do you happen to know if Marathon uses
>>>>>>>>>>> Mesos's health checks for its health check system?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Yes, the health check is launched from its definition in TaskInfo.
>>>>>>>>>>> Do you launch your task through Marathon? I could test it on my side.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <outtatime@gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Precisely, and there are none of those statements.  Are you or
>>>>>>>>>>>> others confident health-checks are part of the code path when defined via
>>>>>>>>>>>> task info for docker container tasks?  Going through the code, I wasn't
>>>>>>>>>>>> able to find the linkage for anything other than health-checks triggered
>>>>>>>>>>>> through a custom executor.
>>>>>>>>>>>>
>>>>>>>>>>>> With that being said it is a pretty good sized code base and
>>>>>>>>>>>> I'm not very familiar with it, so my analysis this far has by no means been
>>>>>>>>>>>> exhaustive.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> When the health check launches, there should be a log line like this in your
>>>>>>>>>>>> executor stdout:
>>>>>>>>>>>> ```
>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>> ```
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <
>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output in the
>>>>>>>>>>>>> logs with the string "health" or "Health" if the health-check were active?
>>>>>>>>>>>>> None of my master or slave logs contain the string..
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an
>>>>>>>>>>>>> unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <
>>>>>>>>>>>>>>> haosdent@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contain this backport; let me
>>>>>>>>>>>>>>>> double check.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll
>>>>>>>>>>>>>>>>> look there :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test
>>>>>>>>>>>>>>>>>> it out?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks
>>>>>>>>>>>>>>>>>>> that's in master but not yet released. It will run docker exec with the
>>>>>>>>>>>>>>>>>>> command you provided as health checks.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image
>>>>>>>>>>>>>>>>>>> tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if
>>>>>>>>>>>>>>>>>>> they ever run the command (in this case `sleep 5`), but have not found any
>>>>>>>>>>>>>>>>>>> indication that it is being executed.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from
>>>>>>>>>>>>>>>>>>> src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this
>>>>>>>>>>>>>>>>>>> mean that health-checks are only supported for custom executors and not for
>>>>>>>>>>>>>>>>>>> docker tasks?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero
>>>>>>>>>>>>>>>>>>> exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>

Re: Can health-checks be run by Mesos for docker tasks?

Posted by haosdent <ha...@gmail.com>.
For flags sent to the executor from the containerizer, each flag is stringified
and becomes a command-line parameter when the executor is launched.

You can see this in
https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
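
As a rough illustration only (a Python sketch, not the Mesos code; the
function name flags_to_argv and the sorted ordering are assumptions made for
this example), stringifying a map of flags into command-line parameters looks
roughly like this. The flag names and values are taken from the executor
stdout quoted in this thread:

```python
def flags_to_argv(program, flags):
    """Build an argv list: the program name followed by --name=value items."""
    argv = [program]
    # Sorting is only for a stable, readable order in this sketch.
    for name, value in sorted(flags.items()):
        argv.append("--{}={}".format(name, value))
    return argv


if __name__ == "__main__":
    docker_flags = {
        "docker": "docker",
        "mapped_directory": "/mnt/mesos/sandbox",
        "stop_timeout": "0ns",
    }
    for arg in flags_to_argv("mesos-docker-executor", docker_flags):
        print(arg)
```

Note that launcher_dir is not among the flags printed in the executor stdout
in this thread.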

For launcher_dir, however, the executor gets it from `argv[0]`, as you mentioned
above.
```
  string path =
    envPath.isSome() ? envPath.get()
                     : os::realpath(Path(argv[0]).dirname()).get();

```
So I want to figure out why your argv[0] would resolve to the sandbox dir, not
"/usr/libexec/mesos".
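
To make the fallback above concrete, here is a minimal Python sketch of the
same lookup (again not Mesos code; resolve_launcher_dir is a hypothetical
name). It assumes argv[0] may be a bare program name with no directory part,
in which case the directory is taken to be "." and resolves to the current
working directory:

```python
import os


def resolve_launcher_dir(argv0, environ):
    """Return MESOS_LAUNCHER_DIR if set, else the real directory of argv0."""
    env_path = environ.get("MESOS_LAUNCHER_DIR")
    if env_path is not None:
        return env_path
    # A bare name such as "mesos-docker-executor" has no directory part,
    # so fall back to "." (the current working directory).
    dirname = os.path.dirname(argv0) or "."
    return os.path.realpath(dirname)


if __name__ == "__main__":
    # Env var set: it wins regardless of argv0.
    print(resolve_launcher_dir("mesos-docker-executor",
                               {"MESOS_LAUNCHER_DIR": "/usr/libexec/mesos"}))
    # Env var missing and bare argv0: result depends on the working directory.
    print(resolve_launcher_dir("mesos-docker-executor", {}))
```

If the executor is started with a bare argv[0] and its working directory is
the sandbox, a fallback of this kind would yield the sandbox path, which may
be what your output shows.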

On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <ou...@gmail.com> wrote:

> I see.  And then how are the flags sent to the executor?
>
>
>
> On Oct 8, 2015, at 8:56 PM, haosdent <ha...@gmail.com> wrote:
>
> Yes. The related code is located in
> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>
> In fact, environment variables that start with MESOS_ are loaded as flag
> values.
>
> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
>
> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <ou...@gmail.com> wrote:
>
>> One question for you haosdent-
>>
>> You mentioned that the flags.launcher_dir should propagate to the docker
>> executor all the way up the chain.  Can you show me where this logic is in
>> the codebase?  I didn't see where that was happening and would like to
>> understand the mechanism.
>>
>> Thanks!
>> Jay
>>
>>
>>
>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com> wrote:
>>
>> Maybe tomorrow I will build a fresh cluster from scratch to see if the
>> broken behavior experienced today still persists.
>>
>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
>>
>> As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir,
>> which is where mesos-docker-executor and mesos-health-check are looked up.
>> Even though the env var is not propagated, MESOS_LAUNCHER_DIR still
>> works because flags.launcher_dir is taken from it.
>>
>> For example, I ran
>> ```
>> export MESOS_LAUNCHER_DIR=/tmp
>> ```
>> before starting mesos-slave. So when I launch the slave, I can find this line
>> in the slave log:
>> ```
>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>> xxxxx  --launcher_dir="/tmp"
>> ```
>>
>> And from your log, I am not sure why your MESOS_LAUNCHER_DIR became the sandbox
>> dir. Is it because MESOS_LAUNCHER_DIR is overridden in your other scripts?
>>
>>
>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com> wrote:
>>
>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>>
>>> I just tried setting both the env var and flag on the slaves, and have
>>> determined that the env var is not present when it is checked in
>>> src/docker/executor.cpp @ line 573:
>>>
>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>   string path =
>>>>     envPath.isSome() ? envPath.get()
>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << (envPath.isSome()
>>>> ? "yes" : "no") << endl;
>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>
>>>
>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly
>>> propagated along up to the point of mesos-slave launch):
>>>
>>> $ cat /etc/default/mesos-slave
>>>> export
>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>> export MESOS_PORT="5050"
>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>
>>>
>>> TASK OUTPUT:
>>>
>>>
>>>> MESOS_LAUNCHER_DIR: envpath.isSome()->no
>>>> MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
>>>> Registered docker executor on mesos-worker2a
>>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>> Launching health check process:
>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>> --executor=(1)@192.168.225.59:44523
>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>> sh -c \" \/bin\/bash
>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>> Health check process launched at pid: 2519
>>>
>>>
>>> The env var is not propagated when the docker executor is launched
>>> in src/slave/containerizer/docker.cpp around line 903:
>>>
>>>   vector<string> argv;
>>>>   argv.push_back("mesos-docker-executor");
>>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>>   // container (to distinguish it from Docker containers not created
>>>>   // by Mesos).
>>>>   Try<Subprocess> s = subprocess(
>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>       argv,
>>>>       Subprocess::PIPE(),
>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>       environment,
>>>>       lambda::bind(&setup, container->directory));
>>>
>>>
>>> A little way above, we can see the environment is set up with the container
>>> task's defined env vars.
>>>
>>> See src/slave/containerizer/docker.cpp around line 871:
>>>
>>>   // Include any environment variables from ExecutorInfo.
>>>>   foreach (const Environment::Variable& variable,
>>>>            container->executor.command().environment().variables()) {
>>>>     environment[variable.name()] = variable.value();
>>>>   }
>>>
>>>
>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>
>>>
>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com> wrote:
>>>
>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>> 0.24.1 should work.
>>>>
>>>> >Do any of you know which host the path
>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>>> failing.
>>>>
>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We
>>>> get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or else
>>>> use the same directory as mesos-docker-executor.
>>>>
>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com>
>>>> wrote:
>>>>
>>>>> Maybe I spoke too soon.
>>>>>
>>>>> Now the checks are attempting to run, however the STDERR is not
>>>>> looking good.  I've added some debugging to the error message output to
>>>>> show the path, argv, and envp variables:
>>>>>
>>>>> STDOUT:
>>>>>
>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>> --stop_timeout="0ns"
>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>> --stop_timeout="0ns"
>>>>>> Registered docker executor on mesos-worker2a
>>>>>> Starting task
>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>> Launching health check process:
>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>> sh -c \" exit 1
>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>> Health check process launched at pid: 3012
>>>>>
>>>>>
>>>>> STDERR:
>>>>>
>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave
>>>>>> 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>> WARNING: Your kernel does not support swap limit capabilities, memory
>>>>>> limited without swap.
>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain
>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>> envp=''): No such file or directory*** Aborted at 1444270649 (unix time)
>>>>>> try "date -d @1444270649" if you are using GNU date ***
>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from
>>>>>> PID 3012; stack trace: ***
>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>> @ 0x4191e2 _Abort()
>>>>>> @ 0x41921c _Abort()
>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>> @ 0x43cc9c
>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>> @ 0x7f4a39d92827
>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>
>>>>>
>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>> should exist on? It definitely doesn't exist on the slave, hence
>>>>> execution failing.
>>>>>
>>>>> This is with current master, git hash
>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>
>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>
>>>>>
>>>>> -Jay
>>>>>
>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Update:
>>>>>>
>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile and
>>>>>> package the latest master (0.26.x) and deployed it to the cluster, and now
>>>>>> health checks are working as advertised in both Marathon and my own
>>>>>> framework!  Not sure what was going on with health-checks in 0.24.0..
>>>>>>
>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>
>>>>>> Cheers,
>>>>>> Jay
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Haosdent,
>>>>>>>
>>>>>>> Can you share your Marathon POST request that results in Mesos
>>>>>>> executing the health checks?
>>>>>>>
>>>>>>> Since we can reference the Marathon framework, I've been doing some
>>>>>>> digging around.
>>>>>>>
>>>>>>> Here are the details of my setup and findings:
>>>>>>>
>>>>>>> I put a few small hacks in Marathon:
>>>>>>>
>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>>>>>
>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to
>>>>>>> /tmp/X both in the TaskFactory and right before the task is sent to
>>>>>>> Mesos via driver.launchTasks:
>>>>>>>
>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>
>>>>>>> $ git diff
>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>
>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>> +        import java.io._
>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>> +        bw.write("\n")
>>>>>>>> +        bw.close()
>>>>>>>>          CreatedTask(
>>>>>>>>            taskInfo,
>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>
>>>>>>> $ git diff
>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>> +      var i = 0
>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>> +        import java.io._
>>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-"
>>>>>>>> + taskInfos(i).getTaskId.getValue)
>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>> +        bw.write("\n")
>>>>>>>> +        bw.close()
>>>>>>>> +      }
>>>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>>>> taskInfos.asJava)
>>>>>>>>      }
>>>>>>>
>>>>>>>
>>>>>>> Then I built and deployed the hacked Marathon and restarted the
>>>>>>> marathon service.
>>>>>>>
>>>>>>> Next I created the app via the Marathon API ("hello app" is a
>>>>>>> container with a simple hello-world ruby app running on 0.0.0.0:8000
>>>>>>> )
>>>>>>>
>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type:
>>>>>>>> application/json' -d'
>>>>>>>> {
>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>   "apps": [
>>>>>>>>     {
>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>       "container": {
>>>>>>>>         "type": "DOCKER",
>>>>>>>>         "docker": {
>>>>>>>>           "image":
>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>           "network": "BRIDGE",
>>>>>>>>           "portMappings": [
>>>>>>>>             {
>>>>>>>>               "containerPort": 8000,
>>>>>>>>               "hostPort": 0,
>>>>>>>>               "protocol": "tcp"
>>>>>>>>             }
>>>>>>>>           ]
>>>>>>>>         }
>>>>>>>>       },
>>>>>>>>       "env": {
>>>>>>>>
>>>>>>>>       },
>>>>>>>>       "healthChecks": [
>>>>>>>>         {
>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>         }
>>>>>>>>       ],
>>>>>>>>       "instances": 1,
>>>>>>>>       "cpus": 1,
>>>>>>>>       "mem": 512
>>>>>>>>     }
>>>>>>>>   ]
>>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> $ ls /tmp/
>>>>>>>>
>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>
>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>
>>>>>>>
>>>>>>> Do they match?
>>>>>>>
>>>>>>> $ md5sum /tmp/task*
>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>
>>>>>>>
>>>>>>> Yes, so I am confident this is the information being sent across the
>>>>>>> wire to Mesos.
>>>>>>>
>>>>>>> Do they contain any health-check information?
>>>>>>>
>>>>>>> $ cat
>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>> {
>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>   "task_id":{
>>>>>>>>
>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>   },
>>>>>>>>   "slave_id":{
>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>   },
>>>>>>>>   "resources":[
>>>>>>>>     {
>>>>>>>>       "name":"cpus",
>>>>>>>>       "type":"SCALAR",
>>>>>>>>       "scalar":{
>>>>>>>>         "value":1.0
>>>>>>>>       },
>>>>>>>>       "role":"*"
>>>>>>>>     },
>>>>>>>>     {
>>>>>>>>       "name":"mem",
>>>>>>>>       "type":"SCALAR",
>>>>>>>>       "scalar":{
>>>>>>>>         "value":512.0
>>>>>>>>       },
>>>>>>>>       "role":"*"
>>>>>>>>     },
>>>>>>>>     {
>>>>>>>>       "name":"ports",
>>>>>>>>       "type":"RANGES",
>>>>>>>>       "ranges":{
>>>>>>>>         "range":[
>>>>>>>>           {
>>>>>>>>             "begin":31641,
>>>>>>>>             "end":31641
>>>>>>>>           }
>>>>>>>>         ]
>>>>>>>>       },
>>>>>>>>       "role":"*"
>>>>>>>>     }
>>>>>>>>   ],
>>>>>>>>   "command":{
>>>>>>>>     "environment":{
>>>>>>>>       "variables":[
>>>>>>>>         {
>>>>>>>>           "name":"PORT_8000",
>>>>>>>>           "value":"31641"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"HOST",
>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>
>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>
>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"PORT",
>>>>>>>>           "value":"31641"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"PORTS",
>>>>>>>>           "value":"31641"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"PORT0",
>>>>>>>>           "value":"31641"
>>>>>>>>         }
>>>>>>>>       ]
>>>>>>>>     },
>>>>>>>>     "shell":false
>>>>>>>>   },
>>>>>>>>   "container":{
>>>>>>>>     "type":"DOCKER",
>>>>>>>>     "docker":{
>>>>>>>>
>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>       "network":"BRIDGE",
>>>>>>>>       "port_mappings":[
>>>>>>>>         {
>>>>>>>>           "host_port":31641,
>>>>>>>>           "container_port":8000,
>>>>>>>>           "protocol":"tcp"
>>>>>>>>         }
>>>>>>>>       ],
>>>>>>>>       "privileged":false,
>>>>>>>>       "force_pull_image":false
>>>>>>>>     }
>>>>>>>>   }
>>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> No, I don't see anything about any health check.
>>>>>>>
>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>
>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>> --stop_timeout="0ns"
>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>> --stop_timeout="0ns"
>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>> Starting task
>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>
>>>>>>>
>>>>>>> And STDERR:
>>>>>>>
>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on
>>>>>>>> slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>> memory limited without swap.
>>>>>>>
>>>>>>>
>>>>>>> Again, nothing about any health checks.
>>>>>>>
>>>>>>> Any ideas of other things to try or what I could be missing?  Can't
>>>>>>> say either way about the Mesos health-check system working or not if
>>>>>>> Marathon won't put the health-check into the task it sends to Mesos.
>>>>>>>
>>>>>>> Thanks for all your help!
>>>>>>>
>>>>>>> Best,
>>>>>>> Jay
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Maybe you could post your executor stdout/stderr so that we can
>>>>>>>> tell whether the health check is running or not.
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Marathon also uses Mesos health checks. When I use a health check, I
>>>>>>>>> see log output like this in the executor stdout.
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>> Starting task
>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>> Launching health check process:
>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>> Received task health update, healthy: true
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I am using my own framework, and the full task info I'm using is
>>>>>>>>>> posted earlier in this thread.  Do you happen to know if Marathon uses
>>>>>>>>>> Mesos's health checks for its health check system?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Yes, the health check is launched from its definition in TaskInfo.
>>>>>>>>>> Do you launch your task through Marathon? I could test it on my side.
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Precisely, and there are none of those statements.  Are you or
>>>>>>>>>>> others confident health-checks are part of the code path when defined via
>>>>>>>>>>> task info for docker container tasks?  Going through the code, I wasn't
>>>>>>>>>>> able to find the linkage for anything other than health-checks triggered
>>>>>>>>>>> through a custom executor.
>>>>>>>>>>>
>>>>>>>>>>> That said, it is a pretty good-sized code base and I'm
>>>>>>>>>>> not very familiar with it, so my analysis thus far has by no means been
>>>>>>>>>>> exhaustive.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> When the health check launches, there will be a log line like this in
>>>>>>>>>>> your executor stdout
>>>>>>>>>>> ```
>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>> ```
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <outtatime@gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output in the
>>>>>>>>>>>> logs with the string "health" or "Health" if the health-check were active?
>>>>>>>>>>>> None of my master or slave logs contain the string..
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an
>>>>>>>>>>>> unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1
>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <haosdent@gmail.com
>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this backport; let me
>>>>>>>>>>>>>>> double check.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look
>>>>>>>>>>>>>>>> there :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it
>>>>>>>>>>>>>>>>> out?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks
>>>>>>>>>>>>>>>>>> that's in master but not yet released. It will run docker exec with the
>>>>>>>>>>>>>>>>>> command you provided as health checks.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?
>>>>>>>>>>>>>>>>>> Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if
>>>>>>>>>>>>>>>>>> they ever run the command (in this case `sleep 5`), but have not found any
>>>>>>>>>>>>>>>>>> indication that it is being executed.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from
>>>>>>>>>>>>>>>>>> src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this
>>>>>>>>>>>>>>>>>> mean that health-checks are only supported for custom executors and not for
>>>>>>>>>>>>>>>>>> docker tasks?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero
>>>>>>>>>>>>>>>>>> exit-status of a health-check command translate to task health.
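The goal stated above (a zero or non-zero exit status deciding task health) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Mesos health checker: the helper names run_check and should_kill are hypothetical, and the parameters timeout_seconds and consecutive_failures are borrowed from the health_check fields in the TaskInfo above.

```python
import subprocess


def run_check(command, timeout_seconds):
    """Return True if the shell command exits with status 0 within the timeout."""
    try:
        result = subprocess.run(command, shell=True, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # A check that does not finish in time is counted as a failure here.
        return False
    return result.returncode == 0


def should_kill(results, consecutive_failures):
    """Return True once `consecutive_failures` checks in a row have failed."""
    streak = 0
    for healthy in results:
        # Any healthy result resets the failure streak.
        streak = 0 if healthy else streak + 1
        if streak >= consecutive_failures:
            return True
    return False


if __name__ == "__main__":
    outcomes = [run_check("exit 1", timeout_seconds=10) for _ in range(3)]
    print(outcomes, should_kill(outcomes, consecutive_failures=3))
```

Delay, grace period, and interval handling are left out on purpose; how Mesos applies them is not covered by this sketch.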
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best Regards,
>>>>>>>>>> Haosdent Huang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Regards,
>>>>>>>>> Haosdent Huang
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Haosdent Huang
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>>>
>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>>
>
>
> --
> Best Regards,
> Haosdent Huang
>
>


-- 
Best Regards,
Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
I see.  And then how are the flags sent to the executor?



> On Oct 8, 2015, at 8:56 PM, haosdent <ha...@gmail.com> wrote:
> 
> Yes. The related code is located in https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
> 
> In fact, environment variables that start with MESOS_ are loaded as flag values.
> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
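A rough sketch of the mechanism described in the quoted lines just above, in Python rather than the C++ of stout: every environment variable whose name starts with a given prefix has the prefix stripped, is lower-cased, and is treated as a flag value. The function name load_flags_from_env is hypothetical and the details are assumptions made for illustration, not taken from the Mesos source.

```python
import os


def load_flags_from_env(prefix="MESOS_", environ=None):
    """Collect flag values from environment variables that start with `prefix`.

    Hypothetical helper for illustration; not the stout implementation.
    """
    environ = os.environ if environ is None else environ
    flags = {}
    for name, value in environ.items():
        if name.startswith(prefix):
            # For example, MESOS_LAUNCHER_DIR becomes launcher_dir.
            flags[name[len(prefix):].lower()] = value
    return flags


if __name__ == "__main__":
    env = {"MESOS_LAUNCHER_DIR": "/usr/libexec/mesos", "HOME": "/root"}
    print(load_flags_from_env(environ=env))
```

If the slave works along these lines, it would pick up launcher_dir from its own environment at startup, while a child process would only see the variable if it is passed along explicitly.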
> 
>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <ou...@gmail.com> wrote:
>> One question for you haosdent-
>> 
>> You mentioned that the flags.launcher_dir should propagate to the docker executor all the way up the chain.  Can you show me where this logic is in the codebase?  I didn't see where that was happening and would like to understand the mechanism.
>> 
>> Thanks!
>> Jay
>> 
>> 
>> 
>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com> wrote:
>>> 
>>> Maybe tomorrow I will build a fresh cluster from scratch to see if the broken behavior experienced today still persists.
>>> 
>>>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
>>>> 
>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir, which is the directory searched for mesos-docker-executor and mesos-health-check. Although the env var is not propagated, MESOS_LAUNCHER_DIR still takes effect because flags.launcher_dir is read from it.
>>>> 
>>>> For example, I ran
>>>> ```
>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>> ```
>>>> before starting mesos-slave. When I then launch the slave, I can find this line in the slave log:
>>>> ```
>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup: xxxxx  --launcher_dir="/tmp"
>>>> ```
>>>> 
>>>> From your log, I am not sure why your MESOS_LAUNCHER_DIR became the sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in one of your other scripts?
>>>> 
>>>> 
>>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>>>> 
>>>>> I just tried setting both the env var and the flag on the slaves, and have determined that the env var is not present when it is checked in src/docker/executor.cpp at line 573:
>>>>> 
>>>>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>   string path =
>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>>> 
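The fallback branch in the snippet above may explain the sandbox path seen in the task output. This Python sketch mirrors the same lookup order: use MESOS_LAUNCHER_DIR if it is set, otherwise take the real path of the directory part of argv[0]. When argv[0] is a bare name such as mesos-docker-executor, the directory part is "." and the result is the current working directory. The helper name resolve_launcher_dir is hypothetical, and the idea that the executor's working directory is the sandbox is an assumption based on the output in this thread, not something confirmed from the source.

```python
import os


def resolve_launcher_dir(argv0, environ=None):
    """Mirror the lookup order: env var first, else the directory of argv[0].

    Hypothetical helper for illustration; not the Mesos code itself.
    """
    environ = os.environ if environ is None else environ
    env_path = environ.get("MESOS_LAUNCHER_DIR")
    if env_path is not None:
        return env_path
    # POSIX dirname() yields "." for a bare file name; Python's
    # os.path.dirname() yields "", so normalise it here.
    directory = os.path.dirname(argv0) or "."
    return os.path.realpath(directory)


if __name__ == "__main__":
    # With no env var and a bare argv[0], this prints the current directory.
    print(resolve_launcher_dir("mesos-docker-executor", environ={}))
```

Under these assumptions, either exporting the variable into the executor's environment or launching it with an absolute argv[0] would make the lookup land in the launcher directory instead.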
>>>>> 
>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly propagated along up to the point of mesos-slave launch):
>>>>> 
>>>>>> $ cat /etc/default/mesos-slave
>>>>>> export MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>> export MESOS_PORT="5050"
>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>> 
>>>>> TASK OUTPUT:
>>>>> 
>>>>>> MESOS_LAUNCHER_DIR: envpath.isSome()->no
>>>>>> MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
>>>>>> Registered docker executor on mesos-worker2a
>>>>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>> Launching health check process: /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check --executor=(1)@192.168.225.59:44523 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad sh -c \" \/bin\/bash \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>> Health check process launched at pid: 2519
>>>>> 
>>>>> 
>>>>> The env var is not propagated when the docker executor is launched in src/slave/containerizer/docker.cpp around line 903:
>>>>> 
>>>>>>   vector<string> argv;
>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>>>>   // container (to distinguish it from Docker containers not created
>>>>>>   // by Mesos).
>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>>>       argv,
>>>>>>       Subprocess::PIPE(),
>>>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>>>       environment,
>>>>>>       lambda::bind(&setup, container->directory));
>>>>> 
>>>>> 
>>>>> A little way above, we can see the environment is set up with the env vars defined by the container task.
>>>>> 
>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>> 
>>>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>            container->executor.command().environment().variables()) {
>>>>>>     environment[variable.name()] = variable.value();
>>>>>>   }
>>>>> 
>>>>> 
>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>> 
>>>>> 
>>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com> wrote:
>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>> 0.24.1 should work.
>>>>>> 
>>>>>> >Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" should exist on? It definitely doesn't exist on the slave, hence execution failing.
>>>>>> 
>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or otherwise use the same directory as mesos-docker-executor.
>>>>>> 
>>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>> Maybe I spoke too soon.
>>>>>>> 
>>>>>>> Now the checks are attempting to run, however the STDERR is not looking good.  I've added some debugging to the error message output to show the path, argv, and envp variables:
>>>>>>> 
>>>>>>> STDOUT:
>>>>>>> 
>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --stop_timeout="0ns"
>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --stop_timeout="0ns"
>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>> Starting task app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>> Launching health check process: /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check --executor=(1)@192.168.225.59:43917 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc sh -c \" exit 1 \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>> Health check process launched at pid: 3012
>>>>>>> 
>>>>>>> 
>>>>>>> STDERR:
>>>>>>> 
>>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>> WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', envp=''): No such file or directory*** Aborted at 1444270649 (unix time) try "date -d @1444270649" if you are using GNU date ***
>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from PID 3012; stack trace: ***
>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>> @ 0x41921c _Abort()
>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>> @ 0x43cc9c mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>> @ 0x7f4a39d92827 _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>> 
>>>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" should exist on? It definitely doesn't exist on the slave, hence execution failing.
>>>>>>> 
>>>>>>> This is with current master, git hash 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>> 
>>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>> 
>>>>>>> 
>>>>>>> -Jay
>>>>>>> 
>>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>> Update:
>>>>>>>> 
>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile and package the latest master (0.26.x) and deployed it to the cluster, and now health checks are working as advertised in both Marathon and my own framework!  Not sure what was going on with health-checks in 0.24.0..
>>>>>>>> 
>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> Jay
>>>>>>>> 
>>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>> Hi Haosdent,
>>>>>>>>> 
>>>>>>>>> Can you share your Marathon POST request that results in Mesos executing the health checks?
>>>>>>>>> 
>>>>>>>>> Since we can reference the Marathon framework, I've been doing some digging around.
>>>>>>>>> 
>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>> 
>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>> 
>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>>>>>>> 
>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X both in the TaskFactory and right before the task is sent to Mesos via driver.launchTasks:
>>>>>>>>> 
>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>> 
>>>>>>>>>> $ git diff src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>> 
>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId, config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>> +        import java.io._
>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>> +        bw.close()
>>>>>>>>>>          CreatedTask(
>>>>>>>>>>            taskInfo,
>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>> 
>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>> 
>>>>>>>>>> $ git diff src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos: Seq[TaskInfo]): Boolean = {
>>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>> +      var i = 0
>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>> +        import java.io._
>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>> +        bw.close()
>>>>>>>>>> +      }
>>>>>>>>>>        driver.launchTasks(Collections.singleton(offerID), taskInfos.asJava)
>>>>>>>>>>      }
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Then I built and deployed the hacked Marathon and restarted the marathon service.
>>>>>>>>> 
>>>>>>>>> Next I created the app via the Marathon API ("hello app" is a container with a simple hello-world ruby app running on 0.0.0.0:8000)
>>>>>>>>> 
>>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type: application/json' -d'
>>>>>>>>>> {
>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>   "apps": [
>>>>>>>>>>     {
>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>       "container": {
>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>         "docker": {
>>>>>>>>>>           "image": "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>           "portMappings": [
>>>>>>>>>>             {
>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>             }
>>>>>>>>>>           ]
>>>>>>>>>>         }
>>>>>>>>>>       },
>>>>>>>>>>       "env": {
>>>>>>>>>>         
>>>>>>>>>>       },
>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>         {
>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>         }
>>>>>>>>>>       ],
>>>>>>>>>>       "instances": 1,
>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>       "mem": 512
>>>>>>>>>>     }
>>>>>>>>>>   ]
>>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> $ ls /tmp/
>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>> 
>>>>>>>>> Do they match?
>>>>>>>>> 
>>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>> 
>>>>>>>>> Yes, so I am confident this is the information being sent across the wire to Mesos.
>>>>>>>>> 
>>>>>>>>> Do they contain any health-check information?
>>>>>>>>> 
>>>>>>>>>> $ cat /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>> {
>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>   "task_id":{
>>>>>>>>>>     "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>   },
>>>>>>>>>>   "slave_id":{
>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>   },
>>>>>>>>>>   "resources":[
>>>>>>>>>>     {
>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>       "scalar":{
>>>>>>>>>>         "value":1.0
>>>>>>>>>>       },
>>>>>>>>>>       "role":"*"
>>>>>>>>>>     },
>>>>>>>>>>     {
>>>>>>>>>>       "name":"mem",
>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>       "scalar":{
>>>>>>>>>>         "value":512.0
>>>>>>>>>>       },
>>>>>>>>>>       "role":"*"
>>>>>>>>>>     },
>>>>>>>>>>     {
>>>>>>>>>>       "name":"ports",
>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>       "ranges":{
>>>>>>>>>>         "range":[
>>>>>>>>>>           {
>>>>>>>>>>             "begin":31641,
>>>>>>>>>>             "end":31641
>>>>>>>>>>           }
>>>>>>>>>>         ]
>>>>>>>>>>       },
>>>>>>>>>>       "role":"*"
>>>>>>>>>>     }
>>>>>>>>>>   ],
>>>>>>>>>>   "command":{
>>>>>>>>>>     "environment":{
>>>>>>>>>>       "variables":[
>>>>>>>>>>         {
>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>           "value":"31641"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>           "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>           "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>           "value":"31641"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>           "value":"31641"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>         },
>>>>>>>>>>         {
>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>           "value":"31641"
>>>>>>>>>>         }
>>>>>>>>>>       ]
>>>>>>>>>>     },
>>>>>>>>>>     "shell":false
>>>>>>>>>>   },
>>>>>>>>>>   "container":{
>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>     "docker":{
>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>         {
>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>         }
>>>>>>>>>>       ],
>>>>>>>>>>       "privileged":false,
>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>     }
>>>>>>>>>>   }
>>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> No, I don't see anything about any health check.
>>>>>>>>> 
>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>> 
>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" --docker="docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" --stop_timeout="0ns"
>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" --docker="docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" --stop_timeout="0ns"
>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>> Starting task app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> And STDERR:
>>>>>>>>> 
>>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>> 
>>>>>>>>> Any ideas of other things to try or what I could be missing?  Can't say either way about the Mesos health-check system working or not if Marathon won't put the health-check into the task it sends to Mesos.
>>>>>>>>> 
>>>>>>>>> Thanks for all your help!
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Jay
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>> Maybe you could post your executor stdout/stderr so that we can tell whether the health check is running or not.
>>>>>>>>>> 
>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>> Marathon also uses the Mesos health check. When I use a health check, I can see log lines like this in the executor stdout.
>>>>>>>>>>> 
>>>>>>>>>>> ```
>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>> Launching health check process: /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>> ```
>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>> I am using my own framework, and the full task info I'm using is posted earlier in this thread.  Do you happen to know if Marathon uses Mesos's health checks for its health check system?
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes, the health check is launched through its definition in TaskInfo. Do you launch your task through Marathon? I could test it on my side.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>> Precisely, and there are none of those statements.  Are you or others confident health-checks are part of the code path when defined via task info for docker container tasks?  Going through the code, I wasn't able to find the linkage for anything other than health-checks triggered through a custom executor.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> With that being said it is a pretty good sized code base and I'm not very familiar with it, so my analysis this far has by no means been exhaustive.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> When the health check launches, there should be a log line like this in your executor stdout:
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output in the logs with the string "health" or "Health" if the health-check were active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1: https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0 https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this backport; let me double check.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look there :)
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <ja...@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it out?
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <ti...@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi Jay, 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks that's in master but not yet released. It will run docker exec with the command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if they ever run the command (in this case `sleep 5`), but have not found any indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this mean that health-checks are only supported for custom executors and not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>> Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by haosdent <ha...@gmail.com>.
If you can change the code, I think adding this line would help you debug
your problem:

```
diff --git a/src/docker/executor.cpp b/src/docker/executor.cpp
index 1e49013..91ba24e 100644
--- a/src/docker/executor.cpp
+++ b/src/docker/executor.cpp
@@ -571,6 +571,7 @@ int main(int argc, char** argv)
   }

   const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
+  cerr << "Launch command: " << argv[0] << endl;
   string path =
     envPath.isSome() ? envPath.get()
                      : os::realpath(Path(argv[0]).dirname()).get();
```

That way we can check whether argv[0] matches the launcher_dir you
see in the slave initialization log.
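
To make the fallback easier to reason about, here is a minimal standalone
sketch of the same decision: use the environment value if present, otherwise
the directory of argv[0]. It uses only the C++17 standard library rather than
stout, and resolveLauncherDir is a hypothetical helper name, not a Mesos
function. One thing it illustrates (an assumption on my part, based on the
argv built in docker.cpp quoted earlier in the thread): if argv[0] is just
"mesos-docker-executor" with no directory component, the fallback resolves to
the current working directory, which for the executor would be the sandbox.

```
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not part of Mesos): choose the directory expected to
// contain helper binaries such as mesos-health-check.
//   envValue: the value of MESOS_LAUNCHER_DIR, or nullptr if it is unset.
//   argv0:    the first element of the process argument vector.
std::string resolveLauncherDir(const char* envValue, const std::string& argv0)
{
  // Prefer the environment variable when it is set and non-empty.
  if (envValue != nullptr && *envValue != '\0') {
    return envValue;
  }

  // Otherwise fall back to the directory part of argv[0]. A bare program
  // name has no directory part, so treat that case as ".".
  fs::path dir = fs::path(argv0).parent_path();
  if (dir.empty()) {
    dir = ".";
  }

  // Resolve to an absolute, normalized path; "." becomes the current
  // working directory of the process.
  return fs::weakly_canonical(dir).string();
}
```

Calling resolveLauncherDir(std::getenv("MESOS_LAUNCHER_DIR"), argv[0]) at the
top of main and printing the result next to argv[0] should show which of the
two branches was taken.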

On Fri, Oct 9, 2015 at 11:56 AM, haosdent <ha...@gmail.com> wrote:

> Yes. The related code is located in
> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>
> In fact, environment variables that start with MESOS_ are loaded as flag
> values.
>
> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
>
> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <ou...@gmail.com> wrote:
>
>> One question for you haosdent-
>>
>> You mentioned that the flags.launcher_dir should propagate to the docker
>> executor all the way up the chain.  Can you show me where this logic is in
>> the codebase?  I didn't see where that was happening and would like to
>> understand the mechanism.
>>
>> Thanks!
>> Jay
>>
>>
>>
>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com> wrote:
>>
>> Maybe tomorrow I will build a fresh cluster from scratch to see if the
>> broken behavior experienced today still persists.
>>
>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
>>
>> As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir,
>> which is where mesos-docker-executor and mesos-health-check are looked up.
>> Although the env var is not propagated, MESOS_LAUNCHER_DIR still takes
>> effect because flags.launcher_dir is derived from it.
>>
>> For example, I ran
>> ```
>> export MESOS_LAUNCHER_DIR=/tmp
>> ```
>> before starting mesos-slave, so when the slave launches I can find this
>> line in the slave log:
>> ```
>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>> xxxxx  --launcher_dir="/tmp"
>> ```
>>
>> From your log, I am not sure why your MESOS_LAUNCHER_DIR became the sandbox
>> dir. Is MESOS_LAUNCHER_DIR perhaps overridden in one of your other scripts?
>>
>>
>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com> wrote:
>>
>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>>
>>> I just tried setting both the env var and flag on the slaves, and have
>>> determined that the env var is not present when it is checked in
>>> src/docker/executor.cpp @ line 573:
>>>
>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>   string path =
>>>>     envPath.isSome() ? envPath.get()
>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << (envPath.isSome()
>>>> ? "yes" : "no") << endl;
>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>
>>>
>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly
>>> propagated along up to the point of mesos-slave launch):
>>>
>>> $ cat /etc/default/mesos-slave
>>>> export
>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>> export MESOS_PORT="5050"
>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>
>>>
>>> TASK OUTPUT:
>>>
>>>
>>>> MESOS_LAUNCHER_DIR: envpath.isSome()->no
>>>> MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
>>>> Registered docker executor on mesos-worker2a
>>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>> Launching health check process:
>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>> --executor=(1)@192.168.225.59:44523
>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>> sh -c \" \/bin\/bash
>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>> Health check process launched at pid: 2519
>>>
>>>
>>> The env var is not propagated when the docker executor is launched
>>> in src/slave/containerizer/docker.cpp around line 903:
>>>
>>>   vector<string> argv;
>>>>   argv.push_back("mesos-docker-executor");
>>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>>   // container (to distinguish it from Docker containers not created
>>>>   // by Mesos).
>>>>   Try<Subprocess> s = subprocess(
>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>       argv,
>>>>       Subprocess::PIPE(),
>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>       environment,
>>>>       lambda::bind(&setup, container->directory));
>>>
>>>
>>> A little way above, we can see that the environment is set up with the
>>> env vars defined for the container task.
>>>
>>> See src/slave/containerizer/docker.cpp around line 871:
>>>
>>>   // Include any enviroment variables from ExecutorInfo.
>>>>   foreach (const Environment::Variable& variable,
>>>>            container->executor.command().environment().variables()) {
>>>>     environment[variable.name()] = variable.value();
>>>>   }
>>>
>>>
>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>
>>>
>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com> wrote:
>>>
>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>> 0.24.1 should work.
>>>>
>>>> >Do any of you know which host the path
>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>>> failing.
>>>>
>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We
>>>> locate mesos-health-check via MESOS_LAUNCHER_DIR/--launcher_dir, or else
>>>> use the same dir as mesos-docker-executor.
>>>>
>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com>
>>>> wrote:
>>>>
>>>>> Maybe I spoke too soon.
>>>>>
>>>>> Now the checks are attempting to run, however the STDERR is not
>>>>> looking good.  I've added some debugging to the error message output to
>>>>> show the path, argv, and envp variables:
>>>>>
>>>>> STDOUT:
>>>>>
>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>> --stop_timeout="0ns"
>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>> --stop_timeout="0ns"
>>>>>> Registered docker executor on mesos-worker2a
>>>>>> Starting task
>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>> Launching health check process:
>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>> sh -c \" exit 1
>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>> Health check process launched at pid: 3012
>>>>>
>>>>>
>>>>> STDERR:
>>>>>
>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave
>>>>>> 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>> WARNING: Your kernel does not support swap limit capabilities, memory
>>>>>> limited without swap.
>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain
>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>> envp=''): No such file or directory*** Aborted at 1444270649 (unix time)
>>>>>> try "date -d @1444270649" if you are using GNU date ***
>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from
>>>>>> PID 3012; stack trace: ***
>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>> @ 0x4191e2 _Abort()
>>>>>> @ 0x41921c _Abort()
>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>> @ 0x43cc9c
>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>> @ 0x7f4a39d92827
>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>
>>>>>
>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>> should exist on? It definitely doesn't exist on the slave, hence
>>>>> execution failing.
>>>>>
>>>>> This is with current master, git hash
>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>
>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>
>>>>>
>>>>> -Jay
>>>>>
>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Update:
>>>>>>
>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile and
>>>>>> package the latest master (0.26.x) and deployed it to the cluster, and now
>>>>>> health checks are working as advertised in both Marathon and my own
>>>>>> framework!  Not sure what was going on with health-checks in 0.24.0..
>>>>>>
>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>
>>>>>> Cheers,
>>>>>> Jay
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Haosdent,
>>>>>>>
>>>>>>> Can you share your Marathon POST request that results in Mesos
>>>>>>> executing the health checks?
>>>>>>>
>>>>>>> Since we can reference the Marathon framework, I've been doing some
>>>>>>> digging around.
>>>>>>>
>>>>>>> Here are the details of my setup and findings:
>>>>>>>
>>>>>>> I put a few small hacks in Marathon:
>>>>>>>
>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>>>>>
>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to
>>>>>>> /tmp/X both in the TaskFactory and right before the task is sent to
>>>>>>> Mesos via driver.launchTasks:
>>>>>>>
>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>
>>>>>>> $ git diff
>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>
>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>> +        import java.io._
>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>> +        bw.write("\n")
>>>>>>>> +        bw.close()
>>>>>>>>          CreatedTask(
>>>>>>>>            taskInfo,
>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>
>>>>>>> $ git diff
>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>> +      var i = 0
>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>> +        import java.io._
>>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-"
>>>>>>>> + taskInfos(i).getTaskId.getValue)
>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>> +        bw.write("\n")
>>>>>>>> +        bw.close()
>>>>>>>> +      }
>>>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>>>> taskInfos.asJava)
>>>>>>>>      }
>>>>>>>
>>>>>>>
>>>>>>> Then I built and deployed the hacked Marathon and restarted the
>>>>>>> marathon service.
>>>>>>>
>>>>>>> Next I created the app via the Marathon API ("hello app" is a
>>>>>>> container with a simple hello-world ruby app running on 0.0.0.0:8000
>>>>>>> )
>>>>>>>
>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type:
>>>>>>>> application/json' -d'
>>>>>>>> {
>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>   "apps": [
>>>>>>>>     {
>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>       "container": {
>>>>>>>>         "type": "DOCKER",
>>>>>>>>         "docker": {
>>>>>>>>           "image":
>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>           "network": "BRIDGE",
>>>>>>>>           "portMappings": [
>>>>>>>>             {
>>>>>>>>               "containerPort": 8000,
>>>>>>>>               "hostPort": 0,
>>>>>>>>               "protocol": "tcp"
>>>>>>>>             }
>>>>>>>>           ]
>>>>>>>>         }
>>>>>>>>       },
>>>>>>>>       "env": {
>>>>>>>>
>>>>>>>>       },
>>>>>>>>       "healthChecks": [
>>>>>>>>         {
>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>         }
>>>>>>>>       ],
>>>>>>>>       "instances": 1,
>>>>>>>>       "cpus": 1,
>>>>>>>>       "mem": 512
>>>>>>>>     }
>>>>>>>>   ]
>>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> $ ls /tmp/
>>>>>>>>
>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>
>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>
>>>>>>>
>>>>>>> Do they match?
>>>>>>>
>>>>>>> $ md5sum /tmp/task*
>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>
>>>>>>>
>>>>>>> Yes, so I am confident this is the information being sent across the
>>>>>>> wire to Mesos.
>>>>>>>
>>>>>>> Do they contain any health-check information?
>>>>>>>
>>>>>>> $ cat
>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>> {
>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>   "task_id":{
>>>>>>>>
>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>   },
>>>>>>>>   "slave_id":{
>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>   },
>>>>>>>>   "resources":[
>>>>>>>>     {
>>>>>>>>       "name":"cpus",
>>>>>>>>       "type":"SCALAR",
>>>>>>>>       "scalar":{
>>>>>>>>         "value":1.0
>>>>>>>>       },
>>>>>>>>       "role":"*"
>>>>>>>>     },
>>>>>>>>     {
>>>>>>>>       "name":"mem",
>>>>>>>>       "type":"SCALAR",
>>>>>>>>       "scalar":{
>>>>>>>>         "value":512.0
>>>>>>>>       },
>>>>>>>>       "role":"*"
>>>>>>>>     },
>>>>>>>>     {
>>>>>>>>       "name":"ports",
>>>>>>>>       "type":"RANGES",
>>>>>>>>       "ranges":{
>>>>>>>>         "range":[
>>>>>>>>           {
>>>>>>>>             "begin":31641,
>>>>>>>>             "end":31641
>>>>>>>>           }
>>>>>>>>         ]
>>>>>>>>       },
>>>>>>>>       "role":"*"
>>>>>>>>     }
>>>>>>>>   ],
>>>>>>>>   "command":{
>>>>>>>>     "environment":{
>>>>>>>>       "variables":[
>>>>>>>>         {
>>>>>>>>           "name":"PORT_8000",
>>>>>>>>           "value":"31641"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"HOST",
>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>
>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>
>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"PORT",
>>>>>>>>           "value":"31641"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"PORTS",
>>>>>>>>           "value":"31641"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"PORT0",
>>>>>>>>           "value":"31641"
>>>>>>>>         }
>>>>>>>>       ]
>>>>>>>>     },
>>>>>>>>     "shell":false
>>>>>>>>   },
>>>>>>>>   "container":{
>>>>>>>>     "type":"DOCKER",
>>>>>>>>     "docker":{
>>>>>>>>
>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>       "network":"BRIDGE",
>>>>>>>>       "port_mappings":[
>>>>>>>>         {
>>>>>>>>           "host_port":31641,
>>>>>>>>           "container_port":8000,
>>>>>>>>           "protocol":"tcp"
>>>>>>>>         }
>>>>>>>>       ],
>>>>>>>>       "privileged":false,
>>>>>>>>       "force_pull_image":false
>>>>>>>>     }
>>>>>>>>   }
>>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> No, I don't see anything about any health check.
>>>>>>>
>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>
>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>> --stop_timeout="0ns"
>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>> --stop_timeout="0ns"
>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>> Starting task
>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>
>>>>>>>
>>>>>>> And STDERR:
>>>>>>>
>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on
>>>>>>>> slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>> memory limited without swap.
>>>>>>>
>>>>>>>
>>>>>>> Again, nothing about any health checks.
>>>>>>>
>>>>>>> Any ideas of other things to try or what I could be missing?  Can't
>>>>>>> say either way about the Mesos health-check system working or not if
>>>>>>> Marathon won't put the health-check into the task it sends to Mesos.
>>>>>>>
>>>>>>> Thanks for all your help!
>>>>>>>
>>>>>>> Best,
>>>>>>> Jay
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Maybe you could post your executor stdout/stderr so that we can
>>>>>>>> tell whether the health check is running or not.
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Marathon also uses the Mesos health check. When I use a health check, I
>>>>>>>>> can see log lines like this in the executor stdout.
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>> Starting task
>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>> Launching health check process:
>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>> Received task health update, healthy: true
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I am using my own framework, and the full task info I'm using is
>>>>>>>>>> posted earlier in this thread.  Do you happen to know if Marathon uses
>>>>>>>>>> Mesos's health checks for its health check system?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Yes, the health check is launched from its definition in the TaskInfo.
>>>>>>>>>> Do you launch your task through Marathon? I could test it on my side.
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Precisely, and there are none of those statements.  Are you or
>>>>>>>>>>> others confident health-checks are part of the code path when defined via
>>>>>>>>>>> task info for docker container tasks?  Going through the code, I wasn't
>>>>>>>>>>> able to find the linkage for anything other than health-checks triggered
>>>>>>>>>>> through a custom executor.
>>>>>>>>>>>
>>>>>>>>>>> With that being said it is a pretty good sized code base and I'm
>>>>>>>>>>> not very familiar with it, so my analysis this far has by no means been
>>>>>>>>>>> exhaustive.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> When the health check launches, there will be a log line like this in
>>>>>>>>>>> your executor stdout:
>>>>>>>>>>> ```
>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>> ```
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <outtatime@gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output in the
>>>>>>>>>>>> logs with the string "health" or "Health" if the health-check were active?
>>>>>>>>>>>> None of my master or slave logs contain the string..
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to check whether an
>>>>>>>>>>>> unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <haosdent@gmail.com
>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this backport; let me
>>>>>>>>>>>>>>> double check.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look
>>>>>>>>>>>>>>>> there :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it
>>>>>>>>>>>>>>>>> out?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks
>>>>>>>>>>>>>>>>>> that's in master but not yet released. It will run docker exec with the
>>>>>>>>>>>>>>>>>> command you provided as health checks.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?
>>>>>>>>>>>>>>>>>> Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if
>>>>>>>>>>>>>>>>>> they ever run the command (in this case `sleep 5`), but have not found any
>>>>>>>>>>>>>>>>>> indication that it is being executed.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from
>>>>>>>>>>>>>>>>>> src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this
>>>>>>>>>>>>>>>>>> mean that health-checks are only supported for custom executors and not for
>>>>>>>>>>>>>>>>>> docker tasks?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero
>>>>>>>>>>>>>>>>>> exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>



-- 
Best Regards,
Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by haosdent <ha...@gmail.com>.
Yes. The related code is located at
https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123

In fact, environment variables that start with MESOS_ are loaded as flag
values.
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
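
To make the mechanism concrete, here is a rough sketch of prefix-based
loading. It is not the actual stout code: the function names
flagNameFromEnv and flagsFromEnvironment are made up for illustration, and
I am assuming the flag name is simply the variable name with the prefix
stripped and the rest lowercased.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Illustrative only; these names are hypothetical and are not the stout API.
//
// Returns the flag name for a prefixed environment variable, e.g.
// "MESOS_LAUNCHER_DIR" -> "launcher_dir", or an empty string when the
// variable does not carry the prefix (or has nothing after it).
std::string flagNameFromEnv(
    const std::string& envName,
    const std::string& prefix = "MESOS_")
{
  if (envName.size() <= prefix.size() ||
      envName.compare(0, prefix.size(), prefix) != 0) {
    return "";
  }

  std::string name = envName.substr(prefix.size());
  std::transform(
      name.begin(), name.end(), name.begin(),
      [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return name;
}

// Builds a flag-name -> value map from a snapshot of the environment,
// skipping every variable that does not carry the prefix.
std::map<std::string, std::string> flagsFromEnvironment(
    const std::map<std::string, std::string>& environment,
    const std::string& prefix = "MESOS_")
{
  std::map<std::string, std::string> flags;
  for (const auto& entry : environment) {
    const std::string name = flagNameFromEnv(entry.first, prefix);
    if (!name.empty()) {
      flags[name] = entry.second;
    }
  }
  return flags;
}
```

With an environment containing MESOS_LAUNCHER_DIR="/usr/libexec/mesos",
flagsFromEnvironment would yield a single entry, launcher_dir, with that
value. That would explain why the slave sees the flag even though the env
var itself is not handed down to the executor.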

On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <ou...@gmail.com> wrote:

> One question for you haosdent-
>
> You mentioned that the flags.launcher_dir should propagate to the docker
> executor all the way up the chain.  Can you show me where this logic is in
> the codebase?  I didn't see where that was happening and would like to
> understand the mechanism.
>
> Thanks!
> Jay
>
>
>
> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com> wrote:
>
> Maybe tomorrow I will build a fresh cluster from scratch to see if the
> broken behavior experienced today still persists.
>
> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
>
> As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir,
> and mesos-docker-executor and mesos-health-check are then looked up under
> that dir. Even though the env var itself is not propagated,
> MESOS_LAUNCHER_DIR still takes effect because flags.launcher_dir is
> populated from it.
>
> For example, I ran
> ```
> export MESOS_LAUNCHER_DIR=/tmp
> ```
> before starting mesos-slave. When I launch the slave, I can find this line
> in the slave log:
> ```
> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
> xxxxx  --launcher_dir="/tmp"
> ```
>
> From your log, I am not sure why your MESOS_LAUNCHER_DIR became the sandbox
> dir. Is it because MESOS_LAUNCHER_DIR is overridden in your other scripts?
>
>
> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com> wrote:
>
>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>
>> I just tried setting both the env var and flag on the slaves, and have
>> determined that the env var is not present when it is checked in
>> src/docker/executor.cpp @ line 573:
>>
>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>   string path =
>>>     envPath.isSome() ? envPath.get()
>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << (envPath.isSome()
>>> ? "yes" : "no") << endl;
>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>
>>
>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly
>> propagated along up to the point of mesos-slave launch):
>>
>> $ cat /etc/default/mesos-slave
>>> export
>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>> export MESOS_PORT="5050"
>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>
>>
>> TASK OUTPUT:
>>
>>
>>> MESOS_LAUNCHER_DIR: envpath.isSome()->no
>>> MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
>>> Registered docker executor on mesos-worker2a
>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>> Launching health check process:
>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>> --executor=(1)@192.168.225.59:44523
>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>> sh -c \" \/bin\/bash
>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>> Health check process launched at pid: 2519
>>
>>
>> The env var is not propagated when the docker executor is launched
>> in src/slave/containerizer/docker.cpp around line 903:
>>
>>   vector<string> argv;
>>>   argv.push_back("mesos-docker-executor");
>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>   // container (to distinguish it from Docker containers not created
>>>   // by Mesos).
>>>   Try<Subprocess> s = subprocess(
>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>       argv,
>>>       Subprocess::PIPE(),
>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>       dockerFlags(flags, container->name(), container->directory),
>>>       environment,
>>>       lambda::bind(&setup, container->directory));
>>
>>
>> A little way above, we can see the environment is set up with the env
>> vars defined for the container task.
>>
>> See src/slave/containerizer/docker.cpp around line 871:
>>
>>   // Include any enviroment variables from ExecutorInfo.
>>>   foreach (const Environment::Variable& variable,
>>>            container->executor.command().environment().variables()) {
>>>     environment[variable.name()] = variable.value();
>>>   }
>>
>>
>> Should I file a JIRA for this?  Have I overlooked anything?
>>
>>
>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com> wrote:
>>
>>> >Not sure what was going on with health-checks in 0.24.0.
>>> 0.24.1 should work.
>>>
>>> >Do any of you know which host the path
>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>> should exist on? It definitely doesn't exist on the slave, hence execution
>>> failing.
>>>
>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We
>>> get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or else
>>> use the same dir as mesos-docker-executor.
>>>
>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>
>>>> Maybe I spoke too soon.
>>>>
>>>> Now the checks are attempting to run, however the STDERR is not looking
>>>> good.  I've added some debugging to the error message output to show the
>>>> path, argv, and envp variables:
>>>>
>>>> STDOUT:
>>>>
>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>> --stop_timeout="0ns"
>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>> --stop_timeout="0ns"
>>>>> Registered docker executor on mesos-worker2a
>>>>> Starting task
>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>> Launching health check process:
>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>> --executor=(1)@192.168.225.59:43917
>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>> sh -c \" exit 1
>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>> Health check process launched at pid: 3012
>>>>
>>>>
>>>> STDERR:
>>>>
>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave
>>>>> 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>> WARNING: Your kernel does not support swap limit capabilities, memory
>>>>> limited without swap.
>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain
>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>> envp=''): No such file or directory*** Aborted at 1444270649 (unix time)
>>>>> try "date -d @1444270649" if you are using GNU date ***
>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from
>>>>> PID 3012; stack trace: ***
>>>>> @ 0x7f4a38265340 (unknown)
>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>> @ 0x4191e2 _Abort()
>>>>> @ 0x41921c _Abort()
>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>> @ 0x43cc9c
>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>> @ 0x7f4a39d92827
>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>> @ 0x7f4a3825d182 start_thread
>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>
>>>>
>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>> should exist on? It definitely doesn't exist on the slave, hence
>>>> execution failing.
>>>>
>>>> This is with current master, git hash
>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>
>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>
>>>>
>>>> -Jay
>>>>
>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>
>>>>> Update:
>>>>>
>>>>> I used https://github.com/deric/mesos-deb-packaging to compile and
>>>>> package the latest master (0.26.x) and deployed it to the cluster, and now
>>>>> health checks are working as advertised in both Marathon and my own
>>>>> framework!  Not sure what was going on with health-checks in 0.24.0..
>>>>>
>>>>> Anyways, thanks again for your help Haosdent!
>>>>>
>>>>> Cheers,
>>>>> Jay
>>>>>
>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Haosdent,
>>>>>>
>>>>>> Can you share your Marathon POST request that results in Mesos
>>>>>> executing the health checks?
>>>>>>
>>>>>> Since we can reference the Marathon framework, I've been doing some
>>>>>> digging around.
>>>>>>
>>>>>> Here are the details of my setup and findings:
>>>>>>
>>>>>> I put a few small hacks in Marathon:
>>>>>>
>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>>>>
>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to
>>>>>> /tmp/X both in the TaskFactory and right before the task is sent to
>>>>>> Mesos via driver.launchTasks:
>>>>>>
>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>
>>>>>> $ git diff
>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>
>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>        case (taskInfo, ports) =>
>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>> +        import java.io._
>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>> +        bw.write("\n")
>>>>>>> +        bw.close()
>>>>>>>          CreatedTask(
>>>>>>>            taskInfo,
>>>>>>>            MarathonTasks.makeTask(
>>>>>>
>>>>>>
>>>>>>
>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>
>>>>>> $ git diff
>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>>>>        import scala.collection.JavaConverters._
>>>>>>> +      var i = 0
>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>> +        import java.io._
>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-"
>>>>>>> + taskInfos(i).getTaskId.getValue)
>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>> +        bw.write("\n")
>>>>>>> +        bw.close()
>>>>>>> +      }
>>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>>> taskInfos.asJava)
>>>>>>>      }
>>>>>>
>>>>>>
>>>>>> Then I built and deployed the hacked Marathon and restarted the
>>>>>> marathon service.
>>>>>>
>>>>>> Next I created the app via the Marathon API ("hello app" is a
>>>>>> container with a simple hello-world ruby app running on 0.0.0.0:8000)
>>>>>>
>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type:
>>>>>>> application/json' -d'
>>>>>>> {
>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>   "apps": [
>>>>>>>     {
>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>       "container": {
>>>>>>>         "type": "DOCKER",
>>>>>>>         "docker": {
>>>>>>>           "image":
>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>           "network": "BRIDGE",
>>>>>>>           "portMappings": [
>>>>>>>             {
>>>>>>>               "containerPort": 8000,
>>>>>>>               "hostPort": 0,
>>>>>>>               "protocol": "tcp"
>>>>>>>             }
>>>>>>>           ]
>>>>>>>         }
>>>>>>>       },
>>>>>>>       "env": {
>>>>>>>
>>>>>>>       },
>>>>>>>       "healthChecks": [
>>>>>>>         {
>>>>>>>           "protocol": "COMMAND",
>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>           "intervalSeconds": 10,
>>>>>>>           "timeoutSeconds": 10,
>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>         }
>>>>>>>       ],
>>>>>>>       "instances": 1,
>>>>>>>       "cpus": 1,
>>>>>>>       "mem": 512
>>>>>>>     }
>>>>>>>   ]
>>>>>>> }
>>>>>>
>>>>>>
>>>>>> $ ls /tmp/
>>>>>>>
>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>
>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>
>>>>>>
>>>>>> Do they match?
>>>>>>
>>>>>> $ md5sum /tmp/task*
>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>
>>>>>>
>>>>>> Yes, so I am confident this is the information being sent across the
>>>>>> wire to Mesos.
>>>>>>
>>>>>> Do they contain any health-check information?
>>>>>>
>>>>>> $ cat
>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>> {
>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>   "task_id":{
>>>>>>>
>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>   },
>>>>>>>   "slave_id":{
>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>   },
>>>>>>>   "resources":[
>>>>>>>     {
>>>>>>>       "name":"cpus",
>>>>>>>       "type":"SCALAR",
>>>>>>>       "scalar":{
>>>>>>>         "value":1.0
>>>>>>>       },
>>>>>>>       "role":"*"
>>>>>>>     },
>>>>>>>     {
>>>>>>>       "name":"mem",
>>>>>>>       "type":"SCALAR",
>>>>>>>       "scalar":{
>>>>>>>         "value":512.0
>>>>>>>       },
>>>>>>>       "role":"*"
>>>>>>>     },
>>>>>>>     {
>>>>>>>       "name":"ports",
>>>>>>>       "type":"RANGES",
>>>>>>>       "ranges":{
>>>>>>>         "range":[
>>>>>>>           {
>>>>>>>             "begin":31641,
>>>>>>>             "end":31641
>>>>>>>           }
>>>>>>>         ]
>>>>>>>       },
>>>>>>>       "role":"*"
>>>>>>>     }
>>>>>>>   ],
>>>>>>>   "command":{
>>>>>>>     "environment":{
>>>>>>>       "variables":[
>>>>>>>         {
>>>>>>>           "name":"PORT_8000",
>>>>>>>           "value":"31641"
>>>>>>>         },
>>>>>>>         {
>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>         },
>>>>>>>         {
>>>>>>>           "name":"HOST",
>>>>>>>           "value":"mesos-worker1a"
>>>>>>>         },
>>>>>>>         {
>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>
>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>         },
>>>>>>>         {
>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>
>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>         },
>>>>>>>         {
>>>>>>>           "name":"PORT",
>>>>>>>           "value":"31641"
>>>>>>>         },
>>>>>>>         {
>>>>>>>           "name":"PORTS",
>>>>>>>           "value":"31641"
>>>>>>>         },
>>>>>>>         {
>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>         },
>>>>>>>         {
>>>>>>>           "name":"PORT0",
>>>>>>>           "value":"31641"
>>>>>>>         }
>>>>>>>       ]
>>>>>>>     },
>>>>>>>     "shell":false
>>>>>>>   },
>>>>>>>   "container":{
>>>>>>>     "type":"DOCKER",
>>>>>>>     "docker":{
>>>>>>>
>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>       "network":"BRIDGE",
>>>>>>>       "port_mappings":[
>>>>>>>         {
>>>>>>>           "host_port":31641,
>>>>>>>           "container_port":8000,
>>>>>>>           "protocol":"tcp"
>>>>>>>         }
>>>>>>>       ],
>>>>>>>       "privileged":false,
>>>>>>>       "force_pull_image":false
>>>>>>>     }
>>>>>>>   }
>>>>>>> }
>>>>>>
>>>>>>
>>>>>> No, I don't see anything about any health check.
>>>>>>
>>>>>> Mesos STDOUT for the launched task:
>>>>>>
>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>> --stop_timeout="0ns"
>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>> --stop_timeout="0ns"
>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>> Starting task
>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>
>>>>>>
>>>>>> And STDERR:
>>>>>>
>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on
>>>>>>> slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>> memory limited without swap.
>>>>>>
>>>>>>
>>>>>> Again, nothing about any health checks.
>>>>>>
>>>>>> Any ideas of other things to try or what I could be missing?  Can't
>>>>>> say either way about the Mesos health-check system working or not if
>>>>>> Marathon won't put the health-check into the task it sends to Mesos.
>>>>>>
>>>>>> Thanks for all your help!
>>>>>>
>>>>>> Best,
>>>>>> Jay
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>
>>>>>>> Maybe you could post your executor stdout/stderr so that we can
>>>>>>> tell whether the health check is running or not.
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Marathon also uses the Mesos health check. When I use a health check, I
>>>>>>>> can see log lines like this in the executor stdout.
>>>>>>>>
>>>>>>>> ```
>>>>>>>> Registered docker executor on xxxxx
>>>>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>> Launching health check process:
>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>> Health check process launched at pid: 9895
>>>>>>>> Received task health update, healthy: true
>>>>>>>> ```
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I am using my own framework, and the full task info I'm using is
>>>>>>>>> posted earlier in this thread.  Do you happen to know if Marathon uses
>>>>>>>>> Mesos's health checks for its health check system?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Yes, the health check is launched through its definition in TaskInfo. Do
>>>>>>>>> you launch your task through Marathon? I could test it on my side.
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Precisely, and there are none of those statements.  Are you or
>>>>>>>>>> others confident health-checks are part of the code path when defined via
>>>>>>>>>> task info for docker container tasks?  Going through the code, I wasn't
>>>>>>>>>> able to find the linkage for anything other than health-checks triggered
>>>>>>>>>> through a custom executor.
>>>>>>>>>>
>>>>>>>>>> With that being said it is a pretty good sized code base and I'm
>>>>>>>>>> not very familiar with it, so my analysis this far has by no means been
>>>>>>>>>> exhaustive.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> When a health check launches, there will be a log line like this in your
>>>>>>>>>> executor stdout:
>>>>>>>>>> ```
>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm happy to try this, however wouldn't there be output in the
>>>>>>>>>>> logs with the string "health" or "Health" if the health-check were active?
>>>>>>>>>>> None of my master or slave logs contain the string..
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an
>>>>>>>>>>> unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <outtatime@gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <ha...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contain this backport; let me
>>>>>>>>>>>>>> double check.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look
>>>>>>>>>>>>>>> there :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>> jay@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it
>>>>>>>>>>>>>>>> out?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We just added health check support for docker tasks that's
>>>>>>>>>>>>>>>>> in master but not yet released. It will run docker exec with the command
>>>>>>>>>>>>>>>>> you provided as health checks.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?
>>>>>>>>>>>>>>>>> Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if they
>>>>>>>>>>>>>>>>> ever run the command (in this case `sleep 5`), but have not found any
>>>>>>>>>>>>>>>>> indication that it is being executed.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from
>>>>>>>>>>>>>>>>> src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this
>>>>>>>>>>>>>>>>> mean that health-checks are only supported for custom executors and not for
>>>>>>>>>>>>>>>>> docker tasks?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero
>>>>>>>>>>>>>>>>> exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>> Jay


-- 
Best Regards,
Haosdent Huang
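
For reference, the behaviour asked for in the original question quoted above (exit status 0 means healthy, non-zero means unhealthy) can be sketched roughly as follows. This is an illustration only, not Mesos code: HealthTracker and runCheck are hypothetical names, and treating consecutive_failures as the number of back-to-back failures after which the task counts as failed is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (not a Mesos API): tracks back-to-back failures of a
// command health check.
class HealthTracker {
public:
  explicit HealthTracker(int maxConsecutiveFailures)
    : maxFailures_(maxConsecutiveFailures) {
    assert(maxConsecutiveFailures > 0);
  }

  // Records the status of one check run. Returns true when the check passed
  // (status 0); any non-zero status counts as a failure.
  bool record(int exitStatus) {
    if (exitStatus == 0) {
      failures_ = 0;  // A passing check resets the failure streak.
      return true;
    }
    ++failures_;
    return false;
  }

  // True once the failure streak has reached the configured threshold.
  bool exceeded() const { return failures_ >= maxFailures_; }

private:
  int maxFailures_;
  int failures_ = 0;
};

// Runs the check command through the shell. On POSIX the returned value is 0
// only when the command exited with status 0.
int runCheck(const std::string& command) {
  return std::system(command.c_str());
}
```

A caller would invoke runCheck once per interval, feed the result into record, and stop or restart the task when exceeded() becomes true.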

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
One question for you haosdent-

You mentioned that the flags.launcher_dir should propagate to the docker executor all the way up the chain.  Can you show me where this logic is in the codebase?  I didn't see where that was happening and would like to understand the mechanism.

Thanks!
Jay
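
For readers following along, the lookup under discussion (quoted below from src/docker/executor.cpp) can be sketched roughly as follows. This is a simplified stand-in, not the Mesos implementation: resolveLauncherDir is a hypothetical name, the environment value and working directory are passed in as parameters so the logic can be read in isolation, and std::filesystem stands in for the Path and os::realpath helpers.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not a Mesos function): prefer the value of
// MESOS_LAUNCHER_DIR if it is set, otherwise fall back to the directory
// part of argv[0], resolved against the current working directory.
std::string resolveLauncherDir(
    const std::optional<std::string>& envValue,
    const std::string& argv0,
    const std::string& cwd)
{
  assert(!argv0.empty());

  if (envValue.has_value()) {
    return *envValue;
  }

  fs::path dir = fs::path(argv0).parent_path();
  if (dir.empty()) {
    // A bare program name has no directory part, so the lookup ends up
    // in the working directory.
    return cwd;
  }
  if (dir.is_relative()) {
    dir = fs::path(cwd) / dir;
  }
  return dir.lexically_normal().string();
}
```

With argv[0] set to the bare name "mesos-docker-executor" (as in the docker.cpp snippet quoted below) and no MESOS_LAUNCHER_DIR in the environment, the fallback resolves to the working directory. That would match the sandbox path seen in the task output, assuming the executor runs with its sandbox as the working directory.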



> On Oct 8, 2015, at 8:29 PM, Jay Taylor <ou...@gmail.com> wrote:
> 
> Maybe tomorrow I will build a fresh cluster from scratch to see if the broken behavior experienced today still persists.
> 
>> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
>> 
>> As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir, which is the directory where mesos-docker-executor and mesos-health-check are looked up. Although the env var itself is not propagated, MESOS_LAUNCHER_DIR still takes effect because flags.launcher_dir is read from it.
>> 
>> For example, I ran
>> ```
>> export MESOS_LAUNCHER_DIR=/tmp
>> ```
>> before starting mesos-slave. After launching the slave, I could find this line in the slave log:
>> ```
>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup: xxxxx  --launcher_dir="/tmp"
>> ```
>> 
>> From your log, I am not sure why your MESOS_LAUNCHER_DIR became the sandbox dir. Is MESOS_LAUNCHER_DIR being overridden in one of your other scripts?
>> 
>> 
>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com> wrote:
>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>> 
>>> I just tried setting both the env var and the flag on the slaves, and have determined that the env var is not present when it is checked in src/docker/executor.cpp at line 573:
>>> 
>>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>   string path =
>>>>     envPath.isSome() ? envPath.get()
>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << (envPath.isSome() ? "yes" : "no") << endl;
>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>> 
>>> 
>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly propagated along up to the point of mesos-slave launch):
>>> 
>>>> $ cat /etc/default/mesos-slave
>>>> export MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>> export MESOS_PORT="5050"
>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>> 
>>> TASK OUTPUT:
>>> 
>>>> MESOS_LAUNCHER_DIR: envpath.isSome()->no
>>>> MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
>>>> Registered docker executor on mesos-worker2a
>>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>> Launching health check process: /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check --executor=(1)@192.168.225.59:44523 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad sh -c \" \/bin\/bash \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>> Health check process launched at pid: 2519
>>> 
>>> 
>>> The env var is not propagated when the docker executor is launched in src/slave/containerizer/docker.cpp around line 903:
>>> 
>>>>   vector<string> argv;
>>>>   argv.push_back("mesos-docker-executor");
>>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>>   // container (to distinguish it from Docker containers not created
>>>>   // by Mesos).
>>>>   Try<Subprocess> s = subprocess(
>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>       argv,
>>>>       Subprocess::PIPE(),
>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>       environment,
>>>>       lambda::bind(&setup, container->directory));
>>> 
>>> 
>>> A little above that, we can see the environment is set up with the env vars defined on the container task.
>>> 
>>> See src/slave/containerizer/docker.cpp around line 871:
>>> 
>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>   foreach (const Environment::Variable& variable,
>>>>            container->executor.command().environment().variables()) {
>>>>     environment[variable.name()] = variable.value();
>>>>   }
>>> 
>>> 
>>> Should I file a JIRA for this?  Have I overlooked anything?
>>> 
>>> 
>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com> wrote:
>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>> 0.24.1 should work.
>>>> 
>>>> >Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" should exist on? It definitely doesn't exist on the slave, hence execution failing.
>>>> 
>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We locate mesos-health-check via MESOS_LAUNCHER_DIR/--launcher_dir, or otherwise use the same dir as mesos-docker-executor.
>>>> 
>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>> Maybe I spoke too soon.
>>>>> 
>>>>> Now the checks are attempting to run, however the STDERR is not looking good.  I've added some debugging to the error message output to show the path, argv, and envp variables:
>>>>> 
>>>>> STDOUT:
>>>>> 
>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --stop_timeout="0ns"
>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --stop_timeout="0ns"
>>>>>> Registered docker executor on mesos-worker2a
>>>>>> Starting task app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>> Launching health check process: /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check --executor=(1)@192.168.225.59:43917 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc sh -c \" exit 1 \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>> Health check process launched at pid: 3012
>>>>> 
>>>>> 
>>>>> STDERR:
>>>>> 
>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>> WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', envp=''): No such file or directory*** Aborted at 1444270649 (unix time) try "date -d @1444270649" if you are using GNU date ***
>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from PID 3012; stack trace: ***
>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>> @ 0x4191e2 _Abort()
>>>>>> @ 0x41921c _Abort()
>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>> @ 0x43cc9c mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>> @ 0x7f4a39d92827 _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>> 
>>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" should exist on? It definitely doesn't exist on the slave, hence execution failing.
>>>>> 
>>>>> This is with current master, git hash 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>> 
>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>> 
>>>>> 
>>>>> -Jay
>>>>> 
>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>> Update:
>>>>>> 
>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile and package the latest master (0.26.x) and deployed it to the cluster, and now health checks are working as advertised in both Marathon and my own framework!  Not sure what was going on with health-checks in 0.24.0..
>>>>>> 
>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>> 
>>>>>> Cheers,
>>>>>> Jay
>>>>>> 
>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>> Hi Haosdent,
>>>>>>> 
>>>>>>> Can you share your Marathon POST request that results in Mesos executing the health checks?
>>>>>>> 
>>>>>>> Since we can reference the Marathon framework, I've been doing some digging around.
>>>>>>> 
>>>>>>> Here are the details of my setup and findings:
>>>>>>> 
>>>>>>> I put a few small hacks in Marathon:
>>>>>>> 
>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>>>>> 
>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X both in the TaskFactory and right before the task is sent to Mesos via driver.launchTasks:
>>>>>>> 
>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>> 
>>>>>>>> $ git diff src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>> 
>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId, config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>> +        import java.io._
>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>> +        bw.write("\n")
>>>>>>>> +        bw.close()
>>>>>>>>          CreatedTask(
>>>>>>>>            taskInfo,
>>>>>>>>            MarathonTasks.makeTask(
>>>>>>> 
>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>> 
>>>>>>>> $ git diff src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos: Seq[TaskInfo]): Boolean = {
>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>> +      var i = 0
>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>> +        import java.io._
>>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>> +        bw.write("\n")
>>>>>>>> +        bw.close()
>>>>>>>> +      }
>>>>>>>>        driver.launchTasks(Collections.singleton(offerID), taskInfos.asJava)
>>>>>>>>      }
>>>>>>> 
>>>>>>> 
>>>>>>> Then I built and deployed the hacked Marathon and restarted the marathon service.
>>>>>>> 
>>>>>>> Next I created the app via the Marathon API ("hello app" is a container with a simple hello-world ruby app running on 0.0.0.0:8000)
>>>>>>> 
>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type: application/json' -d'
>>>>>>>> {
>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>   "apps": [
>>>>>>>>     {
>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>       "container": {
>>>>>>>>         "type": "DOCKER",
>>>>>>>>         "docker": {
>>>>>>>>           "image": "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>           "network": "BRIDGE",
>>>>>>>>           "portMappings": [
>>>>>>>>             {
>>>>>>>>               "containerPort": 8000,
>>>>>>>>               "hostPort": 0,
>>>>>>>>               "protocol": "tcp"
>>>>>>>>             }
>>>>>>>>           ]
>>>>>>>>         }
>>>>>>>>       },
>>>>>>>>       "env": {
>>>>>>>>         
>>>>>>>>       },
>>>>>>>>       "healthChecks": [
>>>>>>>>         {
>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>         }
>>>>>>>>       ],
>>>>>>>>       "instances": 1,
>>>>>>>>       "cpus": 1,
>>>>>>>>       "mem": 512
>>>>>>>>     }
>>>>>>>>   ]
>>>>>>>> }'
>>>>>>> 
>>>>>>> 
>>>>>>>> $ ls /tmp/
>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>> 
>>>>>>> Do they match?
>>>>>>> 
>>>>>>>> $ md5sum /tmp/task*
>>>>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>> 
>>>>>>> Yes, so I am confident this is the information being sent across the wire to Mesos.
>>>>>>> 
>>>>>>> Do they contain any health-check information?
>>>>>>> 
>>>>>>>> $ cat /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>> {
>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>   "task_id":{
>>>>>>>>     "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>   },
>>>>>>>>   "slave_id":{
>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>   },
>>>>>>>>   "resources":[
>>>>>>>>     {
>>>>>>>>       "name":"cpus",
>>>>>>>>       "type":"SCALAR",
>>>>>>>>       "scalar":{
>>>>>>>>         "value":1.0
>>>>>>>>       },
>>>>>>>>       "role":"*"
>>>>>>>>     },
>>>>>>>>     {
>>>>>>>>       "name":"mem",
>>>>>>>>       "type":"SCALAR",
>>>>>>>>       "scalar":{
>>>>>>>>         "value":512.0
>>>>>>>>       },
>>>>>>>>       "role":"*"
>>>>>>>>     },
>>>>>>>>     {
>>>>>>>>       "name":"ports",
>>>>>>>>       "type":"RANGES",
>>>>>>>>       "ranges":{
>>>>>>>>         "range":[
>>>>>>>>           {
>>>>>>>>             "begin":31641,
>>>>>>>>             "end":31641
>>>>>>>>           }
>>>>>>>>         ]
>>>>>>>>       },
>>>>>>>>       "role":"*"
>>>>>>>>     }
>>>>>>>>   ],
>>>>>>>>   "command":{
>>>>>>>>     "environment":{
>>>>>>>>       "variables":[
>>>>>>>>         {
>>>>>>>>           "name":"PORT_8000",
>>>>>>>>           "value":"31641"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"HOST",
>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>           "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>           "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"PORT",
>>>>>>>>           "value":"31641"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"PORTS",
>>>>>>>>           "value":"31641"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"PORT0",
>>>>>>>>           "value":"31641"
>>>>>>>>         }
>>>>>>>>       ]
>>>>>>>>     },
>>>>>>>>     "shell":false
>>>>>>>>   },
>>>>>>>>   "container":{
>>>>>>>>     "type":"DOCKER",
>>>>>>>>     "docker":{
>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>       "network":"BRIDGE",
>>>>>>>>       "port_mappings":[
>>>>>>>>         {
>>>>>>>>           "host_port":31641,
>>>>>>>>           "container_port":8000,
>>>>>>>>           "protocol":"tcp"
>>>>>>>>         }
>>>>>>>>       ],
>>>>>>>>       "privileged":false,
>>>>>>>>       "force_pull_image":false
>>>>>>>>     }
>>>>>>>>   }
>>>>>>>> }
>>>>>>> 
>>>>>>> No, I don't see anything about any health check.
>>>>>>> 
>>>>>>> Mesos STDOUT for the launched task:
>>>>>>> 
>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" --docker="docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" --stop_timeout="0ns"
>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" --docker="docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" --stop_timeout="0ns"
>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>> Starting task app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>> 
>>>>>>> 
>>>>>>> And STDERR:
>>>>>>> 
>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>> WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
>>>>>>> 
>>>>>>> 
>>>>>>> Again, nothing about any health checks.
>>>>>>> 
>>>>>>> Any ideas of other things to try or what I could be missing?  Can't say either way about the Mesos health-check system working or not if Marathon won't put the health-check into the task it sends to Mesos.
>>>>>>> 
>>>>>>> Thanks for all your help!
>>>>>>> 
>>>>>>> Best,
>>>>>>> Jay
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>> Maybe you could post your executor stdout/stderr so that we can tell whether the health check is running or not.
>>>>>>>> 
>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>> Marathon also uses the Mesos health check. When I use a health check, I can see log lines like this in the executor stdout.
>>>>>>>>> 
>>>>>>>>> ```
>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>> Launching health check process: /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>> Received task health update, healthy: true
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>> I am using my own framework, and the full task info I'm using is posted earlier in this thread.  Do you happen to know if Marathon uses Mesos's health checks for its health check system?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Yes, the health check is launched from its definition in TaskInfo. Do you launch your task through Marathon? I could test it on my side.
>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>> Precisely, and there are none of those statements.  Are you or others confident health-checks are part of the code path when defined via task info for docker container tasks?  Going through the code, I wasn't able to find the linkage for anything other than health-checks triggered through a custom executor.
>>>>>>>>>>>> 
>>>>>>>>>>>> With that being said, it is a pretty good-sized code base and I'm not very familiar with it, so my analysis thus far has by no means been exhaustive.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> When the health check launches, there will be a log line like this in your executor stdout:
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output in the logs with the string "health" or "Health" if the health-check were active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" and check whether an unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1: https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0 https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this backport; let me double check.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look there :)
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <ja...@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it out?
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <ti...@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>>> Hi Jay, 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks that's in master but not yet released. It will run docker exec with the command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if they ever run the command (in this case `sleep 5`), but have not found any indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this mean that health-checks are only supported for custom executors and not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>> Jay

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
Maybe tomorrow I will build a fresh cluster from scratch to see if the broken behavior experienced today still persists.

> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
> 
> As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir, which is the directory where mesos-docker-executor and mesos-health-check are looked up. Although the env var is not propagated, MESOS_LAUNCHER_DIR still works because flags.launcher_dir is read from it.
> 
> For example, I ran
> ```
> export MESOS_LAUNCHER_DIR=/tmp
> ```
> before starting mesos-slave. When I launch the slave, I can find this line in the slave log:
> ```
> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup: xxxxx  --launcher_dir="/tmp"
> ```
> 
> From your log, I am not sure why your MESOS_LAUNCHER_DIR became the sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in one of your other scripts?
> 
> 
>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com> wrote:
>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>> 
>> I just tried setting both the env var and the flag on the slaves, and have determined that the env var is not present when it is checked in src/docker/executor.cpp @ line 573:
>> 
>>>   const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>   string path =
>>>     envPath.isSome() ? envPath.get()
>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << (envPath.isSome() ? "yes" : "no") << endl;
>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>> 
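The lookup in the C++ snippet above can be paraphrased as a short Python sketch. The function names are illustrative assumptions; only the order (MESOS_LAUNCHER_DIR first, then the directory of argv[0]) and the binary name `mesos-health-check` come from this thread.

```python
import os


def resolve_launcher_dir(environ, argv0):
    """Prefer MESOS_LAUNCHER_DIR; otherwise use the directory of argv[0]."""
    env_path = environ.get("MESOS_LAUNCHER_DIR")
    if env_path is not None:
        return env_path
    # A bare program name has no directory part, so fall back to ".",
    # which realpath() resolves to the current working directory.
    return os.path.realpath(os.path.dirname(argv0) or ".")


def health_check_binary(environ, argv0):
    """Hypothetical helper: where the health-check binary would be expected."""
    return os.path.join(resolve_launcher_dir(environ, argv0), "mesos-health-check")
```

If argv[0] is a bare name such as `mesos-docker-executor` and the variable is unset, this sketch resolves to the process's working directory. That would be consistent with the sandbox path printed in the task output, although the thread does not confirm that this is the cause.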
>> 
>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly propagated along up to the point of mesos-slave launch):
>> 
>>> $ cat /etc/default/mesos-slave
>>> export MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>> export MESOS_PORT="5050"
>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>> 
>> TASK OUTPUT:
>> 
>>> MESOS_LAUNCHER_DIR: envpath.isSome()->no
>>> MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
>>> Registered docker executor on mesos-worker2a
>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>> Launching health check process: /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check --executor=(1)@192.168.225.59:44523 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad sh -c \" \/bin\/bash \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>> Health check process launched at pid: 2519
>> 
>> 
>> The env var is not propagated when the docker executor is launched in src/slave/containerizer/docker.cpp around line 903:
>> 
>>>   vector<string> argv;
>>>   argv.push_back("mesos-docker-executor");
>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>   // container (to distinguish it from Docker containers not created
>>>   // by Mesos).
>>>   Try<Subprocess> s = subprocess(
>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>       argv,
>>>       Subprocess::PIPE(),
>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>       dockerFlags(flags, container->name(), container->directory),
>>>       environment,
>>>       lambda::bind(&setup, container->directory));
>> 
>> 
>> A little way above, we can see that the environment is set up with the env vars defined for the container task.
>> 
>> See src/slave/containerizer/docker.cpp around line 871:
>> 
>>>   // Include any enviroment variables from ExecutorInfo.
>>>   foreach (const Environment::Variable& variable,
>>>            container->executor.command().environment().variables()) {
>>>     environment[variable.name()] = variable.value();
>>>   }
>> 
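To make the observation above concrete, here is a minimal Python sketch of an executor environment built only from the task-defined variables. The function `build_executor_environment` and its optional `launcher_dir` argument are hypothetical; forwarding the launcher directory this way is just one conceivable change, not something the Mesos source or this thread confirms.

```python
def build_executor_environment(executor_variables, launcher_dir=None):
    """Build the child environment from (name, value) pairs only.

    Nothing from the parent process is inherited, so a variable that was
    exported for the slave is absent unless it is added explicitly.
    """
    environment = {}
    for name, value in executor_variables:
        environment[name] = value
    if launcher_dir is not None:
        # Hypothetical: forward the launcher directory explicitly.
        environment["MESOS_LAUNCHER_DIR"] = launcher_dir
    return environment
```

With only the task's variables passed in, `MESOS_LAUNCHER_DIR` is missing from the result, which matches the "envpath.isSome()->no" line in the task output.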
>> 
>> Should I file a JIRA for this?  Have I overlooked anything?
>> 
>> 
>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com> wrote:
>>> >Not sure what was going on with health-checks in 0.24.0.
>>> 0.24.1 should work.
>>> 
>>> >Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" should exist on? It definitely doesn't exist on the slave, hence execution failing.
>>> 
>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We locate mesos-health-check via MESOS_LAUNCHER_DIR/--launcher_dir, or otherwise use the same dir as mesos-docker-executor.
>>> 
>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>> Maybe I spoke too soon.
>>>> 
>>>> Now the checks are attempting to run, however the STDERR is not looking good.  I've added some debugging to the error message output to show the path, argv, and envp variables:
>>>> 
>>>> STDOUT:
>>>> 
>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --stop_timeout="0ns"
>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --stop_timeout="0ns"
>>>>> Registered docker executor on mesos-worker2a
>>>>> Starting task app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>> Launching health check process: /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check --executor=(1)@192.168.225.59:43917 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc sh -c \" exit 1 \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>> Health check process launched at pid: 3012
>>>> 
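The `--health_check_json` value in the STDOUT above shows the user's command wrapped in a `docker exec ... sh -c` call. A minimal Python sketch that reproduces that string shape follows. It is reconstructed from the log output only, the function names are assumptions, and it deliberately does no escaping, so a command containing double quotes would break it.

```python
import json


def wrap_health_command(container_name, command, docker="docker"):
    """Wrap a health-check command so that it runs inside the container."""
    # The spaces inside the quotes match what the log line shows.
    return f'{docker} exec {container_name} sh -c " {command} "'


def health_check_json(container_name, command, **timing):
    """Serialize a health check whose command has been wrapped."""
    check = {
        "command": {
            "shell": True,
            "value": wrap_health_command(container_name, command),
        }
    }
    check.update(timing)  # e.g. interval_seconds=10.0
    return json.dumps(check, sort_keys=True)
```

Calling `wrap_health_command` with the container name from the log and the command `exit 1` yields the same `docker exec` string that appears in the "Launching health check process" line.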
>>>> 
>>>> STDERR:
>>>> 
>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>> WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', envp=''): No such file or directory*** Aborted at 1444270649 (unix time) try "date -d @1444270649" if you are using GNU date ***
>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from PID 3012; stack trace: ***
>>>>> @ 0x7f4a38265340 (unknown)
>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>> @ 0x4191e2 _Abort()
>>>>> @ 0x41921c _Abort()
>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>> @ 0x43cc9c mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>> @ 0x7f4a39d92827 _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>> @ 0x7f4a3825d182 start_thread
>>>>> @ 0x7f4a37f8a47d (unknown)
>>>> 
>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" should exist on? It definitely doesn't exist on the slave, hence execution failing.
>>>> 
>>>> This is with current master, git hash 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>> 
>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>> 
>>>> 
>>>> -Jay
>>>> 
>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>> Update:
>>>>> 
>>>>> I used https://github.com/deric/mesos-deb-packaging to compile and package the latest master (0.26.x) and deployed it to the cluster, and now health checks are working as advertised in both Marathon and my own framework!  Not sure what was going on with health-checks in 0.24.0..
>>>>> 
>>>>> Anyways, thanks again for your help Haosdent!
>>>>> 
>>>>> Cheers,
>>>>> Jay
>>>>> 
>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>> Hi Haosdent,
>>>>>> 
>>>>>> Can you share your Marathon POST request that results in Mesos executing the health checks?
>>>>>> 
>>>>>> Since we can reference the Marathon framework, I've been doing some digging around.
>>>>>> 
>>>>>> Here are the details of my setup and findings:
>>>>>> 
>>>>>> I put a few small hacks in Marathon:
>>>>>> 
>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>>>> 
>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X both in the TaskFactory and right before the task is sent to Mesos via driver.launchTasks:
>>>>>> 
>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>> 
>>>>>>> $ git diff src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>> 
>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId, config).buildIfMatches(offer, runningTasks).map {
>>>>>>>        case (taskInfo, ports) =>
>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>> +        import java.io._
>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>> +        bw.write("\n")
>>>>>>> +        bw.close()
>>>>>>>          CreatedTask(
>>>>>>>            taskInfo,
>>>>>>>            MarathonTasks.makeTask(
>>>>>> 
>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>> 
>>>>>>> $ git diff src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos: Seq[TaskInfo]): Boolean = {
>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>>>>        import scala.collection.JavaConverters._
>>>>>>> +      var i = 0
>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>> +        import java.io._
>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" + taskInfos(i).getTaskId.getValue)
>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>> +        bw.write("\n")
>>>>>>> +        bw.close()
>>>>>>> +      }
>>>>>>>        driver.launchTasks(Collections.singleton(offerID), taskInfos.asJava)
>>>>>>>      }
>>>>>> 
>>>>>> Then I built and deployed the hacked Marathon and restarted the marathon service.
>>>>>> 
>>>>>> Next I created the app via the Marathon API ("hello app" is a container with a simple hello-world ruby app running on 0.0.0.0:8000)
>>>>>> 
>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type: application/json' -d'
>>>>>> {
>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>   "apps": [
>>>>>>     {
>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>       "container": {
>>>>>>         "type": "DOCKER",
>>>>>>         "docker": {
>>>>>>           "image": "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>           "network": "BRIDGE",
>>>>>>           "portMappings": [
>>>>>>             {
>>>>>>               "containerPort": 8000,
>>>>>>               "hostPort": 0,
>>>>>>               "protocol": "tcp"
>>>>>>             }
>>>>>>           ]
>>>>>>         }
>>>>>>       },
>>>>>>       "env": {
>>>>>>         
>>>>>>       },
>>>>>>       "healthChecks": [
>>>>>>         {
>>>>>>           "protocol": "COMMAND",
>>>>>>           "command": {"value": "exit 1"},
>>>>>>           "gracePeriodSeconds": 10,
>>>>>>           "intervalSeconds": 10,
>>>>>>           "timeoutSeconds": 10,
>>>>>>           "maxConsecutiveFailures": 3
>>>>>>         }
>>>>>>       ],
>>>>>>       "instances": 1,
>>>>>>       "cpus": 1,
>>>>>>       "mem": 512
>>>>>>     }
>>>>>>   ]
>>>>>> }'
>>>>>> 
>>>>>> $ ls /tmp/
>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>> 
>>>>>> Do they match?
>>>>>> 
>>>>>> $ md5sum /tmp/task*
>>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>> 
>>>>>> Yes, so I am confident this is the information being sent across the wire to Mesos.
>>>>>> 
>>>>>> Do they contain any health-check information?
>>>>>> 
>>>>>> $ cat /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>> {
>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>   "task_id":{
>>>>>>     "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>   },
>>>>>>   "slave_id":{
>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>   },
>>>>>>   "resources":[
>>>>>>     {
>>>>>>       "name":"cpus",
>>>>>>       "type":"SCALAR",
>>>>>>       "scalar":{
>>>>>>         "value":1.0
>>>>>>       },
>>>>>>       "role":"*"
>>>>>>     },
>>>>>>     {
>>>>>>       "name":"mem",
>>>>>>       "type":"SCALAR",
>>>>>>       "scalar":{
>>>>>>         "value":512.0
>>>>>>       },
>>>>>>       "role":"*"
>>>>>>     },
>>>>>>     {
>>>>>>       "name":"ports",
>>>>>>       "type":"RANGES",
>>>>>>       "ranges":{
>>>>>>         "range":[
>>>>>>           {
>>>>>>             "begin":31641,
>>>>>>             "end":31641
>>>>>>           }
>>>>>>         ]
>>>>>>       },
>>>>>>       "role":"*"
>>>>>>     }
>>>>>>   ],
>>>>>>   "command":{
>>>>>>     "environment":{
>>>>>>       "variables":[
>>>>>>         {
>>>>>>           "name":"PORT_8000",
>>>>>>           "value":"31641"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"HOST",
>>>>>>           "value":"mesos-worker1a"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>           "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>           "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"PORT",
>>>>>>           "value":"31641"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"PORTS",
>>>>>>           "value":"31641"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"PORT0",
>>>>>>           "value":"31641"
>>>>>>         }
>>>>>>       ]
>>>>>>     },
>>>>>>     "shell":false
>>>>>>   },
>>>>>>   "container":{
>>>>>>     "type":"DOCKER",
>>>>>>     "docker":{
>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>       "network":"BRIDGE",
>>>>>>       "port_mappings":[
>>>>>>         {
>>>>>>           "host_port":31641,
>>>>>>           "container_port":8000,
>>>>>>           "protocol":"tcp"
>>>>>>         }
>>>>>>       ],
>>>>>>       "privileged":false,
>>>>>>       "force_pull_image":false
>>>>>>     }
>>>>>>   }
>>>>>> }
>>>>>> 
>>>>>> No, I don't see anything about any health check.
>>>>>> 
>>>>>> Mesos STDOUT for the launched task:
>>>>>> 
>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" --docker="docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" --stop_timeout="0ns"
>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" --docker="docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" --stop_timeout="0ns"
>>>>>> Registered docker executor on mesos-worker1a
>>>>>> Starting task app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>> 
>>>>>> And STDERR:
>>>>>> 
>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>> WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
>>>>>> 
>>>>>> Again, nothing about any health checks.
>>>>>> 
>>>>>> Any ideas of other things to try, or what I could be missing? I can't tell whether the Mesos health-check system works if Marathon doesn't put the health check into the task it sends to Mesos.
>>>>>> 
>>>>>> Thanks for all your help!
>>>>>> 
>>>>>> Best,
>>>>>> Jay
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>> Maybe you could post your executor stdout/stderr so that we can tell whether the health check is running or not.
>>>>>>>> 
>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>> Marathon also uses the Mesos health check. When I use a health check, I can see log output like this in the executor stdout.
>>>>>>>> 
>>>>>>>> ```
>>>>>>>> Registered docker executor on xxxxx
>>>>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>> Launching health check process: /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>> Health check process launched at pid: 9895
>>>>>>>> Received task health update, healthy: true
>>>>>>>> ```
>>>>>>>> 
>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>> I am using my own framework, and the full task info I'm using is posted earlier in this thread.  Do you happen to know if Marathon uses Mesos's health checks for its health check system?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Yes, the health check is launched from its definition in TaskInfo. Do you launch your task through Marathon? I could test it on my side.
>>>>>>>>>> 
>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>> Precisely, and there are none of those statements.  Are you or others confident health-checks are part of the code path when defined via task info for docker container tasks?  Going through the code, I wasn't able to find the linkage for anything other than health-checks triggered through a custom executor.
>>>>>>>>>>> 
>>>>>>>>>>> With that being said it is a pretty good sized code base and I'm not very familiar with it, so my analysis this far has by no means been exhaustive.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> When the health check launches, there will be a log line like this in your executor stdout:
>>>>>>>>>>>> ```
>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>> ```
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output in the logs with the string "health" or "Health" if the health-check were active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1: https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0 https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this backport; let me double check.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look there :)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <ja...@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it out?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <ti...@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>> Hi Jay, 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks that's in master but not yet released. It will run docker exec with the command you provided as health checks.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if they ever run the command (in this case `sleep 5`), but have not found any indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this mean that health-checks are only supported for custom executors and not for docker tasks?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>> Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
It's definitely not overridden in any of my other scripts.  Like I said earlier, I've never touched it except for the first time today.



> On Oct 8, 2015, at 7:52 PM, haosdent <ha...@gmail.com> wrote:
> 
> As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir, and mesos-docker-executor and mesos-health-check are looked up under that directory. Even though the env var is not propagated, MESOS_LAUNCHER_DIR still works because flags.launcher_dir is taken from it.
> 
> For example, I ran
> ```
> export MESOS_LAUNCHER_DIR=/tmp
> ```
> before starting mesos-slave. When I then launched the slave, I could find this line in the slave log:
> ```
> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup: xxxxx  --launcher_dir="/tmp"
> ```
> 
> From your log, I'm not sure why your MESOS_LAUNCHER_DIR became the sandbox dir. Could MESOS_LAUNCHER_DIR be overridden in one of your other scripts?
> 
> 
>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com> wrote:
>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>> 
>> I just tried setting both the env var and the flag on the slaves, and have determined that the env var is not present when it is checked in src/docker/executor.cpp @ line 573:
>> 
>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>   string path =
>>>     envPath.isSome() ? envPath.get()
>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << (envPath.isSome() ? "yes" : "no") << endl;
>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>> 
>> 
>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly propagated along up to the point of mesos-slave launch):
>> 
>>> $ cat /etc/default/mesos-slave
>>> export MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>> export MESOS_PORT="5050"
>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>> 
>> TASK OUTPUT:
>> 
>>> MESOS_LAUNCHER_DIR: envpath.isSome()->no
>>> MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
>>> Registered docker executor on mesos-worker2a
>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>> Launching health check process: /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check --executor=(1)@192.168.225.59:44523 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad sh -c \" \/bin\/bash \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>> Health check process launched at pid: 2519
>> 
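The lookup order in the executor.cpp snippet above can be sketched in isolation. This is a minimal sketch, not the Mesos code: `resolveLauncherDir` is a hypothetical helper, and `std::filesystem` (C++17) stands in for the stout `os::realpath` and `Path::dirname` calls. It assumes that a bare program name yields a directory of `.`, as with dirname(3), which then resolves to the current working directory.

```cpp
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not a Mesos function): prefer the value of
// MESOS_LAUNCHER_DIR when it is set; otherwise fall back to the resolved
// directory part of argv[0].
std::string resolveLauncherDir(const std::optional<std::string>& envValue,
                               const std::string& argv0)
{
  if (envValue.has_value()) {
    return *envValue;
  }

  fs::path dir = fs::path(argv0).parent_path();
  if (dir.empty()) {
    // Assumption: mimic dirname(3), where a bare name has directory ".".
    dir = ".";
  }

  // weakly_canonical resolves "." to the absolute working directory and
  // does not throw if trailing components are missing.
  return fs::weakly_canonical(dir).string();
}
```

If the executor were started with argv[0] set to a bare name and with the sandbox as its working directory, this fallback would return the sandbox path, which would be consistent with the task output above. That is a guess based on the quoted snippets, not something checked against the Mesos source.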
>> 
>> The env var is not propagated when the docker executor is launched in src/slave/containerizer/docker.cpp around line 903:
>> 
>>>   vector<string> argv;
>>>   argv.push_back("mesos-docker-executor");
>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>   // container (to distinguish it from Docker containers not created
>>>   // by Mesos).
>>>   Try<Subprocess> s = subprocess(
>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>       argv,
>>>       Subprocess::PIPE(),
>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>       dockerFlags(flags, container->name(), container->directory),
>>>       environment,
>>>       lambda::bind(&setup, container->directory));
>> 
>> 
>> A little way above, we can see that the environment is set up with the env vars defined for the container's task.
>> 
>> See src/slave/containerizer/docker.cpp around line 871:
>> 
>>>   // Include any enviroment variables from ExecutorInfo.
>>>   foreach (const Environment::Variable& variable,
>>>            container->executor.command().environment().variables()) {
>>>     environment[variable.name()] = variable.value();
>>>   }
>> 
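One possible shape for a fix, sketched under assumptions: `buildExecutorEnvironment` is a hypothetical helper, not a Mesos function, and a plain `std::map` stands in for whatever type the containerizer actually passes to `subprocess`. The idea is just to add MESOS_LAUNCHER_DIR to the map that is handed to the executor subprocess, alongside the task-defined variables.

```cpp
#include <map>
#include <string>

// Hypothetical helper, not part of Mesos: builds the environment for the
// executor subprocess from the task-defined variables, then adds
// MESOS_LAUNCHER_DIR so the child can locate mesos-health-check.
std::map<std::string, std::string> buildExecutorEnvironment(
    const std::map<std::string, std::string>& taskVariables,
    const std::string& launcherDir)
{
  std::map<std::string, std::string> environment = taskVariables;

  // Set last, so a task-defined variable with the same name cannot shadow
  // the launcher directory configured on the slave.
  environment["MESOS_LAUNCHER_DIR"] = launcherDir;

  return environment;
}
```

Whether the slave's value should win over a task-defined variable of the same name is a design choice; the sketch picks the slave's value.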
>> 
>> Should I file a JIRA for this?  Have I overlooked anything?
>> 
>> 
>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com> wrote:
>>> >Not sure what was going on with health-checks in 0.24.0.
>>> 0.24.1 should work.
>>> 
>>> >Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" should exist on? It definitely doesn't exist on the slave, hence execution failing.
>>> 
>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or otherwise use the same directory as mesos-docker-executor.
>>> 
>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>> Maybe I spoke too soon.
>>>> 
>>>> Now the checks are attempting to run, however the STDERR is not looking good.  I've added some debugging to the error message output to show the path, argv, and envp variables:
>>>> 
>>>> STDOUT:
>>>> 
>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --stop_timeout="0ns"
>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" --stop_timeout="0ns"
>>>>> Registered docker executor on mesos-worker2a
>>>>> Starting task app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>> Launching health check process: /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check --executor=(1)@192.168.225.59:43917 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc sh -c \" exit 1 \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>> Health check process launched at pid: 3012
>>>> 
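The health check value in the STDOUT above shows the original command ("exit 1") wrapped in a `docker exec` call against the task's container. A minimal sketch of that wrapping, modeled only on the logged string: `wrapInDockerExec` is a hypothetical name, and the real executor may build the command differently.

```cpp
#include <string>

// Hypothetical helper: reproduces the shape of the logged health check
// command, i.e. docker exec <container> sh -c " <command> ".
// No shell escaping is done here; a command containing double quotes
// would need to be escaped before being embedded like this.
std::string wrapInDockerExec(const std::string& docker,
                             const std::string& containerName,
                             const std::string& command)
{
  return docker + " exec " + containerName + " sh -c \" " + command + " \"";
}
```

Because the wrapped command runs through `docker exec`, its exit status reflects what happens inside the container rather than on the slave host.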
>>>> 
>>>> STDERR:
>>>> 
>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>> WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', envp=''): No such file or directory*** Aborted at 1444270649 (unix time) try "date -d @1444270649" if you are using GNU date ***
>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from PID 3012; stack trace: ***
>>>>> @ 0x7f4a38265340 (unknown)
>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>> @ 0x4191e2 _Abort()
>>>>> @ 0x41921c _Abort()
>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>> @ 0x43cc9c mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>> @ 0x7f4a39d92827 _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>> @ 0x7f4a3825d182 start_thread
>>>>> @ 0x7f4a37f8a47d (unknown)
>>>> 
>>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" should exist on? It definitely doesn't exist on the slave, hence execution failing.
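A check before spawning would turn the abort above into a clearer error. A minimal POSIX-only sketch: `isExecutable` is a hypothetical helper, and where such a check would belong in the executor is left open.

```cpp
#include <string>
#include <unistd.h>

// Hypothetical helper: returns true if the current user may execute the
// file at `path`. Uses POSIX access(2), so this is not portable to Windows.
bool isExecutable(const std::string& path)
{
  return ::access(path.c_str(), X_OK) == 0;
}
```

Calling this on the computed mesos-health-check path before launching the subprocess would make a wrong launcher directory show up as an explicit message rather than a SIGABRT in the child.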
>>>> 
>>>> This is with current master, git hash 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>> 
>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>> 
>>>> 
>>>> -Jay
>>>> 
>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>> Update:
>>>>> 
>>>>> I used https://github.com/deric/mesos-deb-packaging to compile and package the latest master (0.26.x) and deployed it to the cluster, and now health checks are working as advertised in both Marathon and my own framework!  Not sure what was going on with health-checks in 0.24.0..
>>>>> 
>>>>> Anyways, thanks again for your help Haosdent!
>>>>> 
>>>>> Cheers,
>>>>> Jay
>>>>> 
>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>> Hi Haosdent,
>>>>>> 
>>>>>> Can you share your Marathon POST request that results in Mesos executing the health checks?
>>>>>> 
>>>>>> Since we can reference the Marathon framework, I've been doing some digging around.
>>>>>> 
>>>>>> Here are the details of my setup and findings:
>>>>>> 
>>>>>> I put a few small hacks in Marathon:
>>>>>> 
>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>>>> 
>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X both in the TaskFactory and right before the task is sent to Mesos via driver.launchTasks:
>>>>>> 
>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>> 
>>>>>>> $ git diff src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>> 
>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId, config).buildIfMatches(offer, runningTasks).map {
>>>>>>>        case (taskInfo, ports) =>
>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>> +        import java.io._
>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>> +        bw.write("\n")
>>>>>>> +        bw.close()
>>>>>>>          CreatedTask(
>>>>>>>            taskInfo,
>>>>>>>            MarathonTasks.makeTask(
>>>>>> 
>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>> 
>>>>>>> $ git diff src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos: Seq[TaskInfo]): Boolean = {
>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>>>>        import scala.collection.JavaConverters._
>>>>>>> +      var i = 0
>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>> +        import java.io._
>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" + taskInfos(i).getTaskId.getValue)
>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>> +        bw.write("\n")
>>>>>>> +        bw.close()
>>>>>>> +      }
>>>>>>>        driver.launchTasks(Collections.singleton(offerID), taskInfos.asJava)
>>>>>>>      }
>>>>>> 
>>>>>> 
>>>>>> Then I built and deployed the hacked Marathon and restarted the marathon service.
>>>>>> 
>>>>>> Next I created the app via the Marathon API ("hello app" is a container with a simple hello-world ruby app running on 0.0.0.0:8000)
>>>>>> 
>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type: application/json' -d'
>>>>>>> {
>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>   "apps": [
>>>>>>>     {
>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>       "container": {
>>>>>>>         "type": "DOCKER",
>>>>>>>         "docker": {
>>>>>>>           "image": "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>           "network": "BRIDGE",
>>>>>>>           "portMappings": [
>>>>>>>             {
>>>>>>>               "containerPort": 8000,
>>>>>>>               "hostPort": 0,
>>>>>>>               "protocol": "tcp"
>>>>>>>             }
>>>>>>>           ]
>>>>>>>         }
>>>>>>>       },
>>>>>>>       "env": {
>>>>>>>         
>>>>>>>       },
>>>>>>>       "healthChecks": [
>>>>>>>         {
>>>>>>>           "protocol": "COMMAND",
>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>           "intervalSeconds": 10,
>>>>>>>           "timeoutSeconds": 10,
>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>         }
>>>>>>>       ],
>>>>>>>       "instances": 1,
>>>>>>>       "cpus": 1,
>>>>>>>       "mem": 512
>>>>>>>     }
>>>>>>>   ]
>>>>>>> }
>>>>>> 
>>>>>> 
>>>>>>> $ ls /tmp/
>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>> 
>>>>>> Do they match?
>>>>>> 
>>>>>>> $ md5sum /tmp/task*
>>>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>> 
>>>>>> Yes, so I am confident this is the information being sent across the wire to Mesos.
>>>>>> 
>>>>>> Do they contain any health-check information?
>>>>>> 
>>>>>>> $ cat /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>> {
>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>   "task_id":{
>>>>>>>     "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>   },
>>>>>>>   "slave_id":{
>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>   },
>>>>>>>   "resources":[
>>>>>>>     {
>>>>>>>       "name":"cpus",
>>>>>>>       "type":"SCALAR",
>>>>>>>       "scalar":{
>>>>>>>         "value":1.0
>>>>>>>       },
>>>>>>>       "role":"*"
>>>>>>>     },
>>>>>>>     {
>>>>>>>       "name":"mem",
>>>>>>>       "type":"SCALAR",
>>>>>>>       "scalar":{
>>>>>>>         "value":512.0
>>>>>>>       },
>>>>>>>       "role":"*"
>>>>>>>     },
>>>>>>>     {
>>>>>>>       "name":"ports",
>>>>>>>       "type":"RANGES",
>>>>>>>       "ranges":{
>>>>>>>         "range":[
>>>>>>>           {
>>>>>>>             "begin":31641,
>>>>>>>             "end":31641
>>>>>>>           }
>>>>>>>         ]
>>>>>>>       },
>>>>>>>       "role":"*"
>>>>>>>     }
>>>>>>>   ],
>>>>>>>   "command":{
>>>>>>>     "environment":{
>>>>>>>       "variables":[
>>>>>>>         {
>>>>>>>           "name":"PORT_8000",
>>>>>>>           "value":"31641"
>>>>>>>         },
>>>>>>>         {
>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>         },
>>>>>>>         {
>>>>>>>           "name":"HOST",
>>>>>>>           "value":"mesos-worker1a"
>>>>>>>         },
>>>>>>>         {
>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>           "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>         },
>>>>>>>         {
>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>           "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>         },
>>>>>>>         {
>>>>>>>           "name":"PORT",
>>>>>>>           "value":"31641"
>>>>>>>         },
>>>>>>>         {
>>>>>>>           "name":"PORTS",
>>>>>>>           "value":"31641"
>>>>>>>         },
>>>>>>>         {
>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>         },
>>>>>>>         {
>>>>>>>           "name":"PORT0",
>>>>>>>           "value":"31641"
>>>>>>>         }
>>>>>>>       ]
>>>>>>>     },
>>>>>>>     "shell":false
>>>>>>>   },
>>>>>>>   "container":{
>>>>>>>     "type":"DOCKER",
>>>>>>>     "docker":{
>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>       "network":"BRIDGE",
>>>>>>>       "port_mappings":[
>>>>>>>         {
>>>>>>>           "host_port":31641,
>>>>>>>           "container_port":8000,
>>>>>>>           "protocol":"tcp"
>>>>>>>         }
>>>>>>>       ],
>>>>>>>       "privileged":false,
>>>>>>>       "force_pull_image":false
>>>>>>>     }
>>>>>>>   }
>>>>>>> }
>>>>>> 
>>>>>> No, I don't see anything about any health check.
>>>>>> 
>>>>>> Mesos STDOUT for the launched task:
>>>>>> 
>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" --docker="docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" --stop_timeout="0ns"
>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" --docker="docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" --stop_timeout="0ns"
>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>> Starting task app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>> 
>>>>>> 
>>>>>> And STDERR:
>>>>>> 
>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>> WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
>>>>>> 
>>>>>> 
>>>>>> Again, nothing about any health checks.
>>>>>> 
>>>>>> Any ideas of other things to try or what I could be missing?  Can't say either way about the Mesos health-check system working or not if Marathon won't put the health-check into the task it sends to Mesos.
>>>>>> 
>>>>>> Thanks for all your help!
>>>>>> 
>>>>>> Best,
>>>>>> Jay
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>> Maybe you could post your executor stdout/stderr so that we can tell whether the health check is running or not.
>>>>>>> 
>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>> Marathon also uses the Mesos health check. When I use a health check, I can see log output like this in the executor stdout.
>>>>>>>> 
>>>>>>>> ```
>>>>>>>> Registered docker executor on xxxxx
>>>>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>> Launching health check process: /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>>> Health check process launched at pid: 9895
>>>>>>>> Received task health update, healthy: true
>>>>>>>> ```
>>>>>>>> 
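The mapping from a health check's exit status to task health, together with the `consecutive_failures` threshold from the TaskInfo, can be sketched as a small state tracker. This is a simplified model and not the Mesos implementation: `HealthTracker` is a hypothetical class, and delay, grace period and timeout handling are left out.

```cpp
#include <cstdint>

// Simplified model, not the Mesos implementation: only the exit status and
// the consecutive-failure threshold are considered.
class HealthTracker
{
public:
  explicit HealthTracker(uint32_t maxConsecutiveFailures)
    : maxConsecutiveFailures_(maxConsecutiveFailures) {}

  // Records one health check run. Exit status 0 means healthy and resets
  // the failure count; any other status counts as a failure.
  // Returns false once the failure threshold has been reached.
  bool record(int exitStatus)
  {
    if (exitStatus == 0) {
      healthy_ = true;
      failures_ = 0;
    } else {
      healthy_ = false;
      ++failures_;
    }
    return failures_ < maxConsecutiveFailures_;
  }

  bool healthy() const { return healthy_; }

private:
  uint32_t maxConsecutiveFailures_;
  uint32_t failures_ = 0;
  bool healthy_ = true;
};
```

With a threshold of 3, as in the TaskInfo from this thread, a check command of "exit 1" would make `record` return false on the third run.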
>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>> I am using my own framework, and the full task info I'm using is posted earlier in this thread.  Do you happen to know if Marathon uses Mesos's health checks for its health check system?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Yes, the health check is launched from its definition in TaskInfo. Do you launch your task through Marathon? I could test it on my side.
>>>>>>>>>> 
>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>> Precisely, and there are none of those statements.  Are you or others confident health-checks are part of the code path when defined via task info for docker container tasks?  Going through the code, I wasn't able to find the linkage for anything other than health-checks triggered through a custom executor.
>>>>>>>>>>> 
>>>>>>>>>>> With that being said it is a pretty good sized code base and I'm not very familiar with it, so my analysis this far has by no means been exhaustive.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> When the health check launches, there should be a log line like this in your executor stdout:
>>>>>>>>>>>> ```
>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>> ```
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output in the logs with the string "health" or "Health" if the health-check were active?  None of my master or slave logs contain the string..
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1: https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0 https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contain this backport; let me double-check.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look there :)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <ja...@jaytaylor.com> wrote:
>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it out?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <ti...@mesosphere.io> wrote:
>>>>>>>>>>>>>>>>>>>> Hi Jay, 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks that's in master but not yet released. It will run docker exec with the command you provided as health checks.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if they ever run the command (in this case `sleep 5`), but have not found any indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this mean that health-checks are only supported for custom executors and not for docker tasks?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>> Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by haosdent <ha...@gmail.com>.
As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir,
which is the directory where mesos-docker-executor and mesos-health-check
are looked up. Even though the env var itself is not propagated to the
executor, MESOS_LAUNCHER_DIR still takes effect because flags.launcher_dir
is derived from it.
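
A rough sketch of that env-to-flag mapping, in Python for brevity. This is
not the actual Mesos flag loader: `load_flags` and its parameters are
hypothetical names, and the rule that a command-line value overrides the
env value is an assumption made for illustration.

```python
def load_flags(environ, argv, prefix="MESOS_"):
    """Collect flag values from prefixed env vars, then from --name=value args."""
    flags = {}
    # MESOS_LAUNCHER_DIR=/tmp becomes the flag launcher_dir=/tmp.
    for key, value in environ.items():
        if key.startswith(prefix):
            flags[key[len(prefix):].lower()] = value
    # Assumption: an explicit --name=value argument overrides the env value.
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            flags[name] = value
    return flags


env = {"MESOS_LAUNCHER_DIR": "/tmp", "HOME": "/root"}
print(load_flags(env, [])["launcher_dir"])  # prints /tmp
```

The point is only that the slave reads the env var once at startup and keeps
the result as a flag, so the value can be used later even if child processes
never see the env var.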

For example, because I
```
export MESOS_LAUNCHER_DIR=/tmp
```
before starting mesos-slave, when I launch the slave I can find this line in
the slave log:
```
I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
xxxxx  --launcher_dir="/tmp"
```

From your log, I'm not sure why your MESOS_LAUNCHER_DIR becomes the sandbox
dir. Is it because MESOS_LAUNCHER_DIR is overridden in one of your other
scripts?
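
One possible explanation, going by the executor.cpp snippet quoted below:
when the env var is absent, the path falls back to the directory of argv[0].
The containerizer snippet passes a bare "mesos-docker-executor" as argv[0],
so its dirname would be "." and would resolve to the executor's working
directory. The sketch below mimics that fallback in Python. The function
name, the sandbox path, and the assumption that the executor's working
directory is the sandbox are all illustrative, not taken from the Mesos
source.

```python
import posixpath


def resolve_launcher_dir(argv0, environ, cwd):
    """Mimic: env var if set, otherwise the resolved directory of argv[0]."""
    env_path = environ.get("MESOS_LAUNCHER_DIR")
    if env_path is not None:
        return env_path
    # A bare program name has an empty dirname, which means "current dir".
    dirname = posixpath.dirname(argv0) or "."
    return posixpath.normpath(posixpath.join(cwd, dirname))


sandbox = "/tmp/mesos/slaves/S1/runs/R1"  # hypothetical sandbox path

# Env var missing and argv[0] has no directory: falls back to the cwd.
print(resolve_launcher_dir("mesos-docker-executor", {}, sandbox))

# Env var present: it wins regardless of argv[0].
print(resolve_launcher_dir(
    "mesos-docker-executor",
    {"MESOS_LAUNCHER_DIR": "/usr/libexec/mesos"},
    sandbox,
))
```

If that is what happens, the health check binary would be looked up inside
the sandbox, which would match the "No such file or directory" error.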


On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <ou...@gmail.com> wrote:

> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>
> I just tried setting both the env var and flag on the slaves, and have
> determined that the env var is not present when it is checked in
> src/docker/executor.cpp @ line 573:
>
>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>   string path =
>>     envPath.isSome() ? envPath.get()
>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << (envPath.isSome() ?
>> "yes" : "no") << endl;
>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>
>
> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly
> propagated along up to the point of mesos-slave launch):
>
> $ cat /etc/default/mesos-slave
>> export
>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>> export MESOS_CONTAINERIZERS="mesos,docker"
>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>> export MESOS_PORT="5050"
>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>
>
> TASK OUTPUT:
>
>
>> MESOS_LAUNCHER_DIR: envpath.isSome()->no
>> MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
>> Registered docker executor on mesos-worker2a
>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>> Launching health check process:
>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>> --executor=(1)@192.168.225.59:44523
>> --health_check_json={"command":{"shell":true,"value":"docker exec
>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>> sh -c \" \/bin\/bash
>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>> Health check process launched at pid: 2519
>
>
> The env var is not propagated when the docker executor is launched
> in src/slave/containerizer/docker.cpp around line 903:
>
>   vector<string> argv;
>>   argv.push_back("mesos-docker-executor");
>>   // Construct the mesos-docker-executor using the "name" we gave the
>>   // container (to distinguish it from Docker containers not created
>>   // by Mesos).
>>   Try<Subprocess> s = subprocess(
>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>       argv,
>>       Subprocess::PIPE(),
>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>       dockerFlags(flags, container->name(), container->directory),
>>       environment,
>>       lambda::bind(&setup, container->directory));
>
>
> A little way above, we can see the environment is set up with the env vars
> defined by the container task.
>
> See src/slave/containerizer/docker.cpp around line 871:
>
>   // Include any enviroment variables from ExecutorInfo.
>>   foreach (const Environment::Variable& variable,
>>            container->executor.command().environment().variables()) {
>>     environment[variable.name()] = variable.value();
>>   }
>
>
> Should I file a JIRA for this?  Have I overlooked anything?
>
>
> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com> wrote:
>
>> >Not sure what was going on with health-checks in 0.24.0.
>> 0.24.1 should work.
>>
>> >Do any of you know which host the path
>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>> should exist on? It definitely doesn't exist on the slave, hence execution
>> failing.
>>
>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We get
>> mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or otherwise use
>> the same dir as mesos-docker-executor.
>>
>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com> wrote:
>>
>>> Maybe I spoke too soon.
>>>
>>> Now the checks are attempting to run, however the STDERR is not looking
>>> good.  I've added some debugging to the error message output to show the
>>> path, argv, and envp variables:
>>>
>>> STDOUT:
>>>
>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>> --stop_timeout="0ns"
>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>> --stop_timeout="0ns"
>>>> Registered docker executor on mesos-worker2a
>>>> Starting task
>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>> Launching health check process:
>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>> --executor=(1)@192.168.225.59:43917
>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>> sh -c \" exit 1
>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>> Health check process launched at pid: 3012
>>>
>>>
>>> STDERR:
>>>
>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave
>>>> 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>> WARNING: Your kernel does not support swap limit capabilities, memory
>>>> limited without swap.
>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain
>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>> envp=''): No such file or directory*** Aborted at 1444270649 (unix time)
>>>> try "date -d @1444270649" if you are using GNU date ***
>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from PID
>>>> 3012; stack trace: ***
>>>> @ 0x7f4a38265340 (unknown)
>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>> @ 0x7f4a37eca0d8 (unknown)
>>>> @ 0x4191e2 _Abort()
>>>> @ 0x41921c _Abort()
>>>> @ 0x7f4a39dc2768 process::childMain()
>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>> @ 0x43cc9c
>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>> @ 0x7f4a39d92827
>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>> @ 0x7f4a38a47e40 (unknown)
>>>> @ 0x7f4a3825d182 start_thread
>>>> @ 0x7f4a37f8a47d (unknown)
>>>
>>>
>>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>> should exist on? It definitely doesn't exist on the slave, hence
>>> execution failing.
>>>
>>> This is with current master, git hash
>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>
>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>> Author: Anand Mazumdar <ma...@gmail.com>
>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>
>>>
>>> -Jay
>>>
>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>
>>>> Update:
>>>>
>>>> I used https://github.com/deric/mesos-deb-packaging to compile and
>>>> package the latest master (0.26.x) and deployed it to the cluster, and now
>>>> health checks are working as advertised in both Marathon and my own
>>>> framework!  Not sure what was going on with health-checks in 0.24.0..
>>>>
>>>> Anyways, thanks again for your help Haosdent!
>>>>
>>>> Cheers,
>>>> Jay
>>>>
>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Haosdent,
>>>>>
>>>>> Can you share your Marathon POST request that results in Mesos
>>>>> executing the health checks?
>>>>>
>>>>> Since we can reference the Marathon framework, I've been doing some
>>>>> digging around.
>>>>>
>>>>> Here are the details of my setup and findings:
>>>>>
>>>>> I put a few small hacks in Marathon:
>>>>>
>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>>>
>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X
>>>>> both in the TaskFactory and right before the task is sent to Mesos
>>>>> via driver.launchTasks:
>>>>>
>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>
>>>>> $ git diff
>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>
>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>        case (taskInfo, ports) =>
>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>> +        import java.io._
>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>> +        bw.write("\n")
>>>>>> +        bw.close()
>>>>>>          CreatedTask(
>>>>>>            taskInfo,
>>>>>>            MarathonTasks.makeTask(
>>>>>
>>>>>
>>>>>
>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>
>>>>> $ git diff
>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>>>        import scala.collection.JavaConverters._
>>>>>> +      var i = 0
>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>> +        import java.io._
>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" +
>>>>>> taskInfos(i).getTaskId.getValue)
>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>> +        bw.write("\n")
>>>>>> +        bw.close()
>>>>>> +      }
>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>> taskInfos.asJava)
>>>>>>      }
>>>>>
>>>>>
>>>>> Then I built and deployed the hacked Marathon and restarted the
>>>>> marathon service.
>>>>>
>>>>> Next I created the app via the Marathon API ("hello app" is a
>>>>> container with a simple hello-world ruby app running on 0.0.0.0:8000)
>>>>>
>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type:
>>>>>> application/json' -d'
>>>>>> {
>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>   "apps": [
>>>>>>     {
>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>       "container": {
>>>>>>         "type": "DOCKER",
>>>>>>         "docker": {
>>>>>>           "image":
>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>           "network": "BRIDGE",
>>>>>>           "portMappings": [
>>>>>>             {
>>>>>>               "containerPort": 8000,
>>>>>>               "hostPort": 0,
>>>>>>               "protocol": "tcp"
>>>>>>             }
>>>>>>           ]
>>>>>>         }
>>>>>>       },
>>>>>>       "env": {
>>>>>>
>>>>>>       },
>>>>>>       "healthChecks": [
>>>>>>         {
>>>>>>           "protocol": "COMMAND",
>>>>>>           "command": {"value": "exit 1"},
>>>>>>           "gracePeriodSeconds": 10,
>>>>>>           "intervalSeconds": 10,
>>>>>>           "timeoutSeconds": 10,
>>>>>>           "maxConsecutiveFailures": 3
>>>>>>         }
>>>>>>       ],
>>>>>>       "instances": 1,
>>>>>>       "cpus": 1,
>>>>>>       "mem": 512
>>>>>>     }
>>>>>>   ]
>>>>>> }
>>>>>
>>>>>
>>>>> $ ls /tmp/
>>>>>>
>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>
>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>
>>>>>
>>>>> Do they match?
>>>>>
>>>>> $ md5sum /tmp/task*
>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>> 1b5115997e78e2611654059249d99578
>>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>
>>>>>
>>>>> Yes, so I am confident this is the information being sent across the
>>>>> wire to Mesos.
>>>>>
>>>>> Do they contain any health-check information?
>>>>>
>>>>> $ cat
>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>> {
>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>   "task_id":{
>>>>>>
>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>   },
>>>>>>   "slave_id":{
>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>   },
>>>>>>   "resources":[
>>>>>>     {
>>>>>>       "name":"cpus",
>>>>>>       "type":"SCALAR",
>>>>>>       "scalar":{
>>>>>>         "value":1.0
>>>>>>       },
>>>>>>       "role":"*"
>>>>>>     },
>>>>>>     {
>>>>>>       "name":"mem",
>>>>>>       "type":"SCALAR",
>>>>>>       "scalar":{
>>>>>>         "value":512.0
>>>>>>       },
>>>>>>       "role":"*"
>>>>>>     },
>>>>>>     {
>>>>>>       "name":"ports",
>>>>>>       "type":"RANGES",
>>>>>>       "ranges":{
>>>>>>         "range":[
>>>>>>           {
>>>>>>             "begin":31641,
>>>>>>             "end":31641
>>>>>>           }
>>>>>>         ]
>>>>>>       },
>>>>>>       "role":"*"
>>>>>>     }
>>>>>>   ],
>>>>>>   "command":{
>>>>>>     "environment":{
>>>>>>       "variables":[
>>>>>>         {
>>>>>>           "name":"PORT_8000",
>>>>>>           "value":"31641"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"HOST",
>>>>>>           "value":"mesos-worker1a"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>
>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>
>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"PORT",
>>>>>>           "value":"31641"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"PORTS",
>>>>>>           "value":"31641"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>         },
>>>>>>         {
>>>>>>           "name":"PORT0",
>>>>>>           "value":"31641"
>>>>>>         }
>>>>>>       ]
>>>>>>     },
>>>>>>     "shell":false
>>>>>>   },
>>>>>>   "container":{
>>>>>>     "type":"DOCKER",
>>>>>>     "docker":{
>>>>>>
>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>       "network":"BRIDGE",
>>>>>>       "port_mappings":[
>>>>>>         {
>>>>>>           "host_port":31641,
>>>>>>           "container_port":8000,
>>>>>>           "protocol":"tcp"
>>>>>>         }
>>>>>>       ],
>>>>>>       "privileged":false,
>>>>>>       "force_pull_image":false
>>>>>>     }
>>>>>>   }
>>>>>> }
>>>>>
>>>>>
>>>>> No, I don't see anything about any health check.
>>>>>
>>>>> Mesos STDOUT for the launched task:
>>>>>
>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>> --stop_timeout="0ns"
>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>> --stop_timeout="0ns"
>>>>>> Registered docker executor on mesos-worker1a
>>>>>> Starting task
>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>
>>>>>
>>>>> And STDERR:
>>>>>
>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on
>>>>>> slave 20150924-210922-1608624320-5050-1792-S1
>>>>>> WARNING: Your kernel does not support swap limit capabilities, memory
>>>>>> limited without swap.
>>>>>
>>>>>
>>>>> Again, nothing about any health checks.
>>>>>
>>>>> Any ideas of other things to try or what I could be missing?  Can't
>>>>> say either way about the Mesos health-check system working or not if
>>>>> Marathon won't put the health-check into the task it sends to Mesos.
>>>>>
>>>>> Thanks for all your help!
>>>>>
>>>>> Best,
>>>>> Jay
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com> wrote:
>>>>>
>>>>>> Maybe you could post your executor stdout/stderr so that we can
>>>>>> tell whether the health check is running or not.
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>
>>>>>>> Marathon also uses Mesos health checks. When I use a health check, I
>>>>>>> can see log output like this in the executor stdout.
>>>>>>>
>>>>>>> ```
>>>>>>> Registered docker executor on xxxxx
>>>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>> Launching health check process:
>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>>> Health check process launched at pid: 9895
>>>>>>> Received task health update, healthy: true
>>>>>>> ```
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <ou...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I am using my own framework, and the full task info I'm using is
>>>>>>>> posted earlier in this thread.  Do you happen to know if Marathon uses
>>>>>>>> Mesos's health checks for its health check system?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Yes, the health check is launched from its definition in TaskInfo. Do
>>>>>>>> you launch your task through Marathon? I could test it on my side.
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Precisely, and there are none of those statements.  Are you or
>>>>>>>>> others confident health-checks are part of the code path when defined via
>>>>>>>>> task info for docker container tasks?  Going through the code, I wasn't
>>>>>>>>> able to find the linkage for anything other than health-checks triggered
>>>>>>>>> through a custom executor.
>>>>>>>>>
>>>>>>>>> With that being said it is a pretty good sized code base and I'm
>>>>>>>>> not very familiar with it, so my analysis this far has by no means been
>>>>>>>>> exhaustive.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> When the health check launches, there will be a log line like this in
>>>>>>>>> your executor stdout:
>>>>>>>>> ```
>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I'm happy to try this, however wouldn't there be output in the
>>>>>>>>>> logs with the string "health" or "Health" if the health-check were active?
>>>>>>>>>> None of my master or slave logs contain the string..
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an
>>>>>>>>>> unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <ha...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <ha...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this backport; let me
>>>>>>>>>>>>> double check.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look
>>>>>>>>>>>>>> there :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <jay@jaytaylor.com
>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it
>>>>>>>>>>>>>>> out?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We just added health check support for docker tasks that's
>>>>>>>>>>>>>>>> in master but not yet released. It will run docker exec with the command
>>>>>>>>>>>>>>>> you provided as health checks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?
>>>>>>>>>>>>>>>> Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have searched all machines and containers to see if they
>>>>>>>>>>>>>>>> ever run the command (in this case `sleep 5`), but have not found any
>>>>>>>>>>>>>>>> indication that it is being executed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from
>>>>>>>>>>>>>>>> src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this
>>>>>>>>>>>>>>>> mean that health-checks are only supported for custom executors and not for
>>>>>>>>>>>>>>>> docker tasks?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero
>>>>>>>>>>>>>>>> exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>


-- 
Best Regards,
Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.

I just tried setting both the env var and the flag on the slaves, and have
determined that the env var is not present when it is checked in
src/docker/executor.cpp @ line 573:

  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
  string path =
    envPath.isSome() ? envPath.get()
                     : os::realpath(Path(argv[0]).dirname()).get();
  cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->"
       << (envPath.isSome() ? "yes" : "no") << endl;
  cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;


Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly
propagated along up to the point of mesos-slave launch):

$ cat /etc/default/mesos-slave
export MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
export MESOS_CONTAINERIZERS="mesos,docker"
export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
export MESOS_PORT="5050"
export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"


TASK OUTPUT:


MESOS_LAUNCHER_DIR: envpath.isSome()->no
MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
Registered docker executor on mesos-worker2a
Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
Launching health check process:
/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
--executor=(1)@192.168.225.59:44523
--health_check_json={"command":{"shell":true,"value":"docker exec
mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
sh -c \" \/bin\/bash
\""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
--task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
Health check process launched at pid: 2519
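
The health_check_json value in the output above shows the task's
health-check command wrapped in a `docker exec <container> sh -c "..."`
invocation. As a rough illustration only, that wrapping could be sketched
as below. `wrapForDockerExec` is a hypothetical helper, not a function from
the Mesos source, and the spacing around the inner command is an assumption
based on the logged string.

```cpp
#include <string>

// Hypothetical helper (not part of Mesos): builds the shell command that
// runs `command` inside the named Docker container via `docker exec`.
// Assumption: the command is handed to `sh -c` between double quotes,
// mirroring the string seen in the log. Quotes embedded in `command`
// are NOT escaped by this sketch.
std::string wrapForDockerExec(
    const std::string& containerName,
    const std::string& command)
{
  return "docker exec " + containerName + " sh -c \" " + command + " \"";
}
```

For example, wrapForDockerExec("mesos-<slave-id>.<run-id>", "exit 1") yields
`docker exec mesos-<slave-id>.<run-id> sh -c " exit 1 "`, so the exit status
of the inner command becomes the exit status of the health check.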


The env var is not propagated when the docker executor is launched
in src/slave/containerizer/docker.cpp around line 903:

  vector<string> argv;
  argv.push_back("mesos-docker-executor");
  // Construct the mesos-docker-executor using the "name" we gave the
  // container (to distinguish it from Docker containers not created
  // by Mesos).
  Try<Subprocess> s = subprocess(
      path::join(flags.launcher_dir, "mesos-docker-executor"),
      argv,
      Subprocess::PIPE(),
      Subprocess::PATH(path::join(container->directory, "stdout")),
      Subprocess::PATH(path::join(container->directory, "stderr")),
      dockerFlags(flags, container->name(), container->directory),
      environment,
      lambda::bind(&setup, container->directory));


A little way above that, we can see the environment is set up with the env
vars defined by the container task.

See src/slave/containerizer/docker.cpp around line 871:

  // Include any enviroment variables from ExecutorInfo.
  foreach (const Environment::Variable& variable,
           container->executor.command().environment().variables()) {
    environment[variable.name()] = variable.value();
  }
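
If the missing variable is indeed the cause, one possible direction is to add
the launcher directory to that environment map before the executor is
spawned. The sketch below is standalone C++ rather than a patch against
Mesos: `executorEnvironment` and its parameters are hypothetical names, and a
plain `std::map` stands in for the environment handed to `subprocess`.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not part of Mesos): returns the environment for the
// docker executor subprocess. It starts from the variables defined on the
// task/executor and, as the assumed fix, also forwards the launcher
// directory so the executor can locate mesos-health-check.
std::map<std::string, std::string> executorEnvironment(
    const std::map<std::string, std::string>& taskVariables,
    const std::string& launcherDir)
{
  std::map<std::string, std::string> environment = taskVariables;

  // Only set the variable when a launcher directory is actually configured.
  if (!launcherDir.empty()) {
    environment["MESOS_LAUNCHER_DIR"] = launcherDir;
  }

  return environment;
}
```

Note that in this sketch the slave's launcher directory overrides any
MESOS_LAUNCHER_DIR a task defines for itself; whether that precedence is
desirable is a separate design question.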


Should I file a JIRA for this?  Have I overlooked anything?


On Wed, Oct 7, 2015 at 8:11 PM, haosdent <ha...@gmail.com> wrote:

> >Not sure what was going on with health-checks in 0.24.0.
> 0.24.1 should work.
>
> >Do any of you know which host the path
> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
> should exist on? It definitely doesn't exist on the slave, hence execution
> failing.
>
> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We look
> for mesos-health-check in MESOS_LAUNCHER_DIR/--launcher_dir, or otherwise in
> the same directory as mesos-docker-executor.
>
> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com> wrote:
>
>> Maybe I spoke too soon.
>>
>> Now the checks are attempting to run, however the STDERR is not looking
>> good.  I've added some debugging to the error message output to show the
>> path, argv, and envp variables:
>>
>> STDOUT:
>>
>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>> --stop_timeout="0ns"
>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>> --stop_timeout="0ns"
>>> Registered docker executor on mesos-worker2a
>>> Starting task
>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>> Launching health check process:
>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>> --executor=(1)@192.168.225.59:43917
>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>> sh -c \" exit 1
>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>> Health check process launched at pid: 3012
>>
>>
>> STDERR:
>>
>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave
>>> 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>> WARNING: Your kernel does not support swap limit capabilities, memory
>>> limited without swap.
>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain
>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>> envp=''): No such file or directory*** Aborted at 1444270649 (unix time)
>>> try "date -d @1444270649" if you are using GNU date ***
>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from PID
>>> 3012; stack trace: ***
>>> @ 0x7f4a38265340 (unknown)
>>> @ 0x7f4a37ec6cc9 (unknown)
>>> @ 0x7f4a37eca0d8 (unknown)
>>> @ 0x4191e2 _Abort()
>>> @ 0x41921c _Abort()
>>> @ 0x7f4a39dc2768 process::childMain()
>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>> @ 0x7f4a39dc24fc process::defaultClone()
>>> @ 0x7f4a39dc34fb process::subprocess()
>>> @ 0x43cc9c
>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>> @ 0x7f4a39d92827
>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>> @ 0x7f4a38a47e40 (unknown)
>>> @ 0x7f4a3825d182 start_thread
>>> @ 0x7f4a37f8a47d (unknown)
>>
>>
>> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>> should exist on? It definitely doesn't exist on the slave, hence
>> execution failing.
>>
>> This is with current master, git hash
>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>
>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>> Author: Anand Mazumdar <ma...@gmail.com>
>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>
>>
>> -Jay
>>
>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com> wrote:
>>
>>> Update:
>>>
>>> I used https://github.com/deric/mesos-deb-packaging to compile and
>>> package the latest master (0.26.x) and deployed it to the cluster, and now
>>> health checks are working as advertised in both Marathon and my own
>>> framework!  Not sure what was going on with health-checks in 0.24.0..
>>>
>>> Anyways, thanks again for your help Haosdent!
>>>
>>> Cheers,
>>> Jay
>>>
>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>
>>>> Hi Haosdent,
>>>>
>>>> Can you share your Marathon POST request that results in Mesos
>>>> executing the health checks?
>>>>
>>>> Since we can reference the Marathon framework, I've been doing some
>>>> digging around.
>>>>
>>>> Here are the details of my setup and findings:
>>>>
>>>> I put a few small hacks in Marathon:
>>>>
>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>>
>>>> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X
>>>> in both the TaskFactory and right before the task is sent to Mesos
>>>> via driver.launchTasks:
>>>>
>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>
>>>> $ git diff
>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>
>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>        case (taskInfo, ports) =>
>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>> +        import java.io._
>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>> +        bw.write("\n")
>>>>> +        bw.close()
>>>>>          CreatedTask(
>>>>>            taskInfo,
>>>>>            MarathonTasks.makeTask(
>>>>
>>>>
>>>>
>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>
>>>> $ git diff
>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>> Seq[TaskInfo]): Boolean = {
>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>>        import scala.collection.JavaConverters._
>>>>> +      var i = 0
>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>> +        import java.io._
>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" +
>>>>> taskInfos(i).getTaskId.getValue)
>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>> +        bw.write("\n")
>>>>> +        bw.close()
>>>>> +      }
>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>> taskInfos.asJava)
>>>>>      }
>>>>
>>>>
>>>> Then I built and deployed the hacked Marathon and restarted the
>>>> marathon service.
>>>>
>>>> Next I created the app via the Marathon API ("hello app" is a container
>>>> with a simple hello-world ruby app running on 0.0.0.0:8000)
>>>>
>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type:
>>>>> application/json' -d'
>>>>> {
>>>>>   "id": "/app-81-1-hello-app",
>>>>>   "apps": [
>>>>>     {
>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>       "container": {
>>>>>         "type": "DOCKER",
>>>>>         "docker": {
>>>>>           "image": "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>           "network": "BRIDGE",
>>>>>           "portMappings": [
>>>>>             {
>>>>>               "containerPort": 8000,
>>>>>               "hostPort": 0,
>>>>>               "protocol": "tcp"
>>>>>             }
>>>>>           ]
>>>>>         }
>>>>>       },
>>>>>       "env": {
>>>>>
>>>>>       },
>>>>>       "healthChecks": [
>>>>>         {
>>>>>           "protocol": "COMMAND",
>>>>>           "command": {"value": "exit 1"},
>>>>>           "gracePeriodSeconds": 10,
>>>>>           "intervalSeconds": 10,
>>>>>           "timeoutSeconds": 10,
>>>>>           "maxConsecutiveFailures": 3
>>>>>         }
>>>>>       ],
>>>>>       "instances": 1,
>>>>>       "cpus": 1,
>>>>>       "mem": 512
>>>>>     }
>>>>>   ]
>>>>> }
>>>>
>>>>
>>>> $ ls /tmp/
>>>>>
>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>
>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>
>>>>
>>>> Do they match?
>>>>
>>>> $ md5sum /tmp/task*
>>>>> 1b5115997e78e2611654059249d99578
>>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>> 1b5115997e78e2611654059249d99578
>>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>
>>>>
>>>> Yes, so I am confident this is the information being sent across the
>>>> wire to Mesos.
>>>>
>>>> Do they contain any health-check information?
>>>>
>>>> $ cat
>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>> {
>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>   "task_id":{
>>>>>     "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>   },
>>>>>   "slave_id":{
>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>   },
>>>>>   "resources":[
>>>>>     {
>>>>>       "name":"cpus",
>>>>>       "type":"SCALAR",
>>>>>       "scalar":{
>>>>>         "value":1.0
>>>>>       },
>>>>>       "role":"*"
>>>>>     },
>>>>>     {
>>>>>       "name":"mem",
>>>>>       "type":"SCALAR",
>>>>>       "scalar":{
>>>>>         "value":512.0
>>>>>       },
>>>>>       "role":"*"
>>>>>     },
>>>>>     {
>>>>>       "name":"ports",
>>>>>       "type":"RANGES",
>>>>>       "ranges":{
>>>>>         "range":[
>>>>>           {
>>>>>             "begin":31641,
>>>>>             "end":31641
>>>>>           }
>>>>>         ]
>>>>>       },
>>>>>       "role":"*"
>>>>>     }
>>>>>   ],
>>>>>   "command":{
>>>>>     "environment":{
>>>>>       "variables":[
>>>>>         {
>>>>>           "name":"PORT_8000",
>>>>>           "value":"31641"
>>>>>         },
>>>>>         {
>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>         },
>>>>>         {
>>>>>           "name":"HOST",
>>>>>           "value":"mesos-worker1a"
>>>>>         },
>>>>>         {
>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>           "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>         },
>>>>>         {
>>>>>           "name":"MESOS_TASK_ID",
>>>>>           "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>         },
>>>>>         {
>>>>>           "name":"PORT",
>>>>>           "value":"31641"
>>>>>         },
>>>>>         {
>>>>>           "name":"PORTS",
>>>>>           "value":"31641"
>>>>>         },
>>>>>         {
>>>>>           "name":"MARATHON_APP_ID",
>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>         },
>>>>>         {
>>>>>           "name":"PORT0",
>>>>>           "value":"31641"
>>>>>         }
>>>>>       ]
>>>>>     },
>>>>>     "shell":false
>>>>>   },
>>>>>   "container":{
>>>>>     "type":"DOCKER",
>>>>>     "docker":{
>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>       "network":"BRIDGE",
>>>>>       "port_mappings":[
>>>>>         {
>>>>>           "host_port":31641,
>>>>>           "container_port":8000,
>>>>>           "protocol":"tcp"
>>>>>         }
>>>>>       ],
>>>>>       "privileged":false,
>>>>>       "force_pull_image":false
>>>>>     }
>>>>>   }
>>>>> }
>>>>
>>>>
>>>> No, I don't see anything about any health check.
>>>>
>>>> Mesos STDOUT for the launched task:
>>>>
>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>> --stop_timeout="0ns"
>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>> --stop_timeout="0ns"
>>>>> Registered docker executor on mesos-worker1a
>>>>> Starting task
>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>
>>>>
>>>> And STDERR:
>>>>
>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on slave
>>>>> 20150924-210922-1608624320-5050-1792-S1
>>>>> WARNING: Your kernel does not support swap limit capabilities, memory
>>>>> limited without swap.
>>>>
>>>>
>>>> Again, nothing about any health checks.
>>>>
>>>> Any ideas of other things to try or what I could be missing?  Can't say
>>>> either way about the Mesos health-check system working or not if Marathon
>>>> won't put the health-check into the task it sends to Mesos.
>>>>
>>>> Thanks for all your help!
>>>>
>>>> Best,
>>>> Jay
>>>>
>>>>
>>>>
>>>>>
>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com> wrote:
>>>>
>>>>> Maybe you could post your executor stdout/stderr so that we can tell
>>>>> whether the health check is running or not.
>>>>>
>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com> wrote:
>>>>>
>>>>>> Marathon also uses the Mesos health check. When I use a health check, I
>>>>>> can see a log like this in the executor stdout.
>>>>>>
>>>>>> ```
>>>>>> Registered docker executor on xxxxx
>>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>> Launching health check process:
>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>> Health check process launched at pid: 9895
>>>>>> Received task health update, healthy: true
>>>>>> ```
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <ou...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I am using my own framework, and the full task info I'm using is
>>>>>>> posted earlier in this thread.  Do you happen to know if Marathon uses
>>>>>>> Mesos's health checks for its health check system?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>
>>>>>>> Yes, the health check is launched from its definition in TaskInfo. Do
>>>>>>> you launch your task through Marathon? I could test it on my side.
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <ou...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Precisely, and there are none of those statements.  Are you or
>>>>>>>> others confident health-checks are part of the code path when defined via
>>>>>>>> task info for docker container tasks?  Going through the code, I wasn't
>>>>>>>> able to find the linkage for anything other than health-checks triggered
>>>>>>>> through a custom executor.
>>>>>>>>
>>>>>>>> With that being said it is a pretty good sized code base and I'm
>>>>>>>> not very familiar with it, so my analysis this far has by no means been
>>>>>>>> exhaustive.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> When the health check launches, there should be a log line like this
>>>>>>>> in your executor stdout:
>>>>>>>> ```
>>>>>>>> Health check process launched at pid xxx
>>>>>>>> ```
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I'm happy to try this, however wouldn't there be output in the
>>>>>>>>> logs with the string "health" or "Health" if the health-check were active?
>>>>>>>>> None of my master or slave logs contain the string..
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an
>>>>>>>>> unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <ha...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <ha...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contain this backport; let me
>>>>>>>>>>>> double check.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>> outtatime@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look
>>>>>>>>>>>>> there :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <ja...@jaytaylor.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it
>>>>>>>>>>>>>> out?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We just added health check support for docker tasks that's
>>>>>>>>>>>>>>> in master but not yet released. It will run docker exec with the command
>>>>>>>>>>>>>>> you provided as health checks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?
>>>>>>>>>>>>>>> Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have searched all machines and containers to see if they
>>>>>>>>>>>>>>> ever run the command (in this case `sleep 5`), but have not found any
>>>>>>>>>>>>>>> indication that it is being executed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from
>>>>>>>>>>>>>>> src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this
>>>>>>>>>>>>>>> mean that health-checks are only supported for custom executors and not for
>>>>>>>>>>>>>>> docker tasks?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero
>>>>>>>>>>>>>>> exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>

Re: Can health-checks be run by Mesos for docker tasks?

Posted by haosdent <ha...@gmail.com>.
>Not sure what was going on with health-checks in 0.24.0.
0.24.1 should work.

>Do any of you know which host the path
"/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
should exist on? It definitely doesn't exist on the slave, hence execution
failing.

Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We get
mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or else use the
same directory as mesos-docker-executor.
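
To check a host by hand, here is a minimal sketch of the lookup order described
above, written in Python rather than the C++ that Mesos uses. The function name
and its parameters are hypothetical, not Mesos APIs, and the order (launcher
directory first, then the directory holding mesos-docker-executor) is only my
reading of the description above.

```python
import os
from typing import Optional


def find_health_check_binary(launcher_dir: Optional[str],
                             docker_executor_path: str,
                             name: str = "mesos-health-check") -> Optional[str]:
    """Return the first candidate path that exists and is executable, else None."""
    candidates = []
    if launcher_dir:
        # Assumed first choice: the configured launcher directory.
        candidates.append(os.path.join(launcher_dir, name))
    # Assumed fallback: the directory that holds mesos-docker-executor.
    candidates.append(os.path.join(os.path.dirname(docker_executor_path), name))
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None
```

If this returns None on the slave for the directories you actually use, the
execvpe call has nothing to run and will fail with "No such file or directory".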

On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <ou...@gmail.com> wrote:

> Maybe I spoke too soon.
>
> Now the checks are attempting to run, however the STDERR is not looking
> good.  I've added some debugging to the error message output to show the
> path, argv, and envp variables:
>
> STDOUT:
>
> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>> --stop_timeout="0ns"
>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>> --stop_timeout="0ns"
>> Registered docker executor on mesos-worker2a
>> Starting task
>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>> Launching health check process:
>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>> --executor=(1)@192.168.225.59:43917
>> --health_check_json={"command":{"shell":true,"value":"docker exec
>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>> sh -c \" exit 1
>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>> Health check process launched at pid: 3012
>
>
> STDERR:
>
> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave
>> 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>> WARNING: Your kernel does not support swap limit capabilities, memory
>> limited without swap.
>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain
>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>> envp=''): No such file or directory*** Aborted at 1444270649 (unix time)
>> try "date -d @1444270649" if you are using GNU date ***
>> PC: @ 0x7f4a37ec6cc9 (unknown)
>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from PID
>> 3012; stack trace: ***
>> @ 0x7f4a38265340 (unknown)
>> @ 0x7f4a37ec6cc9 (unknown)
>> @ 0x7f4a37eca0d8 (unknown)
>> @ 0x4191e2 _Abort()
>> @ 0x41921c _Abort()
>> @ 0x7f4a39dc2768 process::childMain()
>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>> @ 0x7f4a39dc24fc process::defaultClone()
>> @ 0x7f4a39dc34fb process::subprocess()
>> @ 0x43cc9c
>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>> @ 0x7f4a39d92827
>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>> @ 0x7f4a38a47e40 (unknown)
>> @ 0x7f4a3825d182 start_thread
>> @ 0x7f4a37f8a47d (unknown)
>
>
> Do any of you know which host the path "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
> should exist on? It definitely doesn't exist on the slave, hence
> execution failing.
>
> This is with current master, git hash
> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>
> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>> Author: Anand Mazumdar <ma...@gmail.com>
>> Date:   Tue Oct 6 17:37:41 2015 -0700
>
>
> -Jay
>
> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com> wrote:
>
>> Update:
>>
>> I used https://github.com/deric/mesos-deb-packaging to compile and
>> package the latest master (0.26.x) and deployed it to the cluster, and now
>> health checks are working as advertised in both Marathon and my own
>> framework!  Not sure what was going on with health-checks in 0.24.0..
>>
>> Anyways, thanks again for your help Haosdent!
>>
>> Cheers,
>> Jay
>>
>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com> wrote:
>>
>>> Hi Haosdent,
>>>
>>> Can you share your Marathon POST request that results in Mesos executing
>>> the health checks?
>>>
>>> Since we can reference the Marathon framework, I've been doing some
>>> digging around.
>>>
>>> Here are the details of my setup and findings:
>>>
>>> I put a few small hacks in Marathon:
>>>
>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>
>>> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X
>>> both in the TaskFactory and right before the task is sent to Mesos
>>> via driver.launchTasks:
>>>
>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>
>>> $ git diff
>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>
>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>        case (taskInfo, ports) =>
>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>> +        import java.io._
>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>> +        bw.write("\n")
>>>> +        bw.close()
>>>>          CreatedTask(
>>>>            taskInfo,
>>>>            MarathonTasks.makeTask(
>>>
>>>
>>>
>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>
>>> $ git diff
>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>> Seq[TaskInfo]): Boolean = {
>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>        import scala.collection.JavaConverters._
>>>> +      var i = 0
>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>> +        import java.io._
>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" +
>>>> taskInfos(i).getTaskId.getValue)
>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>> +        bw.write("\n")
>>>> +        bw.close()
>>>> +      }
>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>> taskInfos.asJava)
>>>>      }
>>>
>>>
>>> Then I built and deployed the hacked Marathon and restarted the marathon
>>> service.
>>>
>>> Next I created the app via the Marathon API ("hello app" is a container
>>> with a simple hello-world ruby app running on 0.0.0.0:8000)
>>>
>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type:
>>>> application/json' -d'
>>>> {
>>>>   "id": "/app-81-1-hello-app",
>>>>   "apps": [
>>>>     {
>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>       "container": {
>>>>         "type": "DOCKER",
>>>>         "docker": {
>>>>           "image":
>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>           "network": "BRIDGE",
>>>>           "portMappings": [
>>>>             {
>>>>               "containerPort": 8000,
>>>>               "hostPort": 0,
>>>>               "protocol": "tcp"
>>>>             }
>>>>           ]
>>>>         }
>>>>       },
>>>>       "env": {
>>>>
>>>>       },
>>>>       "healthChecks": [
>>>>         {
>>>>           "protocol": "COMMAND",
>>>>           "command": {"value": "exit 1"},
>>>>           "gracePeriodSeconds": 10,
>>>>           "intervalSeconds": 10,
>>>>           "timeoutSeconds": 10,
>>>>           "maxConsecutiveFailures": 3
>>>>         }
>>>>       ],
>>>>       "instances": 1,
>>>>       "cpus": 1,
>>>>       "mem": 512
>>>>     }
>>>>   ]
>>>> }
>>>
>>>
>>> $ ls /tmp/
>>>>
>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>
>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>
>>>
>>> Do they match?
>>>
>>> $ md5sum /tmp/task*
>>>> 1b5115997e78e2611654059249d99578
>>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>> 1b5115997e78e2611654059249d99578
>>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>
>>>
>>> Yes, so I am confident this is the information being sent across the
>>> wire to Mesos.
>>>
>>> Do they contain any health-check information?
>>>
>>> $ cat
>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>> {
>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>   "task_id":{
>>>>
>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>   },
>>>>   "slave_id":{
>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>   },
>>>>   "resources":[
>>>>     {
>>>>       "name":"cpus",
>>>>       "type":"SCALAR",
>>>>       "scalar":{
>>>>         "value":1.0
>>>>       },
>>>>       "role":"*"
>>>>     },
>>>>     {
>>>>       "name":"mem",
>>>>       "type":"SCALAR",
>>>>       "scalar":{
>>>>         "value":512.0
>>>>       },
>>>>       "role":"*"
>>>>     },
>>>>     {
>>>>       "name":"ports",
>>>>       "type":"RANGES",
>>>>       "ranges":{
>>>>         "range":[
>>>>           {
>>>>             "begin":31641,
>>>>             "end":31641
>>>>           }
>>>>         ]
>>>>       },
>>>>       "role":"*"
>>>>     }
>>>>   ],
>>>>   "command":{
>>>>     "environment":{
>>>>       "variables":[
>>>>         {
>>>>           "name":"PORT_8000",
>>>>           "value":"31641"
>>>>         },
>>>>         {
>>>>           "name":"MARATHON_APP_VERSION",
>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>         },
>>>>         {
>>>>           "name":"HOST",
>>>>           "value":"mesos-worker1a"
>>>>         },
>>>>         {
>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>
>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>         },
>>>>         {
>>>>           "name":"MESOS_TASK_ID",
>>>>
>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>         },
>>>>         {
>>>>           "name":"PORT",
>>>>           "value":"31641"
>>>>         },
>>>>         {
>>>>           "name":"PORTS",
>>>>           "value":"31641"
>>>>         },
>>>>         {
>>>>           "name":"MARATHON_APP_ID",
>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>         },
>>>>         {
>>>>           "name":"PORT0",
>>>>           "value":"31641"
>>>>         }
>>>>       ]
>>>>     },
>>>>     "shell":false
>>>>   },
>>>>   "container":{
>>>>     "type":"DOCKER",
>>>>     "docker":{
>>>>
>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>       "network":"BRIDGE",
>>>>       "port_mappings":[
>>>>         {
>>>>           "host_port":31641,
>>>>           "container_port":8000,
>>>>           "protocol":"tcp"
>>>>         }
>>>>       ],
>>>>       "privileged":false,
>>>>       "force_pull_image":false
>>>>     }
>>>>   }
>>>> }
>>>
>>>
>>> No, I don't see anything about any health check.
>>>
>>> Mesos STDOUT for the launched task:
>>>
>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>> --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>> --stop_timeout="0ns"
>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>> --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>> --stop_timeout="0ns"
>>>> Registered docker executor on mesos-worker1a
>>>> Starting task
>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>
>>>
>>> And STDERR:
>>>
>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on slave
>>>> 20150924-210922-1608624320-5050-1792-S1
>>>> WARNING: Your kernel does not support swap limit capabilities, memory
>>>> limited without swap.
>>>
>>>
>>> Again, nothing about any health checks.
>>>
>>> Any ideas of other things to try or what I could be missing?  Can't say
>>> either way about the Mesos health-check system working or not if Marathon
>>> won't put the health-check into the task it sends to Mesos.
>>>
>>> Thanks for all your help!
>>>
>>> Best,
>>> Jay
>>>
>>>
>>>
>>>>
>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com> wrote:
>>>
>>>> Maybe you could post your executor stdout/stderr so that we can tell
>>>> whether the health check is running or not.
>>>>
>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com> wrote:
>>>>
>>>>> Marathon also uses the Mesos health check. When I use a health check, I
>>>>> see log lines like this in the executor stdout.
>>>>>
>>>>> ```
>>>>> Registered docker executor on xxxxx
>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>> Launching health check process:
>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>> Health check process launched at pid: 9895
>>>>> Received task health update, healthy: true
>>>>> ```
>>>>>
>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <ou...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I am using my own framework, and the full task info I'm using is
>>>>>> posted earlier in this thread.  Do you happen to know if Marathon uses
>>>>>> Mesos's health checks for its health check system?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>
>>>>>> Yes, the health check is launched from its definition in the TaskInfo.
>>>>>> Do you launch your task through Marathon? I could test it on my side.
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <ou...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Precisely, and there are none of those statements.  Are you or
>>>>>>> others confident health-checks are part of the code path when defined via
>>>>>>> task info for docker container tasks?  Going through the code, I wasn't
>>>>>>> able to find the linkage for anything other than health-checks triggered
>>>>>>> through a custom executor.
>>>>>>>
>>>>>>> With that being said it is a pretty good sized code base and I'm not
>>>>>>> very familiar with it, so my analysis this far has by no means been
>>>>>>> exhaustive.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>
>>>>>>> When the health check launches, there will be a log line like this in
>>>>>>> your executor stdout:
>>>>>>> ```
>>>>>>> Health check process launched at pid xxx
>>>>>>> ```
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <ou...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I'm happy to try this, however wouldn't there be output in the logs
>>>>>>>> with the string "health" or "Health" if the health-check were active?  None
>>>>>>>> of my master or slave logs contain the string..
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an
>>>>>>>> unhealthy status shows up in your task stdout/stderr?
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>
>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <ha...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>
>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <ha...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this backport; let me
>>>>>>>>>>> double check.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <outtatime@gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look
>>>>>>>>>>>> there :)
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <ja...@jaytaylor.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it out?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>> tim@mesosphere.io> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We just added health check support for docker tasks that's in
>>>>>>>>>>>>>> master but not yet released. It will run docker exec with the command you
>>>>>>>>>>>>>> provided as health checks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?
>>>>>>>>>>>>>> Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have searched all machines and containers to see if they
>>>>>>>>>>>>>> ever run the command (in this case `sleep 5`), but have not found any
>>>>>>>>>>>>>> indication that it is being executed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from
>>>>>>>>>>>>>> src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this
>>>>>>>>>>>>>> mean that health-checks are only supported for custom executors and not for
>>>>>>>>>>>>>> docker tasks?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero
>>>>>>>>>>>>>> exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>> Jay


-- 
Best Regards,
Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
Maybe I spoke too soon.

Now the checks are attempting to run, but the STDERR output does not look
good.  I've added some debugging to the error message output to show the
path, argv, and envp variables:

STDOUT:

--container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
> --stop_timeout="0ns"
> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
> --stop_timeout="0ns"
> Registered docker executor on mesos-worker2a
> Starting task
> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
> Launching health check process:
> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
> --executor=(1)@192.168.225.59:43917
> --health_check_json={"command":{"shell":true,"value":"docker exec
> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
> sh -c \" exit 1
> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
> Health check process launched at pid: 3012
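
For reference, the health_check_json logged above carries a shell command plus
consecutive_failures, and the goal in this thread is for a zero or non-zero
exit status to translate to task health. A minimal sketch of that mapping in
Python follows. It is not the mesos-health-check implementation; the function
names are hypothetical, and it assumes the limit means that many failed checks
in a row.

```python
import subprocess
from typing import Iterable


def run_check(command: str, timeout_seconds: float = 10.0) -> bool:
    """Run `command` through `sh -c`; healthy means exit status 0 in time."""
    try:
        result = subprocess.run(["sh", "-c", command], timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return False  # a check that overruns its timeout counts as a failure
    return result.returncode == 0


def failure_limit_reached(results: Iterable[bool],
                          consecutive_failures: int = 3) -> bool:
    """Return True once `consecutive_failures` checks in a row have failed."""
    streak = 0
    for healthy in results:
        # A healthy result resets the streak; a failure extends it.
        streak = 0 if healthy else streak + 1
        if streak >= consecutive_failures:
            return True
    return False
```

With the "exit 1" command from the Marathon app definition, run_check("exit 1")
returns False on every run, so three runs in a row reach the limit of 3.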


STDERR:

I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave
> 16b49e90-6852-4c91-8e70-d89c54f25668-S1
> WARNING: Your kernel does not support swap limit capabilities, memory
> limited without swap.
> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain
> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
> envp=''): No such file or directory*** Aborted at 1444270649 (unix time)
> try "date -d @1444270649" if you are using GNU date ***
> PC: @ 0x7f4a37ec6cc9 (unknown)
> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from PID
> 3012; stack trace: ***
> @ 0x7f4a38265340 (unknown)
> @ 0x7f4a37ec6cc9 (unknown)
> @ 0x7f4a37eca0d8 (unknown)
> @ 0x4191e2 _Abort()
> @ 0x41921c _Abort()
> @ 0x7f4a39dc2768 process::childMain()
> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
> @ 0x7f4a39dc24fc process::defaultClone()
> @ 0x7f4a39dc34fb process::subprocess()
> @ 0x43cc9c
> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
> @ 0x7f4a39d924f4 process::ProcessManager::resume()
> @ 0x7f4a39d92827
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x7f4a38a47e40 (unknown)
> @ 0x7f4a3825d182 start_thread
> @ 0x7f4a37f8a47d (unknown)


Do any of you know which host the path
"/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
should exist on? It definitely doesn't exist on the slave, hence execution
failing.

This is with current master, git hash
5058fac1083dc91bca54d33c26c810c17ad95dd1.

commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
> Author: Anand Mazumdar <ma...@gmail.com>
> Date:   Tue Oct 6 17:37:41 2015 -0700


-Jay

On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <ou...@gmail.com> wrote:

> Update:
>
> I used https://github.com/deric/mesos-deb-packaging to compile and
> package the latest master (0.26.x) and deployed it to the cluster, and now
> health checks are working as advertised in both Marathon and my own
> framework!  Not sure what was going on with health-checks in 0.24.0..
>
> Anyways, thanks again for your help Haosdent!
>
> Cheers,
> Jay
>
> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com> wrote:
>
>> Hi Haosdent,
>>
>> Can you share your Marathon POST request that results in Mesos executing
>> the health checks?
>>
>> Since we can reference the Marathon framework, I've been doing some
>> digging around.
>>
>> Here are the details of my setup and findings:
>>
>> I put a few small hacks in Marathon:
>>
>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>
>> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X
>> both in the TaskFactory and right before the task is sent to Mesos via
>> driver.launchTasks:
>>
>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>
>> $ git diff
>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>
>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>> config).buildIfMatches(offer, runningTasks).map {
>>>        case (taskInfo, ports) =>
>>> +        import com.googlecode.protobuf.format.JsonFormat
>>> +        import java.io._
>>> +        val bw = new BufferedWriter(new FileWriter(new
>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>> +        bw.write("\n")
>>> +        bw.close()
>>>          CreatedTask(
>>>            taskInfo,
>>>            MarathonTasks.makeTask(
>>
>>
>>
>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>
>> $ git diff
>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>    override def launchTasks(offerID: OfferID, taskInfos: Seq[TaskInfo]):
>>> Boolean = {
>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>        import scala.collection.JavaConverters._
>>> +      var i = 0
>>> +      for (i <- 0 to taskInfos.length - 1) {
>>> +        import com.googlecode.protobuf.format.JsonFormat
>>> +        import java.io._
>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" +
>>> taskInfos(i).getTaskId.getValue)
>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>> +        bw.write("\n")
>>> +        bw.close()
>>> +      }
>>>        driver.launchTasks(Collections.singleton(offerID),
>>> taskInfos.asJava)
>>>      }
>>
>>
>> Then I built and deployed the hacked Marathon and restarted the marathon
>> service.
>>
>> Next I created the app via the Marathon API ("hello app" is a container
>> with a simple hello-world ruby app running on 0.0.0.0:8000)
>>
>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type:
>>> application/json' -d'
>>> {
>>>   "id": "/app-81-1-hello-app",
>>>   "apps": [
>>>     {
>>>       "id": "/app-81-1-hello-app/web-v11",
>>>       "container": {
>>>         "type": "DOCKER",
>>>         "docker": {
>>>           "image":
>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>           "network": "BRIDGE",
>>>           "portMappings": [
>>>             {
>>>               "containerPort": 8000,
>>>               "hostPort": 0,
>>>               "protocol": "tcp"
>>>             }
>>>           ]
>>>         }
>>>       },
>>>       "env": {
>>>
>>>       },
>>>       "healthChecks": [
>>>         {
>>>           "protocol": "COMMAND",
>>>           "command": {"value": "exit 1"},
>>>           "gracePeriodSeconds": 10,
>>>           "intervalSeconds": 10,
>>>           "timeoutSeconds": 10,
>>>           "maxConsecutiveFailures": 3
>>>         }
>>>       ],
>>>       "instances": 1,
>>>       "cpus": 1,
>>>       "mem": 512
>>>     }
>>>   ]
>>> }
>>
>>
>> $ ls /tmp/
>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>
>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>
>>
>> Do they match?
>>
>> $ md5sum /tmp/task*
>>> 1b5115997e78e2611654059249d99578
>>>  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>> 1b5115997e78e2611654059249d99578
>>>  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>
>>
>> Yes, so I am confident this is the information being sent across the wire
>> to Mesos.
>>
>> Do they contain any health-check information?
>>
>> $ cat
>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>> {
>>>   "name":"web-v11.app-81-1-hello-app",
>>>   "task_id":{
>>>
>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>   },
>>>   "slave_id":{
>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>   },
>>>   "resources":[
>>>     {
>>>       "name":"cpus",
>>>       "type":"SCALAR",
>>>       "scalar":{
>>>         "value":1.0
>>>       },
>>>       "role":"*"
>>>     },
>>>     {
>>>       "name":"mem",
>>>       "type":"SCALAR",
>>>       "scalar":{
>>>         "value":512.0
>>>       },
>>>       "role":"*"
>>>     },
>>>     {
>>>       "name":"ports",
>>>       "type":"RANGES",
>>>       "ranges":{
>>>         "range":[
>>>           {
>>>             "begin":31641,
>>>             "end":31641
>>>           }
>>>         ]
>>>       },
>>>       "role":"*"
>>>     }
>>>   ],
>>>   "command":{
>>>     "environment":{
>>>       "variables":[
>>>         {
>>>           "name":"PORT_8000",
>>>           "value":"31641"
>>>         },
>>>         {
>>>           "name":"MARATHON_APP_VERSION",
>>>           "value":"2015-10-07T19:35:08.386Z"
>>>         },
>>>         {
>>>           "name":"HOST",
>>>           "value":"mesos-worker1a"
>>>         },
>>>         {
>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>
>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>         },
>>>         {
>>>           "name":"MESOS_TASK_ID",
>>>
>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>         },
>>>         {
>>>           "name":"PORT",
>>>           "value":"31641"
>>>         },
>>>         {
>>>           "name":"PORTS",
>>>           "value":"31641"
>>>         },
>>>         {
>>>           "name":"MARATHON_APP_ID",
>>>           "value":"/app-81-1-hello-app/web-v11"
>>>         },
>>>         {
>>>           "name":"PORT0",
>>>           "value":"31641"
>>>         }
>>>       ]
>>>     },
>>>     "shell":false
>>>   },
>>>   "container":{
>>>     "type":"DOCKER",
>>>     "docker":{
>>>
>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>       "network":"BRIDGE",
>>>       "port_mappings":[
>>>         {
>>>           "host_port":31641,
>>>           "container_port":8000,
>>>           "protocol":"tcp"
>>>         }
>>>       ],
>>>       "privileged":false,
>>>       "force_pull_image":false
>>>     }
>>>   }
>>> }
>>
>>
>> No, I don't see anything about any health check.
>>
>> Mesos STDOUT for the launched task:
>>
>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>> --logbufsecs="0" --logging_level="INFO"
>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>> --stop_timeout="0ns"
>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>> --logbufsecs="0" --logging_level="INFO"
>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>> --stop_timeout="0ns"
>>> Registered docker executor on mesos-worker1a
>>> Starting task
>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>
>>
>> And STDERR:
>>
>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on slave
>>> 20150924-210922-1608624320-5050-1792-S1
>>> WARNING: Your kernel does not support swap limit capabilities, memory
>>> limited without swap.
>>
>>
>> Again, nothing about any health checks.
>>
>> Any ideas of other things to try or what I could be missing?  Can't say
>> either way about the Mesos health-check system working or not if Marathon
>> won't put the health-check into the task it sends to Mesos.
>>
>> Thanks for all your help!
>>
>> Best,
>> Jay
>>
>>
>>
>>>
>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com> wrote:
>>
>>> Maybe you could post your executor stdout/stderr so that we could know
>>> whether health check running not.
>>>
>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com> wrote:
>>>
>>>> marathon also use mesos health check. When I use health check, I could
>>>> saw the log like this in executor stdout.
>>>>
>>>> ```
>>>> Registered docker executor on xxxxx
>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>> Launching health check process:
>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>> Health check process launched at pid: 9895
>>>> Received task health update, healthy: true
>>>> ```
>>>>
>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <ou...@gmail.com>
>>>> wrote:
>>>>
>>>>> I am using my own framework, and the full task info I'm using is
>>>>> posted earlier in this thread.  Do you happen to know if Marathon uses
>>>>> Mesos's health checks for its health check system?
>>>>>
>>>>>
>>>>>
>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com> wrote:
>>>>>
>>>>> Yes, launch the health task through its definition in taskinfo. Do you
>>>>> launch your task through Marathon? I could test it in my side.
>>>>>
>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <ou...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Precisely, and there are none of those statements.  Are you or others
>>>>>> confident health-checks are part of the code path when defined via task
>>>>>> info for docker container tasks?  Going through the code, I wasn't able to
>>>>>> find the linkage for anything other than health-checks triggered through a
>>>>>> custom executor.
>>>>>>
>>>>>> With that being said it is a pretty good sized code base and I'm not
>>>>>> very familiar with it, so my analysis this far has by no means been
>>>>>> exhaustive.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>
>>>>>> When health check launch, it would have a log like this in your
>>>>>> executor stdout
>>>>>> ```
>>>>>> Health check process launched at pid xxx
>>>>>> ```
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <ou...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm happy to try this, however wouldn't there be output in the logs
>>>>>>> with the string "health" or "Health" if the health-check were active?  None
>>>>>>> of my master or slave logs contain the string..
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>
>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether could see
>>>>>>> unhealthy status in your task stdout/stderr.
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <ou...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> My current version is 0.24.1.
>>>>>>>>
>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <ha...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> yes, adam also help commit it to 0.23.1 and 0.24.1
>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>
>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>> Are you use one of this version?
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <ha...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I remember 0.23.1 and 0.24.1 contains this backport, let me
>>>>>>>>>> double check.
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look
>>>>>>>>>>> there :)
>>>>>>>>>>>
>>>>>>>>>>> Thanks again!
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <ja...@jaytaylor.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>
>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it out?
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <tim@mesosphere.io
>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>
>>>>>>>>>>>>> We just added health check support for docker tasks that's in
>>>>>>>>>>>>> master but not yet released. It will run docker exec with the command you
>>>>>>>>>>>>> provided as health checks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?
>>>>>>>>>>>>> Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>
>>>>>>>>>>>>> {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> I have searched all machines and containers to see if they
>>>>>>>>>>>>> ever run the command (in this case `sleep 5`), but have not found any
>>>>>>>>>>>>> indication that it is being executed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from
>>>>>>>>>>>>> src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this
>>>>>>>>>>>>> mean that health-checks are only supported for custom executors and not for
>>>>>>>>>>>>> docker tasks?
>>>>>>>>>>>>>
>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero
>>>>>>>>>>>>> exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best Regards,
>>>>>>>>>> Haosdent Huang
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Regards,
>>>>>>>>> Haosdent Huang
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Haosdent Huang
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Haosdent Huang
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Haosdent Huang
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Haosdent Huang
>>>
>>
>>
>

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
Update:

I used https://github.com/deric/mesos-deb-packaging to compile and package
the latest master (0.26.x) and deployed it to the cluster, and now health
checks are working as advertised in both Marathon and my own framework!
Not sure what was going on with health-checks in 0.24.0.

Anyways, thanks again for your help Haosdent!

Cheers,
Jay

On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <ou...@gmail.com> wrote:

> Hi Haosdent,
>
> Can you share your Marathon POST request that results in Mesos executing
> the health checks?
>
> Since we can reference the Marathon framework, I've been doing some
> digging around.
>
> Here are the details of my setup and findings:
>
> I put a few small hacks in Marathon:
>
> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>
> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X both
> in the TaskFactory and right before the task is sent to Mesos via
> driver.launchTasks:
>
> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>
> $ git diff src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>
>>      new TaskBuilder(app, taskIdUtil.newTaskId, config).buildIfMatches(offer, runningTasks).map {
>>        case (taskInfo, ports) =>
>> +        import com.googlecode.protobuf.format.JsonFormat
>> +        import java.io._
>> +        val bw = new BufferedWriter(new FileWriter(new File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>> +        bw.write(JsonFormat.printToString(taskInfo))
>> +        bw.write("\n")
>> +        bw.close()
>>          CreatedTask(
>>            taskInfo,
>>            MarathonTasks.makeTask(
>
>
>
> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>
> $ git diff src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>    override def launchTasks(offerID: OfferID, taskInfos: Seq[TaskInfo]): Boolean = {
>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>        import scala.collection.JavaConverters._
>> +      var i = 0
>> +      for (i <- 0 to taskInfos.length - 1) {
>> +        import com.googlecode.protobuf.format.JsonFormat
>> +        import java.io._
>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" + taskInfos(i).getTaskId.getValue)
>> +        val bw = new BufferedWriter(new FileWriter(file))
>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>> +        bw.write("\n")
>> +        bw.close()
>> +      }
>>        driver.launchTasks(Collections.singleton(offerID), taskInfos.asJava)
>>      }
>
>
> Then I built and deployed the hacked Marathon and restarted the marathon
> service.
>
> Next I created the app via the Marathon API ("hello app" is a container
> with a simple hello-world Ruby app running on 0.0.0.0:8000):
>
> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type: application/json' -d'
>> {
>>   "id": "/app-81-1-hello-app",
>>   "apps": [
>>     {
>>       "id": "/app-81-1-hello-app/web-v11",
>>       "container": {
>>         "type": "DOCKER",
>>         "docker": {
>>           "image": "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>           "network": "BRIDGE",
>>           "portMappings": [
>>             {
>>               "containerPort": 8000,
>>               "hostPort": 0,
>>               "protocol": "tcp"
>>             }
>>           ]
>>         }
>>       },
>>       "env": {},
>>       "healthChecks": [
>>         {
>>           "protocol": "COMMAND",
>>           "command": {"value": "exit 1"},
>>           "gracePeriodSeconds": 10,
>>           "intervalSeconds": 10,
>>           "timeoutSeconds": 10,
>>           "maxConsecutiveFailures": 3
>>         }
>>       ],
>>       "instances": 1,
>>       "cpus": 1,
>>       "mem": 512
>>     }
>>   ]
>> }'
>
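As a rough illustration of what a COMMAND health check like the one in the request above amounts to: the command is run repeatedly, exit status 0 counts as healthy, and a run of maxConsecutiveFailures non-zero exits marks the task unhealthy. The sketch below is not Mesos or Marathon code. The function name `run_health_checks` and its parameters are made up for this example, and the grace period, interval, and timeout handling are left out.

```python
import subprocess


def run_health_checks(command, max_consecutive_failures, attempts):
    """Run `command` through the shell up to `attempts` times.

    Returns one boolean per run (True means exit status 0). Stops early
    once `max_consecutive_failures` failures have occurred in a row.
    """
    results = []
    failures = 0
    for _ in range(attempts):
        # Hypothetical simplification: no delay, interval, or timeout.
        healthy = subprocess.run(command, shell=True).returncode == 0
        results.append(healthy)
        failures = 0 if healthy else failures + 1
        if failures >= max_consecutive_failures:
            break
    return results
```

With "exit 1" as the command, as in the request above, every run fails, so the loop gives up after three attempts when max_consecutive_failures is 3.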
>
> $ ls /tmp/
>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>
>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>
>
> Do they match?
>
> $ md5sum /tmp/task*
>> 1b5115997e78e2611654059249d99578  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>> 1b5115997e78e2611654059249d99578  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>
>
> Yes, so I am confident this is the information being sent across the wire
> to Mesos.
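The same comparison can be scripted. This is a minimal sketch, assuming the two dump files are readable at whatever paths the hack wrote them to; the helper names `md5_of` and `same_content` are invented for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of the file's bytes."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def same_content(path_a, path_b):
    """True when both files hash to the same MD5 digest."""
    return md5_of(path_a) == md5_of(path_b)
```

Matching digests only show that the two dumps are byte-identical; they say nothing about what Mesos does with the TaskInfo afterwards.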
>
> Do they contain any health-check information?
>
> $ cat /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>> {
>>   "name":"web-v11.app-81-1-hello-app",
>>   "task_id":{
>>     "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>   },
>>   "slave_id":{
>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>   },
>>   "resources":[
>>     {
>>       "name":"cpus",
>>       "type":"SCALAR",
>>       "scalar":{
>>         "value":1.0
>>       },
>>       "role":"*"
>>     },
>>     {
>>       "name":"mem",
>>       "type":"SCALAR",
>>       "scalar":{
>>         "value":512.0
>>       },
>>       "role":"*"
>>     },
>>     {
>>       "name":"ports",
>>       "type":"RANGES",
>>       "ranges":{
>>         "range":[
>>           {
>>             "begin":31641,
>>             "end":31641
>>           }
>>         ]
>>       },
>>       "role":"*"
>>     }
>>   ],
>>   "command":{
>>     "environment":{
>>       "variables":[
>>         {
>>           "name":"PORT_8000",
>>           "value":"31641"
>>         },
>>         {
>>           "name":"MARATHON_APP_VERSION",
>>           "value":"2015-10-07T19:35:08.386Z"
>>         },
>>         {
>>           "name":"HOST",
>>           "value":"mesos-worker1a"
>>         },
>>         {
>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>           "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>         },
>>         {
>>           "name":"MESOS_TASK_ID",
>>           "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>         },
>>         {
>>           "name":"PORT",
>>           "value":"31641"
>>         },
>>         {
>>           "name":"PORTS",
>>           "value":"31641"
>>         },
>>         {
>>           "name":"MARATHON_APP_ID",
>>           "value":"/app-81-1-hello-app/web-v11"
>>         },
>>         {
>>           "name":"PORT0",
>>           "value":"31641"
>>         }
>>       ]
>>     },
>>     "shell":false
>>   },
>>   "container":{
>>     "type":"DOCKER",
>>     "docker":{
>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>       "network":"BRIDGE",
>>       "port_mappings":[
>>         {
>>           "host_port":31641,
>>           "container_port":8000,
>>           "protocol":"tcp"
>>         }
>>       ],
>>       "privileged":false,
>>       "force_pull_image":false
>>     }
>>   }
>> }
>
>
> No, I don't see anything about any health check.
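Rather than reading the dump by eye, the check can be automated. A small sketch, assuming the dump is the JSON text produced by the hack above and that a health check, when present, appears under the top-level "health_check" key as in the TaskInfo posted at the start of this thread; the function name `has_health_check` is hypothetical.

```python
import json


def has_health_check(task_info_json):
    """Return True if the TaskInfo JSON text has a top-level "health_check" object."""
    task_info = json.loads(task_info_json)
    return isinstance(task_info.get("health_check"), dict)
```

For the dump shown above this would return False, since the object only has name, task_id, slave_id, resources, command, and container.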
>
> Mesos STDOUT for the launched task:
>
> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>> --docker="docker" --help="false" --initialize_driver_logging="true"
>> --logbufsecs="0" --logging_level="INFO"
>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>> --stop_timeout="0ns"
>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>> --docker="docker" --help="false" --initialize_driver_logging="true"
>> --logbufsecs="0" --logging_level="INFO"
>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>> --stop_timeout="0ns"
>> Registered docker executor on mesos-worker1a
>> Starting task app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>
>
> And STDERR:
>
> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on slave 20150924-210922-1608624320-5050-1792-S1
>> WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
>
>
> Again, nothing about any health checks.
>
> Any ideas about other things to try, or what I could be missing?  I can't say
> either way whether the Mesos health-check system is working if Marathon
> won't put the health-check into the task it sends to Mesos.
>
> Thanks for all your help!
>
> Best,
> Jay
>
> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <ha...@gmail.com> wrote:
>
>> Maybe you could post your executor stdout/stderr so that we can tell
>> whether the health check is running or not.
>>
>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <ha...@gmail.com> wrote:
>>
>>> Marathon also uses the Mesos health check. When I use a health check, I
>>> see log output like this in the executor stdout.
>>>
>>> ```
>>> Registered docker executor on xxxxx
>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>> Launching health check process:
>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>> Health check process launched at pid: 9895
>>> Received task health update, healthy: true
>>> ```
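To look for lines like these in an executor's stdout, a simple case-insensitive search for "health" is enough. A minimal sketch, assuming the stdout has already been read into a string; the function name `health_check_lines` is made up for this example.

```python
def health_check_lines(stdout_text):
    """Return the lines of executor stdout that mention "health" in any case."""
    return [line for line in stdout_text.splitlines() if "health" in line.lower()]
```

An empty result for a task that was launched with a health check suggests the health check process was never started.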
>>>
>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <ou...@gmail.com> wrote:
>>>
>>>> I am using my own framework, and the full task info I'm using is posted
>>>> earlier in this thread.  Do you happen to know if Marathon uses Mesos's
>>>> health checks for its health check system?
>>>>
>>>>
>>>>
>>>> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com> wrote:
>>>>
>>>> Yes, the health check is launched from its definition in the TaskInfo. Do you
>>>> launch your task through Marathon? I could test it on my side.
>>>>
>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <ou...@gmail.com>
>>>> wrote:
>>>>
>>>>> Precisely, and there are none of those statements.  Are you or others
>>>>> confident health-checks are part of the code path when defined via task
>>>>> info for docker container tasks?  Going through the code, I wasn't able to
>>>>> find the linkage for anything other than health-checks triggered through a
>>>>> custom executor.
>>>>>
>>>>> With that said, it is a fairly large code base and I'm not
>>>>> very familiar with it, so my analysis thus far has by no means been
>>>>> exhaustive.
>>>>>
>>>>>
>>>>>
>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com> wrote:
>>>>>
>>>>> When the health check launches, you will see a log line like this in your
>>>>> executor stdout:
>>>>> ```
>>>>> Health check process launched at pid xxx
>>>>> ```
>>>>>
>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <ou...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I'm happy to try this; however, wouldn't there be output in the logs
>>>>>> with the string "health" or "Health" if the health-check were active?  None
>>>>>> of my master or slave logs contain the string.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>
>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an
>>>>>> unhealthy status shows up in your task stdout/stderr?
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <ou...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> My current version is 0.24.1.
>>>>>>>
>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <ha...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>
>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>> Are you using one of these versions?
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <ha...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> As I recall, 0.23.1 and 0.24.1 contain this backport; let me double
>>>>>>>>> check.
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <ou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look there
>>>>>>>>>> :)
>>>>>>>>>>
>>>>>>>>>> Thanks again!
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <ja...@jaytaylor.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>
>>>>>>>>>>> Do you know if there is a branch I can check out to test it out?
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <ti...@mesosphere.io>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>
>>>>>>>>>>>> We just added health check support for docker tasks that's in
>>>>>>>>>>>> master but not yet released. It will run docker exec with the command you
>>>>>>>>>>>> provided as health checks.
>>>>>>>>>>>>
>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>
>>>>>>>>>>>> Tim
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <ou...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?  Mesos
>>>>>>>>>>>> seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>
>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>
>>>>>>>>>>>> {
>>>>>>>>>>>>>
>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>
>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>
>>>>>>>>>>>>>   },
>>>>>>>>>>>>>
>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>
>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>
>>>>>>>>>>>>>   },
>>>>>>>>>>>>>
>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>
>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>
>>>>>>>>>>>>>     },
>>>>>>>>>>>>>
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>
>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>
>>>>>>>>>>>>>     },
>>>>>>>>>>>>>
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>
>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>
>>>>>>>>>>>>>           {
>>>>>>>>>>>>>
>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>
>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>
>>>>>>>>>>>>>           }
>>>>>>>>>>>>>
>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>
>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>
>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>
>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>
>>>>>>>>>>>>>     },
>>>>>>>>>>>>>
>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>
>>>>>>>>>>>>>   },
>>>>>>>>>>>>>
>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>
>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>
>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>
>>>>>>>>>>>>>         {
>>>>>>>>>>>>>
>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>
>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>
>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>
>>>>>>>>>>>>>         }
>>>>>>>>>>>>>
>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>
>>>>>>>>>>>>>   },
>>>>>>>>>>>>>
>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>
>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>
>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>
>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>
>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>
>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>
>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>
>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> I have searched all machines and containers to see if they ever
>>>>>>>>>>>> run the command (in this case `sleep 5`), but have not found any indication
>>>>>>>>>>>> that it is being executed.
>>>>>>>>>>>>
>>>>>>>>>>>> In the mesos src code the health-checks are invoked from
>>>>>>>>>>>> src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this
>>>>>>>>>>>> mean that health-checks are only supported for custom executors and not for
>>>>>>>>>>>> docker tasks?
>>>>>>>>>>>>
>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero
>>>>>>>>>>>> exit-status of a health-check command translate to task health.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Jay
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Regards,
>>>>>>>>> Haosdent Huang
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Haosdent Huang
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Haosdent Huang
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Haosdent Huang
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Haosdent Huang
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
Hi Haosdent,

Can you share your Marathon POST request that results in Mesos executing
the health checks?

Since we can reference the Marathon framework, I've been doing some digging
around.

Here are the details of my setup and findings:

I put a few small hacks in Marathon:

(1) Added com.googlecode.protobuf.format to Marathon's dependencies

(2) Edited the following files so that TaskInfo is dumped as JSON to /tmp/X,
both in the TaskFactory and right before the task is sent to Mesos via
driver.launchTasks:

src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:

$ git diff src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>
>      new TaskBuilder(app, taskIdUtil.newTaskId, config).buildIfMatches(offer, runningTasks).map {
>        case (taskInfo, ports) =>
> +        import com.googlecode.protobuf.format.JsonFormat
> +        import java.io._
> +        val bw = new BufferedWriter(new FileWriter(new File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
> +        bw.write(JsonFormat.printToString(taskInfo))
> +        bw.write("\n")
> +        bw.close()
>          CreatedTask(
>            taskInfo,
>            MarathonTasks.makeTask(


src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:

$ git diff src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>    override def launchTasks(offerID: OfferID, taskInfos: Seq[TaskInfo]): Boolean = {
>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>        import scala.collection.JavaConverters._
> +      var i = 0
> +      for (i <- 0 to taskInfos.length - 1) {
> +        import com.googlecode.protobuf.format.JsonFormat
> +        import java.io._
> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" + taskInfos(i).getTaskId.getValue)
> +        val bw = new BufferedWriter(new FileWriter(file))
> +        bw.write(JsonFormat.printToString(taskInfos(i)))
> +        bw.write("\n")
> +        bw.close()
> +      }
>        driver.launchTasks(Collections.singleton(offerID), taskInfos.asJava)
>      }


Then I built and deployed the hacked Marathon and restarted the marathon
service.

Next I created the app via the Marathon API ("hello app" is a container
with a simple hello-world Ruby app running on 0.0.0.0:8000):

curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type: application/json' -d'
> {
>   "id": "/app-81-1-hello-app",
>   "apps": [
>     {
>       "id": "/app-81-1-hello-app/web-v11",
>       "container": {
>         "type": "DOCKER",
>         "docker": {
>           "image": "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>           "network": "BRIDGE",
>           "portMappings": [
>             {
>               "containerPort": 8000,
>               "hostPort": 0,
>               "protocol": "tcp"
>             }
>           ]
>         }
>       },
>       "env": {
>
>       },
>       "healthChecks": [
>         {
>           "protocol": "COMMAND",
>           "command": {"value": "exit 1"},
>           "gracePeriodSeconds": 10,
>           "intervalSeconds": 10,
>           "timeoutSeconds": 10,
>           "maxConsecutiveFailures": 3
>         }
>       ],
>       "instances": 1,
>       "cpus": 1,
>       "mem": 512
>     }
>   ]
> }'
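
For reference, the semantics I expect from a COMMAND health check like the one above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how Mesos or Marathon implement it: `run_check` and `exceeds_failure_limit` are hypothetical helper names, a check counts as healthy when the command exits with status 0 within the timeout, and a task counts as failed once `maxConsecutiveFailures` checks fail in a row.

```python
import subprocess


def run_check(command, timeout_seconds):
    """Run a shell command; healthy means exit status 0 within the timeout.

    Hypothetical helper for illustration, not a Mesos or Marathon API.
    """
    try:
        result = subprocess.run(command, shell=True, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # A check that does not finish in time is treated as a failure.
        return False
    return result.returncode == 0


def exceeds_failure_limit(results, max_consecutive_failures):
    """Return True once the checks fail max_consecutive_failures times in a row."""
    streak = 0
    for healthy in results:
        # Any healthy result resets the failure streak (assumed semantics).
        streak = 0 if healthy else streak + 1
        if streak >= max_consecutive_failures:
            return True
    return False


if __name__ == "__main__":
    # "exit 1" is the command from the app definition above; it always fails.
    results = [run_check("exit 1", timeout_seconds=10) for _ in range(3)]
    print(results, exceeds_failure_limit(results, max_consecutive_failures=3))
```

With "exit 1" as the command, every check fails, so a task configured this way should be reported unhealthy as soon as health checks actually run.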


$ ls /tmp/
> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0


Do they match?

$ md5sum /tmp/task*
> 1b5115997e78e2611654059249d99578  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
> 1b5115997e78e2611654059249d99578  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0


Yes, so I am confident this is the information being sent across the wire
to Mesos.

Do they contain any health-check information?

$ cat /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
> {
>   "name":"web-v11.app-81-1-hello-app",
>   "task_id":{
>     "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>   },
>   "slave_id":{
>     "value":"20150924-210922-1608624320-5050-1792-S1"
>   },
>   "resources":[
>     {
>       "name":"cpus",
>       "type":"SCALAR",
>       "scalar":{
>         "value":1.0
>       },
>       "role":"*"
>     },
>     {
>       "name":"mem",
>       "type":"SCALAR",
>       "scalar":{
>         "value":512.0
>       },
>       "role":"*"
>     },
>     {
>       "name":"ports",
>       "type":"RANGES",
>       "ranges":{
>         "range":[
>           {
>             "begin":31641,
>             "end":31641
>           }
>         ]
>       },
>       "role":"*"
>     }
>   ],
>   "command":{
>     "environment":{
>       "variables":[
>         {
>           "name":"PORT_8000",
>           "value":"31641"
>         },
>         {
>           "name":"MARATHON_APP_VERSION",
>           "value":"2015-10-07T19:35:08.386Z"
>         },
>         {
>           "name":"HOST",
>           "value":"mesos-worker1a"
>         },
>         {
>           "name":"MARATHON_APP_DOCKER_IMAGE",
>           "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>         },
>         {
>           "name":"MESOS_TASK_ID",
>           "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>         },
>         {
>           "name":"PORT",
>           "value":"31641"
>         },
>         {
>           "name":"PORTS",
>           "value":"31641"
>         },
>         {
>           "name":"MARATHON_APP_ID",
>           "value":"/app-81-1-hello-app/web-v11"
>         },
>         {
>           "name":"PORT0",
>           "value":"31641"
>         }
>       ]
>     },
>     "shell":false
>   },
>   "container":{
>     "type":"DOCKER",
>     "docker":{
>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>       "network":"BRIDGE",
>       "port_mappings":[
>         {
>           "host_port":31641,
>           "container_port":8000,
>           "protocol":"tcp"
>         }
>       ],
>       "privileged":false,
>       "force_pull_image":false
>     }
>   }
> }


No, I don't see anything about any health check.
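
Eyeballing a long dump is error-prone, so the same check can be scripted. The sketch below assumes the dump is the protobuf-to-JSON text shown above and that a health check, when present, appears under the top-level `health_check` key (as in the TaskInfo from my first message); `find_health_check` and `describe` are hypothetical helper names.

```python
import json


def find_health_check(task_info_json):
    """Return the health_check object from a TaskInfo JSON dump, or None."""
    task_info = json.loads(task_info_json)
    return task_info.get("health_check")


def describe(task_info_json):
    """Summarize whether a dumped TaskInfo carries a health check."""
    check = find_health_check(task_info_json)
    if check is None:
        return "no health_check field"
    # The command value is optional here; a missing one is reported as None.
    command = check.get("command", {}).get("value")
    return f"health_check present, command={command!r}"


if __name__ == "__main__":
    import sys

    # Usage: python check_dump.py /tmp/taskjson1-<task id>
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, "->", describe(handle.read()))
```

Run against the two files in /tmp, this should report the missing field for both, matching what the dump above shows.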

Mesos STDOUT for the launched task:

--container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
> --docker="docker" --help="false" --initialize_driver_logging="true"
> --logbufsecs="0" --logging_level="INFO"
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
> --stop_timeout="0ns"
> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
> --docker="docker" --help="false" --initialize_driver_logging="true"
> --logbufsecs="0" --logging_level="INFO"
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
> --stop_timeout="0ns"
> Registered docker executor on mesos-worker1a
> Starting task app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0


And STDERR:

I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on slave 20150924-210922-1608624320-5050-1792-S1
> WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.


Again, nothing about any health checks.

Any ideas about other things to try, or what I could be missing?  I can't
tell whether the Mesos health-check system works as long as Marathon doesn't
put the health check into the task it sends to Mesos.

Thanks for all your help!

Best,
Jay




Re: Can health-checks be run by Mesos for docker tasks?

Posted by haosdent <ha...@gmail.com>.
Maybe you could post your executor stdout/stderr so that we can tell
whether the health check is running or not.




-- 
Best Regards,
Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by haosdent <ha...@gmail.com>.
Marathon also uses the Mesos health check. When I use a health check, I see
log lines like this in the executor stdout.

```
Registered docker executor on xxxxx
Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
Launching health check process:
/home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
Health check process launched at pid: 9895
Received task health update, healthy: true
```
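
To check an executor stdout for these lines without reading it by hand, something like the following Python sketch could be used. It only matches the two messages quoted above ("Health check process launched at pid" and "Received task health update, healthy:"); the exact wording may differ between Mesos versions, and `summarize_health_log` is a hypothetical helper name.

```python
import re

# Patterns taken from the log lines quoted in this thread; the colon after
# "pid" is optional because both spellings appear in the examples.
LAUNCHED = re.compile(r"Health check process launched at pid:?\s*(\S+)")
UPDATE = re.compile(r"Received task health update, healthy: (true|false)")


def summarize_health_log(stdout_text):
    """Report which health-check messages appear in executor stdout text."""
    pids = LAUNCHED.findall(stdout_text)
    updates = [value == "true" for value in UPDATE.findall(stdout_text)]
    return {"launched": bool(pids), "pids": pids, "updates": updates}


if __name__ == "__main__":
    import sys

    # Usage: python scan_stdout.py /path/to/sandbox/stdout
    with open(sys.argv[1]) as handle:
        print(summarize_health_log(handle.read()))
```

If `launched` comes back false, the executor never started a health check process for that task.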

On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <ou...@gmail.com> wrote:

> I am using my own framework, and the full task info I'm using is posted
> earlier in this thread.  Do you happen to know if Marathon uses Mesos's
> health checks for its health check system?


-- 
Best Regards,
Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
I am using my own framework, and the full task info I'm using is posted earlier in this thread.  Do you happen to know if Marathon uses Mesos's health checks for its health check system?



> On Oct 6, 2015, at 9:01 PM, haosdent <ha...@gmail.com> wrote:
> 
> Yes, the health check is launched from its definition in the TaskInfo. Do you launch your task through Marathon? I could test it on my side.
> 

Re: Can health-checks be run by Mesos for docker tasks?

Posted by haosdent <ha...@gmail.com>.
Yes, the health check is launched from its definition in the TaskInfo. Do you
launch your task through Marathon? I could test it on my side.
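
As a rough local illustration of the behaviour being asked about (a command's
zero or non-zero exit status mapped to healthy or unhealthy), here is a minimal
Python sketch. It is not Mesos code. The function names are hypothetical, and
it assumes that consecutive_failures means "this many failed checks in a row
before the task is treated as failed".

```python
import subprocess


def is_healthy(command: str, timeout_seconds: float = 10.0) -> bool:
    """Run a shell command; exit status 0 counts as healthy (assumption)."""
    try:
        result = subprocess.run(
            ["sh", "-c", command],
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # A check that does not finish in time is treated as a failure here.
        return False
    return result.returncode == 0


def exceeds_failure_limit(results, consecutive_failures: int = 3) -> bool:
    """Return True once `consecutive_failures` failed checks occur in a row."""
    streak = 0
    for healthy in results:
        streak = 0 if healthy else streak + 1
        if streak >= consecutive_failures:
            return True
    return False
```

With this sketch, is_healthy("exit 1") reports unhealthy and is_healthy("exit 0")
reports healthy, which is the mapping the original question describes.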

On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <ou...@gmail.com> wrote:

> Precisely, and there are none of those statements.  Are you or others
> confident health-checks are part of the code path when defined via task
> info for docker container tasks?  Going through the code, I wasn't able to
> find the linkage for anything other than health-checks triggered through a
> custom executor.
>
> That said, it is a fairly large code base and I'm not very familiar with
> it, so my analysis thus far has by no means been exhaustive.


-- 
Best Regards,
Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
Precisely, and there are none of those statements.  Are you or others confident health-checks are part of the code path when defined via task info for docker container tasks?  Going through the code, I wasn't able to find the linkage for anything other than health-checks triggered through a custom executor.

That said, it is a fairly large code base and I'm not very familiar with it, so my analysis thus far has by no means been exhaustive.



> On Oct 6, 2015, at 8:41 PM, haosdent <ha...@gmail.com> wrote:
> 
> When the health check launches, it writes a log line like this to your executor's stdout:
> ```
> Health check process launched at pid xxx
> ```
> 

Re: Can health-checks be run by Mesos for docker tasks?

Posted by haosdent <ha...@gmail.com>.
When the health check launches, it writes a log line like this to your
executor's stdout:
```
Health check process launched at pid xxx
```
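
To check for that line programmatically, a small sketch along these lines
could be used. It assumes the message format is exactly the one shown above
with a numeric pid, and the function name is hypothetical.

```python
import re

# Pattern for the launch message quoted above; the numeric pid is an assumption.
LAUNCH_PATTERN = re.compile(r"Health check process launched at pid (\d+)")


def health_check_pids(stdout_text: str) -> list:
    """Return every pid reported by a health check launch line."""
    return [int(match.group(1)) for match in LAUNCH_PATTERN.finditer(stdout_text)]
```

An empty list for the executor's stdout would suggest that no health check
process was ever started for the task.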

On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <ou...@gmail.com> wrote:

> I'm happy to try this; however, wouldn't there be output in the logs with
> the string "health" or "Health" if the health-check were active?  None of
> my master or slave logs contain the string.


-- 
Best Regards,
Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
I'm happy to try this; however, wouldn't there be output in the logs with the string "health" or "Health" if the health-check were active?  None of my master or slave logs contain the string.
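
For reference, the kind of search described here can be sketched in Python as
below. The directory to scan is supplied by the caller, since log and sandbox
locations vary by installation; the function name is hypothetical.

```python
import os


def files_mentioning(root_dir: str, needle: str = "health") -> list:
    """List files under root_dir containing needle, ignoring case."""
    needle = needle.lower()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Replace undecodable bytes so binary-ish logs do not abort the scan.
                with open(path, "r", errors="replace") as handle:
                    if any(needle in line.lower() for line in handle):
                        matches.append(path)
            except OSError:
                # Unreadable files are skipped rather than treated as errors.
                continue
    return sorted(matches)
```

Running it over the master and slave log directories covers the search above;
running it over the task's sandbox would also cover the executor's stdout.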



> On Oct 6, 2015, at 7:45 PM, haosdent <ha...@gmail.com> wrote:
> 
> Could you use "exit 1" instead of "sleep 5" to see whether an unhealthy status shows up in your task's stdout/stderr?
> 

Re: Can health-checks be run by Mesos for docker tasks?

Posted by haosdent <ha...@gmail.com>.
Could you use "exit 1" instead of "sleep 5" to see whether an unhealthy
status shows up in your task's stdout/stderr?
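
A minimal sketch of that change, written in Python purely for illustration. The field names and values are copied from the TaskInfo JSON quoted in this thread; the helper name with_command is made up for the example. The point of the swap is that "sleep 5" always exits with status 0 and so can never report a failure, whereas "exit 1" always does.

```python
import json

# health_check block copied from the TaskInfo quoted in this thread.
health_check = {
    "delay_seconds": 5,
    "interval_seconds": 10,
    "timeout_seconds": 10,
    "consecutive_failures": 3,
    "grace_period_seconds": 0,
    "command": {"shell": True, "value": "sleep 5", "user": "root"},
}


def with_command(check, value):
    """Return a copy of the health check with a different command string."""
    updated = dict(check)
    updated["command"] = dict(check["command"], value=value)
    return updated


# Same settings as before, but the command now always exits non-zero.
failing_check = with_command(health_check, "exit 1")
print(json.dumps(failing_check, indent=2))
```

The original dictionary is left untouched, so both variants can be submitted and compared.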

On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <ou...@gmail.com> wrote:

> My current version is 0.24.1.
>


-- 
Best Regards,
Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
My current version is 0.24.1.

On Tue, Oct 6, 2015 at 7:30 PM, haosdent <ha...@gmail.com> wrote:

> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>
> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
> Are you using one of these versions?
>

Re: Can health-checks be run by Mesos for docker tasks?

Posted by haosdent <ha...@gmail.com>.
Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0

https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
Are you using one of these versions?

On Wed, Oct 7, 2015 at 10:26 AM, haosdent <ha...@gmail.com> wrote:

> I remember that 0.23.1 and 0.24.1 contain this backport; let me double-check.
>



-- 
Best Regards,
Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by haosdent <ha...@gmail.com>.
I remember that 0.23.1 and 0.24.1 contain this backport; let me double-check.

On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <ou...@gmail.com> wrote:

> Oops- Now I see you already said it's in master.  I'll look there :)
>
> Thanks again!
>


-- 
Best Regards,
Haosdent Huang

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ou...@gmail.com>.
Oops- Now I see you already said it's in master.  I'll look there :)

Thanks again!

On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <ja...@jaytaylor.com> wrote:

> Great, thanks for the quick reply Tim!
>
> Do you know if there is a branch I can check out to test it?
>

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Jay Taylor <ja...@jaytaylor.com>.
Great, thanks for the quick reply Tim!

Do you know if there is a branch I can check out to test it?

On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <ti...@mesosphere.io> wrote:

> Hi Jay,
>
> We just added health check support for Docker tasks; it is in master but
> not yet released. It will run docker exec with the command you provided as
> the health check.
>
> It should be in the next release.
>
> Thanks!
>
> Tim

Re: Can health-checks be run by Mesos for docker tasks?

Posted by Timothy Chen <ti...@mesosphere.io>.
Hi Jay, 

We just added health check support for Docker tasks; it is in master but not yet released. It will run docker exec with the command you provided as the health check.
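
As a rough illustration of the behaviour described above (a sketch in Python, not the Mesos implementation): each check runs the command, and an exit status of 0 counts as a pass. The function names are hypothetical, the command runs in a local shell rather than through docker exec, and two points are assumptions of this sketch: that a timed-out check counts as a failure, and that consecutive_failures means the number of failed checks in a row after which the task is given up on.

```python
import subprocess


def run_check(command, timeout_seconds):
    """Run one health-check command through the shell; True means exit status 0."""
    try:
        result = subprocess.run(command, shell=True, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Assumption of this sketch: a check that times out counts as failed.
        return False
    return result.returncode == 0


def exceeded_failure_limit(check_results, consecutive_failures):
    """Return True once `consecutive_failures` failed checks occur in a row."""
    streak = 0
    for passed in check_results:
        # A passing check resets the streak; a failing one extends it.
        streak = 0 if passed else streak + 1
        if streak >= consecutive_failures:
            return True
    return False


# Three checks that always fail, evaluated against a limit of 3.
results = [run_check("exit 1", timeout_seconds=10) for _ in range(3)]
print(exceeded_failure_limit(results, consecutive_failures=3))
```

With the values from the TaskInfo in this thread (timeout_seconds 10, consecutive_failures 3), a command such as "sleep 5" passes every time, so the limit is never reached.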

It should be in the next release.

Thanks!

Tim


> On Oct 6, 2015, at 6:49 PM, Jay Taylor <ou...@gmail.com> wrote:
> 
> Does Mesos support health checks for docker image tasks?  Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
> 
> Example TaskInfo JSON received back from Mesos:
> 
>>> {
>>>   "name":"hello-app.web.v3",
>>>   "task_id":{
>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>   },
>>>   "slave_id":{
>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>   },
>>>   "resources":[
>>>     {
>>>       "name":"cpus",
>>>       "type":0,
>>>       "scalar":{
>>>         "value":0.1
>>>       }
>>>     },
>>>     {
>>>       "name":"mem",
>>>       "type":0,
>>>       "scalar":{
>>>         "value":256
>>>       }
>>>     },
>>>     {
>>>       "name":"ports",
>>>       "type":1,
>>>       "ranges":{
>>>         "range":[
>>>           {
>>>             "begin":31002,
>>>             "end":31002
>>>           }
>>>         ]
>>>       }
>>>     }
>>>   ],
>>>   "command":{
>>>     "container":{
>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>     },
>>>     "shell":false
>>>   },
>>>   "container":{
>>>     "type":1,
>>>     "docker":{
>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>       "network":2,
>>>       "port_mappings":[
>>>         {
>>>           "host_port":31002,
>>>           "container_port":8000,
>>>           "protocol":"tcp"
>>>         }
>>>       ],
>>>       "privileged":false,
>>>       "parameters":[],
>>>       "force_pull_image":false
>>>     }
>>>   },
>>>   "health_check":{
>>>     "delay_seconds":5,
>>>     "interval_seconds":10,
>>>     "timeout_seconds":10,
>>>     "consecutive_failures":3,
>>>     "grace_period_seconds":0,
>>>     "command":{
>>>       "shell":true,
>>>       "value":"sleep 5",
>>>       "user":"root"
>>>     }
>>>   }
>>> }
> 
> I have searched all machines and containers to see if they ever run the command (in this case `sleep 5`), but have not found any indication that it is being executed.
> 
> In the mesos src code the health-checks are invoked from src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this mean that health-checks are only supported for custom executors and not for docker tasks?
> 
> What I am trying to accomplish is to have the 0/non-zero exit-status of a health-check command translate to task health.
> 
> Thanks!
> Jay