You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Gastón Kleiman <ga...@mesosphere.com> on 2017/01/25 00:33:46 UTC

Review Request 55901: [WIP] Added support for command health checks to the default executor.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description
-------

[WIP] Added support for command health checks to the default executor.

This is still a work in progress, but I am posting it in order to get some feedback on the general approach.

Open questions/issues:

  - The code that handles communication with the Agent is on the `DefaultExecutor`, so the health checker calls `evolve` when serializing the API calls. Should it use the v1 protos instead, and skip the `evolve` call?
  - CMD health checks for command tasks inherit the env from the executor, but allow overriding env variables. I don't think that inheriting the executor env makes sense for cmd health checks launched by the `DefaultExecutor` - each task can have its own env, so the health check command should inherit the task env and merge it with the one in the check's `CommandInfo`.
    I added some code that takes the task env from `TaskInfo` and merges it check's env, but I don't think that this is an ideal approach, since it will miss env variables set/modified by containerizers/hooks/isolators. Do you think that it would make sense for all Debug Containers to inherit the environment of the parent container?
  - There's a TODO for stopping/resuming health checking when the `DefaultExecutor` is disconnected from the agent. After talking to AlexR, I believe that we should do this for all types of checks, not just for `CMD`.
  - The test that I introduced passes on Linux, but not on macOS, I still have to do some more debugging.


Diffs
-----

  src/checks/health_checker.hpp 6e558f2061a9e31157c47d31cb64b3a8568aace3 
  src/checks/health_checker.cpp 50aa2858e807b27bbab58a3618f5200cfe4eca9e 
  src/launcher/default_executor.cpp a03794934adb93868734f8cf00b337a1bff9b5ab 
  src/tests/health_check_tests.cpp debbd3c09b7555145aaf3f62a24d795d1423a269 

Diff: https://reviews.apache.org/r/55901/diff/


Testing
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: [WIP] Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review162972
-----------------------------------------------------------



It looks like there's a deadlock in the cleanup path of the Agent/Containerizer that is only manifested under OS X. Here's the output of a failed test run on OS X:

```
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from HealthCheckTest
[ RUN      ] HealthCheckTest.DefaultExecutorCmdHealthCheck
I0125 14:43:57.005580 3095696320 cluster.cpp:160] Creating default 'local' authorizer
I0125 14:43:57.036818 31969280 master.cpp:383] Master 41942215-2f76-4ffa-9bfc-383fc4bc2b04 (172.18.9.98) started on 172.18.9.98:49526
I0125 14:43:57.036867 31969280 master.cpp:385] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/private/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/Fi6dxT/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_
 timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/Users/gaston/mesos/master/share/mesos/webui" --work_dir="/private/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/Fi6dxT/master" --zk_session_timeout="10secs"
I0125 14:43:57.038107 31969280 master.cpp:435] Master only allowing authenticated frameworks to register
I0125 14:43:57.038125 31969280 master.cpp:449] Master only allowing authenticated agents to register
I0125 14:43:57.038132 31969280 master.cpp:462] Master only allowing authenticated HTTP frameworks to register
I0125 14:43:57.038138 31969280 credentials.hpp:37] Loading credentials for authentication from '/private/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/Fi6dxT/credentials'
I0125 14:43:57.038478 31969280 master.cpp:507] Using default 'crammd5' authenticator
I0125 14:43:57.038516 31969280 authenticator.cpp:519] Initializing server SASL
I0125 14:43:57.056694 31969280 auxprop.cpp:73] Initialized in-memory auxiliary property plugin
I0125 14:43:57.056784 31969280 http.cpp:922] Using default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I0125 14:43:57.056943 31969280 http.cpp:922] Using default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I0125 14:43:57.057091 31969280 http.cpp:922] Using default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I0125 14:43:57.057209 31969280 master.cpp:587] Authorization enabled
I0125 14:43:57.057353 29286400 hierarchical.cpp:151] Initialized hierarchical allocator process
I0125 14:43:57.057349 31432704 whitelist_watcher.cpp:77] No whitelist given
I0125 14:43:57.060212 33042432 master.cpp:2121] Elected as the leading master!
I0125 14:43:57.060230 33042432 master.cpp:1643] Recovering from registrar
I0125 14:43:57.060317 31432704 registrar.cpp:329] Recovering registrar
I0125 14:43:57.064963 31432704 registrar.cpp:362] Successfully fetched the registry (0B) in 4.62208ms
I0125 14:43:57.065105 31432704 registrar.cpp:461] Applied 1 operations in 36us; attempting to update the registry
I0125 14:43:57.069882 31969280 registrar.cpp:506] Successfully updated the registry in 4.75008ms
I0125 14:43:57.069962 31969280 registrar.cpp:392] Successfully recovered registrar
I0125 14:43:57.070274 30896128 master.cpp:1759] Recovered 0 agents from the registry (131B); allowing 10mins for agents to re-register
I0125 14:43:57.070297 29286400 hierarchical.cpp:178] Skipping recovery of hierarchical allocator: nothing to recover
I0125 14:43:57.071858 3095696320 containerizer.cpp:220] Using isolation: posix/cpu,posix/mem,filesystem/posix
I0125 14:43:57.074162 3095696320 cluster.cpp:446] Creating default 'local' authorizer
I0125 14:43:57.074981 33042432 slave.cpp:209] Mesos agent started on (1)@172.18.9.98:49526
I0125 14:43:57.075031 33042432 slave.cpp:210] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/mesos/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_oFfwrZ/credential" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false
 " --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_oFfwrZ/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_credentials="/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_oFfwrZ/http_credentials" --http_heartbeat_interval="30secs" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/Users/gaston/mesosphere/mesos/build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_
 timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]" --runtime_dir="/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_oFfwrZ" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --version="false" --work_dir="/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg"
I0125 14:43:57.075314 33042432 credentials.hpp:86] Loading credential for authentication from '/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_oFfwrZ/credential'
I0125 14:43:57.075451 33042432 slave.cpp:352] Agent using credential for: test-principal
I0125 14:43:57.075469 33042432 credentials.hpp:37] Loading credentials for authentication from '/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_oFfwrZ/http_credentials'
I0125 14:43:57.075603 33042432 http.cpp:922] Using default 'basic' HTTP authenticator for realm 'mesos-agent-readonly'
I0125 14:43:57.076303 33042432 slave.cpp:539] Agent resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0125 14:43:57.076386 33042432 slave.cpp:547] Agent attributes: [  ]
I0125 14:43:57.076395 33042432 slave.cpp:552] Agent hostname: 172.18.9.98
I0125 14:43:57.076504 32505856 status_update_manager.cpp:177] Pausing sending status updates
I0125 14:43:57.076628 3095696320 sched.cpp:232] Version: 1.2.0
I0125 14:43:57.076966 29286400 sched.cpp:336] New master detected at master@172.18.9.98:49526
I0125 14:43:57.077011 29286400 sched.cpp:407] Authenticating with master master@172.18.9.98:49526
I0125 14:43:57.077033 29286400 sched.cpp:414] Using default CRAM-MD5 authenticatee
I0125 14:43:57.077142 29822976 authenticatee.cpp:97] Initializing client SASL
I0125 14:43:57.077421 29286400 state.cpp:60] Recovering state from '/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg/meta'
I0125 14:43:57.077636 33042432 status_update_manager.cpp:203] Recovering status update manager
I0125 14:43:57.077785 31969280 containerizer.cpp:599] Recovering containerizer
I0125 14:43:57.078785 31432704 provisioner.cpp:251] Provisioner recovery complete
I0125 14:43:57.078979 31432704 slave.cpp:5422] Finished recovery
I0125 14:43:57.079376 31432704 slave.cpp:5596] Querying resource estimator for oversubscribable resources
I0125 14:43:57.079521 33042432 status_update_manager.cpp:177] Pausing sending status updates
I0125 14:43:57.079525 31432704 slave.cpp:929] New master detected at master@172.18.9.98:49526
I0125 14:43:57.079567 31432704 slave.cpp:964] Detecting new master
I0125 14:43:57.079654 31432704 slave.cpp:5610] Received oversubscribable resources {} from the resource estimator
I0125 14:43:57.079880 29822976 authenticatee.cpp:121] Creating new client SASL connection
I0125 14:43:57.080003 29286400 master.cpp:7001] Authenticating scheduler-8fcc1ae4-1a01-4d5b-a2c8-444d1295751b@172.18.9.98:49526
I0125 14:43:57.080070 30359552 authenticator.cpp:414] Starting authentication session for crammd5-authenticatee(1)@172.18.9.98:49526
I0125 14:43:57.080181 31432704 authenticator.cpp:98] Creating new server SASL connection
I0125 14:43:57.080265 31969280 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I0125 14:43:57.080309 31969280 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I0125 14:43:57.080360 33042432 authenticator.cpp:204] Received SASL authentication start
I0125 14:43:57.080399 33042432 authenticator.cpp:326] Authentication requires more steps
I0125 14:43:57.080458 33042432 authenticatee.cpp:259] Received SASL authentication step
I0125 14:43:57.080544 31432704 authenticator.cpp:232] Received SASL authentication step
I0125 14:43:57.080564 31432704 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: 'Gastons-MacBook-Pro-2.local' server FQDN: 'Gastons-MacBook-Pro-2.local' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false
I0125 14:43:57.080580 31432704 auxprop.cpp:181] Looking up auxiliary property '*userPassword'
I0125 14:43:57.080605 31432704 auxprop.cpp:181] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
I0125 14:43:57.080621 31432704 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: 'Gastons-MacBook-Pro-2.local' server FQDN: 'Gastons-MacBook-Pro-2.local' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true
I0125 14:43:57.080633 31432704 auxprop.cpp:131] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
I0125 14:43:57.080637 31432704 auxprop.cpp:131] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0125 14:43:57.080646 31432704 authenticator.cpp:318] Authentication success
I0125 14:43:57.080687 29286400 authenticatee.cpp:299] Authentication success
I0125 14:43:57.080730 30359552 master.cpp:7031] Successfully authenticated principal 'test-principal' at scheduler-8fcc1ae4-1a01-4d5b-a2c8-444d1295751b@172.18.9.98:49526
I0125 14:43:57.080780 32505856 authenticator.cpp:432] Authentication session cleanup for crammd5-authenticatee(1)@172.18.9.98:49526
I0125 14:43:57.080895 31969280 sched.cpp:513] Successfully authenticated with master master@172.18.9.98:49526
I0125 14:43:57.080906 31969280 sched.cpp:836] Sending SUBSCRIBE call to master@172.18.9.98:49526
I0125 14:43:57.080955 31969280 sched.cpp:869] Will retry registration in 788.765853ms if necessary
I0125 14:43:57.081032 30359552 master.cpp:2734] Received SUBSCRIBE call for framework 'default' at scheduler-8fcc1ae4-1a01-4d5b-a2c8-444d1295751b@172.18.9.98:49526
I0125 14:43:57.081043 30359552 master.cpp:2157] Authorizing framework principal 'test-principal' to receive offers for role '*'
I0125 14:43:57.081260 29286400 master.cpp:2810] Subscribing framework default with checkpointing disabled and capabilities [  ]
I0125 14:43:57.088062 31969280 slave.cpp:991] Authenticating with master master@172.18.9.98:49526
I0125 14:43:57.096890 31969280 slave.cpp:1002] Using default CRAM-MD5 authenticatee
I0125 14:43:57.097005 29822976 authenticatee.cpp:121] Creating new client SASL connection
I0125 14:43:57.097234 29286400 master.cpp:7001] Authenticating slave(1)@172.18.9.98:49526
I0125 14:43:57.097245 33042432 hierarchical.cpp:271] Added framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.097241 31969280 sched.cpp:759] Framework registered with 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.097296 30896128 authenticator.cpp:414] Starting authentication session for crammd5-authenticatee(2)@172.18.9.98:49526
I0125 14:43:57.097301 33042432 hierarchical.cpp:1677] No allocations performed
I0125 14:43:57.097332 33042432 hierarchical.cpp:1772] No inverse offers to send out!
I0125 14:43:57.097352 33042432 hierarchical.cpp:1279] Performed allocation for 0 agents in 94us
I0125 14:43:57.097369 31969280 sched.cpp:773] Scheduler::registered took 108us
I0125 14:43:57.097446 29286400 authenticator.cpp:98] Creating new server SASL connection
I0125 14:43:57.097564 31432704 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I0125 14:43:57.097605 31432704 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I0125 14:43:57.097694 31969280 authenticator.cpp:204] Received SASL authentication start
I0125 14:43:57.097761 31969280 authenticator.cpp:326] Authentication requires more steps
I0125 14:43:57.097836 30359552 authenticatee.cpp:259] Received SASL authentication step
I0125 14:43:57.097904 29822976 authenticator.cpp:232] Received SASL authentication step
I0125 14:43:57.097934 29822976 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: 'Gastons-MacBook-Pro-2.local' server FQDN: 'Gastons-MacBook-Pro-2.local' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false
I0125 14:43:57.097944 29822976 auxprop.cpp:181] Looking up auxiliary property '*userPassword'
I0125 14:43:57.097965 29822976 auxprop.cpp:181] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
I0125 14:43:57.097980 29822976 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: 'Gastons-MacBook-Pro-2.local' server FQDN: 'Gastons-MacBook-Pro-2.local' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true
I0125 14:43:57.097990 29822976 auxprop.cpp:131] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
I0125 14:43:57.097997 29822976 auxprop.cpp:131] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0125 14:43:57.098006 29822976 authenticator.cpp:318] Authentication success
I0125 14:43:57.098071 30359552 authenticatee.cpp:299] Authentication success
I0125 14:43:57.098121 29286400 master.cpp:7031] Successfully authenticated principal 'test-principal' at slave(1)@172.18.9.98:49526
I0125 14:43:57.098161 31969280 authenticator.cpp:432] Authentication session cleanup for crammd5-authenticatee(2)@172.18.9.98:49526
I0125 14:43:57.098255 31432704 slave.cpp:1086] Successfully authenticated with master master@172.18.9.98:49526
I0125 14:43:57.098340 31432704 slave.cpp:1508] Will retry registration in 15.661984ms if necessary
I0125 14:43:57.098412 29822976 master.cpp:5299] Registering agent at slave(1)@172.18.9.98:49526 (172.18.9.98) with id 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0
I0125 14:43:57.098567 33042432 registrar.cpp:461] Applied 1 operations in 35us; attempting to update the registry
I0125 14:43:57.103466 33042432 registrar.cpp:506] Successfully updated the registry in 4.877056ms
I0125 14:43:57.103818 33042432 slave.cpp:4286] Received ping from slave-observer(1)@172.18.9.98:49526
I0125 14:43:57.103868 30896128 master.cpp:5370] Registered agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 at slave(1)@172.18.9.98:49526 (172.18.9.98) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0125 14:43:57.103914 33042432 slave.cpp:1132] Registered with master master@172.18.9.98:49526; given agent ID 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0
I0125 14:43:57.103934 33042432 fetcher.cpp:90] Clearing fetcher cache
I0125 14:43:57.104035 31969280 hierarchical.cpp:478] Added agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 (172.18.9.98) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: {})
I0125 14:43:57.104058 30359552 status_update_manager.cpp:184] Resuming sending status updates
I0125 14:43:57.104224 33042432 slave.cpp:1160] Checkpointing SlaveInfo to '/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg/meta/slaves/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0/slave.info'
I0125 14:43:57.104449 31969280 hierarchical.cpp:1772] No inverse offers to send out!
I0125 14:43:57.104480 31969280 hierarchical.cpp:1302] Performed allocation for agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 in 439us
I0125 14:43:57.104595 33042432 slave.cpp:1198] Forwarding total oversubscribed resources {}
I0125 14:43:57.104617 31432704 master.cpp:6830] Sending 1 offers to framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 (default) at scheduler-8fcc1ae4-1a01-4d5b-a2c8-444d1295751b@172.18.9.98:49526
I0125 14:43:57.104727 31432704 master.cpp:5863] Received update of agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 at slave(1)@172.18.9.98:49526 (172.18.9.98) with total oversubscribed resources {}
I0125 14:43:57.104867 29286400 hierarchical.cpp:548] Agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 (172.18.9.98) updated with oversubscribed resources {} (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000])
I0125 14:43:57.104959 29286400 hierarchical.cpp:1677] No allocations performed
I0125 14:43:57.104972 29286400 hierarchical.cpp:1772] No inverse offers to send out!
I0125 14:43:57.104961 32505856 sched.cpp:933] Scheduler::resourceOffers took 173us
I0125 14:43:57.104995 29286400 hierarchical.cpp:1302] Performed allocation for agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 in 74us
I0125 14:43:57.106262 29286400 master.cpp:3728] Processing ACCEPT call for offers: [ 41942215-2f76-4ffa-9bfc-383fc4bc2b04-O0 ] on agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 at slave(1)@172.18.9.98:49526 (172.18.9.98) for framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 (default) at scheduler-8fcc1ae4-1a01-4d5b-a2c8-444d1295751b@172.18.9.98:49526
I0125 14:43:57.106317 29286400 master.cpp:3316] Authorizing framework principal 'test-principal' to launch task 1
I0125 14:43:57.107606 32505856 master.cpp:8882] Adding task 1 with resources cpus(*):1.9; mem(*):992; disk(*):992; ports(*):[31000-32000] on agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 at slave(1)@172.18.9.98:49526 (172.18.9.98)
I0125 14:43:57.107693 32505856 master.cpp:4583] Launching task group { 1 } of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 (default) at scheduler-8fcc1ae4-1a01-4d5b-a2c8-444d1295751b@172.18.9.98:49526 with resources cpus(*):1.9; mem(*):992; disk(*):992; ports(*):[31000-32000] on agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 at slave(1)@172.18.9.98:49526 (172.18.9.98)
I0125 14:43:57.107838 30359552 slave.cpp:1576] Got assigned task group containing tasks [ 1 ] for framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.108158 30359552 slave.cpp:1736] Launching task group containing tasks [ 1 ] for framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.112789 30359552 paths.cpp:547] Trying to chown '/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg/slaves/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0/frameworks/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000/executors/default/runs/ed1df37e-c3cd-4cb7-9be0-3f57df46a746' to user 'gaston'
I0125 14:43:57.113065 30359552 slave.cpp:6331] Launching executor 'default' of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 with resources cpus(*):0.1; mem(*):32; disk(*):32 in work directory '/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg/slaves/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0/frameworks/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000/executors/default/runs/ed1df37e-c3cd-4cb7-9be0-3f57df46a746'
I0125 14:43:57.113381 30896128 containerizer.cpp:992] Starting container ed1df37e-c3cd-4cb7-9be0-3f57df46a746 for executor 'default' of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.113417 30359552 slave.cpp:2058] Queued task group containing tasks [ 1 ] for executor 'default' of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.113462 30359552 slave.cpp:882] Successfully attached file '/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg/slaves/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0/frameworks/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000/executors/default/runs/ed1df37e-c3cd-4cb7-9be0-3f57df46a746'
I0125 14:43:57.115821 31969280 containerizer.cpp:1541] Launching 'mesos-containerizer' with flags '--help="false" --launch_info="{"command":{"arguments":["mesos-default-executor","--launcher_dir=\/Users\/gaston\/mesosphere\/mesos\/build\/src"],"shell":false,"value":"\/Users\/gaston\/mesosphere\/mesos\/build\/src\/mesos-default-executor"},"environment":{"variables":[{"name":"LIBPROCESS_PORT","value":"0"},{"name":"MESOS_AGENT_ENDPOINT","value":"172.18.9.98:49526"},{"name":"MESOS_CHECKPOINT","value":"0"},{"name":"MESOS_DIRECTORY","value":"\/var\/folders\/jr\/17y2w4ld019bsn9vhx0c13f80000gn\/T\/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg\/slaves\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0\/frameworks\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000\/executors\/default\/runs\/ed1df37e-c3cd-4cb7-9be0-3f57df46a746"},{"name":"MESOS_EXECUTOR_ID","value":"default"},{"name":"MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD","value":"5secs"},{"name":"MESOS_FRAMEWORK_ID","value":"41942215-2f76-4ffa-9bfc-383fc
 4bc2b04-0000"},{"name":"MESOS_HTTP_COMMAND_EXECUTOR","value":"0"},{"name":"MESOS_SLAVE_ID","value":"41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0"},{"name":"MESOS_SLAVE_PID","value":"slave(1)@172.18.9.98:49526"},{"name":"MESOS_SANDBOX","value":"\/var\/folders\/jr\/17y2w4ld019bsn9vhx0c13f80000gn\/T\/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg\/slaves\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0\/frameworks\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000\/executors\/default\/runs\/ed1df37e-c3cd-4cb7-9be0-3f57df46a746"}]},"err":{"path":"\/var\/folders\/jr\/17y2w4ld019bsn9vhx0c13f80000gn\/T\/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg\/slaves\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0\/frameworks\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000\/executors\/default\/runs\/ed1df37e-c3cd-4cb7-9be0-3f57df46a746\/stderr","type":"PATH"},"out":{"path":"\/var\/folders\/jr\/17y2w4ld019bsn9vhx0c13f80000gn\/T\/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg\/slaves\/41942215-2f76-4ffa-9bfc-383fc4b
 c2b04-S0\/frameworks\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000\/executors\/default\/runs\/ed1df37e-c3cd-4cb7-9be0-3f57df46a746\/stdout","type":"PATH"},"user":"gaston","working_directory":"\/var\/folders\/jr\/17y2w4ld019bsn9vhx0c13f80000gn\/T\/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg\/slaves\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0\/frameworks\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000\/executors\/default\/runs\/ed1df37e-c3cd-4cb7-9be0-3f57df46a746"}" --pipe_read="6" --pipe_write="8" --runtime_directory="/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_oFfwrZ/containers/ed1df37e-c3cd-4cb7-9be0-3f57df46a746"'
I0125 14:43:57.117321 31969280 launcher.cpp:135] Forked child with pid '85059' for container 'ed1df37e-c3cd-4cb7-9be0-3f57df46a746'
I0125 14:43:57.119642 30896128 fetcher.cpp:349] Starting to fetch URIs for container: ed1df37e-c3cd-4cb7-9be0-3f57df46a746, directory: /var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg/slaves/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0/frameworks/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000/executors/default/runs/ed1df37e-c3cd-4cb7-9be0-3f57df46a746
I0125 14:43:57.222311 29822976 process.cpp:3697] Handling HTTP event for process 'slave(1)' with path: '/slave(1)/api/v1/executor'
I0125 14:43:57.223127 29822976 http.cpp:303] HTTP POST for /slave(1)/api/v1/executor from 172.18.9.98:49528
I0125 14:43:57.223256 29822976 slave.cpp:3109] Received Subscribe request for HTTP executor 'default' of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.224218 30896128 slave.cpp:2300] Sending queued task group task group containing tasks [ 1 ] to executor 'default' of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 (via HTTP)
I0125 14:43:57.237958 30896128 process.cpp:3697] Handling HTTP event for process 'slave(1)' with path: '/slave(1)/api/v1'
I0125 14:43:57.238396 30896128 http.cpp:303] HTTP POST for /slave(1)/api/v1 from 172.18.9.98:49530
I0125 14:43:57.238607 30896128 http.cpp:464] Processing call LAUNCH_NESTED_CONTAINER
I0125 14:43:57.239876 29822976 containerizer.cpp:1808] Starting nested container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1
I0125 14:43:57.240108 29822976 containerizer.cpp:1832] Trying to chown '/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg/slaves/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0/frameworks/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000/executors/default/runs/ed1df37e-c3cd-4cb7-9be0-3f57df46a746/containers/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1' to user 'gaston'
I0125 14:43:57.241133 32505856 containerizer.cpp:1541] Launching 'mesos-containerizer' with flags '--help="false" --launch_info="{"command":{"shell":true,"value":"sleep 120"},"environment":{"variables":[{"name":"MESOS_SANDBOX","value":"\/var\/folders\/jr\/17y2w4ld019bsn9vhx0c13f80000gn\/T\/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg\/slaves\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0\/frameworks\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000\/executors\/default\/runs\/ed1df37e-c3cd-4cb7-9be0-3f57df46a746\/containers\/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1"}]},"err":{"path":"\/var\/folders\/jr\/17y2w4ld019bsn9vhx0c13f80000gn\/T\/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg\/slaves\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0\/frameworks\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000\/executors\/default\/runs\/ed1df37e-c3cd-4cb7-9be0-3f57df46a746\/containers\/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1\/stderr","type":"PATH"},"out":{"path":"\/var\/folders\/jr\/17y2w4ld019bsn9vhx0c13f80000gn\/
 T\/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg\/slaves\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0\/frameworks\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000\/executors\/default\/runs\/ed1df37e-c3cd-4cb7-9be0-3f57df46a746\/containers\/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1\/stdout","type":"PATH"},"user":"gaston","working_directory":"\/var\/folders\/jr\/17y2w4ld019bsn9vhx0c13f80000gn\/T\/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg\/slaves\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0\/frameworks\/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000\/executors\/default\/runs\/ed1df37e-c3cd-4cb7-9be0-3f57df46a746\/containers\/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1"}" --pipe_read="10" --pipe_write="11" --runtime_directory="/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_oFfwrZ/containers/ed1df37e-c3cd-4cb7-9be0-3f57df46a746/containers/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1"'
I0125 14:43:57.251711 32505856 launcher.cpp:135] Forked child with pid '85088' for container 'ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1'
I0125 14:43:57.253597 33042432 fetcher.cpp:349] Starting to fetch URIs for container: ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1, directory: /var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg/slaves/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0/frameworks/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000/executors/default/runs/ed1df37e-c3cd-4cb7-9be0-3f57df46a746/containers/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1
E0125 14:43:57.257318 33579008 process.cpp:2419] Failed to shutdown socket with fd 9: Socket is not connected
I0125 14:43:57.257848 29822976 process.cpp:3697] Handling HTTP event for process 'slave(1)' with path: '/slave(1)/api/v1/executor'
I0125 14:43:57.258191 29286400 process.cpp:3697] Handling HTTP event for process 'slave(1)' with path: '/slave(1)/api/v1'
I0125 14:43:57.258460 29286400 http.cpp:303] HTTP POST for /slave(1)/api/v1/executor from 172.18.9.98:49529
I0125 14:43:57.258599 29286400 slave.cpp:3756] Handling status update TASK_RUNNING (UUID: c8583885-558e-44a3-9e0b-c39874132e27) for task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.258846 29286400 http.cpp:303] HTTP POST for /slave(1)/api/v1 from 172.18.9.98:49531
I0125 14:43:57.259006 29286400 process.cpp:3697] Handling HTTP event for process 'slave(1)' with path: '/slave(1)/api/v1'
I0125 14:43:57.259308 29286400 http.cpp:464] Processing call LAUNCH_NESTED_CONTAINER_SESSION
I0125 14:43:57.259663 30359552 status_update_manager.cpp:323] Received status update TASK_RUNNING (UUID: c8583885-558e-44a3-9e0b-c39874132e27) for task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.259696 30359552 status_update_manager.cpp:500] Creating StatusUpdate stream for task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.259893 29822976 containerizer.cpp:1808] Starting nested container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check
I0125 14:43:57.259938 30359552 status_update_manager.cpp:377] Forwarding update TASK_RUNNING (UUID: c8583885-558e-44a3-9e0b-c39874132e27) for task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 to the agent
I0125 14:43:57.259951 29286400 http.cpp:303] HTTP POST for /slave(1)/api/v1 from 172.18.9.98:49532
I0125 14:43:57.260105 29822976 containerizer.cpp:1832] Trying to chown '/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg/slaves/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0/frameworks/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000/executors/default/runs/ed1df37e-c3cd-4cb7-9be0-3f57df46a746/containers/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1/containers/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check' to user 'gaston'
I0125 14:43:57.260159 29286400 slave.cpp:4196] Forwarding the update TASK_RUNNING (UUID: c8583885-558e-44a3-9e0b-c39874132e27) for task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 to master@172.18.9.98:49526
I0125 14:43:57.260329 29286400 http.cpp:464] Processing call WAIT_NESTED_CONTAINER
I0125 14:43:57.260387 31432704 master.cpp:6008] Status update TASK_RUNNING (UUID: c8583885-558e-44a3-9e0b-c39874132e27) for task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 from agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 at slave(1)@172.18.9.98:49526 (172.18.9.98)
I0125 14:43:57.260434 31432704 master.cpp:6076] Forwarding status update TASK_RUNNING (UUID: c8583885-558e-44a3-9e0b-c39874132e27) for task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.260524 29286400 slave.cpp:4090] Status update manager successfully handled status update TASK_RUNNING (UUID: c8583885-558e-44a3-9e0b-c39874132e27) for task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.260597 31432704 master.cpp:8138] Updating the state of task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 (latest state: TASK_RUNNING, status update state: TASK_RUNNING)
I0125 14:43:57.260751 30896128 sched.cpp:1041] Scheduler::statusUpdate took 137us
I0125 14:43:57.260968 29286400 master.cpp:5015] Processing ACKNOWLEDGE call c8583885-558e-44a3-9e0b-c39874132e27 for task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 (default) at scheduler-8fcc1ae4-1a01-4d5b-a2c8-444d1295751b@172.18.9.98:49526 on agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0
I0125 14:43:57.261234 32505856 status_update_manager.cpp:395] Received status update acknowledgement (UUID: c8583885-558e-44a3-9e0b-c39874132e27) for task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.261389 31969280 slave.cpp:3045] Status update manager successfully handled status update acknowledgement (UUID: c8583885-558e-44a3-9e0b-c39874132e27) for task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.266468 30896128 switchboard.cpp:571] Launching 'mesos-io-switchboard' with flags '--heartbeat_interval="30secs" --help="false" --socket_address="/tmp/mesos-io-switchboard-a719e861-62a0-42dd-9a18-8af6f13b856f" --stderr_from_fd="15" --stderr_to_fd="2" --stdin_to_fd="12" --stdout_from_fd="13" --stdout_to_fd="1" --tty="false" --wait_for_connection="true"' for container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check
I0125 14:43:57.267304 30896128 switchboard.cpp:601] Created I/O switchboard server (pid: 85095) listening on socket file '/tmp/mesos-io-switchboard-a719e861-62a0-42dd-9a18-8af6f13b856f' for container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check
I0125 14:43:57.268826 29822976 containerizer.cpp:1541] Launching 'mesos-containerizer' with flags '--help="false" --launch_info="{"command":{"shell":true,"value":"exit $STATUS"},"environment":{"variables":[{"name":"STATUS","value":"0"}]},"err":{"fd":16,"type":"FD"},"in":{"fd":9,"type":"FD"},"out":{"fd":14,"type":"FD"},"user":"gaston"}" --pipe_read="12" --pipe_write="13" --runtime_directory="/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_oFfwrZ/containers/ed1df37e-c3cd-4cb7-9be0-3f57df46a746/containers/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1/containers/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check"'
I0125 14:43:57.269855 29822976 launcher.cpp:135] Forked child with pid '85099' for container 'ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check'
I0125 14:43:57.271919 33042432 fetcher.cpp:349] Starting to fetch URIs for container: ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check, directory: /var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg/slaves/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0/frameworks/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000/executors/default/runs/ed1df37e-c3cd-4cb7-9be0-3f57df46a746/containers/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1/containers/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check
I0125 14:43:57.330857 32505856 http.cpp:2647] Received EOF attach response for ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check
I0125 14:43:57.331059 29286400 containerizer.cpp:2119] Destroying container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check in RUNNING state
W0125 14:43:57.331065 32505856 http.cpp:2662] Launch nested container session connection for container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check closed
I0125 14:43:57.331297 29286400 launcher.cpp:151] Asked to destroy container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check
I0125 14:43:57.332167 30896128 process.cpp:3697] Handling HTTP event for process 'slave(1)' with path: '/slave(1)/api/v1'
I0125 14:43:57.332547 31969280 http.cpp:303] HTTP POST for /slave(1)/api/v1 from 172.18.9.98:49534
I0125 14:43:57.332726 31969280 http.cpp:464] Processing call WAIT_NESTED_CONTAINER
I0125 14:43:57.366807 33042432 switchboard.cpp:873] I/O switchboard server process for container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check has terminated (status=0)
I0125 14:43:57.367132 31432704 containerizer.cpp:2482] Container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check has exited
I0125 14:43:57.367938 31969280 provisioner.cpp:322] Ignoring destroy request for unknown container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check
I0125 14:43:57.368036 33042432 containerizer.cpp:2398] Checkpointing termination state to nested container's runtime directory '/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_oFfwrZ/containers/ed1df37e-c3cd-4cb7-9be0-3f57df46a746/containers/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1/containers/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check/termination'
I0125 14:43:57.387104 29822976 process.cpp:3697] Handling HTTP event for process 'slave(1)' with path: '/slave(1)/api/v1'
I0125 14:43:57.387472 29822976 http.cpp:303] HTTP POST for /slave(1)/api/v1 from 172.18.9.98:49535
I0125 14:43:57.387668 29822976 http.cpp:464] Processing call LAUNCH_NESTED_CONTAINER_SESSION
I0125 14:43:57.387924 33042432 containerizer.cpp:1808] Starting nested container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check
I0125 14:43:57.387951 29822976 process.cpp:3697] Handling HTTP event for process 'slave(1)' with path: '/slave(1)/api/v1/executor'
I0125 14:43:57.388016 33042432 containerizer.cpp:1832] Trying to chown '/var/folders/jr/17y2w4ld019bsn9vhx0c13f80000gn/T/HealthCheckTest_DefaultExecutorCmdHealthCheck_zSkvqg/slaves/41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0/frameworks/41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000/executors/default/runs/ed1df37e-c3cd-4cb7-9be0-3f57df46a746/containers/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1/containers/dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check' to user 'gaston'
I0125 14:43:57.388273 31432704 http.cpp:303] HTTP POST for /slave(1)/api/v1/executor from 172.18.9.98:49529
I0125 14:43:57.388356 31432704 slave.cpp:3756] Handling status update TASK_RUNNING (UUID: 0a69f89b-35c9-4943-a6b4-2f9646b11338) for task 1 in health state healthy of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.389214 29822976 status_update_manager.cpp:323] Received status update TASK_RUNNING (UUID: 0a69f89b-35c9-4943-a6b4-2f9646b11338) for task 1 in health state healthy of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.389328 29822976 status_update_manager.cpp:377] Forwarding update TASK_RUNNING (UUID: 0a69f89b-35c9-4943-a6b4-2f9646b11338) for task 1 in health state healthy of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 to the agent
I0125 14:43:57.389411 30896128 slave.cpp:4196] Forwarding the update TASK_RUNNING (UUID: 0a69f89b-35c9-4943-a6b4-2f9646b11338) for task 1 in health state healthy of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 to master@172.18.9.98:49526
I0125 14:43:57.389511 30896128 slave.cpp:4090] Status update manager successfully handled status update TASK_RUNNING (UUID: 0a69f89b-35c9-4943-a6b4-2f9646b11338) for task 1 in health state healthy of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.389554 32505856 master.cpp:6008] Status update TASK_RUNNING (UUID: 0a69f89b-35c9-4943-a6b4-2f9646b11338) for task 1 in health state healthy of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 from agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 at slave(1)@172.18.9.98:49526 (172.18.9.98)
I0125 14:43:57.389578 32505856 master.cpp:6076] Forwarding status update TASK_RUNNING (UUID: 0a69f89b-35c9-4943-a6b4-2f9646b11338) for task 1 in health state healthy of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.389693 32505856 master.cpp:8138] Updating the state of task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 (latest state: TASK_RUNNING, status update state: TASK_RUNNING)
I0125 14:43:57.389803 31432704 sched.cpp:1041] Scheduler::statusUpdate took 76us
I0125 14:43:57.390027 3095696320 sched.cpp:2021] Asked to stop the driver
I0125 14:43:57.390022 29822976 master.cpp:5015] Processing ACKNOWLEDGE call 0a69f89b-35c9-4943-a6b4-2f9646b11338 for task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 (default) at scheduler-8fcc1ae4-1a01-4d5b-a2c8-444d1295751b@172.18.9.98:49526 on agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0
I0125 14:43:57.390105 31969280 sched.cpp:1203] Stopping framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.390228 31432704 master.cpp:7543] Processing TEARDOWN call for framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 (default) at scheduler-8fcc1ae4-1a01-4d5b-a2c8-444d1295751b@172.18.9.98:49526
I0125 14:43:57.390235 30359552 status_update_manager.cpp:395] Received status update acknowledgement (UUID: 0a69f89b-35c9-4943-a6b4-2f9646b11338) for task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.390249 31432704 master.cpp:7555] Removing framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 (default) at scheduler-8fcc1ae4-1a01-4d5b-a2c8-444d1295751b@172.18.9.98:49526
I0125 14:43:57.390262 31432704 master.cpp:3078] Deactivating framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 (default) at scheduler-8fcc1ae4-1a01-4d5b-a2c8-444d1295751b@172.18.9.98:49526
I0125 14:43:57.390390 32505856 hierarchical.cpp:386] Deactivated framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.390399 31969280 slave.cpp:3045] Status update manager successfully handled status update acknowledgement (UUID: 0a69f89b-35c9-4943-a6b4-2f9646b11338) for task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.390424 31432704 master.cpp:8138] Updating the state of task 1 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I0125 14:43:57.390444 31969280 slave.cpp:2598] Asked to shut down framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 by master@172.18.9.98:49526
I0125 14:43:57.390458 31969280 slave.cpp:2623] Shutting down framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.390477 31969280 slave.cpp:5005] Shutting down executor 'default' of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 (via HTTP)
I0125 14:43:57.390702 31432704 master.cpp:8232] Removing task 1 with resources cpus(*):1.9; mem(*):992; disk(*):992; ports(*):[31000-32000] of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 on agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 at slave(1)@172.18.9.98:49526 (172.18.9.98)
I0125 14:43:57.390861 31432704 master.cpp:8261] Removing executor 'default' with resources cpus(*):0.1; mem(*):32; disk(*):32 of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 on agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 at slave(1)@172.18.9.98:49526 (172.18.9.98)
I0125 14:43:57.390902 33042432 hierarchical.cpp:1011] Recovered cpus(*):1.9; mem(*):992; disk(*):992; ports(*):[31000-32000] (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: cpus(*):0.1; mem(*):32; disk(*):32) on agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 from framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.391154 32505856 containerizer.cpp:2119] Destroying container ed1df37e-c3cd-4cb7-9be0-3f57df46a746 in RUNNING state
I0125 14:43:57.391171 32505856 containerizer.cpp:2119] Destroying container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1 in RUNNING state
I0125 14:43:57.391186 32505856 containerizer.cpp:2119] Destroying container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check in PREPARING state
I0125 14:43:57.391238 29822976 hierarchical.cpp:1011] Recovered cpus(*):0.1; mem(*):32; disk(*):32 (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: {}) on agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 from framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.391365 29822976 hierarchical.cpp:337] Removed framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000
I0125 14:43:57.391458 32505856 containerizer.cpp:2190] Waiting for the isolators to complete preparing before destroying container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check
I0125 14:43:57.394384 31432704 process.cpp:3697] Handling HTTP event for process 'slave(1)' with path: '/slave(1)/api/v1'
I0125 14:43:57.394676 30896128 http.cpp:303] HTTP POST for /slave(1)/api/v1 from 172.18.9.98:49536
I0125 14:43:57.394672 29286400 switchboard.cpp:571] Launching 'mesos-io-switchboard' with flags '--heartbeat_interval="30secs" --help="false" --socket_address="/tmp/mesos-io-switchboard-9127715f-8aed-454e-8ec3-458fbec784fa" --stderr_from_fd="15" --stderr_to_fd="2" --stdin_to_fd="12" --stdout_from_fd="13" --stdout_to_fd="1" --tty="false" --wait_for_connection="true"' for container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check
I0125 14:43:57.394807 30896128 http.cpp:464] Processing call KILL_NESTED_CONTAINER
I0125 14:43:57.395575 29286400 switchboard.cpp:601] Created I/O switchboard server (pid: 85132) listening on socket file '/tmp/mesos-io-switchboard-9127715f-8aed-454e-8ec3-458fbec784fa' for container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check
W0125 14:43:57.396780 31432704 http.cpp:2226] Failed to launch nested container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check: Container is being destroyed during preparing
I0125 14:43:57.396934 30896128 process.cpp:3754] Failed to process request for '/slave(1)/api/v1': Container is being destroyed during preparing
I0125 14:43:57.396984 31432704 process.cpp:1456] Returning '500 Internal Server Error' for '/slave(1)/api/v1' (Container is being destroyed during preparing)
I0125 14:43:58.061504 30359552 hierarchical.cpp:1677] No allocations performed
I0125 14:43:58.061565 30359552 hierarchical.cpp:1279] Performed allocation for 1 agents in 189us
I0125 14:43:59.067081 30896128 hierarchical.cpp:1677] No allocations performed
I0125 14:43:59.067144 30896128 hierarchical.cpp:1279] Performed allocation for 1 agents in 206us
I0125 14:44:00.068758 33042432 hierarchical.cpp:1677] No allocations performed
I0125 14:44:00.068822 33042432 hierarchical.cpp:1279] Performed allocation for 1 agents in 182us
I0125 14:44:01.070752 31969280 hierarchical.cpp:1677] No allocations performed
I0125 14:44:01.070814 31969280 hierarchical.cpp:1279] Performed allocation for 1 agents in 171us
I0125 14:44:02.071462 30896128 hierarchical.cpp:1677] No allocations performed
I0125 14:44:02.071526 30896128 hierarchical.cpp:1279] Performed allocation for 1 agents in 175us
I0125 14:44:02.393194 32505856 slave.cpp:5078] Killing executor 'default' of framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 (via HTTP)
E0125 14:44:02.394971 33579008 process.cpp:2419] Failed to shutdown socket with fd 11: Socket is not connected
E0125 14:44:02.395092 33579008 process.cpp:2419] Failed to shutdown socket with fd 18: Socket is not connected
E0125 14:44:02.395231 33579008 process.cpp:2419] Failed to shutdown socket with fd 6: Socket is not connected
E0125 14:44:02.395437 33579008 process.cpp:2419] Failed to shutdown socket with fd 8: Socket is not connected
I0125 14:44:02.397722 33579008 switchboard.cpp:782] Sending SIGTERM to I/O switchboard server (pid: 85132) since container ed1df37e-c3cd-4cb7-9be0-3f57df46a746.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1.dc6d91c1-1a7d-4563-bef6-a9e3fcad3cb1-health-check is being destroyed
I0125 14:44:02.431910 33042432 containerizer.cpp:2482] Container ed1df37e-c3cd-4cb7-9be0-3f57df46a746 has exited
I0125 14:44:03.074764 29286400 hierarchical.cpp:1677] No allocations performed
I0125 14:44:03.074825 29286400 hierarchical.cpp:1279] Performed allocation for 1 agents in 223us
I0125 14:44:04.080370 29822976 hierarchical.cpp:1677] No allocations performed
I0125 14:44:04.080440 29822976 hierarchical.cpp:1279] Performed allocation for 1 agents in 190us
I0125 14:44:05.080876 30896128 hierarchical.cpp:1677] No allocations performed
I0125 14:44:05.080950 30896128 hierarchical.cpp:1279] Performed allocation for 1 agents in 197us
I0125 14:44:06.082023 30896128 hierarchical.cpp:1677] No allocations performed
I0125 14:44:06.082082 30896128 hierarchical.cpp:1279] Performed allocation for 1 agents in 186us
I0125 14:44:07.086395 29286400 hierarchical.cpp:1677] No allocations performed
I0125 14:44:07.086458 29286400 hierarchical.cpp:1279] Performed allocation for 1 agents in 226us
I0125 14:44:08.089059 33042432 hierarchical.cpp:1677] No allocations performed
I0125 14:44:08.089119 33042432 hierarchical.cpp:1279] Performed allocation for 1 agents in 168us
I0125 14:44:09.090726 31969280 hierarchical.cpp:1677] No allocations performed
I0125 14:44:09.090795 31969280 hierarchical.cpp:1279] Performed allocation for 1 agents in 205us
I0125 14:44:10.096534 33042432 hierarchical.cpp:1677] No allocations performed
I0125 14:44:10.096603 33042432 hierarchical.cpp:1279] Performed allocation for 1 agents in 243us
I0125 14:44:11.098918 30359552 hierarchical.cpp:1677] No allocations performed
I0125 14:44:11.098978 30359552 hierarchical.cpp:1279] Performed allocation for 1 agents in 175us
I0125 14:44:12.080423 31432704 slave.cpp:5596] Querying resource estimator for oversubscribable resources
I0125 14:44:12.080592 30359552 slave.cpp:5610] Received oversubscribable resources {} from the resource estimator
I0125 14:44:12.099279 31969280 hierarchical.cpp:1677] No allocations performed
I0125 14:44:12.099328 31969280 hierarchical.cpp:1279] Performed allocation for 1 agents in 180us
I0125 14:44:12.104923 31969280 slave.cpp:4286] Received ping from slave-observer(1)@172.18.9.98:49526
../../src/tests/cluster.cpp:570: Failure
Failed to wait 15secs for wait
I0125 14:44:12.395627 3095696320 slave.cpp:801] Agent terminating
I0125 14:44:12.395679 3095696320 slave.cpp:2598] Asked to shut down framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 by @0.0.0.0:0
W0125 14:44:12.395700 3095696320 slave.cpp:2619] Ignoring shutdown framework 41942215-2f76-4ffa-9bfc-383fc4bc2b04-0000 because it is terminating
I0125 14:44:12.395988 31969280 master.cpp:1261] Agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 at slave(1)@172.18.9.98:49526 (172.18.9.98) disconnected
I0125 14:44:12.396028 31969280 master.cpp:3115] Disconnecting agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 at slave(1)@172.18.9.98:49526 (172.18.9.98)
I0125 14:44:12.396126 31969280 master.cpp:3134] Deactivating agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 at slave(1)@172.18.9.98:49526 (172.18.9.98)
I0125 14:44:12.396370 32505856 hierarchical.cpp:577] Agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0 deactivated
I0125 14:44:12.401453 29822976 master.cpp:1100] Master terminating
I0125 14:44:12.401763 31432704 hierarchical.cpp:510] Removed agent 41942215-2f76-4ffa-9bfc-383fc4bc2b04-S0
[  FAILED  ] HealthCheckTest.DefaultExecutorCmdHealthCheck (15404 ms)
[----------] 1 test from HealthCheckTest (15407 ms total)

[----------] Global test environment tear-down
../../src/tests/environment.cpp:843: Failure
Failed
Tests completed with child processes remaining:
-+- 85038 /Users/gaston/mesosphere/mesos/build/src/.libs/mesos-tests --gtest_filter=HealthCheckTest.DefaultExecutorCmd* --break-on-error --verbose
 |-+- 85088 /Users/gaston/mesosphere/mesos/build/src/.libs/mesos-containerizer launch
 | --- 85130 sleep 120
 --- 85132 /Users/gaston/mesosphere/mesos/build/src/.libs/mesos-io-switchboard --heartbeat_interval=30secs --help=false --socket_address=/tmp/mesos-io-switchboard-9127715f-8aed-454e-8ec3-458fbec784fa --stderr_from_fd=15 --stderr_to_fd=2 --stdin_to_fd=12 --stdout_from_fd=13 --stdout_to_fd=1 --tty=false --wait_for_connection=true
[==========] 1 test from 1 test case ran. (15418 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] HealthCheckTest.DefaultExecutorCmdHealthCheck

 1 FAILED TEST
```

- Gast�n Kleiman


On Jan. 25, 2017, 12:33 a.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Jan. 25, 2017, 12:33 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> [WIP] Added support for command health checks to the default executor.
> 
> This is still a work in progress, but I am posting it in order to get some feedback on the general approach.
> 
> Open questions/issues:
> 
>   - The code that handles communication with the Agent is on the `DefaultExecutor`, so the health checker calls `evolve` when serializing the API calls. Should it use the v1 protos instead, and skip the `evolve` call?
>   - CMD health checks for command tasks inherit the env from the executor, but allow overriding env variables. I don't think that inheriting the executor env makes sense for cmd health checks launched by the `DefaultExecutor` - each task can have its own env, so the health check command should inherit the task env and merge it with the one in the check's `CommandInfo`.
>     I added some code that takes the task env from `TaskInfo` and merges it check's env, but I don't think that this is an ideal approach, since it will miss env variables set/modified by containerizers/hooks/isolators. Do you think that it would make sense for all Debug Containers to inherit the environment of the parent container?
>   - There's a TODO for stopping/resuming health checking when the `DefaultExecutor` is disconnected from the agent. After talking to AlexR, I believe that we should do this for all types of checks, not just for `CMD`.
>   - The test that I introduced passes on Linux, but not on macOS, I still have to do some more debugging.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 6e558f2061a9e31157c47d31cb64b3a8568aace3 
>   src/checks/health_checker.cpp 50aa2858e807b27bbab58a3618f5200cfe4eca9e 
>   src/launcher/default_executor.cpp a03794934adb93868734f8cf00b337a1bff9b5ab 
>   src/tests/health_check_tests.cpp debbd3c09b7555145aaf3f62a24d795d1423a269 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review163402
-----------------------------------------------------------



Patch looks great!

Reviews applied: [55900, 55901]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Jan. 28, 2017, 12:39 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Jan. 28, 2017, 12:39 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp e70bd7936752613a4f92c70c4c61cd7cdf7c4ee5 
>   src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
>   src/tests/health_check_tests.cpp 710cb66eff6c4447caa22772f0cdc97cfa582c50 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review164003
-----------------------------------------------------------



Patch looks great!

Reviews applied: [55900, 55901]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Feb. 2, 2017, 5:02 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 2, 2017, 5:02 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
>   src/tests/health_check_tests.cpp 8418cd91484fd26734de16255b37f3ebf574f5eb 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review164138
-----------------------------------------------------------



Patch looks great!

Reviews applied: [55900, 55901]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Feb. 3, 2017, 5:15 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2017, 5:15 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
>   src/tests/health_check_tests.cpp 8418cd91484fd26734de16255b37f3ebf574f5eb 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review164951
-----------------------------------------------------------




src/checks/health_checker.cpp (line 657)
<https://reviews.apache.org/r/55901/#comment236804>

    I wanted to make the errors here similar to/consistent with the errors logged by the default executor:
    https://github.com/apache/mesos/blob/c2388a511c775dd6f392961b06fd7738bf051dbc/src/launcher/default_executor.cpp#L618-L624



src/checks/health_checker.cpp (line 667)
<https://reviews.apache.org/r/55901/#comment236789>

    It depends on what containerizer is being used. With the `MesosContainerizer`, we get the exit status as returned by `waitpid`. I added a comment in the header file.


- Gast�n Kleiman


On Feb. 8, 2017, 1:27 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 8, 2017, 1:27 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a 
>   src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.

> On Feb. 9, 2017, 3:20 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, lines 653-654
> > <https://reviews.apache.org/r/55901/diff/8/?file=1627619#file1627619line653>
> >
> >     Why not `defer`?

Added `defer` where it corresponds.


> On Feb. 9, 2017, 3:20 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, line 635
> > <https://reviews.apache.org/r/55901/diff/8/?file=1627619#file1627619line635>
> >
> >     Could you please specify what this function returns? Status or exit code?

This depends on what containerizer the agent is using. In the case of the Mesos containerizer, it currently returns a status code.


> On Feb. 9, 2017, 3:20 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, line 667
> > <https://reviews.apache.org/r/55901/diff/8/?file=1627619#file1627619line667>
> >
> >     What do you get from the agent call: status or exit code directly? It's probably not that important here, but it is for checks.

This depends on what containerizer the agent is using. In the case of the Mesos containerizer, it currently returns a status code.


> On Feb. 9, 2017, 3:20 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, lines 565-567
> > <https://reviews.apache.org/r/55901/diff/8/?file=1627619#file1627619line565>
> >
> >     This should probably be indented, no?

Fixed.


> On Feb. 9, 2017, 3:20 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, line 555
> > <https://reviews.apache.org/r/55901/diff/8/?file=1627619#file1627619line555>
> >
> >     s/until/until after/ ?

Fixed.


> On Feb. 9, 2017, 3:20 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, line 615
> > <https://reviews.apache.org/r/55901/diff/8/?file=1627619#file1627619line615>
> >
> >     s/this future is completed//we finish processing current timeout./
> >     
> >     It's not clear what "this future" means.

Fixed.


- Gast�n


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review164919
-----------------------------------------------------------


On Feb. 10, 2017, 6:40 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 10, 2017, 6:40 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp f1f2834b3429fb00cc49c179fa9a3de328f597b5 
>   src/checks/health_checker.cpp a5225ff1f4b071ed4182d41fa8ecc705fa4dbe00 
>   src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a 
>   src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS, because of MESOS-7050.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Alexander Rukletsov <ru...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review164919
-----------------------------------------------------------


Fix it, then Ship it!




LGTM modulo possible collision with the previous check on timeout. Please confirm such collision is not possible.


src/checks/health_checker.cpp (lines 338 - 342)
<https://reviews.apache.org/r/55901/#comment236747>

    How about `checkResult = commandCheckViaAgent ? nestedCommandHealthCheck() : commandHealthCheck();` ?



src/checks/health_checker.cpp (line 555)
<https://reviews.apache.org/r/55901/#comment236753>

    s/until/until after/ ?



src/checks/health_checker.cpp (lines 565 - 567)
<https://reviews.apache.org/r/55901/#comment236752>

    This should probably be indented, no?



src/checks/health_checker.cpp (lines 575 - 580)
<https://reviews.apache.org/r/55901/#comment236757>

    After thinking some time I think I understand why we treat any HTTP status code other than 200 as health check failure. I think it still makes sense to capture this in a comment explaining that if the agent cannot launch a nested container for _any_ reason, it is considered a health check failure. We should also mention this when we update the documentaiton.



src/checks/health_checker.cpp (line 615)
<https://reviews.apache.org/r/55901/#comment236759>

    s/this future is completed//we finish processing current timeout./
    
    It's not clear what "this future" means.



src/checks/health_checker.cpp (lines 623 - 628)
<https://reviews.apache.org/r/55901/#comment236762>

    I think what you're trying to say is that it is fine to complete the future returned from this call if `waitForNestedContainer` fails, because 1) debug containers are not checkpointed and 2) `waitForNestedContainer` fails iff agent fails as well hence cleaning up the container.
    
    Could you please rephrase the comment mentioning both your expectation (no wait retry is fine) and reasoning?



src/checks/health_checker.cpp (line 635)
<https://reviews.apache.org/r/55901/#comment236774>

    Could you please specify what this function returns? Status or exit code?



src/checks/health_checker.cpp (lines 653 - 654)
<https://reviews.apache.org/r/55901/#comment236765>

    Why not `defer`?



src/checks/health_checker.cpp (lines 654 - 670)
<https://reviews.apache.org/r/55901/#comment236763>

    I would extract this lambda into a separate callback for readability.



src/checks/health_checker.cpp (line 657)
<https://reviews.apache.org/r/55901/#comment236771>

    Maybe saying "Agent returned" instead of "Received"?



src/checks/health_checker.cpp (line 666)
<https://reviews.apache.org/r/55901/#comment236764>

    please `CHECK` that `response->has_wait_nested_container()`.



src/checks/health_checker.cpp (lines 666 - 670)
<https://reviews.apache.org/r/55901/#comment236766>

    ```
      return (
        response->wait_nested_container().has_exit_status()
          ? response->wait_nested_container().exit_status()
          : None());
    ```



src/checks/health_checker.cpp (line 667)
<https://reviews.apache.org/r/55901/#comment236770>

    What do you get from the agent call: status or exit code directly? It's probably not that important here, but it is for checks.


- Alexander Rukletsov


On Feb. 8, 2017, 1:27 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 8, 2017, 1:27 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a 
>   src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.

> On March 22, 2017, 3:11 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp
> > Lines 474 (patched)
> > <https://reviews.apache.org/r/55901/diff/16/?file=1670849#file1670849line474>
> >
> >     Please escape task ids, since they are not generated by Mesos and may contain space.
> >     
> >     Also, you don't need to `stringify` `TaskID` here.

Fixed in https://reviews.apache.org/r/57854/


- Gast�n


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review169691
-----------------------------------------------------------


On March 21, 2017, 2:01 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated March 21, 2017, 2:01 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 44df544b585b8c9f1138fc69b34b064bae8cc867 
>   src/checks/health_checker.cpp a26e9b570ea3a0ee775d220a3b523ae7052dad23 
>   src/launcher/default_executor.cpp 6a885af50db6d489194cc9b780fcf1494e853694 
>   src/tests/health_check_tests.cpp fc1c8281ff8fbb76b7ac510ce36c98db4de4d171 
> 
> 
> Diff: https://reviews.apache.org/r/55901/diff/16/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux and macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.

> On March 22, 2017, 3:11 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp
> > Lines 601 (patched)
> > <https://reviews.apache.org/r/55901/diff/16/?file=1670849#file1670849line601>
> >
> >     Please avoid shadowing. `waitFuture`?

That `Future` aint need no name!


- Gast�n


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review169691
-----------------------------------------------------------


On March 21, 2017, 2:01 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated March 21, 2017, 2:01 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 44df544b585b8c9f1138fc69b34b064bae8cc867 
>   src/checks/health_checker.cpp a26e9b570ea3a0ee775d220a3b523ae7052dad23 
>   src/launcher/default_executor.cpp 6a885af50db6d489194cc9b780fcf1494e853694 
>   src/tests/health_check_tests.cpp fc1c8281ff8fbb76b7ac510ce36c98db4de4d171 
> 
> 
> Diff: https://reviews.apache.org/r/55901/diff/16/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux and macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Alexander Rukletsov <ru...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review169691
-----------------------------------------------------------


Fix it, then Ship it!





src/checks/health_checker.cpp
Lines 474 (patched)
<https://reviews.apache.org/r/55901/#comment242212>

    Please escape task ids, since they are not generated by Mesos and may contain space.
    
    Also, you don't need to `stringify` `TaskID` here.



src/checks/health_checker.cpp
Lines 601 (patched)
<https://reviews.apache.org/r/55901/#comment242213>

    Please avoid shadowing. `waitFuture`?



src/tests/health_check_tests.cpp
Lines 2191 (patched)
<https://reviews.apache.org/r/55901/#comment242223>

    This will very likely not work on Windows, which is fine for now, because the test is disabled on Windows.



src/tests/health_check_tests.cpp
Lines 2196-2199 (patched)
<https://reviews.apache.org/r/55901/#comment242217>

    I know this is what `clang-format` suggest, but I'm not sure this formatting is not common in our codebase. Moreover, it's "jagged". I'd vote for
    ```
    Environment::Variable* variable = healthCheck.mutable_command()->
      mutable_environment()->mutable_variables()->Add();
    ```


- Alexander Rukletsov


On March 21, 2017, 2:01 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated March 21, 2017, 2:01 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 44df544b585b8c9f1138fc69b34b064bae8cc867 
>   src/checks/health_checker.cpp a26e9b570ea3a0ee775d220a3b523ae7052dad23 
>   src/launcher/default_executor.cpp 6a885af50db6d489194cc9b780fcf1494e853694 
>   src/tests/health_check_tests.cpp fc1c8281ff8fbb76b7ac510ce36c98db4de4d171 
> 
> 
> Diff: https://reviews.apache.org/r/55901/diff/16/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux and macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for COMMAND health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated March 23, 2017, 11:39 p.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Summary (updated)
-----------------

Added support for COMMAND health checks to the default executor.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description
-------

Added support for command health checks to the default executor.


Diffs
-----

  src/checks/health_checker.hpp 44df544b585b8c9f1138fc69b34b064bae8cc867 
  src/checks/health_checker.cpp a26e9b570ea3a0ee775d220a3b523ae7052dad23 
  src/launcher/default_executor.cpp 6a885af50db6d489194cc9b780fcf1494e853694 
  src/tests/health_check_tests.cpp fc1c8281ff8fbb76b7ac510ce36c98db4de4d171 


Diff: https://reviews.apache.org/r/55901/diff/19/


Testing
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux and macOS.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated March 22, 2017, 6:19 p.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Changes
-------

Addresed feedback.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description (updated)
-------

Added support for command health checks to the default executor.


Diffs (updated)
-----

  src/checks/health_checker.hpp 44df544b585b8c9f1138fc69b34b064bae8cc867 
  src/checks/health_checker.cpp a26e9b570ea3a0ee775d220a3b523ae7052dad23 
  src/launcher/default_executor.cpp 6a885af50db6d489194cc9b780fcf1494e853694 
  src/tests/health_check_tests.cpp fc1c8281ff8fbb76b7ac510ce36c98db4de4d171 


Diff: https://reviews.apache.org/r/55901/diff/17/

Changes: https://reviews.apache.org/r/55901/diff/16-17/


Testing
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux and macOS.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated March 21, 2017, 2:01 p.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Changes
-------

Fixed typos, removed single quotes from log messages and fixed flaky test in macOS.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description (updated)
-------

See summary.


Diffs (updated)
-----

  src/checks/health_checker.hpp 44df544b585b8c9f1138fc69b34b064bae8cc867 
  src/checks/health_checker.cpp a26e9b570ea3a0ee775d220a3b523ae7052dad23 
  src/launcher/default_executor.cpp 6a885af50db6d489194cc9b780fcf1494e853694 
  src/tests/health_check_tests.cpp fc1c8281ff8fbb76b7ac510ce36c98db4de4d171 


Diff: https://reviews.apache.org/r/55901/diff/16/

Changes: https://reviews.apache.org/r/55901/diff/15-16/


Testing (updated)
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux and macOS.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated March 16, 2017, 2:31 p.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Changes
-------

Rebased + updated accept type.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description (updated)
-------

Added support for command health checks to the default executor.


Diffs (updated)
-----

  src/checks/health_checker.hpp 44df544b585b8c9f1138fc69b34b064bae8cc867 
  src/checks/health_checker.cpp 7efefe9455df20a038a2c9d4f67c0c0cd69c301e 
  src/launcher/default_executor.cpp cbd4f7ecd042e7fa603bd69774d95472df2c896d 
  src/tests/health_check_tests.cpp fc1c8281ff8fbb76b7ac510ce36c98db4de4d171 


Diff: https://reviews.apache.org/r/55901/diff/15/

Changes: https://reviews.apache.org/r/55901/diff/14-15/


Testing
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS, because of MESOS-7050.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review169120
-----------------------------------------------------------


Fix it, then Ship it!





src/checks/health_checker.cpp
Lines 505-506 (patched)
<https://reviews.apache.org/r/55901/#comment241442>

    We changed the `Accept` type for streaming responses recently (though the old type is still accepted for backwards compat)
    
    ```
    Accept : stringify(ContentType::RECORDIO);
    Message-Accept: stringify(ContentType::ContentType::PROTOBUF)
    ```


- Vinod Kone


On March 15, 2017, 3:05 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated March 15, 2017, 3:05 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See the summary.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp f1f2834b3429fb00cc49c179fa9a3de328f597b5 
>   src/checks/health_checker.cpp 6c97369fd9a567ba16dd92085bf142d43f71eaf1 
>   src/launcher/default_executor.cpp 0ed436faa68e984d0be4e5186138f738bc7f1b52 
>   src/tests/health_check_tests.cpp fc1c8281ff8fbb76b7ac510ce36c98db4de4d171 
> 
> 
> Diff: https://reviews.apache.org/r/55901/diff/14/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS, because of MESOS-7050.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated March 15, 2017, 3:05 p.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Changes
-------

Rebase + added a TODO.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description (updated)
-------

See the summary.


Diffs (updated)
-----

  src/checks/health_checker.hpp f1f2834b3429fb00cc49c179fa9a3de328f597b5 
  src/checks/health_checker.cpp 6c97369fd9a567ba16dd92085bf142d43f71eaf1 
  src/launcher/default_executor.cpp 0ed436faa68e984d0be4e5186138f738bc7f1b52 
  src/tests/health_check_tests.cpp fc1c8281ff8fbb76b7ac510ce36c98db4de4d171 


Diff: https://reviews.apache.org/r/55901/diff/14/

Changes: https://reviews.apache.org/r/55901/diff/13-14/


Testing
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS, because of MESOS-7050.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated March 6, 2017, 4:23 p.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Changes
-------

Rebased + removed env inheritance.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description
-------

Added support for command health checks to the default executor.


Diffs (updated)
-----

  src/checks/health_checker.hpp f1f2834b3429fb00cc49c179fa9a3de328f597b5 
  src/checks/health_checker.cpp 6c97369fd9a567ba16dd92085bf142d43f71eaf1 
  src/launcher/default_executor.cpp 0ed436faa68e984d0be4e5186138f738bc7f1b52 
  src/tests/health_check_tests.cpp 56e90747f2c943daee675738428d8ddeeafde36d 


Diff: https://reviews.apache.org/r/55901/diff/13/

Changes: https://reviews.apache.org/r/55901/diff/12-13/


Testing
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS, because of MESOS-7050.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated Feb. 17, 2017, 11:59 a.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Changes
-------

Addressed feedback.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description
-------

Added support for command health checks to the default executor.


Diffs (updated)
-----

  src/checks/health_checker.hpp f1f2834b3429fb00cc49c179fa9a3de328f597b5 
  src/checks/health_checker.cpp 6c97369fd9a567ba16dd92085bf142d43f71eaf1 
  src/launcher/default_executor.cpp d97324b0e5eb3f145ee93aa0821205c5eefbfce0 
  src/tests/health_check_tests.cpp 476e04a12f885e5060a6e838b82ee599594c96f7 

Diff: https://reviews.apache.org/r/55901/diff/


Testing
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS, because of MESOS-7050.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated Feb. 16, 2017, 2:59 p.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Changes
-------

Addresed Vinod's feedback.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description
-------

Added support for command health checks to the default executor.


Diffs (updated)
-----

  src/checks/health_checker.hpp f1f2834b3429fb00cc49c179fa9a3de328f597b5 
  src/checks/health_checker.cpp 6c97369fd9a567ba16dd92085bf142d43f71eaf1 
  src/launcher/default_executor.cpp d97324b0e5eb3f145ee93aa0821205c5eefbfce0 
  src/tests/health_check_tests.cpp 476e04a12f885e5060a6e838b82ee599594c96f7 

Diff: https://reviews.apache.org/r/55901/diff/


Testing
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS, because of MESOS-7050.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review165230
-----------------------------------------------------------



Patch looks great!

Reviews applied: [55900, 56288, 55901]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Feb. 10, 2017, 6:40 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 10, 2017, 6:40 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp f1f2834b3429fb00cc49c179fa9a3de328f597b5 
>   src/checks/health_checker.cpp a5225ff1f4b071ed4182d41fa8ecc705fa4dbe00 
>   src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a 
>   src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS, because of MESOS-7050.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.

> On Feb. 15, 2017, 9:40 p.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, line 664
> > <https://reviews.apache.org/r/55901/diff/1-10/?file=1613998#file1613998line664>
> >
> >     won't we be losing the info about why wait failed?

Yes, but the health check timed out anyway, we call `WaitNestedContainer`, only to ensure that the next `LaunchNestedContainerSession` call won't fail, so I don't think that a user would care about the details of the `WaitNestedContainer` call. We could log the response `VLOG(1)` or `VLOG(2)`, to make it easier to debug potential Mesos bugs.

In our offline discussions we also realized that reusing the `ContainerID` is not always safe, so we'll have to replace this wait call with a wait + cleanupContainer.


> On Feb. 15, 2017, 9:40 p.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, line 616
> > <https://reviews.apache.org/r/55901/diff/1-10/?file=1613998#file1613998line616>
> >
> >     is this deferred only because you want to access `taskId`? why not just pass taskId to the lambda directly and not-defer?

Because the compiler complains about not being able to capture a non-variable:

```
../../src/checks/health_checker.cpp:584:26: error: capture of non-variable \u2018mesos::internal::checks::HealthCheckerProcess::taskId\u2019
```


> On Feb. 15, 2017, 9:40 p.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, line 654
> > <https://reviews.apache.org/r/55901/diff/1-10/?file=1613998#file1613998line654>
> >
> >     do you want to add a TODO here to not re-use the ContainerID?

I don't know if we should commit this with that TODO, or if I should wait until MESOS-7120 is resolved, and then update this patch to not re-use the ContainerID.


- Gast�n


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review165630
-----------------------------------------------------------


On Feb. 10, 2017, 6:40 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 10, 2017, 6:40 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp f1f2834b3429fb00cc49c179fa9a3de328f597b5 
>   src/checks/health_checker.cpp a5225ff1f4b071ed4182d41fa8ecc705fa4dbe00 
>   src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a 
>   src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS, because of MESOS-7050.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.

> On Feb. 15, 2017, 9:40 p.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp
> > Lines 617 (patched)
> > <https://reviews.apache.org/r/55901/diff/1-10/?file=1613998#file1613998line654>
> >
> >     do you want to add a TODO here to not re-use the ContainerID?
> 
> Gast�n Kleiman wrote:
>     I don't know if we should commit this with that TODO, or if I should wait until MESOS-7120 is resolved, and then update this patch to not re-use the ContainerID.

Added a TODO and addressed it in https://reviews.apache.org/r/57647/.


- Gast�n


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review165630
-----------------------------------------------------------


On March 6, 2017, 4:23 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated March 6, 2017, 4:23 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp f1f2834b3429fb00cc49c179fa9a3de328f597b5 
>   src/checks/health_checker.cpp 6c97369fd9a567ba16dd92085bf142d43f71eaf1 
>   src/launcher/default_executor.cpp 0ed436faa68e984d0be4e5186138f738bc7f1b52 
>   src/tests/health_check_tests.cpp 56e90747f2c943daee675738428d8ddeeafde36d 
> 
> 
> Diff: https://reviews.apache.org/r/55901/diff/13/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS, because of MESOS-7050.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review165630
-----------------------------------------------------------




src/checks/health_checker.hpp (line 183)
<https://reviews.apache.org/r/55901/#comment237494>

    What is it in the case of Docker containerizer? I think it is worth mentioning.



src/checks/health_checker.cpp (line 579)
<https://reviews.apache.org/r/55901/#comment237611>

    is this deferred only because you want to access `taskId`? why not just pass taskId to the lambda directly and not-defer?



src/checks/health_checker.cpp (line 582)
<https://reviews.apache.org/r/55901/#comment237610>

    s/of a command/of command/



src/checks/health_checker.cpp (line 608)
<https://reviews.apache.org/r/55901/#comment237616>

    s/not returned/timed out/



src/checks/health_checker.cpp (line 617)
<https://reviews.apache.org/r/55901/#comment237612>

    do you want to add a TODO here to not re-use the ContainerID?



src/checks/health_checker.cpp (line 619)
<https://reviews.apache.org/r/55901/#comment237625>

    why does this wait has repair but not the one in `__nestedCommandHealthCheck` ?



src/checks/health_checker.cpp (line 620)
<https://reviews.apache.org/r/55901/#comment237613>

    s/WaitForNestedContainer/WaitNestedContainer/



src/checks/health_checker.cpp (lines 620 - 621)
<https://reviews.apache.org/r/55901/#comment237614>

    s/no matter if/irrespective of whether/



src/checks/health_checker.cpp (line 625)
<https://reviews.apache.org/r/55901/#comment237615>

    s/WaitFor/Wait/



src/checks/health_checker.cpp (line 627)
<https://reviews.apache.org/r/55901/#comment237624>

    won't we be losing the info about why wait failed?



src/tests/health_check_tests.cpp (line 2122)
<https://reviews.apache.org/r/55901/#comment237626>

    s/is/in/


- Vinod Kone


On Feb. 10, 2017, 6:40 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 10, 2017, 6:40 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp f1f2834b3429fb00cc49c179fa9a3de328f597b5 
>   src/checks/health_checker.cpp a5225ff1f4b071ed4182d41fa8ecc705fa4dbe00 
>   src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a 
>   src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS, because of MESOS-7050.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated Feb. 10, 2017, 6:40 p.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Changes
-------

Split `waitNestedContainer` into two + use defer instead of plain lambdas.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description
-------

Added support for command health checks to the default executor.


Diffs (updated)
-----

  src/checks/health_checker.hpp f1f2834b3429fb00cc49c179fa9a3de328f597b5 
  src/checks/health_checker.cpp a5225ff1f4b071ed4182d41fa8ecc705fa4dbe00 
  src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a 
  src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 

Diff: https://reviews.apache.org/r/55901/diff/


Testing
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS, because of MESOS-7050.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated Feb. 10, 2017, 11:25 a.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Changes
-------

Addressed feedback.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description
-------

Added support for command health checks to the default executor.


Diffs (updated)
-----

  src/checks/health_checker.hpp f1f2834b3429fb00cc49c179fa9a3de328f597b5 
  src/checks/health_checker.cpp a5225ff1f4b071ed4182d41fa8ecc705fa4dbe00 
  src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a 
  src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 

Diff: https://reviews.apache.org/r/55901/diff/


Testing (updated)
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS, because of MESOS-7050.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated Feb. 8, 2017, 1:27 p.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Changes
-------

Fixed broken rebase.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description
-------

Added support for command health checks to the default executor.


Diffs (updated)
-----

  src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
  src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
  src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a 
  src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 

Diff: https://reviews.apache.org/r/55901/diff/


Testing
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated Feb. 7, 2017, 11:37 p.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Changes
-------

Added a comment in the new test.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description
-------

Added support for command health checks to the default executor.


Diffs (updated)
-----

  src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
  src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
  src/launcher/default_executor.cpp 8e942c6faf6e2dbc70a6fff1629343c9a3ce8668 
  src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 

Diff: https://reviews.apache.org/r/55901/diff/


Testing
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated Feb. 7, 2017, 11:32 p.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Changes
-------

Addressed Alex's comments.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description
-------

Added support for command health checks to the default executor.


Diffs (updated)
-----

  src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
  src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
  src/launcher/default_executor.cpp 8e942c6faf6e2dbc70a6fff1629343c9a3ce8668 
  src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 

Diff: https://reviews.apache.org/r/55901/diff/


Testing
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.com>.

> On Feb. 7, 2017, 4:20 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, lines 618-624
> > <https://reviews.apache.org/r/55901/diff/5/?file=1623742#file1623742line618>
> >
> >     1) You have no interest in both `future` and `status`. Omit the names for clarity, please.
> >     
> >     2) Why do we need to call `waitForNestedContainer` here instead of simply returning `failure`?

Added a comment clarifying this.


> On Feb. 7, 2017, 4:20 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, line 653
> > <https://reviews.apache.org/r/55901/diff/5/?file=1623742#file1623742line653>
> >
> >     Don't you need to check that `wait_nested_container().exit_status()` exists? If `exit_status` is not set in the proto, don't you have to check this explicitly and return `None`?

Very good catch, thanks!


> On Feb. 7, 2017, 4:20 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, lines 585-588
> > <https://reviews.apache.org/r/55901/diff/5/?file=1623742#file1623742line585>
> >
> >     Mention `taskId`?

Updated all the comments I could find.


- Gast�n


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review164501
-----------------------------------------------------------


On Feb. 7, 2017, 11:37 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 7, 2017, 11:37 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp 8e942c6faf6e2dbc70a6fff1629343c9a3ce8668 
>   src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Alexander Rukletsov <ru...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review164501
-----------------------------------------------------------




src/checks/health_checker.cpp (line 512)
<https://reviews.apache.org/r/55901/#comment236268>

    You can use `->` for options instead of `.get()`.



src/checks/health_checker.cpp (lines 585 - 588)
<https://reviews.apache.org/r/55901/#comment236282>

    Mention `taskId`?



src/checks/health_checker.cpp (line 594)
<https://reviews.apache.org/r/55901/#comment236283>

    How about "WAIT_NESTED_CONTAINER API call..."



src/checks/health_checker.cpp (lines 618 - 624)
<https://reviews.apache.org/r/55901/#comment236279>

    1) You have no interest in both `future` and `status`. Omit the names for clarity, please.
    
    2) Why do we need to call `waitForNestedContainer` here instead of simply returning `failure`?



src/checks/health_checker.cpp (lines 640 - 641)
<https://reviews.apache.org/r/55901/#comment236278>

    Is this formatting correct? Is it what clang-format suggests?



src/checks/health_checker.cpp (line 646)
<https://reviews.apache.org/r/55901/#comment236280>

    "... while waiting..."?



src/checks/health_checker.cpp (line 653)
<https://reviews.apache.org/r/55901/#comment236281>

    Don't you need to check that `wait_nested_container().exit_status()` exists? If `exit_status` is not set in the proto, don't you have to check this explicitly and return `None`?


- Alexander Rukletsov


On Feb. 3, 2017, 7:52 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2017, 7:52 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
>   src/tests/health_check_tests.cpp 8418cd91484fd26734de16255b37f3ebf574f5eb 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated Feb. 3, 2017, 7:52 p.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Changes
-------

Split into two patches.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description
-------

Added support for command health checks to the default executor.


Diffs (updated)
-----

  src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
  src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
  src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
  src/tests/health_check_tests.cpp 8418cd91484fd26734de16255b37f3ebf574f5eb 

Diff: https://reviews.apache.org/r/55901/diff/


Testing
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated Feb. 3, 2017, 5:15 p.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Changes
-------

Addressed feedback.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description
-------

Added support for command health checks to the default executor.


Diffs (updated)
-----

  src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
  src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
  src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
  src/tests/health_check_tests.cpp 8418cd91484fd26734de16255b37f3ebf574f5eb 

Diff: https://reviews.apache.org/r/55901/diff/


Testing
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Alexander Rukletsov <ru...@gmail.com>.

> On Feb. 3, 2017, 1:34 p.m., Alexander Rukletsov wrote:
> > src/tests/health_check_tests.cpp, lines 2154-2155
> > <https://reviews.apache.org/r/55901/diff/3/?file=1622284#file1622284line2154>
> >
> >     Formatting.
> >     
> >     Also it is probably a good idea to explicitly say you don't care about further updates.
> 
> Gast�n Kleiman wrote:
>     Fixed the formatting. We don't have such a comment in most of the other tests in this file.

Oh, under say I mean `.WillRepeatedly(Return());`


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review164117
-----------------------------------------------------------


On Feb. 3, 2017, 7:52 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2017, 7:52 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
>   src/tests/health_check_tests.cpp 8418cd91484fd26734de16255b37f3ebf574f5eb 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.com>.

> On Feb. 3, 2017, 1:34 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, line 130
> > <https://reviews.apache.org/r/55901/diff/3/?file=1622282#file1622282line130>
> >
> >     s/createRequest/createV1AgentAPIRequest?
> >     remove `inline`
> >     
> >     This looks like a utility function not really related to the health checker library. Maybe put it into "src/slave/api_utils.{hpp|cpp}"?
> 
> Gast�n Kleiman wrote:
>     I renamed the function. Vinod suggested to make it `inline` what's wrong with that?
>     
>     This function as-is is specific to how we perform requests in this file (`keep_alive = false`, protobuf serialization). Since it is used only in two places, I'm tempted to removed it and put the code inline in the corresponding methods.

I removed these functions.


- Gast�n


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review164117
-----------------------------------------------------------


On Feb. 7, 2017, 11:37 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 7, 2017, 11:37 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp 8e942c6faf6e2dbc70a6fff1629343c9a3ce8668 
>   src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Alexander Rukletsov <ru...@gmail.com>.

> On Feb. 3, 2017, 1:34 p.m., Alexander Rukletsov wrote:
> > src/tests/health_check_tests.cpp, lines 2118-2119
> > <https://reviews.apache.org/r/55901/diff/3/?file=1622284#file1622284line2118>
> >
> >     Why do we need to expilictly create `containerizer`?
> 
> Gast�n Kleiman wrote:
>     Because the implicit `containerizer` is started in local mode. `LaunchNestedContainerSession` (used by cmd health checks) tries to start a IO switchboard, which doesn't work in local mode yet.

Please capture this in a comment.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review164117
-----------------------------------------------------------


On Feb. 3, 2017, 7:52 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2017, 7:52 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
>   src/tests/health_check_tests.cpp 8418cd91484fd26734de16255b37f3ebf574f5eb 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.com>.

> On Feb. 3, 2017, 1:34 p.m., Alexander Rukletsov wrote:
> > src/tests/health_check_tests.cpp, lines 2118-2119
> > <https://reviews.apache.org/r/55901/diff/3/?file=1622284#file1622284line2118>
> >
> >     Why do we need to expilictly create `containerizer`?

Because the implicit `containerizer` is started in local mode. `LaunchNestedContainerSession` (used by cmd health checks) tries to start a IO switchboard, which doesn't work in local mode yet.


> On Feb. 3, 2017, 1:34 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.hpp, line 207
> > <https://reviews.apache.org/r/55901/diff/3/?file=1622281#file1622281line207>
> >
> >     How about calling it `commandCheckViaAgent`?

Good idea, done.


> On Feb. 3, 2017, 1:34 p.m., Alexander Rukletsov wrote:
> > src/checks/health_checker.cpp, line 130
> > <https://reviews.apache.org/r/55901/diff/3/?file=1622282#file1622282line130>
> >
> >     s/createRequest/createV1AgentAPIRequest?
> >     remove `inline`
> >     
> >     This looks like a utility function not really related to the health checker library. Maybe put it into "src/slave/api_utils.{hpp|cpp}"?

I renamed the function. Vinod suggested to make it `inline` what's wrong with that?

This function as-is is specific to how we perform requests in this file (`keep_alive = false`, protobuf serialization). Since it is used only in two places, I'm tempted to removed it and put the code inline in the corresponding methods.


> On Feb. 3, 2017, 1:34 p.m., Alexander Rukletsov wrote:
> > src/tests/health_check_tests.cpp, line 2107
> > <https://reviews.apache.org/r/55901/diff/3/?file=1622284#file1622284line2107>
> >
> >     Why not spelling out `cmd`?

Renamed.


> On Feb. 3, 2017, 1:34 p.m., Alexander Rukletsov wrote:
> > src/tests/health_check_tests.cpp, lines 2154-2155
> > <https://reviews.apache.org/r/55901/diff/3/?file=1622284#file1622284line2154>
> >
> >     Formatting.
> >     
> >     Also it is probably a good idea to explicitly say you don't care about further updates.

Fixed the formatting. We don't have such a comment in most of the other tests in this file.


- Gast�n


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review164117
-----------------------------------------------------------


On Feb. 3, 2017, 5:15 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2017, 5:15 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
>   src/tests/health_check_tests.cpp 8418cd91484fd26734de16255b37f3ebf574f5eb 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Alexander Rukletsov <ru...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review164117
-----------------------------------------------------------




src/checks/health_checker.hpp (line 94)
<https://reviews.apache.org/r/55901/#comment235698>

    Could you please add a todo here saying this will go away and link a jira ticket?



src/checks/health_checker.hpp (line 207)
<https://reviews.apache.org/r/55901/#comment235702>

    How about calling it `commandCheckViaAgent`?



src/checks/health_checker.cpp (line 130)
<https://reviews.apache.org/r/55901/#comment235700>

    s/createRequest/createV1AgentAPIRequest?
    remove `inline`
    
    This looks like a utility function not really related to the health checker library. Maybe put it into "src/slave/api_utils.{hpp|cpp}"?



src/checks/health_checker.cpp (lines 146 - 170)
<https://reviews.apache.org/r/55901/#comment235701>

    What is the value of using these helpers? I'd say having `connection.send(createRequest(url, call), false);` is clearer and self explanatory rather than `post(connection, url, call)`. And you don't need any comments then : )



src/checks/health_checker.cpp (line 224)
<https://reviews.apache.org/r/55901/#comment235699>

    You can use `{}` instead.



src/checks/health_checker.cpp (lines 434 - 436)
<https://reviews.apache.org/r/55901/#comment235703>

    Use `CHECK_SOME` instead.



src/checks/health_checker.cpp (lines 456 - 482)
<https://reviews.apache.org/r/55901/#comment235697>

    Could you please link a JIRA here?



src/checks/health_checker.cpp (line 593)
<https://reviews.apache.org/r/55901/#comment235704>

    Please keep the order of the definitions in sync with the order of declarations.



src/checks/health_checker.cpp (line 746)
<https://reviews.apache.org/r/55901/#comment235705>

    Let's pull this (together with another below) into a separate patch.



src/tests/health_check_tests.cpp (line 2107)
<https://reviews.apache.org/r/55901/#comment235706>

    Why not spelling out `cmd`?



src/tests/health_check_tests.cpp (lines 2118 - 2119)
<https://reviews.apache.org/r/55901/#comment235707>

    Why do we need to expilictly create `containerizer`?



src/tests/health_check_tests.cpp (lines 2154 - 2155)
<https://reviews.apache.org/r/55901/#comment235708>

    Formatting.
    
    Also it is probably a good idea to explicitly say you don't care about further updates.



src/tests/health_check_tests.cpp (lines 2157 - 2161)
<https://reviews.apache.org/r/55901/#comment235709>

    1. I think you can also check that task's env vars are available in the health check -> pass exit status in the task's command.
    2. Let's avoid using obscure `populateTasks()` here. We'll get rid of it altogether in the nearest future.



src/tests/health_check_tests.cpp (lines 2185 - 2186)
<https://reviews.apache.org/r/55901/#comment235710>

    check that `healthy` is set please.


- Alexander Rukletsov


On Feb. 2, 2017, 5:02 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 2, 2017, 5:02 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
>   src/tests/health_check_tests.cpp 8418cd91484fd26734de16255b37f3ebf574f5eb 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated Feb. 2, 2017, 5:02 p.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Changes
-------

Addressed feedback + rebase.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description
-------

Added support for command health checks to the default executor.


Diffs (updated)
-----

  src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
  src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
  src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
  src/tests/health_check_tests.cpp 8418cd91484fd26734de16255b37f3ebf574f5eb 

Diff: https://reviews.apache.org/r/55901/diff/


Testing
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/
-----------------------------------------------------------

(Updated Jan. 28, 2017, 12:39 p.m.)


Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.


Changes
-------

Addressed Vinod's feedback.


Summary (updated)
-----------------

Added support for command health checks to the default executor.


Bugs: MESOS-6280
    https://issues.apache.org/jira/browse/MESOS-6280


Repository: mesos


Description (updated)
-------

Added support for command health checks to the default executor.


Diffs (updated)
-----

  src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
  src/checks/health_checker.cpp e70bd7936752613a4f92c70c4c61cd7cdf7c4ee5 
  src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
  src/tests/health_check_tests.cpp 710cb66eff6c4447caa22772f0cdc97cfa582c50 

Diff: https://reviews.apache.org/r/55901/diff/


Testing
-------

Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.


Thanks,

Gast�n Kleiman


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Alexander Rukletsov <ru...@gmail.com>.

> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, lines 426-429
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613998#file1613998line426>
> >
> >     why a `.repair()` here?
> 
> Gast�n Kleiman wrote:
>     To fail the future/check with a nice descriptive message that will later be logged.
> 
> Vinod Kone wrote:
>     The message returned by `http::connect` should be good enough? Do we use this pattern elsewhere esp. with connect? Most uses of repair I have seen in the code base, transform the future not really to add extra logging information.

We use `.repair` for adjusting the message or changing the error type, which is technically the same. Here are some examples from the code:
https://github.com/apache/mesos/blob/25f4feae487d53a701adb787fd8a2e5f6166b789/3rdparty/libprocess/src/http.cpp#L1766-L1769
https://github.com/apache/mesos/blob/25f4feae487d53a701adb787fd8a2e5f6166b789/src/master/http.cpp#L4695-L4697
https://github.com/apache/mesos/blob/25f4feae487d53a701adb787fd8a2e5f6166b789/src/slave/containerizer/mesos/io/switchboard.cpp#L1632-L1638

Now, the question is whether connection failure message is good enough? Gast�n, could you trigger a failure path and check what error message will be returned?


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review163363
-----------------------------------------------------------


On Feb. 3, 2017, 7:52 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2017, 7:52 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
>   src/tests/health_check_tests.cpp 8418cd91484fd26734de16255b37f3ebf574f5eb 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Alexander Rukletsov <ru...@gmail.com>.

> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, lines 426-429
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613998#file1613998line426>
> >
> >     why a `.repair()` here?
> 
> Gast�n Kleiman wrote:
>     To fail the future/check with a nice descriptive message that will later be logged.
> 
> Vinod Kone wrote:
>     The message returned by `http::connect` should be good enough? Do we use this pattern elsewhere esp. with connect? Most uses of repair I have seen in the code base, transform the future not really to add extra logging information.
> 
> Alexander Rukletsov wrote:
>     We use `.repair` for adjusting the message or changing the error type, which is technically the same. Here are some examples from the code:
>     https://github.com/apache/mesos/blob/25f4feae487d53a701adb787fd8a2e5f6166b789/3rdparty/libprocess/src/http.cpp#L1766-L1769
>     https://github.com/apache/mesos/blob/25f4feae487d53a701adb787fd8a2e5f6166b789/src/master/http.cpp#L4695-L4697
>     https://github.com/apache/mesos/blob/25f4feae487d53a701adb787fd8a2e5f6166b789/src/slave/containerizer/mesos/io/switchboard.cpp#L1632-L1638
>     
>     Now, the question is whether connection failure message is good enough? Gast�n, could you trigger a failure path and check what error message will be returned?
> 
> Gast�n Kleiman wrote:
>     This is how it looks like with the repair:
>     
>     ```
>     W0207 10:42:19.659122  9361 health_checker.cpp:314] Health check failed 1 times consecutively: COMMAND health check failed: Unable to establish connection with the agent: Failed to connect to 192.99.40.208:31338: Connection refused
>     ```
>     
>     Without the repair it would look like this:
>     
>     ```
>     W0207 10:42:19.659122  9361 health_checker.cpp:314] Health check failed 1 times consecutively: COMMAND health check failed: Failed to connect to 192.99.40.208:31338: Connection refused
>     ```

I'd vote for keeping `.repair`.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review163363
-----------------------------------------------------------


On Feb. 8, 2017, 1:27 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 8, 2017, 1:27 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a 
>   src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.com>.

> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, line 480
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613998#file1613998line480>
> >
> >     launch nested container session returns a streaming response, how come you are calling `post()` helper here which expects a non-streaming response?
> >     
> >     probably one of the reasons why your test is hanging.

The `post()` helper delegates to `process::http::request()`, which takes boolean flag (`streamedResponse`). If this flag is set to `false`, libprocess will convert a PIPE (streaming) response into a BODY (non-streaming) response. This means that the `Future` returned by the helper will not be completed until the server closes the connection.

Relevant links:
 - https://github.com/apache/mesos/blob/2d0195eed54feac41485fb1503dc4004e5500c81/3rdparty/libprocess/include/process/http.hpp#L953-L955
 - https://github.com/apache/mesos/blob/2d0195eed54feac41485fb1503dc4004e5500c81/3rdparty/libprocess/src/http.cpp#L1191-L1197
 - https://github.com/apache/mesos/blob/2d0195eed54feac41485fb1503dc4004e5500c81/3rdparty/libprocess/src/http.cpp#L1005-L1022

In these particular cases, it means that the `Future` will not be completed until the container exits, which is exactly what we need.


Regarding the test hanging in Linux, some further debugging seem to indicate that test hangs on `process::Clock::settle()`, because there's a race + deadlock in `RateLimiter` that leaves a process stuck in `RUNNING`. I'll dig deeper on Monday, but here's some evidence:

# Snippet from a successful test run that "leaks" a running process

```
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from HealthCheckTest
[ RUN      ] HealthCheckTest.DefaultExecutorCmdHealthCheck

[...]

E0127 11:18:06.607446  8665 limiter.hpp:123] !!!! LIMITER _acquire
E0127 11:18:06.607471  8665 limiter.hpp:129] !!!! LIMITER There are 1 promises
E0127 11:18:06.608064  8665 limiter.hpp:186] !!!! LIMITER destroy

**** DEADLOCK DETECTED! ****
You are waiting on process __limiter__(1419)@192.99.40.208:41947 that it is currently executing.

[...]

[       OK ] HealthCheckTest.DefaultExecutorCmdHealthCheck (592 ms)
[----------] 1 test from HealthCheckTest (593 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (603 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 474) . . .
```

# Snippet from a hung run

```
[...]
E0127 11:18:06.878636  8640 cluster.cpp:358] !!! Settling the clock
E0127 11:18:06.878646  8640 process.cpp:3491] !!! Attempting to acquire mutex
E0127 11:18:06.878654  8640 process.cpp:3494] !!! !runq.empty()?
E0127 11:18:06.878659  8640 process.cpp:3502] !!! running.load() > 0?
E0127 11:18:06.878667  8640 process.cpp:3504] !!! 1 processes still running
E0127 11:18:06.878676  8640 process.cpp:3509] !!!! Process: __authentication_router__(1) state: 3
E0127 11:18:06.878684  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1893) state: 3
E0127 11:18:06.878691  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1894) state: 3
E0127 11:18:06.878697  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1895) state: 3
E0127 11:18:06.878705  8640 process.cpp:3509] !!!! Process: __gc__ state: 3
E0127 11:18:06.878711  8640 process.cpp:3509] !!!! Process: __limiter__(1419) state: 2
E0127 11:18:06.878718  8640 process.cpp:3509] !!!! Process: __processes__ state: 3
E0127 11:18:06.878725  8640 process.cpp:3509] !!!! Process: __reaper__(1) state: 3
E0127 11:18:06.878731  8640 process.cpp:3509] !!!! Process: crammd5-authenticator(474) state: 3
E0127 11:18:06.878738  8640 process.cpp:3509] !!!! Process: files state: 3
E0127 11:18:06.878744  8640 process.cpp:3509] !!!! Process: help state: 3
E0127 11:18:06.878751  8640 process.cpp:3509] !!!! Process: hierarchical-allocator(474) state: 3
E0127 11:18:06.878757  8640 process.cpp:3509] !!!! Process: in-memory-storage(474) state: 3
E0127 11:18:06.878764  8640 process.cpp:3509] !!!! Process: local-authorizer(947) state: 3
E0127 11:18:06.878772  8640 process.cpp:3509] !!!! Process: logging state: 3
E0127 11:18:06.878777  8640 process.cpp:3509] !!!! Process: master state: 3
E0127 11:18:06.878792  8640 process.cpp:3509] !!!! Process: metrics state: 3
E0127 11:18:06.878800  8640 process.cpp:3509] !!!! Process: profiler state: 3
E0127 11:18:06.878806  8640 process.cpp:3509] !!!! Process: registrar(474) state: 3
E0127 11:18:06.878813  8640 process.cpp:3509] !!!! Process: standalone-master-detector(1420) state: 3
E0127 11:18:06.878820  8640 process.cpp:3509] !!!! Process: system state: 3
E0127 11:18:06.878828  8640 process.cpp:3509] !!!! Process: version state: 3
E0127 11:18:06.878834  8640 process.cpp:3509] !!!! Process: whitelist(474) state: 3
E0127 11:18:06.878840  8640 process.cpp:3491] !!! Attempting to acquire mutex
E0127 11:18:06.878846  8640 process.cpp:3494] !!! !runq.empty()?
E0127 11:18:06.878852  8640 process.cpp:3502] !!! running.load() > 0?
E0127 11:18:06.878859  8640 process.cpp:3504] !!! 1 processes still running
E0127 11:18:06.878867  8640 process.cpp:3509] !!!! Process: __authentication_router__(1) state: 3
E0127 11:18:06.878873  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1893) state: 3
E0127 11:18:06.878880  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1894) state: 3
E0127 11:18:06.878887  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1895) state: 3
E0127 11:18:06.878895  8640 process.cpp:3509] !!!! Process: __gc__ state: 3
E0127 11:18:06.878901  8640 process.cpp:3509] !!!! Process: __limiter__(1419) state: 2
E0127 11:18:06.878907  8640 process.cpp:3509] !!!! Process: __processes__ state: 3
E0127 11:18:06.878913  8640 process.cpp:3509] !!!! Process: __reaper__(1) state: 3
E0127 11:18:06.878921  8640 process.cpp:3509] !!!! Process: crammd5-authenticator(474) state: 3
E0127 11:18:06.878927  8640 process.cpp:3509] !!!! Process: files state: 3
E0127 11:18:06.878933  8640 process.cpp:3509] !!!! Process: help state: 3
E0127 11:18:06.878940  8640 process.cpp:3509] !!!! Process: hierarchical-allocator(474) state: 3
E0127 11:18:06.878947  8640 process.cpp:3509] !!!! Process: in-memory-storage(474) state: 3
E0127 11:18:06.878953  8640 process.cpp:3509] !!!! Process: local-authorizer(947) state: 3
E0127 11:18:06.878960  8640 process.cpp:3509] !!!! Process: logging state: 3
E0127 11:18:06.878967  8640 process.cpp:3509] !!!! Process: master state: 3
E0127 11:18:06.878973  8640 process.cpp:3509] !!!! Process: metrics state: 3
E0127 11:18:06.878979  8640 process.cpp:3509] !!!! Process: profiler state: 3
E0127 11:18:06.878986  8640 process.cpp:3509] !!!! Process: registrar(474) state: 3
E0127 11:18:06.878993  8640 process.cpp:3509] !!!! Process: standalone-master-detector(1420) state: 3
E0127 11:18:06.878999  8640 process.cpp:3509] !!!! Process: system state: 3
E0127 11:18:06.879006  8640 process.cpp:3509] !!!! Process: version state: 3
E0127 11:18:06.879014  8640 process.cpp:3509] !!!! Process: whitelist(474) state: 3
[...] (repeats ad nauseam)
```


> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, lines 481-485
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613998#file1613998line481>
> >
> >     The identation seems wrong? do we do it like this elsehwhere?
> >     
> >     i would've done so
> >     
> >     ```
> >     .after(checkTimeout,
> >            defer(self(),
> >                  ...)
> >     ```

This was what `clang-format` suggested, but I adopted your suggestion.


> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, line 531
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613998#file1613998line531>
> >
> >     s/has not returned/timed out/

I had taken the failure message from the "normal" cmd check code. I'm updating both.


> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, lines 534-536
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613998#file1613998line534>
> >
> >     why repair?

To fail the future/check with a nice descriptive message that will later be logged.


> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, line 538
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613998#file1613998line538>
> >
> >     why not just return failure?

The compiler complains if I do that, so I have to choose between:

```
    .then([failure](const Option<int>& status) -> Future<Response> {
        return failure;
    });
```

And:

```
    .then([failure](const Option<int>& status) {
        return Future<Response>(failure);
    });
```

Which one do we prefer?


> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, lines 426-429
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613998#file1613998line426>
> >
> >     why a `.repair()` here?

To fail the future/check with a nice descriptive message that will later be logged.


> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, line 435
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613998#file1613998line435>
> >
> >     const & ?

This connection is passed to `nestedCommandHealthCheckTimedOut()`, which calls the non-const method `connection.disconnect()`.


- Gast�n


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review163363
-----------------------------------------------------------


On Jan. 28, 2017, 12:39 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Jan. 28, 2017, 12:39 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp e70bd7936752613a4f92c70c4c61cd7cdf7c4ee5 
>   src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
>   src/tests/health_check_tests.cpp 710cb66eff6c4447caa22772f0cdc97cfa582c50 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Vinod Kone <vi...@gmail.com>.

> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, lines 426-429
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613998#file1613998line426>
> >
> >     why a `.repair()` here?
> 
> Gast�n Kleiman wrote:
>     To fail the future/check with a nice descriptive message that will later be logged.

The message returned by `http::connect` should be good enough? Do we use this pattern elsewhere esp. with connect? Most uses of repair I have seen in the code base, transform the future not really to add extra logging information.


> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.hpp, line 137
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613997#file1613997line137>
> >
> >     we don't typically do `const` for POD types.
> >     
> >     s/_agentSpawnsCommandContainer/
> 
> Gast�n Kleiman wrote:
>     Looks lke the regex was truncated ;-).

How about "localCommandCheck" (vs "remoteCommandCheck")?


> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, line 480
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613998#file1613998line480>
> >
> >     launch nested container session returns a streaming response, how come you are calling `post()` helper here which expects a non-streaming response?
> >     
> >     probably one of the reasons why your test is hanging.
> 
> Gast�n Kleiman wrote:
>     The `post()` helper delegates to `process::http::request()`, which takes boolean flag (`streamedResponse`). If this flag is set to `false`, libprocess will convert a PIPE (streaming) response into a BODY (non-streaming) response. This means that the `Future` returned by the helper will not be completed until the server closes the connection.
>     
>     Relevant links:
>      - https://github.com/apache/mesos/blob/2d0195eed54feac41485fb1503dc4004e5500c81/3rdparty/libprocess/include/process/http.hpp#L953-L955
>      - https://github.com/apache/mesos/blob/2d0195eed54feac41485fb1503dc4004e5500c81/3rdparty/libprocess/src/http.cpp#L1191-L1197
>      - https://github.com/apache/mesos/blob/2d0195eed54feac41485fb1503dc4004e5500c81/3rdparty/libprocess/src/http.cpp#L1005-L1022
>     
>     In these particular cases, it means that the `Future` will not be completed until the container exits, which is exactly what we need.
>     
>     
>     Regarding the test hanging in Linux, some further debugging seem to indicate that test hangs on `process::Clock::settle()`, because there's a race + deadlock in `RateLimiter` that leaves a process stuck in `RUNNING`. I'll dig deeper on Monday, but here's some evidence:
>     
>     # Snippet from a successful test run that "leaks" a running process
>     
>     ```
>     [==========] Running 1 test from 1 test case.
>     [----------] Global test environment set-up.
>     [----------] 1 test from HealthCheckTest
>     [ RUN      ] HealthCheckTest.DefaultExecutorCmdHealthCheck
>     
>     [...]
>     
>     E0127 11:18:06.607446  8665 limiter.hpp:123] !!!! LIMITER _acquire
>     E0127 11:18:06.607471  8665 limiter.hpp:129] !!!! LIMITER There are 1 promises
>     E0127 11:18:06.608064  8665 limiter.hpp:186] !!!! LIMITER destroy
>     
>     **** DEADLOCK DETECTED! ****
>     You are waiting on process __limiter__(1419)@192.99.40.208:41947 that it is currently executing.
>     
>     [...]
>     
>     [       OK ] HealthCheckTest.DefaultExecutorCmdHealthCheck (592 ms)
>     [----------] 1 test from HealthCheckTest (593 ms total)
>     
>     [----------] Global test environment tear-down
>     [==========] 1 test from 1 test case ran. (603 ms total)
>     [  PASSED  ] 1 test.
>     
>     Repeating all tests (iteration 474) . . .
>     ```
>     
>     # Snippet from a hung run
>     
>     ```
>     [...]
>     E0127 11:18:06.878636  8640 cluster.cpp:358] !!! Settling the clock
>     E0127 11:18:06.878646  8640 process.cpp:3491] !!! Attempting to acquire mutex
>     E0127 11:18:06.878654  8640 process.cpp:3494] !!! !runq.empty()?
>     E0127 11:18:06.878659  8640 process.cpp:3502] !!! running.load() > 0?
>     E0127 11:18:06.878667  8640 process.cpp:3504] !!! 1 processes still running
>     E0127 11:18:06.878676  8640 process.cpp:3509] !!!! Process: __authentication_router__(1) state: 3
>     E0127 11:18:06.878684  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1893) state: 3
>     E0127 11:18:06.878691  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1894) state: 3
>     E0127 11:18:06.878697  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1895) state: 3
>     E0127 11:18:06.878705  8640 process.cpp:3509] !!!! Process: __gc__ state: 3
>     E0127 11:18:06.878711  8640 process.cpp:3509] !!!! Process: __limiter__(1419) state: 2
>     E0127 11:18:06.878718  8640 process.cpp:3509] !!!! Process: __processes__ state: 3
>     E0127 11:18:06.878725  8640 process.cpp:3509] !!!! Process: __reaper__(1) state: 3
>     E0127 11:18:06.878731  8640 process.cpp:3509] !!!! Process: crammd5-authenticator(474) state: 3
>     E0127 11:18:06.878738  8640 process.cpp:3509] !!!! Process: files state: 3
>     E0127 11:18:06.878744  8640 process.cpp:3509] !!!! Process: help state: 3
>     E0127 11:18:06.878751  8640 process.cpp:3509] !!!! Process: hierarchical-allocator(474) state: 3
>     E0127 11:18:06.878757  8640 process.cpp:3509] !!!! Process: in-memory-storage(474) state: 3
>     E0127 11:18:06.878764  8640 process.cpp:3509] !!!! Process: local-authorizer(947) state: 3
>     E0127 11:18:06.878772  8640 process.cpp:3509] !!!! Process: logging state: 3
>     E0127 11:18:06.878777  8640 process.cpp:3509] !!!! Process: master state: 3
>     E0127 11:18:06.878792  8640 process.cpp:3509] !!!! Process: metrics state: 3
>     E0127 11:18:06.878800  8640 process.cpp:3509] !!!! Process: profiler state: 3
>     E0127 11:18:06.878806  8640 process.cpp:3509] !!!! Process: registrar(474) state: 3
>     E0127 11:18:06.878813  8640 process.cpp:3509] !!!! Process: standalone-master-detector(1420) state: 3
>     E0127 11:18:06.878820  8640 process.cpp:3509] !!!! Process: system state: 3
>     E0127 11:18:06.878828  8640 process.cpp:3509] !!!! Process: version state: 3
>     E0127 11:18:06.878834  8640 process.cpp:3509] !!!! Process: whitelist(474) state: 3
>     E0127 11:18:06.878840  8640 process.cpp:3491] !!! Attempting to acquire mutex
>     E0127 11:18:06.878846  8640 process.cpp:3494] !!! !runq.empty()?
>     E0127 11:18:06.878852  8640 process.cpp:3502] !!! running.load() > 0?
>     E0127 11:18:06.878859  8640 process.cpp:3504] !!! 1 processes still running
>     E0127 11:18:06.878867  8640 process.cpp:3509] !!!! Process: __authentication_router__(1) state: 3
>     E0127 11:18:06.878873  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1893) state: 3
>     E0127 11:18:06.878880  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1894) state: 3
>     E0127 11:18:06.878887  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1895) state: 3
>     E0127 11:18:06.878895  8640 process.cpp:3509] !!!! Process: __gc__ state: 3
>     E0127 11:18:06.878901  8640 process.cpp:3509] !!!! Process: __limiter__(1419) state: 2
>     E0127 11:18:06.878907  8640 process.cpp:3509] !!!! Process: __processes__ state: 3
>     E0127 11:18:06.878913  8640 process.cpp:3509] !!!! Process: __reaper__(1) state: 3
>     E0127 11:18:06.878921  8640 process.cpp:3509] !!!! Process: crammd5-authenticator(474) state: 3
>     E0127 11:18:06.878927  8640 process.cpp:3509] !!!! Process: files state: 3
>     E0127 11:18:06.878933  8640 process.cpp:3509] !!!! Process: help state: 3
>     E0127 11:18:06.878940  8640 process.cpp:3509] !!!! Process: hierarchical-allocator(474) state: 3
>     E0127 11:18:06.878947  8640 process.cpp:3509] !!!! Process: in-memory-storage(474) state: 3
>     E0127 11:18:06.878953  8640 process.cpp:3509] !!!! Process: local-authorizer(947) state: 3
>     E0127 11:18:06.878960  8640 process.cpp:3509] !!!! Process: logging state: 3
>     E0127 11:18:06.878967  8640 process.cpp:3509] !!!! Process: master state: 3
>     E0127 11:18:06.878973  8640 process.cpp:3509] !!!! Process: metrics state: 3
>     E0127 11:18:06.878979  8640 process.cpp:3509] !!!! Process: profiler state: 3
>     E0127 11:18:06.878986  8640 process.cpp:3509] !!!! Process: registrar(474) state: 3
>     E0127 11:18:06.878993  8640 process.cpp:3509] !!!! Process: standalone-master-detector(1420) state: 3
>     E0127 11:18:06.878999  8640 process.cpp:3509] !!!! Process: system state: 3
>     E0127 11:18:06.879006  8640 process.cpp:3509] !!!! Process: version state: 3
>     E0127 11:18:06.879014  8640 process.cpp:3509] !!!! Process: whitelist(474) state: 3
>     [...] (repeats ad nauseam)
>     ```
> 
> Gast�n Kleiman wrote:
>     See https://issues.apache.org/jira/browse/MESOS-7036 for an in-depth analysis of the deadlock.

This needs a giant comment then to explain the rationale for posterity.


> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, line 538
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613998#file1613998line538>
> >
> >     why not just return failure?
> 
> Gast�n Kleiman wrote:
>     The compiler complains if I do that, so I have to choose between:
>     
>     ```
>         .then([failure](const Option<int>& status) -> Future<Response> {
>             return failure;
>         });
>     ```
>     
>     And:
>     
>     ```
>         .then([failure](const Option<int>& status) {
>             return Future<Response>(failure);
>         });
>     ```
>     
>     Which one do we prefer?

whichever one you pick I would like to see both `.repair` and `.then` be consistent if possible.


> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, line 435
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613998#file1613998line435>
> >
> >     const & ?
> 
> Gast�n Kleiman wrote:
>     This connection is passed to `nestedCommandHealthCheckTimedOut()`, which calls the non-const method `connection.disconnect()`.

I see.


> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, lines 534-536
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613998#file1613998line534>
> >
> >     why repair?
> 
> Gast�n Kleiman wrote:
>     To fail the future/check with a nice descriptive message that will later be logged.

see above.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review163363
-----------------------------------------------------------


On Jan. 28, 2017, 12:39 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Jan. 28, 2017, 12:39 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp e70bd7936752613a4f92c70c4c61cd7cdf7c4ee5 
>   src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
>   src/tests/health_check_tests.cpp 710cb66eff6c4447caa22772f0cdc97cfa582c50 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.io>.

> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.hpp, line 137
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613997#file1613997line137>
> >
> >     we don't typically do `const` for POD types.
> >     
> >     s/_agentSpawnsCommandContainer/
> 
> Gast�n Kleiman wrote:
>     Looks lke the regex was truncated ;-).
> 
> Vinod Kone wrote:
>     How about "localCommandCheck" (vs "remoteCommandCheck")?
> 
> Alexander Rukletsov wrote:
>     We are not consistent about `const` that's true, but I think we are starting to use it more and more often. Unfortunately, I don't have extensive data at hand, here are some links:
>     https://github.com/apache/mesos/blob/30f7f3e1965885406d6d3e49348bee7e7d0df9d5/src/slave/slave.hpp#L970
>     https://github.com/apache/mesos/blob/30f7f3e1965885406d6d3e49348bee7e7d0df9d5/src/slave/containerizer/mesos/isolators/volume/sandbox_path.hpp#L55
> 
> Vinod Kone wrote:
>     Making a POD member variable const is different from making it a const in function argument. We definitely do the former but not the latter IIRC.
> 
> Alexander Rukletsov wrote:
>     Agree.

I'm going with `commandCheckViaAgent` and not marking it `const`.


- Gast�n


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review163363
-----------------------------------------------------------


On Feb. 8, 2017, 1:27 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 8, 2017, 1:27 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a 
>   src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.com>.

> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, lines 426-429
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613998#file1613998line426>
> >
> >     why a `.repair()` here?
> 
> Gast�n Kleiman wrote:
>     To fail the future/check with a nice descriptive message that will later be logged.
> 
> Vinod Kone wrote:
>     The message returned by `http::connect` should be good enough? Do we use this pattern elsewhere esp. with connect? Most uses of repair I have seen in the code base, transform the future not really to add extra logging information.
> 
> Alexander Rukletsov wrote:
>     We use `.repair` for adjusting the message or changing the error type, which is technically the same. Here are some examples from the code:
>     https://github.com/apache/mesos/blob/25f4feae487d53a701adb787fd8a2e5f6166b789/3rdparty/libprocess/src/http.cpp#L1766-L1769
>     https://github.com/apache/mesos/blob/25f4feae487d53a701adb787fd8a2e5f6166b789/src/master/http.cpp#L4695-L4697
>     https://github.com/apache/mesos/blob/25f4feae487d53a701adb787fd8a2e5f6166b789/src/slave/containerizer/mesos/io/switchboard.cpp#L1632-L1638
>     
>     Now, the question is whether connection failure message is good enough? Gast�n, could you trigger a failure path and check what error message will be returned?

This is how it looks like with the repair:

```
W0207 10:42:19.659122  9361 health_checker.cpp:314] Health check failed 1 times consecutively: COMMAND health check failed: Unable to establish connection with the agent: Failed to connect to 192.99.40.208:31338: Connection refused
```

Without the repair it would look like this:

```
W0207 10:42:19.659122  9361 health_checker.cpp:314] Health check failed 1 times consecutively: COMMAND health check failed: Failed to connect to 192.99.40.208:31338: Connection refused
```


- Gast�n


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review163363
-----------------------------------------------------------


On Feb. 7, 2017, 11:37 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 7, 2017, 11:37 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp 8e942c6faf6e2dbc70a6fff1629343c9a3ce8668 
>   src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Alexander Rukletsov <ru...@gmail.com>.

> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.hpp, line 137
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613997#file1613997line137>
> >
> >     we don't typically do `const` for POD types.
> >     
> >     s/_agentSpawnsCommandContainer/
> 
> Gast�n Kleiman wrote:
>     Looks lke the regex was truncated ;-).
> 
> Vinod Kone wrote:
>     How about "localCommandCheck" (vs "remoteCommandCheck")?
> 
> Alexander Rukletsov wrote:
>     We are not consistent about `const` that's true, but I think we are starting to use it more and more often. Unfortunately, I don't have extensive data at hand, here are some links:
>     https://github.com/apache/mesos/blob/30f7f3e1965885406d6d3e49348bee7e7d0df9d5/src/slave/slave.hpp#L970
>     https://github.com/apache/mesos/blob/30f7f3e1965885406d6d3e49348bee7e7d0df9d5/src/slave/containerizer/mesos/isolators/volume/sandbox_path.hpp#L55
> 
> Vinod Kone wrote:
>     Making a POD member variable const is different from making it a const in function argument. We definitely do the former but not the latter IIRC.

Agree.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review163363
-----------------------------------------------------------


On Feb. 7, 2017, 11:37 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 7, 2017, 11:37 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp 8e942c6faf6e2dbc70a6fff1629343c9a3ce8668 
>   src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Alexander Rukletsov <ru...@gmail.com>.

> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.hpp, line 137
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613997#file1613997line137>
> >
> >     we don't typically do `const` for POD types.
> >     
> >     s/_agentSpawnsCommandContainer/
> 
> Gast�n Kleiman wrote:
>     Looks lke the regex was truncated ;-).
> 
> Vinod Kone wrote:
>     How about "localCommandCheck" (vs "remoteCommandCheck")?

We are not consistent about `const` that's true, but I think we are starting to use it more and more often. Unfortunately, I don't have extensive data at hand, here are some links:
https://github.com/apache/mesos/blob/30f7f3e1965885406d6d3e49348bee7e7d0df9d5/src/slave/slave.hpp#L970
https://github.com/apache/mesos/blob/30f7f3e1965885406d6d3e49348bee7e7d0df9d5/src/slave/containerizer/mesos/isolators/volume/sandbox_path.hpp#L55


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review163363
-----------------------------------------------------------


On Feb. 3, 2017, 7:52 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2017, 7:52 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
>   src/tests/health_check_tests.cpp 8418cd91484fd26734de16255b37f3ebf574f5eb 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.com>.

> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.cpp, line 480
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613998#file1613998line480>
> >
> >     launch nested container session returns a streaming response, how come you are calling `post()` helper here which expects a non-streaming response?
> >     
> >     probably one of the reasons why your test is hanging.
> 
> Gast�n Kleiman wrote:
>     The `post()` helper delegates to `process::http::request()`, which takes boolean flag (`streamedResponse`). If this flag is set to `false`, libprocess will convert a PIPE (streaming) response into a BODY (non-streaming) response. This means that the `Future` returned by the helper will not be completed until the server closes the connection.
>     
>     Relevant links:
>      - https://github.com/apache/mesos/blob/2d0195eed54feac41485fb1503dc4004e5500c81/3rdparty/libprocess/include/process/http.hpp#L953-L955
>      - https://github.com/apache/mesos/blob/2d0195eed54feac41485fb1503dc4004e5500c81/3rdparty/libprocess/src/http.cpp#L1191-L1197
>      - https://github.com/apache/mesos/blob/2d0195eed54feac41485fb1503dc4004e5500c81/3rdparty/libprocess/src/http.cpp#L1005-L1022
>     
>     In these particular cases, it means that the `Future` will not be completed until the container exits, which is exactly what we need.
>     
>     
>     Regarding the test hanging in Linux, some further debugging seem to indicate that test hangs on `process::Clock::settle()`, because there's a race + deadlock in `RateLimiter` that leaves a process stuck in `RUNNING`. I'll dig deeper on Monday, but here's some evidence:
>     
>     # Snippet from a successful test run that "leaks" a running process
>     
>     ```
>     [==========] Running 1 test from 1 test case.
>     [----------] Global test environment set-up.
>     [----------] 1 test from HealthCheckTest
>     [ RUN      ] HealthCheckTest.DefaultExecutorCmdHealthCheck
>     
>     [...]
>     
>     E0127 11:18:06.607446  8665 limiter.hpp:123] !!!! LIMITER _acquire
>     E0127 11:18:06.607471  8665 limiter.hpp:129] !!!! LIMITER There are 1 promises
>     E0127 11:18:06.608064  8665 limiter.hpp:186] !!!! LIMITER destroy
>     
>     **** DEADLOCK DETECTED! ****
>     You are waiting on process __limiter__(1419)@192.99.40.208:41947 that it is currently executing.
>     
>     [...]
>     
>     [       OK ] HealthCheckTest.DefaultExecutorCmdHealthCheck (592 ms)
>     [----------] 1 test from HealthCheckTest (593 ms total)
>     
>     [----------] Global test environment tear-down
>     [==========] 1 test from 1 test case ran. (603 ms total)
>     [  PASSED  ] 1 test.
>     
>     Repeating all tests (iteration 474) . . .
>     ```
>     
>     # Snippet from a hung run
>     
>     ```
>     [...]
>     E0127 11:18:06.878636  8640 cluster.cpp:358] !!! Settling the clock
>     E0127 11:18:06.878646  8640 process.cpp:3491] !!! Attempting to acquire mutex
>     E0127 11:18:06.878654  8640 process.cpp:3494] !!! !runq.empty()?
>     E0127 11:18:06.878659  8640 process.cpp:3502] !!! running.load() > 0?
>     E0127 11:18:06.878667  8640 process.cpp:3504] !!! 1 processes still running
>     E0127 11:18:06.878676  8640 process.cpp:3509] !!!! Process: __authentication_router__(1) state: 3
>     E0127 11:18:06.878684  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1893) state: 3
>     E0127 11:18:06.878691  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1894) state: 3
>     E0127 11:18:06.878697  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1895) state: 3
>     E0127 11:18:06.878705  8640 process.cpp:3509] !!!! Process: __gc__ state: 3
>     E0127 11:18:06.878711  8640 process.cpp:3509] !!!! Process: __limiter__(1419) state: 2
>     E0127 11:18:06.878718  8640 process.cpp:3509] !!!! Process: __processes__ state: 3
>     E0127 11:18:06.878725  8640 process.cpp:3509] !!!! Process: __reaper__(1) state: 3
>     E0127 11:18:06.878731  8640 process.cpp:3509] !!!! Process: crammd5-authenticator(474) state: 3
>     E0127 11:18:06.878738  8640 process.cpp:3509] !!!! Process: files state: 3
>     E0127 11:18:06.878744  8640 process.cpp:3509] !!!! Process: help state: 3
>     E0127 11:18:06.878751  8640 process.cpp:3509] !!!! Process: hierarchical-allocator(474) state: 3
>     E0127 11:18:06.878757  8640 process.cpp:3509] !!!! Process: in-memory-storage(474) state: 3
>     E0127 11:18:06.878764  8640 process.cpp:3509] !!!! Process: local-authorizer(947) state: 3
>     E0127 11:18:06.878772  8640 process.cpp:3509] !!!! Process: logging state: 3
>     E0127 11:18:06.878777  8640 process.cpp:3509] !!!! Process: master state: 3
>     E0127 11:18:06.878792  8640 process.cpp:3509] !!!! Process: metrics state: 3
>     E0127 11:18:06.878800  8640 process.cpp:3509] !!!! Process: profiler state: 3
>     E0127 11:18:06.878806  8640 process.cpp:3509] !!!! Process: registrar(474) state: 3
>     E0127 11:18:06.878813  8640 process.cpp:3509] !!!! Process: standalone-master-detector(1420) state: 3
>     E0127 11:18:06.878820  8640 process.cpp:3509] !!!! Process: system state: 3
>     E0127 11:18:06.878828  8640 process.cpp:3509] !!!! Process: version state: 3
>     E0127 11:18:06.878834  8640 process.cpp:3509] !!!! Process: whitelist(474) state: 3
>     E0127 11:18:06.878840  8640 process.cpp:3491] !!! Attempting to acquire mutex
>     E0127 11:18:06.878846  8640 process.cpp:3494] !!! !runq.empty()?
>     E0127 11:18:06.878852  8640 process.cpp:3502] !!! running.load() > 0?
>     E0127 11:18:06.878859  8640 process.cpp:3504] !!! 1 processes still running
>     E0127 11:18:06.878867  8640 process.cpp:3509] !!!! Process: __authentication_router__(1) state: 3
>     E0127 11:18:06.878873  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1893) state: 3
>     E0127 11:18:06.878880  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1894) state: 3
>     E0127 11:18:06.878887  8640 process.cpp:3509] !!!! Process: __basic_authenticator__(1895) state: 3
>     E0127 11:18:06.878895  8640 process.cpp:3509] !!!! Process: __gc__ state: 3
>     E0127 11:18:06.878901  8640 process.cpp:3509] !!!! Process: __limiter__(1419) state: 2
>     E0127 11:18:06.878907  8640 process.cpp:3509] !!!! Process: __processes__ state: 3
>     E0127 11:18:06.878913  8640 process.cpp:3509] !!!! Process: __reaper__(1) state: 3
>     E0127 11:18:06.878921  8640 process.cpp:3509] !!!! Process: crammd5-authenticator(474) state: 3
>     E0127 11:18:06.878927  8640 process.cpp:3509] !!!! Process: files state: 3
>     E0127 11:18:06.878933  8640 process.cpp:3509] !!!! Process: help state: 3
>     E0127 11:18:06.878940  8640 process.cpp:3509] !!!! Process: hierarchical-allocator(474) state: 3
>     E0127 11:18:06.878947  8640 process.cpp:3509] !!!! Process: in-memory-storage(474) state: 3
>     E0127 11:18:06.878953  8640 process.cpp:3509] !!!! Process: local-authorizer(947) state: 3
>     E0127 11:18:06.878960  8640 process.cpp:3509] !!!! Process: logging state: 3
>     E0127 11:18:06.878967  8640 process.cpp:3509] !!!! Process: master state: 3
>     E0127 11:18:06.878973  8640 process.cpp:3509] !!!! Process: metrics state: 3
>     E0127 11:18:06.878979  8640 process.cpp:3509] !!!! Process: profiler state: 3
>     E0127 11:18:06.878986  8640 process.cpp:3509] !!!! Process: registrar(474) state: 3
>     E0127 11:18:06.878993  8640 process.cpp:3509] !!!! Process: standalone-master-detector(1420) state: 3
>     E0127 11:18:06.878999  8640 process.cpp:3509] !!!! Process: system state: 3
>     E0127 11:18:06.879006  8640 process.cpp:3509] !!!! Process: version state: 3
>     E0127 11:18:06.879014  8640 process.cpp:3509] !!!! Process: whitelist(474) state: 3
>     [...] (repeats ad nauseam)
>     ```

See https://issues.apache.org/jira/browse/MESOS-7036 for an in-depth analysis of the deadlock.


- Gast�n


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review163363
-----------------------------------------------------------


On Jan. 28, 2017, 12:39 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Jan. 28, 2017, 12:39 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp e70bd7936752613a4f92c70c4c61cd7cdf7c4ee5 
>   src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
>   src/tests/health_check_tests.cpp 710cb66eff6c4447caa22772f0cdc97cfa582c50 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: [WIP] Added support for command health checks to the default executor.

Posted by Gastón Kleiman <ga...@mesosphere.com>.

> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.hpp, line 137
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613997#file1613997line137>
> >
> >     we don't typically do `const` for POD types.
> >     
> >     s/_agentSpawnsCommandContainer/

Looks lke the regex was truncated ;-).


- Gast�n


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review163363
-----------------------------------------------------------


On Jan. 25, 2017, 12:33 a.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Jan. 25, 2017, 12:33 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> [WIP] Added support for command health checks to the default executor.
> 
> This is still a work in progress, but I am posting it in order to get some feedback on the general approach.
> 
> Open questions/issues:
> 
>   - The code that handles communication with the Agent is on the `DefaultExecutor`, so the health checker calls `evolve` when serializing the API calls. Should it use the v1 protos instead, and skip the `evolve` call?
>   - CMD health checks for command tasks inherit the env from the executor, but allow overriding env variables. I don't think that inheriting the executor env makes sense for cmd health checks launched by the `DefaultExecutor` - each task can have its own env, so the health check command should inherit the task env and merge it with the one in the check's `CommandInfo`.
>     I added some code that takes the task env from `TaskInfo` and merges it check's env, but I don't think that this is an ideal approach, since it will miss env variables set/modified by containerizers/hooks/isolators. Do you think that it would make sense for all Debug Containers to inherit the environment of the parent container?
>   - There's a TODO for stopping/resuming health checking when the `DefaultExecutor` is disconnected from the agent. After talking to AlexR, I believe that we should do this for all types of checks, not just for `CMD`.
>   - The test that I introduced passes on Linux, but not on macOS, I still have to do some more debugging.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 6e558f2061a9e31157c47d31cb64b3a8568aace3 
>   src/checks/health_checker.cpp 50aa2858e807b27bbab58a3618f5200cfe4eca9e 
>   src/launcher/default_executor.cpp a03794934adb93868734f8cf00b337a1bff9b5ab 
>   src/tests/health_check_tests.cpp debbd3c09b7555145aaf3f62a24d795d1423a269 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: Added support for command health checks to the default executor.

Posted by Vinod Kone <vi...@gmail.com>.

> On Jan. 28, 2017, 1:17 a.m., Vinod Kone wrote:
> > src/checks/health_checker.hpp, line 137
> > <https://reviews.apache.org/r/55901/diff/1/?file=1613997#file1613997line137>
> >
> >     we don't typically do `const` for POD types.
> >     
> >     s/_agentSpawnsCommandContainer/
> 
> Gast�n Kleiman wrote:
>     Looks lke the regex was truncated ;-).
> 
> Vinod Kone wrote:
>     How about "localCommandCheck" (vs "remoteCommandCheck")?
> 
> Alexander Rukletsov wrote:
>     We are not consistent about `const` that's true, but I think we are starting to use it more and more often. Unfortunately, I don't have extensive data at hand, here are some links:
>     https://github.com/apache/mesos/blob/30f7f3e1965885406d6d3e49348bee7e7d0df9d5/src/slave/slave.hpp#L970
>     https://github.com/apache/mesos/blob/30f7f3e1965885406d6d3e49348bee7e7d0df9d5/src/slave/containerizer/mesos/isolators/volume/sandbox_path.hpp#L55

Making a POD member variable const is different from making it a const in function argument. We definitely do the former but not the latter IIRC.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review163363
-----------------------------------------------------------


On Feb. 3, 2017, 7:52 p.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2017, 7:52 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added support for command health checks to the default executor.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 
>   src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b 
>   src/launcher/default_executor.cpp 97eee05cac8cb1f62d43e2aecc08a8e54e49eac3 
>   src/tests/health_check_tests.cpp 8418cd91484fd26734de16255b37f3ebf574f5eb 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: [WIP] Added support for command health checks to the default executor.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review163363
-----------------------------------------------------------




src/checks/health_checker.hpp (line 137)
<https://reviews.apache.org/r/55901/#comment234769>

    we don't typically do `const` for POD types.
    
    s/_agentSpawnsCommandContainer/



src/checks/health_checker.cpp (line 129)
<https://reviews.apache.org/r/55901/#comment234776>

    s/build/create/



src/checks/health_checker.cpp (lines 145 - 157)
<https://reviews.apache.org/r/55901/#comment234773>

    can you inline these?



src/checks/health_checker.cpp (lines 426 - 429)
<https://reviews.apache.org/r/55901/#comment234777>

    why a `.repair()` here?



src/checks/health_checker.cpp (line 435)
<https://reviews.apache.org/r/55901/#comment234798>

    const & ?



src/checks/health_checker.cpp (line 447)
<https://reviews.apache.org/r/55901/#comment234780>

    s/this/This/



src/checks/health_checker.cpp (line 449)
<https://reviews.apache.org/r/55901/#comment234781>

    s/Agent/agent/



src/checks/health_checker.cpp (line 480)
<https://reviews.apache.org/r/55901/#comment234799>

    launch nested container session returns a streaming response, how come you are calling `post()` helper here which expects a non-streaming response?
    
    probably one of the reasons why your test is hanging.



src/checks/health_checker.cpp (lines 481 - 485)
<https://reviews.apache.org/r/55901/#comment234782>

    The identation seems wrong? do we do it like this elsehwhere?
    
    i would've done so
    
    ```
    .after(checkTimeout,
           defer(self(),
                 ...)
    ```



src/checks/health_checker.cpp (line 531)
<https://reviews.apache.org/r/55901/#comment234789>

    s/has not returned/timed out/



src/checks/health_checker.cpp (lines 534 - 536)
<https://reviews.apache.org/r/55901/#comment234790>

    why repair?



src/checks/health_checker.cpp (line 538)
<https://reviews.apache.org/r/55901/#comment234797>

    why not just return failure?



src/checks/health_checker.cpp (line 562)
<https://reviews.apache.org/r/55901/#comment234793>

    s/callResponse/response/



src/launcher/default_executor.cpp (line 112)
<https://reviews.apache.org/r/55901/#comment234794>

    s/resume/Resume/



src/launcher/default_executor.cpp (line 130)
<https://reviews.apache.org/r/55901/#comment234795>

    s/stop/Stop/


- Vinod Kone


On Jan. 25, 2017, 12:33 a.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Jan. 25, 2017, 12:33 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> [WIP] Added support for command health checks to the default executor.
> 
> This is still a work in progress, but I am posting it in order to get some feedback on the general approach.
> 
> Open questions/issues:
> 
>   - The code that handles communication with the Agent is on the `DefaultExecutor`, so the health checker calls `evolve` when serializing the API calls. Should it use the v1 protos instead, and skip the `evolve` call?
>   - CMD health checks for command tasks inherit the env from the executor, but allow overriding env variables. I don't think that inheriting the executor env makes sense for cmd health checks launched by the `DefaultExecutor` - each task can have its own env, so the health check command should inherit the task env and merge it with the one in the check's `CommandInfo`.
>     I added some code that takes the task env from `TaskInfo` and merges it check's env, but I don't think that this is an ideal approach, since it will miss env variables set/modified by containerizers/hooks/isolators. Do you think that it would make sense for all Debug Containers to inherit the environment of the parent container?
>   - There's a TODO for stopping/resuming health checking when the `DefaultExecutor` is disconnected from the agent. After talking to AlexR, I believe that we should do this for all types of checks, not just for `CMD`.
>   - The test that I introduced passes on Linux, but not on macOS, I still have to do some more debugging.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 6e558f2061a9e31157c47d31cb64b3a8568aace3 
>   src/checks/health_checker.cpp 50aa2858e807b27bbab58a3618f5200cfe4eca9e 
>   src/launcher/default_executor.cpp a03794934adb93868734f8cf00b337a1bff9b5ab 
>   src/tests/health_check_tests.cpp debbd3c09b7555145aaf3f62a24d795d1423a269 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>


Re: Review Request 55901: [WIP] Added support for command health checks to the default executor.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55901/#review162922
-----------------------------------------------------------



Patch looks great!

Reviews applied: [55899, 55900, 55901]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Jan. 25, 2017, 12:33 a.m., Gast�n Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55901/
> -----------------------------------------------------------
> 
> (Updated Jan. 25, 2017, 12:33 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6280
>     https://issues.apache.org/jira/browse/MESOS-6280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> [WIP] Added support for command health checks to the default executor.
> 
> This is still a work in progress, but I am posting it in order to get some feedback on the general approach.
> 
> Open questions/issues:
> 
>   - The code that handles communication with the Agent is on the `DefaultExecutor`, so the health checker calls `evolve` when serializing the API calls. Should it use the v1 protos instead, and skip the `evolve` call?
>   - CMD health checks for command tasks inherit the env from the executor, but allow overriding env variables. I don't think that inheriting the executor env makes sense for cmd health checks launched by the `DefaultExecutor` - each task can have its own env, so the health check command should inherit the task env and merge it with the one in the check's `CommandInfo`.
>     I added some code that takes the task env from `TaskInfo` and merges it check's env, but I don't think that this is an ideal approach, since it will miss env variables set/modified by containerizers/hooks/isolators. Do you think that it would make sense for all Debug Containers to inherit the environment of the parent container?
>   - There's a TODO for stopping/resuming health checking when the `DefaultExecutor` is disconnected from the agent. After talking to AlexR, I believe that we should do this for all types of checks, not just for `CMD`.
>   - The test that I introduced passes on Linux, but not on macOS, I still have to do some more debugging.
> 
> 
> Diffs
> -----
> 
>   src/checks/health_checker.hpp 6e558f2061a9e31157c47d31cb64b3a8568aace3 
>   src/checks/health_checker.cpp 50aa2858e807b27bbab58a3618f5200cfe4eca9e 
>   src/launcher/default_executor.cpp a03794934adb93868734f8cf00b337a1bff9b5ab 
>   src/tests/health_check_tests.cpp debbd3c09b7555145aaf3f62a24d795d1423a269 
> 
> Diff: https://reviews.apache.org/r/55901/diff/
> 
> 
> Testing
> -------
> 
> Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It passes on Linux, but not on macOS.
> 
> 
> Thanks,
> 
> Gast�n Kleiman
> 
>