Posted to user@mesos.apache.org by haosdent <ha...@gmail.com> on 2015/11/01 09:54:20 UTC

Re: Can't start docker container when SSL_ENABLED is on.

Hi @Xiaodong, I was able to reproduce your problem in my testing today. A quick
workaround is to add the SSL environment variables when you launch the slave.

```
./bin/mesos-slave.sh xxxx --containerizers=docker,mesos \
  --executor_environment_variables='{"SSL_KEY_FILE": "/tmp/server.key", "SSL_CERT_FILE": "/tmp/ssl.chain.crt", "SSL_ENABLED": "true"}'
```

As shown above, this passes the SSL environment variables to the Docker executor
by specifying --executor_environment_variables at startup. So far it works well
for me. I will submit a patch later to fix how environment variables are passed
to the Docker executor. After that, you should be able to launch the slave
without the --executor_environment_variables flag.
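A minimal sketch of the same workaround, assuming a POSIX shell and that the slave's SSL settings are already set in the shell that starts it. The hypothetical helper `build_executor_env_json` turns those variables into the JSON value for --executor_environment_variables, so the JSON does not have to be quoted by hand. It does no JSON escaping, so it assumes the values contain no double quotes or backslashes.

```shell
# Hypothetical helper: print a JSON object with the slave's SSL settings,
# for use as the value of --executor_environment_variables.
# Assumption: values contain no double quotes or backslashes (no escaping).
build_executor_env_json() {
  json="{"
  sep=""
  for name in SSL_ENABLED SSL_KEY_FILE SSL_CERT_FILE; do
    # Read the variable whose name is stored in $name (empty if unset).
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      json="${json}${sep}\"${name}\": \"${value}\""
      sep=", "
    fi
  done
  printf '%s}\n' "$json"
}
```

For example, pass `--executor_environment_variables="$(build_executor_env_json)"` on the slave command line. Unset or empty variables are left out of the object.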

On Sat, Oct 31, 2015 at 2:56 PM, Tim Chen <ti...@mesosphere.io> wrote:

> Hi Xiaodong,
>
> If you follow the Review Board request, you'll see that the fix is not correct.
> I believe Jojy will be posting a new patch.
>
> Tim
>
> On Fri, Oct 30, 2015 at 6:58 PM, Xiaodong Zhang <xd...@alauda.io> wrote:
>
>> It is still not working!
>>
>> It only works if I remove SSL_ENABLED from the environment before starting
>> the slave.
>>
>> I applied the patch to version 0.24.1 and rebuilt it with
>> `--enable-libevent --enable-ssl`.
>>
>> From: Xiaodong Zhang <xd...@alauda.io>
>> Date: Saturday, October 31, 2015 at 7:45 AM
>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>
>> Thanks Jojy.
>>
>> I will apply this patch to version 0.24.1 and rebuild it. I will let you know
>> whether it works after I finish testing.
>>
>> From: Jojy Varghese <jo...@mesosphere.io>
>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>> Date: Saturday, October 31, 2015 at 12:45 AM
>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>
>> Thanks Xiaodong.
>>
>> Based on the hypothesis that the container process launched with
>> SSL_ENABLED in its environment is the problem, I have created a patch:
>> https://reviews.apache.org/r/39818/. This might be a quick and dirty
>> way to test the hypothesis. Would it be possible for you to test again
>> after applying the patch?
>>
>> -Jojy
>>
>>
>>
>> On Oct 30, 2015, at 8:29 AM, Xiaodong Zhang <xd...@alauda.io> wrote:
>>
>> Thanks @Jojy
>>
>>
>>
>> Flags at startup: --appc_store_dir="/tmp/mesos/store/appc"
>> --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false"
>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>> --container_disk_watch_interval="15secs" --containerizers="docker,mesos"
>> --credential="/etc/mesos-slave-auth" --default_role="*"
>> --disk_watch_interval="1mins" --docker="/usr/bin/docker"
>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>> --enforce_container_disk_quota="false"
>> --executor_registration_timeout="1hrs"
>> --executor_shutdown_grace_period="5secs"
>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>> --hadoop_home="" --help="false" --initialize_driver_logging="true"
>> --isolation="posix/cpu,posix/mem" --launcher_dir="/usr/libexec/mesos"
>> --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO"
>> --master="zk://172.31.43.77:2181,172.31.44.2:2181,172.31.36.91:2181/mesos"
>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>> --registration_backoff_factor="1secs"
>> --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true"
>> --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
>> --switch_user="true" --version="false" --work_dir="/tmp/mesos"
>>
>> From: Jojy Varghese <jo...@mesosphere.io>
>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>> Date: Friday, October 30, 2015 at 11:17 PM
>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>
>> Hi Xiaodong
>>   This might be because the executor inherits the slave's SSL environment
>> variables and thus expects an SSL key password to launch. Could you
>> please add the part of the slave logs that says "Flags at startup" so that
>> we can have more information?
>>
>> thanks
>> Jojy
>>
>>
>> On Oct 29, 2015, at 8:55 PM, Xiaodong Zhang <xd...@alauda.io> wrote:
>>
>> Thanks a lot, @haosdent!
>>
>> From: haosdent <ha...@gmail.com>
>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>> Date: Friday, October 30, 2015 at 11:45 AM
>> To: user <us...@mesos.apache.org>
>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>
>> Hi @Xiaodong, I'm interested in your problem, but I haven't had enough
>> time recently to try to reproduce it. I think I can dig into it this
>> Sunday and give you feedback.
>>
>> On Fri, Oct 30, 2015 at 11:30 AM, Xiaodong Zhang <xd...@alauda.io>
>> wrote:
>>
>>> Does anybody know about this?
>>>
>>> From: Xiaodong Zhang <xd...@alauda.io>
>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>> Date: Thursday, October 29, 2015 at 7:38 PM
>>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>
>>> I think it is easy to reproduce this error.
>>>
>>> Start the master with these environment variables:
>>>
>>> SSL_SUPPORT_DOWNGRADE
>>> SSL_ENABLED
>>> SSL_KEY_FILE
>>> SSL_CERT_FILE
>>>
>>> Start the slave with these environment variables:
>>>
>>> SSL_ENABLED
>>> SSL_KEY_FILE
>>> SSL_CERT_FILE
>>> LIBPROCESS_ADVERTISE_IP
>>>
>>>
>>> Then run a Docker task via Marathon.
>>>
>>> From: Xiaodong Zhang <xd...@alauda.io>
>>> Date: Thursday, October 29, 2015 at 3:09 PM
>>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>
>>> So now, Mesos tasks work well but Docker tasks don't.
>>>
>>> From: Xiaodong Zhang <xd...@alauda.io>
>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>> Date: Thursday, October 29, 2015 at 2:08 PM
>>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>
>>> I ran a task via Marathon:
>>>
>>> {
>>>     "id": "basic-0",
>>>     "cmd": "while [ true ] ; do echo 'Hello Marathon' ; sleep 5 ; done",
>>>     "cpus": 0.1,
>>>     "mem": 10.0,
>>>     "instances": 1}
>>>
>>>
>>> It works well.
>>>
>>> <742629F2-78E8-43F2-9015-F3D22720826B.png>
>>>
>>> The Docker task can pull the image but can't run, as I mentioned.
>>>
>>> My Docker version is 1.5.0.
>>>
>>> From: Tim Chen <ti...@mesosphere.io>
>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>> Date: Thursday, October 29, 2015 at 1:48 PM
>>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>
>>> Does running a task without a Docker container (i.e., with the Mesos
>>> containerizer) work with SSL in your environment?
>>>
>>> Tim
>>>
>>> On Wed, Oct 28, 2015 at 10:19 PM, Xiaodong Zhang <xd...@alauda.io>
>>> wrote:
>>>
>>>> Thanks a lot. I found the log files on the slave.
>>>>
>>>> For one of the tasks:
>>>>
>>>> Stdout:
>>>>
>>>> --container="mesos-20151029-043755-3549436724-5050-5674-S0.e2c2580f-8082-4f17-b0cc-4e32e040d444"
>>>> --docker="/home/ubuntu/luna/bin/docker" --help="false"
>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444"
>>>> --stop_timeout="0ns"
>>>> --container="mesos-20151029-043755-3549436724-5050-5674-S0.e2c2580f-8082-4f17-b0cc-4e32e040d444"
>>>> --docker="/home/ubuntu/luna/bin/docker" --help="false"
>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444"
>>>> --stop_timeout="0ns"
>>>> Shutting down
>>>>
>>>> Stderr:
>>>>
>>>> I1029 05:14:06.529364 27862 fetcher.cpp:414] Fetcher Info:
>>>> {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20151029-043755-3549436724-5050-5674-S0","items":[{"action":"BYPASS_CACHE","uri":{"extract":false,"value":"file:\/\/\/etc\/.dockercfg"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/20151029-043755-3549436724-5050-5674-S0\/frameworks\/20151029-043755-3549436724-5050-5674-0000\/executors\/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f\/runs\/e2c2580f-8082-4f17-b0cc-4e32e040d444"}
>>>> I1029 05:14:06.530562 27862 fetcher.cpp:369] Fetching URI '
>>>> file:///etc/.dockercfg'
>>>> I1029 05:14:06.530580 27862 fetcher.cpp:243] Fetching directly into the
>>>> sandbox directory
>>>> I1029 05:14:06.530594 27862 fetcher.cpp:180] Fetching URI '
>>>> file:///etc/.dockercfg'
>>>> I1029 05:14:06.530609 27862 fetcher.cpp:160] Copying resource with
>>>> command:cp '/etc/.dockercfg'
>>>> '/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444/.dockercfg'
>>>> I1029 05:14:06.532165 27862 fetcher.cpp:446] Fetched '
>>>> file:///etc/.dockercfg' to
>>>> '/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444/.dockercfg'
>>>> I1029 05:14:07.782054 27955 exec.cpp:133] Version: 0.24.1
>>>> I1029 05:14:07.785039 27963 exec.cpp:462] Slave exited ... shutting down
>>>> E1029 05:14:07.785158 27964 socket.hpp:174] Shutdown failed on fd=7:
>>>> Transport endpoint is not connected [107]
>>>>
>>>> From: haosdent <ha...@gmail.com>
>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>> Date: Thursday, October 29, 2015 at 1:13 PM
>>>> To: user <us...@mesos.apache.org>
>>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>>
>>>> <5185_02_04.png>
>>>> <5185_02_07.png>
>>>>
>>>> I captured how I find task logs in my local web UI. Could you find the
>>>> stderr and stdout for your tasks by following the screenshots above?
>>>>
>>>> On Thu, Oct 29, 2015 at 1:07 PM, Xiaodong Zhang <xd...@alauda.io>
>>>> wrote:
>>>>
>>>>> I didn't see any useful info.
>>>>>
>>>>> In the Mesos slave log, there is this line:
>>>>> I1029 03:29:53.160143  9292 slave.cpp:3399] Executor
>>>>> '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713'
>>>>> of framework 20151029-031549-1294671788-5050-4937-0000 terminated
>>>>> with signal Killed
>>>>>
>>>>> When I check the log of a normal run, it shows:
>>>>>
>>>>> I1014 15:22:21.276007 23163 slave.cpp:3326] Executor
>>>>> 'ffc08dce-997f-41f7-9b03-57c1b4bc1f85.47ed02aa-7285-11e5-80d7-000d3a8033de'
>>>>> of framework 20150814-115157-1677721866-5050-6185-0000 exited with
>>>>> status 0
>>>>>
>>>>> Is this helpful?
>>>>>
>>>>> From: Xiaodong Zhang <xd...@alauda.io>
>>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>> Date: Thursday, October 29, 2015 at 12:59 PM
>>>>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>>>
>>>>> <9D46724C-457C-4BE1-B0E4-F57B147F6DC8.png>
>>>>>
>>>>> The web UI has a LOG link; when I click it, it shows this:
>>>>>
>>>>> I1029 04:44:32.293445  5697 http.cpp:321] HTTP GET for
>>>>> /master/state.json from 114.113.20.135:55682 with
>>>>> User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)
>>>>> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36'
>>>>> I1029 04:44:34.533504  5704 master.cpp:4613] Sending 1 offers to
>>>>> framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>> I1029 04:44:34.539579  5702 master.cpp:2739] Processing ACCEPT call
>>>>> for offers: [ 20151029-043755-3549436724-5050-5674-O2 ] on slave
>>>>> 20151029-043755-3549436724-5050-5674-S0 at slave(1)@
>>>>> 50.112.136.148:5051 (
>>>>> ec2-50-112-136-148.us-west-2.compute.amazonaws.com) for framework
>>>>> 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>> I1029 04:44:34.539710  5702 hierarchical.hpp:814] Recovered cpus(*):1;
>>>>> mem(*):999; disk(*):3962; ports(*):[31000-32000] (total: cpus(*):1;
>>>>> mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: ) on slave
>>>>> 20151029-043755-3549436724-5050-5674-S0 from framework
>>>>> 20151029-043755-3549436724-5050-5674-0000
>>>>> I1029 04:44:37.360901  5703 master.cpp:4294] Performing implicit task
>>>>> state reconciliation for framework
>>>>> 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>> I1029 04:44:40.539989  5704 master.cpp:4613] Sending 1 offers to
>>>>> framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>> I1029 04:44:40.610321  5702 master.cpp:2739] Processing ACCEPT call
>>>>> for offers: [ 20151029-043755-3549436724-5050-5674-O3 ] on slave
>>>>> 20151029-043755-3549436724-5050-5674-S0 at slave(1)@
>>>>> 50.112.136.148:5051 (
>>>>> ec2-50-112-136-148.us-west-2.compute.amazonaws.com) for framework
>>>>> 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>> I1029 04:44:40.610846  5702 master.hpp:170] Adding task
>>>>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>>>>> with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] on slave
>>>>> 20151029-043755-3549436724-5050-5674-S0 (
>>>>> ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
>>>>> I1029 04:44:40.610911  5702 master.cpp:3069] Launching task
>>>>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>>>>> of framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>> with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] on slave
>>>>> 20151029-043755-3549436724-5050-5674-S0 at slave(1)@
>>>>> 50.112.136.148:5051 (
>>>>> ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
>>>>> I1029 04:44:40.611095  5702 hierarchical.hpp:814] Recovered
>>>>> cpus(*):0.9375; mem(*):743; disk(*):3962; ports(*):[31000-31863,
>>>>> 31865-32000] (total: cpus(*):1; mem(*):999; disk(*):3962;
>>>>> ports(*):[31000-32000], allocated: cpus(*):0.0625; mem(*):256;
>>>>> ports(*):[31864-31864]) on slave 20151029-043755-3549436724-5050-5674-S0
>>>>> from framework 20151029-043755-3549436724-5050-5674-0000
>>>>> I1029 04:44:43.324970  5698 http.cpp:321] HTTP GET for
>>>>> /master/state.json from 114.113.20.135:55682 with
>>>>> User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)
>>>>> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36'
>>>>> I1029 04:44:46.546671  5703 master.cpp:4613] Sending 1 offers to
>>>>> framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>> I1029 04:44:46.557266  5699 master.cpp:2739] Processing ACCEPT call
>>>>> for offers: [ 20151029-043755-3549436724-5050-5674-O4 ] on slave
>>>>> 20151029-043755-3549436724-5050-5674-S0 at slave(1)@
>>>>> 50.112.136.148:5051 (
>>>>> ec2-50-112-136-148.us-west-2.compute.amazonaws.com) for framework
>>>>> 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>> I1029 04:44:46.557394  5699 hierarchical.hpp:814] Recovered
>>>>> cpus(*):0.9375; mem(*):743; disk(*):3962; ports(*):[31000-31863,
>>>>> 31865-32000] (total: cpus(*):1; mem(*):999; disk(*):3962;
>>>>> ports(*):[31000-32000], allocated: cpus(*):0.0625; mem(*):256;
>>>>> ports(*):[31864-31864]) on slave 20151029-043755-3549436724-5050-5674-S0
>>>>> from framework 20151029-043755-3549436724-5050-5674-0000
>>>>> I1029 04:44:47.267562  5700 master.cpp:4069] Status update TASK_FAILED
>>>>> (UUID: 0ea607fc-bf24-4bda-b107-55a54aba31cf) for task
>>>>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>>>>> of framework 20151029-043755-3549436724-5050-5674-0000 from slave
>>>>> 20151029-043755-3549436724-5050-5674-S0 at slave(1)@
>>>>> 50.112.136.148:5051 (
>>>>> ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
>>>>> I1029 04:44:47.267645  5700 master.cpp:4108] Forwarding status update
>>>>> TASK_FAILED (UUID: 0ea607fc-bf24-4bda-b107-55a54aba31cf) for task
>>>>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>>>>> of framework 20151029-043755-3549436724-5050-5674-0000
>>>>> I1029 04:44:47.267774  5700 master.cpp:5576] Updating the latest state
>>>>> of task
>>>>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>>>>> of framework 20151029-043755-3549436724-5050-5674-0000 to TASK_FAILED
>>>>> I1029 04:44:47.267907  5700 hierarchical.hpp:814] Recovered
>>>>> cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] (total: cpus(*):1;
>>>>> mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: ) on slave
>>>>> 20151029-043755-3549436724-5050-5674-S0 from framework
>>>>> 20151029-043755-3549436724-5050-5674-0000
>>>>> I1029 04:44:47.289356  5698 master.cpp:5644] Removing task
>>>>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>>>>> with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] of
>>>>> framework 20151029-043755-3549436724-5050-5674-0000 on slave
>>>>> 20151029-043755-3549436724-5050-5674-S0 at slave(1)@
>>>>> 50.112.136.148:5051 (
>>>>> ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
>>>>> I1029 04:44:47.289459  5698 master.cpp:3398] Processing ACKNOWLEDGE
>>>>> call 0ea607fc-bf24-4bda-b107-55a54aba31cf for task
>>>>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>>>>> of framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373 on
>>>>> slave 20151029-043755-3549436724-5050-5674-S0
>>>>>
>>>>>
>>>>>
>>>>> From: haosdent <ha...@gmail.com>
>>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>> Date: Thursday, October 29, 2015 at 12:02 PM
>>>>> To: user <us...@mesos.apache.org>
>>>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>>>
>>>>> Oh, I mean your task logs. You can get them from the Mesos web UI.
>>>>>
>>>>> On Thu, Oct 29, 2015 at 11:52 AM, Xiaodong Zhang <xd...@alauda.io>
>>>>> wrote:
>>>>>
>>>>>> Thanks for your reply.
>>>>>>
>>>>>> Yes, I built Mesos with `--enable-libevent --enable-ssl`. If I don't
>>>>>> provide the key and PEM files when starting the slave, registration
>>>>>> fails (that means SSL works, right?).
>>>>>>
>>>>>> As I said, the odd thing is that the container never runs (`docker ps -a`
>>>>>> shows nothing), so it can't have any stdout or stderr.
>>>>>>
>>>>>> From: haosdent <ha...@gmail.com>
>>>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>>> Date: Thursday, October 29, 2015 at 11:47 AM
>>>>>> To: user <us...@mesos.apache.org>
>>>>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>>>>
>>>>>> Did you compile Mesos with SSL support? The default build doesn't
>>>>>> include SSL. And does the Docker container have stdout and stderr?
>>>>>>
>>>>>> On Thu, Oct 29, 2015 at 11:41 AM, Xiaodong Zhang <xd...@alauda.io>
>>>>>> wrote:
>>>>>>
>>>>>>> My scenario is as described in my previous email: the masters and slaves
>>>>>>> are in different IaaS clouds. The slaves can now register with the masters
>>>>>>> with SSL_ENABLED on.
>>>>>>>
>>>>>>> But I've hit another problem: the slaves can't run containers. The odd thing
>>>>>>> is that they can pull the image successfully but can't run the container
>>>>>>> (`docker ps -a` lists nothing).
>>>>>>>
>>>>>>> The logs look like this:
>>>>>>>
>>>>>>> I1029 03:29:45.967741  9288 docker.cpp:758] Starting container
>>>>>>> 'd4f4e236-0d0a-492c-86df-eef48a414e23' for task
>>>>>>> '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713'
>>>>>>> (and executor
>>>>>>> '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713')
>>>>>>> of framework '20151029-031549-1294671788-5050-4937-0000'
>>>>>>> I1029 03:29:48.044148  9292 docker.cpp:382] Checkpointing pid 12062
>>>>>>> to
>>>>>>> '/tmp/mesos/meta/slaves/20151029-031549-1294671788-5050-4937-S0/frameworks/20151029-031549-1294671788-5050-4937-0000/executors/279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713/runs/d4f4e236-0d0a-492c-86df-eef48a414e23/pids/forked.pid'
>>>>>>> I1029 03:29:53.159361  9292 docker.cpp:1576] Executor for container
>>>>>>> 'd4f4e236-0d0a-492c-86df-eef48a414e23' has exited
>>>>>>> I1029 03:29:53.159572  9292 docker.cpp:1374] Destroying container
>>>>>>> 'd4f4e236-0d0a-492c-86df-eef48a414e23'
>>>>>>> I1029 03:29:53.159822  9292 docker.cpp:1478] Running docker stop on
>>>>>>> container 'd4f4e236-0d0a-492c-86df-eef48a414e23'
>>>>>>> I1029 03:29:53.160143  9292 slave.cpp:3399] Executor
>>>>>>> '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713'
>>>>>>> of framework 20151029-031549-1294671788-5050-4937-0000 terminated
>>>>>>> with signal Killed
>>>>>>> I1029 03:29:53.160884  9292 slave.cpp:2696] Handling status update
>>>>>>> TASK_FAILED (UUID: 27a2080a-8807-449e-9077-837ec45b4c51) for task
>>>>>>> 279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713
>>>>>>> of framework 20151029-031549-1294671788-5050-4937-0000 from @
>>>>>>> 0.0.0.0:0
>>>>>>> W1029 03:29:53.161247  9288 docker.cpp:986] Ignoring updating
>>>>>>> unknown container: d4f4e236-0d0a-492c-86df-eef48a414e23
>>>>>>> I1029 03:29:53.161548  9293 status_update_manager.cpp:322] Received
>>>>>>> status update TASK_FAILED (UUID: 27a2080a-8807-449e-9077-837ec45b4c51) for
>>>>>>> task
>>>>>>> 279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713
>>>>>>> of framework 20151029-031549-1294671788-5050-4937-0000
>>>>>>>
>>>>>>> I run the master node with these environment variables:
>>>>>>>
>>>>>>> SSL_SUPPORT_DOWNGRADE=true
>>>>>>> SSL_ENABLED=true
>>>>>>> SSL_KEY_FILE=/home/ubuntu/xx.key
>>>>>>> SSL_CERT_FILE=/home/ubuntu/xx.pem
>>>>>>>
>>>>>>> And the slave node with:
>>>>>>>
>>>>>>> SSL_ENABLED=true
>>>>>>> SSL_KEY_FILE=/home/ubuntu/xx.key
>>>>>>> SSL_CERT_FILE=/home/ubuntu/xx.pem
>>>>>>> LIBPROCESS_ADVERTISE_IP=xxx.xxx.xxx.xxx
>>>>>>>
>>>>>>> When I remove all the SSL environment variables, the slaves work well.
>>>>>>>
>>>>>>> Did I miss something?
>>>>>>>
>>>>>>> Versions:
>>>>>>>
>>>>>>> Mesos 0.24.1
>>>>>>> Marathon 0.9.2
>>>>>>>
>>>>>>> OS:
>>>>>>> Ubuntu 14.04
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> From: Anindya Sinha <an...@gmail.com>
>>>>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>>>> Date: Wednesday, October 28, 2015 at 2:32 PM
>>>>>>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>>>> Subject: Re: How to tell master which ip to connect.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 27, 2015 at 7:43 PM, Xiaodong Zhang <xd...@alauda.io>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> It works! Thanks a lot.
>>>>>>>>
>>>>>>>
>>>>>>> OK. So should we expose advertise_ip and advertise_port as command
>>>>>>> line options for mesos-slave as well (instead of using the environment
>>>>>>> variables)? I opened https://issues.apache.org/jira/browse/MESOS-3809.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Another question: do masters and slaves communicate with each other
>>>>>>>> securely? Is the data encrypted? I want to make sure that deploying
>>>>>>>> masters and slaves in different IaaS clouds is production-ready.
>>>>>>>>
>>>>>>>> From: haosdent <ha...@gmail.com>
>>>>>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>>>>> Date: Wednesday, October 28, 2015 at 10:23 AM
>>>>>>>> To: user <us...@mesos.apache.org>
>>>>>>>> Subject: Re: How to tell master which ip to connect.
>>>>>>>>
>>>>>>>> Did you try `export LIBPROCESS_ADVERTISE_IP=xxx` and
>>>>>>>> `LIBPROCESS_ADVERTISE_PORT` when starting the slave?
>>>>>>>>
>>>>>>>> On Wed, Oct 28, 2015 at 10:16 AM, Xiaodong Zhang <xdzhang@alauda.io> wrote:
>>>>>>>>
>>>>>>>>> Hi team,
>>>>>>>>>
>>>>>>>>> My scenario is this:
>>>>>>>>>
>>>>>>>>> My master nodes are deployed in AWS and my slaves are in Azure, so
>>>>>>>>> they communicate via public IPs.
>>>>>>>>> I ran into trouble when the slaves try to register with the master.
>>>>>>>>> The slaves can get the master's public IP address and can send a
>>>>>>>>> registration request, but they can only send their private IP to the
>>>>>>>>> master (because they don't know their public IP, they can't bind to a
>>>>>>>>> public IP via the --ip flag), so the masters can't connect to the slaves.
>>>>>>>>> How can a slave tell the master which IP it should connect to? (I can't
>>>>>>>>> find any flag like --advertise_ip in the master.)
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Haosdent Huang
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Haosdent Huang
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Haosdent Huang
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>>>
>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>>
>>
>>
>


-- 
Best Regards,
Haosdent Huang

Re: Can't start docker container when SSL_ENABLED is on.

Posted by Xiaodong Zhang <xd...@alauda.io>.
Oh! My bad. Sorry!

Thanks again

From: haosdent <ha...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Monday, November 2, 2015 at 5:22 PM
To: user <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

```
stderr:
Could not load cert file
```

Is this because your path is wrong? In general, --executor_environment_variables should work.
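One quick way to rule out a wrong path is to check, before starting the slave, that both files exist and are readable. This is a small sketch with a hypothetical function name, `check_ssl_paths`; run it as the same user that runs the slave, because readability depends on the user.

```shell
# Hypothetical pre-flight check: confirm that SSL_KEY_FILE and SSL_CERT_FILE
# are set and point to files readable by the current user.
check_ssl_paths() {
  status=0
  for name in SSL_KEY_FILE SSL_CERT_FILE; do
    # Read the variable whose name is stored in $name (empty if unset).
    eval "path=\${$name:-}"
    if [ -z "$path" ]; then
      echo "$name is not set" >&2
      status=1
    elif [ ! -r "$path" ]; then
      echo "$name=$path is missing or unreadable" >&2
      status=1
    fi
  done
  return "$status"
}
```

The function reports every problem it finds on stderr and returns non-zero if any check failed, so it can guard a startup script with `check_ssl_paths || exit 1`.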

On Mon, Nov 2, 2015 at 5:15 PM, Xiaodong Zhang <xd...@alauda.io> wrote:
Hi, haosdent.

1. The command-line arguments don't work well.

Command:

/usr/sbin/mesos-slave --master=zk://xxx/mesos --log_dir=/var/log/mesos --containerizers=docker,mesos --credential=/etc/mesos-slave-auth --docker=/usr/bin/docker --executor_environment_variables={"SSL_KEY_FILE": "/home/ubuntu/cert/xxx.pem", "SSL_CERT_FILE": "/home/ubuntu/cert/xxx.key", "SSL_ENABLED": "true"} --executor_registration_timeout=60mins

Environment: no SSL variables set.

Error info:

stderr:
Could not load cert file

Stdout:
--container="mesos-20151102-085117-3565115700-5050-25211-S1.2b784e8d-0bdd-4ffa-a7db-b6dcf35f0a03" --docker="/usr/bin/docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20151102-085117-3565115700-5050-25211-S1/frameworks/20151102-085117-3565115700-5050-25211-0000/executors/c310fa88-af8e-4fdd-92b6-eabf372bd187.85ff0237-8140-11e5-a875-021121f8fdf7/runs/2b784e8d-0bdd-4ffa-a7db-b6dcf35f0a03" --stop_timeout="0ns"


2. The patch works well (thanks again).

Both 1 and 2 read the same cert file.

The cert file has this format:

-----BEGIN CERTIFICATE-----
Xxxxxx
-----END CERTIFICATE-----
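A cert file like the one above starts with a `-----BEGIN CERTIFICATE-----` line, while a private key file starts with a `-----BEGIN ... PRIVATE KEY-----` line. Looking at those header lines can therefore catch mix-ups such as a key path given as SSL_CERT_FILE. This is a rough sketch with hypothetical function names (`pem_type`, `check_ssl_pem_files`). It only inspects the first BEGIN line of each file and does not validate the certificate or key themselves.

```shell
# Hypothetical helper: print the label of the first "-----BEGIN <label>-----"
# line in file $1 (e.g. "CERTIFICATE" or "RSA PRIVATE KEY").
pem_type() {
  sed -n 's/^-----BEGIN \(.*\)-----$/\1/p' "$1" | head -n 1
}

# Hypothetical check: SSL_CERT_FILE must start with a certificate and
# SSL_KEY_FILE with a private key; swapped paths fail this check.
check_ssl_pem_files() {
  cert_type="$(pem_type "$SSL_CERT_FILE")"
  key_type="$(pem_type "$SSL_KEY_FILE")"
  case "$cert_type" in
    CERTIFICATE) ;;
    *) echo "SSL_CERT_FILE has no certificate header (found: '$cert_type')" >&2
       return 1 ;;
  esac
  case "$key_type" in
    *"PRIVATE KEY") ;;
    *) echo "SSL_KEY_FILE has no private key header (found: '$key_type')" >&2
       return 1 ;;
  esac
}
```

Files saved with Windows (CRLF) line endings will not match the header pattern, so the check reports them as well.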

From: Xiaodong Zhang <xd...@alauda.io>
Date: Monday, November 2, 2015 at 11:22 AM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

Thanks @haosdent

I will test the command-line arguments first and then the patch.

Have a nice day!~~

From: haosdent <ha...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Sunday, November 1, 2015 at 5:40 PM
To: user <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

@Xiaodong I created a ticket to track this, https://issues.apache.org/jira/browse/MESOS-3815, and posted a patch there. Feel free to review and test it. Thank you!
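To illustrate the idea behind the workaround (the Docker executor needs the slave's SSL settings in its environment), here is a rough sketch that starts a command with an otherwise empty environment but forwards the SSL variables used in this thread. It is not the MESOS-3815 patch; `run_with_ssl_env` is a hypothetical helper, and only PATH plus the listed SSL_* variables are passed through.

```shell
# Hypothetical sketch: run "$@" with an empty environment except PATH and
# any non-empty SSL_* variables from the current shell.
run_with_ssl_env() {
  for name in SSL_ENABLED SSL_SUPPORT_DOWNGRADE SSL_KEY_FILE SSL_CERT_FILE; do
    # Read the variable whose name is stored in $name (empty if unset).
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      # Prepend NAME=VALUE so env(1) treats it as an assignment.
      set -- "$name=$value" "$@"
    fi
  done
  # env -i starts from an empty environment; only PATH and the SSL
  # assignments collected above reach the command.
  env -i PATH="$PATH" "$@"
}
```

For example, `run_with_ssl_env env` prints the environment the child actually sees, which makes it easy to confirm which SSL settings would reach an executor started this way.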

On Sun, Nov 1, 2015 at 4:54 PM, haosdent <ha...@gmail.com>> wrote:
Hi, @Xiaodong I could reproduce your problem in my testing today. A quickly workaround is adding environment variables when you launch slave.

```
./bin/mesos-slave.sh xxxx --containerizers=docker,mesos --executor_environment_variables='{"SSL_KEY_FILE": "/tmp/server.key", "SSL_CERT_FILE": "/tmp/ssl.chain.crt", "SSL_ENABLED": "true"}''
```

As you see above, pass the ssl env to docker-executor through specifying --executor_environment_variables when starting. So far it works well for me. Anyway I would submit a patch later to fix the docker environment variables passing. After that, you could launch slave without executor_environment_variables flag.

On Sat, Oct 31, 2015 at 2:56 PM, Tim Chen <ti...@mesosphere.io>> wrote:
Hi Xiaodong,

If you follow the reviewboard you'll see that the fix is not correct, I believe Jojy will be posting a new patch.

Tim

On Fri, Oct 30, 2015 at 6:58 PM, Xiaodong Zhang <xd...@alauda.io>> wrote:
it is still not working!

Only if I remove SSL_ENABLED from envs before I start the slave it works well.

I applied the patch in version 0.24.1. And rebuild it with `--enable-libevent --enable-ssl` 。

发件人: Xiaodong Zhang <xd...@alauda.io>>
日期: 2015年10月31日 星期六 上午7:45

至: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
主题: Re: Can't start docker container when SSL_ENABLED is on.

Thanks Jojy.

I will patch this in version 0.24.1, and rebuild it. I will let you know if it work well after I finish testing.

发件人: Jojy Varghese <jo...@mesosphere.io>>
答复: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
日期: 2015年10月31日 星期六 上午12:45
至: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
主题: Re: Can't start docker container when SSL_ENABLED is on.

Thanks Xiaodong.

Based on the hypothesis that the container process launched with SSL_ENABLED in environment is the problem, I have created a patch https://reviews.apache.org/r/39818/.  This might be a quick and dirty was to test the hypothesis. Would it be possible for you to test again after applying the patch?

-Jojy



On Oct 30, 2015, at 8:29 AM, Xiaodong Zhang <xd...@alauda.io>> wrote:

Thanks @Jojy



Flags at startup: --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --credential="/etc/mesos-slave-auth" --default_role="*" --disk_watch_interval="1mins" --docker="/usr/bin/docker" --docker_kill_orphans="true" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --enforce_container_disk_quota="false" --executor_registration_timeout="1hrs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://172.31.43.77:2181,172.31.44.2:2181,172.31.36.91:2181/mesos" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --version="false" --work_dir="/tmp/mesos"

From: Jojy Varghese <jo...@mesosphere.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Friday, October 30, 2015 at 11:17 PM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

Hi Xiaodong
This might be because the executor inherits the slave's SSL environment variables and thus expects an SSL key password to launch. Could you please add the part of the slave logs that says "Flags at startup" so that we have more information?

thanks
Jojy
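The inheritance Jojy describes is ordinary POSIX process behaviour rather than anything Mesos-specific: a child started without an explicit environment gets a copy of its parent's exported variables. A minimal illustration in plain `sh` (not Mesos code), with the shell standing in for the slave; `env -u`, available in GNU coreutils and BusyBox, shows what dropping one variable from a child's environment looks like:

```shell
#!/bin/sh
# Illustration only: exported variables flow from a parent process (this
# shell, standing in for the slave) into every child it starts.
export SSL_ENABLED=true

# A child started normally sees the inherited value.
child_sees() {
  sh -c 'printf "%s\n" "${SSL_ENABLED:-<unset>}"'
}

# The same child started with SSL_ENABLED removed from its environment.
child_sees_without_ssl() {
  env -u SSL_ENABLED sh -c 'printf "%s\n" "${SSL_ENABLED:-<unset>}"'
}

child_sees              # prints: true
child_sees_without_ssl  # prints: <unset>
```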


On Oct 29, 2015, at 8:55 PM, Xiaodong Zhang <xd...@alauda.io> wrote:

Thanks a lot! @haosdent

From: haosdent <ha...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Friday, October 30, 2015 at 11:45 AM
To: user <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

Hi @Xiaodong, I'm interested in your problem, but these days I don't have enough time to try to reproduce it. I think I can dig into it this Sunday and give you feedback.

On Fri, Oct 30, 2015 at 11:30 AM, Xiaodong Zhang <xd...@alauda.io> wrote:
Does anybody know about this?

From: Xiaodong Zhang <xd...@alauda.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Thursday, October 29, 2015 at 7:38 PM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

I think it is easy to reproduce this error.

Start master with env:

SSL_SUPPORT_DOWNGRADE
SSL_ENABLED
SSL_KEY_FILE
SSL_CERT_FILE

Start slave with env:

SSL_ENABLED
SSL_KEY_FILE
SSL_CERT_FILE
LIBPROCESS_ADVERTISE_IP


Then run a docker task via marathon.

From: Xiaodong Zhang <xd...@alauda.io>
Date: Thursday, October 29, 2015 at 3:09 PM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

So now, Mesos tasks work well but Docker tasks don't.

From: Xiaodong Zhang <xd...@alauda.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Thursday, October 29, 2015 at 2:08 PM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

I ran a task via Marathon:


{
    "id": "basic-0",
    "cmd": "while [ true ] ; do echo 'Hello Marathon' ; sleep 5 ; done",
    "cpus": 0.1,
    "mem": 10.0,
    "instances": 1
}

It works well.

<742629F2-78E8-43F2-9015-F3D22720826B.png>

The Docker task can pull the image but can't run, as I mentioned.

My Docker version is 1.5.0.

From: Tim Chen <ti...@mesosphere.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Thursday, October 29, 2015 at 1:48 PM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

Does running a task without a Docker container (Mesos containerizer) work with SSL in your environment?

Tim

On Wed, Oct 28, 2015 at 10:19 PM, Xiaodong Zhang <xd...@alauda.io> wrote:
Thanks a lot. I found the log files on the slave.

One of the tasks:

Stdout:

--container="mesos-20151029-043755-3549436724-5050-5674-S0.e2c2580f-8082-4f17-b0cc-4e32e040d444" --docker="/home/ubuntu/luna/bin/docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444" --stop_timeout="0ns"
--container="mesos-20151029-043755-3549436724-5050-5674-S0.e2c2580f-8082-4f17-b0cc-4e32e040d444" --docker="/home/ubuntu/luna/bin/docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444" --stop_timeout="0ns"
Shutting down

Stderr:

I1029 05:14:06.529364 27862 fetcher.cpp:414] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20151029-043755-3549436724-5050-5674-S0","items":[{"action":"BYPASS_CACHE","uri":{"extract":false,"value":"file:\/\/\/etc\/.dockercfg"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/20151029-043755-3549436724-5050-5674-S0\/frameworks\/20151029-043755-3549436724-5050-5674-0000\/executors\/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f\/runs\/e2c2580f-8082-4f17-b0cc-4e32e040d444"}
I1029 05:14:06.530562 27862 fetcher.cpp:369] Fetching URI 'file:///etc/.dockercfg'
I1029 05:14:06.530580 27862 fetcher.cpp:243] Fetching directly into the sandbox directory
I1029 05:14:06.530594 27862 fetcher.cpp:180] Fetching URI 'file:///etc/.dockercfg'
I1029 05:14:06.530609 27862 fetcher.cpp:160] Copying resource with command:cp '/etc/.dockercfg' '/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444/.dockercfg'
I1029 05:14:06.532165 27862 fetcher.cpp:446] Fetched 'file:///etc/.dockercfg' to '/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444/.dockercfg'
I1029 05:14:07.782054 27955 exec.cpp:133] Version: 0.24.1
I1029 05:14:07.785039 27963 exec.cpp:462] Slave exited ... shutting down
E1029 05:14:07.785158 27964 socket.hpp:174] Shutdown failed on fd=7: Transport endpoint is not connected [107]

From: haosdent <ha...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Thursday, October 29, 2015 at 1:13 PM
To: user <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

<5185_02_04.png>
<5185_02_07.png>

I captured how I find task logs in my local web UI. Could you find the stderr and stdout for your tasks by following the screenshots above?

On Thu, Oct 29, 2015 at 1:07 PM, Xiaodong Zhang <xd...@alauda.io> wrote:
I didn't see any useful info.

In the Mesos slave log, there is this line:
I1029 03:29:53.160143  9292 slave.cpp:3399] Executor '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713' of framework 20151029-031549-1294671788-5050-4937-0000 terminated with signal Killed

I checked the log from a normal run; it shows:

I1014 15:22:21.276007 23163 slave.cpp:3326] Executor 'ffc08dce-997f-41f7-9b03-57c1b4bc1f85.47ed02aa-7285-11e5-80d7-000d3a8033de' of framework 20150814-115157-1677721866-5050-6185-0000 exited with status 0

Is this helpful?

From: Xiaodong Zhang <xd...@alauda.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Thursday, October 29, 2015 at 12:59 PM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

<9D46724C-457C-4BE1-B0E4-F57B147F6DC8.png>

The web UI has a LOG link; clicking it shows this:

I1029 04:44:32.293445  5697 http.cpp:321] HTTP GET for /master/state.json from 114.113.20.135:55682 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36'
I1029 04:44:34.533504  5704 master.cpp:4613] Sending 1 offers to framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
I1029 04:44:34.539579  5702 master.cpp:2739] Processing ACCEPT call for offers: [ 20151029-043755-3549436724-5050-5674-O2 ] on slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (ec2-50-112-136-148.us-west-2.compute.amazonaws.com) for framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
I1029 04:44:34.539710  5702 hierarchical.hpp:814] Recovered cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000] (total: cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: ) on slave 20151029-043755-3549436724-5050-5674-S0 from framework 20151029-043755-3549436724-5050-5674-0000
I1029 04:44:37.360901  5703 master.cpp:4294] Performing implicit task state reconciliation for framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
I1029 04:44:40.539989  5704 master.cpp:4613] Sending 1 offers to framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
I1029 04:44:40.610321  5702 master.cpp:2739] Processing ACCEPT call for offers: [ 20151029-043755-3549436724-5050-5674-O3 ] on slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (ec2-50-112-136-148.us-west-2.compute.amazonaws.com) for framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
I1029 04:44:40.610846  5702 master.hpp:170] Adding task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] on slave 20151029-043755-3549436724-5050-5674-S0 (ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
I1029 04:44:40.610911  5702 master.cpp:3069] Launching task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f of framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373 with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] on slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
I1029 04:44:40.611095  5702 hierarchical.hpp:814] Recovered cpus(*):0.9375; mem(*):743; disk(*):3962; ports(*):[31000-31863, 31865-32000] (total: cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: cpus(*):0.0625; mem(*):256; ports(*):[31864-31864]) on slave 20151029-043755-3549436724-5050-5674-S0 from framework 20151029-043755-3549436724-5050-5674-0000
I1029 04:44:43.324970  5698 http.cpp:321] HTTP GET for /master/state.json from 114.113.20.135:55682 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36'
I1029 04:44:46.546671  5703 master.cpp:4613] Sending 1 offers to framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
I1029 04:44:46.557266  5699 master.cpp:2739] Processing ACCEPT call for offers: [ 20151029-043755-3549436724-5050-5674-O4 ] on slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (ec2-50-112-136-148.us-west-2.compute.amazonaws.com) for framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
I1029 04:44:46.557394  5699 hierarchical.hpp:814] Recovered cpus(*):0.9375; mem(*):743; disk(*):3962; ports(*):[31000-31863, 31865-32000] (total: cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: cpus(*):0.0625; mem(*):256; ports(*):[31864-31864]) on slave 20151029-043755-3549436724-5050-5674-S0 from framework 20151029-043755-3549436724-5050-5674-0000
I1029 04:44:47.267562  5700 master.cpp:4069] Status update TASK_FAILED (UUID: 0ea607fc-bf24-4bda-b107-55a54aba31cf) for task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f of framework 20151029-043755-3549436724-5050-5674-0000 from slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
I1029 04:44:47.267645  5700 master.cpp:4108] Forwarding status update TASK_FAILED (UUID: 0ea607fc-bf24-4bda-b107-55a54aba31cf) for task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f of framework 20151029-043755-3549436724-5050-5674-0000
I1029 04:44:47.267774  5700 master.cpp:5576] Updating the latest state of task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f of framework 20151029-043755-3549436724-5050-5674-0000 to TASK_FAILED
I1029 04:44:47.267907  5700 hierarchical.hpp:814] Recovered cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] (total: cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: ) on slave 20151029-043755-3549436724-5050-5674-S0 from framework 20151029-043755-3549436724-5050-5674-0000
I1029 04:44:47.289356  5698 master.cpp:5644] Removing task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] of framework 20151029-043755-3549436724-5050-5674-0000 on slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
I1029 04:44:47.289459  5698 master.cpp:3398] Processing ACKNOWLEDGE call 0ea607fc-bf24-4bda-b107-55a54aba31cf for task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f of framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373 on slave 20151029-043755-3549436724-5050-5674-S0



From: haosdent <ha...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Thursday, October 29, 2015 at 12:02 PM
To: user <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

Oh, I mean your task logs. You can get them from the Mesos web UI.

On Thu, Oct 29, 2015 at 11:52 AM, Xiaodong Zhang <xd...@alauda.io> wrote:
Thanks for your reply.

Yes, I built Mesos with `--enable-libevent --enable-ssl`. If I don't provide the key and PEM when starting the slave, registration fails (that means SSL works, right?).

As I said, the odd thing is that the container never runs (`docker ps -a` shows nothing), so it can't have any stdout or stderr.

From: haosdent <ha...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Thursday, October 29, 2015 at 11:47 AM
To: user <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

Did you compile Mesos with SSL support? The default build doesn't include SSL. And does the Docker container have stdout and stderr?

On Thu, Oct 29, 2015 at 11:41 AM, Xiaodong Zhang <xd...@alauda.io> wrote:
My scenario is as the previous email describes: the masters and slaves are in different IaaS providers. Now the slaves can register with the masters when SSL_ENABLED is on.

But I've hit another problem: the slaves can't run containers. The odd thing is that they can pull the image successfully but cannot run the container (`docker ps -a` lists nothing).

The logs look like this:

I1029 03:29:45.967741  9288 docker.cpp:758] Starting container 'd4f4e236-0d0a-492c-86df-eef48a414e23' for task '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713' (and executor '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713') of framework '20151029-031549-1294671788-5050-4937-0000'
I1029 03:29:48.044148  9292 docker.cpp:382] Checkpointing pid 12062 to '/tmp/mesos/meta/slaves/20151029-031549-1294671788-5050-4937-S0/frameworks/20151029-031549-1294671788-5050-4937-0000/executors/279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713/runs/d4f4e236-0d0a-492c-86df-eef48a414e23/pids/forked.pid'
I1029 03:29:53.159361  9292 docker.cpp:1576] Executor for container 'd4f4e236-0d0a-492c-86df-eef48a414e23' has exited
I1029 03:29:53.159572  9292 docker.cpp:1374] Destroying container 'd4f4e236-0d0a-492c-86df-eef48a414e23'
I1029 03:29:53.159822  9292 docker.cpp:1478] Running docker stop on container 'd4f4e236-0d0a-492c-86df-eef48a414e23'
I1029 03:29:53.160143  9292 slave.cpp:3399] Executor '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713' of framework 20151029-031549-1294671788-5050-4937-0000 terminated with signal Killed
I1029 03:29:53.160884  9292 slave.cpp:2696] Handling status update TASK_FAILED (UUID: 27a2080a-8807-449e-9077-837ec45b4c51) for task 279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713 of framework 20151029-031549-1294671788-5050-4937-0000 from @0.0.0.0:0
W1029 03:29:53.161247  9288 docker.cpp:986] Ignoring updating unknown container: d4f4e236-0d0a-492c-86df-eef48a414e23
I1029 03:29:53.161548  9293 status_update_manager.cpp:322] Received status update TASK_FAILED (UUID: 27a2080a-8807-449e-9077-837ec45b4c51) for task 279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713 of framework 20151029-031549-1294671788-5050-4937-0000

I run the master node with these environment variables:

SSL_SUPPORT_DOWNGRADE=true
SSL_ENABLED=true
SSL_KEY_FILE=/home/ubuntu/xx.key
SSL_CERT_FILE=/home/ubuntu/xx.pem

And the slave node with:

SSL_ENABLED=true
SSL_KEY_FILE=/home/ubuntu/xx.key
SSL_CERT_FILE=/home/ubuntu/xx.pem
LIBPROCESS_ADVERTISE_IP=xxx.xxx.xxx.xxx
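A small preflight before starting a node with these variables can rule out the simplest failure modes. A minimal sketch; `check_ssl_env` is a hypothetical helper that only checks that the variables are set (with `SSL_ENABLED` set to the literal `true` used in this thread) and that the files are readable, not that the key and certificate are valid or match each other:

```shell
#!/bin/sh
# Hypothetical preflight for the SSL environment variables used in this thread.
# Prints one line per problem found; returns non-zero if there are any.
check_ssl_env() {
  status=0
  if [ "${SSL_ENABLED:-}" != "true" ]; then
    echo "SSL_ENABLED is not set to true"
    status=1
  fi
  for var in SSL_KEY_FILE SSL_CERT_FILE; do
    # Indirect lookup: sets $path to the value of the variable named in $var.
    eval "path=\${$var:-}"
    if [ -z "$path" ]; then
      echo "$var is not set"
      status=1
    elif [ ! -f "$path" ] || [ ! -r "$path" ]; then
      echo "$var=$path is not a readable file"
      status=1
    fi
  done
  return "$status"
}
```

For example, a start-up script could run `check_ssl_env || exit 1` just before launching the master or slave.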

When I remove all the SSL environment variables, the slaves work well.

Did I miss something?

Version:

Mesos 0.24.1
Marathon 0.9.2

OS:
Ubuntu 14.04



From: Anindya Sinha <an...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Wednesday, October 28, 2015 at 2:32 PM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: How to tell master which ip to connect.



On Tue, Oct 27, 2015 at 7:43 PM, Xiaodong Zhang <xd...@alauda.io> wrote:
It works! Thanks a lot.

Ok. So we should expose advertise_ip and advertise_port as command line options for mesos-slave as well (instead of using the environment variables)? Opened https://issues.apache.org/jira/browse/MESOS-3809.


Another question: do the masters and slaves communicate with each other securely? Is the data encrypted? I want to make sure that deploying the masters and slaves in different IaaS providers is production-ready.

From: haosdent <ha...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Wednesday, October 28, 2015 at 10:23 AM
To: user <us...@mesos.apache.org>
Subject: Re: How to tell master which ip to connect.

Did you try `export LIBPROCESS_ADVERTISE_IP=xxx` and `LIBPROCESS_ADVERTISE_PORT` when starting the slave?
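As a sketch of that suggestion: the variables only need to be in the slave process's environment when it starts. `with_advertise` is a hypothetical wrapper, and `203.0.113.10` (a documentation address) and `5051` are placeholders for the public IP and port the master should use to reach the slave:

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with the libprocess advertise variables
# set, so the process announces the given address instead of the one it binds.
with_advertise() {
  adv_ip=$1
  adv_port=$2
  shift 2
  # Prefix assignments apply only to the environment of this one command.
  LIBPROCESS_ADVERTISE_IP=$adv_ip LIBPROCESS_ADVERTISE_PORT=$adv_port "$@"
}

# Usage sketch (placeholders; keep your other slave flags as they are):
#   with_advertise 203.0.113.10 5051 mesos-slave --master=zk://xxx/mesos --port=5051
```

Using prefix assignments rather than `export` keeps the variables out of the calling shell, so only the launched process sees them.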

On Wed, Oct 28, 2015 at 10:16 AM, Xiaodong Zhang <xd...@alauda.io> wrote:
Hi team,

My scenario is this:

My master nodes are deployed in AWS and my slaves are in Azure, so they communicate via public IPs.
I ran into trouble when the slaves tried to register with the master.
The slaves can get the master's public IP address and can send the registration request, but they can only send their private IP to the master (they don't know their public IP, so they can't bind to it via the `--ip` flag). As a result, the masters can't connect to the slaves. How can a slave tell the master which IP to connect to? (I can't find any flag like `--advertise_ip` in the master.)



--
Best Regards,
Haosdent Huang





Re: Can't start docker container when SSL_ENABLED is on.

Posted by haosdent <ha...@gmail.com>.
```
stderr:
Could not load cert file
```

Is this because your path is wrong? Generally,
executor_environment_variables should work.
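Two things may be worth checking here. In the command quoted below, `SSL_KEY_FILE` points at a `.pem` file and `SSL_CERT_FILE` at a `.key` file, the opposite of the environment variables shown earlier in the thread. Also, if that command line was typed into a shell as shown, the unquoted JSON would lose its double quotes and be split at each space. A minimal sketch, assuming PEM-encoded files; the helper names are hypothetical and the header checks are only a heuristic:

```shell
#!/bin/sh
# Hypothetical helpers (not part of Mesos) to sanity-check PEM files and to
# build the JSON value for --executor_environment_variables.

# Succeeds if the file contains a PEM certificate header.
looks_like_cert() {
  grep -q '^-----BEGIN CERTIFICATE-----' "$1" 2>/dev/null
}

# Succeeds if the file contains a PEM private-key header
# (e.g. "BEGIN PRIVATE KEY" or "BEGIN RSA PRIVATE KEY").
looks_like_key() {
  grep -q '^-----BEGIN .*PRIVATE KEY-----' "$1" 2>/dev/null
}

# Prints the JSON object; assumes the paths contain no '"' or '\'.
executor_env_json() {
  printf '{"SSL_KEY_FILE": "%s", "SSL_CERT_FILE": "%s", "SSL_ENABLED": "true"}' "$1" "$2"
}

# Usage sketch (paths are placeholders). Quoting the whole flag keeps the JSON
# as a single argument with its double quotes intact:
#   key=/home/ubuntu/cert/xxx.key cert=/home/ubuntu/cert/xxx.pem
#   looks_like_key "$key" && looks_like_cert "$cert" &&
#     mesos-slave --containerizers=docker,mesos \
#       "--executor_environment_variables=$(executor_env_json "$key" "$cert")"
```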

On Mon, Nov 2, 2015 at 5:15 PM, Xiaodong Zhang <xd...@alauda.io> wrote:

> Hi, haosdent.
>
> 1. The command-line arguments don't work well.
>
> Command:
>
> /usr/sbin/mesos-slave --master=zk://xxx/mesos --log_dir=/var/log/mesos
> --containerizers=docker,mesos --credential=/etc/mesos-slave-auth
> --docker=/usr/bin/docker --executor_environment_variables={"SSL_KEY_FILE":
> "/home/ubuntu/cert/xxx.pem", "SSL_CERT_FILE": "/home/ubuntu/cert/xxx.key",
> "SSL_ENABLED": "true"} --executor_registration_timeout=60mins
>
> The slave's environment does not set the SSL variables.
>
> Error info:
>
> stderr:
> Could not load cert file
>
> Stdout:
> --container="mesos-20151102-085117-3565115700-5050-25211-S1.2b784e8d-0bdd-4ffa-a7db-b6dcf35f0a03"
> --docker="/usr/bin/docker" --help="false"
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
> --sandbox_directory="/tmp/mesos/slaves/20151102-085117-3565115700-5050-25211-S1/frameworks/20151102-085117-3565115700-5050-25211-0000/executors/c310fa88-af8e-4fdd-92b6-eabf372bd187.85ff0237-8140-11e5-a875-021121f8fdf7/runs/2b784e8d-0bdd-4ffa-a7db-b6dcf35f0a03"
> --stop_timeout="0ns"
>
>
> 2. The patch works well (thanks again).
>
> Both 1 and 2 read the same cert file.
>
> The cert file is formatted like this:
>
> -----BEGIN CERTIFICATE-----
> Xxxxxx
> -----END CERTIFICATE-----
>
> From: Xiaodong Zhang <xd...@alauda.io>
> Date: Monday, November 2, 2015 at 11:22 AM
> To: "user@mesos.apache.org" <us...@mesos.apache.org>
> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>
> Thanks @haosdent
>
> I will test the command-line arguments and then test the patch.
>
> Have a nice day!
>
> From: haosdent <ha...@gmail.com>
> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
> Date: Sunday, November 1, 2015 at 5:40 PM
> To: user <us...@mesos.apache.org>
> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>
> @Xiaodong I created a ticket to track this,
> https://issues.apache.org/jira/browse/MESOS-3815, and posted a patch in it.
> Feel free to review and test it. Thank you!
>
> On Sun, Nov 1, 2015 at 4:54 PM, haosdent <ha...@gmail.com> wrote:
>
>> Hi @Xiaodong, I could reproduce your problem in my testing today. A quick
>> workaround is to add the environment variables when you launch the slave.
>>
>> ```
>> ./bin/mesos-slave.sh xxxx --containerizers=docker,mesos \
>> --executor_environment_variables='{"SSL_KEY_FILE": "/tmp/server.key",
>> "SSL_CERT_FILE": "/tmp/ssl.chain.crt", "SSL_ENABLED": "true"}'
>> ```
>>
>> As you can see above, this passes the SSL environment to the Docker executor
>> by specifying --executor_environment_variables at startup. So far it works
>> well for me. Anyway, I will submit a patch later to fix how environment
>> variables are passed for Docker. After that, you can launch the slave
>> without the executor_environment_variables flag.
>>
>> On Sat, Oct 31, 2015 at 2:56 PM, Tim Chen <ti...@mesosphere.io> wrote:
>>
>>> Hi Xiaodong,
>>>
>>> If you follow the review board, you'll see that the fix is not correct; I
>>> believe Jojy will be posting a new patch.
>>>
>>> Tim
>>>
>>> On Fri, Oct 30, 2015 at 6:58 PM, Xiaodong Zhang <xd...@alauda.io>
>>> wrote:
>>>
>>>> It is still not working!
>>>>
>>>> It only works if I remove SSL_ENABLED from the environment before
>>>> starting the slave.
>>>>
>>>> I applied the patch to version 0.24.1 and rebuilt it with
>>>> `--enable-libevent --enable-ssl`.
>>>>
>>>> From: Xiaodong Zhang <xd...@alauda.io>
>>>> Date: Saturday, October 31, 2015 at 7:45 AM
>>>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>>
>>>> Thanks Jojy.
>>>>
>>>> I will apply this patch to version 0.24.1 and rebuild it. I will let you
>>>> know whether it works after I finish testing.
>>>>
>>>> From: Jojy Varghese <jo...@mesosphere.io>
>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>> Date: Saturday, October 31, 2015 at 12:45 AM
>>>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>>
>>>> Thanks Xiaodong.
>>>>
>>>> Based on the hypothesis that the problem is the container process being
>>>> launched with SSL_ENABLED in its environment, I have created a patch:
>>>> https://reviews.apache.org/r/39818/. This might be a quick and dirty
>>>> way to test the hypothesis. Would it be possible for you to test again
>>>> after applying the patch?
>>>>
>>>> -Jojy
>>>>
>>>>
>>>>
>>>> On Oct 30, 2015, at 8:29 AM, Xiaodong Zhang <xd...@alauda.io> wrote:
>>>>
>>>> Thanks @Jojy
>>>>
>>>>
>>>>
>>>> Flags at startup: --appc_store_dir="/tmp/mesos/store/appc"
>>>> --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false"
>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>>> --container_disk_watch_interval="15secs" --containerizers="docker,mesos"
>>>> --credential="/etc/mesos-slave-auth" --default_role="*"
>>>> --disk_watch_interval="1mins" --docker="/usr/bin/docker"
>>>> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
>>>> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
>>>> --enforce_container_disk_quota="false"
>>>> --executor_registration_timeout="1hrs"
>>>> --executor_shutdown_grace_period="5secs"
>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>>> --hadoop_home="" --help="false" --initialize_driver_logging="true"
>>>> --isolation="posix/cpu,posix/mem" --launcher_dir="/usr/libexec/mesos"
>>>> --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO"
>>>> --master="zk://172.31.43.77:2181,172.31.44.2:2181,172.31.36.91:2181/mesos"
>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>>> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>>> --registration_backoff_factor="1secs"
>>>> --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true"
>>>> --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
>>>> --switch_user="true" --version="false" --work_dir="/tmp/mesos"
>>>>
>>>> From: Jojy Varghese <jo...@mesosphere.io>
>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>> Date: Friday, October 30, 2015 at 11:17 PM
>>>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>>
>>>> Hi Xiaodong
>>>>   This might be because the executor inherits the SSL environment
>>>> variables of slave and thus expects SSL key password to launch. Could you
>>>> please add the part of the slave logs that says "Flags at startup” so that
>>>> we can have more information?
>>>>
>>>> thanks
>>>> Jojy
>>>>
>>>>
>>>> On Oct 29, 2015, at 8:55 PM, Xiaodong Zhang <xd...@alauda.io> wrote:
>>>>
>>>> Thanks a lot! @haosdent
>>>>
>>>> From: haosdent <ha...@gmail.com>
>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>> Date: Friday, October 30, 2015 at 11:45 AM
>>>> To: user <us...@mesos.apache.org>
>>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>>
>>>> Hi @Xiaodong, I'm interested in your problem, but these days I don't have
>>>> enough time to try to reproduce it. I think I can dig into it this Sunday
>>>> and give you feedback.
>>>>
>>>> On Fri, Oct 30, 2015 at 11:30 AM, Xiaodong Zhang <xd...@alauda.io>
>>>> wrote:
>>>>
>>>>> Does anybody know about this?
>>>>>
>>>>> From: Xiaodong Zhang <xd...@alauda.io>
>>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>> Date: Thursday, October 29, 2015 at 7:38 PM
>>>>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>>>
>>>>> I think it is easy to reproduce this error.
>>>>>
>>>>> Start master with env:
>>>>>
>>>>> SSL_SUPPORT_DOWNGRADE
>>>>> SSL_ENABLED
>>>>> SSL_KEY_FILE
>>>>> SSL_CERT_FILE
>>>>>
>>>>> Start slave with env:
>>>>>
>>>>> SSL_ENABLED
>>>>> SSL_KEY_FILE
>>>>> SSL_CERT_FILE
>>>>> LIBPROCESS_ADVERTISE_IP
>>>>>
>>>>>
>>>>> Then run a docker task via marathon.
>>>>>
>>>>> From: Xiaodong Zhang <xd...@alauda.io>
>>>>> Date: Thursday, October 29, 2015 at 3:09 PM
>>>>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>>>
>>>>> So now, Mesos tasks work well but Docker tasks don't.
>>>>>
>>>>> From: Xiaodong Zhang <xd...@alauda.io>
>>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>> Date: Thursday, October 29, 2015 at 2:08 PM
>>>>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>>>
>>>>> I ran a task via Marathon:
>>>>>
>>>>> {
>>>>>     "id": "basic-0",
>>>>>     "cmd": "while [ true ] ; do echo 'Hello Marathon' ; sleep 5 ; done",
>>>>>     "cpus": 0.1,
>>>>>     "mem": 10.0,
>>>>>     "instances": 1
>>>>> }
>>>>>
>>>>>
>>>>> It works well.
>>>>>
>>>>> <742629F2-78E8-43F2-9015-F3D22720826B.png>
>>>>>
>>>>> The Docker task can pull the image but can't run, as I mentioned.
>>>>>
>>>>> My Docker version is 1.5.0.
>>>>>
>>>>> From: Tim Chen <ti...@mesosphere.io>
>>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>> Date: Thursday, October 29, 2015 at 1:48 PM
>>>>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>>>
>>>>> Does running a task without a Docker container (Mesos containerizer)
>>>>> work with SSL in your environment?
>>>>>
>>>>> Tim
>>>>>
>>>>> On Wed, Oct 28, 2015 at 10:19 PM, Xiaodong Zhang <xd...@alauda.io>
>>>>> wrote:
>>>>>
>>>>>> Thanks a lot. I found the log files on the slave.
>>>>>>
>>>>>> One of the tasks:
>>>>>>
>>>>>> Stdout:
>>>>>>
>>>>>> --container="mesos-20151029-043755-3549436724-5050-5674-S0.e2c2580f-8082-4f17-b0cc-4e32e040d444"
>>>>>> --docker="/home/ubuntu/luna/bin/docker" --help="false"
>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444"
>>>>>> --stop_timeout="0ns"
>>>>>> --container="mesos-20151029-043755-3549436724-5050-5674-S0.e2c2580f-8082-4f17-b0cc-4e32e040d444"
>>>>>> --docker="/home/ubuntu/luna/bin/docker" --help="false"
>>>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>> --sandbox_directory="/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444"
>>>>>> --stop_timeout="0ns"
>>>>>> Shutting down
>>>>>>
>>>>>> Stderr:
>>>>>>
>>>>>> I1029 05:14:06.529364 27862 fetcher.cpp:414] Fetcher Info:
>>>>>> {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20151029-043755-3549436724-5050-5674-S0","items":[{"action":"BYPASS_CACHE","uri":{"extract":false,"value":"file:\/\/\/etc\/.dockercfg"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/20151029-043755-3549436724-5050-5674-S0\/frameworks\/20151029-043755-3549436724-5050-5674-0000\/executors\/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f\/runs\/e2c2580f-8082-4f17-b0cc-4e32e040d444"}
>>>>>> I1029 05:14:06.530562 27862 fetcher.cpp:369] Fetching URI
>>>>>> 'file:///etc/.dockercfg'
>>>>>> I1029 05:14:06.530580 27862 fetcher.cpp:243] Fetching directly into
>>>>>> the sandbox directory
>>>>>> I1029 05:14:06.530594 27862 fetcher.cpp:180] Fetching URI
>>>>>> 'file:///etc/.dockercfg'
>>>>>> I1029 05:14:06.530609 27862 fetcher.cpp:160] Copying resource with
>>>>>> command:cp '/etc/.dockercfg'
>>>>>> '/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444/.dockercfg'
>>>>>> I1029 05:14:06.532165 27862 fetcher.cpp:446] Fetched
>>>>>> 'file:///etc/.dockercfg' to
>>>>>> '/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444/.dockercfg'
>>>>>> I1029 05:14:07.782054 27955 exec.cpp:133] Version: 0.24.1
>>>>>> I1029 05:14:07.785039 27963 exec.cpp:462] Slave exited ... shutting
>>>>>> down
>>>>>> E1029 05:14:07.785158 27964 socket.hpp:174] Shutdown failed on fd=7:
>>>>>> Transport endpoint is not connected [107]
>>>>>>
>>>>>> From: haosdent <ha...@gmail.com>
>>>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>>> Date: Thursday, October 29, 2015 at 1:13 PM
>>>>>> To: user <us...@mesos.apache.org>
>>>>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>>>>
>>>>>> <5185_02_04.png>
>>>>>> <5185_02_07.png>
>>>>>>
>>>>>> I captured how I find task logs in my local web UI. Could you find the
>>>>>> stderr and stdout of your tasks by following the screenshots above?
>>>>>>
>>>>>> On Thu, Oct 29, 2015 at 1:07 PM, Xiaodong Zhang <xd...@alauda.io>
>>>>>> wrote:
>>>>>>
>>>>>>> I didn't see any useful info.
>>>>>>>
>>>>>>> In the Mesos slave log, there is this line:
>>>>>>> I1029 03:29:53.160143  9292 slave.cpp:3399] Executor
>>>>>>> '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713'
>>>>>>> of framework 20151029-031549-1294671788-5050-4937-0000 terminated
>>>>>>> with signal Killed
>>>>>>>
>>>>>>> For comparison, a normal log shows:
>>>>>>>
>>>>>>> I1014 15:22:21.276007 23163 slave.cpp:3326] Executor
>>>>>>> 'ffc08dce-997f-41f7-9b03-57c1b4bc1f85.47ed02aa-7285-11e5-80d7-000d3a8033de'
>>>>>>> of framework 20150814-115157-1677721866-5050-6185-0000 exited with
>>>>>>> status 0
>>>>>>>
>>>>>>> Is this helpful?
>>>>>>>
>>>>>>> From: Xiaodong Zhang <xd...@alauda.io>
>>>>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>>>> Date: Thursday, October 29, 2015 at 12:59 PM
>>>>>>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>>>>>
>>>>>>> <9D46724C-457C-4BE1-B0E4-F57B147F6DC8.png>
>>>>>>>
>>>>>>> The web UI has a LOG link; clicking it shows this:
>>>>>>>
>>>>>>> I1029 04:44:32.293445  5697 http.cpp:321] HTTP GET for
>>>>>>> /master/state.json from 114.113.20.135:55682 with
>>>>>>> User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)
>>>>>>> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36'
>>>>>>> I1029 04:44:34.533504  5704 master.cpp:4613] Sending 1 offers to
>>>>>>> framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>>>> I1029 04:44:34.539579  5702 master.cpp:2739] Processing ACCEPT call
>>>>>>> for offers: [ 20151029-043755-3549436724-5050-5674-O2 ] on slave
>>>>>>> 20151029-043755-3549436724-5050-5674-S0 at
>>>>>>> slave(1)@50.112.136.148:5051
>>>>>>> (ec2-50-112-136-148.us-west-2.compute.amazonaws.com) for framework
>>>>>>> 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>>>> I1029 04:44:34.539710  5702 hierarchical.hpp:814] Recovered
>>>>>>> cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000] (total:
>>>>>>> cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: )
>>>>>>> on slave 20151029-043755-3549436724-5050-5674-S0 from framework
>>>>>>> 20151029-043755-3549436724-5050-5674-0000
>>>>>>> I1029 04:44:37.360901  5703 master.cpp:4294] Performing implicit
>>>>>>> task state reconciliation for framework
>>>>>>> 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>>>> I1029 04:44:40.539989  5704 master.cpp:4613] Sending 1 offers to
>>>>>>> framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>>>> I1029 04:44:40.610321  5702 master.cpp:2739] Processing ACCEPT call
>>>>>>> for offers: [ 20151029-043755-3549436724-5050-5674-O3 ] on slave
>>>>>>> 20151029-043755-3549436724-5050-5674-S0 at
>>>>>>> slave(1)@50.112.136.148:5051
>>>>>>> (ec2-50-112-136-148.us-west-2.compute.amazonaws.com) for framework
>>>>>>> 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>>>> I1029 04:44:40.610846  5702 master.hpp:170] Adding task
>>>>>>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>>>>>>> with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] on slave
>>>>>>> 20151029-043755-3549436724-5050-5674-S0
>>>>>>> (ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
>>>>>>> I1029 04:44:40.610911  5702 master.cpp:3069] Launching task
>>>>>>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>>>>>>> of framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>>>> with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] on slave
>>>>>>> 20151029-043755-3549436724-5050-5674-S0 at
>>>>>>> slave(1)@50.112.136.148:5051
>>>>>>> (ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
>>>>>>> I1029 04:44:40.611095  5702 hierarchical.hpp:814] Recovered
>>>>>>> cpus(*):0.9375; mem(*):743; disk(*):3962; ports(*):[31000-31863,
>>>>>>> 31865-32000] (total: cpus(*):1; mem(*):999; disk(*):3962;
>>>>>>> ports(*):[31000-32000], allocated: cpus(*):0.0625; mem(*):256;
>>>>>>> ports(*):[31864-31864]) on slave 20151029-043755-3549436724-5050-5674-S0
>>>>>>> from framework 20151029-043755-3549436724-5050-5674-0000
>>>>>>> I1029 04:44:43.324970  5698 http.cpp:321] HTTP GET for
>>>>>>> /master/state.json from 114.113.20.135:55682 with
>>>>>>> User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)
>>>>>>> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36'
>>>>>>> I1029 04:44:46.546671  5703 master.cpp:4613] Sending 1 offers to
>>>>>>> framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>>>> I1029 04:44:46.557266  5699 master.cpp:2739] Processing ACCEPT call
>>>>>>> for offers: [ 20151029-043755-3549436724-5050-5674-O4 ] on slave
>>>>>>> 20151029-043755-3549436724-5050-5674-S0 at
>>>>>>> slave(1)@50.112.136.148:5051
>>>>>>> (ec2-50-112-136-148.us-west-2.compute.amazonaws.com) for framework
>>>>>>> 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>>>> I1029 04:44:46.557394  5699 hierarchical.hpp:814] Recovered
>>>>>>> cpus(*):0.9375; mem(*):743; disk(*):3962; ports(*):[31000-31863,
>>>>>>> 31865-32000] (total: cpus(*):1; mem(*):999; disk(*):3962;
>>>>>>> ports(*):[31000-32000], allocated: cpus(*):0.0625; mem(*):256;
>>>>>>> ports(*):[31864-31864]) on slave 20151029-043755-3549436724-5050-5674-S0
>>>>>>> from framework 20151029-043755-3549436724-5050-5674-0000
>>>>>>> I1029 04:44:47.267562  5700 master.cpp:4069] Status update
>>>>>>> TASK_FAILED (UUID: 0ea607fc-bf24-4bda-b107-55a54aba31cf) for task
>>>>>>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>>>>>>> of framework 20151029-043755-3549436724-5050-5674-0000 from slave
>>>>>>> 20151029-043755-3549436724-5050-5674-S0 at
>>>>>>> slave(1)@50.112.136.148:5051
>>>>>>> (ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
>>>>>>> I1029 04:44:47.267645  5700 master.cpp:4108] Forwarding status
>>>>>>> update TASK_FAILED (UUID: 0ea607fc-bf24-4bda-b107-55a54aba31cf) for task
>>>>>>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>>>>>>> of framework 20151029-043755-3549436724-5050-5674-0000
>>>>>>> I1029 04:44:47.267774  5700 master.cpp:5576] Updating the latest
>>>>>>> state of task
>>>>>>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>>>>>>> of framework 20151029-043755-3549436724-5050-5674-0000 to TASK_FAILED
>>>>>>> I1029 04:44:47.267907  5700 hierarchical.hpp:814] Recovered
>>>>>>> cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] (total: cpus(*):1;
>>>>>>> mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: ) on slave
>>>>>>> 20151029-043755-3549436724-5050-5674-S0 from framework
>>>>>>> 20151029-043755-3549436724-5050-5674-0000
>>>>>>> I1029 04:44:47.289356  5698 master.cpp:5644] Removing task
>>>>>>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>>>>>>> with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] of
>>>>>>> framework 20151029-043755-3549436724-5050-5674-0000 on slave
>>>>>>> 20151029-043755-3549436724-5050-5674-S0 at
>>>>>>> slave(1)@50.112.136.148:5051
>>>>>>> (ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
>>>>>>> I1029 04:44:47.289459  5698 master.cpp:3398] Processing ACKNOWLEDGE
>>>>>>> call 0ea607fc-bf24-4bda-b107-55a54aba31cf for task
>>>>>>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>>>>>>> of framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>>>>>>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>>>>>>> on slave 20151029-043755-3549436724-5050-5674-S0
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> From: haosdent <ha...@gmail.com>
>>>>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>>>> Date: Thursday, October 29, 2015 at 12:02 PM
>>>>>>> To: user <us...@mesos.apache.org>
>>>>>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>>>>>
>>>>>>> Oh, I mean your task logs. You can get them from the Mesos web UI.
>>>>>>>
>>>>>>> On Thu, Oct 29, 2015 at 11:52 AM, Xiaodong Zhang <xd...@alauda.io>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks for your reply.
>>>>>>>>
>>>>>>>> Yes, I built Mesos with `--enable-libevent --enable-ssl`. If I
>>>>>>>> don't provide the key and PEM files when starting the slave, registration
>>>>>>>> fails (that means SSL works, right?)
>>>>>>>>
>>>>>>>> As I said, the odd thing is that the container never runs (`docker ps -a`
>>>>>>>> shows nothing), so it can't have any stdout or stderr.
>>>>>>>>
>>>>>>>> From: haosdent <ha...@gmail.com>
>>>>>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>>>>> Date: Thursday, October 29, 2015 at 11:47 AM
>>>>>>>> To: user <us...@mesos.apache.org>
>>>>>>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>>>>>>
>>>>>>>> Did you compile Mesos with SSL support? The default build doesn't
>>>>>>>> include SSL. And does the Docker container have stdout and stderr?
>>>>>>>>
>>>>>>>> On Thu, Oct 29, 2015 at 11:41 AM, Xiaodong Zhang <xdzhang@alauda.io>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> My scenario is as described in my previous email: masters and slaves are
>>>>>>>>> in different IaaS clouds. Now the slaves can register with the masters
>>>>>>>>> with SSL_ENABLED on.
>>>>>>>>>
>>>>>>>>> But I've run into another problem: slaves can't run containers. (The odd
>>>>>>>>> thing is that they can pull the image successfully but cannot run the
>>>>>>>>> container; `docker ps -a` lists nothing.)
>>>>>>>>>
>>>>>>>>> The logs look like this:
>>>>>>>>>
>>>>>>>>> I1029 03:29:45.967741  9288 docker.cpp:758] Starting container
>>>>>>>>> 'd4f4e236-0d0a-492c-86df-eef48a414e23' for task
>>>>>>>>> '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713'
>>>>>>>>> (and executor
>>>>>>>>> '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713')
>>>>>>>>> of framework '20151029-031549-1294671788-5050-4937-0000'
>>>>>>>>> I1029 03:29:48.044148  9292 docker.cpp:382] Checkpointing pid
>>>>>>>>> 12062 to
>>>>>>>>> '/tmp/mesos/meta/slaves/20151029-031549-1294671788-5050-4937-S0/frameworks/20151029-031549-1294671788-5050-4937-0000/executors/279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713/runs/d4f4e236-0d0a-492c-86df-eef48a414e23/pids/forked.pid'
>>>>>>>>> I1029 03:29:53.159361  9292 docker.cpp:1576] Executor for
>>>>>>>>> container 'd4f4e236-0d0a-492c-86df-eef48a414e23' has exited
>>>>>>>>> I1029 03:29:53.159572  9292 docker.cpp:1374] Destroying container
>>>>>>>>> 'd4f4e236-0d0a-492c-86df-eef48a414e23'
>>>>>>>>> I1029 03:29:53.159822  9292 docker.cpp:1478] Running docker stop
>>>>>>>>> on container 'd4f4e236-0d0a-492c-86df-eef48a414e23'
>>>>>>>>> I1029 03:29:53.160143  9292 slave.cpp:3399] Executor
>>>>>>>>> '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713'
>>>>>>>>> of framework 20151029-031549-1294671788-5050-4937-0000 terminated
>>>>>>>>> with signal Killed
>>>>>>>>> I1029 03:29:53.160884  9292 slave.cpp:2696] Handling status update
>>>>>>>>> TASK_FAILED (UUID: 27a2080a-8807-449e-9077-837ec45b4c51) for task
>>>>>>>>> 279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713
>>>>>>>>> of framework 20151029-031549-1294671788-5050-4937-0000 from
>>>>>>>>> @0.0.0.0:0
>>>>>>>>> W1029 03:29:53.161247  9288 docker.cpp:986] Ignoring updating
>>>>>>>>> unknown container: d4f4e236-0d0a-492c-86df-eef48a414e23
>>>>>>>>> I1029 03:29:53.161548  9293 status_update_manager.cpp:322]
>>>>>>>>> Received status update TASK_FAILED (UUID:
>>>>>>>>> 27a2080a-8807-449e-9077-837ec45b4c51) for task
>>>>>>>>> 279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713
>>>>>>>>> of framework 20151029-031549-1294671788-5050-4937-0000
>>>>>>>>>
>>>>>>>>> I run the master node with these environment variables:
>>>>>>>>>
>>>>>>>>> SSL_SUPPORT_DOWNGRADE=true
>>>>>>>>> SSL_ENABLED=true
>>>>>>>>> SSL_KEY_FILE=/home/ubuntu/xx.key
>>>>>>>>> SSL_CERT_FILE=/home/ubuntu/xx.pem
>>>>>>>>>
>>>>>>>>> And the slave node with:
>>>>>>>>>
>>>>>>>>> SSL_ENABLED=true
>>>>>>>>> SSL_KEY_FILE=/home/ubuntu/xx.key
>>>>>>>>> SSL_CERT_FILE=/home/ubuntu/xx.pem
>>>>>>>>> LIBPROCESS_ADVERTISE_IP=xxx.xxx.xxx.xxx
>>>>>>>>>
>>>>>>>>> When I remove all the SSL environment variables, the slaves work well.
>>>>>>>>>
>>>>>>>>> Did I miss something?
>>>>>>>>>
>>>>>>>>> Version:
>>>>>>>>>
>>>>>>>>> Mesos 0.24.1
>>>>>>>>> Marathon 0.9.2
>>>>>>>>>
>>>>>>>>> OS:
>>>>>>>>> ubuntu 14.04
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> From: Anindya Sinha <an...@gmail.com>
>>>>>>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>>>>>> Date: Wednesday, October 28, 2015 at 2:32 PM
>>>>>>>>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>>>>>> Subject: Re: How to tell master which ip to connect.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 27, 2015 at 7:43 PM, Xiaodong Zhang <xdzhang@alauda.io>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> It works! Thanks a lot.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> OK. So should we expose advertise_ip and advertise_port as command
>>>>>>>>> line options for mesos-slave as well (instead of using the environment
>>>>>>>>> variables)? I opened
>>>>>>>>> https://issues.apache.org/jira/browse/MESOS-3809.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Another question: do masters and slaves communicate with each other
>>>>>>>>>> securely? Is the data encrypted? I want to make sure that deploying masters
>>>>>>>>>> and slaves in different IaaS clouds is production-ready.
>>>>>>>>>>
>>>>>>>>>> From: haosdent <ha...@gmail.com>
>>>>>>>>>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>>>>>>>>>> Date: Wednesday, October 28, 2015 at 10:23 AM
>>>>>>>>>> To: user <us...@mesos.apache.org>
>>>>>>>>>> Subject: Re: How to tell master which ip to connect.
>>>>>>>>>>
>>>>>>>>>> Did you try `export LIBPROCESS_ADVERTISE_IP=xxx` and
>>>>>>>>>> `LIBPROCESS_ADVERTISE_PORT` when starting the slave?
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 28, 2015 at 10:16 AM, Xiaodong Zhang <
>>>>>>>>>> xdzhang@alauda.io> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi team,
>>>>>>>>>>>
>>>>>>>>>>> My scenario is as follows:
>>>>>>>>>>>
>>>>>>>>>>> My master nodes are deployed in AWS and my slaves are in Azure, so
>>>>>>>>>>> they communicate via public IPs.
>>>>>>>>>>> I ran into trouble when the slaves try to register with the master.
>>>>>>>>>>> The slaves can get the master's public IP address and can send a
>>>>>>>>>>> registration request, but they can only send their private IP to the
>>>>>>>>>>> master (because they don't know their public IP, they cannot bind a
>>>>>>>>>>> public IP via the --ip flag), so the masters can't connect to the slaves.
>>>>>>>>>>> How can a slave tell the master which IP to connect to? (I can't find
>>>>>>>>>>> any flag like --advertise_ip in the master.)
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best Regards,
>>>>>>>>>> Haosdent Huang
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Haosdent Huang
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Haosdent Huang
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Haosdent Huang
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>>> <5185_02_07.png><9D46724C-457C-4BE1-B0E4-F57B147F6DC8.png>
>>>> <742629F2-78E8-43F2-9015-F3D22720826B.png><5185_02_04.png>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>
>
> --
> Best Regards,
> Haosdent Huang
>



-- 
Best Regards,
Haosdent Huang

Re: Can't start docker container when SSL_ENABLED is on.

Posted by Xiaodong Zhang <xd...@alauda.io>.
Hi, haosdent.

1. The command-line argument approach does not work well.

Command:

/usr/sbin/mesos-slave --master=zk://xxx/mesos --log_dir=/var/log/mesos --containerizers=docker,mesos --credential=/etc/mesos-slave-auth --docker=/usr/bin/docker --executor_environment_variables={"SSL_KEY_FILE": "/home/ubuntu/cert/xxx.pem", "SSL_CERT_FILE": "/home/ubuntu/cert/xxx.key", "SSL_ENABLED": "true"} --executor_registration_timeout=60mins

(The slave was started without the SSL environment variables.)

Error info:

stderr:
Could not load cert file

Stdout:
--container="mesos-20151102-085117-3565115700-5050-25211-S1.2b784e8d-0bdd-4ffa-a7db-b6dcf35f0a03" --docker="/usr/bin/docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20151102-085117-3565115700-5050-25211-S1/frameworks/20151102-085117-3565115700-5050-25211-0000/executors/c310fa88-af8e-4fdd-92b6-eabf372bd187.85ff0237-8140-11e5-a875-021121f8fdf7/runs/2b784e8d-0bdd-4ffa-a7db-b6dcf35f0a03" --stop_timeout="0ns"


2. The patch works well (thanks again).

Both 1 and 2 read the same cert file.

The cert file looks like this:

-----BEGIN CERTIFICATE-----
Xxxxxx
-----END CERTIFICATE-----
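To check which kind of PEM object a file actually holds (for example, that the path given as SSL_CERT_FILE contains a certificate and the path given as SSL_KEY_FILE contains a private key), reading the first BEGIN line is enough. The sketch below uses hypothetical helpers, not anything shipped with Mesos, and assumes both files are PEM-encoded: a certificate starts with `-----BEGIN CERTIFICATE-----`, while a private key starts with a line ending in `PRIVATE KEY-----` (for example `-----BEGIN RSA PRIVATE KEY-----`).

```python
import re
from typing import List

# Captures the label of each PEM armor line, e.g. "CERTIFICATE" from
# "-----BEGIN CERTIFICATE-----" (assumes standard PEM encoding).
_BEGIN_RE = re.compile(r"^-----BEGIN ([A-Z0-9 ]+)-----\s*$", re.MULTILINE)


def pem_labels(text: str) -> List[str]:
    """Return the labels of all PEM blocks in `text`, in file order."""
    return _BEGIN_RE.findall(text)


def looks_like_cert(text: str) -> bool:
    """True if the first PEM block in `text` is a certificate."""
    labels = pem_labels(text)
    return bool(labels) and labels[0] == "CERTIFICATE"


def looks_like_key(text: str) -> bool:
    """True if the first PEM block is a private key (PKCS#8, RSA, EC, ...)."""
    labels = pem_labels(text)
    return bool(labels) and labels[0].endswith("PRIVATE KEY")
```

Call it on the contents of each configured file, e.g. `looks_like_cert(open(path).read())`; a False result for the certificate path, or for the key path with `looks_like_key`, suggests the two paths are mixed up or the file is not PEM.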

From: Xiaodong Zhang <xd...@alauda.io>
Date: Monday, November 2, 2015 at 11:22 AM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

Thanks, @haosdent.

I will test the command-line arguments first and then test the patch.

Have a nice day!~~

From: haosdent <ha...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Sunday, November 1, 2015 at 5:40 PM
To: user <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

@Xiaodong I created a ticket to track this, https://issues.apache.org/jira/browse/MESOS-3815, and posted a patch there. Feel free to review and test it. Thank you!

On Sun, Nov 1, 2015 at 4:54 PM, haosdent <ha...@gmail.com> wrote:
Hi @Xiaodong, I was able to reproduce your problem in my testing today. A quick workaround is to add the environment variables when you launch the slave.

```
./bin/mesos-slave.sh xxxx --containerizers=docker,mesos --executor_environment_variables='{"SSL_KEY_FILE": "/tmp/server.key", "SSL_CERT_FILE": "/tmp/ssl.chain.crt", "SSL_ENABLED": "true"}'
```

As shown above, this passes the SSL environment variables to the docker executor by specifying --executor_environment_variables at startup. So far it works well for me. I will submit a patch later to fix how the Docker containerizer passes environment variables; after that, you will be able to launch the slave without the --executor_environment_variables flag.
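Because the flag value is JSON containing spaces and double quotes, it has to reach mesos-slave as one single-quoted shell argument; an unquoted or unbalanced quote silently changes what the slave receives. A minimal sketch, assuming Python 3 is available where the start command is generated; `executor_env_flag` is a hypothetical helper and the paths are the ones from the example above.

```python
import json
import shlex


def executor_env_flag(env):
    """Return a shell-safe --executor_environment_variables=... argument.

    json.dumps emits valid JSON (double-quoted keys and values), and
    shlex.quote wraps it in single quotes so the shell passes it to
    mesos-slave as ONE argument instead of splitting it at the spaces.
    """
    return "--executor_environment_variables=" + shlex.quote(json.dumps(env))


# Values taken from the example command above.
ssl_env = {
    "SSL_KEY_FILE": "/tmp/server.key",
    "SSL_CERT_FILE": "/tmp/ssl.chain.crt",
    "SSL_ENABLED": "true",
}

# Other slave flags are omitted here.
print("./bin/mesos-slave.sh --containerizers=docker,mesos " + executor_env_flag(ssl_env))
```

Generating the argument this way also handles paths that contain a single quote, which shlex.quote escapes correctly and hand-written quoting easily gets wrong.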

On Sat, Oct 31, 2015 at 2:56 PM, Tim Chen <ti...@mesosphere.io> wrote:
Hi Xiaodong,

If you follow the Review Board request, you'll see that the fix is not correct; I believe Jojy will be posting a new patch.

Tim

On Fri, Oct 30, 2015 at 6:58 PM, Xiaodong Zhang <xd...@alauda.io> wrote:
It is still not working!

It only works if I remove SSL_ENABLED from the environment before starting the slave.

I applied the patch to version 0.24.1 and rebuilt it with `--enable-libevent --enable-ssl`.

From: Xiaodong Zhang <xd...@alauda.io>
Date: Saturday, October 31, 2015 at 7:45 AM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

Thanks Jojy.

I will apply this patch to version 0.24.1 and rebuild it. I will let you know whether it works after I finish testing.

From: Jojy Varghese <jo...@mesosphere.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Saturday, October 31, 2015 at 12:45 AM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

Thanks Xiaodong.

Based on the hypothesis that the container process launched with SSL_ENABLED in its environment is the problem, I have created a patch: https://reviews.apache.org/r/39818/. This might be a quick and dirty way to test the hypothesis. Would it be possible for you to test again after applying the patch?

-Jojy



On Oct 30, 2015, at 8:29 AM, Xiaodong Zhang <xd...@alauda.io> wrote:

Thanks @Jojy



Flags at startup: --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --credential="/etc/mesos-slave-auth" --default_role="*" --disk_watch_interval="1mins" --docker="/usr/bin/docker" --docker_kill_orphans="true" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --enforce_container_disk_quota="false" --executor_registration_timeout="1hrs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://172.31.43.77:2181,172.31.44.2:2181,172.31.36.91:2181/mesos" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --version="false" --work_dir="/tmp/mesos"

From: Jojy Varghese <jo...@mesosphere.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Friday, October 30, 2015 at 11:17 PM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

Hi Xiaodong
This might be because the executor inherits the slave's SSL environment variables and thus expects an SSL key password to launch. Could you please send the part of the slave log that says "Flags at startup" so that we have more information?

thanks
Jojy
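The hypothesis rests on ordinary process behavior: a child process receives a copy of its parent's environment, SSL_* variables included, unless the launcher passes an explicit environment. The sketch below illustrates only that generic mechanism with Python's subprocess module; it is not the Mesos executor-launch code or the patch above, and `SSL_VARS`, `child_sees` and `without_ssl` are hypothetical names chosen for the example.

```python
import os
import subprocess
import sys

# Hypothetical list: the libprocess SSL variables mentioned in this thread.
SSL_VARS = ("SSL_ENABLED", "SSL_KEY_FILE", "SSL_CERT_FILE", "SSL_SUPPORT_DOWNGRADE")


def child_sees(var, env=None):
    """Start a child Python process and return the value it reads for `var`.

    With env=None the child inherits this process's environment unchanged.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import os; print(os.environ.get({var!r}, ''))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def without_ssl(env):
    """Return a copy of `env` with the SSL variables removed."""
    return {k: v for k, v in env.items() if k not in SSL_VARS}


os.environ["SSL_ENABLED"] = "true"  # simulate the slave's environment
print(child_sees("SSL_ENABLED"))                           # inherited: true
print(child_sees("SSL_ENABLED", without_ssl(os.environ)))  # stripped: empty line
```

This mirrors the two experiments in the thread: removing SSL_ENABLED before the launch, or controlling exactly which variables the child receives.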


On Oct 29, 2015, at 8:55 PM, Xiaodong Zhang <xd...@alauda.io> wrote:

Thanks a lot, @haosdent!

From: haosdent <ha...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Friday, October 30, 2015 at 11:45 AM
To: user <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

Hi @Xiaodong, I'm interested in your problem, but I haven't had enough time recently to try to reproduce it. I think I can dig into it this Sunday and give you feedback.

On Fri, Oct 30, 2015 at 11:30 AM, Xiaodong Zhang <xd...@alauda.io> wrote:
Does anybody know about this?

From: Xiaodong Zhang <xd...@alauda.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Thursday, October 29, 2015 at 7:38 PM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

I think it is easy to reproduce this error.

Start the master with these environment variables:

SSL_SUPPORT_DOWNGRADE
SSL_ENABLED
SSL_KEY_FILE
SSL_CERT_FILE

Start the slave with these environment variables:

SSL_ENABLED
SSL_KEY_FILE
SSL_CERT_FILE
LIBPROCESS_ADVERTISE_IP


Then run a Docker task via Marathon.

From: Xiaodong Zhang <xd...@alauda.io>
Date: Thursday, October 29, 2015 at 3:09 PM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

So Mesos tasks work well now, but Docker tasks don't.

From: Xiaodong Zhang <xd...@alauda.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Thursday, October 29, 2015 at 2:08 PM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

I ran a task via Marathon:


{
    "id": "basic-0",
    "cmd": "while [ true ] ; do echo 'Hello Marathon' ; sleep 5 ; done",
    "cpus": 0.1,
    "mem": 10.0,
    "instances": 1
}

It works well.

<742629F2-78E8-43F2-9015-F3D22720826B.png>

As I mentioned, Docker tasks can pull the image but can't run.

My Docker version is 1.5.0.
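An app definition like the one above is normally submitted to Marathon's REST API with a POST to `/v2/apps`. A minimal sketch using only the Python standard library; the Marathon URL is a hypothetical placeholder, `build_create_app_request` is an illustrative helper, and the request is built but not sent.

```python
import json
import urllib.request

# Hypothetical Marathon address; replace with your own host and port.
MARATHON_URL = "http://marathon.example.com:8080"

# The app definition from the message above.
app = {
    "id": "basic-0",
    "cmd": "while [ true ] ; do echo 'Hello Marathon' ; sleep 5 ; done",
    "cpus": 0.1,
    "mem": 10.0,
    "instances": 1,
}


def build_create_app_request(base_url, app_def):
    """Build (but do not send) a POST /v2/apps request for Marathon."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app_def).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_create_app_request(MARATHON_URL, app)
# Sending it is a separate step: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to inspect exactly what would be posted before pointing it at a real cluster.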

From: Tim Chen <ti...@mesosphere.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Thursday, October 29, 2015 at 1:48 PM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

Does running a task without a Docker container (i.e., with the Mesos containerizer) work with SSL in your environment?

Tim

On Wed, Oct 28, 2015 at 10:19 PM, Xiaodong Zhang <xd...@alauda.io> wrote:
Thanks a lot. I found the log files on the slave.

For one of the tasks:

Stdout:

--container="mesos-20151029-043755-3549436724-5050-5674-S0.e2c2580f-8082-4f17-b0cc-4e32e040d444" --docker="/home/ubuntu/luna/bin/docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444" --stop_timeout="0ns"
--container="mesos-20151029-043755-3549436724-5050-5674-S0.e2c2580f-8082-4f17-b0cc-4e32e040d444" --docker="/home/ubuntu/luna/bin/docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444" --stop_timeout="0ns"
Shutting down

Stderr:

I1029 05:14:06.529364 27862 fetcher.cpp:414] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20151029-043755-3549436724-5050-5674-S0","items":[{"action":"BYPASS_CACHE","uri":{"extract":false,"value":"file:\/\/\/etc\/.dockercfg"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/20151029-043755-3549436724-5050-5674-S0\/frameworks\/20151029-043755-3549436724-5050-5674-0000\/executors\/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f\/runs\/e2c2580f-8082-4f17-b0cc-4e32e040d444"}
I1029 05:14:06.530562 27862 fetcher.cpp:369] Fetching URI 'file:///etc/.dockercfg'
I1029 05:14:06.530580 27862 fetcher.cpp:243] Fetching directly into the sandbox directory
I1029 05:14:06.530594 27862 fetcher.cpp:180] Fetching URI 'file:///etc/.dockercfg'
I1029 05:14:06.530609 27862 fetcher.cpp:160] Copying resource with command:cp '/etc/.dockercfg' '/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444/.dockercfg'
I1029 05:14:06.532165 27862 fetcher.cpp:446] Fetched 'file:///etc/.dockercfg' to '/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444/.dockercfg'
I1029 05:14:07.782054 27955 exec.cpp:133] Version: 0.24.1
I1029 05:14:07.785039 27963 exec.cpp:462] Slave exited ... shutting down
E1029 05:14:07.785158 27964 socket.hpp:174] Shutdown failed on fd=7: Transport endpoint is not connected [107]

From: haosdent <ha...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Thursday, October 29, 2015 at 1:13 PM
To: user <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

<5185_02_04.png>
<5185_02_07.png>

I captured how I find task logs in my local web UI. Could you find the stderr and stdout of your tasks by following the screenshots above?

On Thu, Oct 29, 2015 at 1:07 PM, Xiaodong Zhang <xd...@alauda.io> wrote:
I didn't see any useful info.

In the Mesos slave log, there is this line:
I1029 03:29:53.160143  9292 slave.cpp:3399] Executor '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713' of framework 20151029-031549-1294671788-5050-4937-0000 terminated with signal Killed

For comparison, a normal log shows:

I1014 15:22:21.276007 23163 slave.cpp:3326] Executor 'ffc08dce-997f-41f7-9b03-57c1b4bc1f85.47ed02aa-7285-11e5-80d7-000d3a8033de' of framework 20150814-115157-1677721866-5050-6185-0000 exited with status 0

Is this helpful?
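The difference between the two lines above (an executor killed by a signal versus one that exited with status 0) can be pulled out of a long slave log mechanically. A minimal sketch; the regular expression is an assumption derived from these two sample lines only, and other Mesos versions may word the message differently.

```python
import re

# Matches the two slave.cpp executor-exit messages quoted above. The wording
# is assumed from these samples only and may differ in other Mesos versions.
EXIT_RE = re.compile(
    r"Executor '(?P<executor>[^']+)' of framework (?P<framework>\S+) "
    r"(?:terminated with signal (?P<signal>\w+)|exited with status (?P<status>\d+))"
)


def executor_exits(lines):
    """Yield (executor_id, outcome) for every executor-exit line in `lines`."""
    for line in lines:
        match = EXIT_RE.search(line)
        if match is None:
            continue
        if match.group("signal"):
            yield match.group("executor"), "signal " + match.group("signal")
        else:
            yield match.group("executor"), "status " + match.group("status")
```

Because an open file yields its lines, the same function can be run over the slave's INFO log under --log_dir to list every executor that was killed rather than exiting cleanly.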

From: Xiaodong Zhang <xd...@alauda.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Thursday, October 29, 2015 at 12:59 PM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

<9D46724C-457C-4BE1-B0E4-F57B147F6DC8.png>

The web UI has a LOG link; clicking it shows this:

I1029 04:44:32.293445  5697 http.cpp:321] HTTP GET for /master/state.json from 114.113.20.135:55682 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36'
I1029 04:44:34.533504  5704 master.cpp:4613] Sending 1 offers to framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
I1029 04:44:34.539579  5702 master.cpp:2739] Processing ACCEPT call for offers: [ 20151029-043755-3549436724-5050-5674-O2 ] on slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (ec2-50-112-136-148.us-west-2.compute.amazonaws.com) for framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
I1029 04:44:34.539710  5702 hierarchical.hpp:814] Recovered cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000] (total: cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: ) on slave 20151029-043755-3549436724-5050-5674-S0 from framework 20151029-043755-3549436724-5050-5674-0000
I1029 04:44:37.360901  5703 master.cpp:4294] Performing implicit task state reconciliation for framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
I1029 04:44:40.539989  5704 master.cpp:4613] Sending 1 offers to framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
I1029 04:44:40.610321  5702 master.cpp:2739] Processing ACCEPT call for offers: [ 20151029-043755-3549436724-5050-5674-O3 ] on slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (ec2-50-112-136-148.us-west-2.compute.amazonaws.com) for framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
I1029 04:44:40.610846  5702 master.hpp:170] Adding task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] on slave 20151029-043755-3549436724-5050-5674-S0 (ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
I1029 04:44:40.610911  5702 master.cpp:3069] Launching task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f of framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373 with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] on slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
I1029 04:44:40.611095  5702 hierarchical.hpp:814] Recovered cpus(*):0.9375; mem(*):743; disk(*):3962; ports(*):[31000-31863, 31865-32000] (total: cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: cpus(*):0.0625; mem(*):256; ports(*):[31864-31864]) on slave 20151029-043755-3549436724-5050-5674-S0 from framework 20151029-043755-3549436724-5050-5674-0000
I1029 04:44:43.324970  5698 http.cpp:321] HTTP GET for /master/state.json from 114.113.20.135:55682 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36'
I1029 04:44:46.546671  5703 master.cpp:4613] Sending 1 offers to framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
I1029 04:44:46.557266  5699 master.cpp:2739] Processing ACCEPT call for offers: [ 20151029-043755-3549436724-5050-5674-O4 ] on slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (ec2-50-112-136-148.us-west-2.compute.amazonaws.com) for framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
I1029 04:44:46.557394  5699 hierarchical.hpp:814] Recovered cpus(*):0.9375; mem(*):743; disk(*):3962; ports(*):[31000-31863, 31865-32000] (total: cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: cpus(*):0.0625; mem(*):256; ports(*):[31864-31864]) on slave 20151029-043755-3549436724-5050-5674-S0 from framework 20151029-043755-3549436724-5050-5674-0000
I1029 04:44:47.267562  5700 master.cpp:4069] Status update TASK_FAILED (UUID: 0ea607fc-bf24-4bda-b107-55a54aba31cf) for task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f of framework 20151029-043755-3549436724-5050-5674-0000 from slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
I1029 04:44:47.267645  5700 master.cpp:4108] Forwarding status update TASK_FAILED (UUID: 0ea607fc-bf24-4bda-b107-55a54aba31cf) for task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f of framework 20151029-043755-3549436724-5050-5674-0000
I1029 04:44:47.267774  5700 master.cpp:5576] Updating the latest state of task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f of framework 20151029-043755-3549436724-5050-5674-0000 to TASK_FAILED
I1029 04:44:47.267907  5700 hierarchical.hpp:814] Recovered cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] (total: cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: ) on slave 20151029-043755-3549436724-5050-5674-S0 from framework 20151029-043755-3549436724-5050-5674-0000
I1029 04:44:47.289356  5698 master.cpp:5644] Removing task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] of framework 20151029-043755-3549436724-5050-5674-0000 on slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
I1029 04:44:47.289459  5698 master.cpp:3398] Processing ACKNOWLEDGE call 0ea607fc-bf24-4bda-b107-55a54aba31cf for task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f of framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373 on slave 20151029-043755-3549436724-5050-5674-S0
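In a master log like the one above, the decisive entry is the master.cpp line "Updating the latest state of task ... to TASK_FAILED". Here is a small POSIX shell sketch that reports the last state the master recorded for each task. It assumes that exact message format, and the function name `latest_task_states` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print "<task-id> <STATE>" for every task, keeping only
# the last "Updating the latest state of task ... to <STATE>" entry seen.
# Reads master log lines on stdin; output is sorted by task ID.
latest_task_states() {
  sed -n 's/.*Updating the latest state of task \([^ ]*\) of framework [^ ]* to \(TASK_[A-Z_]*\)$/\1 \2/p' |
    awk '{ state[$1] = $2 } END { for (t in state) print t, state[t] }' |
    LC_ALL=C sort
}
```

Later lines overwrite earlier ones in the `awk` array, so a task that went from TASK_RUNNING to TASK_FAILED is reported only as TASK_FAILED.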



From: haosdent <ha...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Thursday, October 29, 2015 at 12:02 PM
To: user <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

Oh, I mean your task logs. You can get them from the Mesos web UI.
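The same task logs can also be read directly on the slave, because each executor's stdout and stderr live in its sandbox directory. The following POSIX shell sketch assumes the sandbox layout seen in the paths quoted in this thread (`<work_dir>/slaves/<slave>/frameworks/<framework>/executors/<executor>/runs/<run>/`) and defaults to the `/tmp/mesos` work directory used here. The function name `list_sandbox_logs` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: list stdout/stderr files of all executor runs under a
# slave work directory (default /tmp/mesos, the --work_dir in this thread).
# find does not follow symlinks by default, so a symlinked run directory
# (such as a "latest" link, if present) is not listed a second time.
list_sandbox_logs() {
  find "${1:-/tmp/mesos}/slaves" -path '*/executors/*/runs/*' \
    \( -name stdout -o -name stderr \) -type f 2>/dev/null | LC_ALL=C sort
}
```

Running `list_sandbox_logs` on the slave and then viewing the newest `stderr` should show the same "Slave exited ... shutting down" lines that the web UI displays.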

On Thu, Oct 29, 2015 at 11:52 AM, Xiaodong Zhang <xd...@alauda.io> wrote:
Thanks for your reply.

Yes, I built Mesos with `--enable-libevent --enable-ssl`. If I don't provide the key and PEM file when starting the slave, it fails to register (that means SSL is working, right?).

As I said, the odd thing is that the container never runs (`docker ps -a` shows nothing), so it can't have any stdout or stderr.

From: haosdent <ha...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Thursday, October 29, 2015 at 11:47 AM
To: user <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

Did you compile Mesos with SSL support? The default build doesn't include SSL. And does the Docker container have stdout and stderr?
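One way to answer the first question for an existing build is to look at how `configure` was invoked. The following POSIX shell sketch assumes an autotools build directory containing the standard autoconf `config.log`, whose header records the original command line as a line such as `  $ ../configure ...`. It checks for the two flags mentioned in this thread. The function name `configured_with_ssl` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: succeed (status 0) if the build directory given as $1
# was configured with both --enable-libevent and --enable-ssl.
# Returns 1 if a flag is missing, 2 if config.log cannot be read.
# Flags written in another form (e.g. --enable-ssl=yes) are not recognized.
configured_with_ssl() {
  _log="$1/config.log"
  [ -r "$_log" ] || return 2
  # autoconf records the invocation as a line like "  $ ../configure ...".
  _cmd=$(grep -E '^[[:space:]]*\$ .*configure' "$_log" | head -n 1)
  [ -n "$_cmd" ] || return 1
  for _flag in --enable-libevent --enable-ssl; do
    case " $_cmd " in
      *" $_flag "*) ;;
      *) return 1 ;;
    esac
  done
  return 0
}
```

For example, `configured_with_ssl build && echo "SSL build"` prints its message only when both flags were passed.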

On Thu, Oct 29, 2015 at 11:41 AM, Xiaodong Zhang <xd...@alauda.io> wrote:
My scenario is the one described in my previous email: the masters and slaves are in different IaaS providers. Now the slaves can register with the masters when SSL_ENABLED is on.

But I've run into another problem: the slaves can't run containers. The odd thing is that they can pull the image successfully; they just can't run the container, and `docker ps -a` lists nothing.

The logs look like this:

I1029 03:29:45.967741  9288 docker.cpp:758] Starting container 'd4f4e236-0d0a-492c-86df-eef48a414e23' for task '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713' (and executor '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713') of framework '20151029-031549-1294671788-5050-4937-0000'
I1029 03:29:48.044148  9292 docker.cpp:382] Checkpointing pid 12062 to '/tmp/mesos/meta/slaves/20151029-031549-1294671788-5050-4937-S0/frameworks/20151029-031549-1294671788-5050-4937-0000/executors/279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713/runs/d4f4e236-0d0a-492c-86df-eef48a414e23/pids/forked.pid'
I1029 03:29:53.159361  9292 docker.cpp:1576] Executor for container 'd4f4e236-0d0a-492c-86df-eef48a414e23' has exited
I1029 03:29:53.159572  9292 docker.cpp:1374] Destroying container 'd4f4e236-0d0a-492c-86df-eef48a414e23'
I1029 03:29:53.159822  9292 docker.cpp:1478] Running docker stop on container 'd4f4e236-0d0a-492c-86df-eef48a414e23'
I1029 03:29:53.160143  9292 slave.cpp:3399] Executor '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713' of framework 20151029-031549-1294671788-5050-4937-0000 terminated with signal Killed
I1029 03:29:53.160884  9292 slave.cpp:2696] Handling status update TASK_FAILED (UUID: 27a2080a-8807-449e-9077-837ec45b4c51) for task 279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713 of framework 20151029-031549-1294671788-5050-4937-0000 from @0.0.0.0:0
W1029 03:29:53.161247  9288 docker.cpp:986] Ignoring updating unknown container: d4f4e236-0d0a-492c-86df-eef48a414e23
I1029 03:29:53.161548  9293 status_update_manager.cpp:322] Received status update TASK_FAILED (UUID: 27a2080a-8807-449e-9077-837ec45b4c51) for task 279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713 of framework 20151029-031549-1294671788-5050-4937-0000

I run the master node with these environment variables:

SSL_SUPPORT_DOWNGRADE=true
SSL_ENABLED=true
SSL_KEY_FILE=/home/ubuntu/xx.key
SSL_CERT_FILE=/home/ubuntu/xx.pem

And the slave node with these:

SSL_ENABLED=true
SSL_KEY_FILE=/home/ubuntu/xx.key
SSL_CERT_FILE=/home/ubuntu/xx.pem
LIBPROCESS_ADVERTISE_IP=xxx.xxx.xxx.xxx

When I remove all the SSL environment variables, the slaves work well.

Did I miss something?

Version:

Mesos 0.24.1
Marathon 0.9.2

OS:
Ubuntu 14.04
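Because the failure only appears when the SSL variables are set, it can help to check them before starting a master or slave. This POSIX shell sketch takes only the variable names from this thread. The checks themselves are an illustrative assumption, not Mesos behaviour, and the function name `check_ssl_env` is hypothetical.

```sh
#!/bin/sh
# Hypothetical pre-flight check for the SSL environment described above.
# Only the literal value "true" (as used in this thread) counts as enabled;
# libprocess itself may accept other spellings.
check_ssl_env() {
  if [ "${SSL_ENABLED:-false}" != "true" ]; then
    echo "SSL disabled"
    return 0
  fi
  # With SSL on, both the key and the certificate must be readable files.
  if [ ! -r "${SSL_KEY_FILE:-}" ]; then
    echo "SSL_KEY_FILE missing or unreadable: ${SSL_KEY_FILE:-<unset>}" >&2
    return 1
  fi
  if [ ! -r "${SSL_CERT_FILE:-}" ]; then
    echo "SSL_CERT_FILE missing or unreadable: ${SSL_CERT_FILE:-<unset>}" >&2
    return 1
  fi
  echo "SSL enabled (key: $SSL_KEY_FILE, cert: $SSL_CERT_FILE)"
}
```

Running `check_ssl_env` in the same shell that launches the daemon confirms which SSL settings that process will inherit. The Docker executor problem in this thread is separate: it concerns which of these variables reach the executor.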



From: Anindya Sinha <an...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Wednesday, October 28, 2015 at 2:32 PM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: How to tell master which ip to connect.



On Tue, Oct 27, 2015 at 7:43 PM, Xiaodong Zhang <xd...@alauda.io> wrote:
It works! Thanks a lot.

Ok. So we should expose advertise_ip and advertise_port as command line options for mesos-slave as well (instead of using the environment variables)? Opened https://issues.apache.org/jira/browse/MESOS-3809.


Another question: do the masters and slaves communicate with each other securely? Is the data encrypted? I want to make sure that deploying masters and slaves in different IaaS providers is production-ready.

From: haosdent <ha...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Wednesday, October 28, 2015 at 10:23 AM
To: user <us...@mesos.apache.org>
Subject: Re: How to tell master which ip to connect.

Did you try `export LIBPROCESS_ADVERTISE_IP=xxx` and `LIBPROCESS_ADVERTISE_PORT` when starting the slave?
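The suggestion above amounts to exporting two variables before launching the slave. A minimal POSIX shell sketch follows. The public IP must be supplied by you, for example from your cloud provider's console, and the port defaults to 5051, the slave's `--port` in this thread. The function name `set_advertise_env` and its validation rules are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: validate a public IPv4 address and optional port, then
# export the libprocess advertise variables for processes started later
# from this shell. Uses underscore-prefixed globals for scratch values.
set_advertise_env() {
  _ip=$1
  _port=${2:-5051}
  case $_port in
    ''|*[!0-9]*) echo "invalid port: $_port" >&2; return 1 ;;
  esac
  if [ "${#_port}" -gt 5 ] || [ "$_port" -lt 1 ] || [ "$_port" -gt 65535 ]; then
    echo "port out of range: $_port" >&2
    return 1
  fi
  # Only digits and dots, with no empty component at either end or inside.
  case $_ip in
    ''|*[!0-9.]*|*..*|.*|*.) echo "not an IPv4 address: $_ip" >&2; return 1 ;;
  esac
  # Split on dots into this function's positional parameters.
  _old_ifs=$IFS
  IFS=.
  set -- $_ip
  IFS=$_old_ifs
  if [ "$#" -ne 4 ]; then
    echo "not an IPv4 address: $_ip" >&2
    return 1
  fi
  for _octet in "$@"; do
    if [ "${#_octet}" -gt 3 ] || [ "$_octet" -gt 255 ]; then
      echo "octet out of range: $_octet" >&2
      return 1
    fi
  done
  export LIBPROCESS_ADVERTISE_IP="$_ip"
  export LIBPROCESS_ADVERTISE_PORT="$_port"
}
```

After `set_advertise_env 203.0.113.10` (a documentation address; substitute the slave's real public IP), start the slave from the same shell so that it inherits both variables.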

On Wed, Oct 28, 2015 at 10:16 AM, Xiaodong Zhang <xd...@alauda.io> wrote:
Hi team,

My scenario is as follows:

My master nodes are deployed in AWS and my slaves are in Azure, so they communicate via public IPs.
I run into trouble when the slaves try to register with the master.
The slaves can reach the master's public IP address and send a registration request, but they can only send their private IP to the master: they don't know their public IP, so they can't bind to it with the --ip flag. As a result, the masters can't connect back to the slaves. How can a slave tell the master which IP to connect to? (I can't find any flag like --advertise_ip in the master.)



--
Best Regards,
Haosdent Huang




--
Best Regards,
Haosdent Huang



--
Best Regards,
Haosdent Huang



--
Best Regards,
Haosdent Huang




--
Best Regards,
Haosdent Huang
<5185_02_07.png><9D46724C-457C-4BE1-B0E4-F57B147F6DC8.png><742629F2-78E8-43F2-9015-F3D22720826B.png><5185_02_04.png>






--
Best Regards,
Haosdent Huang



--
Best Regards,
Haosdent Huang

Re: Can't start docker container when SSL_ENABLED is on.

Posted by Xiaodong Zhang <xd...@alauda.io>.
Thanks @haosdent,

I will test the command-line arguments first and then test the patch.
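The command-line workaround being tested here passes the slave's SSL variables to the Docker executor through `--executor_environment_variables`, as haosdent suggested. Instead of typing the JSON by hand, it can be built from the slave's own environment. This POSIX shell sketch assumes the values contain no double quotes or backslashes (file paths and `true`), so it does no JSON escaping. It also uses `printenv`, which is widely available but not part of POSIX. The function name `ssl_executor_env_json` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: emit a JSON object with whichever of SSL_ENABLED,
# SSL_KEY_FILE and SSL_CERT_FILE are set in the environment, in the shape
# accepted by --executor_environment_variables. No escaping is performed.
ssl_executor_env_json() {
  _json=""
  for _name in SSL_ENABLED SSL_KEY_FILE SSL_CERT_FILE; do
    # printenv exits non-zero when the variable is not in the environment.
    _value=$(printenv "$_name") || continue
    _json="${_json:+$_json, }\"$_name\": \"$_value\""
  done
  printf '{%s}\n' "$_json"
}
```

Usage, keeping the elided arguments from haosdent's command: `./bin/mesos-slave.sh xxxx --containerizers=docker,mesos --executor_environment_variables="$(ssl_executor_env_json)"`. According to haosdent, this should no longer be needed once the Docker environment-passing fix lands.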

Have a nice day!~~

From: haosdent <ha...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Sunday, November 1, 2015 at 5:40 PM
To: user <us...@mesos.apache.org>
Subject: Re: Can't start docker container when SSL_ENABLED is on.

@Xiaodong I created a ticket to track this, https://issues.apache.org/jira/browse/MESOS-3815, and posted a patch there. Feel free to help review and test it. Thank you!

On Sun, Nov 1, 2015 at 4:54 PM, haosdent <ha...@gmail.com>> wrote:
Hi, @Xiaodong I could reproduce your problem in my testing today. A quickly workaround is adding environment variables when you launch slave.

```
./bin/mesos-slave.sh xxxx --containerizers=docker,mesos --executor_environment_variables='{"SSL_KEY_FILE": "/tmp/server.key", "SSL_CERT_FILE": "/tmp/ssl.chain.crt", "SSL_ENABLED": "true"}''
```

As you see above, pass the ssl env to docker-executor through specifying --executor_environment_variables when starting. So far it works well for me. Anyway I would submit a patch later to fix the docker environment variables passing. After that, you could launch slave without executor_environment_variables flag.

On Sat, Oct 31, 2015 at 2:56 PM, Tim Chen <ti...@mesosphere.io>> wrote:
Hi Xiaodong,

If you follow the reviewboard you'll see that the fix is not correct, I believe Jojy will be posting a new patch.

Tim

On Fri, Oct 30, 2015 at 6:58 PM, Xiaodong Zhang <xd...@alauda.io>> wrote:
it is still not working!

Only if I remove SSL_ENABLED from envs before I start the slave it works well.

I applied the patch in version 0.24.1. And rebuild it with `--enable-libevent --enable-ssl` 。

发件人: Xiaodong Zhang <xd...@alauda.io>>
日期: 2015年10月31日 星期六 上午7:45

至: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
主题: Re: Can't start docker container when SSL_ENABLED is on.

Thanks Jojy.

I will patch this in version 0.24.1, and rebuild it. I will let you know if it work well after I finish testing.

发件人: Jojy Varghese <jo...@mesosphere.io>>
答复: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
日期: 2015年10月31日 星期六 上午12:45
至: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
主题: Re: Can't start docker container when SSL_ENABLED is on.

Thanks Xiaodong.

Based on the hypothesis that the container process launched with SSL_ENABLED in environment is the problem, I have created a patch https://reviews.apache.org/r/39818/.  This might be a quick and dirty was to test the hypothesis. Would it be possible for you to test again after applying the patch?

-Jojy



On Oct 30, 2015, at 8:29 AM, Xiaodong Zhang <xd...@alauda.io>> wrote:

Thanks @Jojy



Flags at startup: --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --credential="/etc/mesos-slave-auth" --default_role="*" --disk_watch_interval="1mins" --docker="/usr/bin/docker" --docker_kill_orphans="true" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --enforce_container_disk_quota="false" --executor_registration_timeout="1hrs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://172.31.43.77:2181,172.31.44.2:2181,172.31.36.91:2181/mesos" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --resource_monitoring_interval="1secs" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --version="false" --work_dir="/tmp/mesos"

发件人: Jojy Varghese <jo...@mesosphere.io>>
答复: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
日期: 2015年10月30日 星期五 下午11:17
至: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
主题: Re: Can't start docker container when SSL_ENABLED is on.

Hi Xiaodong
  This might be because the executor inherits the SSL environment variables of slave and thus expects SSL key password to launch. Could you please add the part of the slave logs that says "Flags at startup” so that we can have more information?

thanks
Jojy


On Oct 29, 2015, at 8:55 PM, Xiaodong Zhang <xd...@alauda.io>> wrote:

Thanks a lot !~ @haosent

发件人: haosdent <ha...@gmail.com>>
答复: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
日期: 2015年10月30日 星期五 上午11:45
至: user <us...@mesos.apache.org>>
主题: Re: Can't start docker container when SSL_ENABLED is on.

Hi, @Xiaodong I interested in your problem. But recently days I don't have enough time to try reproduce your problem. I think I could try to dig your problem at this Sunday and give you feedback.

On Fri, Oct 30, 2015 at 11:30 AM, Xiaodong Zhang <xd...@alauda.io>> wrote:
Anybody know about this?

发件人: Xiaodong Zhang <xd...@alauda.io>>
答复: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
日期: 2015年10月29日 星期四 下午7:38

至: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
主题: Re: Can't start docker container when SSL_ENABLED is on.

I think it is easy to reproduce this error.

Start master with env:

SSL_SUPPORT_DOWNGRADE
SSL_ENABLED
SSL_KEY_FILE
SSL_CERT_FILE

Start slave with env:

SSL_ENABLED
SSL_KEY_FILE
SSL_CERT_FILE
LIBPROCESS_ADVERTISE_IP


Then run a docker task via marathon.

发件人: Xiaodong Zhang <xd...@alauda.io>>
日期: 2015年10月29日 星期四 下午3:09
至: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
主题: Re: Can't start docker container when SSL_ENABLED is on.

So now, mesos task work well but docker task doesn’t.

发件人: Xiaodong Zhang <xd...@alauda.io>>
答复: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
日期: 2015年10月29日 星期四 下午2:08
至: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
主题: Re: Can't start docker container when SSL_ENABLED is on.

I run a task by marathon:


{
    "id": "basic-0",
    "cmd": "while [ true ] ; do echo 'Hello Marathon' ; sleep 5 ; done",
    "cpus": 0.1,
    "mem": 10.0,
    "instances": 1}

It works well.

<742629F2-78E8-43F2-9015-F3D22720826B.png>

Docker task can pull image but can’t run as I mentioned.

My docker version 1.5.0

发件人: Tim Chen <ti...@mesosphere.io>>
答复: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
日期: 2015年10月29日 星期四 下午1:48
至: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
主题: Re: Can't start docker container when SSL_ENABLED is on.

Does running a task without docker container (Mesos containerizer) works with ssl in your environment?

Tim

On Wed, Oct 28, 2015 at 10:19 PM, Xiaodong Zhang <xd...@alauda.io>> wrote:
Thanks a lot. I find the log file in slave.

One of the task:

Stdout:

--container="mesos-20151029-043755-3549436724-5050-5674-S0.e2c2580f-8082-4f17-b0cc-4e32e040d444" --docker="/home/ubuntu/luna/bin/docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444" --stop_timeout="0ns"
--container="mesos-20151029-043755-3549436724-5050-5674-S0.e2c2580f-8082-4f17-b0cc-4e32e040d444" --docker="/home/ubuntu/luna/bin/docker" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444" --stop_timeout="0ns"
Shutting down

Stderr:

I1029 05:14:06.529364 27862 fetcher.cpp:414] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20151029-043755-3549436724-5050-5674-S0","items":[{"action":"BYPASS_CACHE","uri":{"extract":false,"value":"file:\/\/\/etc\/.dockercfg"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/20151029-043755-3549436724-5050-5674-S0\/frameworks\/20151029-043755-3549436724-5050-5674-0000\/executors\/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f\/runs\/e2c2580f-8082-4f17-b0cc-4e32e040d444"}
I1029 05:14:06.530562 27862 fetcher.cpp:369] Fetching URI 'file:///etc/.dockercfg'
I1029 05:14:06.530580 27862 fetcher.cpp:243] Fetching directly into the sandbox directory
I1029 05:14:06.530594 27862 fetcher.cpp:180] Fetching URI 'file:///etc/.dockercfg'
I1029 05:14:06.530609 27862 fetcher.cpp:160] Copying resource with command:cp '/etc/.dockercfg' '/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444/.dockercfg'
I1029 05:14:06.532165 27862 fetcher.cpp:446] Fetched 'file:///etc/.dockercfg' to '/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444/.dockercfg'
I1029 05:14:07.782054 27955 exec.cpp:133] Version: 0.24.1
I1029 05:14:07.785039 27963 exec.cpp:462] Slave exited ... shutting down
E1029 05:14:07.785158 27964 socket.hpp:174] Shutdown failed on fd=7: Transport endpoint is not connected [107]

发件人: haosdent <ha...@gmail.com>>
答复: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
日期: 2015年10月29日 星期四 下午1:13

至: user <us...@mesos.apache.org>>
主题: Re: Can't start docker container when SSL_ENABLED is on.

<5185_02_04.png>
<5185_02_07.png>
​
I capture how I find tasks log in my local webui, could you find the stderr and stdout for your tasks according above screenshots?
​

On Thu, Oct 29, 2015 at 1:07 PM, Xiaodong Zhang <xd...@alauda.io>> wrote:
I didn’t see some useful info.

In mesos slave log, there is a line :
I1029 03:29:53.160143  9292 slave.cpp:3399] Executor '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713' of framework 20151029-031549-1294671788-5050-4937-0000 terminated with signal Killed

I check the normal log, it shows:

I1014 15:22:21.276007 23163 slave.cpp:3326] Executor 'ffc08dce-997f-41f7-9b03-57c1b4bc1f85.47ed02aa-7285-11e5-80d7-000d3a8033de' of framework 20150814-115157-1677721866-5050-6185-0000 exited with status 0

Is this helpful?

发件人: Xiaodong Zhang <xd...@alauda.io>>
答复: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
日期: 2015年10月29日 星期四 下午12:59
至: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>

主题: Re: Can't start docker container when SSL_ENABLED is on.

<9D46724C-457C-4BE1-B0E4-F57B147F6DC8.png>

The webui have a LOG link, when click it shows like this:

I1029 04:44:32.293445  5697 http.cpp:321] HTTP GET for /master/state.json from 114.113.20.135:55682<http://114.113.20.135:55682/> with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36'
I1029 04:44:34.533504  5704 master.cpp:4613] Sending 1 offers to framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77<ma...@172.31.43.77>:53373
I1029 04:44:34.539579  5702 master.cpp:2739] Processing ACCEPT call for offers: [ 20151029-043755-3549436724-5050-5674-O2 ] on slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051<http://50.112.136.148:5051/> (ec2-50-112-136-148.us-west-2.compute.amazonaws.com<http://ec2-50-112-136-148.us-west-2.compute.amazonaws.com/>) for framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77<ma...@172.31.43.77>:53373
I1029 04:44:34.539710  5702 hierarchical.hpp:814] Recovered cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000] (total: cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: ) on slave 20151029-043755-3549436724-5050-5674-S0 from framework 20151029-043755-3549436724-5050-5674-0000
I1029 04:44:37.360901  5703 master.cpp:4294] Performing implicit task state reconciliation for framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77<ma...@172.31.43.77>:53373
I1029 04:44:40.539989  5704 master.cpp:4613] Sending 1 offers to framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77<ma...@172.31.43.77>:53373
I1029 04:44:40.610321  5702 master.cpp:2739] Processing ACCEPT call for offers: [ 20151029-043755-3549436724-5050-5674-O3 ] on slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051<http://50.112.136.148:5051/> (ec2-50-112-136-148.us-west-2.compute.amazonaws.com<http://ec2-50-112-136-148.us-west-2.compute.amazonaws.com/>) for framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77<ma...@172.31.43.77>:53373
I1029 04:44:40.610846  5702 master.hpp:170] Adding task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] on slave 20151029-043755-3549436724-5050-5674-S0 (ec2-50-112-136-148.us-west-2.compute.amazonaws.com<http://ec2-50-112-136-148.us-west-2.compute.amazonaws.com/>)
I1029 04:44:40.610911  5702 master.cpp:3069] Launching task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f of framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77<ma...@172.31.43.77>:53373 with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] on slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051<http://50.112.136.148:5051/> (ec2-50-112-136-148.us-west-2.compute.amazonaws.com<http://ec2-50-112-136-148.us-west-2.compute.amazonaws.com/>)
I1029 04:44:40.611095  5702 hierarchical.hpp:814] Recovered cpus(*):0.9375; mem(*):743; disk(*):3962; ports(*):[31000-31863, 31865-32000] (total: cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: cpus(*):0.0625; mem(*):256; ports(*):[31864-31864]) on slave 20151029-043755-3549436724-5050-5674-S0 from framework 20151029-043755-3549436724-5050-5674-0000
I1029 04:44:43.324970  5698 http.cpp:321] HTTP GET for /master/state.json from 114.113.20.135:55682<http://114.113.20.135:55682/> with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36'
I1029 04:44:46.546671  5703 master.cpp:4613] Sending 1 offers to framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77<ma...@172.31.43.77>:53373
I1029 04:44:46.557266  5699 master.cpp:2739] Processing ACCEPT call for offers: [ 20151029-043755-3549436724-5050-5674-O4 ] on slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051<http://50.112.136.148:5051/> (ec2-50-112-136-148.us-west-2.compute.amazonaws.com<http://ec2-50-112-136-148.us-west-2.compute.amazonaws.com/>) for framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77<ma...@172.31.43.77>:53373
I1029 04:44:46.557394  5699 hierarchical.hpp:814] Recovered cpus(*):0.9375; mem(*):743; disk(*):3962; ports(*):[31000-31863, 31865-32000] (total: cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: cpus(*):0.0625; mem(*):256; ports(*):[31864-31864]) on slave 20151029-043755-3549436724-5050-5674-S0 from framework 20151029-043755-3549436724-5050-5674-0000
I1029 04:44:47.267562  5700 master.cpp:4069] Status update TASK_FAILED (UUID: 0ea607fc-bf24-4bda-b107-55a54aba31cf) for task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f of framework 20151029-043755-3549436724-5050-5674-0000 from slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051<http://50.112.136.148:5051/> (ec2-50-112-136-148.us-west-2.compute.amazonaws.com<http://ec2-50-112-136-148.us-west-2.compute.amazonaws.com/>)
I1029 04:44:47.267645  5700 master.cpp:4108] Forwarding status update TASK_FAILED (UUID: 0ea607fc-bf24-4bda-b107-55a54aba31cf) for task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f of framework 20151029-043755-3549436724-5050-5674-0000
I1029 04:44:47.267774  5700 master.cpp:5576] Updating the latest state of task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f of framework 20151029-043755-3549436724-5050-5674-0000 to TASK_FAILED
I1029 04:44:47.267907  5700 hierarchical.hpp:814] Recovered cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] (total: cpus(*):1; mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: ) on slave 20151029-043755-3549436724-5050-5674-S0 from framework 20151029-043755-3549436724-5050-5674-0000
I1029 04:44:47.289356  5698 master.cpp:5644] Removing task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] of framework 20151029-043755-3549436724-5050-5674-0000 on slave 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051<http://50.112.136.148:5051/> (ec2-50-112-136-148.us-west-2.compute.amazonaws.com<http://ec2-50-112-136-148.us-west-2.compute.amazonaws.com/>)
I1029 04:44:47.289459  5698 master.cpp:3398] Processing ACKNOWLEDGE call 0ea607fc-bf24-4bda-b107-55a54aba31cf for task e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f of framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77<ma...@172.31.43.77>:53373 on slave 20151029-043755-3549436724-5050-5674-S0



发件人: haosdent <ha...@gmail.com>>
答复: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
日期: 2015年10月29日 星期四 下午12:02
至: user <us...@mesos.apache.org>>
主题: Re: Can't start docker container when SSL_ENABLED is on.

Oh, I mean you task logs. They could be get from Mesos webui.

On Thu, Oct 29, 2015 at 11:52 AM, Xiaodong Zhang <xd...@alauda.io>> wrote:
Thanks for your reply.

Yes I build mesos with `--enable-libevent --enable-ssl`. If I don’t provide key and pem when start slave, it will register fail(That means the ssl work well right?)

As I said the odd thing is the container nerver run(`docker ps –a show nothing`). So it can’t have any stdout or stderr.

发件人: haosdent <ha...@gmail.com>>
答复: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
日期: 2015年10月29日 星期四 上午11:47
至: user <us...@mesos.apache.org>>
主题: Re: Can't start docker container when SSL_ENABLED is on.

Do you compile mesos with ssl support? The default compile don't contains ssl. And does docker container have stdour and stderr?

On Thu, Oct 29, 2015 at 11:41 AM, Xiaodong Zhang <xd...@alauda.io>> wrote:
My scenarios is like previous email says, masters and slaves are in different IaaS. Now the slaves can register to the masters with SSL_ENABLED is on .

But I meet another problem. Slaves can’t run container(the odd thing is they can pull image successfully,just can not run container, `docker ps –a ` list nothing)

The logs like this:

I1029 03:29:45.967741  9288 docker.cpp:758] Starting container 'd4f4e236-0d0a-492c-86df-eef48a414e23' for task '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713' (and executor '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713') of framework '20151029-031549-1294671788-5050-4937-0000'
I1029 03:29:48.044148  9292 docker.cpp:382] Checkpointing pid 12062 to '/tmp/mesos/meta/slaves/20151029-031549-1294671788-5050-4937-S0/frameworks/20151029-031549-1294671788-5050-4937-0000/executors/279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713/runs/d4f4e236-0d0a-492c-86df-eef48a414e23/pids/forked.pid'
I1029 03:29:53.159361  9292 docker.cpp:1576] Executor for container 'd4f4e236-0d0a-492c-86df-eef48a414e23' has exited
I1029 03:29:53.159572  9292 docker.cpp:1374] Destroying container 'd4f4e236-0d0a-492c-86df-eef48a414e23'
I1029 03:29:53.159822  9292 docker.cpp:1478] Running docker stop on container 'd4f4e236-0d0a-492c-86df-eef48a414e23'
I1029 03:29:53.160143  9292 slave.cpp:3399] Executor '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713' of framework 20151029-031549-1294671788-5050-4937-0000 terminated with signal Killed
I1029 03:29:53.160884  9292 slave.cpp:2696] Handling status update TASK_FAILED (UUID: 27a2080a-8807-449e-9077-837ec45b4c51) for task 279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713 of framework 20151029-031549-1294671788-5050-4937-0000 from @0.0.0.0:0
W1029 03:29:53.161247  9288 docker.cpp:986] Ignoring updating unknown container: d4f4e236-0d0a-492c-86df-eef48a414e23
I1029 03:29:53.161548  9293 status_update_manager.cpp:322] Received status update TASK_FAILED (UUID: 27a2080a-8807-449e-9077-837ec45b4c51) for task 279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713 of framework 20151029-031549-1294671788-5050-4937-0000
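
When comparing a failing run with a healthy one, it can help to pull only the executor exit lines out of the slave log: a crash shows up as `terminated with signal Killed` (as above), while a clean run ends with `exited with status 0`. The helper below is a hypothetical sketch, not part of Mesos; it only wraps `grep -E` around that message format.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mesos): print only the slave log lines that
# record how an executor ended, e.g. "terminated with signal Killed" for a crash
# versus "exited with status 0" for a clean exit.
# Reads the files given as arguments, or stdin if none are given.
executor_exits() {
  grep -E "Executor '[^']+' of framework [^ ]+ (terminated with signal|exited with status)" "$@"
}
```

Usage, assuming the slave was started with `--log_dir=/var/log/mesos` and glog keeps its usual `mesos-slave.INFO` symlink there: `executor_exits /var/log/mesos/mesos-slave.INFO`.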

I run the master node with these environment variables:

SSL_SUPPORT_DOWNGRADE=true
SSL_ENABLED=true
SSL_KEY_FILE=/home/ubuntu/xx.key
SSL_CERT_FILE=/home/ubuntu/xx.pem

And the slave node with:

SSL_ENABLED=true
SSL_KEY_FILE=/home/ubuntu/xx.key
SSL_CERT_FILE=/home/ubuntu/xx.pem
LIBPROCESS_ADVERTISE_IP=xxx.xxx.xxx.xxx
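
Since the same SSL variables have to be set on every master and slave, a small preflight check can fail fast when a key or certificate path is wrong. This is a hypothetical helper, not part of Mesos; it treats `true` or `1` as "on", matching the values the Mesos SSL documentation lists for `SSL_ENABLED`. It only checks the daemon's own environment and does not address the Docker executor issue discussed in this thread.

```shell
#!/bin/sh
# Hypothetical preflight check (not part of Mesos). Run it in the same shell
# that will start mesos-master or mesos-slave with the SSL_* variables set.
check_ssl_env() {
  case "${SSL_ENABLED:-}" in
    true|1) ;;              # SSL requested: verify the files below
    *) return 0 ;;          # SSL off or unset: nothing to check
  esac
  for f in "${SSL_KEY_FILE:-}" "${SSL_CERT_FILE:-}"; do
    if [ -z "$f" ] || [ ! -r "$f" ]; then
      echo "SSL_ENABLED is on but '$f' is missing or unreadable" >&2
      return 1
    fi
  done
  return 0
}
```

Usage: `check_ssl_env && ./bin/mesos-slave.sh ...`, with the slave's usual flags in place of `...`.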

When I remove all the SSL environment variables, the slaves work well.

Did I miss something?

Version:

Mesos 0.24.1
Marathon 0.9.2

OS:
Ubuntu 14.04



From: Anindya Sinha <an...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Wednesday, October 28, 2015 at 2:32 PM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: How to tell master which ip to connect.



On Tue, Oct 27, 2015 at 7:43 PM, Xiaodong Zhang <xd...@alauda.io> wrote:
It works! Thanks a lot.

OK. So should we expose advertise_ip and advertise_port as command-line options for mesos-slave as well (instead of using the environment variables)? I opened https://issues.apache.org/jira/browse/MESOS-3809.


Another question: do masters and slaves communicate with each other securely? Is the data encrypted? I want to make sure that deploying masters and slaves in different IaaS clouds is production-ready.

From: haosdent <ha...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Wednesday, October 28, 2015 at 10:23 AM
To: user <us...@mesos.apache.org>
Subject: Re: How to tell master which ip to connect.

Did you try `export LIBPROCESS_ADVERTISE_IP=xxx` and `LIBPROCESS_ADVERTISE_PORT` when starting the slave?
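
The suggestion above amounts to exporting two libprocess variables in the slave's environment before starting it. A minimal sketch with a hypothetical helper name and a documentation-range example address (203.0.113.10); it sets `LIBPROCESS_ADVERTISE_PORT` only when a port is given, on the assumption that otherwise the port the slave binds is the one to advertise.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mesos): for a slave behind NAT, tell
# libprocess which public address (and optionally port) to advertise so the
# master connects back to that instead of the private address.
advertise_public_address() {
  if [ -z "${1:-}" ]; then
    echo "usage: advertise_public_address PUBLIC_IP [PORT]" >&2
    return 1
  fi
  export LIBPROCESS_ADVERTISE_IP="$1"
  # Only override the advertised port when one is given explicitly.
  if [ -n "${2:-}" ]; then
    export LIBPROCESS_ADVERTISE_PORT="$2"
  fi
}
```

Usage: `advertise_public_address 203.0.113.10 5051 && ./bin/mesos-slave.sh --master=... --port=5051`.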

On Wed, Oct 28, 2015 at 10:16 AM, Xiaodong Zhang <xd...@alauda.io> wrote:
Hi team,

My scenario is like this:

My master nodes are deployed in AWS and my slaves are in Azure, so they communicate via public IPs.
I run into trouble when the slaves try to register with the master.
Now the slaves can get the master's public IP address and can send a registration request, but they can only send their private IP to the master (because they don't know their public IP, they can't bind a public IP via the --ip flag), so the masters can't connect to the slaves. How can a slave tell the master which IP to connect to? (I can't find any flag like --advertise_ip in the master.)



--
Best Regards,
Haosdent Huang





Re: Can't start docker container when SSL_ENABLED is on.

Posted by haosdent <ha...@gmail.com>.
@Xiaodong I created a ticket to track this,
https://issues.apache.org/jira/browse/MESOS-3815, and posted a patch there.
Feel free to review and test it. Thank you!
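
Until a fix like the MESOS-3815 patch is available, the workaround from the earlier message in this thread (passing the SSL variables to the Docker executor through `--executor_environment_variables`) can be scripted so the JSON always matches what the slave itself was started with. This is a hypothetical POSIX shell helper, not part of Mesos; it covers only the three variables used in that workaround and assumes the values contain no double quotes or backslashes, since it does no JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mesos): build the JSON object for
# --executor_environment_variables from the SSL_* variables already set
# in the current shell. No JSON escaping is done (see assumption above).
ssl_executor_env_json() {
  json=""
  sep=""
  for name in SSL_ENABLED SSL_KEY_FILE SSL_CERT_FILE; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      json="${json}${sep}\"${name}\": \"${value}\""
      sep=", "
    fi
  done
  printf '{%s}\n' "$json"
}
```

Usage, with the same placeholders as the earlier command: `./bin/mesos-slave.sh xxxx --containerizers=docker,mesos --executor_environment_variables="$(ssl_executor_env_json)"`.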

-- 
Best Regards,
Haosdent Huang