Posted to issues@aurora.apache.org by "Stephan Erb (JIRA)" <ji...@apache.org> on 2016/11/24 13:27:59 UTC

[jira] [Comment Edited] (AURORA-1830) Unknown exception initializing sandbox

    [ https://issues.apache.org/jira/browse/AURORA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693294#comment-15693294 ] 

Stephan Erb edited comment on AURORA-1830 at 11/24/16 1:27 PM:
---------------------------------------------------------------

Thanks for the bug report! I have never seen this error before. [~joshua.cohen] might know more.

A couple of random questions and things to try. Maybe this helps in the meantime:

* Does it work when you leave out the {{container = ...}} part? This will use the Mesos containerizer but without the Docker image provisioning.
* The thermos executor has an option `--mesos-containerizer-path`. Does this point to a valid path on your system?
* Running as the user {{root}} is suspicious. Could you please test with another Linux user on your agent host system?
* This part of the log message looks suspicious as well: {{SlaveInfo:     hostname: "000.000.00.001"}}. Is this really a routable hostname of your agent? Normally this comes from the value passed to the Mesos agent with {{--hostname}} and/or {{--ip}}.
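
In case it helps with the second and fourth points, here is a minimal sketch of how one could check them by hand on the agent host. It is not what the executor does internally, and the error may well come from somewhere else. The default path below is an assumption; substitute whatever is actually passed as `--mesos-containerizer-path` on your agent. The helper names {{check_executable}} and {{check_hostname}} are made up for this example.

```python
import os
import socket
import sys

# Assumption: replace with the value your executor is started with.
ASSUMED_CONTAINERIZER_PATH = "/usr/libexec/mesos/mesos-containerizer"


def check_executable(path):
    """Return None if path is an executable regular file, else a description of the problem."""
    if not os.path.exists(path):
        return f"{path} does not exist"
    if not os.path.isfile(path):
        return f"{path} is not a regular file"
    if not os.access(path, os.X_OK):
        return f"{path} is not executable by the current user"
    return None


def check_hostname(hostname):
    """Return None if the hostname resolves locally, else a description of the problem.

    Resolving says nothing about whether other hosts can route to it.
    """
    try:
        socket.getaddrinfo(hostname, None)
    except OSError as e:
        return f"{hostname} does not resolve: {e}"
    return None


if __name__ == "__main__":
    # Usage: python check_agent.py [containerizer_path] [agent_hostname]
    path = sys.argv[1] if len(sys.argv) > 1 else ASSUMED_CONTAINERIZER_PATH
    print(check_executable(path) or f"{path} looks fine")
    if len(sys.argv) > 2:
        print(check_hostname(sys.argv[2]) or f"{sys.argv[2]} resolves")
```

Running it as the same user the task runs as is probably the most informative, since a permission problem would only show up for that user.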


was (Author: stephanerb):
Thanks for the bug report! I have never seen this error before. [~joshua.cohen] might know more.

A couple of random questions and things to try. Maybe this helps in the meantime:

* Does it work when you leave out the {{container = ...}} part? This will use the Mesos containerizer but without the Docker image provisioning.
* The thermos executor has an option `--mesos-containerizer-path`. Does this point to a valid path on your system?
* Running as the user {{root}} is suspicious. Could you please test with another Linux user on your agent host system?
* This part of the log message looks suspicious as well: {{SlaveInfo:     hostname: "000.000.00.001"}}. Is this really a routable hostname of your agent? Normally this comes from the value passed to the Mesos agent with {{--hostname}} and/or {{--ip}}.

> Unknown exception initializing sandbox
> --------------------------------------
>
>                 Key: AURORA-1830
>                 URL: https://issues.apache.org/jira/browse/AURORA-1830
>             Project: Aurora
>          Issue Type: Bug
>          Components: Executor
>    Affects Versions: 0.16.0
>            Reporter: Kostiantyn Bokhan
>
> When launching a job using the Mesos containerizer and a docker image, the sandbox setup fails with the following error:
> {quote}
>  FAILED • Unknown exception initializing sandbox: [Errno 2] No such file or directory
> {quote}
> Aurora file:
> {code}
> # run the script
> python = Process(
>   name = 'python',
>   cmdline = 'python --version')
> # describe the task
> python_task = Task(
>   processes = [python],
>   resources = Resources(cpu = 1, ram = 1*GB, disk=8*GB))
> jobs = [
>   Service(cluster = 'MY Cluster',
>           environment = 'devel',
>           role = 'root',
>           name = 'python',
>           task = python_task,
>           container = Mesos( image = DockerImage (name = 'python', tag = '2')))
> ]
> {code}
> *__main__.log*:
> {noformat}
> Log file created at: 2016/11/24 14:45:44
> Running on machine: gnode1
> [DIWEF]mmdd hh:mm:ss.uuuuuu pid file:line] msg
> Command line: /var/lib/mesos/slave/slaves/195fbdc8-6720-443b-b036-7fa5608b27cc-S24/frameworks/195fbdc8-6720-443b-b036-7fa5608b27cc-0014/executors/thermos-root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8/runs/e25e2e98-0b65-4e9f-a86d-13a18dff01bc/thermos_executor --announcer-ensemble 127.0.0.1:2181
> I1124 14:45:44.041621 25610 executor_base.py:45] Executor [None]: registered() called with:
> I1124 14:45:44.042294 25610 executor_base.py:45] Executor [None]:    ExecutorInfo:  executor_id {
>   value: "thermos-root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8"
> }
> resources {
>   name: "cpus"
>   type: SCALAR
>   scalar {
>     value: 0.25
>   }
>   role: "*"
> }
> resources {
>   name: "mem"
>   type: SCALAR
>   scalar {
>     value: 128.0
>   }
>   role: "*"
> }
> command {
>   uris {
>     value: "/usr/bin/thermos_executor"
>     executable: true
>   }
>   value: "${MESOS_SANDBOX=.}/thermos_executor --announcer-ensemble 127.0.0.1:2181"
> }
> framework_id {
>   value: "195fbdc8-6720-443b-b036-7fa5608b27cc-0014"
> }
> name: "AuroraExecutor"
> source: "root.devel.python.0"
> container {
>   type: MESOS
>   volumes {
>     container_path: "taskfs"
>     mode: RO
>     image {
>       type: DOCKER
>       docker {
>         name: "python:2"
>       }
>     }
>   }
>   mesos {
>   }
> }
> labels {
>   labels {
>     key: "source"
>     value: "root.devel.python.0"
>   }
> }
> I1124 14:45:44.042458 25610 executor_base.py:45] Executor [None]:    FrameworkInfo: user: "root"
> name: "Aurora"
> id {
>   value: "195fbdc8-6720-443b-b036-7fa5608b27cc-0014"
> }
> failover_timeout: 1814400.0
> checkpoint: true
> hostname: "vnode7"
> capabilities {
>   type: GPU_RESOURCES
> }
> I1124 14:45:44.043046 25610 executor_base.py:45] Executor [None]:    SlaveInfo:     hostname: "000.000.00.001"
> resources {
>   name: "gpus"
>   type: SCALAR
>   scalar {
>     value: 2.0
>   }
>   role: "*"
> }
> resources {
>   name: "ports"
>   type: RANGES
>   ranges {
>     range {
>       begin: 1025
>       end: 2180
>     }
>     range {
>       begin: 2182
>       end: 3887
>     }
>     range {
>       begin: 3889
>       end: 5049
>     }
>     range {
>       begin: 5052
>       end: 8079
>     }
>     range {
>       begin: 8082
>       end: 8180
>     }
>     range {
>       begin: 8182
>       end: 32000
>     }
>   }
>   role: "*"
> }
> resources {
>   name: "disk"
>   type: SCALAR
>   scalar {
>     value: 428201.0
>   }
>   role: "*"
> }
> resources {
>   name: "cpus"
>   type: SCALAR
>   scalar {
>     value: 8.0
>   }
>   role: "*"
> }
> resources {
>   name: "mem"
>   type: SCALAR
>   scalar {
>     value: 14957.0
>   }
>   role: "*"
> }
> attributes {
>   name: "hostname"
>   type: TEXT
>   text {
>     value: "gnode1"
>   }
> }
> attributes {
>   name: "ip"
>   type: TEXT
>   text {
>     value: "000.000.00.001"
>   }
> }
> attributes {
>   name: "rack"
>   type: TEXT
>   text {
>     value: "gpu"
>   }
> }
> attributes {
>   name: "gputype"
>   type: TEXT
>   text {
>     value: "titanz"
>   }
> }
> id {
>   value: "195fbdc8-6720-443b-b036-7fa5608b27cc-S24"
> }
> checkpoint: true
> port: 5051
> I1124 14:45:44.043673 25610 executor_base.py:45] Executor [None]: launchTask got task: root/devel/python:root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8
> I1124 14:45:44.044601 25610 executor_base.py:45] Executor [195fbdc8-6720-443b-b036-7fa5608b27cc-S24]: Updating root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8 => STARTING
> I1124 14:45:44.044718 25610 executor_base.py:45] Executor [195fbdc8-6720-443b-b036-7fa5608b27cc-S24]:    Reason: Initializing sandbox.
> F1124 14:45:44.049196 25610 aurora_executor.py:85] Unknown exception initializing sandbox: [Errno 2] No such file or directory
> I1124 14:45:44.049439 25610 executor_base.py:45] Executor [195fbdc8-6720-443b-b036-7fa5608b27cc-S24]: Updating root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8 => FAILED
> I1124 14:45:44.049519 25610 executor_base.py:45] Executor [195fbdc8-6720-443b-b036-7fa5608b27cc-S24]:    Reason: Unknown exception initializing sandbox: [Errno 2] No such file or directory
> I1124 14:45:49.152787 25610 thermos_executor_main.py:299] MesosExecutorDriver.run() has finished.
> {noformat}
> *stderr*
> {noformat}
> I1124 14:45:43.559283 25614 fetcher.cpp:498] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/195fbdc8-6720-443b-b036-7fa5608b27cc-S24\/root","items":[{"action":"BYPASS_CACHE","uri":{"executable":true,"extract":true,"value":"\/usr\/bin\/thermos_executor"}}],"sandbox_directory":"\/var\/lib\/mesos\/slave\/slaves\/195fbdc8-6720-443b-b036-7fa5608b27cc-S24\/frameworks\/195fbdc8-6720-443b-b036-7fa5608b27cc-0014\/executors\/thermos-root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8\/runs\/e25e2e98-0b65-4e9f-a86d-13a18dff01bc","user":"root"}
> I1124 14:45:43.561226 25614 fetcher.cpp:409] Fetching URI '/usr/bin/thermos_executor'
> I1124 14:45:43.561242 25614 fetcher.cpp:250] Fetching directly into the sandbox directory
> I1124 14:45:43.561266 25614 fetcher.cpp:187] Fetching URI '/usr/bin/thermos_executor'
> I1124 14:45:43.561285 25614 fetcher.cpp:167] Copying resource with command:cp '/usr/bin/thermos_executor' '/var/lib/mesos/slave/slaves/195fbdc8-6720-443b-b036-7fa5608b27cc-S24/frameworks/195fbdc8-6720-443b-b036-7fa5608b27cc-0014/executors/thermos-root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8/runs/e25e2e98-0b65-4e9f-a86d-13a18dff01bc/thermos_executor'
> I1124 14:45:43.569787 25614 fetcher.cpp:547] Fetched '/usr/bin/thermos_executor' to '/var/lib/mesos/slave/slaves/195fbdc8-6720-443b-b036-7fa5608b27cc-S24/frameworks/195fbdc8-6720-443b-b036-7fa5608b27cc-0014/executors/thermos-root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8/runs/e25e2e98-0b65-4e9f-a86d-13a18dff01bc/thermos_executor'
> twitter.common.app debug: Initializing: twitter.common.log (Logging subsystem.)
> Writing log files to disk in /var/lib/mesos/slave/slaves/195fbdc8-6720-443b-b036-7fa5608b27cc-S24/frameworks/195fbdc8-6720-443b-b036-7fa5608b27cc-0014/executors/thermos-root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8/runs/e25e2e98-0b65-4e9f-a86d-13a18dff01bc
> I1124 14:45:44.033974 25610 exec.cpp:161] Version: 1.0.0
> I1124 14:45:44.040127 25639 exec.cpp:236] Executor registered on agent 195fbdc8-6720-443b-b036-7fa5608b27cc-S24
> FATAL] Unknown exception initializing sandbox: [Errno 2] No such file or directory
> twitter.common.app debug: Shutting application down.
> twitter.common.app debug: Running exit function for twitter.common.log (Logging subsystem.)
> twitter.common.app debug: Finishing up module teardown.
> twitter.common.app debug:   Active thread: <_MainThread(MainThread, started 139772146038592)>
> twitter.common.app debug:   Active thread (daemon): <_DummyThread(Dummy-2, started daemon 139771946940160)>
> twitter.common.app debug: Exiting cleanly.
> {noformat}


