Posted to issues@mesos.apache.org by "Jan-Philip Gehrcke (JIRA)" <ji...@apache.org> on 2017/01/19 01:56:26 UTC

[jira] [Created] (MESOS-6951) Docker containerizer: mangled environment when env value contains LF byte

Jan-Philip Gehrcke created MESOS-6951:
-----------------------------------------

             Summary: Docker containerizer: mangled environment when env value contains LF byte
                 Key: MESOS-6951
                 URL: https://issues.apache.org/jira/browse/MESOS-6951
             Project: Mesos
          Issue Type: Bug
          Components: containerization
            Reporter: Jan-Philip Gehrcke


Consider this Marathon app definition:

{code}
{
  "id": "/testapp",
  "cmd": "env && tail -f /dev/null",
  "env":{
    "TESTVAR":"line1\nline2"
  },
  "cpus": 0.1,
  "mem": 10,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "alpine"
    }
  }
}
{code}

The JSON-encoded newline in the value of the {{TESTVAR}} environment variable leads to a corrupted task environment. What follows is a subset of the resulting task environment (as printed via {{env}}, i.e. in key=value notation):

{code}
line2=
TESTVAR=line1
{code}

That is, the trailing part of the intended value was interpreted as a variable name, and only the leading part of the intended value was used as the actual value of {{TESTVAR}}.

Common use cases that break badly as a result include passing pretty-printed JSON documents or YAML documents via the environment.

Tracing the code and information flow shows that Docker's {{--env-file}} command line option is the weak point. Mesos' Docker containerizer currently uses it to pass the environment to the container:

{code}
  argv.push_back("--env-file");
  argv.push_back(environmentFile);
{code}

(Ref: [code|https://github.com/apache/mesos/blob/c0aee8cc10b1d1f4b2db5ff12b771372fdd5b1f3/src/docker/docker.cpp#L584])


The behavior of Docker's {{--env-file}} argument is documented as follows:

{quote}
The --env-file flag takes a filename as an argument
and expects each line to be in the VAR=VAL format,
{quote}
(Ref: https://docs.docker.com/engine/reference/commandline/run/)

That is, Docker identifies individual key/value pair definitions in that file based on newline bytes, which explains the observed fragmentation of the environment variable value. Notably, Docker does not provide a mechanism for escaping newline bytes in the values specified in this environment file.
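
The fragmentation can be reproduced with a minimal sketch of a line-oriented reader. This is not Docker's actual parser: {{writeEnvFile}} and {{parseEnvFile}} are hypothetical helpers, and treating a line without {{=}} as a name with an empty value is an assumption chosen to match the {{line2=}} output observed above.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using EnvPair = std::pair<std::string, std::string>;

// Hypothetical helper: serializes each pair as one NAME=VALUE line, without
// any escaping (the format that --env-file expects).
std::string writeEnvFile(const std::vector<EnvPair>& env)
{
  std::string out;
  for (const EnvPair& kv : env) {
    out += kv.first + "=" + kv.second + "\n";
  }
  return out;
}

// Hypothetical helper: approximates a reader that treats every line as one
// variable definition. Assumption: a line without '=' becomes a name with an
// empty value.
std::vector<EnvPair> parseEnvFile(const std::string& contents)
{
  std::vector<EnvPair> result;
  std::istringstream in(contents);
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) {
      continue;
    }
    const std::string::size_type eq = line.find('=');
    if (eq == std::string::npos) {
      result.emplace_back(line, "");
    } else {
      result.emplace_back(line.substr(0, eq), line.substr(eq + 1));
    }
  }
  return result;
}

// Usage: one intended variable comes back as two after the round trip.
void demoFragmentation()
{
  const std::vector<EnvPair> env = {EnvPair("TESTVAR", "line1\nline2")};
  const std::vector<EnvPair> parsed = parseEnvFile(writeEnvFile(env));
  assert(parsed.size() == 2);
  assert(parsed[0].first == "TESTVAR" && parsed[0].second == "line1");
  assert(parsed[1].first == "line2" && parsed[1].second.empty());
}
```

Since the file format has no escape syntax, no change on the writing side alone can make this round trip lossless.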

I think it is important to understand that Docker's {{--env-file}} mechanism is ill-posed in the sense that it cannot transmit the whole range of environment variable values allowed by POSIX. This is what the Single UNIX Specification, Version 3 has to say about environment variable values:

{quote}
the value shall be composed of characters from the
portable character set (except NUL and as indicated below). 
{quote}
(Ref: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html)

About "The portable character set": http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap06.html#tagtcjh_3

It includes (among others) the LF byte. Understandably, the current Docker {{--env-file}} behavior will not change (see https://github.com/docker/docker/issues/12997), so this is not an issue that can be deferred to Docker.

Notably, the {{--env-file}} method for communicating environment variables to Docker containers was only recently introduced to Mesos, as of https://issues.apache.org/jira/browse/MESOS-6566, to avoid leaking secrets through the process listing. Previously, we specified env key/value pairs on the command line, which leaked secrets to the process listing and probably also did not support the full range of valid environment variable values.

We need a solution that
1) does not leak sensitive values (i.e. is compliant with MESOS-6566), and
2) allows for passing arbitrary environment variable values.

It seems that Docker's {{--env}} method can be used for that. It can be given _just the names of the environment variables_ to pass along, in which case the docker binary reads the corresponding values from its own environment, which we can prepare appropriately when we invoke the corresponding child process. This method would still leak environment variable _names_ to the process listing, but (especially if documented) this should be fine.
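
A rough sketch of that idea, not a patch against the containerizer: {{DockerInvocation}} and {{buildDockerRun}} are hypothetical names, and a real implementation would presumably also have to carry over whatever the docker client itself needs in its environment (e.g. {{PATH}}).

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for what would be handed to the child process.
struct DockerInvocation
{
  std::vector<std::string> argv; // Visible in the process listing.
  std::vector<std::string> envp; // NAME=VALUE entries for the child only.
};

// Hypothetical helper: puts only variable names on the command line and the
// full NAME=VALUE pairs into the environment of the docker client process.
DockerInvocation buildDockerRun(
    const std::string& image,
    const std::map<std::string, std::string>& env)
{
  DockerInvocation result;
  result.argv.push_back("docker");
  result.argv.push_back("run");

  for (const auto& kv : env) {
    // A name containing '=' would be read by docker as NAME=VALUE.
    if (kv.first.empty() || kv.first.find('=') != std::string::npos) {
      throw std::invalid_argument("invalid environment variable name");
    }
    result.argv.push_back("--env");
    result.argv.push_back(kv.first);
    // The value may contain LF; it never appears in argv.
    result.envp.push_back(kv.first + "=" + kv.second);
  }

  result.argv.push_back(image);
  return result;
}

// Usage: the multi-line value stays intact and out of the command line.
void demoNameOnly()
{
  std::map<std::string, std::string> env;
  env["TESTVAR"] = "line1\nline2";
  const DockerInvocation inv = buildDockerRun("alpine", env);
  assert(inv.argv.size() == 5);
  assert(inv.argv[2] == "--env" && inv.argv[3] == "TESTVAR");
  assert(inv.envp.size() == 1);
  assert(inv.envp[0] == "TESTVAR=line1\nline2");
}
```

The {{envp}} entries would then be passed as the environment of the spawned docker process (for instance via {{execve}}), so the values are visible neither in the process listing nor in a line-oriented file.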



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)