Posted to issues@openwhisk.apache.org by GitBox <gi...@apache.org> on 2019/08/13 06:29:14 UTC

[GitHub] [openwhisk] sven-lange-last commented on issue #4582: Action container log collection does not wait for sentinel on developer error

sven-lange-last commented on issue #4582: Action container log collection does not wait for sentinel on developer error
URL: https://github.com/apache/openwhisk/pull/4582#issuecomment-520706643

## Assessment

* The following assessment lists all situations in which a container / developer error may occur.
* This PR changes log collecting and processing for container / developer errors.
* While this PR makes log collecting and processing more robust when container / developer errors are detected, it has the downside that a few log lines may be lost.
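
To illustrate the trade-off in the last point, here is a minimal Python sketch (not the actual Scala implementation in OpenWhisk). It models a log stream in which only the first `available_now` lines have been written when collection starts. The function name, its parameters, and the exact sentinel string are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Assumed marker line that a runtime prints once an activation is finished.
SENTINEL = "XXX_THE_END_OF_A_WHISK_ACTIVATION_XXX"


def collect_logs(stream: Sequence[str],
                 available_now: int,
                 wait_for_sentinel: bool) -> Tuple[List[str], bool]:
    """Return (collected_lines, complete).

    stream            all lines the container will eventually write
    available_now     number of lines already written when collection starts
    wait_for_sentinel True: keep reading until the sentinel shows up
                      False: take only what is available right now
    """
    source = stream if wait_for_sentinel else stream[:available_now]
    collected: List[str] = []
    for line in source:
        if line == SENTINEL:
            # Sentinel reached: everything before it belongs to the activation.
            return collected, True
        collected.append(line)
    # No sentinel seen: the log may be incomplete.
    return collected, False


stream = ["start", "error: boom", "cleanup", SENTINEL]
waited = collect_logs(stream, available_now=2, wait_for_sentinel=True)
not_waited = collect_logs(stream, available_now=2, wait_for_sentinel=False)
```

In this model, waiting yields all three log lines, while not waiting returns only the two lines written so far and drops `cleanup`. If the sentinel never arrives, waiting would block in a real system, which is the robustness problem this PR addresses.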

## Conditions for container / developer error

### In `ContainerProxy`:

* https://github.com/apache/openwhisk/blob/9bef49fcd47f7922e4226ff7a97385f1313f7d64/core/invoker/src/main/scala/org/apache/openwhisk/core/containerpool/ContainerProxy.scala#L280

* Application-specific error while starting a blackbox container.
* Examples: image not found in registry or network failure while pulling.
* The action container does not even start, so no log reading is performed at all.

### In `Container`:

* https://github.com/apache/openwhisk/blob/9bef49fcd47f7922e4226ff7a97385f1313f7d64/common/scala/src/main/scala/org/apache/openwhisk/core/containerpool/Container.scala#L140

* Action timeout is exceeded during init.
* Example: a Node.js action with a busy loop during init blocks the event loop.
* Expectation: managed action runtimes do not produce sentinels on timeout during init.

* https://github.com/apache/openwhisk/blob/9bef49fcd47f7922e4226ff7a97385f1313f7d64/common/scala/src/main/scala/org/apache/openwhisk/core/containerpool/Container.scala#L177

* Action timeout is exceeded during run.
* Example: action runs too long.
* Expectation: managed action runtimes do not produce sentinels on timeout during run.

### In `ActivationResult` - `processInitResponseContent()`:

* https://github.com/apache/openwhisk/blob/9bef49fcd47f7922e4226ff7a97385f1313f7d64/common/scala/src/main/scala/org/apache/openwhisk/core/entity/ActivationResult.scala#L151

* POST /init returns a response with status != OK (200), response is not truncated and is valid JSON. We assume that the managed action runtime detected a problem during init and sent the response.
* Example: Node.js action code that contains only a comment, so no valid entry point can be found.
* Expectation: managed action runtimes should produce sentinels in this situation.

* https://github.com/apache/openwhisk/blob/9bef49fcd47f7922e4226ff7a97385f1313f7d64/common/scala/src/main/scala/org/apache/openwhisk/core/entity/ActivationResult.scala#L153

* POST /init returns a response with status != OK (200), response is not truncated but is not valid JSON. We assume that the managed action runtime detected a problem during init and sent a broken response due to a bug.
* Should not happen for managed runtimes because such failures should be caught in testing. Blackbox containers may return improper responses. Open question: can malicious actions cause a broken response to be sent?
* Example: no known examples.
* Expectation: managed action runtimes should produce sentinels in this situation.

* https://github.com/apache/openwhisk/blob/9bef49fcd47f7922e4226ff7a97385f1313f7d64/common/scala/src/main/scala/org/apache/openwhisk/core/entity/ActivationResult.scala#L157

* POST /init returns a response with status != OK (200) and response is truncated because it exceeds the size limit (1 MiB). We assume that the managed action runtime detected a problem during init and sent a broken response due to a bug.
* Should not happen for managed runtimes because such failures should be caught in testing. Blackbox containers may return improper responses. Open question: can malicious actions cause a broken response to be sent?
* Example: no known examples.
* Expectation: managed action runtimes should produce sentinels in this situation.

* https://github.com/apache/openwhisk/blob/9bef49fcd47f7922e4226ff7a97385f1313f7d64/common/scala/src/main/scala/org/apache/openwhisk/core/entity/ActivationResult.scala#L161

* The TCP connection to the action container fails or breaks down during POST /init or while waiting for a response AND the OOM check is positive. We assume that the managed action runtime did not have a chance to react to this error situation.
* Example: action code consumes too much physical memory by including a lot of prereq packages.
* Expectation: managed action runtimes won't produce sentinels in this situation.

* https://github.com/apache/openwhisk/blob/9bef49fcd47f7922e4226ff7a97385f1313f7d64/common/scala/src/main/scala/org/apache/openwhisk/core/entity/ActivationResult.scala#L165

* The TCP connection to the action container fails or breaks down during POST /init or while waiting for a response AND the OOM check is negative. We assume that the managed action runtime did not have a chance to react to this error situation.
* Example: the action code breaks the managed action runtime process by exiting.
* Expectation: managed action runtimes won't produce sentinels in this situation. Malicious actions may be able to delay / prevent sentinel production.
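
The init cases above can be condensed into a small decision function. This is a Python sketch for illustration only, not the code in `ActivationResult.scala`. The `InitOutcome` type, its field names, and the order of the checks are assumptions; only the resulting situations and sentinel expectations are taken from the list above.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class InitOutcome:
    """Hypothetical summary of what the invoker observed for POST /init."""
    status: Optional[int] = None      # HTTP status, None if no response arrived
    body: str = ""                    # response body as received
    truncated: bool = False           # body exceeded the size limit
    connection_failed: bool = False   # TCP connection failed or broke down
    oom_detected: bool = False        # result of the OOM check


def classify_init_error(outcome: InitOutcome) -> Optional[Tuple[str, bool]]:
    """Return (situation, sentinel_expected), or None if init succeeded."""
    if outcome.connection_failed:
        # The runtime had no chance to react, so no sentinel is expected.
        if outcome.oom_detected:
            return ("connection failed, OOM check positive", False)
        return ("connection failed, OOM check negative", False)
    if outcome.status == 200:
        return None
    # From here on the runtime did answer, so it should also emit sentinels.
    if outcome.truncated:
        return ("truncated error response", True)
    try:
        json.loads(outcome.body)
    except ValueError:
        return ("error response is not valid JSON", True)
    return ("error response is valid JSON", True)
```

For example, `classify_init_error(InitOutcome(status=502, body='{"error": "no entry point"}'))` yields `("error response is valid JSON", True)`. The status 502 is an arbitrary non-200 value chosen for the example.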

### In `ActivationResult` - `processRunResponseContent()`:

* https://github.com/apache/openwhisk/blob/9bef49fcd47f7922e4226ff7a97385f1313f7d64/common/scala/src/main/scala/org/apache/openwhisk/core/entity/ActivationResult.scala#L197

* POST /run returns a response with status != OK (200), response is not truncated and is valid JSON. We assume that the managed action runtime detected a problem during / after run and sent the response.
* Example: action runtime caught an exception thrown by action code.
* Expectation: managed action runtimes should produce sentinels in this situation.

* https://github.com/apache/openwhisk/blob/9bef49fcd47f7922e4226ff7a97385f1313f7d64/common/scala/src/main/scala/org/apache/openwhisk/core/entity/ActivationResult.scala#L202

* POST /run returns a response with status != OK (200), response is not truncated and is valid JSON, but the JSON is not an object (instead: String, Number, ...). We assume that the managed action runtime detected a problem during / after run and sent a broken response due to a bug.
* Should not happen for managed runtimes because such failures should be caught in testing. Blackbox containers may return improper responses. Open question: can malicious actions cause a broken response to be sent?
* Example: no known examples.
* Expectation: managed action runtimes should produce sentinels in this situation.

* https://github.com/apache/openwhisk/blob/9bef49fcd47f7922e4226ff7a97385f1313f7d64/common/scala/src/main/scala/org/apache/openwhisk/core/entity/ActivationResult.scala#L207

* POST /run returns a response with status != OK (200), response is not truncated but is not valid JSON. We assume that the managed action runtime detected a problem during / after run and sent a broken response due to a bug.
* Should not happen for managed runtimes because such failures should be caught in testing. Blackbox containers may return improper responses. Open question: can malicious actions cause a broken response to be sent?
* Example: no known examples.
* Expectation: managed action runtimes should produce sentinels in this situation.

* https://github.com/apache/openwhisk/blob/9bef49fcd47f7922e4226ff7a97385f1313f7d64/common/scala/src/main/scala/org/apache/openwhisk/core/entity/ActivationResult.scala#L211

* POST /run returns a response with status != OK (200) and response is truncated because it exceeds the size limit (1 MiB). We assume that the managed action runtime detected a problem during / after run and sent a broken response due to a bug.
* Should not happen for managed runtimes because such failures should be caught in testing. Blackbox containers may return improper responses. Open question: can malicious actions cause a broken response to be sent?
* Example: no known examples.
* Expectation: managed action runtimes should produce sentinels in this situation.

* https://github.com/apache/openwhisk/blob/9bef49fcd47f7922e4226ff7a97385f1313f7d64/common/scala/src/main/scala/org/apache/openwhisk/core/entity/ActivationResult.scala#L215

* The TCP connection to the action container fails or breaks down during POST /run or while waiting for a response AND the OOM check is positive. We assume that the managed action runtime did not have a chance to react to this error situation.
* Example: action code consumes too much physical memory by creating too many objects.
* Expectation: managed action runtimes won't produce sentinels in this situation.

* https://github.com/apache/openwhisk/blob/9bef49fcd47f7922e4226ff7a97385f1313f7d64/common/scala/src/main/scala/org/apache/openwhisk/core/entity/ActivationResult.scala#L219

* The TCP connection to the action container fails or breaks down during POST /run or while waiting for a response AND the OOM check is negative. We assume that the managed action runtime did not have a chance to react to this error situation.
* Example: the action code breaks the managed action runtime process by exiting.
* Expectation: managed action runtimes won't produce sentinels in this situation. Malicious actions may be able to delay / prevent sentinel production.
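
Compared to init, the run cases additionally distinguish a valid JSON object from other valid JSON values. The following Python sketch shows only that body check. The function name and the returned labels are made up for this example and do not appear in OpenWhisk.

```python
import json


def run_error_body_kind(body: str, truncated: bool) -> str:
    """Classify the body of a non-200 POST /run response (hypothetical helper)."""
    if truncated:
        # A body cut off at the size limit is not inspected any further.
        return "truncated"
    try:
        parsed = json.loads(body)
    except ValueError:
        return "invalid-json"
    # Only a JSON object counts as a well-formed error response.
    return "json-object" if isinstance(parsed, dict) else "json-non-object"


kinds = [
    run_error_body_kind('{"error": "exception in action"}', truncated=False),
    run_error_body_kind('"just a string"', truncated=False),
    run_error_body_kind("not json at all", truncated=False),
    run_error_body_kind('{"error": "x', truncated=True),
]
```

In all four cases the runtime did send a response, so per the expectations listed above it should also produce sentinels. Only the connection-failure cases lead to no sentinel being expected.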

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services