Posted to issues@storm.apache.org by "Erik Weathers (JIRA)" <ji...@apache.org> on 2017/07/01 21:34:00 UTC

[jira] [Comment Edited] (STORM-1342) support multiple logviewers per host for container-isolated worker logs

    [ https://issues.apache.org/jira/browse/STORM-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071411#comment-16071411 ] 

Erik Weathers edited comment on STORM-1342 at 7/1/17 9:33 PM:
--------------------------------------------------------------

I inquired with the Mesos user list about whether they might have a feature that could allow Storm-on-Mesos to ensure there is a logviewer deployed to every host where Storm Workers+Supervisors run:
* https://www.mail-archive.com/user@mesos.apache.org/msg09141.html

Answer:  there isn't support for this kind of feature yet, but the Mesosphere folks that work on DC/OS also need such a feature, so it will likely be added eventually:

{panel: title=Kevin Klues talking about "daemon sets" and "admin tasks"}
What you are describing is a feature we call 'admin tasks' or 'daemon sets'. 

Unfortunately, there is no direct support for these yet, but we do have plans in the (relatively) near future to start working on it.

One of our use cases is exactly what you describe with the logging service. On DC/OS we currently run our logging service as a systemd unit outside of mesos since we can't guarantee it gets launched everywhere (the same is true for a bunch of other services as well, namely metrics).

We don't have an exact timeline for when we will build this support yet, but we will certainly announce it once we start actively working on it.
{panel}

Notably, another response indicated the approach taken by the Groupon team operating our Storm-on-Mesos cluster is the recommended method for handling the deployment of the logviewer right now:

{panel: title=Dick Davies' suggestion to manually roll out the logviewer via "configuration management" tooling}
If it _needs_ to be there always then I'd roll it out with whatever
automation you use to deploy the mesos workers ; depending on
the scale you're running at launching it as a task is likely to be less
reliable due to outages etc.

( I understand the 'maybe all hosts' constraint but if it's 'up to one per
host', it sounds like a CM issue to me. )
{panel}

In the Groupon use-case, we deploy the logviewer to every Mesos Worker host whenever we deploy the storm-mesos code.



> support multiple logviewers per host for container-isolated worker logs
> -----------------------------------------------------------------------
>
>                 Key: STORM-1342
>                 URL: https://issues.apache.org/jira/browse/STORM-1342
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-core
>            Reporter: Erik Weathers
>            Priority: Minor
>
> h3. Storm-on-Mesos Worker Logs are in varying directories
> When using [storm-on-mesos|https://github.com/mesos/storm] with cgroups, each topology's workers are isolated into separate containers.  By default the worker logs will be saved into container-specific sandbox directories.  These directories are also topology-specific by definition, because, as just stated, the containers are specific to each topology.
> h3. Problem: Storm supports 1-and-only-1 Logviewer per Worker Host
> A challenge with this different way of running Storm is that the [Storm logviewer|https://github.com/apache/storm/blob/768a85926373355c15cc139fd86268916abc6850/docs/_posts/2013-12-08-storm090-released.md#log-viewer-ui] runs as a single instance on each worker host.  This doesn't play well with having the topology worker logs in separate per-topology containers.  The one logviewer doesn't know about the various sandbox directories that the Storm Workers are writing to.  And if we just spawned a new logviewer for each container, the Storm UI would still only know about 1 global logviewer port, so it could not direct you to the logviewer for a given container.
> These problems are documented in (or linked from) [Issue #6 in the storm-on-mesos project|https://github.com/mesos/storm/issues/6].
> h3. Possible Solutions I can envision
> # configure the Storm workers to write to log directories that exist on the raw host outside of the container sandbox, and run a single logviewer on a host, which serves up the contents of that directory.
> #* violates one of the basic reasons for using containers: isolation.
> #* also prevents a standard use case for Mesos: running more than 1 instance of a Mesos Framework (e.g., "Storm Cluster") at once on the same Mesos Cluster, e.g., for Blue-Green deployments.
> #* a variation on this proposal is to somehow expose the sandbox dirs of all storm containers to this singleton logviewer process (still has above problems)
> # launch a separate logviewer in each container, and somehow register those logviewers with Storm such that Storm knows for a given host which logviewer port is assigned to a given topology.
> #* this is the proposed solution
> h3. Storm Changes for the Proposed Solution
> Nimbus or ZooKeeper could serve as a registrar, recording the association between a slot (host + worker port) and the logviewer port that is serving that worker's logs. The Storm-on-Mesos framework could then update this registry when launching a new worker.  (This proposal definitely calls for thorough vetting and thinking.)
> h3. Storm-on-Mesos Framework Changes for the Proposed Solution
> Along with the interaction with the "registrar" proposed above, the storm-on-mesos framework can be enhanced to launch multiple logviewers on a given worker host, where each logviewer is dedicated to serving the worker logs from a specific topology's container/sandbox directory.  This would be done by launching a logviewer process within the topology's container, and assigning it an arbitrary listening port that has been determined dynamically through mesos (which treats ports as one of the schedulable resource primitives of a worker host).  [Code implementing this logviewer-port-allocation logic already exists|https://github.com/mesos/storm/commit/af8c49beac04b530c33c1401c829caaa8e368a35], but [that specific portion of the code was reverted|https://github.com/mesos/storm/commit/dc3eee0f0e9c06f6da7b2fe697a8e4fc05b5227e] because of the issues that inspired this ticket.
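
The registrar idea from "Storm Changes for the Proposed Solution" can be illustrated with a minimal sketch. This is not Storm or storm-on-mesos code: the names Slot and LogviewerRegistry, the in-memory dict (standing in for Nimbus or ZooKeeper), and the URL shape are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Slot:
    """Hypothetical key type: a slot is a worker host plus a worker port."""
    host: str
    worker_port: int


class LogviewerRegistry:
    """Hypothetical in-memory stand-in for a registrar kept in Nimbus or ZooKeeper."""

    def __init__(self) -> None:
        self._ports: Dict[Slot, int] = {}

    def register(self, slot: Slot, logviewer_port: int) -> None:
        # The framework would call this when it launches a worker together
        # with the logviewer that serves that worker's container.
        self._ports[slot] = logviewer_port

    def unregister(self, slot: Slot) -> None:
        # Drop the mapping when the worker's container goes away.
        self._ports.pop(slot, None)

    def logviewer_url(self, slot: Slot, log_file: str) -> Optional[str]:
        # A UI could look up the per-slot port here instead of assuming
        # one global logviewer port. The URL layout is illustrative.
        port = self._ports.get(slot)
        if port is None:
            return None
        return f"http://{slot.host}:{port}/log?file={log_file}"


# Example with made-up host and port values.
registry = LogviewerRegistry()
slot = Slot(host="worker-host-1", worker_port=31000)
registry.register(slot, logviewer_port=31005)
print(registry.logviewer_url(slot, "worker.log"))
```

Keying the registry by slot rather than by host is what would allow several logviewers to coexist on one host, each reachable on its own dynamically assigned port.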


