Posted to issues@storm.apache.org by "Erik Weathers (JIRA)" <ji...@apache.org> on 2017/04/24 19:09:04 UTC

[jira] [Comment Edited] (STORM-2191) shorten classpaths in worker and LogWriter commands

    [ https://issues.apache.org/jira/browse/STORM-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981693#comment-15981693 ] 

Erik Weathers edited comment on STORM-2191 at 4/24/17 7:08 PM:
---------------------------------------------------------------

[~revans2]:  thanks for responding (I'm guessing that [~hmclouro] or [~sriharsha] reached out to you based on the timing, since I chatted with them about this ticket last Thursday after the Storm Meetup at Hortonworks).

The random ordering of jars on the classpath is an issue for sure -- ideally that's a reason to avoid conflicts in an application's jars, but I guess that's not always feasible.  I had assumed, though, that the explicit expansion was *originally* functional rather than defensive in this way, because the Storm code at that time supported Java 5, which didn't have the {{*}} wildcard classpath support, so if you wanted jars on your classpath you had to list them explicitly.

Regarding "if we really do want to reduce the command line": I'd say that's an absolute requirement for us to support Storm. We need to be able to see what the different processes are and which topology each process belongs to, and those fields are at the end of the commands.  *Really*, operationally I don't want to accept any commands that get cut off for being over 4096 bytes, but those are the main driving reasons.

Regarding your suggestion (manifest only jars?), I didn't quite follow how it would work nor how it could shorten the command lines.

Here's another random idea:  use the {{CLASSPATH}} environment variable instead of passing the classpath as a command line parameter; then we could have it fully filled out, however many jars it lists.  I don't love it since it hides the classpath from the {{ps}} output, but maybe it satisfies both of our requirements.   Also, I'm not sure if it would work via the {{LogWriter}}; i.e., would {{LogWriter}} be able to pass along its {{CLASSPATH}} env var to the launched child {{Worker}} process, or would it need to reconstruct the {{CLASSPATH}} env var the way the {{Supervisor}} would under this proposal?
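
To make the {{CLASSPATH}} idea concrete, here is a rough sketch in Python (the language of {{bin/storm.py}}). The helper name {{build_worker_launch}}, the main class {{org.example.Worker}}, and the arguments are all made up for illustration; this is not Storm's launcher code. It relies only on the fact that the {{java}} launcher falls back to the {{CLASSPATH}} environment variable when no {{-cp}}/{{-classpath}} option is given.

```python
import os


def build_worker_launch(classpath_entries, main_class, args, base_env=None):
    """Return (argv, env) for a JVM launch that carries the classpath in CLASSPATH.

    Hypothetical helper for illustration only; not part of bin/storm.py.
    """
    # Copy the environment so the caller's mapping (or os.environ) is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    # The full, explicit jar list lives in the environment instead of the command line.
    env["CLASSPATH"] = ":".join(classpath_entries)
    # No -cp/-classpath flag here: the java launcher then reads CLASSPATH.
    argv = ["java", main_class] + list(args)
    return argv, env


if __name__ == "__main__":
    argv, env = build_worker_launch(
        ["/opt/storm/lib/asm-4.0.jar", "/opt/storm/lib/carbonite-1.4.0.jar"],
        "org.example.Worker",       # assumed main class, for illustration
        ["topology-1", "6700"],     # assumed arguments, for illustration
        base_env={},
    )
    print(" ".join(argv))           # what ps would show
    print(len(env["CLASSPATH"]))    # the length that moved off the command line
```

The pair could then be handed to {{subprocess.Popen(argv, env=env)}}. A child process inherits its parent's environment unless the parent replaces it, so whether the {{Worker}} would see the same {{CLASSPATH}} depends on how {{LogWriter}} spawns it, which this sketch does not answer.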

Would you have any time this week to talk about this in a bit more detail in some non-JIRA channel?  I'm happy to try to pursue your "manifest only jars" idea if I can learn more about what it means and how it could solve the problem.  My email address is embedded in my Apache JIRA user info.



> shorten classpaths in worker and LogWriter commands
> ---------------------------------------------------
>
>                 Key: STORM-2191
>                 URL: https://issues.apache.org/jira/browse/STORM-2191
>             Project: Apache Storm
>          Issue Type: Task
>          Components: storm-core
>    Affects Versions: 1.0.2
>            Reporter: Erik Weathers
>            Priority: Minor
>              Labels: cli, command-line
>
> When launching the worker daemon and its wrapping LogWriter daemon, the commands can become so long that they exceed the default Linux limit of 4096 bytes. That results in commands that are cut off in {{ps}} output, which makes it hard to inspect the system and see even which processes are running.
> The specific scenario in which this problem can be easily triggered: *running Storm on Mesos*.
> h5. Details on why it happens:
> # Using the default Mesos containerizer instead of Docker containers causes the storm-mesos package to be unpacked into the Mesos executor sandbox.
> # The ["expand all jars on classpath"|https://github.com/apache/storm/blob/6dc6407a01d032483edebb1c1b4d8b69a304d81c/bin/storm.py#L114-L140] functionality in the {{bin/storm.py}} script causes every one of the jars that storm bundles into its lib directory to be explicitly listed in the command.
> #* e.g., say the mesos work dir is {{/var/run/mesos/work_dir/}}
> #* and say that the original classpath argument in the supervisor cmd includes the following for the {{lib/}} dir in the binary storm package:
> #** {{/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12/frameworks/20160509-084241-1086985738-5050-32231-0000/executors/STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib/*}}
> #* That leads to a hugely expanded classpath argument for the LogWriter and Worker daemons that get launched:
> #** {{/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12/frameworks/20160509-084241-1086985738-5050-32231-0000/executors/STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib/asm-4.0.jar:/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12/frameworks/20160509-084241-1086985738-5050-32231-0000/executors/STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib/carbonite-1.4.0.jar:...}}
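
The growth described above can be sketched in a few lines of Python. This is only an illustration of the effect, not the code in {{bin/storm.py}}: {{expand_wildcard}} is a hypothetical helper, and the {{example-NN.jar}} names are placeholders standing in for the rest of the {{lib/}} directory, since the description names only two jars.

```python
CMDLINE_LIMIT = 4096  # byte limit cited in the issue description

# The lib/ directory from the example in the issue description.
LIB_DIR = (
    "/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12/"
    "frameworks/20160509-084241-1086985738-5050-32231-0000/executors/"
    "STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391/"
    "storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib"
)


def expand_wildcard(lib_dir, jar_names):
    """Hypothetical helper: rewrite `lib_dir/*` as an explicit colon-joined jar list."""
    return ":".join(lib_dir + "/" + name for name in sorted(jar_names))


if __name__ == "__main__":
    # Placeholder jar names (assumption); only the first two appear in the ticket.
    jars = ["asm-4.0.jar", "carbonite-1.4.0.jar"]
    jars += ["example-%02d.jar" % i for i in range(20)]

    wildcard = LIB_DIR + "/*"
    expanded = expand_wildcard(LIB_DIR, jars)

    # Every jar repeats the whole sandbox path, so the length grows linearly.
    print("wildcard form:", len(wildcard), "bytes")
    print("expanded form:", len(expanded), "bytes")
    print("over limit:", len(expanded) > CMDLINE_LIMIT)
```

Because each entry repeats the full sandbox path, a couple dozen jars are enough for the classpath argument alone to pass the limit, before the rest of the command is counted.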


