Posted to user@storm.apache.org by Matthew Sharpe <ma...@gmail.com> on 2015/04/13 16:22:45 UTC

Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Hi,

I've set up a small Storm cluster on AWS that is giving me headaches. At
the moment the cluster (a nimbus node and 3 supervisors) will successfully
run my topology in local mode. However, as soon as I try to run the same
topology in distributed mode there are no errors (that I can see) in any
of the logs - the topology appears in the UI but never shows any tuples
emitted by the spouts.

To check it wasn't my topology I cloned the storm-starter project and
tried the RollingTopWords example: running locally I get a stream of
output, but as soon as I submit it in distributed mode all I get is a
success message (topology successfully submitted). The topology then
appears in the UI but shows no progress.
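
For reference, the difference between the two runs boils down to the
submission call. A minimal sketch of the usual pattern (not the exact
storm-starter code - the class name SubmitSketch is mine, the bolt wiring
is trimmed, and TestWordSpout stands in for the real spout) against the
backtype.storm API in 0.9.x:

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.TopologyBuilder;

public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("wordGenerator", new TestWordSpout(), 5);
        // ... bolts omitted ...

        Config conf = new Config();

        if (args.length > 0) {
            // Distributed mode: the jar is shipped to nimbus and the workers
            // are launched on the supervisor nodes.
            conf.setNumWorkers(1);
            StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        } else {
            // Local mode: everything runs inside this one JVM, which is why it
            // can work even when the cluster itself is misconfigured.
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("rolling-top-words", conf, builder.createTopology());
            Thread.sleep(60 * 1000);
            cluster.shutdown();
        }
    }
}

The distributed branch only ever reports "successfully submitted" - whether
the spouts actually emit anything depends entirely on the workers coming up
on the supervisor nodes.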

It's a bit of a weird one: I don't know which logs to share to show the
problem, because the whole point is that I'm not seeing any errors!
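
(For reference, with the stock logback config that ships with the release
each daemon writes under the Storm home's logs directory - on this cluster
that would be something like:

tail -f /usr/local/storm-0.9.0-wip21/logs/nimbus.log       # on the nimbus node
tail -f /usr/local/storm-0.9.0-wip21/logs/supervisor.log   # on each supervisor
tail -f /usr/local/storm-0.9.0-wip21/logs/worker-6700.log  # one per worker slot port

so that's where everything quoted below comes from.)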

The last entry in the supervisor logs pertains to the setup of the
supervisor itself (i.e. it hasn't logged anything about the topologies
that have been submitted).

The last entries in the nimbus logs are probably a bit more illuminating:

2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
production-topology-2-1428932653:[2 2] not alive
2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
production-topology-2-1428932653:[3 3] not alive
2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
production-topology-2-1428932653:[4 4] not alive
2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
(["bd340179-5905-4233-ad5d-44b47b228177" 6702]
["bd340179-5905-4233-ad5d-44b47b228177" 6701]
["bd340179-5905-4233-ad5d-44b47b228177" 6700]
["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
production-topology-2-1428932653 to 1 slots
2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3 3] [4
4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
[15 15] [1 1]]
2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment for topology
id production-topology-2-1428932653:
#backtype.storm.daemon.common.Assignment{:master-code-dir
"/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
{"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"},
:executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 6702],
[3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
{[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
[14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}

If anybody can help that'd be greatly appreciated!

Thanks,

Matt

-- 
Check out my blog: dogdogfish.com

Re: Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Posted by Matthew Sharpe <ma...@gmail.com>.
I've (just about) solved this.

My problem was with reverse DNS lookup on my supervisor node - after
adding hostname-to-IP mappings in /etc/hosts I was able to move on and
got a different error message: https://issues.apache.org/jira/browse/STORM-130
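
The mappings are just one line per cluster node, in /etc/hosts on every
box (the names and private IPs below are placeholders - use whatever
hostname -f and the instance's IP actually report on each machine):

10.0.0.10   nimbus-host.example.internal      nimbus-host
10.0.0.11   supervisor-1.example.internal     supervisor-1
10.0.0.12   supervisor-2.example.internal     supervisor-2
10.0.0.13   supervisor-3.example.internal     supervisor-3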

I've updated my version of Storm (to 0.9.3), cleared the ZooKeeper and
local data directories, and am now able to successfully run the
RollingTopWords example on the cluster.
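
(By "cleared" I mean roughly the following, with all the Storm daemons
stopped first - /app/storm is my storm.local.dir and /storm is the default
storm.zookeeper.root; the rmr step assumes your zkCli has that command,
otherwise delete the child znodes individually:

rm -rf /app/storm/*                      # on nimbus and every supervisor

zkCli.sh -server <zookeeper-host>:2181   # from any box with the ZooKeeper CLI
rmr /storm
)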

Thanks for your help Nathan :)

Now to solve the latest problem...

On 14 April 2015 at 11:30, Matthew Sharpe <ma...@gmail.com> wrote:

> Additional bit at the end of the worker GC log:
>
> 2.551: [GC [PSYoungGen: 120832K->16195K(140800K)]
> 120832K->16203K(461824K), 0.0217550 secs] [Times: user=0.09 sys=0.01,
> real=0.02 secs]
> Heap
>  PSYoungGen      total 140800K, used 36644K [0x00000000f0000000,
> 0x00000000f9d00000, 0x0000000100000000)
>   eden space 120832K, 16% used
> [0x00000000f0000000,0x00000000f13f8428,0x00000000f7600000)
>   from space 19968K, 81% used
> [0x00000000f7600000,0x00000000f85d0c90,0x00000000f8980000)
>   to   space 19968K, 0% used
> [0x00000000f8980000,0x00000000f8980000,0x00000000f9d00000)
>  ParOldGen       total 321024K, used 8K [0x00000000d0000000,
> 0x00000000e3980000, 0x00000000f0000000)
>   object space 321024K, 0% used
> [0x00000000d0000000,0x00000000d0002000,0x00000000e3980000)
>  PSPermGen       total 22528K, used 22196K [0x00000000cae00000,
> 0x00000000cc400000, 0x00000000d0000000)
>   object space 22528K, 98% used
> [0x00000000cae00000,0x00000000cc3ad218,0x00000000cc400000)
>
> On 14 April 2015 at 11:15, Matthew Sharpe <ma...@gmail.com>
> wrote:
>
>> Thanks for all your help so far - I've set up a cluster with m3.2xlarge
>> EC2 instances, 30GB RAM and 8 vCPUs which I'm hoping should be enough. I've
>> added GC logging and have switched to using one worker per supervisor
>> (have gone from:
>> - 6700
>> - 6701
>> - 6702
>> - 6703
>>
>> to
>>
>> - 6700). I've also switched all of my Java space declarations to the
>> values used in here:
>> https://github.com/apache/storm/blob/master/conf/defaults.yaml
>> i.e. nimbus: 1024m, ui : 768m, supervisor: 256m, worker : 768m
>>
>> I'm still not getting any tuples emitted - the supervisor GC log:
>>
>> Java HotSpot(TM) 64-Bit Server VM (24.76-b04) for linux-amd64 JRE
>> (1.7.0_76-b13), built on Dec 18 2014 16:58:39 by "java_re" with gcc 4.3.0
>> 20080428 (Red Hat 4.3.0-8)
>> Memory: 4k page, physical 30824244k(27561384k free), swap 0k(0k free)
>> CommandLine flags: -XX:InitialHeapSize=268435456
>> -XX:MaxHeapSize=268435456 -XX:+PrintGC -XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps -XX:+UseCompressedOops -XX:+UseParallelGC
>> 2.058: [GC [PSYoungGen: 66048K->10199K(76800K)] 66048K->10207K(251392K),
>> 0.0158580 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
>> 3.010: [GC [PSYoungGen: 76247K->10746K(76800K)] 76255K->11131K(251392K),
>> 0.0171470 secs] [Times: user=0.07 sys=0.01, real=0.02 secs]
>> 264.012: [GC [PSYoungGen: 76794K->7879K(76800K)] 77179K->8272K(251392K),
>> 0.0114030 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]
>>
>> The worker GC log:
>>
>> Java HotSpot(TM) 64-Bit Server VM (24.76-b04) for linux-amd64 JRE
>> (1.7.0_76-b13), built on Dec 18 2014 16:58:39 by "java_re" with gcc 4.3.0
>> 20080428 (Red Hat 4.3.0-8)
>> Memory: 4k page, physical 30824244k(27356064k free), swap 0k(0k free)
>> CommandLine flags: -XX:InitialHeapSize=493187904
>> -XX:MaxHeapSize=805306368 -XX:+PrintGC -XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps -XX:+UseCompressedOops -XX:+UseParallelGC
>> 2.861: [GC [PSYoungGen: 120832K->16150K(140800K)]
>> 120832K->16158K(461824K), 0.0233930 secs] [Times: user=0.07 sys=0.03,
>> real=0.02 secs]
>>
>> The supervisor log:
>>
>> 2015-04-14 10:04:24 b.s.d.supervisor [INFO]
>> f5888c5a-d36f-4d6b-8b67-4cf20203efef still hasn't started
>> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shutting down and clearing
>> state for id f5888c5a-d36f-4d6b-8b67-4cf20203efef. Current supervisor time:
>> 1429005898. State: :timed-out, Heartbeat:
>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1429005865,
>> :storm-id "production-topology-1-1429005259", :executors #{[2 2] [3 3] [4
>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>> [15 15] [-1 -1] [1 1]}, :port 6700}
>> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shutting down
>> 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e:f5888c5a-d36f-4d6b-8b67-4cf20203efef
>> 2015-04-14 10:04:58 b.s.util [INFO] Error when trying to kill 30755.
>> Process is probably already dead.
>> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shut down
>> 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e:f5888c5a-d36f-4d6b-8b67-4cf20203efef
>> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Launching worker with
>> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>> "production-topology-1-1429005259", :executors ([2 2] [3 3] [4 4] [5 5] [6
>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
>> 1])} for this supervisor 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e on port 6700
>> with id e7cfe011-56a4-42d4-8331-4cb055055799
>> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Launching worker with
>> command: java -server -Xmx768m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>> -verbose:gc -Xloggc:/app/storm/gc-storm-worker-6700.log
>> -Djava.net.preferIPv4Stack=true
>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>> -Dlogfile.name=worker-6700.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>> -Dstorm.id=production-topology-1-1429005259
>> -Dworker.id=e7cfe011-56a4-42d4-8331-4cb055055799 -Dworker.port=6700 -cp
>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-1-1429005259/stormjar.jar
>> backtype.storm.daemon.worker production-topology-1-1429005259
>> 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e 6700
>> e7cfe011-56a4-42d4-8331-4cb055055799
>> 2015-04-14 10:04:58 b.s.d.supervisor [INFO]
>> e7cfe011-56a4-42d4-8331-4cb055055799 still hasn't started
>> 2015-04-14 10:04:59 b.s.d.supervisor [INFO]
>> e7cfe011-56a4-42d4-8331-4cb055055799 still hasn't started
>>
>> The worker log: https://gist.github.com/Kali89/2a3d3aea401f8d1bb195
>>
>> The nimbus log:
>>
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[2 2] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[3 3] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[4 4] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[5 5] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[6 6] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[7 7] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[8 8] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[9 9] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[10 10] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[11 11] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[12 12] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[13 13] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[14 14] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[15 15] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[1 1] not alive
>> 2015-04-14 10:11:27 b.s.s.EvenScheduler [INFO] Available slots:
>> (["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700]
>> ["5f1c9cea-3fcc-4d4e-aa9e-650d837e6a49" 6700])
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Reassigning
>> production-topology-1-1429005259 to 1 slots
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3 3]
>> [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>> [15 15] [1 1]]
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Setting new assignment for
>> topology id production-topology-1-1429005259:
>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>> "/app/storm/nimbus/stormdist/production-topology-1-1429005259", :node->host
>> {"eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" "
>> ec2-52-16-76-40.eu-west-1.compute.amazonaws.com"}, :executor->node+port
>> {[2 2] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [3 3]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [4 4]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [5 5]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [6 6]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [7 7]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [8 8]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [9 9]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [10 10]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [11 11]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [12 12]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [13 13]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [14 14]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [15 15]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [1 1]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700]}, :executor->start-time-secs
>> {[2 2] 1429006287, [3 3] 1429006287, [4 4] 1429006287, [5 5] 1429006287, [6
>> 6] 1429006287, [7 7] 1429006287, [8 8] 1429006287, [9 9] 1429006287, [10
>> 10] 1429006287, [11 11] 1429006287, [12 12] 1429006287, [13 13] 1429006287,
>> [14 14] 1429006287, [15 15] 1429006287, [1 1] 1429006287}}
>>
>> Sorry for the data dump! For what it's worth, I'm now wondering how the
>> supervisor and worker relationship works. With one 768MB worker per
>> supervisor (node) I'm well below the machine limit, but am I hitting
>> another sort of limit? Does the supervisor daemon on the supervisor node
>> have enough memory with 256MB? Is there a relationship between the worker
>> memory and the supervisor memory?
>>
>> Thanks!
>>
>>
>> On 13 April 2015 at 18:58, Nathan Leung <nc...@gmail.com> wrote:
>>
>>> No, the default is 4.  If you are running multiple workers on the same
>>> supervisor node, that could cause problems with 768 MB / worker and 1 GB /
>>> supervisor host.
>>>
>>> On Mon, Apr 13, 2015 at 1:55 PM, Matthew Sharpe <
>>> matthewsharpe3@gmail.com> wrote:
>>>
>>>> I can only think that I'll be using whatever the default is - that's
>>>> not a setting that I've changed anywhere or set on any command line. Am I
>>>> right in thinking (hoping) that that'll default to one? If not, how do I
>>>> set it for the RollingTopWords example?
>>>>
>>>> The cluster is being spun up now but will probably take half an hour or
>>>> so before it's ready. I'll reply once it's all up and good to go.
>>>>
>>>> On 13 April 2015 at 18:40, Nathan Leung <nc...@gmail.com> wrote:
>>>>
>>>>> That looks about right.  Are you running one worker per supervisor
>>>>> right now?
>>>>>
>>>>> On Mon, Apr 13, 2015 at 1:33 PM, Matthew Sharpe <
>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>
>>>>>> I was just thinking about that - I'm using EC2 m3.medium instances,
>>>>>> they're not especially RAM heavy (1GB). Am I right in thinking that to
>>>>>> enable GC logging I'd just need to run Storm with the following options in
>>>>>> my 'storm.yaml'?
>>>>>>
>>>>>> worker.childopts: "-Xmx768m -XX:MaxPermSize=256m -XX:+PrintGCDetails
>>>>>> -XX:+PrintGCTimeStamps -verbose:gc
>>>>>> -Xloggc:/app/storm/logs/gc-storm-worker-%ID%.log"
>>>>>> supervisor.childopts: "-XX:MaxPermSize=256m -XX:+PrintGCDetails
>>>>>> -XX:+PrintGCTimeStamps -verbose:gc
>>>>>> -Xloggc:/app/storm/logs/gc-storm-supervisor.log"
>>>>>> If so, I'll set up the same cluster but using m3.xlarge instances
>>>>>> (15GB of RAM) with those settings. Either way, I'll start setting up the
>>>>>> extra large nodes.
>>>>>>
>>>>>>
>>>>>> On 13 April 2015 at 18:10, Nathan Leung <nc...@gmail.com> wrote:
>>>>>>
>>>>>>> Do you have GC logging enabled?  You should have enough ram for each
>>>>>>> worker (for this example) that it's not a problem, but doesn't hurt to
>>>>>>> check.
>>>>>>>
>>>>>>> On Mon, Apr 13, 2015 at 12:12 PM, Matthew Sharpe <
>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>
>>>>>>>> Not long at all - certainly less than a couple of minutes but longer
>>>>>>>> than a few seconds; the couple I actually measured lasted 42 seconds
>>>>>>>> followed by about 5 seconds. So the heartbeats folder behaves like so...
>>>>>>>>
>>>>>>>> /app/storm/workers/[uuid] is created with an empty heartbeats
>>>>>>>> folder inside it
>>>>>>>> /app/storm/workers/[uuid]/heartbeats has two timestamps added (a
>>>>>>>> couple of examples:
>>>>>>>>
>>>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941026297.version
>>>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941026297
>>>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941027240.version
>>>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941027240
>>>>>>>>
>>>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941398235.version
>>>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941398235
>>>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941399163.version
>>>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941399163
>>>>>>>> )
>>>>>>>>
>>>>>>>> /app/storm/workers/[uuid] is deleted.
>>>>>>>>
>>>>>>>> I've not seen any heartbeats folder with any more than 2 heartbeats
>>>>>>>> in.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 13 April 2015 at 17:00, Nathan Leung <nc...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> It only matters what user supervisor is run as, so you should be
>>>>>>>>> ok.  How long is it between the worker being started, and it getting
>>>>>>>>> killed?  What are the timestamps of the heartbeats in the corresponding
>>>>>>>>> worker folder?
>>>>>>>>>
>>>>>>>>> On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe <
>>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I'm running storm as a (specially created) storm user - on both
>>>>>>>>>> the nimbus and the supervisors the process is showing as being run by the
>>>>>>>>>> storm user. And I've got the storm local directory set as /app/storm -
>>>>>>>>>> owned by storm user and with permissions 700. Thought it might be due to me
>>>>>>>>>> submitting the job as a different user but have just terminated the old job
>>>>>>>>>> and submitted the new one as the storm user and still not getting any
>>>>>>>>>> tuples coming through.
>>>>>>>>>>
>>>>>>>>>> For what it's worth, I've looked in the local data directory on
>>>>>>>>>> the supervisors and in the workers folder there is a (non-empty) heartbeats
>>>>>>>>>> folder being created - don't know if that's indicative of heartbeats being
>>>>>>>>>> processed correctly or if that's created at the start of a worker process.
>>>>>>>>>>
>>>>>>>>>> Is there anything else you want to see that would help you
>>>>>>>>>> diagnose this? I can share the UI with you if you think that'd help?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 13 April 2015 at 16:22, Nathan Leung <nc...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sounds like it's having problems with heartbeats.  As I recall
>>>>>>>>>>> worker -> supervisor heartbeats are written using the local filesystem.
>>>>>>>>>>> What user are you running as?  In your storm.yaml file, you will see the
>>>>>>>>>>> option "storm.local.dir".  Does the user supervisor / workers are running
>>>>>>>>>>> as have permissions to write to this folder?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <
>>>>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> New bonus bit of information - the supervisor log has got this
>>>>>>>>>>>> going on:
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
>>>>>>>>>>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
>>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and
>>>>>>>>>>>> clearing state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current
>>>>>>>>>>>> supervisor time: 1428938060. State: :timed-out, Heartbeat:
>>>>>>>>>>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
>>>>>>>>>>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] [4
>>>>>>>>>>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>>>>>>>>>>>> [15 15] [-1 -1] [1 1]}, :port 6703}
>>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
>>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>>>>>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill
>>>>>>>>>>>> 5284. Process is probably already dead.
>>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
>>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker
>>>>>>>>>>>> with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>>>>>>>>>>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 5] [6
>>>>>>>>>>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
>>>>>>>>>>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 6703
>>>>>>>>>>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker
>>>>>>>>>>>> with command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>>>>>>>>>>>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>>>>>> -Dlogfile.name=worker-6703.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
>>>>>>>>>>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>>>>>>>>>>>> -Dstorm.id=production-topology-2-1428932653
>>>>>>>>>>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp
>>>>>>>>>>>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
>>>>>>>>>>>> backtype.storm.daemon.worker production-topology-2-1428932653
>>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177 6703
>>>>>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
>>>>>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
>>>>>>>>>>>>
>>>>>>>>>>>> On 13 April 2015 at 16:09, Matthew Sharpe <
>>>>>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> OK, in checking the worker process logs I've uncovered a bit
>>>>>>>>>>>>> more information. The UI reports 0 emitted from the wordGenerator spout,
>>>>>>>>>>>>> along with 0 transferred, acked and failed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry - the *.compute.amazonaws.com is redacted (don't know
>>>>>>>>>>>>> the usual way of doing that), I've got a bunch of other stuff on the boxes
>>>>>>>>>>>>> and in troubleshooting things out have been a bit laxer with security than
>>>>>>>>>>>>> I should have been!
>>>>>>>>>>>>>
>>>>>>>>>>>>> The UI shows 5 workers all starting on the same node - each of
>>>>>>>>>>>>> which has a declared uptime of 1s. Looking at the logs on the node, during
>>>>>>>>>>>>> that time the following happens:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:java.io.tmpdir=/tmp
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:java.compiler=<NA>
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:os.name=Linux
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:os.arch=amd64
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:os.version=3.13.0-48-generic
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:user.name=storm
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:user.home=/app/home/storm
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:user.dir=/
>>>>>>>>>>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>>>>>>>>>>>>> production-topology-2-1428932653 on
>>>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>>>>>>>>>>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
>>>>>>>>>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>>>>>>>>>>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>>>>>>>>>>>> "topology.fall.back.on.java.serialization" true,
>>>>>>>>>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
>>>>>>>>>>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
>>>>>>>>>>>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
>>>>>>>>>>>>> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
>>>>>>>>>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>>>>>>>>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>>>>>>>>>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
>>>>>>>>>>>>> "supervisor.worker.start.timeout.secs" 120,
>>>>>>>>>>>>> "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
>>>>>>>>>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
>>>>>>>>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>>>>>>>>>>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>>>>>>>>>>>>> "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
>>>>>>>>>>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>>>>>>>>>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>>>>>>>>>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>>>>>>>>>>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.
>>>>>>>>>>>>> compute.amazonaws.com"], "transactional.zookeeper.root"
>>>>>>>>>>>>> "/transactional", "topology.acker.executors" nil,
>>>>>>>>>>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
>>>>>>>>>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>>>>>>>>>>>>> -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
>>>>>>>>>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
>>>>>>>>>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>>>>>>>>>>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
>>>>>>>>>>>>> "topology.tasks" nil, "topology.spout.wait.strategy"
>>>>>>>>>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
>>>>>>>>>>>>> nil, "storm.zookeeper.retry.interval" 1000, "
>>>>>>>>>>>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>>>>>>>>>>>> "nimbus.topology.validator"
>>>>>>>>>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
>>>>>>>>>>>>> [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
>>>>>>>>>>>>> 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
>>>>>>>>>>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>>>>>>>>>>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>>>>>>>>>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>>>>>>>>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>>>>>>>>>>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>>>>>>>>>>>> "topology.disruptor.wait.strategy"
>>>>>>>>>>>>> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
>>>>>>>>>>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>>>>>>>>>>>>> "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
>>>>>>>>>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
>>>>>>>>>>>>> 5, "storm.thrift.transport"
>>>>>>>>>>>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>>>>>>>>>>>> "topology.state.synchronization.timeout.secs" 60,
>>>>>>>>>>>>> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
>>>>>>>>>>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>>>>>>>>>>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
>>>>>>>>>>>>> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
>>>>>>>>>>>>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>>>>>>>>>>>>> "topology.max.task.parallelism" nil}
>>>>>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO]
>>>>>>>>>>>>> Starting
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>>>>>>>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@585bd6a6
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating
>>>>>>>>>>>>> session
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session
>>>>>>>>>>>>> establishment complete on server *.
>>>>>>>>>>>>> compute.amazonaws.com/*:2181, sessionid = 0x34cb217be05005e,
>>>>>>>>>>>>> negotiated timeout = 20000
>>>>>>>>>>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state
>>>>>>>>>>>>> update: :connected:none
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session:
>>>>>>>>>>>>> 0x34cb217be05005e closed
>>>>>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO]
>>>>>>>>>>>>> Starting
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm
>>>>>>>>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@4b492c47
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut
>>>>>>>>>>>>> down
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating
>>>>>>>>>>>>> session
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session
>>>>>>>>>>>>> establishment complete on server *.
>>>>>>>>>>>>> compute.amazonaws.com/*:2181, sessionid = 0x24cb217be030072,
>>>>>>>>>>>>> negotiated timeout = 20000
>>>>>>>>>>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer
>>>>>>>>>>>>> transport plugin:backtype.storm.messaging.zmq
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>>> counter:[2 2]
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor
>>>>>>>>>>>>> tasks wordGenerator:[11 11]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>>> executor wordGenerator:[11 11]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>>> wordGenerator:[12 12]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor
>>>>>>>>>>>>> tasks wordGenerator:[12 12]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>>> executor wordGenerator:[12 12]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>>> wordGenerator:[13 13]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor
>>>>>>>>>>>>> tasks wordGenerator:[13 13]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>>> executor wordGenerator:[13 13]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>>> wordGenerator:[14 14]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor
>>>>>>>>>>>>> tasks wordGenerator:[14 14]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>>> executor wordGenerator:[14 14]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [bertels]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [bertels]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [mike]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [mike]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [bertels]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [bertels]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>>> wordGenerator:[15 15]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor
>>>>>>>>>>>>> tasks wordGenerator:[15 15]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>>> executor wordGenerator:[15 15]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>>> __system:[-1 -1]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system
>>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor
>>>>>>>>>>>>> tasks __system:[-1 -1]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>>> executor __system:[-1 -1]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt
>>>>>>>>>>>>> __system:(-1)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>>> __acker:[1 1]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [jackson]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [mike]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [jackson]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:14, stream: default, id: {}, [mike]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [jackson]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [jackson]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [golda]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [golda]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt
>>>>>>>>>>>>> __system:(-1)
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker
>>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor
>>>>>>>>>>>>> tasks __acker:[1 1]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [jackson]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>>> executor __acker:[1 1]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching
>>>>>>>>>>>>> receive-thread for bd340179-5905-4233-ad5d-44b47b228177:6702
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt
>>>>>>>>>>>>> __acker:(1)
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt
>>>>>>>>>>>>> __acker:(1)
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [nathan]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [bertels]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [golda]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [golda]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [nathan]
>>>>>>>>>>>>>
>>>>>>>>>>>>> After that, the logs stop (no more messages) and the UI says
>>>>>>>>>>>>> that the 5 executors are now running on another node (and port) - at which
>>>>>>>>>>>>> point the worker log is created on that node and with that number (same
>>>>>>>>>>>>> messages).
>>>>>>>>>>>>>
>>>>>>>>>>>>> So the worker seems to start and begins emitting words, but the
>>>>>>>>>>>>> UI never registers anything as having been emitted. After a short period
>>>>>>>>>>>>> of time the worker seems to stop without logging anything, and so a new
>>>>>>>>>>>>> worker is started.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 13 April 2015 at 15:36, Nathan Leung <nc...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Did you check the worker process logs?  What are the emit
>>>>>>>>>>>>>> counts for the components in the UI?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also I have not run on AWS personally but the following line
>>>>>>>>>>>>>> seems strange to me:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>>>>>>>>> compute.amazonaws.com"}
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Generally this would be bound to a specific machine, not
>>>>>>>>>>>>>> *.<domain>.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>>>>>>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've set up a small Storm cluster on AWS that is giving me
>>>>>>>>>>>>>>> headaches. At the moment I've got a situation where the cluster (a nimbus
>>>>>>>>>>>>>>> node and 3 supervisors) will successfully run in local mode my particular
>>>>>>>>>>>>>>> topology. However, as soon as I try to run the topology in distributed mode
>>>>>>>>>>>>>>> there are no errors (that I can see) in any of the logs - the topology
>>>>>>>>>>>>>>> appears in the UI but never shows any tuples emitted by the spouts.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In order to check it wasn't my topology I cloned the
>>>>>>>>>>>>>>> storm-starter project and have tried with the RollingTopWords example -
>>>>>>>>>>>>>>> running locally I get a stream of output but as soon as I specify that we
>>>>>>>>>>>>>>> should run in a distributed fashion I get a success message (topology
>>>>>>>>>>>>>>> successfully submitted). The topology appears on the UI as successfully
>>>>>>>>>>>>>>> submitted but I see no progress.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It's a bit of a weird one as I don't know which logs to
>>>>>>>>>>>>>>> share to show the problem as the whole point is I'm not seeing any errors!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The last entry in the supervisor logs pertains to the set up
>>>>>>>>>>>>>>> of the supervisor (i.e. it's not logged anything about the topologies that
>>>>>>>>>>>>>>> have been submitted)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The last entries in the nimbus logs are probably a bit more
>>>>>>>>>>>>>>> illuminating:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>>>>> production-topology-2-1428932653:[2 2] not alive
>>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>>>>> production-topology-2-1428932653:[3 3] not alive
>>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>>>>> production-topology-2-1428932653:[4 4] not alive
>>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available
>>>>>>>>>>>>>>> slots: (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>>>>>>>>>>>>> production-topology-2-1428932653 to 1 slots
>>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors:
>>>>>>>>>>>>>>> [[2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12]
>>>>>>>>>>>>>>> [13 13] [14 14] [15 15] [1 1]]
>>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new
>>>>>>>>>>>>>>> assignment for topology id production-topology-2-1428932653:
>>>>>>>>>>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>>>>>>>>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
>>>>>>>>>>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>>>>>>>>>> compute.amazonaws.com"}, :executor->node+port {[2 2]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [3 3]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
>>>>>>>>>>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
>>>>>>>>>>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>>>>>>>>>>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
>>>>>>>>>>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If anybody can help that'd be greatly appreciated!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Check out my blog: dogdogfish.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Check out my blog: dogdogfish.com
>>>>
>>>
>>>
>>
>>
>> --
>> Check out my blog: dogdogfish.com
>>
>
>
>
> --
> Check out my blog: dogdogfish.com
>



-- 
Check out my blog: dogdogfish.com

Re: Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Posted by Matthew Sharpe <ma...@gmail.com>.
Additional bit at the end of the worker GC log:

2.551: [GC [PSYoungGen: 120832K->16195K(140800K)] 120832K->16203K(461824K),
0.0217550 secs] [Times: user=0.09 sys=0.01, real=0.02 secs]
Heap
 PSYoungGen      total 140800K, used 36644K [0x00000000f0000000,
0x00000000f9d00000, 0x0000000100000000)
  eden space 120832K, 16% used
[0x00000000f0000000,0x00000000f13f8428,0x00000000f7600000)
  from space 19968K, 81% used
[0x00000000f7600000,0x00000000f85d0c90,0x00000000f8980000)
  to   space 19968K, 0% used
[0x00000000f8980000,0x00000000f8980000,0x00000000f9d00000)
 ParOldGen       total 321024K, used 8K [0x00000000d0000000,
0x00000000e3980000, 0x00000000f0000000)
  object space 321024K, 0% used
[0x00000000d0000000,0x00000000d0002000,0x00000000e3980000)
 PSPermGen       total 22528K, used 22196K [0x00000000cae00000,
0x00000000cc400000, 0x00000000d0000000)
  object space 22528K, 98% used
[0x00000000cae00000,0x00000000cc3ad218,0x00000000cc400000)

On 14 April 2015 at 11:15, Matthew Sharpe <ma...@gmail.com> wrote:

> Thanks for all your help so far - I've set up a cluster with m3.2xlarge
> EC2 instances, 30GB RAM and 8 vCPUs which I'm hoping should be enough. I've
> added GC logging and have switched to using one worker per supervisor
> (have gone from:
> - 6700
> - 6701
> - 6702
> - 6703
>
> to
>
> - 6700). I've also switched all of my Java space declarations to the
> values used in here:
> https://github.com/apache/storm/blob/master/conf/defaults.yaml
> i.e. nimbus: 1024m, ui : 768m, supervisor: 256m, worker : 768m
>
> I'm still not getting any tuples emitted - the supervisor GC log:
>
> Java HotSpot(TM) 64-Bit Server VM (24.76-b04) for linux-amd64 JRE
> (1.7.0_76-b13), built on Dec 18 2014 16:58:39 by "java_re" with gcc 4.3.0
> 20080428 (Red Hat 4.3.0-8)
> Memory: 4k page, physical 30824244k(27561384k free), swap 0k(0k free)
> CommandLine flags: -XX:InitialHeapSize=268435456 -XX:MaxHeapSize=268435456
> -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> -XX:+UseCompressedOops -XX:+UseParallelGC
> 2.058: [GC [PSYoungGen: 66048K->10199K(76800K)] 66048K->10207K(251392K),
> 0.0158580 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
> 3.010: [GC [PSYoungGen: 76247K->10746K(76800K)] 76255K->11131K(251392K),
> 0.0171470 secs] [Times: user=0.07 sys=0.01, real=0.02 secs]
> 264.012: [GC [PSYoungGen: 76794K->7879K(76800K)] 77179K->8272K(251392K),
> 0.0114030 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]
>
> The worker GC log:
>
> Java HotSpot(TM) 64-Bit Server VM (24.76-b04) for linux-amd64 JRE
> (1.7.0_76-b13), built on Dec 18 2014 16:58:39 by "java_re" with gcc 4.3.0
> 20080428 (Red Hat 4.3.0-8)
> Memory: 4k page, physical 30824244k(27356064k free), swap 0k(0k free)
> CommandLine flags: -XX:InitialHeapSize=493187904 -XX:MaxHeapSize=805306368
> -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> -XX:+UseCompressedOops -XX:+UseParallelGC
> 2.861: [GC [PSYoungGen: 120832K->16150K(140800K)]
> 120832K->16158K(461824K), 0.0233930 secs] [Times: user=0.07 sys=0.03,
> real=0.02 secs]
>
> The supervisor log:
>
> 2015-04-14 10:04:24 b.s.d.supervisor [INFO]
> f5888c5a-d36f-4d6b-8b67-4cf20203efef still hasn't started
> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shutting down and clearing
> state for id f5888c5a-d36f-4d6b-8b67-4cf20203efef. Current supervisor time:
> 1429005898. State: :timed-out, Heartbeat:
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1429005865,
> :storm-id "production-topology-1-1429005259", :executors #{[2 2] [3 3] [4
> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
> [15 15] [-1 -1] [1 1]}, :port 6700}
> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shutting down
> 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e:f5888c5a-d36f-4d6b-8b67-4cf20203efef
> 2015-04-14 10:04:58 b.s.util [INFO] Error when trying to kill 30755.
> Process is probably already dead.
> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shut down
> 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e:f5888c5a-d36f-4d6b-8b67-4cf20203efef
> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Launching worker with
> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
> "production-topology-1-1429005259", :executors ([2 2] [3 3] [4 4] [5 5] [6
> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
> 1])} for this supervisor 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e on port 6700
> with id e7cfe011-56a4-42d4-8331-4cb055055799
> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Launching worker with command:
> java -server -Xmx768m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> -verbose:gc -Xloggc:/app/storm/gc-storm-worker-6700.log
> -Djava.net.preferIPv4Stack=true
>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
> -Dlogfile.name=worker-6700.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
> -Dstorm.id=production-topology-1-1429005259
> -Dworker.id=e7cfe011-56a4-42d4-8331-4cb055055799 -Dworker.port=6700 -cp
> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-1-1429005259/stormjar.jar
> backtype.storm.daemon.worker production-topology-1-1429005259
> 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e 6700
> e7cfe011-56a4-42d4-8331-4cb055055799
> 2015-04-14 10:04:58 b.s.d.supervisor [INFO]
> e7cfe011-56a4-42d4-8331-4cb055055799 still hasn't started
> 2015-04-14 10:04:59 b.s.d.supervisor [INFO]
> e7cfe011-56a4-42d4-8331-4cb055055799 still hasn't started
>
> The worker log: https://gist.github.com/Kali89/2a3d3aea401f8d1bb195
>
> The nimbus log:
>
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
> production-topology-1-1429005259:[2 2] not alive
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
> production-topology-1-1429005259:[3 3] not alive
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
> production-topology-1-1429005259:[4 4] not alive
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
> production-topology-1-1429005259:[5 5] not alive
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
> production-topology-1-1429005259:[6 6] not alive
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
> production-topology-1-1429005259:[7 7] not alive
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
> production-topology-1-1429005259:[8 8] not alive
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
> production-topology-1-1429005259:[9 9] not alive
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
> production-topology-1-1429005259:[10 10] not alive
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
> production-topology-1-1429005259:[11 11] not alive
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
> production-topology-1-1429005259:[12 12] not alive
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
> production-topology-1-1429005259:[13 13] not alive
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
> production-topology-1-1429005259:[14 14] not alive
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
> production-topology-1-1429005259:[15 15] not alive
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
> production-topology-1-1429005259:[1 1] not alive
> 2015-04-14 10:11:27 b.s.s.EvenScheduler [INFO] Available slots:
> (["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700]
> ["5f1c9cea-3fcc-4d4e-aa9e-650d837e6a49" 6700])
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Reassigning
> production-topology-1-1429005259 to 1 slots
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3 3]
> [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
> [15 15] [1 1]]
> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Setting new assignment for
> topology id production-topology-1-1429005259:
> #backtype.storm.daemon.common.Assignment{:master-code-dir
> "/app/storm/nimbus/stormdist/production-topology-1-1429005259", :node->host
> {"eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" "
> ec2-52-16-76-40.eu-west-1.compute.amazonaws.com"}, :executor->node+port
> {[2 2] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [3 3]
> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [4 4]
> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [5 5]
> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [6 6]
> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [7 7]
> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [8 8]
> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [9 9]
> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [10 10]
> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [11 11]
> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [12 12]
> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [13 13]
> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [14 14]
> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [15 15]
> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [1 1]
> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700]}, :executor->start-time-secs
> {[2 2] 1429006287, [3 3] 1429006287, [4 4] 1429006287, [5 5] 1429006287, [6
> 6] 1429006287, [7 7] 1429006287, [8 8] 1429006287, [9 9] 1429006287, [10
> 10] 1429006287, [11 11] 1429006287, [12 12] 1429006287, [13 13] 1429006287,
> [14 14] 1429006287, [15 15] 1429006287, [1 1] 1429006287}}
>
> Sorry for the data dump! For what it's worth - I'm now wondering how the
> supervisor and worker relationship works. With one 768MB worker per
> supervisor (node) I'm well below the machine limit, but am I hitting another
> sort of limit? Does the supervisor daemon on the supervisor node have
> enough memory with 256MB? Is there a relationship between the worker memory
> and the supervisor memory?
>
> Thanks!
>
>
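On the memory question in the message above: the supervisor daemon and each
worker run as separate JVMs, each sized only by its own childopts - the launch
command in the supervisor log shows the worker getting its own -Xmx768m,
independent of the 256m supervisor heap. As a minimal sketch (assuming the
Config.TOPOLOGY_WORKER_CHILDOPTS key exposed by this Storm version; the class
name below is just a placeholder), the worker JVM flags can also be overridden
per topology at submit time:

import backtype.storm.Config;

public class WorkerOptsSketch {
    public static void main(String[] args) {
        Config conf = new Config();
        conf.setNumWorkers(1);
        // Each worker JVM gets these flags; when the key is unset (it shows as
        // "topology.worker.childopts" nil in the config dump further down this
        // thread), workers fall back to worker.childopts from storm.yaml.
        conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx768m");
        // ... then submit as usual with StormSubmitter.submitTopology(...)
    }
}
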
> On 13 April 2015 at 18:58, Nathan Leung <nc...@gmail.com> wrote:
>
>> No, the default is 4.  If you are running multiple workers on the same
>> supervisor node, that could cause problems with 768 MB / worker and 1 GB /
>> supervisor host.
>>
>> On Mon, Apr 13, 2015 at 1:55 PM, Matthew Sharpe <matthewsharpe3@gmail.com
>> > wrote:
>>
>>> I can only think that I'll be using whatever the default is - that's not
>>> a setting that I've changed anywhere or set on any command line. Am I right
>>> in thinking (hoping) that that'll default to one? If not, how do I set it
>>> for the RollingTopWords example?
>>>
>>> The cluster is being spun up now but will probably take half an hour or
>>> so before it's ready. I'll reply once it's all up and good to go.
>>>
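On the question above about setting the worker count: it is normally set on
the topology Config before submitting. A minimal sketch, assuming a
storm-starter style main method that submits with StormSubmitter and uses
backtype.storm.testing.TestWordSpout (which appears to be the spout behind the
"wordGenerator" emits like [nathan] and [bertels] in the logs further down
this thread); the class and topology names are placeholders:

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.TopologyBuilder;

public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // Same spout id the worker logs show emitting words
        builder.setSpout("wordGenerator", new TestWordSpout(), 5);
        // ... counting/ranking bolts as in RollingTopWords ...

        Config conf = new Config();
        // topology.workers - one worker process for the whole topology
        conf.setNumWorkers(1);

        StormSubmitter.submitTopology("production-topology", conf,
                builder.createTopology());
    }
}
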
>>> On 13 April 2015 at 18:40, Nathan Leung <nc...@gmail.com> wrote:
>>>
>>>> That looks about right.  Are you running one worker per supervisor
>>>> right now?
>>>>
>>>> On Mon, Apr 13, 2015 at 1:33 PM, Matthew Sharpe <
>>>> matthewsharpe3@gmail.com> wrote:
>>>>
>>>>> I was just thinking about that - I'm using EC2 m3.medium instances;
>>>>> they're not especially RAM heavy (1GB). Am I right in thinking that to
>>>>> enable GC logging I'd just need to run Storm with the following options in
>>>>> my 'storm.yaml'?
>>>>>
>>>>> worker.childopts: "-Xmx768m -XX:MaxPermSize=256m -XX:+PrintGCDetails
>>>>> -XX:+PrintGCTimeStamps -verbose:gc
>>>>> -Xloggc:/app/storm/logs/gc-storm-worker-%ID%.log"
>>>>> supervisor.childopts: "-XX:MaxPermSize=256m -XX:+PrintGCDetails
>>>>> -XX:+PrintGCTimeStamps -verbose:gc
>>>>> -Xloggc:/app/storm/logs/gc-storm-supervisor.log"
>>>>> If so, I'll set up the same cluster but using m3.xlarge instances
>>>>> (15GB of RAM) with those settings. Either way, I'll start setting up the
>>>>> extra large nodes.
>>>>>
>>>>>
>>>>> On 13 April 2015 at 18:10, Nathan Leung <nc...@gmail.com> wrote:
>>>>>
>>>>>> Do you have GC logging enabled?  You should have enough ram for each
>>>>>> worker (for this example) that it's not a problem, but doesn't hurt to
>>>>>> check.
>>>>>>
>>>>>> On Mon, Apr 13, 2015 at 12:12 PM, Matthew Sharpe <
>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>
>>>>>>> Not long at all - certainly less than a couple of minutes but longer
>>>>>>> than a few seconds; the couple I actually measured lasted 42 seconds
>>>>>>> and about 5 seconds. So the heartbeats folder behaves like so...
>>>>>>>
>>>>>>> /app/storm/workers/[uuid] is created with an empty heartbeats folder
>>>>>>> inside it
>>>>>>> /app/storm/workers/[uuid]/heartbeats has two timestamps added (a
>>>>>>> couple of examples:
>>>>>>>
>>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941026297.version
>>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941026297
>>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941027240.version
>>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941027240
>>>>>>>
>>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941398235.version
>>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941398235
>>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941399163.version
>>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941399163
>>>>>>> )
>>>>>>>
>>>>>>> /app/storm/workers/[uuid] is deleted.
>>>>>>>
>>>>>>> I've not seen any heartbeats folder with any more than 2 heartbeats
>>>>>>> in it.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 13 April 2015 at 17:00, Nathan Leung <nc...@gmail.com> wrote:
>>>>>>>
>>>>>>>> It only matters what user the supervisor is run as, so you should be
>>>>>>>> ok.  How long is it between the worker being started, and it getting
>>>>>>>> killed?  What are the timestamps of the heartbeats in the corresponding
>>>>>>>> worker folder?
>>>>>>>>
>>>>>>>> On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe <
>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I'm running storm as a (specially created) storm user - on both
>>>>>>>>> the nimbus and the supervisors the process is showing as being run by the
>>>>>>>>> storm user. And I've got the storm local directory set as /app/storm -
>>>>>>>>> owned by the storm user and with permissions 700. I thought it might be due
>>>>>>>>> to me submitting the job as a different user, but I've just terminated the old
>>>>>>>>> job and submitted the new one as the storm user and am still not getting any
>>>>>>>>> tuples coming through.
>>>>>>>>>
>>>>>>>>> For what it's worth, I've looked in the local data directory on
>>>>>>>>> the supervisors and in the workers folder there is a (non-empty) heartbeats
>>>>>>>>> folder being created - don't know if that's indicative of heartbeats being
>>>>>>>>> processed correctly or if that's created at the start of a worker process.
>>>>>>>>>
>>>>>>>>> Is there anything else you want to see that would help you
>>>>>>>>> diagnose this? I can share the UI with you if you think that'd help?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 13 April 2015 at 16:22, Nathan Leung <nc...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Sounds like it's having problems with heartbeats.  As I recall,
>>>>>>>>>> worker -> supervisor heartbeats are written using the local filesystem.
>>>>>>>>>> What user are you running as?  In your storm.yaml file, you will see the
>>>>>>>>>> option "storm.local.dir".  Does the user that the supervisor / workers are
>>>>>>>>>> running as have permission to write to this folder?
>>>>>>>>>>
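A quick way to test the permission question above - a throwaway sketch,
assuming it is run on a supervisor node as the same user the supervisor
daemon runs as, against the storm.local.dir used in this thread (/app/storm):

import java.io.File;
import java.io.IOException;

public class LocalDirWriteCheck {
    public static void main(String[] args) throws IOException {
        // storm.local.dir from storm.yaml; /app/storm is the value used here
        File localDir = new File(args.length > 0 ? args[0] : "/app/storm");
        File probe = File.createTempFile("write-probe", ".tmp", localDir);
        System.out.println("Created " + probe + " as user "
                + System.getProperty("user.name"));
        probe.deleteOnExit();
    }
}

If that user cannot create files there, createTempFile throws an IOException
straight away, which would line up with the heartbeat time-outs being
discussed here.
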
>>>>>>>>>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <
>>>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> New bonus bit of information - the supervisor log has got this
>>>>>>>>>>> going on:
>>>>>>>>>>>
>>>>>>>>>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
>>>>>>>>>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and
>>>>>>>>>>> clearing state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current
>>>>>>>>>>> supervisor time: 1428938060. State: :timed-out, Heartbeat:
>>>>>>>>>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
>>>>>>>>>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] [4
>>>>>>>>>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>>>>>>>>>>> [15 15] [-1 -1] [1 1]}, :port 6703}
>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>>>>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill
>>>>>>>>>>> 5284. Process is probably already dead.
>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker
>>>>>>>>>>> with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>>>>>>>>>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 5] [6
>>>>>>>>>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
>>>>>>>>>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 6703
>>>>>>>>>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker
>>>>>>>>>>> with command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>>>>>>>>>>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>>>>> -Dlogfile.name=worker-6703.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
>>>>>>>>>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>>>>>>>>>>> -Dstorm.id=production-topology-2-1428932653
>>>>>>>>>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp
>>>>>>>>>>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
>>>>>>>>>>> backtype.storm.daemon.worker production-topology-2-1428932653
>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177 6703
>>>>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
>>>>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
>>>>>>>>>>>
>>>>>>>>>>> On 13 April 2015 at 16:09, Matthew Sharpe <
>>>>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> OK, in checking the worker process logs I've uncovered a bit
>>>>>>>>>>>> more information. The UI reports 0 emitted from the wordGenerator spout,
>>>>>>>>>>>> along with 0 transferred, acked and failed.
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry - the *.compute.amazonaws.com is redacted (I don't know
>>>>>>>>>>>> the usual way of doing that). I've got a bunch of other stuff on the boxes,
>>>>>>>>>>>> and in troubleshooting things I have been a bit laxer with security than
>>>>>>>>>>>> I should have been!
>>>>>>>>>>>>
>>>>>>>>>>>> The UI shows 5 workers all starting on the same node - each of
>>>>>>>>>>>> which has a declared uptime of 1s. Looking at the logs on the node, during
>>>>>>>>>>>> that time the following happens:
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>> environment:java.io.tmpdir=/tmp
>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>> environment:java.compiler=<NA>
>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>> environment:os.name=Linux
>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>> environment:os.arch=amd64
>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>> environment:os.version=3.13.0-48-generic
>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>> environment:user.name=storm
>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>> environment:user.home=/app/home/storm
>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>> environment:user.dir=/
>>>>>>>>>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>>>>>>>>>>>> production-topology-2-1428932653 on
>>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>>>>>>>>>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
>>>>>>>>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>>>>>>>>>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>>>>>>>>>>> "topology.fall.back.on.java.serialization" true,
>>>>>>>>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
>>>>>>>>>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
>>>>>>>>>>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
>>>>>>>>>>>> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
>>>>>>>>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>>>>>>>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>>>>>>>>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
>>>>>>>>>>>> "supervisor.worker.start.timeout.secs" 120,
>>>>>>>>>>>> "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
>>>>>>>>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
>>>>>>>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>>>>>>>>>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>>>>>>>>>>>> "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
>>>>>>>>>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>>>>>>>>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>>>>>>>>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>>>>>>>>>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.
>>>>>>>>>>>> compute.amazonaws.com"], "transactional.zookeeper.root"
>>>>>>>>>>>> "/transactional", "topology.acker.executors" nil,
>>>>>>>>>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
>>>>>>>>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>>>>>>>>>>>> -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
>>>>>>>>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
>>>>>>>>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>>>>>>>>>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
>>>>>>>>>>>> "topology.tasks" nil, "topology.spout.wait.strategy"
>>>>>>>>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
>>>>>>>>>>>> nil, "storm.zookeeper.retry.interval" 1000, "
>>>>>>>>>>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>>>>>>>>>>> "nimbus.topology.validator"
>>>>>>>>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
>>>>>>>>>>>> [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
>>>>>>>>>>>> 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
>>>>>>>>>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>>>>>>>>>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>>>>>>>>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>>>>>>>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>>>>>>>>>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>>>>>>>>>>> "topology.disruptor.wait.strategy"
>>>>>>>>>>>> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
>>>>>>>>>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>>>>>>>>>>>> "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
>>>>>>>>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
>>>>>>>>>>>> 5, "storm.thrift.transport"
>>>>>>>>>>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>>>>>>>>>>> "topology.state.synchronization.timeout.secs" 60,
>>>>>>>>>>>> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
>>>>>>>>>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>>>>>>>>>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
>>>>>>>>>>>> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
>>>>>>>>>>>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>>>>>>>>>>>> "topology.max.task.parallelism" nil}
>>>>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO]
>>>>>>>>>>>> Starting
>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>>>>>>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@585bd6a6
>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating
>>>>>>>>>>>> session
>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session
>>>>>>>>>>>> establishment complete on server *.compute.amazonaws.com/*:2181,
>>>>>>>>>>>> sessionid = 0x34cb217be05005e, negotiated timeout = 20000
>>>>>>>>>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state
>>>>>>>>>>>> update: :connected:none
>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session:
>>>>>>>>>>>> 0x34cb217be05005e closed
>>>>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO]
>>>>>>>>>>>> Starting
>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm
>>>>>>>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@4b492c47
>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut
>>>>>>>>>>>> down
>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating
>>>>>>>>>>>> session
>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session
>>>>>>>>>>>> establishment complete on server *.compute.amazonaws.com/*:2181,
>>>>>>>>>>>> sessionid = 0x24cb217be030072, negotiated timeout = 20000
>>>>>>>>>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer
>>>>>>>>>>>> transport plugin:backtype.storm.messaging.zmq
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>> counter:[2 2]
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>>> wordGenerator:[11 11]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>> executor wordGenerator:[11 11]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>> wordGenerator:[12 12]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>>> wordGenerator:[12 12]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>> executor wordGenerator:[12 12]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>> wordGenerator:[13 13]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>>> wordGenerator:[13 13]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>> executor wordGenerator:[13 13]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>> wordGenerator:[14 14]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>>> wordGenerator:[14 14]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>> executor wordGenerator:[14 14]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>> default [bertels]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [bertels]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>> default [mike]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [mike]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>> default [bertels]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [bertels]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>> wordGenerator:[15 15]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>>> wordGenerator:[15 15]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>> executor wordGenerator:[15 15]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>> __system:[-1 -1]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system
>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>>> __system:[-1 -1]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>> executor __system:[-1 -1]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt
>>>>>>>>>>>> __system:(-1)
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>> __acker:[1 1]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>> default [jackson]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>> default [mike]
>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [jackson]
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>> message source: wordGenerator:14, stream: default, id: {}, [mike]
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>> default [jackson]
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [jackson]
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>> default [golda]
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [golda]
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt
>>>>>>>>>>>> __system:(-1)
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker
>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>>> __acker:[1 1]
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>> default [jackson]
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>> executor __acker:[1 1]
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching
>>>>>>>>>>>> receive-thread for bd340179-5905-4233-ad5d-44b47b228177:6702
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt
>>>>>>>>>>>> __acker:(1)
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt
>>>>>>>>>>>> __acker:(1)
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>> default [nathan]
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>> default [bertels]
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>> default [golda]
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [golda]
>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [nathan]
>>>>>>>>>>>>
>>>>>>>>>>>> After that, the logs stop (no more messages) and the UI says
>>>>>>>>>>>> that the 5 executors are now running on another node (and port) - at which
>>>>>>>>>>>> point the worker log is created on that node for that port (same
>>>>>>>>>>>> messages).
>>>>>>>>>>>>
>>>>>>>>>>>> So the worker seems to start and begin emitting words, but the
>>>>>>>>>>>> UI isn't registering anything as having been emitted. After a short period
>>>>>>>>>>>> of time the worker seems to stop without logging anything, and so a new
>>>>>>>>>>>> worker is started.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Matt
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 13 April 2015 at 15:36, Nathan Leung <nc...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Did you check the worker process logs?  What are the emit
>>>>>>>>>>>>> counts for the components in the UI?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also I have not run on AWS personally but the following line
>>>>>>>>>>>>> seems strange to me:
>>>>>>>>>>>>>
>>>>>>>>>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>>>>>>>> compute.amazonaws.com"}
>>>>>>>>>>>>>
>>>>>>>>>>>>> Generally this would be bound to a specific machine, not
>>>>>>>>>>>>> *.<domain>.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>>>>>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've set up a small Storm cluster on AWS that is giving me
>>>>>>>>>>>>>> headaches. At the moment I've got a situation where the cluster (a nimbus
>>>>>>>>>>>>>> node and 3 supervisors) will successfully run in local mode my particular
>>>>>>>>>>>>>> topology. However, as soon as I try to run the topology in distributed mode
>>>>>>>>>>>>>> there are no errors (that I can see) in any of the logs - the topology
>>>>>>>>>>>>>> appears in the UI but never shows any tuples emitted by the spouts.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In order to check it wasn't my topology I cloned the
>>>>>>>>>>>>>> storm-starter project and have tried with the RollingTopWords example -
>>>>>>>>>>>>>> running locally I get a stream of output but as soon as I specify that we
>>>>>>>>>>>>>> should run in a distributed fashion I get a success message (topology
>>>>>>>>>>>>>> successfully submitted). The topology appears on the UI as successfully
>>>>>>>>>>>>>> submitted but I see no progress.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It's a bit of a weird one as I don't know which logs to share
>>>>>>>>>>>>>> to show the problem as the whole point is I'm not seeing any errors!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The last entry in the supervisor logs pertains to the set up
>>>>>>>>>>>>>> of the supervisor (i.e. it's not logged anything about the topologies that
>>>>>>>>>>>>>> have been submitted)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The last entries in the nimbus logs are probably a bit more
>>>>>>>>>>>>>> illuminating:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>>>> production-topology-2-1428932653:[2 2] not alive
>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>>>> production-topology-2-1428932653:[3 3] not alive
>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>>>> production-topology-2-1428932653:[4 4] not alive
>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available
>>>>>>>>>>>>>> slots: (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>>>>>>>>>>>> production-topology-2-1428932653 to 1 slots
>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors:
>>>>>>>>>>>>>> [[2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12]
>>>>>>>>>>>>>> [13 13] [14 14] [15 15] [1 1]]
>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new
>>>>>>>>>>>>>> assignment for topology id production-topology-2-1428932653:
>>>>>>>>>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>>>>>>>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
>>>>>>>>>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>>>>>>>>> compute.amazonaws.com"}, :executor->node+port {[2 2]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [3 3]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
>>>>>>>>>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
>>>>>>>>>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>>>>>>>>>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
>>>>>>>>>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If anybody can help that'd be greatly appreciated!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Check out my blog: dogdogfish.com
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Check out my blog: dogdogfish.com
>>>
>>
>>
>
>
> --
> Check out my blog: dogdogfish.com
>



-- 
Check out my blog: dogdogfish.com

Re: Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Posted by Matthew Sharpe <ma...@gmail.com>.
Thanks for all your help so far - I've set up a cluster with m3.2xlarge EC2
instances, 30GB RAM and 8 vCPUs which I'm hoping should be enough. I've
added GC logging and have switched to using one worker per supervisor
(have gone from:
- 6700
- 6701
- 6702
- 6703

to

- 6700). I've also switched all of my Java space declarations to the values
used in here: https://github.com/apache/storm/blob/master/conf/defaults.yaml
i.e. nimbus: 1024m, ui : 768m, supervisor: 256m, worker : 768m

I'm still not getting any tuples emitted - the supervisor GC log:

Java HotSpot(TM) 64-Bit Server VM (24.76-b04) for linux-amd64 JRE
(1.7.0_76-b13), built on Dec 18 2014 16:58:39 by "java_re" with gcc 4.3.0
20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 30824244k(27561384k free), swap 0k(0k free)
CommandLine flags: -XX:InitialHeapSize=268435456 -XX:MaxHeapSize=268435456
-XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+UseCompressedOops -XX:+UseParallelGC
2.058: [GC [PSYoungGen: 66048K->10199K(76800K)] 66048K->10207K(251392K),
0.0158580 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
3.010: [GC [PSYoungGen: 76247K->10746K(76800K)] 76255K->11131K(251392K),
0.0171470 secs] [Times: user=0.07 sys=0.01, real=0.02 secs]
264.012: [GC [PSYoungGen: 76794K->7879K(76800K)] 77179K->8272K(251392K),
0.0114030 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]

The worker GC log:

Java HotSpot(TM) 64-Bit Server VM (24.76-b04) for linux-amd64 JRE
(1.7.0_76-b13), built on Dec 18 2014 16:58:39 by "java_re" with gcc 4.3.0
20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 30824244k(27356064k free), swap 0k(0k free)
CommandLine flags: -XX:InitialHeapSize=493187904 -XX:MaxHeapSize=805306368
-XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+UseCompressedOops -XX:+UseParallelGC
2.861: [GC [PSYoungGen: 120832K->16150K(140800K)] 120832K->16158K(461824K),
0.0233930 secs] [Times: user=0.07 sys=0.03, real=0.02 secs]

The supervisor log:

2015-04-14 10:04:24 b.s.d.supervisor [INFO]
f5888c5a-d36f-4d6b-8b67-4cf20203efef still hasn't started
2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shutting down and clearing
state for id f5888c5a-d36f-4d6b-8b67-4cf20203efef. Current supervisor time:
1429005898. State: :timed-out, Heartbeat:
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1429005865,
:storm-id "production-topology-1-1429005259", :executors #{[2 2] [3 3] [4
4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
[15 15] [-1 -1] [1 1]}, :port 6700}
2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shutting down
5a1f1d44-4d4f-4309-9ed3-d753da1dda9e:f5888c5a-d36f-4d6b-8b67-4cf20203efef
2015-04-14 10:04:58 b.s.util [INFO] Error when trying to kill 30755.
Process is probably already dead.
2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shut down
5a1f1d44-4d4f-4309-9ed3-d753da1dda9e:f5888c5a-d36f-4d6b-8b67-4cf20203efef
2015-04-14 10:04:58 b.s.d.supervisor [INFO] Launching worker with
assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
"production-topology-1-1429005259", :executors ([2 2] [3 3] [4 4] [5 5] [6
6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
1])} for this supervisor 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e on port 6700
with id e7cfe011-56a4-42d4-8331-4cb055055799
2015-04-14 10:04:58 b.s.d.supervisor [INFO] Launching worker with command:
java -server -Xmx768m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-verbose:gc -Xloggc:/app/storm/gc-storm-worker-6700.log
-Djava.net.preferIPv4Stack=true
 -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
-Dlogfile.name=worker-6700.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
-Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
-Dstorm.id=production-topology-1-1429005259
-Dworker.id=e7cfe011-56a4-42d4-8331-4cb055055799 -Dworker.port=6700 -cp
/usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-1-1429005259/stormjar.jar
backtype.storm.daemon.worker production-topology-1-1429005259
5a1f1d44-4d4f-4309-9ed3-d753da1dda9e 6700
e7cfe011-56a4-42d4-8331-4cb055055799
2015-04-14 10:04:58 b.s.d.supervisor [INFO]
e7cfe011-56a4-42d4-8331-4cb055055799 still hasn't started
2015-04-14 10:04:59 b.s.d.supervisor [INFO]
e7cfe011-56a4-42d4-8331-4cb055055799 still hasn't started

The worker log: https://gist.github.com/Kali89/2a3d3aea401f8d1bb195

The nimbus log:

2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[2 2] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[3 3] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[4 4] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[5 5] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[6 6] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[7 7] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[8 8] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[9 9] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[10 10] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[11 11] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[12 12] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[13 13] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[14 14] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[15 15] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[1 1] not alive
2015-04-14 10:11:27 b.s.s.EvenScheduler [INFO] Available slots:
(["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700]
["5f1c9cea-3fcc-4d4e-aa9e-650d837e6a49" 6700])
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Reassigning
production-topology-1-1429005259 to 1 slots
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3 3] [4
4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
[15 15] [1 1]]
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Setting new assignment for topology
id production-topology-1-1429005259:
#backtype.storm.daemon.common.Assignment{:master-code-dir
"/app/storm/nimbus/stormdist/production-topology-1-1429005259", :node->host
{"eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" "
ec2-52-16-76-40.eu-west-1.compute.amazonaws.com"}, :executor->node+port {[2
2] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [3 3]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [4 4]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [5 5]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [6 6]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [7 7]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [8 8]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [9 9]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [10 10]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [11 11]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [12 12]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [13 13]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [14 14]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [15 15]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [1 1]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700]}, :executor->start-time-secs
{[2 2] 1429006287, [3 3] 1429006287, [4 4] 1429006287, [5 5] 1429006287, [6
6] 1429006287, [7 7] 1429006287, [8 8] 1429006287, [9 9] 1429006287, [10
10] 1429006287, [11 11] 1429006287, [12 12] 1429006287, [13 13] 1429006287,
[14 14] 1429006287, [15 15] 1429006287, [1 1] 1429006287}}

Sorry for the data dump! For what it's worth - I'm now wondering how the
supervisor and worker relationship works. With one 768MB worker per
supervisor (node) I'm well below the machine limit, but am I hitting another
sort of limit? Does the supervisor daemon on the supervisor node have
enough memory with 256MB? Is there a relationship between the worker memory
and the supervisor memory?

Thanks!


On 13 April 2015 at 18:58, Nathan Leung <nc...@gmail.com> wrote:

> No, the default is 4.  If you are running multiple workers on the same
> supervisor node, that could cause problems with 768 MB / worker and 1 GB /
> supervisor host.
>
> On Mon, Apr 13, 2015 at 1:55 PM, Matthew Sharpe <ma...@gmail.com>
> wrote:
>
>> I can only think that I'll be using whatever the default is - that's not
>> a setting that I've changed anywhere or set on any command line. Am I right
>> in thinking (hoping) that that'll default to one? If not, how do I set it
>> for the RollingTopWords example?
>>
>> The cluster is being spun up now but will probably take half an hour or
>> so before it's ready. I'll reply once it's all up and good to go.
>>
>> On 13 April 2015 at 18:40, Nathan Leung <nc...@gmail.com> wrote:
>>
>>> That looks about right.  Are you running one worker per supervisor right
>>> now?
>>>
>>> On Mon, Apr 13, 2015 at 1:33 PM, Matthew Sharpe <
>>> matthewsharpe3@gmail.com> wrote:
>>>
>>>> I was just thinking about that - I'm using EC2 m3.medium instances;
>>>> they're not especially RAM heavy (1GB). Am I right in thinking that to
>>>> enable GC logging I'd just need to run Storm with the following options in
>>>> my 'storm.yaml'?
>>>>
>>>> worker.childopts: "-Xmx768m -XX:MaxPermSize=256m -XX:+PrintGCDetails
>>>> -XX:+PrintGCTimeStamps -verbose:gc
>>>> -Xloggc:/app/storm/logs/gc-storm-worker-%ID%.log"
>>>> supervisor.childopts: "-XX:MaxPermSize=256m -XX:+PrintGCDetails
>>>> -XX:+PrintGCTimeStamps -verbose:gc
>>>> -Xloggc:/app/storm/logs/gc-storm-supervisor.log"
>>>> If so, I'll set up the same cluster but using m3.xlarge instances (15GB
>>>> of RAM) with those settings. Either way, I'll start setting up the extra
>>>> large nodes.
>>>>
>>>>
>>>> On 13 April 2015 at 18:10, Nathan Leung <nc...@gmail.com> wrote:
>>>>
>>>>> Do you have GC logging enabled?  You should have enough ram for each
>>>>> worker (for this example) that it's not a problem, but doesn't hurt to
>>>>> check.
>>>>>
>>>>> On Mon, Apr 13, 2015 at 12:12 PM, Matthew Sharpe <
>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>
>>>>>> Not long at all - certainly less than a couple of minutes but longer
>>>>>> than a few seconds; the couple I actually measured lasted 42 seconds
>>>>>> and about 5 seconds. So the heartbeats folder behaves like so...
>>>>>>
>>>>>> /app/storm/workers/[uuid] is created with an empty heartbeats folder
>>>>>> inside it
>>>>>> /app/storm/workers/[uuid]/heartbeats has two timestamps added (a
>>>>>> couple of examples:
>>>>>>
>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941026297.version
>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941026297
>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941027240.version
>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941027240
>>>>>>
>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941398235.version
>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941398235
>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941399163.version
>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941399163
>>>>>> )
>>>>>>
>>>>>> /app/storm/workers/[uuid] is deleted.
>>>>>>
>>>>>> I've not seen any heartbeats folder with any more than 2 heartbeats
>>>>>> in.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 13 April 2015 at 17:00, Nathan Leung <nc...@gmail.com> wrote:
>>>>>>
>>>>>>> It only matters what user the supervisor is run as, so you should be
>>>>>>> ok.  How long is it between the worker being started, and it getting
>>>>>>> killed?  What are the timestamps of the heartbeats in the corresponding
>>>>>>> worker folder?
>>>>>>>
>>>>>>> On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe <
>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>
>>>>>>>> I'm running storm as a (specially created) storm user - on both the
>>>>>>>> nimbus and the supervisors the process is showing as being run by the storm
>>>>>>>> user. And I've got the storm local directory set as /app/storm - owned by
>>>>>>>> storm user and with permissions 700. Thought it might be due to me
>>>>>>>> submitting the job as a different user but have just terminated the old job
>>>>>>>> and submitted the new one as the storm user and still not getting any
>>>>>>>> tuples coming through.
>>>>>>>>
>>>>>>>> For what it's worth, I've looked in the local data directory on the
>>>>>>>> supervisors and in the workers folder there is a (non-empty) heartbeats
>>>>>>>> folder being created - don't know if that's indicative of heartbeats being
>>>>>>>> processed correctly or if that's created at the start of a worker process.
>>>>>>>>
>>>>>>>> Is there anything else you want to see that would help you diagnose
>>>>>>>> this? I can share the UI with you if you think that'd help?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>> On 13 April 2015 at 16:22, Nathan Leung <nc...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sounds like it's having problems with heartbeats.  As I recall
>>>>>>>>> worker -> supervisor heartbeats are written using the local filesystem.
>>>>>>>>> What user are you running as?  In your storm.yaml file, you will see the
>>>>>>>>> option "storm.local.dir".  Does the user supervisor / workers are running
>>>>>>>>> as have permissions to write to this folder?
>>>>>>>>>
>>>>>>>>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <
>>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> New bonus bit of information - the supervisor log has got this
>>>>>>>>>> going on:
>>>>>>>>>>
>>>>>>>>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
>>>>>>>>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and
>>>>>>>>>> clearing state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current
>>>>>>>>>> supervisor time: 1428938060. State: :timed-out, Heartbeat:
>>>>>>>>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
>>>>>>>>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] [4
>>>>>>>>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>>>>>>>>>> [15 15] [-1 -1] [1 1]}, :port 6703}
>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>>>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill
>>>>>>>>>> 5284. Process is probably already dead.
>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>>>>>>> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>>>>>>>>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 5] [6
>>>>>>>>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
>>>>>>>>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 6703
>>>>>>>>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>>>>>>> command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>>>>>>>>>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>>>> -Dlogfile.name=worker-6703.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
>>>>>>>>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>>>>>>>>>> -Dstorm.id=production-topology-2-1428932653
>>>>>>>>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp
>>>>>>>>>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
>>>>>>>>>> backtype.storm.daemon.worker production-topology-2-1428932653
>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177 6703
>>>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
>>>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
>>>>>>>>>>
>>>>>>>>>> On 13 April 2015 at 16:09, Matthew Sharpe <
>>>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> OK, in checking the worker process logs I've uncovered a bit
>>>>>>>>>>> more information. The UI reports 0 emitted from the wordGenerator spout,
>>>>>>>>>>> along with 0 transferred, acked and failed.
>>>>>>>>>>>
>>>>>>>>>>> Sorry - the *.compute.amazonaws.com is redacted (don't know the
>>>>>>>>>>> usual way of doing that), I've got a bunch of other stuff on the boxes and
>>>>>>>>>>> in troubleshooting things I've been a bit laxer with security than I
>>>>>>>>>>> should have been!
>>>>>>>>>>>
>>>>>>>>>>> The UI shows 5 workers all starting on the same node - each of
>>>>>>>>>>> which has a declared uptime of 1s. Looking at the logs on the node, during
>>>>>>>>>>> that time the following happens:
>>>>>>>>>>>
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:java.io.tmpdir=/tmp
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:java.compiler=<NA>
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:os.name=Linux
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:os.arch=amd64
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:os.version=3.13.0-48-generic
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:user.name=storm
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:user.home=/app/home/storm
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:user.dir=/
>>>>>>>>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>>>>>>>>>>> production-topology-2-1428932653 on
>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>>>>>>>>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
>>>>>>>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>>>>>>>>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>>>>>>>>>> "topology.fall.back.on.java.serialization" true,
>>>>>>>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
>>>>>>>>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
>>>>>>>>>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
>>>>>>>>>>> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
>>>>>>>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>>>>>>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>>>>>>>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
>>>>>>>>>>> "supervisor.worker.start.timeout.secs" 120,
>>>>>>>>>>> "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
>>>>>>>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
>>>>>>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>>>>>>>>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>>>>>>>>>>> "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
>>>>>>>>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>>>>>>>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>>>>>>>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>>>>>>>>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.
>>>>>>>>>>> compute.amazonaws.com"], "transactional.zookeeper.root"
>>>>>>>>>>> "/transactional", "topology.acker.executors" nil,
>>>>>>>>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
>>>>>>>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>>>>>>>>>>> -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
>>>>>>>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
>>>>>>>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>>>>>>>>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
>>>>>>>>>>> "topology.tasks" nil, "topology.spout.wait.strategy"
>>>>>>>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
>>>>>>>>>>> nil, "storm.zookeeper.retry.interval" 1000, "
>>>>>>>>>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>>>>>>>>>> "nimbus.topology.validator"
>>>>>>>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
>>>>>>>>>>> [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
>>>>>>>>>>> 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
>>>>>>>>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>>>>>>>>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>>>>>>>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>>>>>>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>>>>>>>>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>>>>>>>>>> "topology.disruptor.wait.strategy"
>>>>>>>>>>> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
>>>>>>>>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>>>>>>>>>>> "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
>>>>>>>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
>>>>>>>>>>> 5, "storm.thrift.transport"
>>>>>>>>>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>>>>>>>>>> "topology.state.synchronization.timeout.secs" 60,
>>>>>>>>>>> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
>>>>>>>>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>>>>>>>>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
>>>>>>>>>>> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
>>>>>>>>>>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>>>>>>>>>>> "topology.max.task.parallelism" nil}
>>>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO]
>>>>>>>>>>> Starting
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>>>>>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@585bd6a6
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating
>>>>>>>>>>> session
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session
>>>>>>>>>>> establishment complete on server *.compute.amazonaws.com/*:2181,
>>>>>>>>>>> sessionid = 0x34cb217be05005e, negotiated timeout = 20000
>>>>>>>>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
>>>>>>>>>>> :connected:none
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session:
>>>>>>>>>>> 0x34cb217be05005e closed
>>>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO]
>>>>>>>>>>> Starting
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm
>>>>>>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@4b492c47
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating
>>>>>>>>>>> session
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session
>>>>>>>>>>> establishment complete on server *.compute.amazonaws.com/*:2181,
>>>>>>>>>>> sessionid = 0x24cb217be030072, negotiated timeout = 20000
>>>>>>>>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer
>>>>>>>>>>> transport plugin:backtype.storm.messaging.zmq
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>> counter:[2 2]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>> wordGenerator:[11 11]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>> executor wordGenerator:[11 11]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>> wordGenerator:[12 12]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>> wordGenerator:[12 12]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>> executor wordGenerator:[12 12]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>> wordGenerator:[13 13]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>> wordGenerator:[13 13]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>> executor wordGenerator:[13 13]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>> wordGenerator:[14 14]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>> wordGenerator:[14 14]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>> executor wordGenerator:[14 14]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [bertels]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [bertels]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [mike]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [mike]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [bertels]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [bertels]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>> wordGenerator:[15 15]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>> wordGenerator:[15 15]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>> executor wordGenerator:[15 15]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>> __system:[-1 -1]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system
>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>> __system:[-1 -1]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>> executor __system:[-1 -1]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt
>>>>>>>>>>> __system:(-1)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>> __acker:[1 1]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [jackson]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [mike]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [jackson]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:14, stream: default, id: {}, [mike]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [jackson]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [jackson]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [golda]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [golda]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt
>>>>>>>>>>> __system:(-1)
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system
>>>>>>>>>>> ["startup"]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>> __acker:[1 1]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [jackson]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>> executor __acker:[1 1]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread
>>>>>>>>>>> for bd340179-5905-4233-ad5d-44b47b228177:6702
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt
>>>>>>>>>>> __acker:(1)
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt
>>>>>>>>>>> __acker:(1)
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [nathan]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [bertels]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [golda]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [golda]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [nathan]
>>>>>>>>>>>
>>>>>>>>>>> After that, the logs stop (no more messages) and the UI says
>>>>>>>>>>> that the 5 executors are now running on another node (and port) - at which
>>>>>>>>>>> point the worker log is created on that node and with that number (same
>>>>>>>>>>> messages).
>>>>>>>>>>>
>>>>>>>>>>> So the worker seems to start and start emitting words but the UI
>>>>>>>>>>> isn't registering anything as having been emitted. After a short period of
>>>>>>>>>>> time the worker seems to stop without logging anything and so a new worker
>>>>>>>>>>> is started.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Matt
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 13 April 2015 at 15:36, Nathan Leung <nc...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Did you check the worker process logs?  What are the emit
>>>>>>>>>>>> counts for the components in the UI?
>>>>>>>>>>>>
>>>>>>>>>>>> Also I have not run on AWS personally but the following line
>>>>>>>>>>>> seems strange to me:
>>>>>>>>>>>>
>>>>>>>>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>>>>>>> compute.amazonaws.com"}
>>>>>>>>>>>>
>>>>>>>>>>>> Generally this would be bound to a specific machine, not
>>>>>>>>>>>> *.<domain>.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>>>>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've set up a small Storm cluster on AWS that is giving me
>>>>>>>>>>>>> headaches. At the moment I've got a situation where the cluster (a nimbus
>>>>>>>>>>>>> node and 3 supervisors) will successfully run in local mode my particular
>>>>>>>>>>>>> topology. However, as soon as I try to run the topology in distributed mode
>>>>>>>>>>>>> there are no errors (that I can see) in any of the logs - the topology
>>>>>>>>>>>>> appears in the UI but never shows any tuples emitted by the spouts.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In order to check it wasn't my topology I cloned the
>>>>>>>>>>>>> storm-starter project and have tried with the RollingTopWords example -
>>>>>>>>>>>>> running locally I get a stream of output but as soon as I specify that we
>>>>>>>>>>>>> should run in a distributed fashion I get a success message (topology
>>>>>>>>>>>>> successfully submitted). The topology appears on the UI as successfully
>>>>>>>>>>>>> submitted but I see no progress.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's a bit of a weird one as I don't know which logs to share
>>>>>>>>>>>>> to show the problem as the whole point is I'm not seeing any errors!
>>>>>>>>>>>>>
>>>>>>>>>>>>> The last entry in the supervisor logs pertains to the set up
>>>>>>>>>>>>> of the supervisor (i.e. it's not logged anything about the topologies that
>>>>>>>>>>>>> have been submitted)
>>>>>>>>>>>>>
>>>>>>>>>>>>> The last entries in the nimbus logs are probably a bit more
>>>>>>>>>>>>> illuminating:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>>> production-topology-2-1428932653:[2 2] not alive
>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>>> production-topology-2-1428932653:[3 3] not alive
>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>>> production-topology-2-1428932653:[4 4] not alive
>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available
>>>>>>>>>>>>> slots: (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>>>>>>>>>>> production-topology-2-1428932653 to 1 slots
>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors:
>>>>>>>>>>>>> [[2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12]
>>>>>>>>>>>>> [13 13] [14 14] [15 15] [1 1]]
>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment
>>>>>>>>>>>>> for topology id production-topology-2-1428932653:
>>>>>>>>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>>>>>>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
>>>>>>>>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>>>>>>>> compute.amazonaws.com"}, :executor->node+port {[2 2]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [3 3]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
>>>>>>>>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
>>>>>>>>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>>>>>>>>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
>>>>>>>>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>>>>>>>>>>
>>>>>>>>>>>>> If anybody can help that'd be greatly appreciated!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Check out my blog: dogdogfish.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Check out my blog: dogdogfish.com
>>>>
>>>
>>>
>>
>>
>> --
>> Check out my blog: dogdogfish.com
>>
>
>


-- 
Check out my blog: dogdogfish.com

Re: Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Posted by Nathan Leung <nc...@gmail.com>.
No, the default is 4.  If you are running multiple workers on the same
supervisor node, that could cause problems with 768 MB / worker and 1 GB /
supervisor host.
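
If you want to force one worker per supervisor while you debug, something like
this in storm.yaml on each supervisor node should do it (an illustrative sketch
rather than a drop-in config - use whichever port you have open):

supervisor.slots.ports: [6700]    # one worker slot instead of the default four

That's separate from how many workers the topology requests (topology.workers,
which your conf dump shows as 1; in topology code that would be set via
Config.setNumWorkers).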

On Mon, Apr 13, 2015 at 1:55 PM, Matthew Sharpe <ma...@gmail.com>
wrote:

> I can only think that I'll be using whatever the default is - that's not a
> setting that I've changed anywhere or set on any command line. Am I right
> in thinking (hoping) that that'll default to one? If not, how do I set it
> for the RollingTopWords example?
>
> The cluster is being spun up now but will probably take half an hour or so
> before it's ready. I'll reply once it's all up and good to go.
>
> On 13 April 2015 at 18:40, Nathan Leung <nc...@gmail.com> wrote:
>
>> That looks about right.  Are you running one worker per supervisor right
>> now?
>>
>> On Mon, Apr 13, 2015 at 1:33 PM, Matthew Sharpe <matthewsharpe3@gmail.com
>> > wrote:
>>
>>> I was just thinking about that - I'm using EC2 m3.medium instances,
>>> they're not especially RAM heavy (1GB). Am I right in thinking that to
>>> enable GC logging I'd just need to run Storm with the following options in
>>> my 'storm.yaml'?
>>>
>>> worker.childopts: "-Xmx768m -XX:MaxPermSize=256m -XX:+PrintGCDetails
>>> -XX:+PrintGCTimeStamps -verbose:gc
>>> -Xloggc:/app/storm/logs/gc-storm-worker-%ID%.log"
>>> supervisor.childopts: "-XX:MaxPermSize=256m -XX:+PrintGCDetails
>>> -XX:+PrintGCTimeStamps -verbose:gc
>>> -Xloggc:/app/storm/logs/gc-storm-supervisor.log"
>>> If so, I'll set up the same cluster but using m3.xlarge instances (15GB
>>> of RAM) with those settings. Either way, I'll start setting up the extra
>>> large nodes.
>>>
>>>
>>> On 13 April 2015 at 18:10, Nathan Leung <nc...@gmail.com> wrote:
>>>
>>>> Do you have GC logging enabled?  You should have enough ram for each
>>>> worker (for this example) that it's not a problem, but doesn't hurt to
>>>> check.
>>>>
>>>> On Mon, Apr 13, 2015 at 12:12 PM, Matthew Sharpe <
>>>> matthewsharpe3@gmail.com> wrote:
>>>>
>>>>> Not long at all - certainly less than a couple of minutes but longer
>>>>> than a few seconds, the couple I actually measured lasted 42 seconds
>>>>> followed by about 5 seconds. So the heartbeats folder behaves like so...
>>>>>
>>>>> /app/storm/workers/[uuid] is created with an empty heartbeats folder
>>>>> inside it
>>>>> /app/storm/workers/[uuid]/heartbeats has two timestamps added (a
>>>>> couple of examples:
>>>>>
>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941026297.version
>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941026297
>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941027240.version
>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941027240
>>>>>
>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941398235.version
>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941398235
>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941399163.version
>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941399163
>>>>> )
>>>>>
>>>>> /app/storm/workers/[uuid] is deleted.
>>>>>
>>>>> I've not seen any heartbeats folder with any more than 2 heartbeats in.
>>>>>
>>>>>
>>>>>
>>>>> On 13 April 2015 at 17:00, Nathan Leung <nc...@gmail.com> wrote:
>>>>>
>>>>>> It only matters what user the supervisor is run as, so you should be ok.
>>>>>> How long is it between the worker being started, and it getting killed?
>>>>>> What are the timestamps of the heartbeats in the corresponding worker
>>>>>> folder?
>>>>>>
>>>>>> On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe <
>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>
>>>>>>> I'm running storm as a (specially created) storm user - on both the
>>>>>>> nimbus and the supervisors the process is showing as being run by the storm
>>>>>>> user. And I've got the storm local directory set as /app/storm - owned by
>>>>>>> storm user and with permissions 700. Thought it might be due to me
>>>>>>> submitting the job as a different user but have just terminated the old job
>>>>>>> and submitted the new one as the storm user and still not getting any
>>>>>>> tuples coming through.
>>>>>>>
>>>>>>> For what it's worth, I've looked in the local data directory on the
>>>>>>> supervisors and in the workers folder there is a (non-empty) heartbeats
>>>>>>> folder being created - don't know if that's indicative of heartbeats being
>>>>>>> processed correctly or if that's created at the start of a worker process.
>>>>>>>
>>>>>>> Is there anything else you want to see that would help you diagnose
>>>>>>> this? I can share the UI with you if you think that'd help?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>> On 13 April 2015 at 16:22, Nathan Leung <nc...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Sounds like it's having problems with heartbeats.  As I recall
>>>>>>>> worker -> supervisor heartbeats are written using the local filesystem.
>>>>>>>> What user are you running as?  In your storm.yaml file, you will see the
>>>>>>>> option "storm.local.dir".  Does the user supervisor / workers are running
>>>>>>>> as have permissions to write to this folder?
>>>>>>>>
>>>>>>>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <
>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> New bonus bit of information - the supervisor log has got this
>>>>>>>>> going on:
>>>>>>>>>
>>>>>>>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
>>>>>>>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and
>>>>>>>>> clearing state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current
>>>>>>>>> supervisor time: 1428938060. State: :timed-out, Heartbeat:
>>>>>>>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
>>>>>>>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] [4
>>>>>>>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>>>>>>>>> [15 15] [-1 -1] [1 1]}, :port 6703}
>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill
>>>>>>>>> 5284. Process is probably already dead.
>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>>>>>> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>>>>>>>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 5] [6
>>>>>>>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
>>>>>>>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 6703
>>>>>>>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>>>>>> command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>>>>>>>>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>>> -Dlogfile.name=worker-6703.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
>>>>>>>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>>>>>>>>> -Dstorm.id=production-topology-2-1428932653
>>>>>>>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp
>>>>>>>>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
>>>>>>>>> backtype.storm.daemon.worker production-topology-2-1428932653
>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177 6703
>>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
>>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
>>>>>>>>>
>>>>>>>>> On 13 April 2015 at 16:09, Matthew Sharpe <
>>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> OK, in checking the worker process logs I've uncovered a bit more
>>>>>>>>>> information. The UI reports 0 emitted from the wordGenerator spout, along
>>>>>>>>>> with 0 transferred, acked and failed.
>>>>>>>>>>
>>>>>>>>>> Sorry - the *.compute.amazonaws.com is redacted (don't know the
>>>>>>>>>> usual way of doing that), I've got a bunch of other stuff on the boxes and
>>>>>>>>>> in troubleshooting things I've been a bit laxer with security than I
>>>>>>>>>> should have been!
>>>>>>>>>>
>>>>>>>>>> The UI shows 5 workers all starting on the same node - each of
>>>>>>>>>> which has a declared uptime of 1s. Looking at the logs on the node, during
>>>>>>>>>> that time the following happens:
>>>>>>>>>>
>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>> environment:java.io.tmpdir=/tmp
>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>> environment:java.compiler=<NA>
>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>> environment:os.name=Linux
>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>> environment:os.arch=amd64
>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>> environment:os.version=3.13.0-48-generic
>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>> environment:user.name=storm
>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>> environment:user.home=/app/home/storm
>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>> environment:user.dir=/
>>>>>>>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>>>>>>>>>> production-topology-2-1428932653 on
>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>>>>>>>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
>>>>>>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>>>>>>>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>>>>>>>>> "topology.fall.back.on.java.serialization" true,
>>>>>>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
>>>>>>>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
>>>>>>>>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
>>>>>>>>>> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
>>>>>>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>>>>>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>>>>>>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
>>>>>>>>>> "supervisor.worker.start.timeout.secs" 120,
>>>>>>>>>> "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
>>>>>>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
>>>>>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>>>>>>>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>>>>>>>>>> "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
>>>>>>>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>>>>>>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>>>>>>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>>>>>>>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.
>>>>>>>>>> compute.amazonaws.com"], "transactional.zookeeper.root"
>>>>>>>>>> "/transactional", "topology.acker.executors" nil,
>>>>>>>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
>>>>>>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>>>>>>>>>> -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
>>>>>>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
>>>>>>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>>>>>>>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
>>>>>>>>>> "topology.tasks" nil, "topology.spout.wait.strategy"
>>>>>>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
>>>>>>>>>> nil, "storm.zookeeper.retry.interval" 1000, "
>>>>>>>>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>>>>>>>>> "nimbus.topology.validator"
>>>>>>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
>>>>>>>>>> [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
>>>>>>>>>> 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
>>>>>>>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>>>>>>>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>>>>>>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>>>>>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>>>>>>>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>>>>>>>>> "topology.disruptor.wait.strategy"
>>>>>>>>>> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
>>>>>>>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>>>>>>>>>> "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
>>>>>>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
>>>>>>>>>> 5, "storm.thrift.transport"
>>>>>>>>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>>>>>>>>> "topology.state.synchronization.timeout.secs" 60,
>>>>>>>>>> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
>>>>>>>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>>>>>>>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
>>>>>>>>>> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
>>>>>>>>>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>>>>>>>>>> "topology.max.task.parallelism" nil}
>>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>>>>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@585bd6a6
>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>>>>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>>>>>>>> 0x34cb217be05005e, negotiated timeout = 20000
>>>>>>>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
>>>>>>>>>> :connected:none
>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session:
>>>>>>>>>> 0x34cb217be05005e closed
>>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm
>>>>>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@4b492c47
>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>>>>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>>>>>>>> 0x24cb217be030072, negotiated timeout = 20000
>>>>>>>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer
>>>>>>>>>> transport plugin:backtype.storm.messaging.zmq
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>> counter:[2 2]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>> wordGenerator:[11 11]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>> executor wordGenerator:[11 11]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>> wordGenerator:[12 12]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>> __system ["startup"]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>> wordGenerator:[12 12]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>> executor wordGenerator:[12 12]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>> wordGenerator:[13 13]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>> __system ["startup"]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>> wordGenerator:[13 13]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>> executor wordGenerator:[13 13]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>> wordGenerator:[14 14]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>> __system ["startup"]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>> wordGenerator:[14 14]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>> executor wordGenerator:[14 14]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>> default [bertels]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [bertels]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>> default [mike]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [mike]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>> default [bertels]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [bertels]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>> wordGenerator:[15 15]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>> __system ["startup"]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>> wordGenerator:[15 15]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>> executor wordGenerator:[15 15]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>> __system:[-1 -1]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system
>>>>>>>>>> ["startup"]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>> __system:[-1 -1]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>> executor __system:[-1 -1]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt
>>>>>>>>>> __system:(-1)
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>> __acker:[1 1]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>> default [jackson]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>> default [mike]
>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [jackson]
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>> message source: wordGenerator:14, stream: default, id: {}, [mike]
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>> default [jackson]
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [jackson]
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>> default [golda]
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [golda]
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt
>>>>>>>>>> __system:(-1)
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system
>>>>>>>>>> ["startup"]
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>> __acker:[1 1]
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>> default [jackson]
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading
>>>>>>>>>> executor __acker:[1 1]
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread
>>>>>>>>>> for bd340179-5905-4233-ad5d-44b47b228177:6702
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt
>>>>>>>>>> __acker:(1)
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt
>>>>>>>>>> __acker:(1)
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>> default [nathan]
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>> default [bertels]
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>> default [golda]
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [golda]
>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [nathan]
>>>>>>>>>>
>>>>>>>>>> After that, the logs stop (no more messages) and the UI says that
>>>>>>>>>> the 5 executors are now running on another node (and port) - at which point
>>>>>>>>>> a worker log with that port number is created on that node (same
>>>>>>>>>> messages).
>>>>>>>>>>
>>>>>>>>>> So the worker seems to start and begins emitting words, but the UI
>>>>>>>>>> isn't registering anything as having been emitted. After a short period of
>>>>>>>>>> time the worker seems to stop without logging anything, and so a new worker
>>>>>>>>>> is started.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Matt
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 13 April 2015 at 15:36, Nathan Leung <nc...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Did you check the worker process logs?  What are the emit counts
>>>>>>>>>>> for the components in the UI?
>>>>>>>>>>>
>>>>>>>>>>> Also I have not run on AWS personally but the following line
>>>>>>>>>>> seems strange to me:
>>>>>>>>>>>
>>>>>>>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>>>>>> compute.amazonaws.com"}
>>>>>>>>>>>
>>>>>>>>>>> Generally this would be bound to a specific machine, not
>>>>>>>>>>> *.<domain>.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>>>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I've set up a small Storm cluster on AWS that is giving me
>>>>>>>>>>>> headaches. At the moment I've got a situation where the cluster (a nimbus
>>>>>>>>>>>> node and 3 supervisors) will successfully run in local mode my particular
>>>>>>>>>>>> topology. However, as soon as I try to run the topology in distributed mode
>>>>>>>>>>>> there are no errors (that I can see) in any of the logs - the topology
>>>>>>>>>>>> appears in the UI but never shows any tuples emitted by the spouts.
>>>>>>>>>>>>
>>>>>>>>>>>> In order to check it wasn't my topology I cloned the
>>>>>>>>>>>> storm-starter project and have tried with the RollingTopWords example -
>>>>>>>>>>>> running locally I get a stream of output but as soon as I specify that we
>>>>>>>>>>>> should run in a distributed fashion I get a success message (topology
>>>>>>>>>>>> successfully submitted). The topology appears on the UI as successfully
>>>>>>>>>>>> submitted but I see no progress.
>>>>>>>>>>>>
>>>>>>>>>>>> It's a bit of a weird one as I don't know which logs to share
>>>>>>>>>>>> to show the problem as the whole point is I'm not seeing any errors!
>>>>>>>>>>>>
>>>>>>>>>>>> The last entry in the supervisor logs pertains to the set up of
>>>>>>>>>>>> the supervisor (i.e. it's not logged anything about the topologies that
>>>>>>>>>>>> have been submitted)
>>>>>>>>>>>>
>>>>>>>>>>>> The last entries in the nimbus logs are probably a bit more
>>>>>>>>>>>> illuminating:
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>> production-topology-2-1428932653:[2 2] not alive
>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>> production-topology-2-1428932653:[3 3] not alive
>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>> production-topology-2-1428932653:[4 4] not alive
>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
>>>>>>>>>>>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>>>>>>>>>> production-topology-2-1428932653 to 1 slots
>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2
>>>>>>>>>>>> 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13
>>>>>>>>>>>> 13] [14 14] [15 15] [1 1]]
>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment
>>>>>>>>>>>> for topology id production-topology-2-1428932653:
>>>>>>>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>>>>>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
>>>>>>>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>>>>>>> compute.amazonaws.com"}, :executor->node+port {[2 2]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [3 3]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
>>>>>>>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
>>>>>>>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>>>>>>>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
>>>>>>>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>>>>>>>>>
>>>>>>>>>>>> If anybody can help that'd be greatly appreciated!
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Matt
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Check out my blog: dogdogfish.com
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Check out my blog: dogdogfish.com
>>>
>>
>>
>
>
> --
> Check out my blog: dogdogfish.com
>

Re: Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Posted by Matthew Sharpe <ma...@gmail.com>.
I can only think that I'll be using whatever the default is - that's not a
setting that I've changed anywhere or set on any command line. Am I right
in thinking (hoping) that that'll default to one? If not, how do I set it
for the RollingTopWords example?
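
For what it's worth, the config dump in the worker log earlier in the thread
does show "topology.workers" 1, so I'm assuming one is the default. If it does
need setting explicitly, my understanding is that it's done on the topology's
Config before submitting - something along these lines (untested sketch;
TestWordSpout is just a stand-in for the real spout, and the "wordGenerator"
id is taken from the logs):

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.TopologyBuilder;

public class SubmitWithOneWorker {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // Spout id matches the "wordGenerator" component seen in the worker logs;
        // TestWordSpout is only a placeholder - the real topology wires in its own spout/bolts.
        builder.setSpout("wordGenerator", new TestWordSpout(), 5);

        Config conf = new Config();
        // topology.workers - the setting in question; making the (assumed) default of 1 explicit.
        conf.setNumWorkers(1);

        StormSubmitter.submitTopology("production-topology", conf, builder.createTopology());
    }
}

If the worker count is actually meant to come from somewhere else, please do
correct me.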

The cluster is being spun up now but will probably take half an hour or so
before it's ready. I'll reply once it's all up and good to go.

On 13 April 2015 at 18:40, Nathan Leung <nc...@gmail.com> wrote:

> That looks about right.  Are you running one worker per supervisor right
> now?
>
> On Mon, Apr 13, 2015 at 1:33 PM, Matthew Sharpe <ma...@gmail.com>
> wrote:
>
>> I was just thinking about that - I'm using EC2 m3.medium instances,
>> they're not especially RAM heavy (1GB). Am I right in thinking that to
>> enable GC logging I'd just need to run Storm with the following options in
>> my 'storm.yaml'?
>>
>> worker.childopts: "-Xmx768m -XX:MaxPermSize=256m -XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps -verbose:gc
>> -Xloggc:/app/storm/logs/gc-storm-worker-%ID%.log"
>> supervisor.childopts: "-XX:MaxPermSize=256m -XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps -verbose:gc
>> -Xloggc:/app/storm/logs/gc-storm-supervisor.log"
>> If so, I'll set up the same cluster but using m3.xlarge instances (15GB
>> of RAM) with those settings. Either way, I'll start setting up the extra
>> large nodes.
>>
>>
>> On 13 April 2015 at 18:10, Nathan Leung <nc...@gmail.com> wrote:
>>
>>> Do you have GC logging enabled?  You should have enough ram for each
>>> worker (for this example) that it's not a problem, but doesn't hurt to
>>> check.
>>>
>>> On Mon, Apr 13, 2015 at 12:12 PM, Matthew Sharpe <
>>> matthewsharpe3@gmail.com> wrote:
>>>
>>>> Not long at all - certainly less than a couple of minutes but longer than
>>>> a few seconds; the couple I actually measured lasted 42 seconds followed by
>>>> about 5 seconds. So the heartbeats folder behaves like so...
>>>>
>>>> /app/storm/workers/[uuid] is created with an empty heartbeats folder
>>>> inside it
>>>> /app/storm/workers/[uuid]/heartbeats has two timestamps added (a couple
>>>> of examples:
>>>>
>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941026297.version
>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941026297
>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941027240.version
>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941027240
>>>>
>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941398235.version
>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941398235
>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941399163.version
>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941399163
>>>> )
>>>>
>>>> /app/storm/workers/[uuid] is deleted.
>>>>
>>>> I've not seen any heartbeats folder with any more than 2 heartbeats in.
>>>>
>>>>
>>>>
>>>> On 13 April 2015 at 17:00, Nathan Leung <nc...@gmail.com> wrote:
>>>>
>>>>> It only matters what user supervisor is run as, so you should be ok.
>>>>> How long is it between the worker being started, and it getting killed?
>>>>> What are the timestamps of the heartbeats in the corresponding worker
>>>>> folder?
>>>>>
>>>>> On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe <
>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>
>>>>>> I'm running storm as a (specially created) storm user - on both the
>>>>>> nimbus and the supervisors the process is showing as being run by the storm
>>>>>> user. And I've got the storm local directory set as /app/storm - owned by
>>>>>> storm user and with permissions 700. Thought it might be due to me
>>>>>> submitting the job as a different user but have just terminated the old job
>>>>>> and submitted the new one as the storm user and still not getting any
>>>>>> tuples coming through.
>>>>>>
>>>>>> For what it's worth, I've looked in the local data directory on the
>>>>>> supervisors and in the workers folder there is a (non-empty) heartbeats
>>>>>> folder being created - don't know if that's indicative of heartbeats being
>>>>>> processed correctly or if that's created at the start of a worker process.
>>>>>>
>>>>>> Is there anything else you want to see that would help you diagnose
>>>>>> this? I can share the UI with you if you think that'd help?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>> On 13 April 2015 at 16:22, Nathan Leung <nc...@gmail.com> wrote:
>>>>>>
>>>>>>> Sounds like it's having problems with heartbeats.  As I recall
>>>>>>> worker -> supervisor heartbeats are written using the local filesystem.
>>>>>>> What user are you running as?  In your storm.yaml file, you will see the
>>>>>>> option "storm.local.dir".  Does the user supervisor / workers are running
>>>>>>> as have permissions to write to this folder?
>>>>>>>
>>>>>>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <
>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>
>>>>>>>> New bonus bit of information - the supervisor log has got this
>>>>>>>> going on:
>>>>>>>>
>>>>>>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
>>>>>>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and
>>>>>>>> clearing state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current
>>>>>>>> supervisor time: 1428938060. State: :timed-out, Heartbeat:
>>>>>>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
>>>>>>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] [4
>>>>>>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>>>>>>>> [15 15] [-1 -1] [1 1]}, :port 6703}
>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill 5284.
>>>>>>>> Process is probably already dead.
>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>>>>> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>>>>>>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 5] [6
>>>>>>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
>>>>>>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 6703
>>>>>>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>>>>> command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>>>>>>>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>> -Dlogfile.name=worker-6703.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
>>>>>>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>>>>>>>> -Dstorm.id=production-topology-2-1428932653
>>>>>>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp
>>>>>>>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
>>>>>>>> backtype.storm.daemon.worker production-topology-2-1428932653
>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177 6703
>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
>>>>>>>>
>>>>>>>> On 13 April 2015 at 16:09, Matthew Sharpe <matthewsharpe3@gmail.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> OK, in checking the worker process logs I've uncovered a bit more
>>>>>>>>> information. The UI reports 0 emitted from the wordGenerator spout, along
>>>>>>>>> with 0 transferred, acked and failed.
>>>>>>>>>
>>>>>>>>> Sorry - the *.compute.amazonaws.com is redacted (I don't know the
>>>>>>>>> usual way of doing that). I've got a bunch of other stuff on the boxes, and
>>>>>>>>> in troubleshooting I've been a bit laxer with security than I
>>>>>>>>> should have been!
>>>>>>>>>
>>>>>>>>> The UI shows 5 workers all starting on the same node - each of
>>>>>>>>> which has a declared uptime of 1s. Looking at the logs on the node, during
>>>>>>>>> that time the following happens:
>>>>>>>>>
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:java.io.tmpdir=/tmp
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:java.compiler=<NA>
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:os.name=Linux
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:os.arch=amd64
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:os.version=3.13.0-48-generic
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:user.name=storm
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:user.home=/app/home/storm
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:user.dir=/
>>>>>>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>>>>>>>>> production-topology-2-1428932653 on
>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>>>>>>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
>>>>>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>>>>>>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>>>>>>>> "topology.fall.back.on.java.serialization" true,
>>>>>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
>>>>>>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
>>>>>>>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
>>>>>>>>> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
>>>>>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>>>>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>>>>>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
>>>>>>>>> "supervisor.worker.start.timeout.secs" 120,
>>>>>>>>> "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
>>>>>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
>>>>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>>>>>>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>>>>>>>>> "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
>>>>>>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>>>>>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>>>>>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>>>>>>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.
>>>>>>>>> compute.amazonaws.com"], "transactional.zookeeper.root"
>>>>>>>>> "/transactional", "topology.acker.executors" nil,
>>>>>>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
>>>>>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>>>>>>>>> -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
>>>>>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
>>>>>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>>>>>>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
>>>>>>>>> "topology.tasks" nil, "topology.spout.wait.strategy"
>>>>>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
>>>>>>>>> nil, "storm.zookeeper.retry.interval" 1000, "
>>>>>>>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>>>>>>>> "nimbus.topology.validator"
>>>>>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
>>>>>>>>> [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
>>>>>>>>> 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
>>>>>>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>>>>>>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>>>>>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>>>>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>>>>>>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>>>>>>>> "topology.disruptor.wait.strategy"
>>>>>>>>> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
>>>>>>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>>>>>>>>> "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
>>>>>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
>>>>>>>>> 5, "storm.thrift.transport"
>>>>>>>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>>>>>>>> "topology.state.synchronization.timeout.secs" 60,
>>>>>>>>> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
>>>>>>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>>>>>>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
>>>>>>>>> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
>>>>>>>>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>>>>>>>>> "topology.max.task.parallelism" nil}
>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>>>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@585bd6a6
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>>>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>>>>>>> 0x34cb217be05005e, negotiated timeout = 20000
>>>>>>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
>>>>>>>>> :connected:none
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session:
>>>>>>>>> 0x34cb217be05005e closed
>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm
>>>>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@4b492c47
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>>>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>>>>>>> 0x24cb217be030072, negotiated timeout = 20000
>>>>>>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer
>>>>>>>>> transport plugin:backtype.storm.messaging.zmq
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>> counter:[2 2]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>> wordGenerator:[11 11]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>> executor wordGenerator:[11 11]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>> wordGenerator:(11)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>> wordGenerator:(11)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>> wordGenerator:(11)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>> wordGenerator:[12 12]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> __system ["startup"]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>> wordGenerator:[12 12]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>> executor wordGenerator:[12 12]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>> wordGenerator:(12)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>> wordGenerator:(12)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>> wordGenerator:(12)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>> wordGenerator:[13 13]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> __system ["startup"]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>> wordGenerator:[13 13]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>> executor wordGenerator:[13 13]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>> wordGenerator:(13)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>> wordGenerator:(13)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>> wordGenerator:(13)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>> wordGenerator:[14 14]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> __system ["startup"]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>> wordGenerator:[14 14]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>> executor wordGenerator:[14 14]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [bertels]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>> wordGenerator:(14)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>> wordGenerator:(14)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>> wordGenerator:(14)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [bertels]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [mike]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [mike]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [bertels]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [bertels]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>> wordGenerator:[15 15]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> __system ["startup"]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>> wordGenerator:[15 15]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>> executor wordGenerator:[15 15]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>> wordGenerator:(15)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>> wordGenerator:(15)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>> wordGenerator:(15)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>> __system:[-1 -1]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system
>>>>>>>>> ["startup"]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>> __system:[-1 -1]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>> executor __system:[-1 -1]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt
>>>>>>>>> __system:(-1)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>> __acker:[1 1]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [jackson]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [mike]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [jackson]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:14, stream: default, id: {}, [mike]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [jackson]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [jackson]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [golda]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [golda]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt
>>>>>>>>> __system:(-1)
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system
>>>>>>>>> ["startup"]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>> __acker:[1 1]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [jackson]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading
>>>>>>>>> executor __acker:[1 1]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread
>>>>>>>>> for bd340179-5905-4233-ad5d-44b47b228177:6702
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt
>>>>>>>>> __acker:(1)
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __acker:(1)
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [nathan]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [bertels]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [golda]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [golda]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [nathan]
>>>>>>>>>
>>>>>>>>> After that, the logs stop (no more messages) and the UI says that
>>>>>>>>> the 5 executors are now running on another node (and port) - at which point
>>>>>>>>> a worker log with that port number is created on that node (same
>>>>>>>>> messages).
>>>>>>>>>
>>>>>>>>> So the worker seems to start and begins emitting words, but the UI
>>>>>>>>> isn't registering anything as having been emitted. After a short period of
>>>>>>>>> time the worker seems to stop without logging anything, and so a new worker
>>>>>>>>> is started.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Matt
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 13 April 2015 at 15:36, Nathan Leung <nc...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Did you check the worker process logs?  What are the emit counts
>>>>>>>>>> for the components in the UI?
>>>>>>>>>>
>>>>>>>>>> Also I have not run on AWS personally but the following line
>>>>>>>>>> seems strange to me:
>>>>>>>>>>
>>>>>>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>>>>> compute.amazonaws.com"}
>>>>>>>>>>
>>>>>>>>>> Generally this would be bound to a specific machine, not
>>>>>>>>>> *.<domain>.
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I've set up a small Storm cluster on AWS that is giving me
>>>>>>>>>>> headaches. At the moment I've got a situation where the cluster (a nimbus
>>>>>>>>>>> node and 3 supervisors) will successfully run in local mode my particular
>>>>>>>>>>> topology. However, as soon as I try to run the topology in distributed mode
>>>>>>>>>>> there are no errors (that I can see) in any of the logs - the topology
>>>>>>>>>>> appears in the UI but never shows any tuples emitted by the spouts.
>>>>>>>>>>>
>>>>>>>>>>> In order to check it wasn't my topology I cloned the
>>>>>>>>>>> storm-starter project and have tried with the RollingTopWords example -
>>>>>>>>>>> running locally I get a stream of output but as soon as I specify that we
>>>>>>>>>>> should run in a distributed fashion I get a success message (topology
>>>>>>>>>>> successfully submitted). The topology appears on the UI as successfully
>>>>>>>>>>> submitted but I see no progress.
>>>>>>>>>>>
>>>>>>>>>>> It's a bit of a weird one as I don't know which logs to share to
>>>>>>>>>>> show the problem as the whole point is I'm not seeing any errors!
>>>>>>>>>>>
>>>>>>>>>>> The last entry in the supervisor logs pertains to the set up of
>>>>>>>>>>> the supervisor (i.e. it's not logged anything about the topologies that
>>>>>>>>>>> have been submitted)
>>>>>>>>>>>
>>>>>>>>>>> The last entries in the nimbus logs are probably a bit more
>>>>>>>>>>> illuminating:
>>>>>>>>>>>
>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>> production-topology-2-1428932653:[2 2] not alive
>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>> production-topology-2-1428932653:[3 3] not alive
>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>> production-topology-2-1428932653:[4 4] not alive
>>>>>>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
>>>>>>>>>>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>>>>>>>>> production-topology-2-1428932653 to 1 slots
>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2
>>>>>>>>>>> 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13
>>>>>>>>>>> 13] [14 14] [15 15] [1 1]]
>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment
>>>>>>>>>>> for topology id production-topology-2-1428932653:
>>>>>>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>>>>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
>>>>>>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"},
>>>>>>>>>>> :executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 6702],
>>>>>>>>>>> [3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
>>>>>>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
>>>>>>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>>>>>>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
>>>>>>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>>>>>>>>
>>>>>>>>>>> If anybody can help that'd be greatly appreciated!
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Matt
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Check out my blog: dogdogfish.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Check out my blog: dogdogfish.com
>>>>
>>>
>>>
>>
>>
>> --
>> Check out my blog: dogdogfish.com
>>
>
>


-- 
Check out my blog: dogdogfish.com

Re: Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Posted by Nathan Leung <nc...@gmail.com>.
That looks about right.  Are you running one worker per supervisor right
now?

On Mon, Apr 13, 2015 at 1:33 PM, Matthew Sharpe <ma...@gmail.com>
wrote:

> I was just thinking about that - I'm using EC2 m3.medium instances,
> they're not especially RAM heavy (1GB). Am I right in thinking that to
> enable GC logging I'd just need to run Storm with the following options in
> my 'storm.yaml'?
>
> worker.childopts: "-Xmx768m -XX:MaxPermSize=256m -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -verbose:gc
> -Xloggc:/app/storm/logs/gc-storm-worker-%ID%.log"
> supervisor.childopts: "-XX:MaxPermSize=256m -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -verbose:gc
> -Xloggc:/app/storm/logs/gc-storm-supervisor.log"
> If so, I'll set up the same cluster but using m3.xlarge instances (15GB of
> RAM) with those settings. Either way, I'll start setting up the extra large
> nodes.
>
>
> On 13 April 2015 at 18:10, Nathan Leung <nc...@gmail.com> wrote:
>
>> Do you have GC logging enabled?  You should have enough ram for each
>> worker (for this example) that it's not a problem, but doesn't hurt to
>> check.
>>
>> On Mon, Apr 13, 2015 at 12:12 PM, Matthew Sharpe <
>> matthewsharpe3@gmail.com> wrote:
>>
>>> Not long at all - certainly less than a couple of minutes but longer than
>>> a few seconds; the couple I actually measured lasted 42 seconds followed by
>>> about 5 seconds. So the heartbeats folder behaves like so...
>>>
>>> /app/storm/workers/[uuid] is created with an empty heartbeats folder
>>> inside it
>>> /app/storm/workers/[uuid]/heartbeats has two timestamps added (a couple
>>> of examples:
>>>
>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941026297.version
>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941026297
>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941027240.version
>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941027240
>>>
>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941398235.version
>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941398235
>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941399163.version
>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941399163
>>> )
>>>
>>> /app/storm/workers/[uuid] is deleted.
>>>
>>> I've not seen any heartbeats folder with any more than 2 heartbeats in.
>>>
>>>
>>>
>>> On 13 April 2015 at 17:00, Nathan Leung <nc...@gmail.com> wrote:
>>>
>>>> It only matters what user supervisor is run as, so you should be ok.
>>>> How long is it between the worker being started, and it getting killed?
>>>> What are the timestamps of the heartbeats in the corresponding worker
>>>> folder?
>>>>
>>>> On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe <
>>>> matthewsharpe3@gmail.com> wrote:
>>>>
>>>>> I'm running storm as a (specially created) storm user - on both the
>>>>> nimbus and the supervisors the process is showing as being run by the storm
>>>>> user. And I've got the storm local directory set as /app/storm - owned by
>>>>> storm user and with permissions 700. Thought it might be due to me
>>>>> submitting the job as a different user but have just terminated the old job
>>>>> and submitted the new one as the storm user and still not getting any
>>>>> tuples coming through.
>>>>>
>>>>> For what it's worth, I've looked in the local data directory on the
>>>>> supervisors and in the workers folder there is a (non-empty) heartbeats
>>>>> folder being created - don't know if that's indicative of heartbeats being
>>>>> processed correctly or if that's created at the start of a worker process.
>>>>>
>>>>> Is there anything else you want to see that would help you diagnose
>>>>> this? I can share the UI with you if you think that'd help?
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>> On 13 April 2015 at 16:22, Nathan Leung <nc...@gmail.com> wrote:
>>>>>
>>>>>> Sounds like it's having problems with heartbeats.  As I recall worker
>>>>>> -> supervisor heartbeats are written using the local filesystem.  What user
>>>>>> are you running as?  In your storm.yaml file, you will see the option
>>>>>> "storm.local.dir".  Does the user supervisor / workers are running as have
>>>>>> permissions to write to this folder?
>>>>>>
>>>>>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <
>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>
>>>>>>> New bonus bit of information - the supervisor log has got this going
>>>>>>> on:
>>>>>>>
>>>>>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
>>>>>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and
>>>>>>> clearing state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current
>>>>>>> supervisor time: 1428938060. State: :timed-out, Heartbeat:
>>>>>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
>>>>>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] [4
>>>>>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>>>>>>> [15 15] [-1 -1] [1 1]}, :port 6703}
>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill 5284.
>>>>>>> Process is probably already dead.
>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>>>> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>>>>>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 5] [6
>>>>>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
>>>>>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 6703
>>>>>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>>>> command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>>>>>>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>> -Dlogfile.name=worker-6703.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
>>>>>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>>>>>>> -Dstorm.id=production-topology-2-1428932653
>>>>>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp
>>>>>>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
>>>>>>> backtype.storm.daemon.worker production-topology-2-1428932653
>>>>>>> bd340179-5905-4233-ad5d-44b47b228177 6703
>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
>>>>>>>
>>>>>>> On 13 April 2015 at 16:09, Matthew Sharpe <ma...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> OK, in checking the worker process logs I've uncovered a bit more
>>>>>>>> information. The UI reports 0 emitted from the wordGenerator spout, along
>>>>>>>> with 0 transferred, acked and failed.
>>>>>>>>
>>>>>>>> Sorry - the *.compute.amazonaws.com is redacted (I don't know the
>>>>>>>> usual way of doing that). I've got a bunch of other stuff on the boxes, and
>>>>>>>> in troubleshooting I've been a bit laxer with security than I
>>>>>>>> should have been!
>>>>>>>>
>>>>>>>> The UI shows 5 workers all starting on the same node - each of
>>>>>>>> which has a declared uptime of 1s. Looking at the logs on the node, during
>>>>>>>> that time the following happens:
>>>>>>>>
>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>> environment:java.io.tmpdir=/tmp
>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>> environment:java.compiler=<NA>
>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>> environment:os.name=Linux
>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>> environment:os.arch=amd64
>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>> environment:os.version=3.13.0-48-generic
>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>> environment:user.name=storm
>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>> environment:user.home=/app/home/storm
>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>> environment:user.dir=/
>>>>>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>>>>>>>> production-topology-2-1428932653 on
>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>>>>>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
>>>>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>>>>>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>>>>>>> "topology.fall.back.on.java.serialization" true,
>>>>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
>>>>>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
>>>>>>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
>>>>>>>> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
>>>>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>>>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>>>>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
>>>>>>>> "supervisor.worker.start.timeout.secs" 120,
>>>>>>>> "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
>>>>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
>>>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>>>>>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>>>>>>>> "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
>>>>>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>>>>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>>>>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>>>>>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.
>>>>>>>> compute.amazonaws.com"], "transactional.zookeeper.root"
>>>>>>>> "/transactional", "topology.acker.executors" nil,
>>>>>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
>>>>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>>>>>>>> -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
>>>>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
>>>>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>>>>>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
>>>>>>>> "topology.tasks" nil, "topology.spout.wait.strategy"
>>>>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
>>>>>>>> nil, "storm.zookeeper.retry.interval" 1000, "
>>>>>>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>>>>>>> "nimbus.topology.validator"
>>>>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
>>>>>>>> [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
>>>>>>>> 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
>>>>>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>>>>>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>>>>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>>>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>>>>>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>>>>>>> "topology.disruptor.wait.strategy"
>>>>>>>> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
>>>>>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>>>>>>>> "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
>>>>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
>>>>>>>> 5, "storm.thrift.transport"
>>>>>>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>>>>>>> "topology.state.synchronization.timeout.secs" 60,
>>>>>>>> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
>>>>>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>>>>>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
>>>>>>>> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
>>>>>>>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>>>>>>>> "topology.max.task.parallelism" nil}
>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@585bd6a6
>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>>>>>> 0x34cb217be05005e, negotiated timeout = 20000
>>>>>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
>>>>>>>> :connected:none
>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session:
>>>>>>>> 0x34cb217be05005e closed
>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm
>>>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@4b492c47
>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>>>>>> 0x24cb217be030072, negotiated timeout = 20000
>>>>>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer
>>>>>>>> transport plugin:backtype.storm.messaging.zmq
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>> counter:[2 2]
>>>>>>>>
>>>>>>>>
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>> wordGenerator:[11 11]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>>>> wordGenerator:[11 11]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>> wordGenerator:(11)
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>> wordGenerator:(11)
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>> wordGenerator:(11)
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>> wordGenerator:[12 12]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>> __system ["startup"]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>> wordGenerator:[12 12]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>>>> wordGenerator:[12 12]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>> wordGenerator:(12)
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>> wordGenerator:(12)
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>> wordGenerator:(12)
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>> wordGenerator:[13 13]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>> __system ["startup"]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>> wordGenerator:[13 13]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>>>> wordGenerator:[13 13]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>> wordGenerator:(13)
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>> wordGenerator:(13)
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>> wordGenerator:(13)
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>> wordGenerator:[14 14]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>> __system ["startup"]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>> wordGenerator:[14 14]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>>>> wordGenerator:[14 14]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>> default [bertels]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>> wordGenerator:(14)
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>> wordGenerator:(14)
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>> wordGenerator:(14)
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [bertels]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>> default [mike]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [mike]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>> default [bertels]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [bertels]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>> wordGenerator:[15 15]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>> __system ["startup"]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>> wordGenerator:[15 15]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>>>> wordGenerator:[15 15]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>> wordGenerator:(15)
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>> wordGenerator:(15)
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>> wordGenerator:(15)
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>> __system:[-1 -1]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system
>>>>>>>> ["startup"]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>> __system:[-1 -1]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>>>> __system:[-1 -1]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt
>>>>>>>> __system:(-1)
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>> __acker:[1 1]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>> default [jackson]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>> default [mike]
>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [jackson]
>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>> message source: wordGenerator:14, stream: default, id: {}, [mike]
>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>> default [jackson]
>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [jackson]
>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>> default [golda]
>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [golda]
>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt
>>>>>>>> __system:(-1)
>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system
>>>>>>>> ["startup"]
>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>> __acker:[1 1]
>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>> default [jackson]
>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading executor
>>>>>>>> __acker:[1 1]
>>>>>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread
>>>>>>>> for bd340179-5905-4233-ad5d-44b47b228177:6702
>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt __acker:(1)
>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __acker:(1)
>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>> default [nathan]
>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>> default [bertels]
>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>> default [golda]
>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [golda]
>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [nathan]
>>>>>>>>
>>>>>>>> After that, the logs stop (no more messages) and the UI says that
>>>>>>>> the 5 executors are now running on another node (and port) - at which point
>>>>>>>> the worker log is created on that node and with that number (same
>>>>>>>> messages).
>>>>>>>>
>>>>>>>> So the worker seems to start and begins emitting words, but the UI
>>>>>>>> doesn't register anything as having been emitted. After a short period
>>>>>>>> of time the worker seems to stop without logging anything, and so a new
>>>>>>>> worker is started.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Matt
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 13 April 2015 at 15:36, Nathan Leung <nc...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Did you check the worker process logs?  What are the emit counts
>>>>>>>>> for the components in the UI?
>>>>>>>>>
>>>>>>>>> Also I have not run on AWS personally but the following line seems
>>>>>>>>> strange to me:
>>>>>>>>>
>>>>>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>>>> compute.amazonaws.com"}
>>>>>>>>>
>>>>>>>>> Generally this would be bound to a specific machine, not
>>>>>>>>> *.<domain>.
>>>>>>>>>
>>>>>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I've set up a small Storm cluster on AWS that is giving me
>>>>>>>>>> headaches. At the moment I've got a situation where the cluster (a nimbus
>>>>>>>>>> node and 3 supervisors) will successfully run in local mode my particular
>>>>>>>>>> topology. However, as soon as I try to run the topology in distributed mode
>>>>>>>>>> there are no errors (that I can see) in any of the logs - the topology
>>>>>>>>>> appears in the UI but never shows any tuples emitted by the spouts.
>>>>>>>>>>
>>>>>>>>>> In order to check it wasn't my topology I cloned the
>>>>>>>>>> storm-starter project and have tried with the RollingTopWords example -
>>>>>>>>>> running locally I get a stream of output but as soon as I specify that we
>>>>>>>>>> should run in a distributed fashion I get a success message (topology
>>>>>>>>>> successfully submitted). The topology appears on the UI as successfully
>>>>>>>>>> submitted but I see no progress.
>>>>>>>>>>
>>>>>>>>>> It's a bit of a weird one as I don't know which logs to share to
>>>>>>>>>> show the problem as the whole point is I'm not seeing any errors!
>>>>>>>>>>
>>>>>>>>>> The last entry in the supervisor logs pertains to the set up of
>>>>>>>>>> the supervisor (i.e. it's not logged anything about the topologies that
>>>>>>>>>> have been submitted)
>>>>>>>>>>
>>>>>>>>>> The last entries in the nimbus logs are probably a bit more
>>>>>>>>>> illuminating:
>>>>>>>>>>
>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>> production-topology-2-1428932653:[2 2] not alive
>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>> production-topology-2-1428932653:[3 3] not alive
>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>> production-topology-2-1428932653:[4 4] not alive
>>>>>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
>>>>>>>>>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>>>>>>>> production-topology-2-1428932653 to 1 slots
>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2
>>>>>>>>>> 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13
>>>>>>>>>> 13] [14 14] [15 15] [1 1]]
>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment
>>>>>>>>>> for topology id production-topology-2-1428932653:
>>>>>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>>>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
>>>>>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"},
>>>>>>>>>> :executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 6702],
>>>>>>>>>> [3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
>>>>>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
>>>>>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>>>>>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
>>>>>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>>>>>>>
>>>>>>>>>> If anybody can help that'd be greatly appreciated!
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Matt
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Check out my blog: dogdogfish.com
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Check out my blog: dogdogfish.com
>>>
>>
>>
>
>
> --
> Check out my blog: dogdogfish.com
>

Re: Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Posted by Matthew Sharpe <ma...@gmail.com>.
I was just thinking about that - I'm using EC2 m3.medium instances, which
aren't especially RAM-heavy (1GB). Am I right in thinking that to enable GC
logging I'd just need to run Storm with the following options in my
'storm.yaml'?

worker.childopts: "-Xmx768m -XX:MaxPermSize=256m -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -verbose:gc
-Xloggc:/app/storm/logs/gc-storm-worker-%ID%.log"
supervisor.childopts: "-XX:MaxPermSize=256m -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -verbose:gc
-Xloggc:/app/storm/logs/gc-storm-supervisor.log"
If so, I'll set up the same cluster but using m3.xlarge instances (15GB of
RAM) with those settings. Either way, I'll start setting up the extra-large
nodes.
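
Once the workers restart I'll sanity-check that the flags actually took
effect - roughly something like this (only a sketch: it assumes the %ID%
placeholder gets expanded to the worker port, e.g. 6703, and that
/app/storm/logs is writable by the storm user):

# Confirm the worker JVM was launched with the GC options:
ps -u storm -o pid,args | grep '[b]acktype.storm.daemon.worker'

# The GC logs should then appear and keep growing under /app/storm/logs:
ls -l /app/storm/logs/gc-storm-worker-*.log
tail -f /app/storm/logs/gc-storm-worker-6703.log   # watch for long or repeated Full GC pauses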


On 13 April 2015 at 18:10, Nathan Leung <nc...@gmail.com> wrote:

> Do you have GC logging enabled?  You should have enough RAM for each
> worker (for this example) that it shouldn't be a problem, but it doesn't
> hurt to check.
>
> On Mon, Apr 13, 2015 at 12:12 PM, Matthew Sharpe <matthewsharpe3@gmail.com
> > wrote:
>
>> Not long at all - certainly less than a couple of minutes but longer than
>> a few seconds; the two I actually measured lasted 42 seconds and about 5
>> seconds respectively. So the heartbeats folder behaves like so...
>>
>> /app/storm/workers/[uuid] is created with an empty heartbeats folder
>> inside it
>> /app/storm/workers/[uuid]/heartbeats has two timestamps added (a couple
>> of examples:
>>
>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941026297.version
>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941026297
>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941027240.version
>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941027240
>>
>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941398235.version
>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941398235
>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941399163.version
>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941399163
>> )
>>
>> /app/storm/workers/[uuid] is deleted.
>>
>> I've not seen any heartbeats folder with any more than 2 heartbeats in.
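
A rough way to watch this live (just a sketch, assuming storm.local.dir is
/app/storm as above) is to poll the heartbeat files and compare the gap
against the 30-second supervisor.worker.timeout.secs from the config dump:

# If no new heartbeat file appears for more than ~30s, the supervisor should
# mark that worker :timed-out (as in the supervisor log further down).
watch -n 1 'ls -l --full-time /app/storm/workers/*/heartbeats 2>/dev/null | tail -n 6'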
>>
>>
>>
>> On 13 April 2015 at 17:00, Nathan Leung <nc...@gmail.com> wrote:
>>
>>> It only matters what user the supervisor is run as, so you should be ok.
>>> How long is it between the worker being started and it getting killed?
>>> What are the timestamps of the heartbeats in the corresponding worker
>>> folder?
>>>
>>> On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe <
>>> matthewsharpe3@gmail.com> wrote:
>>>
>>>> I'm running Storm as a (specially created) storm user - on both the
>>>> nimbus and the supervisors the process shows as being run by the storm
>>>> user. And I've got the Storm local directory set as /app/storm, owned by
>>>> the storm user with permissions 700. I thought it might be because I was
>>>> submitting the job as a different user, but I've just terminated the old
>>>> job and submitted the new one as the storm user and I'm still not getting
>>>> any tuples coming through.
>>>>
>>>> For what it's worth, I've looked in the local data directory on the
>>>> supervisors, and in the workers folder a (non-empty) heartbeats folder is
>>>> being created - I don't know whether that indicates heartbeats are being
>>>> processed correctly or whether it's just created at the start of a worker
>>>> process.
>>>>
>>>> Is there anything else you want to see that would help you diagnose
>>>> this? I can share the UI with you if you think that'd help?
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> On 13 April 2015 at 16:22, Nathan Leung <nc...@gmail.com> wrote:
>>>>
>>>>> Sounds like it's having problems with heartbeats.  As I recall, worker
>>>>> -> supervisor heartbeats are written via the local filesystem.  What user
>>>>> are you running as?  In your storm.yaml file, you will see the option
>>>>> "storm.local.dir".  Does the user the supervisor / workers are running as
>>>>> have permission to write to this folder?
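
A minimal check along those lines - only a sketch, assuming storm.local.dir
is /app/storm and a dedicated storm user as described elsewhere in the thread:

# Who owns the local dir, and can the storm user actually write into it?
ls -ld /app/storm /app/storm/workers
sudo -u storm touch /app/storm/workers/.perm-test \
  && echo 'storm user can write here' \
  && sudo -u storm rm /app/storm/workers/.perm-test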
>>>>>
>>>>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <
>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>
>>>>>> New bonus bit of information - the supervisor log has got this going
>>>>>> on:
>>>>>>
>>>>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
>>>>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and
>>>>>> clearing state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current
>>>>>> supervisor time: 1428938060. State: :timed-out, Heartbeat:
>>>>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
>>>>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] [4
>>>>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>>>>>> [15 15] [-1 -1] [1 1]}, :port 6703}
>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill 5284.
>>>>>> Process is probably already dead.
>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>>> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>>>>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 5] [6
>>>>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
>>>>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 6703
>>>>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>>> command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>>>>>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>> -Dlogfile.name=worker-6703.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
>>>>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>>>>>> -Dstorm.id=production-topology-2-1428932653
>>>>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp
>>>>>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
>>>>>> backtype.storm.daemon.worker production-topology-2-1428932653
>>>>>> bd340179-5905-4233-ad5d-44b47b228177 6703
>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
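
Reading the numbers in that log (just my arithmetic, not a diagnosis): the
worker's last heartbeat is 31 seconds older than the supervisor's clock,
which is just past the 30-second supervisor.worker.timeout.secs in the config
dump, so the :timed-out shutdown is at least internally consistent:

echo $((1428938060 - 1428938029))   # => 31, i.e. just over the 30s worker timeout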
>>>>>>
>>>>>> On 13 April 2015 at 16:09, Matthew Sharpe <ma...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> OK, in checking the worker process logs I've uncovered a bit more
>>>>>>> information. The UI reports 0 emitted from the wordGenerator spout, along
>>>>>>> with 0 transferred, acked and failed.
>>>>>>>
>>>>>>> Sorry - the *.compute.amazonaws.com is redacted (I don't know the
>>>>>>> usual way of doing that); I've got a bunch of other stuff on the boxes,
>>>>>>> and while troubleshooting I've been a bit laxer with security than I
>>>>>>> should have been!
>>>>>>>
>>>>>>> The UI shows 5 workers all starting on the same node - each of which
>>>>>>> has a declared uptime of 1s. Looking at the logs on the node, during that
>>>>>>> time the following happens:
>>>>>>>
>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>> environment:java.io.tmpdir=/tmp
>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>> environment:java.compiler=<NA>
>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>> environment:os.name=Linux
>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>> environment:os.arch=amd64
>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>> environment:os.version=3.13.0-48-generic
>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>> environment:user.name=storm
>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>> environment:user.home=/app/home/storm
>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>> environment:user.dir=/
>>>>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>>>>>>> production-topology-2-1428932653 on
>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>>>>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
>>>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>>>>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>>>>>> "topology.fall.back.on.java.serialization" true,
>>>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
>>>>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
>>>>>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
>>>>>>> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
>>>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>>>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
>>>>>>> "supervisor.worker.start.timeout.secs" 120,
>>>>>>> "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
>>>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
>>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>>>>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>>>>>>> "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
>>>>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>>>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>>>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>>>>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.
>>>>>>> compute.amazonaws.com"], "transactional.zookeeper.root"
>>>>>>> "/transactional", "topology.acker.executors" nil,
>>>>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
>>>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>>>>>>> -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
>>>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
>>>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>>>>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
>>>>>>> "topology.tasks" nil, "topology.spout.wait.strategy"
>>>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
>>>>>>> nil, "storm.zookeeper.retry.interval" 1000, "
>>>>>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>>>>>> "nimbus.topology.validator"
>>>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
>>>>>>> [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
>>>>>>> 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
>>>>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>>>>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>>>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>>>>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>>>>>> "topology.disruptor.wait.strategy"
>>>>>>> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
>>>>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>>>>>>> "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
>>>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
>>>>>>> 5, "storm.thrift.transport"
>>>>>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>>>>>> "topology.state.synchronization.timeout.secs" 60,
>>>>>>> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
>>>>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>>>>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
>>>>>>> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
>>>>>>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>>>>>>> "topology.max.task.parallelism" nil}
>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@585bd6a6
>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>>>>> 0x34cb217be05005e, negotiated timeout = 20000
>>>>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
>>>>>>> :connected:none
>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session:
>>>>>>> 0x34cb217be05005e closed
>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm
>>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@4b492c47
>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>>>>> 0x24cb217be030072, negotiated timeout = 20000
>>>>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer
>>>>>>> transport plugin:backtype.storm.messaging.zmq
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>> counter:[2 2]
>>>>>>>
>>>>>>>
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>> wordGenerator:[11 11]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>>> wordGenerator:[11 11]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>> wordGenerator:(11)
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>> wordGenerator:(11)
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>> wordGenerator:(11)
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>> wordGenerator:[12 12]
>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>> __system ["startup"]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>> wordGenerator:[12 12]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>>> wordGenerator:[12 12]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>> wordGenerator:(12)
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>> wordGenerator:(12)
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>> wordGenerator:(12)
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>> wordGenerator:[13 13]
>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>> __system ["startup"]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>> wordGenerator:[13 13]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>>> wordGenerator:[13 13]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>> wordGenerator:(13)
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>> wordGenerator:(13)
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>> wordGenerator:(13)
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>> wordGenerator:[14 14]
>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>> __system ["startup"]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>> wordGenerator:[14 14]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>>> wordGenerator:[14 14]
>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>> default [bertels]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>> wordGenerator:(14)
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>> wordGenerator:(14)
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>> wordGenerator:(14)
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [bertels]
>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>> default [mike]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [mike]
>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>> default [bertels]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [bertels]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>> wordGenerator:[15 15]
>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>> __system ["startup"]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>> wordGenerator:[15 15]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>>> wordGenerator:[15 15]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>> wordGenerator:(15)
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>> wordGenerator:(15)
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>> wordGenerator:(15)
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>> __system:[-1 -1]
>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system
>>>>>>> ["startup"]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>> __system:[-1 -1]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>>> __system:[-1 -1]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt
>>>>>>> __system:(-1)
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>> __acker:[1 1]
>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>> default [jackson]
>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>> default [mike]
>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [jackson]
>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>> message source: wordGenerator:14, stream: default, id: {}, [mike]
>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>> default [jackson]
>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [jackson]
>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>> default [golda]
>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [golda]
>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __system:(-1)
>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system
>>>>>>> ["startup"]
>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>> __acker:[1 1]
>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>> default [jackson]
>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading executor
>>>>>>> __acker:[1 1]
>>>>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread for
>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702
>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt __acker:(1)
>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __acker:(1)
>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>> default [nathan]
>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>> default [bertels]
>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>> default [golda]
>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [golda]
>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [nathan]
>>>>>>>
>>>>>>> After that, the logs stop (no more messages) and the UI says that
>>>>>>> the 5 executors are now running on another node (and port) - at which point
>>>>>>> the worker log is created on that node and with that number (same
>>>>>>> messages).
>>>>>>>
>>>>>>> So the worker seems to start and begins emitting words, but the UI
>>>>>>> doesn't register anything as having been emitted. After a short period
>>>>>>> of time the worker seems to stop without logging anything, and so a new
>>>>>>> worker is started.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Matt
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 13 April 2015 at 15:36, Nathan Leung <nc...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Did you check the worker process logs?  What are the emit counts
>>>>>>>> for the components in the UI?
>>>>>>>>
>>>>>>>> Also I have not run on AWS personally but the following line seems
>>>>>>>> strange to me:
>>>>>>>>
>>>>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>>> compute.amazonaws.com"}
>>>>>>>>
>>>>>>>> Generally this would be bound to a specific machine, not *.<domain>.
>>>>>>>>
>>>>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I've set up a small Storm cluster on AWS that is giving me
>>>>>>>>> headaches. At the moment I've got a situation where the cluster (a nimbus
>>>>>>>>> node and 3 supervisors) will successfully run in local mode my particular
>>>>>>>>> topology. However, as soon as I try to run the topology in distributed mode
>>>>>>>>> there are no errors (that I can see) in any of the logs - the topology
>>>>>>>>> appears in the UI but never shows any tuples emitted by the spouts.
>>>>>>>>>
>>>>>>>>> In order to check it wasn't my topology I cloned the storm-starter
>>>>>>>>> project and have tried with the RollingTopWords example - running locally I
>>>>>>>>> get a stream of output but as soon as I specify that we should run in a
>>>>>>>>> distributed fashion I get a success message (topology successfully
>>>>>>>>> submitted). The topology appears on the UI as successfully submitted but I
>>>>>>>>> see no progress.
>>>>>>>>>
>>>>>>>>> It's a bit of a weird one as I don't know which logs to share to
>>>>>>>>> show the problem as the whole point is I'm not seeing any errors!
>>>>>>>>>
>>>>>>>>> The last entry in the supervisor logs pertains to the set up of
>>>>>>>>> the supervisor (i.e. it's not logged anything about the topologies that
>>>>>>>>> have been submitted)
>>>>>>>>>
>>>>>>>>> The last entries in the nimbus logs are probably a bit more
>>>>>>>>> illuminating:
>>>>>>>>>
>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>> production-topology-2-1428932653:[2 2] not alive
>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>> production-topology-2-1428932653:[3 3] not alive
>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>> production-topology-2-1428932653:[4 4] not alive
>>>>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
>>>>>>>>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>>>>>>> production-topology-2-1428932653 to 1 slots
>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2 2]
>>>>>>>>> [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13]
>>>>>>>>> [14 14] [15 15] [1 1]]
>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment for
>>>>>>>>> topology id production-topology-2-1428932653:
>>>>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
>>>>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"},
>>>>>>>>> :executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 6702],
>>>>>>>>> [3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
>>>>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
>>>>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>>>>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
>>>>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>>>>>>
>>>>>>>>> If anybody can help that'd be greatly appreciated!
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Matt
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Check out my blog: dogdogfish.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Check out my blog: dogdogfish.com
>>>>
>>>
>>>
>>
>>
>> --
>> Check out my blog: dogdogfish.com
>>
>
>


-- 
Check out my blog: dogdogfish.com

Re: Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Posted by Nathan Leung <nc...@gmail.com>.
Do you have GC logging enabled?  You should have enough RAM for each worker
(for this example) that it shouldn't be a problem, but it doesn't hurt to check.
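
If you want a quick look without redeploying anything, something like jstat
against the worker PID would do it (just a sketch - jstat ships with the JDK,
and <worker-pid> is a placeholder for whatever ps shows for the
backtype.storm.daemon.worker process):

# Sample GC utilisation and pause counts for a running worker JVM every second:
jstat -gcutil <worker-pid> 1000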

On Mon, Apr 13, 2015 at 12:12 PM, Matthew Sharpe <ma...@gmail.com>
wrote:

> Not long at all - certainly less than a couple of minutes but longer than
> a few seconds; the two I actually measured lasted 42 seconds and about 5
> seconds respectively. So the heartbeats folder behaves like so...
>
> /app/storm/workers/[uuid] is created with an empty heartbeats folder
> inside it
> /app/storm/workers/[uuid]/heartbeats has two timestamps added (a couple of
> examples:
>
> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941026297.version
> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941026297
> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941027240.version
> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941027240
>
> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941398235.version
> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941398235
> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941399163.version
> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941399163
> )
>
> /app/storm/workers/[uuid] is deleted.
>
> I've not seen any heartbeats folder with any more than 2 heartbeats in.
>
>
>
> On 13 April 2015 at 17:00, Nathan Leung <nc...@gmail.com> wrote:
>
>> It only matters what user the supervisor is run as, so you should be ok.
>> How long is it between the worker being started and it getting killed?
>> What are the timestamps of the heartbeats in the corresponding worker folder?
>>
>> On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe <
>> matthewsharpe3@gmail.com> wrote:
>>
>>> I'm running Storm as a (specially created) storm user - on both the
>>> nimbus and the supervisors the process shows as being run by the storm
>>> user. And I've got the Storm local directory set as /app/storm, owned by
>>> the storm user with permissions 700. I thought it might be because I was
>>> submitting the job as a different user, but I've just terminated the old
>>> job and submitted the new one as the storm user and I'm still not getting
>>> any tuples coming through.
>>>
>>> For what it's worth, I've looked in the local data directory on the
>>> supervisors, and in the workers folder a (non-empty) heartbeats folder is
>>> being created - I don't know whether that indicates heartbeats are being
>>> processed correctly or whether it's just created at the start of a worker
>>> process.
>>>
>>> Is there anything else you want to see that would help you diagnose
>>> this? I can share the UI with you if you think that'd help?
>>>
>>> Thanks!
>>>
>>>
>>> On 13 April 2015 at 16:22, Nathan Leung <nc...@gmail.com> wrote:
>>>
>>>> Sounds like it's having problems with heartbeats.  As I recall, worker
>>>> -> supervisor heartbeats are written via the local filesystem.  What user
>>>> are you running as?  In your storm.yaml file, you will see the option
>>>> "storm.local.dir".  Does the user the supervisor / workers are running as
>>>> have permission to write to this folder?
>>>>
>>>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <
>>>> matthewsharpe3@gmail.com> wrote:
>>>>
>>>>> New bonus bit of information - the supervisor log has got this going
>>>>> on:
>>>>>
>>>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
>>>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and clearing
>>>>> state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current supervisor time:
>>>>> 1428938060. State: :timed-out, Heartbeat:
>>>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
>>>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] [4
>>>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>>>>> [15 15] [-1 -1] [1 1]}, :port 6703}
>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill 5284.
>>>>> Process is probably already dead.
>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>>>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 5] [6
>>>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
>>>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 6703
>>>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>> command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>>>>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>> -Dlogfile.name=worker-6703.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
>>>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>>>>> -Dstorm.id=production-topology-2-1428932653
>>>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp
>>>>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
>>>>> backtype.storm.daemon.worker production-topology-2-1428932653
>>>>> bd340179-5905-4233-ad5d-44b47b228177 6703
>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
>>>>>
>>>>> On 13 April 2015 at 16:09, Matthew Sharpe <ma...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> OK, in checking the worker process logs I've uncovered a bit more
>>>>>> information. The UI reports 0 emitted from the wordGenerator spout, along
>>>>>> with 0 transferred, acked and failed.
>>>>>>
>>>>>> Sorry - the *.compute.amazonaws.com is redacted (I don't know the
>>>>>> usual way of doing that); I've got a bunch of other stuff on the boxes,
>>>>>> and while troubleshooting I've been a bit laxer with security than I
>>>>>> should have been!
>>>>>>
>>>>>> The UI shows 5 workers all starting on the same node - each of which
>>>>>> has a declared uptime of 1s. Looking at the logs on the node, during that
>>>>>> time the following happens:
>>>>>>
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>> environment:java.io.tmpdir=/tmp
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>> environment:java.compiler=<NA>
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
>>>>>> os.name=Linux
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>> environment:os.arch=amd64
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>> environment:os.version=3.13.0-48-generic
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
>>>>>> user.name=storm
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>> environment:user.home=/app/home/storm
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>> environment:user.dir=/
>>>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>>>>>> production-topology-2-1428932653 on
>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>>>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
>>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>>>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>>>>> "topology.fall.back.on.java.serialization" true,
>>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
>>>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
>>>>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
>>>>>> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
>>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
>>>>>> "supervisor.worker.start.timeout.secs" 120,
>>>>>> "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
>>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>>>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>>>>>> "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
>>>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>>>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.
>>>>>> compute.amazonaws.com"], "transactional.zookeeper.root"
>>>>>> "/transactional", "topology.acker.executors" nil,
>>>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
>>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>>>>>> -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
>>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
>>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>>>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
>>>>>> "topology.tasks" nil, "topology.spout.wait.strategy"
>>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
>>>>>> nil, "storm.zookeeper.retry.interval" 1000, "
>>>>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>>>>> "nimbus.topology.validator"
>>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
>>>>>> [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
>>>>>> 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
>>>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>>>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>>>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>>>>> "topology.disruptor.wait.strategy"
>>>>>> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
>>>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>>>>>> "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
>>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
>>>>>> 5, "storm.thrift.transport"
>>>>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>>>>> "topology.state.synchronization.timeout.secs" 60,
>>>>>> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
>>>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>>>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
>>>>>> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
>>>>>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>>>>>> "topology.max.task.parallelism" nil}
>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@585bd6a6
>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection
>>>>>> to server *.compute.amazonaws.com/*:2181
>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>>>> 0x34cb217be05005e, negotiated timeout = 20000
>>>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
>>>>>> :connected:none
>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session: 0x34cb217be05005e
>>>>>> closed
>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm
>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@4b492c47
>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection
>>>>>> to server *.compute.amazonaws.com/*:2181
>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>>>> 0x24cb217be030072, negotiated timeout = 20000
>>>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer
>>>>>> transport plugin:backtype.storm.messaging.zmq
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor counter:[2
>>>>>> 2]
>>>>>>
>>>>>>
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>> wordGenerator:[11 11]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>> wordGenerator:[11 11]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>> wordGenerator:(11)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>> wordGenerator:(11)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>> wordGenerator:(11)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>> wordGenerator:[12 12]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>> __system ["startup"]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>> wordGenerator:[12 12]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>> wordGenerator:[12 12]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>> wordGenerator:(12)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>> wordGenerator:(12)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>> wordGenerator:(12)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>> wordGenerator:[13 13]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>> __system ["startup"]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>> wordGenerator:[13 13]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>> wordGenerator:[13 13]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>> wordGenerator:(13)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>> wordGenerator:(13)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>> wordGenerator:(13)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>> wordGenerator:[14 14]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>> __system ["startup"]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>> wordGenerator:[14 14]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>> wordGenerator:[14 14]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [bertels]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>> wordGenerator:(14)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>> wordGenerator:(14)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>> wordGenerator:(14)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:11, stream: default, id: {}, [bertels]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [mike]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:12, stream: default, id: {}, [mike]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [bertels]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:13, stream: default, id: {}, [bertels]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>> wordGenerator:[15 15]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>> __system ["startup"]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>> wordGenerator:[15 15]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>> wordGenerator:[15 15]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>> wordGenerator:(15)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>> wordGenerator:(15)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>> wordGenerator:(15)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>> __system:[-1 -1]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system
>>>>>> ["startup"]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>> __system:[-1 -1]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>> __system:[-1 -1]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt __system:(-1)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __acker:[1
>>>>>> 1]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [jackson]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [mike]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:11, stream: default, id: {}, [jackson]
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:14, stream: default, id: {}, [mike]
>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [jackson]
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:12, stream: default, id: {}, [jackson]
>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [golda]
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:13, stream: default, id: {}, [golda]
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __system:(-1)
>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system
>>>>>> ["startup"]
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks
>>>>>> __acker:[1 1]
>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [jackson]
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading executor
>>>>>> __acker:[1 1]
>>>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread for
>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt __acker:(1)
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __acker:(1)
>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [nathan]
>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [bertels]
>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [golda]
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:11, stream: default, id: {}, [golda]
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:12, stream: default, id: {}, [nathan]
>>>>>>
>>>>>> After that, the logs stop (no more messages) and the UI says that the
>>>>>> 5 executors are now running on another node (and port) - at which point the
>>>>>> worker log is created on that node and with that number (same messages).
>>>>>>
>>>>>> So the worker seems to start and start emitting words but the UI
>>>>>> isn't registered anything as having been emitted. After a short period of
>>>>>> time the worker seems to stop without logging anything and so a new worker
>>>>>> is started.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 13 April 2015 at 15:36, Nathan Leung <nc...@gmail.com> wrote:
>>>>>>
>>>>>>> Did you check the worker process logs?  What are the emit counts for
>>>>>>> the components in the UI?
>>>>>>>
>>>>>>> Also I have not run on AWS personally but the following line seems
>>>>>>> strange to me:
>>>>>>>
>>>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>> compute.amazonaws.com"}
>>>>>>>
>>>>>>> Generally this would be bound to a specific machine, not *.<domain>.
>>>>>>>
>>>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've set up a small Storm cluster on AWS that is giving me
>>>>>>>> headaches. At the moment I've got a situation where the cluster (a nimbus
>>>>>>>> node and 3 supervisors) will successfully run in local mode my particular
>>>>>>>> topology. However, as soon as I try to run the topology in distributed mode
>>>>>>>> there are no errors (that I can see) in any of the logs - the topology
>>>>>>>> appears in the UI but never shows any tuples emitted by the spouts.
>>>>>>>>
>>>>>>>> In order to check it wasn't my topology I cloned the storm-starter
>>>>>>>> project and have tried with the RollingTopWords example - running locally I
>>>>>>>> get a stream of output but as soon as I specify that we should run in a
>>>>>>>> distributed fashion I get a success message (topology successfully
>>>>>>>> submitted). The topology appears on the UI as successfully submitted but I
>>>>>>>> see no progress.
>>>>>>>>
>>>>>>>> It's a bit of a weird one as I don't know which logs to share to
>>>>>>>> show the problem as the whole point is I'm not seeing any errors!
>>>>>>>>
>>>>>>>> The last entry in the supervisor logs pertains to the set up of the
>>>>>>>> supervisor (i.e. it's not logged anything about the topologies that have
>>>>>>>> been submitted)
>>>>>>>>
>>>>>>>> The last entries in the nimbus logs are probably a bit more
>>>>>>>> illuminating:
>>>>>>>>
>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>> production-topology-2-1428932653:[2 2] not alive
>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>> production-topology-2-1428932653:[3 3] not alive
>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>> production-topology-2-1428932653:[4 4] not alive
>>>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
>>>>>>>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>>>>>> production-topology-2-1428932653 to 1 slots
>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2 2]
>>>>>>>> [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13]
>>>>>>>> [14 14] [15 15] [1 1]]
>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment for
>>>>>>>> topology id production-topology-2-1428932653:
>>>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
>>>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"},
>>>>>>>> :executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 6702],
>>>>>>>> [3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
>>>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
>>>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>>>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
>>>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>>>>>
>>>>>>>> If anybody can help that'd be greatly appreciated!
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Matt
>>>>>>>>
>>>>>>>> --
>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Check out my blog: dogdogfish.com
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Check out my blog: dogdogfish.com
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Check out my blog: dogdogfish.com
>>>
>>
>>
>
>
> --
> Check out my blog: dogdogfish.com
>

Re: Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Posted by Matthew Sharpe <ma...@gmail.com>.
More information (this time from my Zookeeper cluster):

The Zookeeper logs seem pretty calm, apart from this fairly repetitive
message:

2015-04-13 16:22:36,000 [myid:1] - INFO  [SessionTracker:ZooKeeperServer@347]
- Expiring session 0x34cb217be0500b6, timeout of 20000ms exceeded
2015-04-13 16:22:36,001 [myid:1] - INFO  [ProcessThread(sid:1
cport:-1)::PrepRequestProcessor@494] - Processed session termination for
sessionid: 0x34cb217be0500b6
2015-04-13 16:22:55,379 [myid:1] - INFO  [ProcessThread(sid:1
cport:-1)::PrepRequestProcessor@494] - Processed session termination for
sessionid: 0x34cb217be0500b7
2015-04-13 16:22:55,406 [myid:1] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection
from /[STORM_SUPERVISOR_IP]:51441
2015-04-13 16:22:55,412 [myid:1] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old
client /[STORM_SUPERVISOR_IP]:51441; will be dropped if server is in r-o
mode
2015-04-13 16:22:55,413 [myid:1] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish
new session at /[STORM_SUPERVISOR_IP]:51441
2015-04-13 16:22:55,414 [myid:1] - INFO
 [CommitProcessor:1:ZooKeeperServer@617] - Established session
0x14cb217be0300ca with negotiated timeout 20000 for client
/[STORM_SUPERVISOR_IP]:51441
2015-04-13 16:22:56,222 [myid:1] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x14cb217be0300ca, likely client has closed socket
        at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2015-04-13 16:22:56,223 [myid:1] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for
client /[STORM_SUPERVISOR_IP]:51441 which had sessionid 0x14cb217be0300ca
2015-04-13 16:23:05,827 [myid:1] - INFO  [ProcessThread(sid:1
cport:-1)::PrepRequestProcessor@494] - Processed session termination for
sessionid: 0x34cb217be0500b8
2015-04-13 16:23:05,868 [myid:1] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection
from /[STORM_SUPERVISOR_IP]:51443
2015-04-13 16:23:05,873 [myid:1] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old
client /[STORM_SUPERVISOR_IP]:51443; will be dropped if server is in r-o
mode

Where STORM_SUPERVISOR_IP is the IP of one of the Storm supervisors
(probably didn't need to say that).

Following on from that - running `echo stat | nc localhost 2181` on one of
the Zookeeper nodes shows connections from localhost and from the Kafka
cluster, but none from the Storm cluster. Running the ZooKeeper CLI shows
that a Storm folder (/storm) has been created with the following subfolders:

[workerbeats, errors, supervisors, storms, assignments]

Don't know if that helps at all.
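
In case it's useful, this is roughly what I've been running to check the
ZooKeeper side of things (just a sketch from my setup - the zk-host
placeholder and the zkCli.sh path are mine, so substitute your own):

# from a Storm supervisor: confirm each ZooKeeper node answers on 2181
echo ruok | nc zk-host-1 2181    # expect "imok"
echo stat | nc zk-host-1 2181    # lists connected clients and server mode

# from a ZooKeeper node: list the znodes Storm has created
#   (the zkCli.sh path is whatever your ZooKeeper install uses)
/path/to/zookeeper/bin/zkCli.sh -server localhost:2181
#   then at the zkCli prompt:
ls /storm
ls /storm/supervisors

The stat output is where the supervisor IPs should be showing up as
connected clients.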

On 13 April 2015 at 17:12, Matthew Sharpe <ma...@gmail.com> wrote:

> Not long at all - certainly less than a couple of minutes but longer than
> a few seconds; the two I actually measured lasted 42 seconds followed by
> about 5 seconds. So the heartbeats folder behaves like so...
>
> /app/storm/workers/[uuid] is created with an empty heartbeats folder
> inside it
> /app/storm/workers/[uuid]/heartbeats has two timestamps added (a couple of
> examples:
>
> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941026297.version
> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941026297
> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941027240.version
> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941027240
>
> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941398235.version
> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941398235
> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941399163.version
> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941399163
> )
>
> /app/storm/workers/[uuid] is deleted.
>
> I've not seen any heartbeats folder with more than 2 heartbeats in it.
>
>
>
> On 13 April 2015 at 17:00, Nathan Leung <nc...@gmail.com> wrote:
>
>> It only matters what user supervisor is run as, so you should be ok.  How
>> long is it between the worker being started, and it getting killed?  What
>> are the timestamps of the heartbeats in the corresponding worker folder?
>>
>> On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe <
>> matthewsharpe3@gmail.com> wrote:
>>
>>> I'm running storm as a (specially created) storm user - on both the
>>> nimbus and the supervisors the process is showing as being run by the storm
>>> user. And I've got the storm local directory set as /app/storm - owned by
>>> storm user and with permissions 700. Thought it might be due to me
>>> submitting the job as a different user but have just terminated the old job
>>> and submitted the new one as the storm user and still not getting any
>>> tuples coming through.
>>>
>>> For what it's worth, I've looked in the local data directory on the
>>> supervisors and in the workers folder there is a (non-empty) heartbeats
>>> folder being created - don't know if that's indicative of heartbeats being
>>> processed correctly or if that's created at the start of a worker process.
>>>
>>> Is there anything else you want to see that would help you diagnose
>>> this? I can share the UI with you if you think that'd help?
>>>
>>> Thanks!
>>>
>>>
>>> On 13 April 2015 at 16:22, Nathan Leung <nc...@gmail.com> wrote:
>>>
>>>> Sounds like it's having problems with heartbeats.  As I recall worker
>>>> -> supervisor heartbeats are written using the local filesystem.  What user
>>>> are you running as?  In your storm.yaml file, you will see the option
>>>> "storm.local.dir".  Does the user supervisor / workers are running as have
>>>> permissions to write to this folder?
>>>>
>>>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <
>>>> matthewsharpe3@gmail.com> wrote:
>>>>
>>>>> New bonus bit of information - the supervisor log has got this going
>>>>> on:
>>>>>
>>>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
>>>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and clearing
>>>>> state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current supervisor time:
>>>>> 1428938060. State: :timed-out, Heartbeat:
>>>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
>>>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] [4
>>>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>>>>> [15 15] [-1 -1] [1 1]}, :port 6703}
>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill 5284.
>>>>> Process is probably already dead.
>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>>>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 5] [6
>>>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
>>>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 6703
>>>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>> command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>>>>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>> -Dlogfile.name=worker-6703.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
>>>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>>>>> -Dstorm.id=production-topology-2-1428932653
>>>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp
>>>>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
>>>>> backtype.storm.daemon.worker production-topology-2-1428932653
>>>>> bd340179-5905-4233-ad5d-44b47b228177 6703
>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
>>>>>
>>>>> On 13 April 2015 at 16:09, Matthew Sharpe <ma...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> OK, in checking the worker process logs I've uncovered a bit more
>>>>>> information. The UI reports 0 emitted from the wordGenerator spout, along
>>>>>> with 0 transferred, acked and failed.
>>>>>>
>>>>>> Sorry - the *.compute.amazonaws.com is redacted (don't know the
>>>>>> usual way of doing that), I've got a bunch of other stuff on the boxes and
>>>>>> in troubleshooting things out have been a bit laxer with security than I
>>>>>> should have been!
>>>>>>
>>>>>> The UI shows 5 workers all starting on the same node - each of which
>>>>>> has a declared uptime of 1s. Looking at the logs on the node, during that
>>>>>> time the following happens:
>>>>>>
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>> environment:java.io.tmpdir=/tmp
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>> environment:java.compiler=<NA>
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
>>>>>> os.name=Linux
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>> environment:os.arch=amd64
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>> environment:os.version=3.13.0-48-generic
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
>>>>>> user.name=storm
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>> environment:user.home=/app/home/storm
>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>> environment:user.dir=/
>>>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>>>>>> production-topology-2-1428932653 on
>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>>>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
>>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>>>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>>>>> "topology.fall.back.on.java.serialization" true,
>>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
>>>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
>>>>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
>>>>>> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
>>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
>>>>>> "supervisor.worker.start.timeout.secs" 120,
>>>>>> "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
>>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>>>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>>>>>> "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
>>>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>>>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.
>>>>>> compute.amazonaws.com"], "transactional.zookeeper.root"
>>>>>> "/transactional", "topology.acker.executors" nil,
>>>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
>>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>>>>>> -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
>>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
>>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>>>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
>>>>>> "topology.tasks" nil, "topology.spout.wait.strategy"
>>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
>>>>>> nil, "storm.zookeeper.retry.interval" 1000, "
>>>>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>>>>> "nimbus.topology.validator"
>>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
>>>>>> [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
>>>>>> 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
>>>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>>>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>>>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>>>>> "topology.disruptor.wait.strategy"
>>>>>> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
>>>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>>>>>> "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
>>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
>>>>>> 5, "storm.thrift.transport"
>>>>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>>>>> "topology.state.synchronization.timeout.secs" 60,
>>>>>> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
>>>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>>>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
>>>>>> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
>>>>>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>>>>>> "topology.max.task.parallelism" nil}
>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@585bd6a6
>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection
>>>>>> to server *.compute.amazonaws.com/*:2181
>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>>>> 0x34cb217be05005e, negotiated timeout = 20000
>>>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
>>>>>> :connected:none
>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session: 0x34cb217be05005e
>>>>>> closed
>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm
>>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@4b492c47
>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection
>>>>>> to server *.compute.amazonaws.com/*:2181
>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>>>> 0x24cb217be030072, negotiated timeout = 20000
>>>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer
>>>>>> transport plugin:backtype.storm.messaging.zmq
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor counter:[2
>>>>>> 2]
>>>>>>
>>>>>>
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>> wordGenerator:[11 11]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>> wordGenerator:[11 11]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>> wordGenerator:(11)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>> wordGenerator:(11)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>> wordGenerator:(11)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>> wordGenerator:[12 12]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>> __system ["startup"]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>> wordGenerator:[12 12]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>> wordGenerator:[12 12]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>> wordGenerator:(12)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>> wordGenerator:(12)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>> wordGenerator:(12)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>> wordGenerator:[13 13]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>> __system ["startup"]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>> wordGenerator:[13 13]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>> wordGenerator:[13 13]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>> wordGenerator:(13)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>> wordGenerator:(13)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>> wordGenerator:(13)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>> wordGenerator:[14 14]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>> __system ["startup"]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>> wordGenerator:[14 14]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>> wordGenerator:[14 14]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [bertels]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>> wordGenerator:(14)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>> wordGenerator:(14)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>> wordGenerator:(14)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:11, stream: default, id: {}, [bertels]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [mike]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:12, stream: default, id: {}, [mike]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [bertels]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:13, stream: default, id: {}, [bertels]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>> wordGenerator:[15 15]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>> __system ["startup"]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>> wordGenerator:[15 15]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>> wordGenerator:[15 15]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>> wordGenerator:(15)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>> wordGenerator:(15)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>> wordGenerator:(15)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>> __system:[-1 -1]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system
>>>>>> ["startup"]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>> __system:[-1 -1]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>>> __system:[-1 -1]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt __system:(-1)
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __acker:[1
>>>>>> 1]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [jackson]
>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [mike]
>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:11, stream: default, id: {}, [jackson]
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:14, stream: default, id: {}, [mike]
>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [jackson]
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:12, stream: default, id: {}, [jackson]
>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [golda]
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:13, stream: default, id: {}, [golda]
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __system:(-1)
>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system
>>>>>> ["startup"]
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks
>>>>>> __acker:[1 1]
>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [jackson]
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading executor
>>>>>> __acker:[1 1]
>>>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread for
>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt __acker:(1)
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __acker:(1)
>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [nathan]
>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [bertels]
>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>>> [golda]
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:11, stream: default, id: {}, [golda]
>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>>>> source: wordGenerator:12, stream: default, id: {}, [nathan]
>>>>>>
>>>>>> After that, the logs stop (no more messages) and the UI says that the
>>>>>> 5 executors are now running on another node (and port) - at which point the
>>>>>> worker log is created on that node and with that number (same messages).
>>>>>>
>>>>>> So the worker seems to start and start emitting words but the UI
>>>>>> isn't registered anything as having been emitted. After a short period of
>>>>>> time the worker seems to stop without logging anything and so a new worker
>>>>>> is started.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 13 April 2015 at 15:36, Nathan Leung <nc...@gmail.com> wrote:
>>>>>>
>>>>>>> Did you check the worker process logs?  What are the emit counts for
>>>>>>> the components in the UI?
>>>>>>>
>>>>>>> Also I have not run on AWS personally but the following line seems
>>>>>>> strange to me:
>>>>>>>
>>>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>> compute.amazonaws.com"}
>>>>>>>
>>>>>>> Generally this would be bound to a specific machine, not *.<domain>.
>>>>>>>
>>>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've set up a small Storm cluster on AWS that is giving me
>>>>>>>> headaches. At the moment I've got a situation where the cluster (a nimbus
>>>>>>>> node and 3 supervisors) will successfully run in local mode my particular
>>>>>>>> topology. However, as soon as I try to run the topology in distributed mode
>>>>>>>> there are no errors (that I can see) in any of the logs - the topology
>>>>>>>> appears in the UI but never shows any tuples emitted by the spouts.
>>>>>>>>
>>>>>>>> In order to check it wasn't my topology I cloned the storm-starter
>>>>>>>> project and have tried with the RollingTopWords example - running locally I
>>>>>>>> get a stream of output but as soon as I specify that we should run in a
>>>>>>>> distributed fashion I get a success message (topology successfully
>>>>>>>> submitted). The topology appears on the UI as successfully submitted but I
>>>>>>>> see no progress.
>>>>>>>>
>>>>>>>> It's a bit of a weird one as I don't know which logs to share to
>>>>>>>> show the problem as the whole point is I'm not seeing any errors!
>>>>>>>>
>>>>>>>> The last entry in the supervisor logs pertains to the set up of the
>>>>>>>> supervisor (i.e. it's not logged anything about the topologies that have
>>>>>>>> been submitted)
>>>>>>>>
>>>>>>>> The last entries in the nimbus logs are probably a bit more
>>>>>>>> illuminating:
>>>>>>>>
>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>> production-topology-2-1428932653:[2 2] not alive
>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>> production-topology-2-1428932653:[3 3] not alive
>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>> production-topology-2-1428932653:[4 4] not alive
>>>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
>>>>>>>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>>>>>> production-topology-2-1428932653 to 1 slots
>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2 2]
>>>>>>>> [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13]
>>>>>>>> [14 14] [15 15] [1 1]]
>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment for
>>>>>>>> topology id production-topology-2-1428932653:
>>>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
>>>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"},
>>>>>>>> :executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 6702],
>>>>>>>> [3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
>>>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
>>>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>>>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
>>>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>>>>>
>>>>>>>> If anybody can help that'd be greatly appreciated!
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Matt
>>>>>>>>
>>>>>>>> --
>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Check out my blog: dogdogfish.com
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Check out my blog: dogdogfish.com
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Check out my blog: dogdogfish.com
>>>
>>
>>
>
>
> --
> Check out my blog: dogdogfish.com
>



-- 
Check out my blog: dogdogfish.com

Re: Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Posted by Matthew Sharpe <ma...@gmail.com>.
Not long at all - certainly less than a couple of minutes but longer than a
few seconds; the two I actually measured lasted 42 seconds followed by about
5 seconds. So the heartbeats folder behaves like so...

/app/storm/workers/[uuid] is created with an empty heartbeats folder inside
it
/app/storm/workers/[uuid]/heartbeats has two timestamps added (a couple of
examples:

-rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941026297.version
-rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941026297
-rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941027240.version
-rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941027240

-rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941398235.version
-rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941398235
-rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941399163.version
-rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941399163
)

/app/storm/workers/[uuid] is deleted.

I've not seen any heartbeats folder with more than 2 heartbeats in it.
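
For anyone following along, this is more or less how I'm watching the worker
directories come and go (a rough sketch - /app/storm is my storm.local.dir
from the config above, and the intervals are arbitrary):

# watch heartbeat files appear under each worker's directory
watch -n 1 'ls -l --full-time /app/storm/workers/*/heartbeats 2>/dev/null'

# or log worker directory churn with timestamps
while true; do date; ls /app/storm/workers; sleep 5; done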



On 13 April 2015 at 17:00, Nathan Leung <nc...@gmail.com> wrote:

> It only matters what user supervisor is run as, so you should be ok.  How
> long is it between the worker being started, and it getting killed?  What
> are the timestamps of the heartbeats in the corresponding worker folder?
>
> On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe <matthewsharpe3@gmail.com
> > wrote:
>
>> I'm running storm as a (specially created) storm user - on both the
>> nimbus and the supervisors the process is showing as being run by the storm
>> user. And I've got the storm local directory set as /app/storm - owned by
>> storm user and with permissions 700. Thought it might be due to me
>> submitting the job as a different user but have just terminated the old job
>> and submitted the new one as the storm user and still not getting any
>> tuples coming through.
>>
>> For what it's worth, I've looked in the local data directory on the
>> supervisors and in the workers folder there is a (non-empty) heartbeats
>> folder being created - don't know if that's indicative of heartbeats being
>> processed correctly or if that's created at the start of a worker process.
>>
>> Is there anything else you want to see that would help you diagnose this?
>> I can share the UI with you if you think that'd help?
>>
>> Thanks!
>>
>>
>> On 13 April 2015 at 16:22, Nathan Leung <nc...@gmail.com> wrote:
>>
>>> Sounds like it's having problems with heartbeats.  As I recall worker ->
>>> supervisor heartbeats are written using the local filesystem.  What user
>>> are you running as?  In your storm.yaml file, you will see the option
>>> "storm.local.dir".  Does the user supervisor / workers are running as have
>>> permissions to write to this folder?
>>>
>>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <
>>> matthewsharpe3@gmail.com> wrote:
>>>
>>>> New bonus bit of information - the supervisor log has got this going on:
>>>>
>>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
>>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and clearing
>>>> state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current supervisor time:
>>>> 1428938060. State: :timed-out, Heartbeat:
>>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
>>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] [4
>>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>>>> [15 15] [-1 -1] [1 1]}, :port 6703}
>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill 5284.
>>>> Process is probably already dead.
>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 5] [6
>>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
>>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 6703
>>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>> command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>>>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>> -Dlogfile.name=worker-6703.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
>>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>>>> -Dstorm.id=production-topology-2-1428932653
>>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp
>>>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
>>>> backtype.storm.daemon.worker production-topology-2-1428932653
>>>> bd340179-5905-4233-ad5d-44b47b228177 6703
>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
>>>>
>>>> On 13 April 2015 at 16:09, Matthew Sharpe <ma...@gmail.com>
>>>> wrote:
>>>>
>>>>> OK, in checking the worker process logs I've uncovered a bit more
>>>>> information. The UI reports 0 emitted from the wordGenerator spout, along
>>>>> with 0 transferred, acked and failed.
>>>>>
>>>>> Sorry - the *.compute.amazonaws.com is redacted (don't know the usual
>>>>> way of doing that), I've got a bunch of other stuff on the boxes and in
>>>>> troubleshooting things out have been a bit laxer with security than I
>>>>> should have been!
>>>>>
>>>>> The UI shows 5 workers all starting on the same node - each of which
>>>>> has a declared uptime of 1s. Looking at the logs on the node, during that
>>>>> time the following happens:
>>>>>
>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>> environment:java.io.tmpdir=/tmp
>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>> environment:java.compiler=<NA>
>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
>>>>> os.name=Linux
>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>> environment:os.arch=amd64
>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>> environment:os.version=3.13.0-48-generic
>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
>>>>> user.name=storm
>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>> environment:user.home=/app/home/storm
>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>> environment:user.dir=/
>>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>>>>> production-topology-2-1428932653 on
>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>>>> "topology.fall.back.on.java.serialization" true,
>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
>>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
>>>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
>>>>> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
>>>>> "supervisor.worker.start.timeout.secs" 120,
>>>>> "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>>>>> "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
>>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.
>>>>> compute.amazonaws.com"], "transactional.zookeeper.root"
>>>>> "/transactional", "topology.acker.executors" nil,
>>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>>>>> -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
>>>>> "topology.tasks" nil, "topology.spout.wait.strategy"
>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
>>>>> nil, "storm.zookeeper.retry.interval" 1000, "
>>>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>>>> "nimbus.topology.validator"
>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
>>>>> [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
>>>>> 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
>>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>>>> "topology.disruptor.wait.strategy"
>>>>> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
>>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>>>>> "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
>>>>> 5, "storm.thrift.transport"
>>>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>>>> "topology.state.synchronization.timeout.secs" 60,
>>>>> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
>>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
>>>>> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
>>>>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>>>>> "topology.max.task.parallelism" nil}
>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@585bd6a6
>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection
>>>>> to server *.compute.amazonaws.com/*:2181
>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>>> 0x34cb217be05005e, negotiated timeout = 20000
>>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
>>>>> :connected:none
>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session: 0x34cb217be05005e
>>>>> closed
>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm
>>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@4b492c47
>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection
>>>>> to server *.compute.amazonaws.com/*:2181
>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>>> 0x24cb217be030072, negotiated timeout = 20000
>>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer transport
>>>>> plugin:backtype.storm.messaging.zmq
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor counter:[2
>>>>> 2]
>>>>>
>>>>>
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>> wordGenerator:[11 11]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>> wordGenerator:[11 11]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>> wordGenerator:(11)
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>> wordGenerator:(11)
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>> wordGenerator:(11)
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>> wordGenerator:[12 12]
>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>>>>> ["startup"]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>> wordGenerator:[12 12]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>> wordGenerator:[12 12]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>> wordGenerator:(12)
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>> wordGenerator:(12)
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>> wordGenerator:(12)
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>> wordGenerator:[13 13]
>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>>>>> ["startup"]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>> wordGenerator:[13 13]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>> wordGenerator:[13 13]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>> wordGenerator:(13)
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>> wordGenerator:(13)
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>> wordGenerator:(13)
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>> wordGenerator:[14 14]
>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>>>>> ["startup"]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>> wordGenerator:[14 14]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>> wordGenerator:[14 14]
>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>> [bertels]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>> wordGenerator:(14)
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>> wordGenerator:(14)
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>> wordGenerator:(14)
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>>> source: wordGenerator:11, stream: default, id: {}, [bertels]
>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>> [mike]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>>> source: wordGenerator:12, stream: default, id: {}, [mike]
>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>> [bertels]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>>> source: wordGenerator:13, stream: default, id: {}, [bertels]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>> wordGenerator:[15 15]
>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>>>>> ["startup"]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>> wordGenerator:[15 15]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>> wordGenerator:[15 15]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>> wordGenerator:(15)
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>> wordGenerator:(15)
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>> wordGenerator:(15)
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>> __system:[-1 -1]
>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system
>>>>> ["startup"]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>> __system:[-1 -1]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>>> __system:[-1 -1]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt __system:(-1)
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __acker:[1
>>>>> 1]
>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>> [jackson]
>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>> [mike]
>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>>> source: wordGenerator:11, stream: default, id: {}, [jackson]
>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>>> source: wordGenerator:14, stream: default, id: {}, [mike]
>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>> [jackson]
>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>>> source: wordGenerator:12, stream: default, id: {}, [jackson]
>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>> [golda]
>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>>> source: wordGenerator:13, stream: default, id: {}, [golda]
>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __system:(-1)
>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system
>>>>> ["startup"]
>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks
>>>>> __acker:[1 1]
>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>> [jackson]
>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading executor
>>>>> __acker:[1 1]
>>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread for
>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702
>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt __acker:(1)
>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __acker:(1)
>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>> [nathan]
>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>> [bertels]
>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>>> [golda]
>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>>> source: wordGenerator:11, stream: default, id: {}, [golda]
>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>>> source: wordGenerator:12, stream: default, id: {}, [nathan]
>>>>>
>>>>> After that, the logs stop (no more messages) and the UI says that the
>>>>> 5 executors are now running on another node (and port) - at which point the
>>>>> worker log is created on that node and with that number (same messages).
>>>>>
>>>>> So the worker seems to start and begins emitting words, but the UI isn't
>>>>> registering anything as having been emitted. After a short period of time
>>>>> the worker seems to stop without logging anything, and so a new worker is
>>>>> started.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Matt
>>>>>
>>>>>
>>>>>
>>>>> On 13 April 2015 at 15:36, Nathan Leung <nc...@gmail.com> wrote:
>>>>>
>>>>>> Did you check the worker process logs?  What are the emit counts for
>>>>>> the components in the UI?
>>>>>>
>>>>>> Also I have not run on AWS personally but the following line seems
>>>>>> strange to me:
>>>>>>
>>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>> compute.amazonaws.com"}
>>>>>>
>>>>>> Generally this would be bound to a specific machine, not *.<domain>.
>>>>>>
>>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've set up a small Storm cluster on AWS that is giving me
>>>>>>> headaches. At the moment I've got a situation where the cluster (a nimbus
>>>>>>> node and 3 supervisors) will successfully run in local mode my particular
>>>>>>> topology. However, as soon as I try to run the topology in distributed mode
>>>>>>> there are no errors (that I can see) in any of the logs - the topology
>>>>>>> appears in the UI but never shows any tuples emitted by the spouts.
>>>>>>>
>>>>>>> In order to check it wasn't my topology I cloned the storm-starter
>>>>>>> project and have tried with the RollingTopWords example - running locally I
>>>>>>> get a stream of output but as soon as I specify that we should run in a
>>>>>>> distributed fashion I get a success message (topology successfully
>>>>>>> submitted). The topology appears on the UI as successfully submitted but I
>>>>>>> see no progress.
>>>>>>>
>>>>>>> It's a bit of a weird one as I don't know which logs to share to
>>>>>>> show the problem as the whole point is I'm not seeing any errors!
>>>>>>>
>>>>>>> The last entry in the supervisor logs pertains to the set up of the
>>>>>>> supervisor (i.e. it's not logged anything about the topologies that have
>>>>>>> been submitted)
>>>>>>>
>>>>>>> The last entries in the nimbus logs are probably a bit more
>>>>>>> illuminating:
>>>>>>>
>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>> production-topology-2-1428932653:[2 2] not alive
>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>> production-topology-2-1428932653:[3 3] not alive
>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>> production-topology-2-1428932653:[4 4] not alive
>>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
>>>>>>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>>>>> production-topology-2-1428932653 to 1 slots
>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2 2]
>>>>>>> [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13]
>>>>>>> [14 14] [15 15] [1 1]]
>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment for
>>>>>>> topology id production-topology-2-1428932653:
>>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
>>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"},
>>>>>>> :executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 6702],
>>>>>>> [3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
>>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
>>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
>>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>>>>
>>>>>>> If anybody can help that'd be greatly appreciated!
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Matt
>>>>>>>
>>>>>>> --
>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Check out my blog: dogdogfish.com
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Check out my blog: dogdogfish.com
>>>>
>>>
>>>
>>
>>
>> --
>> Check out my blog: dogdogfish.com
>>
>
>


-- 
Check out my blog: dogdogfish.com

Re: Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Posted by Nathan Leung <nc...@gmail.com>.
It only matters what user the supervisor is run as, so you should be OK. How
long is it between the worker being started and it getting killed? What are
the timestamps of the heartbeats in the corresponding worker folder?
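
If it helps, here is a rough Python sketch of the kind of check I mean - it
assumes the storm.local.dir of /app/storm from your config dump and the
workers/<worker-id>/heartbeats layout you described, so adjust the path if
your layout differs:

import os
import time

# Assumed heartbeat location (storm.local.dir = /app/storm, as reported in
# the worker config earlier in this thread).
WORKERS_DIR = "/app/storm/workers"

now = time.time()
for worker_id in sorted(os.listdir(WORKERS_DIR)):
    hb_dir = os.path.join(WORKERS_DIR, worker_id, "heartbeats")
    if not os.path.isdir(hb_dir):
        print("%s: no heartbeats directory" % worker_id)
        continue
    for name in sorted(os.listdir(hb_dir)):
        age = now - os.path.getmtime(os.path.join(hb_dir, name))
        # If the newest entry is older than supervisor.worker.timeout.secs
        # (30s in your config), the supervisor treats the worker as dead.
        print("%s/%s last modified %.0fs ago" % (worker_id, name, age))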

On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe <ma...@gmail.com>
wrote:

> I'm running storm as a (specially created) storm user - on both the nimbus
> and the supervisors the process is showing as being run by the storm user.
> And I've got the storm local directory set as /app/storm - owned by storm
> user and with permissions 700. Thought it might be due to me submitting the
> job as a different user but have just terminated the old job and submitted
> the new one as the storm user and still not getting any tuples coming
> through.
>
> For what it's worth, I've looked in the local data directory on the
> supervisors and in the workers folder there is a (non-empty) heartbeats
> folder being created - don't know if that's indicative of heartbeats being
> processed correctly or if that's created at the start of a worker process.
>
> Is there anything else you want to see that would help you diagnose this?
> I can share the UI with you if you think that'd help?
>
> Thanks!
>
>
> On 13 April 2015 at 16:22, Nathan Leung <nc...@gmail.com> wrote:
>
>> Sounds like it's having problems with heartbeats.  As I recall worker ->
>> supervisor heartbeats are written using the local filesystem.  What user
>> are you running as?  In your storm.yaml file, you will see the option
>> "storm.local.dir".  Does the user supervisor / workers are running as have
>> permissions to write to this folder?
>>
>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <
>> matthewsharpe3@gmail.com> wrote:
>>
>>> New bonus bit of information - the supervisor log has got this going on:
>>>
>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and clearing
>>> state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current supervisor time:
>>> 1428938060. State: :timed-out, Heartbeat:
>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] [4
>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>>> [15 15] [-1 -1] [1 1]}, :port 6703}
>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill 5284.
>>> Process is probably already dead.
>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 5] [6
>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 6703
>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>> command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>> -Dlogfile.name=worker-6703.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>>> -Dstorm.id=production-topology-2-1428932653
>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp
>>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
>>> backtype.storm.daemon.worker production-topology-2-1428932653
>>> bd340179-5905-4233-ad5d-44b47b228177 6703
>>> 786a62ac-2868-4526-a7b2-5c26234bb23e
>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
>>>
>>> On 13 April 2015 at 16:09, Matthew Sharpe <ma...@gmail.com>
>>> wrote:
>>>
>>>> OK, in checking the worker process logs I've uncovered a bit more
>>>> information. The UI reports 0 emitted from the wordGenerator spout, along
>>>> with 0 transferred, acked and failed.
>>>>
>>>> Sorry - the *.compute.amazonaws.com is redacted (don't know the usual
>>>> way of doing that), I've got a bunch of other stuff on the boxes and in
>>>> troubleshooting things out have been a bit laxer with security than I
>>>> should have been!
>>>>
>>>> The UI shows 5 workers all starting on the same node - each of which
>>>> has a declared uptime of 1s. Looking at the logs on the node, during that
>>>> time the following happens:
>>>>
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>> environment:java.io.tmpdir=/tmp
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>> environment:java.compiler=<NA>
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
>>>> os.name=Linux
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>> environment:os.arch=amd64
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>> environment:os.version=3.13.0-48-generic
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
>>>> user.name=storm
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>> environment:user.home=/app/home/storm
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>> environment:user.dir=/
>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>>>> production-topology-2-1428932653 on
>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>>> "topology.fall.back.on.java.serialization" true,
>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
>>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
>>>> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
>>>> "supervisor.worker.start.timeout.secs" 120,
>>>> "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>>>> "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.
>>>> compute.amazonaws.com"], "transactional.zookeeper.root"
>>>> "/transactional", "topology.acker.executors" nil,
>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>>>> -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
>>>> "topology.tasks" nil, "topology.spout.wait.strategy"
>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
>>>> nil, "storm.zookeeper.retry.interval" 1000, "
>>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>>> "nimbus.topology.validator"
>>>> "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
>>>> [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
>>>> 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>>> "topology.disruptor.wait.strategy"
>>>> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>>>> "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
>>>> 5, "storm.thrift.transport"
>>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>>> "topology.state.synchronization.timeout.secs" 60,
>>>> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
>>>> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
>>>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>>>> "topology.max.task.parallelism" nil}
>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@585bd6a6
>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection
>>>> to server *.compute.amazonaws.com/*:2181
>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>> 0x34cb217be05005e, negotiated timeout = 20000
>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
>>>> :connected:none
>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session: 0x34cb217be05005e
>>>> closed
>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm
>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@4b492c47
>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection
>>>> to server *.compute.amazonaws.com/*:2181
>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>> 0x24cb217be030072, negotiated timeout = 20000
>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer transport
>>>> plugin:backtype.storm.messaging.zmq
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor counter:[2 2]
>>>>
>>>>
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>> wordGenerator:[11 11]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>> wordGenerator:[11 11]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>> wordGenerator:(11)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>> wordGenerator:(11)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>> wordGenerator:(11)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>> wordGenerator:[12 12]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>>>> ["startup"]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>> wordGenerator:[12 12]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>> wordGenerator:[12 12]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>> wordGenerator:(12)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>> wordGenerator:(12)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>> wordGenerator:(12)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>> wordGenerator:[13 13]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>>>> ["startup"]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>> wordGenerator:[13 13]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>> wordGenerator:[13 13]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>> wordGenerator:(13)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>> wordGenerator:(13)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>> wordGenerator:(13)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>> wordGenerator:[14 14]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>>>> ["startup"]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>> wordGenerator:[14 14]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>> wordGenerator:[14 14]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [bertels]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>> wordGenerator:(14)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>> wordGenerator:(14)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>> wordGenerator:(14)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:11, stream: default, id: {}, [bertels]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [mike]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:12, stream: default, id: {}, [mike]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [bertels]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:13, stream: default, id: {}, [bertels]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>> wordGenerator:[15 15]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>>>> ["startup"]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>> wordGenerator:[15 15]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>> wordGenerator:[15 15]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>> wordGenerator:(15)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>> wordGenerator:(15)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>> wordGenerator:(15)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __system:[-1
>>>> -1]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system
>>>> ["startup"]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>> __system:[-1 -1]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>> __system:[-1 -1]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt __system:(-1)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __acker:[1 1]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [jackson]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [mike]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:11, stream: default, id: {}, [jackson]
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:14, stream: default, id: {}, [mike]
>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [jackson]
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:12, stream: default, id: {}, [jackson]
>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [golda]
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:13, stream: default, id: {}, [golda]
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __system:(-1)
>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system
>>>> ["startup"]
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks
>>>> __acker:[1 1]
>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [jackson]
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading executor
>>>> __acker:[1 1]
>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread for
>>>> bd340179-5905-4233-ad5d-44b47b228177:6702
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt __acker:(1)
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __acker:(1)
>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [nathan]
>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [bertels]
>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [golda]
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:11, stream: default, id: {}, [golda]
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:12, stream: default, id: {}, [nathan]
>>>>
>>>> After that, the logs stop (no more messages) and the UI says that the 5
>>>> executors are now running on another node (and port) - at which point the
>>>> worker log is created on that node and with that number (same messages).
>>>>
>>>> So the worker seems to start and begins emitting words, but the UI isn't
>>>> registering anything as having been emitted. After a short period of time
>>>> the worker seems to stop without logging anything, and so a new worker is
>>>> started.
>>>>
>>>> Thanks,
>>>>
>>>> Matt
>>>>
>>>>
>>>>
>>>> On 13 April 2015 at 15:36, Nathan Leung <nc...@gmail.com> wrote:
>>>>
>>>>> Did you check the worker process logs?  What are the emit counts for
>>>>> the components in the UI?
>>>>>
>>>>> Also I have not run on AWS personally but the following line seems
>>>>> strange to me:
>>>>>
>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>> compute.amazonaws.com"}
>>>>>
>>>>> Generally this would be bound to a specific machine, not *.<domain>.
>>>>>
>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>>>> matthewsharpe3@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've set up a small Storm cluster on AWS that is giving me headaches.
>>>>>> At the moment I've got a situation where the cluster (a nimbus node and 3
>>>>>> supervisors) will successfully run in local mode my particular topology.
>>>>>> However, as soon as I try to run the topology in distributed mode there are
>>>>>> no errors (that I can see) in any of the logs - the topology appears in the
>>>>>> UI but never shows any tuples emitted by the spouts.
>>>>>>
>>>>>> In order to check it wasn't my topology I cloned the storm-starter
>>>>>> project and have tried with the RollingTopWords example - running locally I
>>>>>> get a stream of output but as soon as I specify that we should run in a
>>>>>> distributed fashion I get a success message (topology successfully
>>>>>> submitted). The topology appears on the UI as successfully submitted but I
>>>>>> see no progress.
>>>>>>
>>>>>> It's a bit of a weird one as I don't know which logs to share to show
>>>>>> the problem as the whole point is I'm not seeing any errors!
>>>>>>
>>>>>> The last entry in the supervisor logs pertains to the set up of the
>>>>>> supervisor (i.e. it's not logged anything about the topologies that have
>>>>>> been submitted)
>>>>>>
>>>>>> The last entries in the nimbus logs are probably a bit more
>>>>>> illuminating:
>>>>>>
>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>> production-topology-2-1428932653:[2 2] not alive
>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>> production-topology-2-1428932653:[3 3] not alive
>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>> production-topology-2-1428932653:[4 4] not alive
>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
>>>>>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>>>> production-topology-2-1428932653 to 1 slots
>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3
>>>>>> 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14
>>>>>> 14] [15 15] [1 1]]
>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment for
>>>>>> topology id production-topology-2-1428932653:
>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"},
>>>>>> :executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 6702],
>>>>>> [3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>>>
>>>>>> If anybody can help that'd be greatly appreciated!
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>> --
>>>>>> Check out my blog: dogdogfish.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Check out my blog: dogdogfish.com
>>>>
>>>
>>>
>>>
>>> --
>>> Check out my blog: dogdogfish.com
>>>
>>
>>
>
>
> --
> Check out my blog: dogdogfish.com
>

Re: Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Posted by Matthew Sharpe <ma...@gmail.com>.
I'm running Storm as a (specially created) storm user - on both the nimbus
and the supervisors the process shows as being run by the storm user. I've
got the Storm local directory set to /app/storm, owned by the storm user and
with permissions 700. I thought it might be because I was submitting the job
as a different user, but I've just terminated the old job and resubmitted it
as the storm user, and I'm still not getting any tuples coming through.
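
In case it's useful, something like this quick Python sketch (assuming the
/app/storm path and storm user above) would confirm whether the user the
daemons run as can actually write there:

import os
import pwd
import stat

# Assumption: storm.local.dir is /app/storm and the daemons run as the
# "storm" user. Run this as the same user the supervisor/workers run as,
# on a supervisor node.
path = "/app/storm"
st = os.stat(path)
owner = pwd.getpwuid(st.st_uid).pw_name
mode = oct(stat.S_IMODE(st.st_mode))
print("%s is owned by %s with mode %s" % (path, owner, mode))
print("running as: %s" % pwd.getpwuid(os.geteuid()).pw_name)
print("writable by this user: %s" % os.access(path, os.W_OK))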

For what it's worth, I've looked in the local data directory on the
supervisors and in the workers folder there is a (non-empty) heartbeats
folder being created - don't know if that's indicative of heartbeats being
processed correctly or if that's created at the start of a worker process.
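
If it matters, I could also check whether those heartbeat files are actually
being refreshed rather than just written once at worker start-up - a quick
sketch along these lines (same /app/storm assumption as above) would show
whether the newest heartbeat's mtime keeps advancing:

import glob
import os
import time

# Assumption: heartbeat files live under /app/storm/workers/<id>/heartbeats.
HB_GLOB = "/app/storm/workers/*/heartbeats/*"

def newest_mtime():
    paths = glob.glob(HB_GLOB)
    return max(os.path.getmtime(p) for p in paths) if paths else None

t1 = newest_mtime()
time.sleep(5)
t2 = newest_mtime()
if t1 is None:
    print("no heartbeat files found")
elif t2 is not None and t2 > t1:
    print("heartbeats are being refreshed (newest mtime advanced)")
else:
    print("newest heartbeat mtime did not advance in 5 seconds")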

Is there anything else you want to see that would help you diagnose this? I
can share the UI with you if you think that'd help.

Thanks!


On 13 April 2015 at 16:22, Nathan Leung <nc...@gmail.com> wrote:

> Sounds like it's having problems with heartbeats.  As I recall worker ->
> supervisor heartbeats are written using the local filesystem.  What user
> are you running as?  In your storm.yaml file, you will see the option
> "storm.local.dir".  Does the user supervisor / workers are running as have
> permissions to write to this folder?
>
> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <matthewsharpe3@gmail.com
> > wrote:
>
>> New bonus bit of information - the supervisor log has got this going on:
>>
>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and clearing
>> state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current supervisor time:
>> 1428938060. State: :timed-out, Heartbeat:
>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] [4
>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>> [15 15] [-1 -1] [1 1]}, :port 6703}
>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill 5284.
>> Process is probably already dead.
>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 5] [6
>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 6703
>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>> command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>> -Dlogfile.name=worker-6703.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>> -Dstorm.id=production-topology-2-1428932653
>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp
>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
>> backtype.storm.daemon.worker production-topology-2-1428932653
>> bd340179-5905-4233-ad5d-44b47b228177 6703
>> 786a62ac-2868-4526-a7b2-5c26234bb23e
>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
>>
>> On 13 April 2015 at 16:09, Matthew Sharpe <ma...@gmail.com>
>> wrote:
>>
>>> OK, in checking the worker process logs I've uncovered a bit more
>>> information. The UI reports 0 emitted from the wordGenerator spout, along
>>> with 0 transferred, acked and failed.
>>>
>>> Sorry - the *.compute.amazonaws.com is redacted (don't know the usual
>>> way of doing that), I've got a bunch of other stuff on the boxes and in
>>> troubleshooting things out have been a bit laxer with security than I
>>> should have been!
>>>
>>> The UI shows 5 workers all starting on the same node - each of which has
>>> a declared uptime of 1s. Looking at the logs on the node, during that time
>>> the following happens:
>>>
>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>> environment:java.io.tmpdir=/tmp
>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>> environment:java.compiler=<NA>
>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
>>> os.name=Linux
>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>> environment:os.arch=amd64
>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>> environment:os.version=3.13.0-48-generic
>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
>>> user.name=storm
>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>> environment:user.home=/app/home/storm
>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>> environment:user.dir=/
>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>>> production-topology-2-1428932653 on
>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>> "topology.fall.back.on.java.serialization" true,
>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
>>> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
>>> "supervisor.worker.start.timeout.secs" 120,
>>> "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>>> "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.
>>> compute.amazonaws.com"], "transactional.zookeeper.root"
>>> "/transactional", "topology.acker.executors" nil,
>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>>> -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
>>> "topology.tasks" nil, "topology.spout.wait.strategy"
>>> "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
>>> nil, "storm.zookeeper.retry.interval" 1000, "
>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>> "nimbus.topology.validator"
>>> "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
>>> [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
>>> 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>> "topology.disruptor.wait.strategy"
>>> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>>> "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
>>> 5, "storm.thrift.transport"
>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>> "topology.state.synchronization.timeout.secs" 60,
>>> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
>>> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
>>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>>> "topology.max.task.parallelism" nil}
>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client connection,
>>> connectString=*.compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>> ,*.compute.amazonaws.com:2181 sessionTimeout=20000
>>> watcher=com.netflix.curator.ConnectionState@585bd6a6
>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection to
>>> server *.compute.amazonaws.com/*:2181
>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>> 0x34cb217be05005e, negotiated timeout = 20000
>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
>>> :connected:none
>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session: 0x34cb217be05005e
>>> closed
>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client connection,
>>> connectString=*.compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>> ,*.compute.amazonaws.com:2181/storm sessionTimeout=20000
>>> watcher=com.netflix.curator.ConnectionState@4b492c47
>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection to
>>> server *.compute.amazonaws.com/*:2181
>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>> 0x24cb217be030072, negotiated timeout = 20000
>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer transport
>>> plugin:backtype.storm.messaging.zmq
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor counter:[2 2]
>>>
>>>
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>> wordGenerator:[11 11]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>> wordGenerator:[11 11]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>> wordGenerator:(11)
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(11)
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>> wordGenerator:(11)
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>> wordGenerator:[12 12]
>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>>> ["startup"]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>> wordGenerator:[12 12]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>> wordGenerator:[12 12]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>> wordGenerator:(12)
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(12)
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>> wordGenerator:(12)
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>> wordGenerator:[13 13]
>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>>> ["startup"]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>> wordGenerator:[13 13]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>> wordGenerator:[13 13]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>> wordGenerator:(13)
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(13)
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>> wordGenerator:(13)
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>> wordGenerator:[14 14]
>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>>> ["startup"]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>> wordGenerator:[14 14]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>> wordGenerator:[14 14]
>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>> [bertels]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>> wordGenerator:(14)
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(14)
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>> wordGenerator:(14)
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>> source: wordGenerator:11, stream: default, id: {}, [bertels]
>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>> [mike]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>> source: wordGenerator:12, stream: default, id: {}, [mike]
>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>> [bertels]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>> source: wordGenerator:13, stream: default, id: {}, [bertels]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>> wordGenerator:[15 15]
>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>>> ["startup"]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>> wordGenerator:[15 15]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>> wordGenerator:[15 15]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>> wordGenerator:(15)
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(15)
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>> wordGenerator:(15)
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __system:[-1
>>> -1]
>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system
>>> ["startup"]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>> __system:[-1 -1]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>> __system:[-1 -1]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt __system:(-1)
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __acker:[1 1]
>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>> [jackson]
>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>> [mike]
>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>> source: wordGenerator:11, stream: default, id: {}, [jackson]
>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>> source: wordGenerator:14, stream: default, id: {}, [mike]
>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>> [jackson]
>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>> source: wordGenerator:12, stream: default, id: {}, [jackson]
>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>> [golda]
>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>> source: wordGenerator:13, stream: default, id: {}, [golda]
>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __system:(-1)
>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system
>>> ["startup"]
>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks
>>> __acker:[1 1]
>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>> [jackson]
>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading executor
>>> __acker:[1 1]
>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread for
>>> bd340179-5905-4233-ad5d-44b47b228177:6702
>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt __acker:(1)
>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __acker:(1)
>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>> [nathan]
>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>> [bertels]
>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>> [golda]
>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>> source: wordGenerator:11, stream: default, id: {}, [golda]
>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>> source: wordGenerator:12, stream: default, id: {}, [nathan]
>>>
>>> After that, the logs stop (no more messages) and the UI says that the 5
>>> executors are now running on another node (and port) - at which point a
>>> worker log is created on that node for that port (same messages).
>>>
>>> So the worker seems to start and begins emitting words, but the UI never
>>> registers anything as having been emitted. After a short period of time
>>> the worker seems to stop without logging anything, and so a new worker is
>>> started.
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>>
>>>
>>> On 13 April 2015 at 15:36, Nathan Leung <nc...@gmail.com> wrote:
>>>
>>>> Did you check the worker process logs?  What are the emit counts for
>>>> the components in the UI?
>>>>
>>>> Also I have not run on AWS personally but the following line seems
>>>> strange to me:
>>>>
>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>> compute.amazonaws.com"}
>>>>
>>>> Generally this would be bound to a specific machine, not *.<domain>.
>>>>
>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>>> matthewsharpe3@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've set up a small Storm cluster on AWS that is giving me headaches.
>>>>> At the moment I've got a situation where the cluster (a nimbus node and 3
>>>>> supervisors) will successfully run in local mode my particular topology.
>>>>> However, as soon as I try to run the topology in distributed mode there are
>>>>> no errors (that I can see) in any of the logs - the topology appears in the
>>>>> UI but never shows any tuples emitted by the spouts.
>>>>>
>>>>> In order to check it wasn't my topology I cloned the storm-starter
>>>>> project and have tried with the RollingTopWords example - running locally I
>>>>> get a stream of output but as soon as I specify that we should run in a
>>>>> distributed fashion I get a success message (topology successfully
>>>>> submitted). The topology appears on the UI as successfully submitted but I
>>>>> see no progress.
>>>>>
>>>>> It's a bit of a weird one as I don't know which logs to share to show
>>>>> the problem as the whole point is I'm not seeing any errors!
>>>>>
>>>>> The last entry in the supervisor logs pertains to the set up of the
>>>>> supervisor (i.e. it's not logged anything about the topologies that have
>>>>> been submitted)
>>>>>
>>>>> The last entries in the nimbus logs are probably a bit more
>>>>> illuminating:
>>>>>
>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>> production-topology-2-1428932653:[2 2] not alive
>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>> production-topology-2-1428932653:[3 3] not alive
>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>> production-topology-2-1428932653:[4 4] not alive
>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
>>>>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>>> production-topology-2-1428932653 to 1 slots
>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3
>>>>> 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14
>>>>> 14] [15 15] [1 1]]
>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment for
>>>>> topology id production-topology-2-1428932653:
>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"},
>>>>> :executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 6702],
>>>>> [3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>>
>>>>> If anybody can help that'd be greatly appreciated!
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Matt
>>>>>
>>>>> --
>>>>> Check out my blog: dogdogfish.com
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Check out my blog: dogdogfish.com
>>>
>>
>>
>>
>> --
>> Check out my blog: dogdogfish.com
>>
>
>


-- 
Check out my blog: dogdogfish.com

Re: Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Posted by Nathan Leung <nc...@gmail.com>.
Sounds like it's having problems with heartbeats. As I recall, worker ->
supervisor heartbeats are written via the local filesystem. What user are
you running as? In your storm.yaml file you will see the option
"storm.local.dir". Does the user the supervisor / workers run as have
permission to write to that folder?
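
Something like the following on one of the supervisor boxes would show it
(just a sketch - I'm assuming storm.local.dir is /app/storm and that the
daemons run as the "storm" user, which is what the pasted worker config and
the user.name=storm line suggest):

  # who owns the local dir and its workers/ subdirectory (worker heartbeats
  # should end up under workers/<worker-id>/heartbeats, if I remember the
  # layout right)
  ls -ld /app/storm /app/storm/workers

  # try a write as the storm user; a "Permission denied" here would explain
  # workers timing out without any errors in their own logs
  sudo -u storm touch /app/storm/workers/write-test && echo "write OK"

If the write fails (or the directories turn out to be owned by root because
a daemon was started with sudo at some point), chown them to the storm user
on every supervisor node and restart the supervisors.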

On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <ma...@gmail.com>
wrote:

> New bonus bit of information - the supervisor log has got this going on:
>
> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and clearing
> state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current supervisor time:
> 1428938060. State: :timed-out, Heartbeat:
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] [4
> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
> [15 15] [-1 -1] [1 1]}, :port 6703}
> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill 5284.
> Process is probably already dead.
> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 5] [6
> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 6703
> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with command:
> java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
> -Dlogfile.name=worker-6703.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
> -Dstorm.id=production-topology-2-1428932653
> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp
> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
> backtype.storm.daemon.worker production-topology-2-1428932653
> bd340179-5905-4233-ad5d-44b47b228177 6703
> 786a62ac-2868-4526-a7b2-5c26234bb23e
> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
>
> On 13 April 2015 at 16:09, Matthew Sharpe <ma...@gmail.com>
> wrote:
>
>> OK, in checking the worker process logs I've uncovered a bit more
>> information. The UI reports 0 emitted from the wordGenerator spout, along
>> with 0 transferred, acked and failed.
>>
>> Sorry - the *.compute.amazonaws.com is redacted (I don't know the usual
>> way of doing that). I've got a bunch of other stuff on the boxes and, in
>> troubleshooting things, have been a bit laxer with security than I
>> should have been!
>>
>> The UI shows 5 workers all starting on the same node - each of which has
>> a declared uptime of 1s. Looking at the logs on the node, during that time
>> the following happens:
>>
>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>> environment:java.io.tmpdir=/tmp
>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>> environment:java.compiler=<NA>
>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
>> os.name=Linux
>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>> environment:os.arch=amd64
>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>> environment:os.version=3.13.0-48-generic
>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
>> user.name=storm
>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>> environment:user.home=/app/home/storm
>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>> environment:user.dir=/
>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>> production-topology-2-1428932653 on
>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>> "topology.builtin.metrics.bucket.size.secs" 60,
>> "topology.fall.back.on.java.serialization" true,
>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
>> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
>> "supervisor.worker.start.timeout.secs" 120,
>> "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>> "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.compute.amazonaws.com"],
>> "transactional.zookeeper.root" "/transactional", "topology.acker.executors"
>> nil, "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>> -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
>> "topology.tasks" nil, "topology.spout.wait.strategy"
>> "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
>> nil, "storm.zookeeper.retry.interval" 1000, "
>> topology.sleep.spout.wait.strategy.time.ms" 1,
>> "nimbus.topology.validator"
>> "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
>> [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
>> 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>> "backtype.storm.serialization.types.ListDelegateSerializer",
>> "topology.disruptor.wait.strategy"
>> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>> "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
>> 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
>> 5, "storm.thrift.transport"
>> "backtype.storm.security.auth.SimpleTransportPlugin",
>> "topology.state.synchronization.timeout.secs" 60,
>> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
>> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>> "topology.max.task.parallelism" nil}
>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client connection,
>> connectString=*.compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>> ,*.compute.amazonaws.com:2181 sessionTimeout=20000
>> watcher=com.netflix.curator.ConnectionState@585bd6a6
>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection to
>> server *.compute.amazonaws.com/*:2181
>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection established
>> to *.compute.amazonaws.com/*:2181, initiating session
>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>> 0x34cb217be05005e, negotiated timeout = 20000
>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
>> :connected:none
>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session: 0x34cb217be05005e
>> closed
>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client connection,
>> connectString=*.compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>> ,*.compute.amazonaws.com:2181/storm sessionTimeout=20000
>> watcher=com.netflix.curator.ConnectionState@4b492c47
>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection to
>> server *.compute.amazonaws.com/*:2181
>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection established
>> to *.compute.amazonaws.com/*:2181, initiating session
>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>> 0x24cb217be030072, negotiated timeout = 20000
>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer transport
>> plugin:backtype.storm.messaging.zmq
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor counter:[2 2]
>>
>>
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>> wordGenerator:[11 11]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>> wordGenerator:[11 11]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(11)
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(11)
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>> wordGenerator:(11)
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>> wordGenerator:[12 12]
>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>> ["startup"]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>> wordGenerator:[12 12]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>> wordGenerator:[12 12]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(12)
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(12)
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>> wordGenerator:(12)
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>> wordGenerator:[13 13]
>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>> ["startup"]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>> wordGenerator:[13 13]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>> wordGenerator:[13 13]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(13)
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(13)
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>> wordGenerator:(13)
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>> wordGenerator:[14 14]
>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>> ["startup"]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>> wordGenerator:[14 14]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>> wordGenerator:[14 14]
>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>> [bertels]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(14)
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(14)
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>> wordGenerator:(14)
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>> source: wordGenerator:11, stream: default, id: {}, [bertels]
>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>> [mike]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>> source: wordGenerator:12, stream: default, id: {}, [mike]
>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>> [bertels]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>> source: wordGenerator:13, stream: default, id: {}, [bertels]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>> wordGenerator:[15 15]
>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>> ["startup"]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>> wordGenerator:[15 15]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>> wordGenerator:[15 15]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(15)
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(15)
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>> wordGenerator:(15)
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __system:[-1
>> -1]
>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system
>> ["startup"]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>> __system:[-1 -1]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>> __system:[-1 -1]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt __system:(-1)
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __acker:[1 1]
>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>> [jackson]
>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>> [mike]
>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>> source: wordGenerator:11, stream: default, id: {}, [jackson]
>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>> source: wordGenerator:14, stream: default, id: {}, [mike]
>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>> [jackson]
>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>> source: wordGenerator:12, stream: default, id: {}, [jackson]
>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>> [golda]
>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>> source: wordGenerator:13, stream: default, id: {}, [golda]
>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __system:(-1)
>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system
>> ["startup"]
>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks
>> __acker:[1 1]
>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>> [jackson]
>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading executor
>> __acker:[1 1]
>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread for
>> bd340179-5905-4233-ad5d-44b47b228177:6702
>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt __acker:(1)
>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __acker:(1)
>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>> [nathan]
>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>> [bertels]
>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>> [golda]
>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>> source: wordGenerator:11, stream: default, id: {}, [golda]
>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>> source: wordGenerator:12, stream: default, id: {}, [nathan]
>>
>> After that, the logs stop (no more messages) and the UI says that the 5
>> executors are now running on another node (and port) - at which point a
>> worker log is created on that node for that port (same messages).
>>
>> So the worker seems to start and begins emitting words, but the UI never
>> registers anything as having been emitted. After a short period of time
>> the worker seems to stop without logging anything, and so a new worker is
>> started.
>>
>> Thanks,
>>
>> Matt
>>
>>
>>
>> On 13 April 2015 at 15:36, Nathan Leung <nc...@gmail.com> wrote:
>>
>>> Did you check the worker process logs?  What are the emit counts for the
>>> components in the UI?
>>>
>>> Also I have not run on AWS personally but the following line seems
>>> strange to me:
>>>
>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>> compute.amazonaws.com"}
>>>
>>> Generally this would be bound to a specific machine, not *.<domain>.
>>>
>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>> matthewsharpe3@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've set up a small Storm cluster on AWS that is giving me headaches.
>>>> At the moment I've got a situation where the cluster (a nimbus node and 3
>>>> supervisors) will successfully run in local mode my particular topology.
>>>> However, as soon as I try to run the topology in distributed mode there are
>>>> no errors (that I can see) in any of the logs - the topology appears in the
>>>> UI but never shows any tuples emitted by the spouts.
>>>>
>>>> In order to check it wasn't my topology I cloned the storm-starter
>>>> project and have tried with the RollingTopWords example - running locally I
>>>> get a stream of output but as soon as I specify that we should run in a
>>>> distributed fashion I get a success message (topology successfully
>>>> submitted). The topology appears on the UI as successfully submitted but I
>>>> see no progress.
>>>>
>>>> It's a bit of a weird one as I don't know which logs to share to show
>>>> the problem as the whole point is I'm not seeing any errors!
>>>>
>>>> The last entry in the supervisor logs pertains to the set up of the
>>>> supervisor (i.e. it's not logged anything about the topologies that have
>>>> been submitted)
>>>>
>>>> The last entries in the nimbus logs are probably a bit more
>>>> illuminating:
>>>>
>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>> production-topology-2-1428932653:[2 2] not alive
>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>> production-topology-2-1428932653:[3 3] not alive
>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>> production-topology-2-1428932653:[4 4] not alive
>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
>>>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>> production-topology-2-1428932653 to 1 slots
>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3
>>>> 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14
>>>> 14] [15 15] [1 1]]
>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment for
>>>> topology id production-topology-2-1428932653:
>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"},
>>>> :executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 6702],
>>>> [3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>
>>>> If anybody can help that'd be greatly appreciated!
>>>>
>>>> Thanks,
>>>>
>>>> Matt
>>>>
>>>> --
>>>> Check out my blog: dogdogfish.com
>>>>
>>>
>>>
>>
>>
>> --
>> Check out my blog: dogdogfish.com
>>
>
>
>
> --
> Check out my blog: dogdogfish.com
>

Re: Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Posted by Matthew Sharpe <ma...@gmail.com>.
New bonus bit of information - the supervisor log has got this going on:

2015-04-13 15:13:48 b.s.d.supervisor [INFO]
3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and clearing
state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current supervisor time:
1428938060. State: :timed-out, Heartbeat:
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
:storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] [4
4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
[15 15] [-1 -1] [1 1]}, :port 6703}
2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill 5284. Process
is probably already dead.
2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
"production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 5] [6
6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 6703
with id 786a62ac-2868-4526-a7b2-5c26234bb23e
2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with command:
java -server -Xmx768m -Djava.net.preferIPv4Stack=true
 -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
-Dlogfile.name=worker-6703.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
-Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
-Dstorm.id=production-topology-2-1428932653
-Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp
/usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
backtype.storm.daemon.worker production-topology-2-1428932653
bd340179-5905-4233-ad5d-44b47b228177 6703
786a62ac-2868-4526-a7b2-5c26234bb23e
2015-04-13 15:14:20 b.s.d.supervisor [INFO]
786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started

On 13 April 2015 at 16:09, Matthew Sharpe <ma...@gmail.com> wrote:

> OK, in checking the worker process logs I've uncovered a bit more
> information. The UI reports 0 emitted from the wordGenerator spout, along
> with 0 transferred, acked and failed.
>
> Sorry - the *.compute.amazonaws.com is redacted (I don't know the usual way
> of doing that). I've got a bunch of other stuff on the boxes and, in
> troubleshooting things, have been a bit laxer with security than I
> should have been!
>
> The UI shows 5 workers all starting on the same node - each of which has a
> declared uptime of 1s. Looking at the logs on the node, during that time
> the following happens:
>
> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
> environment:java.io.tmpdir=/tmp
> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
> environment:java.compiler=<NA>
> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
> os.name=Linux
> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
> environment:os.arch=amd64
> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
> environment:os.version=3.13.0-48-generic
> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
> user.name=storm
> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
> environment:user.home=/app/home/storm
> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
> environment:user.dir=/
> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
> production-topology-2-1428932653 on
> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
> "topology.builtin.metrics.bucket.size.secs" 60,
> "topology.fall.back.on.java.serialization" true,
> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
> "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
> "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
> "supervisor.worker.start.timeout.secs" 120,
> "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
> compute.amazonaws.com", "storm.zookeeper.port" 2181,
> "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
> compute.amazonaws.com" "*.compute.amazonaws.com" "*.compute.amazonaws.com"],
> "transactional.zookeeper.root" "/transactional", "topology.acker.executors"
> nil, "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
> -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
> "topology.tasks" nil, "topology.spout.wait.strategy"
> "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
> nil, "storm.zookeeper.retry.interval" 1000, "
> topology.sleep.spout.wait.strategy.time.ms" 1,
> "nimbus.topology.validator"
> "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
> [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
> 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
> "backtype.storm.serialization.types.ListDelegateSerializer",
> "topology.disruptor.wait.strategy"
> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
> "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
> 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
> 5, "storm.thrift.transport"
> "backtype.storm.security.auth.SimpleTransportPlugin",
> "topology.state.synchronization.timeout.secs" 60,
> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
> "storm.cluster.mode" "distributed", "topology.optimize" true,
> "topology.max.task.parallelism" nil}
> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client connection,
> connectString=*.compute.amazonaws.com:2181,*.compute.amazonaws.com:2181,*.
> compute.amazonaws.com:2181 sessionTimeout=20000
> watcher=com.netflix.curator.ConnectionState@585bd6a6
> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection to
> server *.compute.amazonaws.com/*:2181
> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection established
> to *.compute.amazonaws.com/*:2181, initiating session
> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment complete
> on server *.compute.amazonaws.com/*:2181, sessionid = 0x34cb217be05005e,
> negotiated timeout = 20000
> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
> :connected:none
> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session: 0x34cb217be05005e
> closed
> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client connection,
> connectString=*.compute.amazonaws.com:2181,*.compute.amazonaws.com:2181,*.
> compute.amazonaws.com:2181/storm sessionTimeout=20000
> watcher=com.netflix.curator.ConnectionState@4b492c47
> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection to
> server *.compute.amazonaws.com/*:2181
> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection established
> to *.compute.amazonaws.com/*:2181, initiating session
> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment complete
> on server *.compute.amazonaws.com/*:2181, sessionid = 0x24cb217be030072,
> negotiated timeout = 20000
> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer transport
> plugin:backtype.storm.messaging.zmq
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor counter:[2 2]
>
>
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
> wordGenerator:[11 11]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
> wordGenerator:[11 11]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(11)
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(11)
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
> wordGenerator:(11)
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
> wordGenerator:[12 12]
> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
> ["startup"]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
> wordGenerator:[12 12]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
> wordGenerator:[12 12]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(12)
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(12)
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
> wordGenerator:(12)
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
> wordGenerator:[13 13]
> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
> ["startup"]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
> wordGenerator:[13 13]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
> wordGenerator:[13 13]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(13)
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(13)
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
> wordGenerator:(13)
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
> wordGenerator:[14 14]
> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
> ["startup"]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
> wordGenerator:[14 14]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
> wordGenerator:[14 14]
> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
> [bertels]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(14)
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(14)
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
> wordGenerator:(14)
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
> source: wordGenerator:11, stream: default, id: {}, [bertels]
> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
> [mike]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
> source: wordGenerator:12, stream: default, id: {}, [mike]
> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
> [bertels]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
> source: wordGenerator:13, stream: default, id: {}, [bertels]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
> wordGenerator:[15 15]
> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
> ["startup"]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
> wordGenerator:[15 15]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
> wordGenerator:[15 15]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(15)
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(15)
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
> wordGenerator:(15)
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __system:[-1 -1]
> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system
> ["startup"]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
> __system:[-1 -1]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
> __system:[-1 -1]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt __system:(-1)
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __acker:[1 1]
> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
> [jackson]
> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
> [mike]
> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
> source: wordGenerator:11, stream: default, id: {}, [jackson]
> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
> source: wordGenerator:14, stream: default, id: {}, [mike]
> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
> [jackson]
> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
> source: wordGenerator:12, stream: default, id: {}, [jackson]
> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
> [golda]
> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
> source: wordGenerator:13, stream: default, id: {}, [golda]
> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __system:(-1)
> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system
> ["startup"]
> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks __acker:[1
> 1]
> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
> [jackson]
> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading executor
> __acker:[1 1]
> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread for
> bd340179-5905-4233-ad5d-44b47b228177:6702
> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt __acker:(1)
> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __acker:(1)
> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
> [nathan]
> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
> [bertels]
> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
> [golda]
> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
> source: wordGenerator:11, stream: default, id: {}, [golda]
> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
> source: wordGenerator:12, stream: default, id: {}, [nathan]
>
> After that, the logs stop (no more messages) and the UI says that the 5
> executors are now running on another node (and port) - at which point a
> worker log with that port number is created on the new node, containing
> the same messages.
>
> So the worker seems to start and begins emitting words, but the UI doesn't
> register anything as having been emitted. After a short period of time the
> worker seems to stop without logging anything, and so a new worker is
> started.
>
> Thanks,
>
> Matt
>
>
>
> On 13 April 2015 at 15:36, Nathan Leung <nc...@gmail.com> wrote:
>
>> Did you check the worker process logs?  What are the emit counts for the
>> components in the UI?
>>
>> Also I have not run on AWS personally but the following line seems
>> strange to me:
>>
>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>> compute.amazonaws.com"}
>>
>> Generally this would be bound to a specific machine, not *.<domain>.
>>
>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>> matthewsharpe3@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I've set up a small Storm cluster on AWS that is giving me headaches. At
>>> the moment I've got a situation where the cluster (a nimbus node and 3
>>> supervisors) will successfully run in local mode my particular topology.
>>> However, as soon as I try to run the topology in distributed mode there are
>>> no errors (that I can see) in any of the logs - the topology appears in the
>>> UI but never shows any tuples emitted by the spouts.
>>>
>>> In order to check it wasn't my topology I cloned the storm-starter
>>> project and have tried with the RollingTopWords example - running locally I
>>> get a stream of output but as soon as I specify that we should run in a
>>> distributed fashion I get a success message (topology successfully
>>> submitted). The topology appears on the UI as successfully submitted but I
>>> see no progress.
>>>
>>> It's a bit of a weird one as I don't know which logs to share to show
>>> the problem as the whole point is I'm not seeing any errors!
>>>
>>> The last entry in the supervisor logs pertains to the set up of the
>>> supervisor (i.e. it's not logged anything about the topologies that have
>>> been submitted)
>>>
>>> The last entries in the nimbus logs are probably a bit more illuminating:
>>>
>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>> production-topology-2-1428932653:[2 2] not alive
>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>> production-topology-2-1428932653:[3 3] not alive
>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>> production-topology-2-1428932653:[4 4] not alive
>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
>>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>> production-topology-2-1428932653 to 1 slots
>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3 3]
>>> [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>>> [15 15] [1 1]]
>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment for
>>> topology id production-topology-2-1428932653:
>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"},
>>> :executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 6702],
>>> [3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>
>>> If anybody can help that'd be greatly appreciated!
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>> --
>>> Check out my blog: dogdogfish.com
>>>
>>
>>
>
>
> --
> Check out my blog: dogdogfish.com
>



-- 
Check out my blog: dogdogfish.com

Re: Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Posted by Matthew Sharpe <ma...@gmail.com>.
OK, in checking the worker process logs I've uncovered a bit more
information. The UI reports 0 emitted from the wordGenerator spout, along
with 0 transferred, acked and failed.
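
(One thing I'm not sure about: with topology.stats.sample.rate at 0.05, I
believe the UI counters only move in jumps of 20 sampled tuples, so a worker
that only lives for a second or two and emits a handful of tuples might
genuinely show 0 even if it did emit something.)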

Sorry - the *.compute.amazonaws.com is redacted (I don't know the usual way
of doing that). I've got a bunch of other stuff on the boxes, and while
troubleshooting I've been a bit laxer with security than I should have been!

The UI shows 5 workers all starting on the same node, each with a declared
uptime of 1s. Looking at the logs on that node, this is what happens during
that time:

2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
environment:java.io.tmpdir=/tmp
2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
environment:java.compiler=<NA>
2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
os.name=Linux
2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
environment:os.arch=amd64
2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
environment:os.version=3.13.0-48-generic
2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
user.name=storm
2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
environment:user.home=/app/home/storm
2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
environment:user.dir=/
2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
production-topology-2-1428932653 on
bd340179-5905-4233-ad5d-44b47b228177:6702 with id
ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
"/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
"topology.builtin.metrics.bucket.size.secs" 60,
"topology.fall.back.on.java.serialization" true,
"topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
"topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
-Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
"nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
"nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
"java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
"topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
"supervisor.worker.start.timeout.secs" 120,
"topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
"topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
compute.amazonaws.com", "storm.zookeeper.port" 2181,
"transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
"/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
"supervisor.enable" true, "storm.zookeeper.servers" ["*.
compute.amazonaws.com" "*.compute.amazonaws.com" "*.compute.amazonaws.com"],
"transactional.zookeeper.root" "/transactional", "topology.acker.executors"
nil, "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
"drpc.queue.size" 128, "worker.childopts" "-Xmx768m
-Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
"topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
"supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
"topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
"topology.tasks" nil, "topology.spout.wait.strategy"
"backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
nil, "storm.zookeeper.retry.interval" 1000, "
topology.sleep.spout.wait.strategy.time.ms" 1, "nimbus.topology.validator"
"backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
[6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
30, "task.refresh.poll.secs" 10, "topology.workers" 1,
"supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
"nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
"worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
"backtype.storm.serialization.types.ListDelegateSerializer",
"topology.disruptor.wait.strategy"
"com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
"storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
"backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
5, "storm.thrift.transport"
"backtype.storm.security.auth.SimpleTransportPlugin",
"topology.state.synchronization.timeout.secs" 60,
"supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
"drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
"storm.cluster.mode" "distributed", "topology.optimize" true,
"topology.max.task.parallelism" nil}
2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client connection,
connectString=*.compute.amazonaws.com:2181,*.compute.amazonaws.com:2181,*.
compute.amazonaws.com:2181 sessionTimeout=20000
watcher=com.netflix.curator.ConnectionState@585bd6a6
2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection to
server *.compute.amazonaws.com/*:2181
2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection established
to *.compute.amazonaws.com/*:2181, initiating session
2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment complete
on server *.compute.amazonaws.com/*:2181, sessionid = 0x34cb217be05005e,
negotiated timeout = 20000
2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
:connected:none
2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session: 0x34cb217be05005e closed
2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client connection,
connectString=*.compute.amazonaws.com:2181,*.compute.amazonaws.com:2181,*.
compute.amazonaws.com:2181/storm sessionTimeout=20000
watcher=com.netflix.curator.ConnectionState@4b492c47
2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection to
server *.compute.amazonaws.com/*:2181
2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection established
to *.compute.amazonaws.com/*:2181, initiating session
2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment complete
on server *.compute.amazonaws.com/*:2181, sessionid = 0x24cb217be030072,
negotiated timeout = 20000
2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer transport
plugin:backtype.storm.messaging.zmq
2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor counter:[2 2]


2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
wordGenerator:[11 11]
2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
wordGenerator:[11 11]
2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(11)
2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(11)
2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
wordGenerator:(11)
2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
wordGenerator:[12 12]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
["startup"]
2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
wordGenerator:[12 12]
2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
wordGenerator:[12 12]
2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(12)
2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(12)
2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
wordGenerator:(12)
2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
wordGenerator:[13 13]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
["startup"]
2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
wordGenerator:[13 13]
2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
wordGenerator:[13 13]
2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(13)
2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(13)
2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
wordGenerator:(13)
2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
wordGenerator:[14 14]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
["startup"]
2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
wordGenerator:[14 14]
2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
wordGenerator:[14 14]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
[bertels]
2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(14)
2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(14)
2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
wordGenerator:(14)
2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
source: wordGenerator:11, stream: default, id: {}, [bertels]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default [mike]
2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
source: wordGenerator:12, stream: default, id: {}, [mike]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
[bertels]
2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
source: wordGenerator:13, stream: default, id: {}, [bertels]
2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
wordGenerator:[15 15]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
["startup"]
2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
wordGenerator:[15 15]
2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
wordGenerator:[15 15]
2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(15)
2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(15)
2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
wordGenerator:(15)
2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __system:[-1 -1]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system
["startup"]
2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
__system:[-1 -1]
2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
__system:[-1 -1]
2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt __system:(-1)
2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __acker:[1 1]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
[jackson]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default [mike]
2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
source: wordGenerator:11, stream: default, id: {}, [jackson]
2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
source: wordGenerator:14, stream: default, id: {}, [mike]
2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
[jackson]
2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
source: wordGenerator:12, stream: default, id: {}, [jackson]
2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
[golda]
2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
source: wordGenerator:13, stream: default, id: {}, [golda]
2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __system:(-1)
2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system ["startup"]
2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks __acker:[1
1]
2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
[jackson]
2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading executor
__acker:[1 1]
2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread for
bd340179-5905-4233-ad5d-44b47b228177:6702
2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt __acker:(1)
2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __acker:(1)
2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
[nathan]
2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
[bertels]
2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
[golda]
2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
source: wordGenerator:11, stream: default, id: {}, [golda]
2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
source: wordGenerator:12, stream: default, id: {}, [nathan]

After that, the logs stop (no more messages) and the UI says that the 5
executors are now running on another node (and port) - at which point a
worker log with that port number is created on the new node, containing
the same messages.

So the worker seems to start and begins emitting words, but the UI doesn't
register anything as having been emitted. After a short period of time the
worker seems to stop without logging anything, and so a new worker is
started.
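
For reference, these are the heartbeat/timeout settings from the worker
config pasted above which (as I understand it) decide when a silent worker
gets killed and when nimbus marks its executors "not alive" and reassigns
them - just a sketch of the relevant storm.yaml entries with the values from
this cluster, not a fix:

worker.heartbeat.frequency.secs: 1      # worker -> supervisor local heartbeat
task.heartbeat.frequency.secs: 3        # executor -> ZooKeeper heartbeat
supervisor.worker.timeout.secs: 30      # supervisor kills a worker silent this long
nimbus.task.timeout.secs: 30            # nimbus marks executors not alive after this
nimbus.task.launch.secs: 120            # extra grace period right after launch
storm.zookeeper.session.timeout: 20000  # ZK session timeout in ms

So if the worker process dies, or its heartbeats never make it to ZooKeeper,
nimbus would eventually log the "not alive" messages and reassign the
executors, which looks like what's happening here.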

Thanks,

Matt



On 13 April 2015 at 15:36, Nathan Leung <nc...@gmail.com> wrote:

> Did you check the worker process logs?  What are the emit counts for the
> components in the UI?
>
> Also I have not run on AWS personally but the following line seems strange
> to me:
>
> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
> compute.amazonaws.com"}
>
> Generally this would be bound to a specific machine, not *.<domain>.
>
> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <matthewsharpe3@gmail.com
> > wrote:
>
>> Hi,
>>
>> I've set up a small Storm cluster on AWS that is giving me headaches. At
>> the moment I've got a situation where the cluster (a nimbus node and 3
>> supervisors) will successfully run in local mode my particular topology.
>> However, as soon as I try to run the topology in distributed mode there are
>> no errors (that I can see) in any of the logs - the topology appears in the
>> UI but never shows any tuples emitted by the spouts.
>>
>> In order to check it wasn't my topology I cloned the storm-starter
>> project and have tried with the RollingTopWords example - running locally I
>> get a stream of output but as soon as I specify that we should run in a
>> distributed fashion I get a success message (topology successfully
>> submitted). The topology appears on the UI as successfully submitted but I
>> see no progress.
>>
>> It's a bit of a weird one as I don't know which logs to share to show the
>> problem as the whole point is I'm not seeing any errors!
>>
>> The last entry in the supervisor logs pertains to the set up of the
>> supervisor (i.e. it's not logged anything about the topologies that have
>> been submitted)
>>
>> The last entries in the nimbus logs are probably a bit more illuminating:
>>
>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>> production-topology-2-1428932653:[2 2] not alive
>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>> production-topology-2-1428932653:[3 3] not alive
>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>> production-topology-2-1428932653:[4 4] not alive
>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>> production-topology-2-1428932653 to 1 slots
>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3 3]
>> [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>> [15 15] [1 1]]
>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment for
>> topology id production-topology-2-1428932653:
>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"},
>> :executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 6702],
>> [3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>
>> If anybody can help that'd be greatly appreciated!
>>
>> Thanks,
>>
>> Matt
>>
>> --
>> Check out my blog: dogdogfish.com
>>
>
>


-- 
Check out my blog: dogdogfish.com

Re: Storm Cluster working in local mode but not in production - No logged errors and UI looks fine

Posted by Nathan Leung <nc...@gmail.com>.
Did you check the worker process logs?  What are the emit counts for the
components in the UI?
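
(If it helps, the worker logs normally end up on the supervisor machines in
the Storm log directory as worker-<port>.log, e.g. logs/worker-6702.log for
the 6702 slot - at least that's where they are on my installs.)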

Also I have not run on AWS personally but the following line seems strange
to me:

:node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com
"}

Generally this would be bound to a specific machine, not *.<domain>.
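
For example, on each node I'd expect the supervisor to register under a
concrete, resolvable hostname - either whatever the box reports, or an
explicit override in that node's storm.yaml along the lines of (the hostname
here is just a made-up example):

storm.local.hostname: "ip-10-0-0-12.ec2.internal"

so that workers on the other nodes can actually reach it.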

On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <ma...@gmail.com>
wrote:

> Hi,
>
> I've set up a small Storm cluster on AWS that is giving me headaches. At
> the moment I've got a situation where the cluster (a nimbus node and 3
> supervisors) will successfully run in local mode my particular topology.
> However, as soon as I try to run the topology in distributed mode there are
> no errors (that I can see) in any of the logs - the topology appears in the
> UI but never shows any tuples emitted by the spouts.
>
> In order to check it wasn't my topology I cloned the storm-starter project
> and have tried with the RollingTopWords example - running locally I get a
> stream of output but as soon as I specify that we should run in a
> distributed fashion I get a success message (topology successfully
> submitted). The topology appears on the UI as successfully submitted but I
> see no progress.
>
> It's a bit of a weird one as I don't know which logs to share to show the
> problem as the whole point is I'm not seeing any errors!
>
> The last entry in the supervisor logs pertains to the set up of the
> supervisor (i.e. it's not logged anything about the topologies that have
> been submitted)
>
> The last entries in the nimbus logs are probably a bit more illuminating:
>
> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
> production-topology-2-1428932653:[2 2] not alive
> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
> production-topology-2-1428932653:[3 3] not alive
> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
> production-topology-2-1428932653:[4 4] not alive
> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
> (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
> production-topology-2-1428932653 to 1 slots
> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3 3]
> [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
> [15 15] [1 1]]
> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment for
> topology id production-topology-2-1428932653:
> #backtype.storm.daemon.common.Assignment{:master-code-dir
> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
> {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"},
> :executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 6702],
> [3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>
> If anybody can help that'd be greatly appreciated!
>
> Thanks,
>
> Matt
>
> --
> Check out my blog: dogdogfish.com
>