You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@storm.apache.org by Nathan Leung <nc...@gmail.com> on 2014/07/30 18:26:50 UTC
Re: FW: why complete latency and failure rate is so high of my spout.
Complete latency can be higher if you have a lot of data pending from the
spout (your topology.max.spout.pending is not set), however given the low
number of tuples I wouldn't suspect this to be the case. What is the
layout of your topology? Is it circular in any way?
On Wed, Jul 30, 2014 at 12:11 PM, Wei, Xin <Xi...@intuit.com> wrote:
>
>
>
>
> Hi There,
>
> I’m new to storm. I set up a storm cluster with 3 machines. One machine
> I use it as master, I start up zookeeper and nimbus server on it, another 2
> machines I use them as supervisor. I got a very high rate of failure of
> my spout and I can not figure out why. Please see the detailed statistics
> as below.
> opology stats Window Emitted Transferred Complete latency (ms) Acked
> Failed 10m 0s
> <http://pppdc9prd470.corp.intuit.net:8080/topology/nearline-1-1406570637?window=600>
> 32 32 124856.430 7 6 3h 0m 0s
> <http://pppdc9prd470.corp.intuit.net:8080/topology/nearline-1-1406570637?window=10800>
> 15780 15780 108286.484 2885 6435 1d 0h 0m 0s
> <http://pppdc9prd470.corp.intuit.net:8080/topology/nearline-1-1406570637?window=86400>
> 15780 15780 108286.484 2885 6435 All time
> <http://pppdc9prd470.corp.intuit.net:8080/topology/nearline-1-1406570637?window=%3Aall-time>
> 15780 15780 108286.484 2885 6435 Spouts (All time) Id Executors Tasks
> Emitted Transferred Complete latency (ms) Acked Failed Last error
> JMS_QUEUE_SPOUT
> <http://pppdc9prd470.corp.intuit.net:8080/topology/nearline-1-1406570637/component/JMS_QUEUE_SPOUT>
> 2 2 9320 9320 108286.484 2885 6435 Bolts (All time)
> Id Executors Tasks Emitted Transferred Capacity (last 10m) Execute
> latency (ms) Executed Process latency (ms) Acked Failed Last error
> AGGREGATOR_BOLT
> <http://pppdc9prd470.corp.intuit.net:8080/topology/nearline-1-1406570637/component/AGGREGATOR_BOLT>
> 8 8 3230 3230 0.000 61.529 3230 58.975 3230 0 MESSAGEFILTER_BOLT
> <http://pppdc9prd470.corp.intuit.net:8080/topology/nearline-1-1406570637/component/MESSAGEFILTER_BOLT>
> 8 8 3230 3230 0.000 23.207 9320 18.054 9320 0 OFFER_GENERATOR_BOLT
> <http://pppdc9prd470.corp.intuit.net:8080/topology/nearline-1-1406570637/component/OFFER_GENERATOR_BOLT>
> 8 8 0 0 0.000 38.267 3230 34.693 3230 0
> For the spout , only 2885 tuples got acked and 6435 got failed. The
> complete latency is horribly hight.
>
> Here is my storm configuration :
>
> Topology Configuration Key Value zmq.threads 1 zmq.linger.millis
> 5000 zmq.hwm 0 worker.heartbeat.frequency.secs 1 worker.childopts -Xmx768m
> -Djava.net.preferIPv4Stack=false -DNEARLINE_DATA_ENV=dev
> -DNEARLINE_APP_ENV=dev -DNEARLINE_QUEUES_ENV=dev
> -Dauthfilter.appcred.default.encrypt.file=/home/xwei/FP_AppCred_Encrypt.txt
> -Dauthfilter.appcred.default.passphrase.file=/home/xwei/FP_AppCred_Passphrase.txt
> ui.port 8080 ui.childopts -Xmx768m transactional.zookeeper.servers
> transactional.zookeeper.root /transactional transactional.zookeeper.port
> topology.workers 4 topology.worker.shared.thread.pool.size 4
> topology.worker.childopts topology.tuple.serializer
> backtype.storm.serialization.types.ListDelegateSerializer
> topology.trident.batch.emit.interval.millis 500
> topology.transfer.buffer.size 32 topology.tick.tuple.freq.secs
> topology.tasks topology.stats.sample.rate 1
> topology.state.synchronization.timeout.secs 60
> topology.spout.wait.strategy backtype.storm.spout.SleepSpoutWaitStrategy
> topology.sleep.spout.wait.strategy.time.ms 1
> topology.skip.missing.kryo.registrations false
> topology.receiver.buffer.size 8 topology.optimize true topology.name
> nearline topology.message.timeout.secs 30 topology.max.task.parallelism
> topology.max.spout.pending topology.max.error.report.per.interval 5
> topology.kryo.register topology.kryo.factory
> backtype.storm.serialization.DefaultKryoFactory topology.kryo.decorators
> [] topology.fall.back.on.java.serialization true
> topology.executor.send.buffer.size 16384
> topology.executor.receive.buffer.size 16384
> topology.error.throttle.interval.secs 10 topology.enable.message.timeouts
> true topology.disruptor.wait.strategy
> com.lmax.disruptor.BlockingWaitStrategy topology.debug false
> topology.builtin.metrics.bucket.size.secs 60 topology.acker.executors 4
> task.refresh.poll.secs 10 task.heartbeat.frequency.secs 3
> supervisor.worker.timeout.secs 30 supervisor.worker.start.timeout.secs
> 120 supervisor.slots.ports [6700 6701 6702 6703]
> supervisor.monitor.frequency.secs 3 supervisor.heartbeat.frequency.secs 5
> supervisor.enable true supervisor.childopts -Xmx256m
> -Djava.net.preferIPv4Stack=true storm.zookeeper.session.timeout 20000
> storm.zookeeper.servers ["zookeeper"] storm.zookeeper.root /storm
> storm.zookeeper.retry.times 5
> storm.zookeeper.retry.intervalceiling.millis 30000
> storm.zookeeper.retry.interval 1000 storm.zookeeper.port 2181
> storm.zookeeper.connection.timeout 15000 storm.thrift.transport
> backtype.storm.security.auth.SimpleTransportPlugin
> storm.messaging.transport backtype.storm.messaging.zmq
> storm.messaging.netty.server_worker_threads 1
> storm.messaging.netty.min_wait_ms 100 storm.messaging.netty.max_wait_ms
> 1000 storm.messaging.netty.max_retries 30
> storm.messaging.netty.client_worker_threads 1
> storm.messaging.netty.buffer_size 5242880 storm.local.mode.zmq false
> storm.local.dir /app_local/storm storm.id nearline-1-1406570637
> storm.cluster.mode distributed nimbus.topology.validator
> backtype.storm.nimbus.DefaultTopologyValidator nimbus.thrift.port 6627
> nimbus.task.timeout.secs 30 nimbus.task.launch.secs 120
> nimbus.supervisor.timeout.secs 60 nimbus.reassign true
> nimbus.monitor.freq.secs 10 nimbus.inbox.jar.expiration.secs 3600
> nimbus.host zookeeper nimbus.file.copy.expiration.secs 600
> nimbus.cleanup.inbox.freq.secs 600 nimbus.childopts -Xmx1024m
> -Djava.net.preferIPv4Stack=true logviewer.port 8000 logviewer.childopts
> -Xmx128m logviewer.appender.name A1 java.library.path /usr/local/lib
> drpc.worker.threads 64 drpc.request.timeout.secs 600 drpc.queue.size 128
> drpc.port 3772 drpc.invocations.port 3773 drpc.childopts -Xmx768m
> dev.zookeeper.path /tmp/dev-storm-zookeeper
> I have several questions:
>
> 1. What is complete latency for spout ? According to the google,
> complete latency means *The timer is started when the tuple is
> emitted from the spout and it is stopped when the tuple is acked. So it
> measures the time for the entire tuple tree to be completed.* As far
> as I understand, the spout emit a tuple, the tuple would be passed to the
> bolt and be parsed and processed there, then the spout would ack it. This
> means it might be my bolt which spend to much time to process the tuple.
> But according to the above data, it seems the bolt ’s execution latency and
> process latency is not high at all compared with complete latency of spout,
> which is contradict to my understanding.
> 2. What is the relationship between the complete latency and failure
> rate? Also what is the relationship between complete latency and the total
> throuput?
> 3. Is there any hint how should I deal with this high failure rate
> issue?
>
> Thanks a lot for the help.
>
>
>
>
>