Posted to user@storm.apache.org by Rohit Kochar <mn...@gmail.com> on 2015/08/05 08:27:12 UTC

Low throughput of Trident Spout

I have written a custom Trident spout that reads from HDFS and emits the content as messages to downstream group-by bolts. The spout implements the IBatchSpout interface.

While benchmarking my topology, I observed a maximum throughput of 10K messages from the spout. When I run the HDFS reading code in isolation, outside the Trident spout, I get a throughput of 100K, so the bottleneck is clearly on the Storm side.

I further instrumented the code and found that a single call of
> collector.emit(List<Object>)
from the spout takes 0.1 ms. Increasing the buffer sizes (topology.receiver.buffer.size=16, topology.transfer.buffer.size=64, topology.executor.receive.buffer.size=32768 and topology.executor.send.buffer.size=32768) did not help either.
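
For reference, the buffer settings listed above can be sketched as a plain Java map. The class and method names here are hypothetical and used only for illustration. In an actual topology these entries would go into the Storm Config object (itself a Map<String, Object>) that is passed at submit time. A plain HashMap is used so the snippet runs without Storm on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: collects the four buffer settings tried in this post.
public class BufferSettings {

    // Returns the topology-level overrides as key/value pairs.
    static Map<String, Object> bufferSettings() {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.receiver.buffer.size", 16);
        conf.put("topology.transfer.buffer.size", 64);
        conf.put("topology.executor.receive.buffer.size", 32768);
        conf.put("topology.executor.send.buffer.size", 32768);
        return conf;
    }

    public static void main(String[] args) {
        // Print each setting so the values can be checked against the post.
        bufferSettings().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```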

Is it expected for this call to take that long?
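
For scale, a cost of 0.1 ms per emit call would by itself cap a loop that emits one tuple at a time at 10,000 calls per second, which lines up with the 10K figure above if that figure is per second. A minimal sketch of that arithmetic, with a hypothetical class and method name and no Storm dependency:

```java
// Hypothetical helper: turns a measured per-call cost into a throughput ceiling.
public class EmitBudget {

    // Upper bound on calls per second for a loop that makes one call at a time,
    // given the cost of a single call in nanoseconds.
    static long maxCallsPerSecond(long nanosPerCall) {
        if (nanosPerCall <= 0) {
            throw new IllegalArgumentException("nanosPerCall must be positive");
        }
        return 1_000_000_000L / nanosPerCall;
    }

    public static void main(String[] args) {
        long emitCostNanos = 100_000L; // 0.1 ms, the measured cost of one emit call
        System.out.println(maxCallsPerSecond(emitCostNanos)); // prints 10000
    }
}
```

This only bounds a strictly serial emit loop. It says nothing about why a single call costs that much.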

Thanks
Rohit