Posted to user@storm.apache.org by 임정택 <ka...@gmail.com> on 2014/06/25 11:16:08 UTC

Native memory fills up with Trident when grouping two functions

Hello all.

We're using Storm with DRPC + Trident & ZeroMQ.
While testing our topology we ran into very high memory usage, and the
workers were eventually killed.

We have 3 machines and 3 workers.
The topology emits tuples in a tree-like fan-out, and the last function aggregates them into one.

Let me name the functions A, B, C, D, and E.
The topology emits tuples through A:100 -> B:1 (total 100) -> C:20000 (total:
200000) -> D:1 (total 200000) -> E:1 (aggregate).
C generates data (10 items per batch) and emits them to D sequentially, so
we gave D higher parallelism (about 10x or more).
Each function uses shuffle repartitioning.

At first, the workers were killed.
It was not an OOME: we run each worker with -Xmx1g, but its memory rises to
5~6 GB, so the growth is likely in native memory; we suspected ZeroMQ.

C -> D emits big tuples (about 100 KB each) at a high rate, and that could
be a problem when tuples are sent outside the worker.
(Am I right? Or does ZeroMQ also handle inter-thread messages?)
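For a sense of scale, here is a rough back-of-envelope calculation for the C -> D stage, using the tuple count and size mentioned above; the queued fraction is a pure assumption:

```python
# Back-of-envelope for the C -> D stage, using the figures from this post:
# 200,000 tuples of ~100 KB each. The queued fraction is a guess, not a
# measurement.
tuples = 200_000
tuple_bytes = 100 * 1024  # ~100 KB per tuple

total_gb = tuples * tuple_bytes / 1024**3
print(f"total C -> D traffic: ~{total_gb:.1f} GB")  # ~19.1 GB

# If even a quarter of that sits in ZeroMQ's native send queues (off-heap,
# invisible to -Xmx), usage lands in the observed 5~6 GB range:
queued_fraction = 0.25  # hypothetical backlog
print(f"queued off-heap: ~{total_gb * queued_fraction:.1f} GB")  # ~4.8 GB
```

Even a much smaller backlog fraction would dwarf the 1 GB heap, which would match seeing a healthy JVM heap while process memory grows.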

So we removed the shuffle between C and D.
(topology.optimize is true.)
Now C and D are recognized as one bolt (shown as number-C-D in the Storm UI).

But the modified topology behaves the same: it uses a lot of memory
(similar to before) and gets killed.

I'm wondering what actually changes when functions are grouped into one
bolt, C-D.
We expect that C emits a message and D (same executor or same worker)
processes it, so there is no worker-to-worker interaction between C and D.
Is that expectation wrong? If so, please explain this behavior of grouped
functions.
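To illustrate the question, here is a toy model (rates and tuple size are invented, not measured) of what we think may be happening when a fast producer feeds a slower consumer through an unbounded queue, whether that queue is ZeroMQ's native buffer or an in-worker queue after grouping:

```python
# Toy model: a producer (C) emits faster than the consumer (D) drains, with
# an unbounded queue in between. The rates and tuple size are illustrative
# assumptions only.
produce_per_sec = 2_000  # tuples/s emitted by C (hypothetical)
consume_per_sec = 500    # tuples/s processed by D (hypothetical)
tuple_kb = 100

backlog = 0
for _ in range(60):  # simulate 60 seconds
    backlog += produce_per_sec - consume_per_sec

backlog_gb = backlog * tuple_kb / 1024**2
print(f"backlog after 60 s: {backlog} tuples (~{backlog_gb:.1f} GB)")
# backlog after 60 s: 90000 tuples (~8.6 GB)
```

If that is the mechanism, grouping C and D into one bolt would only relocate the backlog from ZeroMQ's buffers to the worker's internal queues; memory would still grow unless D keeps up or something applies backpressure.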

Thanks in advance!

Regards.
Jungtaek Lim (HeartSaVioR)


-- 
Name : 임 정택
Blog : http://www.heartsavior.net / http://dev.heartsavior.net
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior