Posted to user@storm.apache.org by Arnaud BOS <ar...@gmail.com> on 2017/11/14 15:51:09 UTC

Parallelism, field grouping and windowing

Hi all,

Suppose I have a WindowedBolt subscribing to a KafkaSpout stream.
The WindowedBolt uses a sliding window with a length of 2 tuples and a
sliding interval of 1 tuple.
The stream from the KafkaSpout to the WindowedBolt is partitioned by field
(fields grouping).
To keep things simple, let's assume the stream gets partitioned into two
groups only, A and B, in a cluster of two supervisors running one worker
process each (setNumWorkers(2)).
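
Concretely, the wiring looks roughly like the sketch below (the bolt body,
the "key" field and the Kafka spout settings are simplified placeholders,
not my real code):

import java.util.Map;

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseWindowedBolt;
import org.apache.storm.topology.base.BaseWindowedBolt.Count;
import org.apache.storm.tuple.Fields;
import org.apache.storm.windowing.TupleWindow;

public class WindowedTopologySketch {

    // Placeholder windowed bolt: real processing omitted.
    public static class MyWindowedBolt extends BaseWindowedBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(TupleWindow window) {
            // window.get() holds at most 2 tuples, all from the same substream (A or B)
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("result"));
        }
    }

    public static void main(String[] args) throws Exception {
        // storm-kafka-client spout; broker address and topic are placeholders.
        KafkaSpoutConfig<String, String> spoutConfig =
                KafkaSpoutConfig.builder("localhost:9092", "my-topic").build();

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConfig), 1);

        // Sliding window of 2 tuples sliding by 1 tuple, parallelism hint 2,
        // 2 tasks, stream partitioned on the tuple field "key".
        builder.setBolt("windowed-bolt",
                        new MyWindowedBolt().withWindow(Count.of(2), Count.of(1)), 2)
               .setNumTasks(2)
               .fieldsGrouping("kafka-spout", new Fields("key"));

        Config conf = new Config();
        conf.setNumWorkers(2); // one worker per supervisor in this scenario

        StormSubmitter.submitTopology("windowed-topology", conf, builder.createTopology());
    }
}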

By default, the number of tasks of a spout/bolt is equal to its parallelism
hint, correct?

If the WindowedBolt has a parallelism hint of 2 and 2 tasks (setNumTasks(2),
or not specified), then one thread (executor) running exactly one task will
execute on each worker/supervisor, correct?
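
For what it's worth, I figure I can check the actual executor/task layout
at runtime by logging the task id and index from prepare(), e.g. by
swapping something like this into the MyWindowedBolt sketch above (these
are the standard TopologyContext accessors):

// Drop-in replacement for prepare() in the sketch above: logs which task
// this executor runs and which worker (port) it lives in.
@Override
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    this.collector = collector;
    System.out.println("windowed-bolt task id=" + context.getThisTaskId()
            + ", task index=" + context.getThisTaskIndex()
            + ", worker port=" + context.getThisWorkerPort()
            + ", all tasks for this bolt=" + context.getComponentTasks(context.getThisComponentId()));
}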

Each executor/task will receive tuples from only one of the partitions; for
instance, executor/task 1 gets tuples from substream A and executor/task 2
gets tuples from substream B, correct?
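
My understanding is that fields grouping derives the target task from a
hash of the grouped field values, so every tuple carrying the same value is
routed to the same task. Conceptually something like the snippet below
(only an illustration, not Storm's actual code):

import java.util.Arrays;
import java.util.List;

// Conceptual sketch only: the target task is derived from a hash of the
// grouped field values, so tuples sharing the same value always land on
// the same task as long as the set of target tasks does not change.
public class FieldsGroupingSketch {

    static int chooseTargetTask(List<Object> groupedValues, List<Integer> targetTasks) {
        int hash = Arrays.deepHashCode(groupedValues.toArray());
        return targetTasks.get(Math.floorMod(hash, targetTasks.size()));
    }

    public static void main(String[] args) {
        List<Integer> boltTasks = Arrays.asList(1, 2); // the two WindowedBolt task ids

        // Every tuple keyed "A" maps to one fixed task; likewise for "B".
        System.out.println("A -> task " + chooseTargetTask(Arrays.<Object>asList("A"), boltTasks));
        System.out.println("B -> task " + chooseTargetTask(Arrays.<Object>asList("B"), boltTasks));
    }
}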

Assuming the latter is correct, if a node/supervisor fails, will the
remaining task start receiving tuples from the other half of the
partitioned stream? That is, if executor/task 2 (on supervisor 2)
disappears, will tuples from A and B be interleaved in the sliding window
of executor/task 1? Or will substream B just "hang" until another task is
started to handle it?

All of this makes sense for my use case, even if it might look
counterproductive, but I need to be sure of what's happening: my workflow
is inherently sequential and I don't want the substreams to interleave.

Thanks!