Posted to user@storm.apache.org by Andre Piwoni <ap...@gmail.com> on 2017/12/19 07:34:58 UTC

Batch processing with orchestration within Apache Storm. Bad idea?

My group is using Apache Storm for some near real-time processing but also
for on-demand batch processing with orchestration. Recently, someone
suggested using Kafka to orchestrate between batch processing jobs/pipelines,
and I don't think this is a good idea.

Given the following flow:

Request -> BatchJobA (find all missing IDs to process for request) -> when
all done and no IDs found -> BatchJobC (process existing IDs) -> notify when
all done
Request -> BatchJobA (find all missing IDs to process for request) -> when
all done and some IDs found -> BatchJobB (create missing IDs) -> BatchJobC
(process existing IDs) -> notify when all done
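
To make the dependency between the jobs concrete, here is a minimal sketch of
the flow in plain Java using CompletableFuture. It is not Storm code and not
what we run today. All class and method names are hypothetical, the job that
creates missing IDs is called BatchJobB here to keep it distinct from the
processing job, and I am assuming the final job processes the newly created
IDs together with the existing ones. Work inside each job runs in parallel,
but each stage starts only after the previous one has fully completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class BatchFlowSketch {

    // Hypothetical BatchJobA: find requested IDs that do not exist yet.
    // The per-ID check is parallelized inside the job.
    static List<Integer> findMissingIds(List<Integer> requested, List<Integer> existing) {
        return requested.parallelStream()
                .filter(id -> !existing.contains(id))
                .collect(Collectors.toList());
    }

    // Hypothetical BatchJobB: "create" the missing IDs and return the full set.
    static List<Integer> createMissingIds(List<Integer> existing, List<Integer> missing) {
        List<Integer> all = new ArrayList<>(existing);
        all.addAll(missing);
        return all;
    }

    // Hypothetical BatchJobC: process every ID, again in parallel within the job.
    static List<String> processIds(List<Integer> ids) {
        return ids.parallelStream()
                .map(id -> "processed-" + id)
                .collect(Collectors.toList());
    }

    // Orchestration: each stage is a barrier that waits for the previous stage.
    static CompletableFuture<String> handleRequest(List<Integer> requested, List<Integer> existing) {
        return CompletableFuture
                .supplyAsync(() -> findMissingIds(requested, existing))   // BatchJobA
                .thenApply(missing -> missing.isEmpty()
                        ? existing                                        // nothing to create
                        : createMissingIds(existing, missing))            // BatchJobB
                .thenApply(BatchFlowSketch::processIds)                   // BatchJobC
                .thenApply(results -> "done: " + results.size() + " IDs processed"); // notify
    }

    public static void main(String[] args) {
        // ID 3 is missing, so BatchJobB runs before BatchJobC.
        System.out.println(handleRequest(List.of(1, 2, 3), List.of(1, 2)).join());
        // prints "done: 3 IDs processed"
    }
}
```

The point of the sketch is only that the stage boundaries are completion
barriers, which is the part I am unsure how to express in a Storm topology.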

While processing within batch jobs can be parallelized, each batch job has
to wait for the completion of the previous job.
I don't think Apache Storm is the right tool for such processing, but if you
have a hammer, everything may seem like a nail. Having said that, would
CoordinatorBolt work for the above scenario, and if so, how? Would Trident be
appropriate for this type of processing?

Thanks for your thoughts,
Andre