Posted to user@storm.apache.org by Alberto Coello <co...@gmail.com> on 2020/09/22 14:13:27 UTC

Dynamic topologies

Greetings,
I have seen some discussions on this topic online but nothing conclusive
yet. Is there any way to change the wiring of spouts and bolts at runtime?
My intention is to build a platform where users can subscribe to different
streams of data (Twitter, YouTube or whatever they want) and then apply a
given filter to them. I thought I could implement it on Apache Storm, but
it seems to be a framework designed to deploy a topology with a single,
fixed behaviour on each cluster. Instead, I think it would work better for
me to implement bolts that each do a specific filtering task (blocking a
specific word, blocking tweets with fewer than N retweets, etc.) and then
wire them together according to what each user wants (some users may want
a given filtering feature while others don't). In other words, given a
configuration supplied by the user, I would like to change the wiring of
the topology. The bolts should also be able to read parameters given by
the user (such as the word to block, or the threshold N used to block
tweets with fewer than N retweets).
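
As an illustration only, the per-user wiring described above could be
sketched in plain Java with no Storm dependency. Every name here
(FilterWiringSketch, Tweet, FilterSpec, "blockWord", "minRetweets") is
hypothetical, and the sketch only shows how a user's configuration might be
turned into a chain of parameterized filters; it does not show how such a
chain would be mapped onto spouts and bolts.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

class FilterWiringSketch {
    // Hypothetical tweet type; the fields are assumptions for illustration.
    static final class Tweet {
        final String text;
        final int retweets;
        Tweet(String text, int retweets) { this.text = text; this.retweets = retweets; }
    }

    // One entry of a user's configuration: a filter name plus its parameter.
    static final class FilterSpec {
        final String name;
        final String param;
        FilterSpec(String name, String param) { this.name = name; this.param = param; }
    }

    // Registry of available filtering features. Each factory turns the
    // user-supplied parameter into a predicate that is true when the tweet
    // should be kept.
    static final Map<String, Function<String, Predicate<Tweet>>> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("blockWord", word -> {
            String needle = word.toLowerCase(Locale.ROOT);
            return t -> !t.text.toLowerCase(Locale.ROOT).contains(needle);
        });
        REGISTRY.put("minRetweets", n -> {
            int min = Integer.parseInt(n);
            return t -> t.retweets >= min;
        });
    }

    // Builds one combined filter from a user's list of specs. An empty list
    // keeps every tweet; an unknown filter name is rejected.
    static Predicate<Tweet> build(List<FilterSpec> specs) {
        Predicate<Tweet> chain = t -> true;
        for (FilterSpec spec : specs) {
            Function<String, Predicate<Tweet>> factory = REGISTRY.get(spec.name);
            if (factory == null) {
                throw new IllegalArgumentException("Unknown filter: " + spec.name);
            }
            chain = chain.and(factory.apply(spec.param));
        }
        return chain;
    }

    public static void main(String[] args) {
        // A user who blocks the word "spam" and tweets with fewer than 5 retweets.
        Predicate<Tweet> userFilter = build(List.of(
            new FilterSpec("blockWord", "spam"),
            new FilterSpec("minRetweets", "5")));
        System.out.println(userFilter.test(new Tweet("Storm is neat", 12)));
        System.out.println(userFilter.test(new Tweet("Buy SPAM now", 100)));
    }
}
```

In this sketch, adding a new filtering feature means adding one entry to the
registry, and each user's configuration is just a list of (name, parameter)
pairs, so nothing has to be rebuilt per user.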

I have found Flux, a framework built on top of Storm which, as far as I
understand, lets you define the wiring in a YAML file when the topology is
deployed, rather than in code. My doubt is that, if I use it, I would have
to give every user their own cluster/topology to configure. Some problems
would appear with this approach:
1) If I wanted to add a new filtering feature by adding a new bolt to my
topology, I would have to redeploy every single user's cluster. Would this
be a problem?
2) Does it make sense for users not to share topologies? I wonder whether
it would be overkill, since most of the clusters would probably not be
receiving a large volume of data.
3) Scalability: how could I decide which bolts are deployed where according
to the expected load, and how could I move them or give them more resources
so they can cope with the streams they receive?

I think Docker Swarm might solve these problems and give me more
flexibility, but I do not know whether another real-time stream processing
framework would fit my needs better, or whether another containerization
tool or a different approach to the problem would help.
I would appreciate any suggestions or help; I am pretty stuck right now.
PS: I have looked into distributed RPC but I am still in doubt.
Thanks a lot,
Alberto