Posted to user@storm.apache.org by Jori <jo...@me.com> on 2014/10/31 21:50:35 UTC

Am I mis-using Storm DRPC as if it's a message queue and should I stop doing it?

Hi all,

I'd like to collect opinions on our Storm / DRPC implementation to determine whether we're on the right track.

We have one "Dispatch" Storm topology that currently consumes data via DRPC: about 20 GB/day across approximately 90,000 requests.
We don't need to return anything to the data producer, so we changed DRPCSpout to return "ack" immediately after receiving a call, which reduces the time the client is blocked.
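
To put those figures in perspective, here is a rough back-of-the-envelope calculation. It assumes decimal gigabytes and traffic spread evenly over the day, neither of which is guaranteed in practice, so treat the results as averages only.

```python
# Back-of-the-envelope load figures from the numbers quoted above.
# Assumptions (not measured): 1 GB = 10**9 bytes, traffic spread evenly over 24 h.
BYTES_PER_DAY = 20 * 10**9
REQUESTS_PER_DAY = 90_000
SECONDS_PER_DAY = 24 * 60 * 60

avg_request_bytes = BYTES_PER_DAY / REQUESTS_PER_DAY
avg_requests_per_sec = REQUESTS_PER_DAY / SECONDS_PER_DAY

print(f"average payload: {avg_request_bytes / 1000:.0f} kB per request")
print(f"average rate: {avg_requests_per_sec:.2f} requests/s")
```

Real traffic is likely burstier than this average suggests, so peak rates could be considerably higher.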

The Dispatch topology parses the input data, a big block of encrypted text with multiple lines: it decrypts the block and splits it into separate lines. It then sends each line back to the DRPC server as a separate call with a smaller payload, using a DRPC function name chosen according to which topology needs to process the line further (this is determined from the content of each line). One of the DRPC calls described earlier becomes maybe 100 smaller DRPC calls in this step.
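
The split-and-route step can be sketched roughly as follows. This is a simplified illustration, not our actual code: the function names, the prefix-based routing rule and the `|` delimiter are all hypothetical, and decryption is left out (the block is assumed to be decrypted already).

```python
from collections import defaultdict

# Hypothetical routing table: line prefix -> DRPC function name.
ROUTES = {
    "ORDER": "process-orders",
    "EVENT": "process-events",
}
DEFAULT_FUNCTION = "process-other"  # hypothetical fallback function name


def route(line):
    """Pick the downstream function name from the content of the line."""
    prefix = line.split("|", 1)[0]
    return ROUTES.get(prefix, DEFAULT_FUNCTION)


def dispatch(decrypted_block):
    """Split an already-decrypted block into lines, grouped by function name."""
    calls = defaultdict(list)
    for line in decrypted_block.splitlines():
        if line.strip():  # skip blank lines
            calls[route(line)].append(line)
    return dict(calls)


block = "ORDER|42|widget\nEVENT|login|alice\n\nORDER|43|gadget\nPING|x"
for function_name, lines in dispatch(block).items():
    # In the real topology, each (function name, line) pair would be one DRPC call.
    print(function_name, len(lines))
```

Grouping by function name is only for readability here; in the setup described above, every line is sent as its own call.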

Currently four "Processing" topologies each query the DRPC server with a specific function name and process the line-based data (again, the DRPC response is given immediately as "ack").


It kind of feels like we're misusing DRPC as if it were a message queue, and that we would be better off switching to something like Kafka. I'm afraid the DRPCClient in the Dispatch topology blocks until the Processing topology picks the request up. So far it seems to work okay, but I'm worried about the higher loads we expect in the future. I'm interested in your opinions.


Kind regards,
Jori