
Posted to user@storm.apache.org by Bradford Stephens <br...@gmail.com> on 2015/01/09 05:23:01 UTC

Is Storm appropriate for this use case?

Hi!

I wanted to run something by y'all, since I'm working on a new project
that's a bit outside my expertise. I'm not sure whether Storm is the right
fit, as there are some unique aspects. Here's the data flow:

1. Data arrives and is appended to one of 10M buffers (partitioned by
userID)
1a. Each buffer holds only up to a few hundred records over a 24-hour period
2. When data arrives, computation is run on it that may also use data in
the buffer. The model executed is unique for every buffer.
3. Depending on results of the computation, data in the buffer is
unchanged, updated, or aggregated and pushed to another system.
4. Once every X hours from the last message received, all non-empty buffers
have a computation executed and the results pushed to another system.
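To make the flow concrete, here is a minimal single-process sketch of steps 1 to 4. It is not Storm code. The class name `UserBuffers`, the `model_for`/`finalize`/`sink` callbacks, the 500-record cap, and the convention that a model returns a `("keep" | "update" | "emit", value)` pair are all hypothetical. The sketch also assumes that step 4 means "X hours after each buffer's own last message" and that a buffer is emptied once its contents have been pushed out.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple

Record = Tuple[float, Any]  # (arrival time in seconds, payload)
# Hypothetical model contract: sees the new payload plus the buffer,
# returns ("keep", None), ("update", new_records) or ("emit", aggregate).
Model = Callable[[Any, List[Record]], Tuple[str, Any]]


class UserBuffers:
    """Per-user bounded buffers with on-arrival and on-idle computation (sketch)."""

    def __init__(self, model_for: Callable[[str], Model],
                 finalize: Callable[[List[Record]], Any],
                 sink: Callable[[str, Any], None],
                 idle_seconds: float, max_records: int = 500,
                 window_seconds: float = 24 * 3600):
        self.model_for = model_for        # hypothetical lookup of each user's own model
        self.finalize = finalize          # computation for idle buffers (step 4)
        self.sink = sink                  # stands in for "another system"
        self.idle_seconds = idle_seconds  # "X hours", expressed in seconds
        self.max_records = max_records    # assumed cap for "a few hundred records"
        self.window_seconds = window_seconds
        self.buffers: Dict[str, Deque[Record]] = {}
        self.last_seen: Dict[str, float] = {}

    def on_message(self, user_id: str, payload: Any, now: float) -> None:
        # Step 1: append to this user's ring buffer; deque(maxlen=...) drops the oldest when full.
        buf = self.buffers.setdefault(user_id, deque(maxlen=self.max_records))
        while buf and buf[0][0] < now - self.window_seconds:
            buf.popleft()                 # 1a: keep only records from the last window
        buf.append((now, payload))
        self.last_seen[user_id] = now
        # Step 2: run this user's model on the new record plus the buffered history.
        action, value = self.model_for(user_id)(payload, list(buf))
        # Step 3: leave the buffer unchanged, replace its contents, or push and clear.
        if action == "update":
            buf.clear()
            buf.extend(value)
        elif action == "emit":
            self.sink(user_id, value)
            buf.clear()

    def flush_idle(self, now: float) -> None:
        # Step 4: finalize every non-empty buffer that has been idle for idle_seconds.
        for user_id, buf in self.buffers.items():
            if buf and now - self.last_seen[user_id] >= self.idle_seconds:
                self.sink(user_id, self.finalize(list(buf)))
                buf.clear()
```

Something would need to call `flush_idle(now)` periodically; in Storm, tick tuples are one commonly used way to deliver a periodic signal to a bolt. Holding 10M buffers in one process is the part of this sketch that would not scale, so in practice this state would be split across workers by userID.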

I'm looking for something that seems to be a hybrid ring buffer/stream
processor, but I can only seem to find one component or the other
(Kafka, Spark, Kinesis, etc.).

Does this make sense?  Is there enough detail? Can I achieve real-time,
buffering, and data/computational locality?
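On the locality question: the usual approach is to route every message for a given userID to the same worker, so that worker can keep that user's buffer and model in memory. Storm's fields grouping provides this kind of routing by hashing the chosen field. The sketch below shows only the underlying idea, a stable hash of the key modulo the worker count. The function names and the choice of `zlib.crc32` are illustrative assumptions, not Storm's implementation.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition_for(user_id: str, num_workers: int) -> int:
    """Map a userID to a worker index, stable across processes and restarts."""
    # zlib.crc32 is deterministic, unlike Python's built-in hash() for str,
    # which is randomized per process.
    return zlib.crc32(user_id.encode("utf-8")) % num_workers


def route(messages: Iterable[Tuple[str, object]],
          num_workers: int) -> Dict[int, List[Tuple[str, object]]]:
    """Group (userID, payload) messages by the worker that owns each user's buffer."""
    by_worker: Dict[int, List[Tuple[str, object]]] = defaultdict(list)
    for user_id, payload in messages:
        by_worker[partition_for(user_id, num_workers)].append((user_id, payload))
    return dict(by_worker)
```

One trade-off with plain modulo hashing is that changing the number of workers reassigns most users to different workers, and their buffered state has to move with them. Consistent hashing is the common way to limit that reshuffling.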


-B

-- 
Bradford Stephens
Freelance CTO & Startup Exec
22acacia.com
(530) 763-DATA