Posted to user@storm.apache.org by Iván García <iv...@gmail.com> on 2015/05/07 20:37:03 UTC

Fwd: Maximum recommended Spout TTL

Good morning,

I have a Storm project that aggregates Entities into Groups. A Group is
considered "ready to process" once it hasn't received an Event for the past
10 minutes, which means I need to keep those Groups somewhere in the
meantime. I am considering two scenarios: use an external storage (MySQL,
Cassandra, Redis...) or use Storm itself as the storage (using a HashMap
distributed across the tasks of some bolt, for example). Both solutions have
their benefits and their weak points:

In the first case, we need to access an external system, which means we will
need to batch the tuples to minimize round trips, but the data ends up in a
more or less reliable place (how reliable depends on the database).
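
To make the batching concrete, this is roughly what I have in mind for the
first case. It is only a sketch: I'm assuming the 0.9.x backtype.storm API,
and writeBatch() stands in for whatever MySQL/Cassandra/Redis client call we
end up using.

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BatchingStoreBolt extends BaseRichBolt {
    private static final int BATCH_SIZE = 500;   // example threshold, one round trip per 500 tuples

    private OutputCollector collector;
    private List<Tuple> buffer;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.buffer = new ArrayList<Tuple>();
    }

    @Override
    public void execute(Tuple tuple) {
        buffer.add(tuple);
        if (buffer.size() >= BATCH_SIZE) {
            writeBatch(buffer);               // one round trip for the whole batch
            for (Tuple t : buffer) {
                collector.ack(t);             // ack only once the batch is persisted
            }
            buffer.clear();
        }
        // in practice we would also flush on a timer (tick tuple) so a quiet
        // stream doesn't leave a partial buffer un-acked forever
    }

    private void writeBatch(List<Tuple> batch) {
        // placeholder: a single batched insert into MySQL/Cassandra/Redis
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // nothing emitted downstream in this sketch
    }
}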

In the second case, we need to increase the TTL of the spout to make sure the
group of tuples has enough time to be considered ready to process. However,
we avoid the round trips and we can scale just by sizing the Storm cluster.
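
For the second case, this is the kind of bolt I'm picturing (again only a
sketch on the backtype.storm 0.9.x API; GroupState is a made-up in-memory
holder and the field names are placeholders):

import backtype.storm.Config;
import backtype.storm.Constants;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class GroupAggregationBolt extends BaseRichBolt {
    private static final long IDLE_MILLIS = 10 * 60 * 1000L;   // a Group is "ready" after 10 idle minutes

    private OutputCollector collector;
    private Map<String, GroupState> groups;

    // Made-up holder for a Group: its pending tuples and the last time it saw an Event.
    private static class GroupState {
        long lastSeen;
        List<Tuple> pendingTuples = new ArrayList<Tuple>();

        void add(Tuple t) {
            pendingTuples.add(t);
            lastSeen = System.currentTimeMillis();
        }
    }

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.groups = new HashMap<String, GroupState>();
    }

    @Override
    public void execute(Tuple tuple) {
        if (isTickTuple(tuple)) {
            long now = System.currentTimeMillis();
            Iterator<Map.Entry<String, GroupState>> it = groups.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, GroupState> entry = it.next();
                GroupState group = entry.getValue();
                if (now - group.lastSeen >= IDLE_MILLIS) {
                    collector.emit(new Values(entry.getKey()));   // Group is ready: hand it downstream
                    for (Tuple pending : group.pendingTuples) {
                        collector.ack(pending);                   // tuples stay un-acked until the Group is flushed
                    }
                    it.remove();
                }
            }
        } else {
            String groupId = tuple.getStringByField("groupId");   // placeholder field name
            GroupState group = groups.get(groupId);
            if (group == null) {
                group = new GroupState();
                groups.put(groupId, group);
            }
            group.add(tuple);
        }
    }

    private boolean isTickTuple(Tuple tuple) {
        return Constants.SYSTEM_COMPONENT_ID.equals(tuple.getSourceComponent())
            && Constants.SYSTEM_TICK_STREAM_ID.equals(tuple.getSourceStreamId());
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        Config conf = new Config();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 60);       // check for idle Groups once a minute
        return conf;
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("groupId"));
    }
}

The important detail is that every tuple in a Group stays un-acked until the
Group is flushed, so the spout's timeout has to cover the whole wait.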

That said, which solution is better? And, more importantly, is there any red
line that shouldn't be crossed in terms of TTL if we go with the second case?
To give you an idea, I'm planning to have a TTL of nearly 2 hours.
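
By TTL I mean the topology message timeout (topology.message.timeout.secs),
i.e. how long a tuple tree may stay un-acked before the spout replays it.
For ~2h it would be something like the following (the topology name and the
max-spout-pending value are just examples I'm considering):

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class SubmitWithLongTimeout {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // ... spout and bolts wired up here ...

        Config conf = new Config();
        conf.setMessageTimeoutSecs(2 * 60 * 60);  // topology.message.timeout.secs = 7200, the ~2h "TTL"
        conf.setMaxSpoutPending(5000);            // example cap on un-acked tuples, so 2h of pending
                                                  // work doesn't accumulate without bound

        StormSubmitter.submitTopology("group-aggregation", conf, builder.createTopology());
    }
}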

Regards,

Ivan Garcia Maya