
Posted to user@storm.apache.org by Bryan Stone <br...@synapse-wireless.com> on 2014/07/01 05:57:55 UTC

Task Assignment / In-Memory States

We maintain object state in memory to maximize bolt performance and reduce the number of data-access calls.  To date, this in-memory “cache” has been persisted to a Cassandra column family, using the taskId as the row key.  However, Nathan pointed out at one point that the taskId is not entirely reliable across re-assignment, though in our testing the same taskId does tend to be re-assigned as long as the deployment looks identical.

So the question is: what’s the best “key” to use for state, one that a worker can rely on across deployments and/or rebalancing?  We “build up” actions by writing to Cassandra, and then execute them once a predetermined threshold is reached.  However, each task is responsible for only a subset, based on the grouping.  If those groupings change (on a rebalance?), is there a way for a task to know programmatically which “keys” it is responsible for recovering?
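
For illustration, the setup described above might look roughly like the sketch below. It is not our actual code: StateStore is a hypothetical in-memory stand-in for the Cassandra column family, TaskState, rowKey, and the "task-N" row-key format are made-up names, and no Storm API is used. It only shows the pattern in question: actions buffered per object key, persisted under a row keyed by taskId, and released once a threshold is reached.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Cassandra column family:
// row key -> (object key -> pending actions).
class StateStore {
    private final Map<String, Map<String, List<String>>> rows = new HashMap<>();

    void save(String rowKey, Map<String, List<String>> state) {
        rows.put(rowKey, deepCopy(state));
    }

    // Returns an empty map when nothing was ever saved under rowKey.
    Map<String, List<String>> load(String rowKey) {
        Map<String, List<String>> row = rows.get(rowKey);
        return row == null ? new HashMap<>() : deepCopy(row);
    }

    private static Map<String, List<String>> deepCopy(Map<String, List<String>> src) {
        Map<String, List<String>> copy = new HashMap<>();
        for (Map.Entry<String, List<String>> e : src.entrySet()) {
            copy.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return copy;
    }
}

// Hypothetical per-task state: buffers actions per object key and
// persists the whole buffer under a row keyed by the task id.
class TaskState {
    private final int taskId;
    private final int threshold;
    private final StateStore store;
    private final Map<String, List<String>> pending;

    TaskState(int taskId, int threshold, StateStore store) {
        this.taskId = taskId;
        this.threshold = threshold;
        this.store = store;
        // Recovery: only state saved under this exact task id is found.
        this.pending = store.load(rowKey(taskId));
    }

    static String rowKey(int taskId) {
        return "task-" + taskId;
    }

    // Buffers one action. Returns the full batch to execute once the
    // threshold is reached, otherwise an empty list.
    List<String> add(String objectKey, String action) {
        List<String> actions = pending.computeIfAbsent(objectKey, k -> new ArrayList<>());
        actions.add(action);
        List<String> ready = new ArrayList<>();
        if (actions.size() >= threshold) {
            ready.addAll(actions);
            pending.remove(objectKey);
        }
        store.save(rowKey(taskId), pending);
        return ready;
    }

    int pendingCount(String objectKey) {
        List<String> actions = pending.get(objectKey);
        return actions == null ? 0 : actions.size();
    }
}
```

In this sketch, a task restarted with the same id recovers its buffered actions, while a task that comes back under a different id (or that now receives a different subset from the grouping) finds an empty row and has no way to locate the state it should own. That is the gap the question is about.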

Thanks!
Bryan

