Posted to user@storm.apache.org by "Wijekoon, Manusha " <ma...@citi.com> on 2017/08/10 10:56:57 UTC

RE: is stateful bolts production ready?

In our case we prefer to use our own state implementation. After going through the code and reading the documentation, the following is how I understand it. Could you please check whether my understanding is correct?

1. Derive from State and provide an implementation. In the commit(txID) method, are we supposed to persist the state ourselves, or does the framework take care of that? If the framework takes care of it, how do we add our own persistence mechanism, for example one that uses Kafka to persist state?
2. Subclass StateProvider to return State objects for the namespaces of interest. For example, in our case we wish to use a custom state class in one of the bolts and the defaults for spouts. In this case, is it safe to return custom states for the bolt concerned and use the default state provider (InMemoryKeyValueStateProvider) for other namespaces? Is the custom provider supposed to load the last saved state for that namespace from the persistent store? Again, if state persistence is handled by the framework, how do we know where to get the state from?
3. Are checkpoint-related methods called by the same bolt or spout thread?
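
As an illustration of what point 1 is asking about, here is a minimal, self-contained Java sketch of a custom state that does its own persisting: prepareCommit stages a snapshot, commit(txid) writes it to a durable store, and rollback falls back to the last committed copy. This is only an assumption about how such a state could behave, not a confirmed description of what Storm requires. DurableStore and CounterState are hypothetical names that do not come from Storm, and the in-memory map merely stands in for Kafka or any other persistent store.

```java
import java.util.HashMap;
import java.util.Map;

// Self-contained model of the lifecycle; none of these types are Storm classes.
class CustomStateSketch {

    // Hypothetical stand-in for a durable store (a Kafka topic, a database, ...).
    static class DurableStore {
        final Map<String, Long> committed = new HashMap<>();
        long committedTxid = -1;
    }

    // Hypothetical custom state that persists itself in three steps.
    static class CounterState {
        private final DurableStore store;
        private Map<String, Long> working;   // live values updated by the bolt
        private Map<String, Long> prepared;  // snapshot staged for commit
        private long preparedTxid = -1;

        CounterState(DurableStore store) {
            this.store = store;
            // Start from the last saved state, as a provider might do on restart.
            this.working = new HashMap<>(store.committed);
        }

        void increment(String key) { working.merge(key, 1L, Long::sum); }

        long get(String key) { return working.getOrDefault(key, 0L); }

        // Stage a snapshot of the current values under the given transaction id.
        void prepareCommit(long txid) {
            prepared = new HashMap<>(working);
            preparedTxid = txid;
        }

        // Publish the staged snapshot to the durable store.
        void commit(long txid) {
            if (prepared == null || txid != preparedTxid) {
                throw new IllegalStateException("nothing prepared for txid " + txid);
            }
            store.committed.clear();
            store.committed.putAll(prepared);
            store.committedTxid = txid;
            prepared = null;
        }

        // Discard staged and uncommitted changes; return to the last committed copy.
        void rollback() {
            prepared = null;
            working = new HashMap<>(store.committed);
        }
    }

    public static void main(String[] args) {
        DurableStore store = new DurableStore();
        CounterState state = new CounterState(store);
        state.increment("a");
        state.prepareCommit(1);
        state.commit(1);
        state.increment("a");   // not yet committed
        state.rollback();       // drops the uncommitted increment
        System.out.println(state.get("a"));
        System.out.println(new CounterState(store).get("a"));
    }
}
```

In this model the state object itself owns persistence, so swapping the map in DurableStore for a real client is the only change needed to target a different store.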

Thanks
Manusha


________________________________
From: Arun Iyer [aiyer@hortonworks.com] on behalf of Arun Mahadevan [arunm@apache.org]
Sent: Monday, July 24, 2017 2:29 PM
To: user@storm.apache.org
Subject: Re: is stateful bolts production ready?

The bolt just needs to “put” the values into the Key-Value state that the bolt gets initialized with during “initState”. The framework automatically takes care of saving the state behind the scenes.
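
To illustrate the pattern described above, here is a minimal, self-contained Java sketch: the bolt receives a key-value state once in initState and afterwards only reads from it and puts into it while processing tuples. KeyValueStore, InMemoryStore and WordCountBolt are hypothetical stand-ins written for this sketch, not Storm classes, and the checkpointing that the framework performs behind the scenes is not modelled here. The storm-starter example mentioned in this thread is the place to look for the real API.

```java
import java.util.HashMap;
import java.util.Map;

// Self-contained model of the pattern; none of these types are Storm classes.
class StatefulBoltSketch {

    // Hypothetical stand-in for the key-value state handed to the bolt.
    interface KeyValueStore<K, V> {
        V get(K key, V defaultValue);
        void put(K key, V value);
    }

    // Hypothetical in-memory store, present only so the sketch can run.
    static class InMemoryStore<K, V> implements KeyValueStore<K, V> {
        private final Map<K, V> values = new HashMap<>();

        @Override
        public V get(K key, V defaultValue) {
            return values.getOrDefault(key, defaultValue);
        }

        @Override
        public void put(K key, V value) {
            values.put(key, value);
        }
    }

    // Models a bolt that keeps a running count per word.
    static class WordCountBolt {
        private KeyValueStore<String, Long> state;

        // Called once before any tuple is processed; the bolt keeps the reference.
        void initState(KeyValueStore<String, Long> state) {
            this.state = state;
        }

        // Called per tuple: read the old count, put the new one, nothing else.
        long execute(String word) {
            long updated = state.get(word, 0L) + 1;
            state.put(word, updated);
            return updated;
        }
    }

    public static void main(String[] args) {
        InMemoryStore<String, Long> store = new InMemoryStore<>();
        WordCountBolt bolt = new WordCountBolt();
        bolt.initState(store);
        bolt.execute("storm");
        bolt.execute("storm");
        System.out.println(store.get("storm", 0L));

        // A replacement bolt initialised with the same state continues the count.
        WordCountBolt restarted = new WordCountBolt();
        restarted.initState(store);
        System.out.println(restarted.execute("storm"));
    }
}
```

The bolt never saves anything explicitly in this sketch; everything it needs to survive a restart lives in the state object it was given.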

There's an example in storm-starter that you might find useful - https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/StatefulTopology.java

You can also find more elaborate documentation here - https://github.com/apache/storm/blob/master/docs/State-checkpointing.md

Thanks,
Arun

From: "Wijekoon, Manusha" <ma...@citi.com>
Reply-To: user@storm.apache.org
Date: Monday, July 24, 2017 at 4:04 PM
To: user@storm.apache.org
Subject: is stateful bolts production ready?

Hello

I am thinking of using stateful bolts to manage the state of a bolt. However, it is not clear from the documentation how to save the bolt state. I understand it has to be done when we process the checkpoint tuple, but how? Do I just need to update the state object, and Storm picks it up during the three-phase commit? How does Storm know which state object to pick for checkpointing?

I wasn’t able to find more complete examples either, specifically for when we can’t keep the state in a key/value map.

Also, has this functionality been tested in production-like environments before?


Thanks
M

Re: RE: is stateful bolts production ready?

Posted by 王 纯超 <wa...@outlook.com>.
In addition to the query, what is the intent of a stateful bolt, given that we can just hold state in the bolt instance?

________________________________
wangchunchao@outlook.com
