Posted to dev@nifi.apache.org by Russell Bateman <ru...@windofkeltia.com> on 2019/08/09 17:53:53 UTC
StateManager race condition potential
I'm assuming that the StateManager protects itself against race
conditions for the consuming (custom) processor, but I'd like
confirmation on that. Take a simple case: we read an integer out of
state that identifies the next piece of work to do, then immediately
write that value plus 1 back for the next thread to pick up. In the
time it takes us to get the value, bump it by 1, and write it back
(I'm assuming Scope.LOCAL), I don't see what prevents the StateManager
from handing that same value to another instance or task of my
processor.
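To make the concern concrete, here is a minimal sketch of the interleaving described above. NaiveStateStore is a hypothetical in-memory stand-in with an unconditional write, not the NiFi StateManager; the key name "next" and the starting value 5 are made up for illustration. The two "tasks" are replayed on a single thread so the outcome is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a state store; NOT the NiFi API.
class NaiveStateStore {
    private Map<String, String> state = new HashMap<>();

    Map<String, String> getState() {
        return new HashMap<>(state);      // hand out a snapshot copy
    }

    void setState(Map<String, String> newState) {
        state = new HashMap<>(newState);  // unconditional overwrite
    }
}

class LostUpdateDemo {
    // Replays the problematic ordering: both tasks read before either writes.
    // Returns { value seen by A, value seen by B, value left in the store }.
    static int[] run() {
        NaiveStateStore store = new NaiveStateStore();
        Map<String, String> initial = new HashMap<>();
        initial.put("next", "5");
        store.setState(initial);

        // Task A and task B each read the counter.
        int seenByA = Integer.parseInt(store.getState().get("next"));
        int seenByB = Integer.parseInt(store.getState().get("next"));

        // Each task bumps what it saw and writes the result back.
        Map<String, String> fromA = new HashMap<>();
        fromA.put("next", String.valueOf(seenByA + 1));
        store.setState(fromA);

        Map<String, String> fromB = new HashMap<>();
        fromB.put("next", String.valueOf(seenByB + 1));
        store.setState(fromB);

        int stored = Integer.parseInt(store.getState().get("next"));
        return new int[] { seenByA, seenByB, stored };
    }
}
```

Both tasks end up working on the same item, and the stored counter advances by one instead of two, which is exactly the duplicate hand-out the question is worried about.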
How does the StateManager Scope affect this? (Does it depend on whether
the instance of state is per host or per cluster?)
How does processor behavior annotation affect this?
How does processor scheduling configuration (concurrent task count)
affect this?
Thanks for any comments.
Re: StateManager race condition potential
Posted by Bryan Bende <bb...@gmail.com>.
You would probably want to use the replace method in StateManager [1]
to perform a compare-and-set, which makes sure that what you write
back is based on the state you previously retrieved rather than on a
stale copy.
The compare-and-set operation is implemented by the underlying
provider: the write-ahead log for local state, and ZooKeeper (or
Redis) for cluster state.
[1] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/components/state/StateManager.java#L92
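A minimal sketch of the compare-and-set retry loop this suggests. Snapshot, VersionedStore, and WorkCounter are hypothetical in-memory stand-ins that only mirror the get/replace shape; they are not the NiFi classes, and the key name "next" is made up. Real processor code would call the StateManager with a Scope and handle its IOException.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable snapshot of the stored state; NOT the NiFi StateMap.
final class Snapshot {
    final long version;
    final Map<String, String> values;

    Snapshot(long version, Map<String, String> values) {
        this.version = version;
        this.values = Collections.unmodifiableMap(new HashMap<>(values));
    }
}

// Hypothetical store with a compare-and-set write; NOT the NiFi StateManager.
class VersionedStore {
    private Snapshot current = new Snapshot(0, new HashMap<>());

    synchronized Snapshot getState() {
        return current;
    }

    // Succeeds only if `expected` is still the snapshot held by the store.
    // Every successful write creates a new Snapshot, so identity is enough.
    synchronized boolean replace(Snapshot expected, Map<String, String> newValues) {
        if (current != expected) {
            return false;
        }
        current = new Snapshot(expected.version + 1, newValues);
        return true;
    }
}

class WorkCounter {
    // Claims the next work index, retrying whenever another task wrote first.
    static int claimNext(VersionedStore store) {
        while (true) {
            Snapshot old = store.getState();
            String raw = old.values.get("next");
            int mine = (raw == null) ? 0 : Integer.parseInt(raw);

            Map<String, String> updated = new HashMap<>(old.values);
            updated.put("next", String.valueOf(mine + 1));

            if (store.replace(old, updated)) {
                return mine;  // the write landed, so this index is ours alone
            }
            // Lost the race: loop to re-read the state and try again.
        }
    }
}
```

The important design point is that the value is treated as claimed only after the replace call reports success; a failed replace means the read was stale and must be repeated.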
On Fri, Aug 9, 2019 at 2:11 PM Joe Witt <jo...@gmail.com> wrote:
>
> Russell
>
> As I understand it the state is specific to an instance of a processor on
> the flow. Your safety ends there. If you allow that processor to have
> multiple tasks running concurrently then you'll need to protect usage of
> that state mechanism just as you would with any other variable in the scope
> of the processor. If the scope is local then the above is really all you
> need to think about. If the scope is cluster wide then generally speaking
> I think the intent of usage for that often is associated with a primary
> node only thing like a ListX processor (ListFile) with the idea being that
> the state can be restored by some other node if that could get assigned
> that role. I'm not clear on the role of cluster wide state otherwise.
> Others will have to comment on that.
>
> Thanks
>
Re: StateManager race condition potential
Posted by Mark Payne <ma...@hotmail.com>.
Russell,
The StateManager provides a "setState" method and a "replace" method. The former updates the state to whatever you pass it. The latter also takes the state you expect to be current and applies the update only if the stored state still matches, so you can replace the value atomically, similar to how ConcurrentMap's replace works.
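For readers who have not used it, the ConcurrentMap analogy can be sketched with the standard library alone. The class name, the key "next", and the values are made up for illustration; put plays the role of an unconditional set, and the three-argument replace plays the role of the conditional one.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConcurrentMapAnalogy {
    // Returns { stale write succeeded?, fresh write succeeded?, final value is 7? }.
    static boolean[] demo() {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("next", 5);

        Integer seen = map.get("next");  // this task reads 5
        map.put("next", 6);              // meanwhile another task writes 6

        // Expects 5 but the map holds 6, so nothing is written.
        boolean staleWrite = map.replace("next", seen, seen + 1);

        // Re-read and retry: expects 6, finds 6, so the write goes through.
        Integer fresh = map.get("next");
        boolean freshWrite = map.replace("next", fresh, fresh + 1);

        return new boolean[] { staleWrite, freshWrite, map.get("next") == 7 };
    }
}
```

The stale attempt is rejected instead of silently overwriting the other task's update, which is the behavior the unconditional write cannot give you.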
Re: StateManager race condition potential
Posted by Joe Witt <jo...@gmail.com>.
Russell
As I understand it, the state is specific to an instance of a processor on
the flow. Your safety ends there. If you allow that processor to have
multiple tasks running concurrently, then you'll need to protect usage of
that state mechanism just as you would any other variable in the scope
of the processor. If the scope is local, then the above is really all you
need to think about. If the scope is cluster-wide, then generally speaking
I think the intended usage is often a primary-node-only case such as a
ListX processor (ListFile), the idea being that the state can be restored
by another node if that node gets assigned the primary role. I'm not
clear on the role of cluster-wide state otherwise.
Others will have to comment on that.
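A minimal sketch of protecting the state "like any other variable", assuming local scope and several concurrent tasks in one JVM. GuardedCounter and its map are hypothetical stand-ins, not the NiFi API; the point is only that the whole read-modify-write sits under one lock. A JVM lock like this cannot coordinate tasks running on different nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class GuardedCounter {
    private final Object lock = new Object();
    // Hypothetical stand-in for the processor's stored state; NOT the NiFi API.
    private final Map<String, String> state = new HashMap<>();

    // Read, bump, and write back as one critical section.
    int claimNext() {
        synchronized (lock) {
            int mine = Integer.parseInt(state.getOrDefault("next", "0"));
            state.put("next", String.valueOf(mine + 1));
            return mine;
        }
    }

    // Runs several threads against one counter and reports how many
    // distinct indexes were handed out in total.
    static int distinctClaims(int threads, int claimsPerThread) {
        GuardedCounter counter = new GuardedCounter();
        Set<Integer> claimed = ConcurrentHashMap.newKeySet();
        List<Thread> workers = new ArrayList<>();

        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < claimsPerThread; i++) {
                    claimed.add(counter.claimNext());
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return claimed.size();
    }
}
```

With the lock in place, every claim is unique, so the number of distinct indexes equals the total number of claims made across all threads.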
Thanks