Posted to dev@nifi.apache.org by Russell Bateman <ru...@windofkeltia.com> on 2019/08/09 17:53:53 UTC

StateManager race condition potential

I'm assuming that the StateManager protects itself against race
conditions for the consuming (custom) processor, but I'd like
confirmation on that. Let's say something simple: we get an integer
out of state, add one to it to get the next piece of work to do, then
immediately write the bumped value back for the next thread to get.
In the time it takes us to get the value, bump it by 1, and put it
back out (I'm assuming Scope.LOCAL), I don't see anything that prevents
the StateManager from handing out that same value to another instance
or task of my processor.

How does the StateManager Scope affect this? (By whether the instance of
state is per host or per cluster?)
How does a processor behavior annotation affect this?
How does the processor scheduling configuration (concurrent task count)
affect this?

Thanks for any comments.
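
To make the concern concrete, here is a minimal sketch of the lost update described above, assuming a store that offers only plain get/set. `NaiveState`, `LostUpdateDemo`, and the key `next.id` are hypothetical and are not part of the NiFi API. Each individual call is synchronized, yet replaying one unlucky interleaving step by step shows two tasks claiming the same value and one increment disappearing: per-call thread safety does not make the compound read-bump-write atomic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified state store with plain get/set only (no compare-and-set).
// This is NOT the NiFi StateManager API.
class NaiveState {
    private Map<String, String> values = new HashMap<>();

    synchronized Map<String, String> get() {
        return new HashMap<>(values); // snapshot copy for the caller
    }

    synchronized void set(Map<String, String> newValues) {
        values = new HashMap<>(newValues);
    }
}

class LostUpdateDemo {
    static final String KEY = "next.id"; // hypothetical state key

    // Deterministically replays one bad interleaving of two concurrent tasks.
    static long[] interleave(NaiveState state) {
        state.set(Map.of(KEY, "5"));
        // Task A and task B both read before either one writes.
        long a = Long.parseLong(state.get().get(KEY));
        long b = Long.parseLong(state.get().get(KEY));
        // Each bumps its own copy and writes it back; B silently overwrites A.
        state.set(Map.of(KEY, Long.toString(a + 1)));
        state.set(Map.of(KEY, Long.toString(b + 1)));
        return new long[] { a, b };
    }

    public static void main(String[] args) {
        NaiveState state = new NaiveState();
        long[] claimed = interleave(state);
        System.out.println("A claimed " + claimed[0] + ", B claimed " + claimed[1]);
        System.out.println("stored next value: " + state.get().get(KEY));
    }
}
```

Running `main` prints `A claimed 5, B claimed 5` and then `stored next value: 6`: both tasks got the same value and the counter advanced only once.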

Re: StateManager race condition potential

Posted by Bryan Bende <bb...@gmail.com>.
You would probably want to use the replace method in StateManager [1]
to perform a compare-and-set, which makes sure that what you are
sending back is accurate based on the previous state you retrieved.

The compare-and-set operation is implemented by the underlying
provider: a write-ahead log for local state, and ZooKeeper (or Redis)
for cluster state.

[1] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/components/state/StateManager.java#L92
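
A minimal sketch of the compare-and-set retry loop described above. Because the NiFi API is not on the classpath here, `Snapshot` and `CasState` are hypothetical in-memory stand-ins loosely modeled on `StateManager.getState(Scope)` and `StateManager.replace(StateMap, Map, Scope)`, with a version number playing the role of the retrieved `StateMap`. A task reads a snapshot, computes the bumped value, and tries to swap it in; if another task wrote in between, `replace` returns false and the task re-reads and tries again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical immutable snapshot: the values plus the version they were read at.
final class Snapshot {
    final long version;
    final Map<String, String> values;

    Snapshot(long version, Map<String, String> values) {
        this.version = version;
        this.values = Map.copyOf(values);
    }
}

// Hypothetical in-memory store with compare-and-set (not the NiFi API).
final class CasState {
    private Snapshot current = new Snapshot(0, Map.of());

    synchronized Snapshot get() {
        return current;
    }

    // Succeeds only if nobody has written since 'expected' was read.
    synchronized boolean replace(Snapshot expected, Map<String, String> newValues) {
        if (current.version != expected.version) {
            return false;
        }
        current = new Snapshot(current.version + 1, newValues);
        return true;
    }
}

final class CasCounter {
    static final String KEY = "next.id"; // hypothetical state key

    // Read, bump, and try to swap; on conflict, re-read and retry.
    static long claimNext(CasState state) {
        while (true) {
            Snapshot seen = state.get();
            long value = Long.parseLong(seen.values.getOrDefault(KEY, "0"));
            Map<String, String> updated = new HashMap<>(seen.values);
            updated.put(KEY, Long.toString(value + 1));
            if (state.replace(seen, updated)) {
                return value; // only this task owns 'value'
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CasState state = new CasState();
        Set<Long> claimed = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(4); // 4 concurrent "tasks"
        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> claimed.add(claimNext(state)));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(claimed.size() + " distinct values, next = " + state.get().values.get(KEY));
    }
}
```

`main` should report `1000 distinct values, next = 1000`, since no value can be handed out twice. In a real processor the loop body would call the StateManager methods instead, and those calls can throw `IOException`.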

On Fri, Aug 9, 2019 at 2:11 PM Joe Witt <jo...@gmail.com> wrote:
>
> Russell
>
> As I understand it, the state is specific to an instance of a processor on
> the flow.  Your safety ends there.  If you allow that processor to have
> multiple tasks running concurrently, then you'll need to protect usage of
> that state mechanism just as you would with any other variable in the scope
> of the processor.  If the scope is local, then the above is really all you
> need to think about.  If the scope is cluster-wide, then generally speaking
> I think it is often intended for primary-node-only processors such as a
> ListX processor (ListFile), the idea being that the state can be restored
> by another node if that node is later assigned the primary role.  I'm not
> clear on the role of cluster-wide state otherwise.  Others will have to
> comment on that.
>
> Thanks
>

Re: StateManager race condition potential

Posted by Mark Payne <ma...@hotmail.com>.
Russell,

The StateManager provides a "setState" method and a "replace" method. The former unconditionally updates the state to whatever you pass it. The latter lets you pass in the expected (previously retrieved) state, so that the value is replaced atomically only if it has not changed in the meantime, similar to how ConcurrentMap.replace(key, oldValue, newValue) works.
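
The ConcurrentMap comparison can be seen with the standard `java.util.concurrent` API alone: `put` overwrites unconditionally, much like `setState`, while `replace(key, expectedValue, newValue)` succeeds only if the current value still equals the expected one. The class name and the `next.id` key below are illustrative; the retry loop has the same shape a StateManager-based counter would.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConcurrentMapAnalogy {
    static final String KEY = "next.id"; // illustrative key

    // Claim the current value and advance it by one, retrying on conflict.
    static long claimNext(ConcurrentMap<String, Long> map) {
        while (true) {
            Long seen = map.get(KEY);
            if (seen == null) {
                // No value yet: putIfAbsent is the compare-and-set for "absent".
                if (map.putIfAbsent(KEY, 1L) == null) {
                    return 0L;
                }
                continue; // another thread initialized it first; re-read
            }
            // Succeeds only if the value is still 'seen' (compared with equals).
            if (map.replace(KEY, seen, seen + 1)) {
                return seen;
            }
            // Another thread changed the value in between: loop and re-read.
        }
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Long> map = new ConcurrentHashMap<>();
        System.out.println(claimNext(map));           // 0
        System.out.println(claimNext(map));           // 1
        map.put(KEY, 42L);                            // unconditional overwrite, like setState
        System.out.println(map.replace(KEY, 1L, 2L)); // false: current value is 42, not 1
    }
}
```

Running `main` prints `0`, `1`, and then `false`, because the unconditional `put` changed the value out from under the stale expectation.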


Re: StateManager race condition potential

Posted by Joe Witt <jo...@gmail.com>.
Russell

As I understand it, the state is specific to an instance of a processor on
the flow.  Your safety ends there.  If you allow that processor to have
multiple tasks running concurrently, then you'll need to protect usage of
that state mechanism just as you would with any other variable in the scope
of the processor.  If the scope is local, then the above is really all you
need to think about.  If the scope is cluster-wide, then generally speaking
I think it is often intended for primary-node-only processors such as a
ListX processor (ListFile), the idea being that the state can be restored
by another node if that node is later assigned the primary role.  I'm not
clear on the role of cluster-wide state otherwise.  Others will have to
comment on that.

Thanks
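
The suggestion to guard the state the way you would any other shared field can be sketched with a lock held across the whole read-bump-write sequence. `PlainState` and `LockedCounter` are hypothetical, and the sketch assumes all concurrent tasks on a node call into the same processor object. A Java lock only coordinates threads inside one JVM, so it does not protect cluster-scoped state shared between nodes; that case still needs the provider-level compare-and-set.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical plain get/set state store (not the NiFi StateManager API).
class PlainState {
    private Map<String, String> values = new HashMap<>();

    synchronized Map<String, String> get() {
        return new HashMap<>(values); // caller gets its own copy
    }

    synchronized void set(Map<String, String> newValues) {
        values = new HashMap<>(newValues);
    }
}

// Hypothetical processor-like object shared by all concurrent tasks on one node.
class LockedCounter {
    static final String KEY = "next.id"; // hypothetical state key
    private final Object stateLock = new Object();
    private final PlainState state;

    LockedCounter(PlainState state) {
        this.state = state;
    }

    // The whole read-bump-write runs under one lock, so no other task in this
    // JVM can read between our read and our write.
    long claimNext() {
        synchronized (stateLock) {
            Map<String, String> values = state.get();
            long value = Long.parseLong(values.getOrDefault(KEY, "0"));
            values.put(KEY, Long.toString(value + 1));
            state.set(values);
            return value;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockedCounter counter = new LockedCounter(new PlainState());
        Set<Long> claimed = ConcurrentHashMap.newKeySet();
        Thread[] tasks = new Thread[4]; // stands in for 4 concurrent tasks
        for (int t = 0; t < tasks.length; t++) {
            tasks[t] = new Thread(() -> {
                for (int i = 0; i < 250; i++) {
                    claimed.add(counter.claimNext());
                }
            });
            tasks[t].start();
        }
        for (Thread task : tasks) {
            task.join();
        }
        System.out.println(claimed.size()); // 1000: no value was handed out twice
    }
}
```

The lock trades some throughput for simplicity: tasks wait on each other instead of retrying, which is reasonable for local state but does nothing across nodes.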