Posted to dev@nifi.apache.org by "Peter Wicks (pwicks)" <pw...@micron.com> on 2019/08/16 15:07:37 UTC

OnPrimaryNodeStateChange vs Primary Only configuration

I'm working on a bug fix for HandleHttpRequest and need to check whether a processor is configured to run only on the primary node (not whether the processor is marked so that it can ONLY ever run on the primary node).
Here is the scenario for background:

  *   A NiFi cluster in which all nodes run on the same physical machine; we do this so developers can develop and test in a cluster without needing much infrastructure before deploying to the real production cluster.
  *   To avoid port conflicts, HandleHttpRequest is set up to run only on the primary node. However, if the primary node changes, the HTTP server on the old primary is not properly shut down, and we get a port conflict when the new primary node starts its instance of the processor.

The problem is that I don't think the Primary Only scheduling configuration is exposed to the processor. I'd like to do something like the code below:

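    import org.apache.nifi.annotation.notification.OnPrimaryNodeStateChange;
    import org.apache.nifi.annotation.notification.PrimaryNodeState;

    @OnPrimaryNodeStateChange
    public void onPrimaryNodeChange(final PrimaryNodeState newState) {
        // If this processor is scheduled to run on the primary node only
        // and this node has just lost the primary role, shut down the HTTP server.
        // isMasterOnlyScheduled is hypothetical: it is the setting not exposed today.
        if (isMasterOnlyScheduled && newState == PrimaryNodeState.PRIMARY_NODE_REVOKED) {
            shutdown();
        }
    }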

I can do some work to expose this, but I thought I'd ask in case I'm missing it.

Thanks,
  Peter
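To make the failure concrete, here is a minimal, self-contained Java model of the scenario above. None of it is NiFi API: `PortRegistry` is a hypothetical stand-in for the TCP ports of the one shared machine, `Node` is a hypothetical model of one node's HandleHttpRequest instance, 8011 is an arbitrary port number, and the local `PrimaryNodeState` enum only mirrors the constant names of NiFi's `org.apache.nifi.annotation.notification.PrimaryNodeState`. It is a sketch under those assumptions, showing that the new primary can bind the port only if the old primary releases it when its primary role is revoked.

```java
import java.util.HashSet;
import java.util.Set;

public class PrimaryFailoverModel {

    // Local stand-in mirroring the constant names of NiFi's PrimaryNodeState enum.
    enum PrimaryNodeState { ELECTED_PRIMARY_NODE, PRIMARY_NODE_REVOKED }

    // Hypothetical model of the ports on the one physical machine shared by all nodes.
    static final class PortRegistry {
        private final Set<Integer> bound = new HashSet<>();

        boolean bind(int port) { return bound.add(port); } // false means port conflict
        void release(int port) { bound.remove(port); }
    }

    // Hypothetical model of one node's HandleHttpRequest instance.
    static final class Node {
        private final PortRegistry ports;
        private final int port;
        private final boolean releaseOnRevoke; // does it react to losing the primary role?
        private boolean serverRunning = false;

        Node(PortRegistry ports, int port, boolean releaseOnRevoke) {
            this.ports = ports;
            this.port = port;
            this.releaseOnRevoke = releaseOnRevoke;
        }

        // Lazily start the server the first time this node is triggered.
        // Returns false if another node still holds the port.
        boolean trigger() {
            if (!serverRunning) {
                serverRunning = ports.bind(port);
            }
            return serverRunning;
        }

        void onPrimaryNodeChange(PrimaryNodeState newState) {
            if (releaseOnRevoke && serverRunning
                    && newState == PrimaryNodeState.PRIMARY_NODE_REVOKED) {
                ports.release(port);
                serverRunning = false;
            }
        }
    }

    // Node A is primary and serves on the port; then the primary role moves to node B.
    // Returns true if node B manages to start its server.
    static boolean failover(boolean releaseOnRevoke) {
        PortRegistry ports = new PortRegistry();
        Node a = new Node(ports, 8011, releaseOnRevoke);
        Node b = new Node(ports, 8011, releaseOnRevoke);
        a.trigger(); // A is primary and binds the port
        a.onPrimaryNodeChange(PrimaryNodeState.PRIMARY_NODE_REVOKED);
        b.onPrimaryNodeChange(PrimaryNodeState.ELECTED_PRIMARY_NODE);
        return b.trigger(); // B is now primary and tries to bind
    }

    public static void main(String[] args) {
        System.out.println("without release: " + failover(false)); // false: port conflict
        System.out.println("with release:    " + failover(true));  // true
    }
}
```

In this model the conflict disappears only when the revoke notification is acted on, which is why the handler needs to know the processor is scheduled on the primary node only.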

Re: [EXT] Re: OnPrimaryNodeStateChange vs Primary Only configuration

Posted by Bryan Bende <bb...@gmail.com>.
Ah, sorry, I confused "which node the processor is on" with "where it is
scheduled to run". It sounds like you would need the fix Mark suggested.

I was just looking at the code of HandleHttpRequest, and I think it
avoids the problem I was talking about. I was thinking of the case
where the onScheduled method starts a server bound to a port. In that
case you wouldn't be able to start a 3-node cluster on the same
machine, even if the processor is scheduled to run only on the primary
node, because all 3 nodes would still have onScheduled called and would
attempt to start a server. Luckily, HandleHttpRequest does lazy
initialization in onTrigger, so it avoids this problem.
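A minimal Java sketch of the difference described above, assuming a simplified lifecycle in which the framework calls onScheduled on every node but onTrigger only on the node where the processor is scheduled to run (the primary, here). `ServerProcessor` is a hypothetical model, not a NiFi type, and the shared set of bound ports stands in for the single machine; 8011 is an arbitrary port.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LazyInitModel {

    // Hypothetical processor model: binds its port either in onScheduled (eager)
    // or on the first onTrigger (lazy, the approach HandleHttpRequest takes).
    static final class ServerProcessor {
        private final Set<Integer> machinePorts; // ports bound on the shared machine
        private final int port;
        private final boolean lazy;
        private boolean started = false;

        ServerProcessor(Set<Integer> machinePorts, int port, boolean lazy) {
            this.machinePorts = machinePorts;
            this.port = port;
            this.lazy = lazy;
        }

        void onScheduled() {
            if (!lazy) {
                start();
            }
        }

        void onTrigger() {
            if (lazy && !started) {
                start();
            }
        }

        private void start() {
            if (!machinePorts.add(port)) {
                throw new IllegalStateException("Port " + port + " already in use");
            }
            started = true;
        }
    }

    // 3-node cluster on one machine, processor scheduled on the primary only:
    // onScheduled runs on every node, onTrigger only on the primary.
    // Returns the number of port conflicts observed.
    static int conflicts(boolean lazy) {
        Set<Integer> machinePorts = new HashSet<>();
        List<ServerProcessor> nodes = List.of(
                new ServerProcessor(machinePorts, 8011, lazy),
                new ServerProcessor(machinePorts, 8011, lazy),
                new ServerProcessor(machinePorts, 8011, lazy));
        int failures = 0;
        for (ServerProcessor node : nodes) {
            try {
                node.onScheduled();
            } catch (IllegalStateException e) {
                failures++;
            }
        }
        try {
            nodes.get(0).onTrigger(); // node 0 plays the primary
        } catch (IllegalStateException e) {
            failures++;
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("eager conflicts: " + conflicts(false)); // 2
        System.out.println("lazy conflicts:  " + conflicts(true));  // 0
    }
}
```

With eager binding the two non-primary nodes collide in onScheduled even though they will never be triggered; deferring the bind to the first onTrigger leaves only the primary holding the port.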

On Fri, Aug 16, 2019 at 11:43 AM Mark Payne <ma...@hotmail.com> wrote:
>
> It seems reasonable to me to add an `ExecutionNode getExecutionNode()` method to ProcessContext. This enum already exists in nifi-api, but I don't believe that it's exposed anywhere to the Processor itself.

Re: [EXT] Re: OnPrimaryNodeStateChange vs Primary Only configuration

Posted by Mark Payne <ma...@hotmail.com>.
It seems reasonable to me to add an `ExecutionNode getExecutionNode()` method to ProcessContext. This enum already exists in nifi-api, but I don't believe that it's exposed anywhere to the Processor itself.
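A hedged sketch of how a processor might use the proposed accessor. It assumes the enum in question is nifi-api's `org.apache.nifi.scheduling.ExecutionNode` with constants `ALL` and `PRIMARY`; to stay self-contained it declares local copies of that enum and of `PrimaryNodeState`, plus a hypothetical `SchedulingContext` interface standing in for `ProcessContext`, which per this thread does not yet have `getExecutionNode()`.

```java
public class ExecutionNodeCheck {

    // Local copies mirroring the nifi-api ExecutionNode and PrimaryNodeState constants.
    enum ExecutionNode { ALL, PRIMARY }
    enum PrimaryNodeState { ELECTED_PRIMARY_NODE, PRIMARY_NODE_REVOKED }

    // Hypothetical stand-in for ProcessContext with the proposed accessor.
    interface SchedulingContext {
        ExecutionNode getExecutionNode();
    }

    // Shut down only when the processor is scheduled on the primary node only
    // and this node has just lost the primary role. Processors scheduled on
    // all nodes keep their servers running.
    static boolean shouldShutDown(SchedulingContext context, PrimaryNodeState newState) {
        return context.getExecutionNode() == ExecutionNode.PRIMARY
                && newState == PrimaryNodeState.PRIMARY_NODE_REVOKED;
    }

    public static void main(String[] args) {
        SchedulingContext primaryOnly = () -> ExecutionNode.PRIMARY;
        SchedulingContext allNodes = () -> ExecutionNode.ALL;

        System.out.println(shouldShutDown(primaryOnly, PrimaryNodeState.PRIMARY_NODE_REVOKED)); // true
        System.out.println(shouldShutDown(allNodes, PrimaryNodeState.PRIMARY_NODE_REVOKED));    // false
        System.out.println(shouldShutDown(primaryOnly, PrimaryNodeState.ELECTED_PRIMARY_NODE)); // false
    }
}
```

The check keys on the configured execution node rather than on getNodeTypeProvider().isPrimary(), because isPrimary() reports this node's role, not how the processor was scheduled, so on its own it cannot tell the two configurations apart.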

> On Aug 16, 2019, at 11:32 AM, Peter Wicks (pwicks) <pw...@micron.com> wrote:
> 
> Bryan,
> 
> I'm familiar with the getNodeTypeProvider method.  Unfortunately, this does not differentiate between processors that are scheduled to run only on the Primary node and those that are scheduled to run on all of them.
> 
> So you're saying a better fix would be for the framework to call onScheduled/onUnscheduled properly, with the processor handling this when it is unscheduled, but that this is complicated. I can believe that.
> 
> In the meantime, though, there's probably no problem with exposing this piece of scheduling information in the ProcessContext?
> 
> Thanks,
>  Peter

RE: [EXT] Re: OnPrimaryNodeStateChange vs Primary Only configuration

Posted by "Peter Wicks (pwicks)" <pw...@micron.com>.
Bryan,

I'm familiar with the getNodeTypeProvider method.  Unfortunately, this does not differentiate between processors that are scheduled to run only on the Primary node and those that are scheduled to run on all of them.

So you're saying a better fix would be for the framework to call onScheduled/onUnscheduled properly, with the processor handling this when it is unscheduled, but that this is complicated. I can believe that.

In the meantime, though, there's probably no problem with exposing this piece of scheduling information in the ProcessContext?

Thanks,
  Peter

-----Original Message-----
From: Bryan Bende <bb...@gmail.com> 
Sent: Friday, August 16, 2019 9:19 AM
To: dev@nifi.apache.org
Subject: [EXT] Re: OnPrimaryNodeStateChange vs Primary Only configuration

AbstractSessionFactoryProcessor has a method

getNodeTypeProvider().isPrimary()

The ultimate fix for your problem is that a processor shouldn't have its onScheduled called at all unless it is actually scheduled to run on that node. Currently the framework calls onScheduled on all nodes, but then never calls onTrigger on the ones where the processor isn't scheduled. There is a long-standing JIRA for this, but it's a complex fix.

Re: OnPrimaryNodeStateChange vs Primary Only configuration

Posted by Bryan Bende <bb...@gmail.com>.
AbstractSessionFactoryProcessor has a method

getNodeTypeProvider().isPrimary()

The ultimate fix for your problem is that a processor shouldn't have
its onScheduled called at all unless it is actually scheduled to run
on that node. Currently the framework calls onScheduled on all nodes,
but then never calls onTrigger on the ones where the processor isn't
scheduled. There is a long-standing JIRA for this, but it's a complex fix.
