Posted to dev@nifi.apache.org by Michael Moser <mo...@gmail.com> on 2016/11/07 22:39:07 UTC

NiFi processor validation

All,

I would like to propose a fundamental change to processor validation based
on observations in https://issues.apache.org/jira/browse/NIFI-2996. I would
like to validate processors only when they are in the STOPPED state.

The properties on a processor in the RUNNING state should always be valid,
else you should not have been able to start the processor. A processor in
the DISABLED state doesn't show validation results, so it seems a waste to
validate its properties.
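
A minimal sketch of the gating proposed above, in plain Java. The names
ValidationGate, RunState, and validateIfStopped are hypothetical and exist
only for illustration; they are not NiFi framework classes, and the real
change would live wherever the framework invokes a component's validators.

```java
import java.util.List;
import java.util.function.Supplier;

final class ValidationGate {
    /** Hypothetical stand-in for a component's scheduled state. */
    enum RunState { STOPPED, RUNNING, DISABLED }

    private int validationRuns = 0;

    /** Runs the (possibly expensive) validator only for STOPPED components. */
    List<String> validateIfStopped(RunState state, Supplier<List<String>> validator) {
        if (state != RunState.STOPPED) {
            // RUNNING: assumed valid because it was allowed to start.
            // DISABLED: results are not displayed, so nothing is computed.
            return List.of();
        }
        validationRuns++;
        return validator.get();
    }

    int validationRuns() {
        return validationRuns;
    }

    public static void main(String[] args) {
        ValidationGate gate = new ValidationGate();
        Supplier<List<String>> validator = () -> List.of("'Directory' is required");
        for (RunState state : RunState.values()) {
            System.out.println(state + " -> " + gate.validateIfStopped(state, validator));
        }
        System.out.println("validator invoked " + gate.validationRuns() + " time(s)");
    }
}
```

The trade-off is that a RUNNING component whose configuration later becomes
invalid would not be flagged until it is stopped.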

The reason I'm proposing this change is that the NiFi UI slows down as
you add more processors and controller services to the graph. Beyond common
sense expectations that this would be true, it appears that processor
validation is a significant part of the 'cost' on the server when
responding to REST API requests.  Some details from my testing are in the
JIRA ticket.

Thoughts?

Thanks,
-- Mike

Re: NiFi processor validation

Posted by Michael Moser <mo...@gmail.com>.

Thanks for pointing out the NIFI-950 JIRA! I didn't find that one in my
search.

Processor validation compounds when controller services build their list of
referencing components.  The controller service display shows a list of all
processors that reference the service, and includes the status of each of
those components.  So if Processor P1 references Service S1, then
validation of P1 will cause validation of S1 which will cause validation of
P1 again.
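
To make the compounding concrete, here is a toy model in plain Java. It is
not NiFi code: ValidationPassModel and both methods are hypothetical, and the
counting rule (each processor validates itself, then its service, which in
turn validates every referencing processor) is a simplification of the
behavior described above. The memoized variant assumes a component's validity
does not change within a single pass.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ValidationPassModel {
    /** Naive pass: every validation re-validates whatever it touches. */
    static int naivePass(Map<String, List<String>> processorsByService) {
        int runs = 0;
        for (List<String> processors : processorsByService.values()) {
            for (int i = 0; i < processors.size(); i++) {
                runs++;                     // validate the processor itself
                runs++;                     // ...which validates its service
                runs += processors.size();  // ...which validates each referencing processor
            }
        }
        return runs;
    }

    /** Memoized pass: each component is validated at most once per pass. */
    static int memoizedPass(Map<String, List<String>> processorsByService) {
        Set<String> validated = new HashSet<>();
        int runs = 0;
        for (Map.Entry<String, List<String>> entry : processorsByService.entrySet()) {
            for (String processor : entry.getValue()) {
                if (validated.add("processor:" + processor)) {
                    runs++;
                }
                if (validated.add("service:" + entry.getKey())) {
                    runs++;
                }
                for (String referencing : entry.getValue()) {
                    if (validated.add("processor:" + referencing)) {
                        runs++;
                    }
                }
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> flow = Map.of("S1", List.of("P1", "P2", "P3"));
        System.out.println("naive: " + naivePass(flow) + ", memoized: " + memoizedPass(flow));
    }
}
```

In this model, N processors sharing one service cost N * (N + 2) validations
per pass in the naive case and N + 1 when each component is validated once.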

1) Asynchronous validation

It sounds like NIFI-950 is the perfect JIRA for this.  Should we add any
information to it?

2) Validate less

I will go ahead with a PR using NIFI-2996, and we can discuss more.  If
properties become invalid while a processor is running, I hope processors
will throw exceptions and/or show bulletins.

I will also submit a new JIRA to improve the StandardSSLContextService
customValidate() method.  It cracks open both truststore and keystore
(twice) and creates a sample SSLContext which it just throws away.  Now
imagine doing this hundreds or thousands of times in one validation cycle.
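
One possible direction, sketched in plain Java rather than against the real
service: remember the property values seen by the last expensive validation
and skip the work when nothing has changed. SnapshotValidationCache and its
methods are hypothetical names, not part of StandardSSLContextService, and
the sketch assumes validity depends only on the property values. It would not
notice a keystore file being replaced on disk.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class SnapshotValidationCache {
    private Map<String, String> lastProperties;  // snapshot used by the last real validation
    private List<String> lastResults;
    private int expensiveRuns = 0;

    /** Re-runs the expensive validator only when the property values differ. */
    synchronized List<String> validate(
            Map<String, String> properties,
            Function<Map<String, String>, List<String>> expensiveValidator) {
        if (lastProperties == null || !lastProperties.equals(properties)) {
            lastProperties = Map.copyOf(properties);  // defensive copy of the snapshot
            lastResults = List.copyOf(expensiveValidator.apply(lastProperties));
            expensiveRuns++;
        }
        return lastResults;
    }

    synchronized int expensiveRuns() {
        return expensiveRuns;
    }
}
```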

-- Mike




Re: NiFi processor validation

Posted by Joe Witt <jo...@gmail.com>.

Agree on those points, Mark.  Should be two different JIRAs.

1) Asynchronous validation

  Yeah, probably not easy, and it requires lots of thought about how to
tie it into the lifecycle of things.

2) Validate less

  It is true that things can 'become' invalid while processors are
running, but this is both unlikely and isn't something the framework
will do anything about.  We don't stop the processor when it becomes
invalid because, by the same logic that made it invalid, it could
just as well become valid again.  So, I'd be good with just not doing that anymore.

I think if we did the second item it would help a lot with these
massive flow cases.

Thanks
Joe


Re: NiFi processor validation

Posted by Mark Payne <ma...@hotmail.com>.

All,

These are certainly valid concerns. There are a few things to keep in mind, though,
that may help to explain the current design decisions.

Firstly, if a component is disabled, then we do not perform validation on the component
(or, at least, if we do then it's a bug.) When a component is running, we DO still perform
validation. This is because a Processor (or any component really) can still become invalid
while it is running. An example of this is if a property uses a FileExistsValidator. If the file
is removed while the Processor is running, the Processor becomes invalid. The Processor
does continue to run (though it may or may not keep failing). However, the UI does change
its icon to show that it is invalid.
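
The file-existence case can be reproduced with the JDK alone. This is a small
sketch, not NiFi's validator: FileExistsCheck and its methods are hypothetical
names, and the temporary file stands in for whatever path a property would
point at.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileExistsCheck {
    /** Cheap validity check: true only while the configured file is present. */
    static boolean isValid(Path configuredFile) {
        return Files.exists(configuredFile);
    }

    /** Returns the validity observed before and after the file is deleted. */
    static boolean[] validityBeforeAndAfterDelete() {
        try {
            Path file = Files.createTempFile("validation-demo", ".txt");
            boolean before = isValid(file);
            Files.delete(file);  // simulates the file disappearing while "running"
            boolean after = isValid(file);
            return new boolean[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = validityBeforeAndAfterDelete();
        System.out.println("valid before delete: " + result[0] + ", after delete: " + result[1]);
    }
}
```

The same configuration yields different answers at different times, which is
why a result computed at start time cannot be trusted indefinitely.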

Secondly, I believe we have no choice but to validate Service X twice if two different Processors
depend on the service, because its validity certainly may change between the time that it was
last validated and now (again, take the FileExistsValidator as an example).

I believe the best solution is to refactor how validation is performed and to ensure that any
action that enters 'user code' (including validation) is performed asynchronously. However, this
is a very significant change and if someone decides to take this on, it is not going to be quick or
easy.

Even if validation is performed asynchronously, though, we still have the case of performing the
validation many times. This is why the Developer Guide explicitly spells out the importance of
ensuring that Validator logic and customValidate methods are very fast, efficient methods. If the
validation is going to take longer than a couple of milliseconds then it doesn't belong in validation
and should probably be moved into the @OnEnabled / @OnScheduled lifecycle events.
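
A sketch of that split in plain Java, with no NiFi dependency.
SplitValidationSketch is hypothetical; its customValidate and onScheduled
methods only mimic the roles of the framework's customValidate method and an
@OnScheduled method, and do not use their real signatures.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitValidationSketch {
    private final String host;
    private final int port;
    private boolean resourcesReady = false;

    SplitValidationSketch(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Called often: in-memory checks only, no I/O, no network. */
    List<String> customValidate() {
        List<String> problems = new ArrayList<>();
        if (host == null || host.isBlank()) {
            problems.add("Hostname must not be blank");
        }
        if (port < 1 || port > 65535) {
            problems.add("Port must be between 1 and 65535");
        }
        return problems;
    }

    /** Called once per start: the place for slow setup work. */
    void onScheduled() {
        if (!customValidate().isEmpty()) {
            throw new IllegalStateException("invalid configuration");
        }
        // Expensive setup (opening keystores, connecting, ...) would go here.
        resourcesReady = true;
    }

    boolean resourcesReady() {
        return resourcesReady;
    }
}
```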

I would like to see this go a step further, though, and support some mechanism for performing
more complex validation, so that processors such as PutSFTP can validate username/password
combinations, etc. This would be user-driven and performed by clicking some sort of "Test" or "Verify"
button in the UI. This would help to separate the notions of "valid" configurations from "correct"
configurations.
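
The "valid" versus "correct" distinction might look like this in plain Java.
VerifySketch, isValid, and verify are hypothetical names, and the
loginAttempt parameter is a stand-in for a real connection attempt such as an
SFTP login; no such API is implied to exist in NiFi.

```java
import java.util.Optional;
import java.util.function.BiPredicate;

final class VerifySketch {
    /** "Valid": the values are well-formed. Cheap enough to run on every refresh. */
    static boolean isValid(String username, String password) {
        return username != null && !username.isEmpty()
                && password != null && !password.isEmpty();
    }

    /**
     * "Correct": the values actually work. Runs only when the user asks,
     * because the login attempt may be slow. Returns a problem, if any.
     */
    static Optional<String> verify(String username, String password,
                                   BiPredicate<String, String> loginAttempt) {
        if (!isValid(username, password)) {
            return Optional.of("Configuration is not valid");
        }
        return loginAttempt.test(username, password)
                ? Optional.empty()
                : Optional.of("Login rejected");
    }
}
```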

Again, though, none of these are small efforts and are going to take quite a bit of time, and I don't
know that anyone has started working on them yet. So we will need someone to
volunteer to get the work done first :) But I would love to see some of this stuff get tackled as well!

Cheers
-Mark


Re: NiFi processor validation

Posted by Joe Skora <js...@gmail.com>.

+1 for the validation change.

+1 for not calling into user code for GUI refresh.

I understand the logic behind validating whenever we return current state,
but that can put a great deal of load on a system unrelated to the actual
data flow.  For the most part, state changes at discrete points such as
configuration, start, onTrigger, etc.  When loading the GUI it seems like
we should return the last known state, possibly with a GUI option to
re-validate the components, to minimize the impact of the user interface
side of the system on the actual dataflow components.

Eliminating as much duplicate validation as possible would help as well.
Currently I believe that if Processors A and B validate Service X, the
Service X validation will occur twice, contributing to the "exponential"
growth Mike mentioned in the ticket.


Re: NiFi processor validation

Posted by Oleg Zhurakousky <oz...@hortonworks.com>.

Could it also be related to https://issues.apache.org/jira/browse/NIFI-1318?

Cheers
Oleg


Re: NiFi processor validation

Posted by Joe Witt <jo...@gmail.com>.

+1 to both of those points:
1) Avoid validating where it doesn't help (disabled and running components)
2) Avoid using web/synchronous threading for any user code


Re: NiFi processor validation

Posted by Matt Gilman <ma...@gmail.com>.

I also agree these changes make sense. In addition, another approach we
could consider that has been discussed in the past [1] is to perform
component validation asynchronously. This presents its own challenges but
would also be helpful. We should try to avoid calling into user code in any
web thread.
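
A rough sketch of the asynchronous idea using only the JDK. It is not the
design tracked in the JIRA: AsyncValidationSketch and its methods are
hypothetical, and the single-threaded executor is an assumption chosen to
keep the example small. User validation code runs on a dedicated thread,
while callers such as web threads only read the most recent result.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class AsyncValidationSketch implements AutoCloseable {
    private final ExecutorService validationThread = Executors.newSingleThreadExecutor();
    private volatile List<String> latestResults = null;  // null means "still validating"

    /** Schedules user code off the calling thread and returns immediately. */
    CompletableFuture<Void> requestValidation(Supplier<List<String>> userValidator) {
        latestResults = null;
        return CompletableFuture
                .supplyAsync(userValidator, validationThread)
                .thenAccept(results -> latestResults = List.copyOf(results));
    }

    /** What a web thread would read: never blocks and never runs user code. */
    String status() {
        List<String> results = latestResults;
        if (results == null) {
            return "VALIDATING";
        }
        return results.isEmpty() ? "VALID" : "INVALID";
    }

    @Override
    public void close() {
        validationThread.shutdown();
    }
}
```

A third "validating" status is the price of this approach: the UI has to
display something while a result is pending.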

Matt

[1] https://issues.apache.org/jira/browse/NIFI-950


Re: NiFi processor validation

Posted by Matt Burgess <ma...@apache.org>.

Agreed. Also we validate processors on a timer-based strategy in
FlowController (looks like for snapshotting) and in the web server
(via ControllerFacade); those seem to happen 6-7 times on that
interval (which is roughly 15-20 seconds). Also we validate all
processors on any change to the canvas (such as moving a processor).
Besides Mike's suggestion, perhaps we should look at a purely
event-driven strategy for validating processors if possible?

Regards,
Matt


Re: NiFi processor validation

Posted by Joe Witt <jo...@gmail.com>.

Makes good sense to me.
