
Posted to dev@nifi.apache.org by Russell Bateman <ru...@windofkeltia.com> on 2016/11/30 20:14:10 UTC

Global property for custom processor

I've written a custom processor for some trivial profiling, 
time-stamping, time-since, histogram-generating, etc., but would like 
the ability to turn all instances completely off without having to visit 
each instance in the UI. If it works out, I might even consider leaving 
some instances in production- or at least staging-environment flows.
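For readers unfamiliar with this kind of profiling step, here is a minimal sketch of the "time-since" and histogram bookkeeping described above. It is plain Java, not NiFi API: a Map<String, String> stands in for FlowFile attributes, and the attribute name profile.last.stamp and the fixed-width millisecond buckets are assumptions chosen for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not NiFi API: a Map stands in for FlowFile attributes.
final class TimeSinceProfiler {
    // Assumed attribute name; a real processor would likely make it configurable.
    static final String STAMP_KEY = "profile.last.stamp";

    private final long bucketWidthMillis;
    // Bucket lower bound in milliseconds -> number of observations in it.
    private final TreeMap<Long, Integer> histogram = new TreeMap<>();

    TimeSinceProfiler(long bucketWidthMillis) {
        if (bucketWidthMillis <= 0) {
            throw new IllegalArgumentException("bucket width must be positive");
        }
        this.bucketWidthMillis = bucketWidthMillis;
    }

    /**
     * Records the time since the previous stamp (if any) and re-stamps.
     * Returns the elapsed milliseconds, or -1 when there was no earlier stamp.
     */
    long observe(Map<String, String> attributes, long nowMillis) {
        String previous = attributes.get(STAMP_KEY);
        long elapsed = -1;
        if (previous != null) {
            // Clamp to zero so clock skew between stamps cannot go negative.
            elapsed = Math.max(0L, nowMillis - Long.parseLong(previous));
            long bucket = (elapsed / bucketWidthMillis) * bucketWidthMillis;
            histogram.merge(bucket, 1, Integer::sum);
        }
        attributes.put(STAMP_KEY, Long.toString(nowMillis));
        return elapsed;
    }

    /** Returns a copy of the histogram, ordered by bucket. */
    Map<Long, Integer> histogram() {
        return new TreeMap<>(histogram);
    }
}
```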

1. I know that the NiFi Expression Language has access to various 
system- or NiFi properties or settings, but what would someone suggest 
as best practice for this? (Ideally without invading conf/nifi.properties, etc.)

2. I guess I'd add a configurable property to my processor and check 
whether it evaluates to true/false/etc. based on the source data 
(whatever that turns out to be; see question 1)?

3. Last, if this processor is thereby reduced merely to

    session.transfer( flowfile, SUCCESS );

there isn't any handling even more minimal or faster than that in the 
sense of turning a processor off, right?
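Questions 2 and 3 can be sketched in plain Java as follows. This is a hypothetical model, not NiFi API: the switch name profiler.enabled, the attribute profile.stamp, the lookup order (JVM system property first, then environment variable) and the "absent means enabled" default are all assumptions, loosely mirroring the kind of system-level lookup the Expression Language can do. When the switch is off, the step does nothing but route the FlowFile to SUCCESS, the equivalent of session.transfer( flowfile, SUCCESS ).

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model, not NiFi API.
final class ProfilingSwitchResolver {
    private final UnaryOperator<String> systemProperties;
    private final UnaryOperator<String> environment;

    ProfilingSwitchResolver(UnaryOperator<String> systemProperties,
                            UnaryOperator<String> environment) {
        this.systemProperties = systemProperties;
        this.environment = environment;
    }

    /** Wires the resolver to the real JVM system properties and process environment. */
    static ProfilingSwitchResolver fromJvm() {
        return new ProfilingSwitchResolver(System::getProperty, System::getenv);
    }

    /** A system property wins over an environment variable; absent means enabled. */
    boolean isEnabled(String name) {
        String value = systemProperties.apply(name);
        if (value == null) {
            value = environment.apply(name);
        }
        return value == null || Boolean.parseBoolean(value.trim());
    }
}

enum Route { SUCCESS }

// Stand-in for the processor's onTrigger logic.
final class ProfilerStep {
    private final ProfilingSwitchResolver resolver;
    private final String switchName;

    ProfilerStep(ProfilingSwitchResolver resolver, String switchName) {
        this.resolver = resolver;
        this.switchName = switchName;
    }

    Route onTrigger(Map<String, String> flowFileAttributes, long nowMillis) {
        if (resolver.isEnabled(switchName)) {
            // Profiling work (time-stamping, histograms, ...) would go here.
            flowFileAttributes.put("profile.stamp", Long.toString(nowMillis));
        }
        // Either way the FlowFile is routed unchanged to SUCCESS,
        // mirroring session.transfer( flowfile, SUCCESS ).
        return Route.SUCCESS;
    }
}
```

On question 3: once the switch is off, the remaining per-FlowFile cost in this model is a couple of map lookups plus the transfer itself, which is about as minimal as a running processor can get.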

Thanks for any suggestions,

Russ

Re: Global property for custom processor

Posted by Andy LoPresto <al...@apache.org>.

The trigger should probably be a new Controller Service, since that makes more logical sense, but no such service is available today.
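A rough sketch of the shape such a service might take, assuming a hypothetical ProfilingToggleService interface (no such NiFi service exists, as noted above). In real NiFi code a processor would typically obtain a controller service through a property declared with PropertyDescriptor.Builder.identifiesControllerService(...) and read it via context.getProperty(...).asControllerService(...); here plain constructor injection stands in for that wiring.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical contract; not an existing NiFi interface.
interface ProfilingToggleService {
    boolean isProfilingEnabled();
}

// One shared instance, the way a controller service is shared by the
// processors that reference it.
final class InMemoryProfilingToggle implements ProfilingToggleService {
    private final AtomicBoolean enabled = new AtomicBoolean(true);

    @Override
    public boolean isProfilingEnabled() {
        return enabled.get();
    }

    void setEnabled(boolean value) {
        enabled.set(value);
    }
}

// Stand-in for one profiling processor instance holding a service reference.
final class ProfilerInstance {
    private final ProfilingToggleService toggle;
    private int profiledCount;

    ProfilerInstance(ProfilingToggleService toggle) {
        this.toggle = toggle;
    }

    void onTrigger() {
        if (toggle.isProfilingEnabled()) {
            profiledCount++; // profiling work would happen here
        }
        // The FlowFile would be transferred to SUCCESS in either case.
    }

    int profiledCount() {
        return profiledCount;
    }
}
```

Note that a flag held in memory like this is still per JVM, so in a cluster each node would have its own copy; the cluster-wide part would have to come from the service's configuration or from shared state.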

Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Nov 30, 2016, at 5:19 PM, Andy LoPresto <al...@apache.org> wrote:

Re: Global property for custom processor

Posted by Andy LoPresto <al...@apache.org>.

Hi Russ,

Could you use the cluster state manager [1] to hold a boolean trigger value that each instance of your custom processor checks before execution, and then use a simple ExecuteScript processor that toggles or explicitly writes that value? In this way, the ExecuteScript processor acts as the light switch. You can manually start/stop that processor to trigger or stop all the others, use the REST API to do the same, or even have the ExecuteScript processor read a system/environment variable or the absence/presence/value of a file on disk to determine the desired state value.
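The read/write pattern described above can be modeled with an in-memory stand-in. VersionedStateStore below is not NiFi's StateManager (which, with Scope.CLUSTER, offers getState, setState and a compare-and-set style replace); it only mimics a versioned key/value map so the "light switch" writer and the reader that each processor runs before doing work can be shown side by side. The key name profiling.enabled is an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for cluster-scoped state; NOT NiFi's StateManager.
final class VersionedStateStore {
    private Map<String, String> values = Collections.emptyMap();
    private long version = -1; // -1 means nothing has been stored yet

    synchronized Map<String, String> snapshot() {
        return values;
    }

    synchronized long version() {
        return version;
    }

    /** Stores newValues only if no other write happened since expectedVersion was read. */
    synchronized boolean replace(long expectedVersion, Map<String, String> newValues) {
        if (expectedVersion != version) {
            return false;
        }
        values = Collections.unmodifiableMap(new HashMap<>(newValues));
        version++;
        return true;
    }
}

final class ProfilingTrigger {
    static final String KEY = "profiling.enabled"; // assumed key name

    // The "light switch" side, e.g. what the ExecuteScript step would do.
    static void set(VersionedStateStore store, boolean enabled) {
        while (true) {
            long seen = store.version();
            Map<String, String> next = new HashMap<>(store.snapshot());
            next.put(KEY, Boolean.toString(enabled));
            if (store.replace(seen, next)) {
                return;
            }
            // Another writer got in first: re-read and retry.
        }
    }

    // The reader side: what each profiling processor checks before doing work.
    static boolean isEnabled(VersionedStateStore store) {
        String value = store.snapshot().get(KEY);
        return value == null || Boolean.parseBoolean(value);
    }
}
```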

The ExecuteScript processor might have to abuse the StandardStateManager. First, it would enumerate all instances of the desired “controllable” components; this could be done by dynamically querying a containing process group for processors by type, or by manually populating a static list of component IDs, which the ExecuteScript processor could then store in its own StateManager. Then, for each component ID, it would manually instantiate a StateManager containing the local/cluster StateProvider [2] and set the state.
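To see why the script has to visit each component: NiFi state is kept per component, so one component's state is not visible to another. The stand-in below models that with a map keyed by component ID, and the broadcast step writes the flag into every listed component's state. The class names, the key profiling.enabled and the sample IDs are hypothetical; real component IDs are UUIDs taken from the flow.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for per-component state: each component ID owns a separate map.
final class PerComponentState {
    private final Map<String, Map<String, String>> byComponent = new ConcurrentHashMap<>();

    /** A component sees only its own entry (empty if nothing was stored). */
    Map<String, String> stateFor(String componentId) {
        Map<String, String> state = byComponent.get(componentId);
        return state == null ? Collections.<String, String>emptyMap() : state;
    }

    void setState(String componentId, Map<String, String> state) {
        byComponent.put(componentId, Collections.unmodifiableMap(new HashMap<>(state)));
    }
}

final class SwitchBroadcaster {
    static final String KEY = "profiling.enabled"; // assumed key name

    /** Writes the flag into each listed component's state; returns how many were written. */
    static int broadcast(PerComponentState store, Collection<String> componentIds, boolean enabled) {
        int written = 0;
        for (String id : componentIds) {
            // Copy the component's existing state so other keys survive the update.
            Map<String, String> next = new HashMap<>(store.stateFor(id));
            next.put(KEY, Boolean.toString(enabled));
            store.setState(id, next);
            written++;
        }
        return written;
    }
}
```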

Not sure if I explained that well, but Mark Payne would be your guy for a better explanation and possibly a cleaner solution.

[1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#state_management
[2] https://github.com/apache/nifi/blob/master/nifi-framework-api/src/main/java/org/apache/nifi/components/state/StateProvider.java#L66


Andy LoPresto

> On Nov 30, 2016, at 2:00 PM, Russell Bateman <ru...@windofkeltia.com> wrote:

Re: Global property for custom processor

Posted by Russell Bateman <ru...@windofkeltia.com>.

Our usage, for ETL, is controlled very up-close and personal right now. 
Our ETL of medical documents is pretty involved, changes radically from 
customer to customer, and must be babysat closely. Anything we can do to 
help our implementation folk pin down wasted resources, especially the 
time spent ingesting horrendous quantities of information, is going to 
serve us for a long time to come. Giving users an easy processor like 
the one I've written, trivial as what it does may be, makes their lives 
much easier, and they can tell us where time (in particular) is being 
spent: in which processor (we have lots of custom processors that do 
very out-of-the-ordinary things), across which subflow, etc.

Were it not that we anticipate moving to a clustered implementation 
soon, I'd simply look for a system environment variable or even the 
presence of a file, then set static state inside the processor to stop 
it doing anything. Conversely, a change to that state could start the 
processor back up again (time-stamping, histogramming, etc.). I think 
this naïve control strategy falls apart as soon as we go to a cluster.
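For a single node, the file-presence idea might look like the minimal sketch below: profiling is off while a marker file exists, and the check is cached so onTrigger does not touch the disk for every FlowFile. The marker path and refresh interval are assumptions, and a caller would pass System.currentTimeMillis() as nowMillis; holding one instance in a static field would share it across all processor instances in the JVM. Because each cluster node checks its own local file system, this is exactly the part that stops being global once the flow is clustered.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Single-node sketch: profiling is OFF while the marker file exists.
final class FileMarkerSwitch {
    private final Path marker;
    private final long refreshMillis;
    private volatile boolean cachedEnabled = true;
    private volatile long lastCheckMillis = Long.MIN_VALUE; // MIN_VALUE = never checked

    FileMarkerSwitch(Path marker, long refreshMillis) {
        this.marker = marker;
        this.refreshMillis = refreshMillis;
    }

    /** Re-checks the file at most once per refreshMillis; otherwise returns the cached answer. */
    boolean isEnabled(long nowMillis) {
        if (lastCheckMillis == Long.MIN_VALUE || nowMillis - lastCheckMillis >= refreshMillis) {
            cachedEnabled = !Files.exists(marker);
            lastCheckMillis = nowMillis;
        }
        return cachedEnabled;
    }
}
```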

It's taking me a while to get into the NiFi culture, I think. However, I 
also think that NiFi folk use NiFi in wildly different ways, so maybe 
what I'm trying to do isn't so un-NiFi; perhaps others just haven't 
tackled it yet.

Yeah, if NiFi gave us some kind of modifiable, global state, ideally 
less static than conf/nifi.properties, but even one requiring a bounce 
to take effect (say, conf/flow.properties or conf/flow.conf), that 
would solve our problem pretty elegantly. However, I haven't thought 
about what problems it might also create for you or others.
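The "read once, bounce to change" variant could be as simple as loading a properties file at startup. The file name flow.properties comes from the message above and is hypothetical (it is not a file NiFi provides); this sketch reads from any Reader so it can be exercised without touching the disk.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Loaded once at startup; changing the file means restarting ("bouncing") the JVM.
final class FlowSettings {
    private final Properties props;

    private FlowSettings(Properties props) {
        this.props = props;
    }

    static FlowSettings load(Reader source) throws IOException {
        Properties loaded = new Properties();
        loaded.load(source);
        return new FlowSettings(loaded);
    }

    boolean getBoolean(String key, boolean defaultValue) {
        String value = props.getProperty(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }
}
```

At startup a caller might pass Files.newBufferedReader(Paths.get("conf/flow.properties")) and keep the result in a static field. In a cluster, every node would still need an identical copy of the file.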

Russ

On 11/30/2016 02:42 PM, Joe Witt wrote:

Re: Global property for custom processor

Posted by Joe Witt <jo...@gmail.com>.

Russ

I don't think we provide anything particularly helpful for doing this
conveniently.  You could, of course, script this outside NiFi to make
HTTP calls that shut off such items.  Spitballing ideas here, but what
about giving you the ability to tag components with some label and
then execute some task globally (stop/start/disable/delete/etc.)
against the components that you're authorized to act on and that
carry those labels?
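Joe's label idea can be sketched as a filter-then-apply step. Everything here is hypothetical: labels on components are a proposal in this thread rather than a NiFi feature, and the state strings are illustrative. A real tool would turn each selected component into a call against NiFi's REST API, whose endpoints should be checked in the REST API documentation rather than inferred from this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical model of a flow component carrying user-defined labels.
final class LabeledComponent {
    final String id;
    final Set<String> labels;
    String runState = "RUNNING";

    LabeledComponent(String id, Set<String> labels) {
        this.id = id;
        this.labels = labels;
    }
}

enum BulkAction {
    STOP("STOPPED"), START("RUNNING"), DISABLE("DISABLED");

    final String resultingState;

    BulkAction(String resultingState) {
        this.resultingState = resultingState;
    }
}

final class LabelBulkOperator {
    /**
     * Applies the action to every component that carries the label and that the
     * caller is authorized to act on; returns the IDs that were changed.
     */
    static List<String> apply(List<LabeledComponent> components, String label,
                              Predicate<LabeledComponent> authorized, BulkAction action) {
        List<String> changed = new ArrayList<>();
        for (LabeledComponent c : components) {
            if (c.labels.contains(label) && authorized.test(c)) {
                // A real tool would issue the corresponding REST call here.
                c.runState = action.resultingState;
                changed.add(c.id);
            }
        }
        return Collections.unmodifiableList(changed);
    }
}
```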

Do you think this would be a typical use case or do you feel this is
useful because you're testing right now?  Does the above idea make
sense or do you have other suggestions?

Thanks
Joe

On Wed, Nov 30, 2016 at 3:14 PM, Russell Bateman <ru...@windofkeltia.com> wrote: