Posted to users@nifi.apache.org by James McMahon <js...@gmail.com> on 2017/03/28 14:48:36 UTC

ExecuteScript once at workflow inception

Hello. I am interested in calling a python script from ExecuteScript that
sets up Python loggers and establishes file handles to those loggers for
use by other python scripts called later in the workflow by other
ExecuteScript processors. Is there a means to execute a script at workflow
inception - once only, not once per flowfile? I have found some retry count
examples in the open source literature, but those seem to enforce counts at
the flowfile level. In other words, the counter resets to zero for each
flowfile. Thank you for any insights. -Jim

Re: ExecuteScript once at workflow inception

Posted by Bryan Rosander <br...@apache.org>.
That would be the general idea; you'd probably need to create a Controller
Service interface and implementation [1] that would take the result and
write it out to a file.  The filename could be part of the method signature.

Another alternative would be to use NiFi's logging framework: configure
logback [2] (via conf/logback.xml), then get a logger via slf4j whose name
matches the logger you've defined in the logback file.

Thanks,
Bryan

[1]
http://www.nifi.rocks/developing-a-custom-apache-nifi-controller-service/
[2] https://logback.qos.ch/manual/configuration.html
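
From a Jython ExecuteScript body that would look roughly like this (the
logger name "scriptLogA" is just a placeholder for whatever you define in
the logback file):

from org.slf4j import LoggerFactory

# assumes conf/logback.xml declares a logger named "scriptLogA" whose
# appender writes to the file you want, e.g. logs/A.log
logger = LoggerFactory.getLogger("scriptLogA")
logger.info("routed to whatever appender logback attaches to scriptLogA")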

On Tue, Mar 28, 2017 at 11:09 AM, James McMahon <js...@gmail.com>
wrote:

> Thank you Bryan. So would the Controller Service serve as an interface
> through which I direct log messages to a log file that I stipulate? Similar
> to how we can set up different SSL Context Services that relate to
> different cert authorities? If so, then that would help.
>
> Let me describe my requirement, and see if you think a Controller Service
> is suitable. My challenge right now:
> I have python scripts. They build JSON objects that I save as flowfile
> content using a PyStreamCallback. I wish to output these results to log file
> logs/A.log from ExecuteScript instance A, logs/B.log from ExecuteScript
> instance B, etc.
> Evidently in Python you need to set up loggers and file handles to those
> loggers only once. If I embed that setup in my python script, it will run
> every time a flowfile is processed by the ExecuteScript instance.
> Would the Controller Service establish those one-time loggers and one-time
> file handles for us, which I could then reference in my python scripts in
> ExecuteScript A and B?
>
> If this is what you envision, then it would be something of interest.  -Jim
>
> On Tue, Mar 28, 2017 at 10:56 AM, Bryan Rosander <br...@apache.org>
> wrote:
>
>> Hey James,
>>
>> I wonder if you'd be better suited with a Controller Service that could
>> provide access to configured loggers, etc.
>>
>> It looks like ExecuteScript can look up Controller Services [1].  A
>> script-based Controller Service implementation (so you could use python or
>> another scripting language there as well) seems like it might be a useful
>> feature.  If there's interest, I could write up a Jira for it.
>>
>> Thanks,
>> Bryan
>>
>> [1] http://funnifi.blogspot.com/2016/04/sql-in-nifi-with-executescript.html
>>
>> On Tue, Mar 28, 2017 at 10:48 AM, James McMahon <js...@gmail.com>
>> wrote:
>>
>>> Hello. I am interested in calling a python script from ExecuteScript
>>> that sets up Python loggers and establishes file handles to those loggers
>>> for use by other python scripts called later in the workflow by other
>>> ExecuteScript processors. Is there a means to execute a script at workflow
>>> inception - once only, not once per flowfile? I have found some retry count
>>> examples in the open source literature, but those seem to enforce counts at
>>> the flowfile level. In other words, the counter resets to zero for each
>>> flowfile. Thank you for any insights. -Jim
>>>
>>
>>
>

Re: ExecuteScript once at workflow inception

Posted by James McMahon <js...@gmail.com>.
Thank you Bryan. So would the Controller Service serve as an interface
through which I direct log messages to a log file that I stipulate? Similar
to how we can set up different SSL Context Services that relate to
different cert authorities? If so, then that would help.

Let me describe my requirement, and see if you think a Controller Service
is suitable. My challenge right now:
I have python scripts. They build JSON objects that I save as flowfile
content using a PyStreamCallback. I wish to output these results to log file
logs/A.log from ExecuteScript instance A, logs/B.log from ExecuteScript
instance B, etc.
Evidently in Python you need to set up loggers and file handles to those
loggers only once. If I embed that setup in my python script, it will run
every time a flowfile is processed by the ExecuteScript instance.
Would the Controller Service establish those one-time loggers and one-time
file handles for us, which I could then reference in my python scripts in
ExecuteScript A and B?
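
For reference, the shape of what those scripts do today is roughly this (a
trimmed sketch - the real JSON-building logic is elided and the object shown
is a placeholder):

import json
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
    def process(self, inputStream, outputStream):
        # build the JSON object and write it out as the new flowfile content
        obj = {"example": "value"}
        outputStream.write(bytearray(json.dumps(obj).encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback())
    session.transfer(flowFile, REL_SUCCESS)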

If this is what you envision, then it would be something of interest.  -Jim

On Tue, Mar 28, 2017 at 10:56 AM, Bryan Rosander <br...@apache.org>
wrote:

> Hey James,
>
> I wonder if you'd be better suited with a Controller Service that could
> provide access to configured loggers, etc.
>
> It looks like ExecuteScript can look up Controller Services [1].  A
> script-based Controller Service implementation (so you could use python or
> another scripting language there as well) seems like it might be a useful
> feature.  If there's interest, I could write up a Jira for it.
>
> Thanks,
> Bryan
>
> [1] http://funnifi.blogspot.com/2016/04/sql-in-nifi-with-executescript.html
>
> On Tue, Mar 28, 2017 at 10:48 AM, James McMahon <js...@gmail.com>
> wrote:
>
>> Hello. I am interested in calling a python script from ExecuteScript that
>> sets up Python loggers and establishes file handles to those loggers for
>> use by other python scripts called later in the workflow by other
>> ExecuteScript processors. Is there a means to execute a script at workflow
>> inception - once only, not once per flowfile? I have found some retry count
>> examples in the open source literature, but those seem to enforce counts at
>> the flowfile level. In other words, the counter resets to zero for each
>> flowfile. Thank you for any insights. -Jim
>>
>
>

Re: ExecuteScript once at workflow inception

Posted by Bryan Rosander <br...@apache.org>.
Hey James,

I wonder if you'd be better suited with a Controller Service that could
provide access to configured loggers, etc.

It looks like ExecuteScript can look up Controller Services [1].  A
script-based Controller Service implementation (so you could use python or
another scripting language there as well) seems like it might be a useful
feature.  If there's interest, I could write up a Jira for it.

Thanks,
Bryan

[1] http://funnifi.blogspot.com/2016/04/sql-in-nifi-with-executescript.html
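
The lookup pattern from [1], translated to Jython, is roughly the following
(the dynamic property name "logger-service-id" and the service's log()
method are placeholders, since the service itself would still need to be
written):

# assumes a dynamic property "logger-service-id" on the ExecuteScript
# processor holds the Controller Service's identifier
serviceId = context.getProperty("logger-service-id").getValue()
loggingService = context.getControllerServiceLookup().getControllerService(serviceId)
# log() here is a hypothetical method on the custom service interface
loggingService.log("A.log", "a line written through the shared service")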

On Tue, Mar 28, 2017 at 10:48 AM, James McMahon <js...@gmail.com>
wrote:

> Hello. I am interested in calling a python script from ExecuteScript that
> sets up Python loggers and establishes file handles to those loggers for
> use by other python scripts called later in the workflow by other
> ExecuteScript processors. Is there a means to execute a script at workflow
> inception - once only, not once per flowfile? I have found some retry count
> examples in the open source literature, but those seem to enforce counts at
> the flowfile level. In other words, the counter resets to zero for each
> flowfile. Thank you for any insights. -Jim
>

Re: ExecuteScript once at workflow inception

Posted by James McMahon <js...@gmail.com>.
Matt, I am adapting a model I found in a reply at Hortonworks for using
Python from InvokeScriptedProcessor:

from org.apache.nifi.processor import Processor, Relationship
from org.python.core import PySet

class PythonProcessor(Processor):

    def __init__(self):
        self.REL_SUCCESS = (Relationship.Builder()
            .name("success")
            .description("FlowFiles that were successfully processed")
            .build())
        self.REL_FAILURE = (Relationship.Builder()
            .name("failure")
            .description("FlowFiles that failed to be processed")
            .build())
        self.REL_UNMATCH = (Relationship.Builder()
            .name("unmatch")
            .description("FlowFiles that did not match rules")
            .build())
        self.log = None

    def initialize(self, context):
        self.log = context.getLogger()

    def onTrigger(self, context, sessionFactory):
        # required by the Processor interface; the per-flowfile work from
        # the old ExecuteScript body goes here
        session = sessionFactory.createSession()
        try:
            # ... process flowfiles ...
            session.commit()
        except:
            session.rollback(True)
            raise

    def getRelationships(self):
        return PySet([self.REL_SUCCESS, self.REL_FAILURE, self.REL_UNMATCH])

    def validate(self, context):
        return None

    def getPropertyDescriptor(self, name):
        return None

    def getPropertyDescriptors(self):
        # an empty list is safer than None if the framework iterates it
        return []

    def onPropertyModified(self, descriptor, oldValue, newValue):
        pass

    def getIdentifier(self):
        return None

processor = PythonProcessor()

This template is my starting point, and I am attempting to bring my python
code from my ExecuteScript into this model. In the initialize() method I
intend to establish my logger and my handler - logging constructs which I
am given to understand should be done one time and one time only. Something
like this:



import logging

LOG_FILENAME = '/home/nifi/latest/logs/LogFile1.log'
FORMAT = '%(asctime)-15s %(message)s'

a = logging.getLogger("a")
a.setLevel(logging.INFO)
formatter = logging.Formatter(FORMAT)
handler = logging.FileHandler(LOG_FILENAME)
handler.setFormatter(formatter)
a.addHandler(handler)
# note: calling logging.basicConfig(filename=LOG_FILENAME, ...) on top of
# this would attach a second handler via the root logger and write each
# record to the file twice, so I have dropped it here
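
My understanding is that later code running in the same script engine can
then fetch that same logger by name, so the per-flowfile calls reduce to
something like:

import logging
logging.getLogger("a").info("one line appended to LogFile1.log")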

You mention that "One caveat is that the Processor interface does not
provide a "stop" or "shutdown" method, so you will need to make sure that
any created objects (connections, clients, e.g.) will be cleaned up
gracefully when the Processor object is garbage-collected. This is not
always easy to do, and the alternative is to write a full custom processor."

I assume by this you mean that when this InvokeScriptedProcessor gets
stopped, I want to have a method that executes - only on exit - that closes
the file handle and shuts down the logger instance. How do I do this within
the design template I'm working with for InvokeScriptedProcessor?
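
One idea I am considering, given the absence of a stop hook: tear down
whatever handlers a previous start attached, at the top of initialize(), so
a stop/start cycle at least doesn't leak file handles or stack duplicate
handlers. A rough sketch (the logger name matches my snippet above):

import logging

def close_logger(name):
    # close and detach any handlers left over from a previous start, so
    # re-running initialize() doesn't add duplicate FileHandlers
    logger = logging.getLogger(name)
    for h in list(logger.handlers):
        h.close()
        logger.removeHandler(h)

# called at the top of PythonProcessor.initialize(), before the setup above:
# close_logger("a")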



Thank you again for your help. -Jim



On Tue, Mar 28, 2017 at 11:00 AM, Matt Burgess <ma...@apache.org> wrote:

> Jim,
>
> You can use InvokeScriptedProcessor [1] rather than ExecuteScript for
> this. ExecuteScript basically lets you provide an onTrigger() body,
> which is called when the ExecuteScript processor "has work to do".
> None of the other lifecycle methods are available.  For
> InvokeScriptedProcessor, you actually script up a subclass of
> Processor [2], and it will have its initialize() method called by
> InvokeScriptedProcessor when it is scheduled to run (once per
> "start"). If you stop and start InvokeScriptedProcessor, or change a
> property, the scripted initialize() method will be called again.
>
> One caveat is that the Processor interface does not provide a "stop"
> or "shutdown" method, so you will need to make sure that any created
> objects (connections, clients, e.g.) will be cleaned up gracefully
> when the Processor object is garbage-collected. This is not always
> easy to do, and the alternative is to write a full custom processor.
> There is an open Jira [3] to invoke annotated lifecycle methods such
> as @OnStopped on the scripted Processor instance.
>
> I have a simple example (albeit in Groovy) [4], but the same approach
> you're likely using for Jython should apply there too. Please let me
> know if you have any questions or issues in setting that up.
>
> Regards,
> Matt
>
> [1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.InvokeScriptedProcessor/index.html
> [2] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/processor/Processor.java
> [3] https://issues.apache.org/jira/browse/NIFI-2215
> [4] http://funnifi.blogspot.com/2016/02/writing-reusable-scripted-processors-in.html
>
> On Tue, Mar 28, 2017 at 10:48 AM, James McMahon <js...@gmail.com>
> wrote:
> > Hello. I am interested in calling a python script from ExecuteScript that
> > sets up Python loggers and establishes file handles to those loggers for
> > use by other python scripts called later in the workflow by other
> > ExecuteScript processors. Is there a means to execute a script at workflow
> > inception - once only, not once per flowfile? I have found some retry count
> > examples in the open source literature, but those seem to enforce counts at
> > the flowfile level. In other words, the counter resets to zero for each
> > flowfile. Thank you for any insights. -Jim
>

Re: ExecuteScript once at workflow inception

Posted by James McMahon <js...@gmail.com>.
Thank you Matt. I am not sure I fully understand how to do this in Python
yet, but am going to try and look closely at your example and see if I can
get something working. -Jim

On Tue, Mar 28, 2017 at 11:00 AM, Matt Burgess <ma...@apache.org> wrote:

> Jim,
>
> You can use InvokeScriptedProcessor [1] rather than ExecuteScript for
> this. ExecuteScript basically lets you provide an onTrigger() body,
> which is called when the ExecuteScript processor "has work to do".
> None of the other lifecycle methods are available.  For
> InvokeScriptedProcessor, you actually script up a subclass of
> Processor [2], and it will have its initialize() method called by
> InvokeScriptedProcessor when it is scheduled to run (once per
> "start"). If you stop and start InvokeScriptedProcessor, or change a
> property, the scripted initialize() method will be called again.
>
> One caveat is that the Processor interface does not provide a "stop"
> or "shutdown" method, so you will need to make sure that any created
> objects (connections, clients, e.g.) will be cleaned up gracefully
> when the Processor object is garbage-collected. This is not always
> easy to do, and the alternative is to write a full custom processor.
> There is an open Jira [3] to invoke annotated lifecycle methods such
> as @OnStopped on the scripted Processor instance.
>
> I have a simple example (albeit in Groovy) [4], but the same approach
> you're likely using for Jython should apply there too. Please let me
> know if you have any questions or issues in setting that up.
>
> Regards,
> Matt
>
> [1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.InvokeScriptedProcessor/index.html
> [2] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/processor/Processor.java
> [3] https://issues.apache.org/jira/browse/NIFI-2215
> [4] http://funnifi.blogspot.com/2016/02/writing-reusable-scripted-processors-in.html
>
> On Tue, Mar 28, 2017 at 10:48 AM, James McMahon <js...@gmail.com>
> wrote:
> > Hello. I am interested in calling a python script from ExecuteScript that
> > sets up Python loggers and establishes file handles to those loggers for
> > use by other python scripts called later in the workflow by other
> > ExecuteScript processors. Is there a means to execute a script at workflow
> > inception - once only, not once per flowfile? I have found some retry count
> > examples in the open source literature, but those seem to enforce counts at
> > the flowfile level. In other words, the counter resets to zero for each
> > flowfile. Thank you for any insights. -Jim
>

Re: ExecuteScript once at workflow inception

Posted by Matt Burgess <ma...@apache.org>.
Jim,

You can use InvokeScriptedProcessor [1] rather than ExecuteScript for
this. ExecuteScript basically lets you provide an onTrigger() body,
which is called when the ExecuteScript processor "has work to do".
None of the other lifecycle methods are available.  For
InvokeScriptedProcessor, you actually script up a subclass of
Processor [2], and it will have its initialize() method called by
InvokeScriptedProcessor when it is scheduled to run (once per
"start"). If you stop and start InvokeScriptedProcessor, or change a
property, the scripted initialize() method will be called again.

One caveat is that the Processor interface does not provide a "stop"
or "shutdown" method, so you will need to make sure that any created
objects (connections, clients, e.g.) will be cleaned up gracefully
when the Processor object is garbage-collected. This is not always
easy to do, and the alternative is to write a full custom processor.
There is an open Jira [3] to invoke annotated lifecycle methods such
as @OnStopped on the scripted Processor instance.

I have a simple example (albeit in Groovy) [4], but the same approach
you're likely using for Jython should apply there too. Please let me
know if you have any questions or issues in setting that up.

Regards,
Matt

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.InvokeScriptedProcessor/index.html
[2] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/processor/Processor.java
[3] https://issues.apache.org/jira/browse/NIFI-2215
[4] http://funnifi.blogspot.com/2016/02/writing-reusable-scripted-processors-in.html
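
To make the contrast concrete: in ExecuteScript the entire script body is
effectively the onTrigger() implementation, so any setup in it repeats on
every trigger. A sketch of that (the log path is a placeholder):

# ExecuteScript (Jython): this whole body re-runs each time the processor
# is triggered, so this stacks another FileHandler on every single trigger
import logging
logging.getLogger("a").addHandler(logging.FileHandler('/path/to/some.log'))

flowFile = session.get()   # 'session' and the relationships are pre-bound
if flowFile is not None:
    session.transfer(flowFile, REL_SUCCESS)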

On Tue, Mar 28, 2017 at 10:48 AM, James McMahon <js...@gmail.com> wrote:
> Hello. I am interested in calling a python script from ExecuteScript that
> sets up Python loggers and establishes file handles to those loggers for use
> by other python scripts called later in the workflow by other ExecuteScript
> processors. Is there a means to execute a script at workflow inception -
> once only, not once per flowfile? I have found some retry count examples in
> the open source literature, but those seem to enforce counts at the flowfile
> level. In other words, the counter resets to zero for each flowfile.
> Thank you for any insights. -Jim