
Posted to users@nifi.apache.org by "Crystal Deschamps Fogdall (cdeschampsfo)" <cd...@micron.com> on 2016/09/12 20:39:48 UTC

ExecuteSQL processor running in duplicate?

I'm using NiFi in a very simple, yet non-standard way: as a POC scheduler that calls an existing stored procedure through an ExecuteSQL processor. The processor is timer driven (every 4 hours), uses a basic JDBC connection, and executes a stored procedure call that contains one hard-coded variable. I have failure and success relationships defined only to collect FlowFiles for troubleshooting this POC.

The processor had been running successfully for a day and a half. Based on the NiFi status history, I could see a single FlowFile out every 4 hours. My stored procedure takes the last known copy of the target table, renames it, and builds a new table that has a load timestamp. The timestamps in both tables matched what I saw in the history.

After monkeying with the concurrent tasks setting (originally 1; I moved it to 4 just to see what would happen), I suddenly saw two FlowFiles appear every 4 hours in the history. Based on the timestamps in my tables, it appeared that the processor would run once, kick off again a few minutes later (the new table's timestamp would be a few minutes later than the last known copy's timestamp), and then lie dormant until 4 hours had passed.

Because I don't have data inputs/outputs, troubleshooting the issue is really difficult, and I'm finding the out-of-the-box monitoring tools (e.g. NiFi summary tabs, status history, etc.) lacking. I'm wondering if anyone has seen this kind of behavior and, if so, what caused it. (Surely it can't be the concurrent tasks, could it?!) Secondly, I'm wondering if anyone has built their own tools, or has external resources they can point to, for troubleshooting and monitoring process groups. My background is in traditional ETL, so I'm looking for scheduler-style application reporting like ActiveBatch.

Thanks, in advance, for your help,
Crystal

RE: ExecuteSQL processor running in duplicate?

Posted by "Crystal Deschamps Fogdall (cdeschampsfo)" <cd...@micron.com>.

Thanks for all of your help, Lee!

From: Lee Laim [mailto:lee.laim@gmail.com]
Sent: Tuesday, September 13, 2016 1:08 PM
To: users@nifi.apache.org
Subject: Re: ExecuteSQL processor running in duplicate?

Hi Crystal,

While this may not be the root cause in your specific case, I suspect that the sequence of
stopping the processor -> changing the concurrent tasks setting -> restarting the timer-driven processor

generated the rogue FlowFile in the queue. Timer-driven processors are triggered on start, regardless of when the previous trigger occurred. I've observed similar events in my dataflows and have switched over to CRON-driven scheduling to start most processes.
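To make the restart effect concrete, here is a minimal Python model (not NiFi code) of the two scheduling strategies. It assumes a timer-driven processor fires immediately on every start and then every 4 hours, while a cron-style schedule fires only at wall-clock times whose hour is a multiple of 4. That is what a Quartz-style expression such as "0 0 0/4 * * ?" is meant to express in NiFi's CRON-driven strategy, but check any expression against the NiFi User Guide before relying on it. The function names are hypothetical, and the model only illustrates the single extra run caused by a stop/start; it does not explain a pair of runs recurring every 4 hours.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(hours=4)


def timer_driven_runs(start, stop, restarts=()):
    """Timer-driven model (assumed): fire immediately on every (re)start,
    then every INTERVAL until the next restart or until `stop`."""
    starts = sorted([start, *restarts])
    runs = []
    for i, segment_start in enumerate(starts):
        segment_end = starts[i + 1] if i + 1 < len(starts) else stop
        t = segment_start
        while t < segment_end:
            runs.append(t)
            t += INTERVAL
    return runs


def cron_driven_runs(start, stop):
    """Cron-style model (assumed): fire only at the top of hours divisible
    by 4, regardless of when (or how often) the processor was started."""
    t = start.replace(minute=0, second=0, microsecond=0)
    if t < start:
        t += timedelta(hours=1)  # first whole hour at or after start
    runs = []
    while t < stop:
        if t.hour % 4 == 0:
            runs.append(t)
        t += timedelta(hours=1)
    return runs


if __name__ == "__main__":
    start = datetime(2016, 9, 12, 8, 0)
    stop = datetime(2016, 9, 12, 21, 0)
    restart = datetime(2016, 9, 12, 12, 5)  # stop, edit concurrent tasks, start
    print("timer:", [t.strftime("%H:%M") for t in timer_driven_runs(start, stop, [restart])])
    print("cron: ", [t.strftime("%H:%M") for t in cron_driven_runs(start, stop)])
```

With a restart at 12:05, the timer model produces runs at 08:00, 12:00, 12:05, 16:05 and 20:05: an extra run five minutes after the scheduled one, followed by a shifted schedule. The cron model stays at 08:00, 12:00, 16:00 and 20:00 because it ignores when the processor was started.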

For troubleshooting and one-time backfill operations on a cron-initiated process, I also include a timer-driven processor that can initiate the flow. The timer-driven version is always off until it is required: I start it, then immediately stop it. (Its run schedule is set to 100 seconds to prevent multiple runs between the start click and the stop click.) This provides predictable cron-based scheduling plus convenient ad hoc backfills that do not require reconfiguring the cron expression.
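The reason for the 100-second run schedule can be checked with a little arithmetic. The sketch below is a simplified model, not NiFi's scheduler: it assumes the processor fires once on start and then once every run-schedule interval, and it counts a trigger that lands exactly on the stop click. The function name is hypothetical.

```python
def triggers_while_running(seconds_running, run_schedule_s):
    """Count triggers for a timer-driven processor that is started and then
    stopped `seconds_running` seconds later.

    Simplified model (assumed): one trigger at t = 0 on start, then one every
    `run_schedule_s` seconds, including a trigger exactly at the stop click.
    """
    if seconds_running < 0 or run_schedule_s <= 0:
        raise ValueError("need seconds_running >= 0 and run_schedule_s > 0")
    # 1 for the trigger on start, plus one per full interval that elapses.
    return 1 + int(seconds_running // run_schedule_s)


if __name__ == "__main__":
    # A slow start-click/stop-click of about 15 seconds:
    for schedule in (1, 10, 100):
        print(schedule, "s schedule ->", triggers_while_running(15, schedule), "run(s)")
```

Under this model a 15-second gap between clicks gives 16 runs with a 1-second schedule, 2 runs with a 10-second schedule, and exactly 1 run with a 100-second schedule; any stop click within 100 seconds of the start yields a single backfill run.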

For your second question on monitoring, I've had success with the MonitorActivity processor paired with a PutEmail processor to signal that something is awry. Depending on how much detail you require and how you define failure, you can branch data into a process that checks its integrity and notifies you when something about the data is not correct.
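As one example of such an integrity check for a scheduled job like yours, the sketch below scans a list of run timestamps (for instance, the load timestamps read back from the tables) and flags runs that are closer together than expected, like the duplicate runs described above, as well as gaps that are longer than expected, which is similar in spirit to MonitorActivity's inactivity alert. The function name, the tolerance parameter, and the idea of reading timestamps from the tables are assumptions for illustration; wiring the result to an email notification (e.g. through PutEmail in the flow) is left out.

```python
from datetime import datetime, timedelta


def check_run_history(run_times, expected, tolerance):
    """Return (duplicates, gaps) found in a list of run timestamps.

    duplicates: consecutive runs closer together than expected - tolerance
    gaps:       consecutive runs further apart than expected + tolerance
    """
    ordered = sorted(run_times)
    duplicates, gaps = [], []
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur - prev
        if delta < expected - tolerance:
            duplicates.append((prev, cur))
        elif delta > expected + tolerance:
            gaps.append((prev, cur))
    return duplicates, gaps


if __name__ == "__main__":
    runs = [
        datetime(2016, 9, 12, 8, 0),
        datetime(2016, 9, 12, 12, 0),
        datetime(2016, 9, 12, 12, 5),   # a second run a few minutes later
        datetime(2016, 9, 12, 16, 5),
        datetime(2016, 9, 13, 4, 5),    # 12 hours with no run
    ]
    dups, gaps = check_run_history(runs, timedelta(hours=4), timedelta(minutes=15))
    print("duplicate runs:", dups)
    print("missed runs:", gaps)
```

The tolerance keeps normal scheduling jitter from being reported; with a 4-hour schedule and a 15-minute tolerance, the 12:00/12:05 pair is reported as a duplicate and the 16:05 to 04:05 stretch as a gap.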

Here are some references that show different approaches to error-capture that have been useful to me:

  *   http://stackoverflow.com/questions/37430915/how-to-capture-bulletin-messages-in-apache-nifi
  *   http://stackoverflow.com/questions/36869043/get-failure-reason-in-apache-nifi
  *   https://kisstechdocs.wordpress.com/2015/01/15/creating-a-limited-failure-loop-in-nifi/

Thanks,
-Lee

