Posted to dev@nifi.apache.org by "Gresock, Joseph" <jo...@lmco.com> on 2016/07/06 13:02:09 UTC

ReportingTask provenance question

Hi folks,

When developing a ReportingTask, I see that I can call reportingContext.getEventAccess().getProvenanceRepository().  Will this repository contain only provenance events created since the last time my ReportingTask's onTrigger() was fired, or does it contain the entire provenance repository to date?

I'd like to develop a reporting task for metric purposes, and my hope is that I can simply grab all the latest provenance events each time the reporting task triggers.

Thanks,

Joe Gresock
Lockheed Martin Software Engineer Stf
443-294-2661
joseph.gresock@lmco.com

Re: ReportingTask provenance question

Posted by Mark Payne <ma...@hotmail.com>.
Hi Joe,

This provides access to the entire Provenance Repository. Typically, what you would do is use the State Management feature
(ReportingContext.getStateManager()) to store LOCAL-scoped state that includes the ID of the last provenance event pulled.
You can then get a new batch of Provenance Events by calling ProvenanceEventRepository.getEvents(long startId, int maxResults).

Once you've processed those events, you can store the state via the state manager. Then, on each invocation, just use getStateManager().getState()
to determine the last ID processed and go from there.
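
A minimal sketch of the cursor pattern described above, not code from NiFi itself. To keep it runnable on its own, it uses small stand-in types: EventSource stands in for the repository's getEvents(long startId, int maxResults) call, and a plain Map stands in for the LOCAL-scoped state held by the state manager. The key name "last.event.id", the helper inMemory(), and the assumption that event IDs start at 0 and increase by one are all illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ProvenanceCursorSketch {

    // Hypothetical state key holding the ID of the last event processed.
    static final String LAST_EVENT_ID_KEY = "last.event.id";

    /** Stand-in for a provenance event record; only the ID matters here. */
    static final class Event {
        final long id;
        Event(long id) { this.id = id; }
    }

    /** Stand-in for the repository call getEvents(long startId, int maxResults). */
    interface EventSource {
        List<Event> getEvents(long startId, int maxResults);
    }

    /**
     * One trigger: read the saved cursor, fetch the next batch, process it,
     * then save the new cursor. Returns the events handled in this call.
     */
    static List<Event> onTrigger(EventSource repo, Map<String, String> state, int batchSize) {
        // No saved state means nothing has been processed yet (assumes IDs start at 0).
        String saved = state.get(LAST_EVENT_ID_KEY);
        long startId = (saved == null) ? 0L : Long.parseLong(saved) + 1;

        List<Event> batch = repo.getEvents(startId, batchSize);
        if (batch.isEmpty()) {
            return batch; // nothing new; leave the cursor untouched
        }

        // ... compute metrics from the batch here ...

        // Store the cursor only after the batch has been processed.
        long lastId = batch.get(batch.size() - 1).id;
        state.put(LAST_EVENT_ID_KEY, Long.toString(lastId));
        return batch;
    }

    /** Illustrative in-memory repository holding events with IDs 0..count-1. */
    static EventSource inMemory(int count) {
        List<Event> all = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            all.add(new Event(i));
        }
        return (startId, maxResults) -> {
            List<Event> out = new ArrayList<>();
            for (Event e : all) {
                if (e.id >= startId && out.size() < maxResults) {
                    out.add(e);
                }
            }
            return out;
        };
    }

    public static void main(String[] args) {
        EventSource repo = inMemory(5);
        Map<String, String> state = new HashMap<>();
        // Each call picks up where the previous one stopped.
        for (int i = 0; i < 3; i++) {
            int handled = onTrigger(repo, state, 3).size();
            System.out.println("handled=" + handled + " state=" + state);
        }
    }
}
```

In a real reporting task the Map would be replaced by the state manager (reading the state at the start of onTrigger() and storing it at the end), and the stand-in source by the provenance repository obtained from the reporting context.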

Does this all make sense?

Thanks
-Mark




Re: EXTERNAL: Re: ReportingTask provenance question

Posted by Bryan Bende <bb...@gmail.com>.
I haven't looked into a good way to filter them out yet, but I suspect it
would involve the component IDs of the components that come after the
Input Port that receives the events.

The reporting task has a configurable batch size, which defaults to 1000. So
assuming you are only doing a couple of things after receiving the batch,
you would probably be producing 3-4 more provenance events per 1000.
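
The component-ID idea mentioned above could look roughly like the sketch below. This is only an illustration of the suggestion, not a confirmed or tested approach from the thread. The Event class is a stand-in that carries just an ID and a component ID (the real provenance event record exposes a component ID as well, as far as I know), and the method name dropCircular and the example component IDs are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class CircularEventFilterSketch {

    /** Stand-in for a provenance event: an event ID plus the ID of the component that emitted it. */
    static final class Event {
        final long id;
        final String componentId;
        Event(long id, String componentId) {
            this.id = id;
            this.componentId = componentId;
        }
    }

    /**
     * Drops events emitted by the components that handle the forwarded
     * provenance data (the receiving Input Port and whatever follows it),
     * so that those events are not reported again.
     */
    static List<Event> dropCircular(List<Event> batch, Set<String> receivingComponentIds) {
        return batch.stream()
                .filter(e -> !receivingComponentIds.contains(e.componentId))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical IDs of the Input Port and the processor after it.
        Set<String> receiving = new HashSet<>(Arrays.asList("input-port-1", "put-file-2"));

        List<Event> batch = new ArrayList<>();
        batch.add(new Event(10, "get-http-7"));   // ordinary flow event, kept
        batch.add(new Event(11, "input-port-1")); // circular, dropped
        batch.add(new Event(12, "put-file-2"));   // circular, dropped

        for (Event e : dropCircular(batch, receiving)) {
            System.out.println(e.id + " " + e.componentId);
        }
    }
}
```

The set of receiving component IDs would have to be supplied somehow, for example as a configuration property on the reporting task; that part is an assumption and is not covered here.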

On Wed, Jul 6, 2016 at 9:16 AM, Gresock, Joseph <jo...@lmco.com>
wrote:

> That's awesome, I'll just wait for that site-to-site provenance reporting
> task, then.
>
> Have you guys figured out a good way to identify those circular provenance
> events?  I will likely have to use the same cluster for the site-to-site
> endpoint.
>
> Joe Gresock
> Lockheed Martin Software Engineer Stf
> 443-294-2661
> joseph.gresock@lmco.com
>
> ________________________________________
> From: Bryan Bende [bbende@gmail.com]
> Sent: Wednesday, July 06, 2016 9:08 AM
> To: dev@nifi.apache.org
> Subject: EXTERNAL: Re: ReportingTask provenance question
>
> Joe,
>
> You will have to keep track of the last provenance event id that you
> queried in order to query for new events.
>
> In 0.7.0 we added a site-to-site provenance reporting task [1], which may
> take care of what you need, or at least serve as an example to base your
> custom reporting task on.
>
> We went this route because, rather than having a whole bunch of
> custom reporting tasks to send provenance data to different places, we may
> as well make use of NiFi's existing processors.
> So you can have a separate NiFi instance that just receives provenance
> events over site-to-site and uses processors to send them wherever, or even
> send them site-to-site back to the same instance, although this produces a
> few more circular provenance events.
>
> -Bryan
>
> [1]
>
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-site-to-site-reporting-bundle/nifi-site-to-site-reporting-task/src/main/java/org/apache/nifi/reporting/SiteToSiteProvenanceReportingTask.java#L117
>

RE: EXTERNAL: Re: ReportingTask provenance question

Posted by "Gresock, Joseph" <jo...@lmco.com>.
That's awesome, I'll just wait for that site-to-site provenance reporting task, then.

Have you guys figured out a good way to identify those circular provenance events?  I will likely have to use the same cluster for the site-to-site endpoint.

Joe Gresock
Lockheed Martin Software Engineer Stf
443-294-2661
joseph.gresock@lmco.com


Re: ReportingTask provenance question

Posted by Bryan Bende <bb...@gmail.com>.
Joe,

You will have to keep track of the last provenance event id that you
queried in order to query for new events.

In 0.7.0 we added a site-to-site provenance reporting task [1], which may
take care of what you need, or at least serve as an example to base your
custom reporting task on.

We went this route because, rather than having a whole bunch of
custom reporting tasks to send provenance data to different places, we may
as well make use of NiFi's existing processors.
So you can have a separate NiFi instance that just receives provenance
events over site-to-site and uses processors to send them wherever, or even
send them site-to-site back to the same instance, although this produces a
few more circular provenance events.

-Bryan

[1]
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-site-to-site-reporting-bundle/nifi-site-to-site-reporting-task/src/main/java/org/apache/nifi/reporting/SiteToSiteProvenanceReportingTask.java#L117
