Posted to users@nifi.apache.org by Eric Secules <es...@gmail.com> on 2020/05/04 21:47:09 UTC

Is provenance data preserved when processors are deleted?

Hello everyone,

If I am upgrading a process group to the latest version, do you know
whether provenance is preserved for processors that may get deleted in the
upgrade?
I have noticed that if I delete my process group and redownload it from the
registry, I am no longer able to see the provenance data from flowfiles
that went through the first process group.

What is the best way to view and archive provenance data for older versions
of flows? For background, I am running NiFi in a Docker container.
I think I might have to archive the currently running container and bring
the new version up in a new container.

Thanks,
Eric

Re: Is provenance data preserved when processors are deleted?

Posted by Andy LoPresto <al...@apache.org>.
Eric,

The provenance exported via the reporting task does not contain the flowfile content. 

NiFi wasn’t designed as a long-term store for content or provenance data, but given appropriate resources, you can certainly increase the retention policies significantly. This is not an endorsement, but there are other metadata storage systems, like Apache Atlas [1], which you may want to look at for longer retention and for some of the features you’re looking for, like a UI for lineage graphs.

[1] https://atlas.apache.org/#/
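
As a rough illustration of raising the retention policies mentioned above, the sketch below rewrites retention-related keys in the text of a nifi.properties file. The helper name set_properties and all values shown are assumptions chosen for illustration, not recommendations; check the property names against the admin guide for your NiFi version and size the limits to your disk.

```python
def set_properties(text, overrides):
    """Return properties-file text with the given keys overridden.

    Hypothetical helper: keys already present are rewritten in place,
    keys that are missing are appended at the end.
    """
    remaining = dict(overrides)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        # Only touch real key=value lines, never comments or blanks.
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                line = f"{key}={remaining.pop(key)}"
        out.append(line)
    # Append any keys that were not found in the original text.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Illustrative starting values, not necessarily your installation's defaults.
original = (
    "# provenance\n"
    "nifi.provenance.repository.max.storage.time=24 hours\n"
    "nifi.provenance.repository.max.storage.size=1 GB\n"
)

# Illustrative targets only; pick values that fit your hardware.
overrides = {
    "nifi.provenance.repository.max.storage.time": "365 days",
    "nifi.provenance.repository.max.storage.size": "500 GB",
    "nifi.content.repository.archive.max.retention.period": "30 days",
}

updated = set_properties(original, overrides)
print(updated)
```

In a Docker setup the edited file would still need to be mounted or baked into the image, and NiFi restarted, for the new limits to take effect.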


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
He/Him
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On May 5, 2020, at 11:39 PM, Eric Secules <es...@gmail.com> wrote:
> 
> Thanks Mike, 
> 
> So does the content get sent over the wire too, or just a content URI? I see that the content gets aged out according to the nifi.content.repository properties. Given that the default retention periods are so short, would NiFi crumble on a long-running system if the retention period is years and the available disk space is huge? Shipping the provenance info off to MongoDB or something similar isn't as attractive because we lose the provenance web UI and the ability to view the configuration of a processor that a flowfile went through.
> 
> Thanks,
> Eric


Re: Is provenance data preserved when processors are deleted?

Posted by Eric Secules <es...@gmail.com>.
Thanks Mike,

So does the content get sent over the wire too, or just a content URI? I
see that the content gets aged out according to the nifi.content.repository
properties. Given that the default retention periods are so short, would
NiFi crumble on a long-running system if the retention period is years and
the available disk space is huge? Shipping the provenance info off to
MongoDB or something similar isn't as attractive because we lose the
provenance web UI and the ability to view the configuration of a processor
that a flowfile went through.

Thanks,
Eric

On Mon, May 4, 2020 at 4:13 PM Mike Thomsen <mi...@gmail.com> wrote:

> It copies all of the provenance data, and no, there's no way yet to back
> the provenance repository with one of those NoSQL databases,
> unfortunately.

Re: Is provenance data preserved when processors are deleted?

Posted by Mike Thomsen <mi...@gmail.com>.
It copies all of the provenance data, and no, there's no way yet to back
the provenance repository with one of those NoSQL databases,
unfortunately.

On Mon, May 4, 2020 at 6:40 PM Eric Secules <es...@gmail.com> wrote:

> What information is transmitted by SiteToSiteProvenanceReporting? Is it
> the content, the attributes, and the path the flowfile takes through the
> system? Is there any way to connect the provenance view in NiFi to the
> NoSQL database instead of the internal provenance storage?

Re: Is provenance data preserved when processors are deleted?

Posted by Eric Secules <es...@gmail.com>.
What information is transmitted by SiteToSiteProvenanceReporting? Is it the
content, the attributes, and the path the flowfile takes through the
system? Is there any way to connect the provenance view in NiFi to the
NoSQL database instead of the internal provenance storage?

On Mon, May 4, 2020 at 3:07 PM Mike Thomsen <mi...@gmail.com> wrote:

> One way to do it would be to set up a SiteToSiteProvenanceReporting task
> and have it send the data to another NiFi instance. That instance can post
> all of the provenance data into a NoSQL database like Mongo or
> Elasticsearch very quickly.

Re: Is provenance data preserved when processors are deleted?

Posted by Mike Thomsen <mi...@gmail.com>.
One way to do it would be to set up a SiteToSiteProvenanceReporting task
and have it send the data to another NiFi instance. That instance can post
all of the provenance data into a NoSQL database like Mongo or
Elasticsearch very quickly.
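
To make the last step more concrete, here is a minimal sketch of what the receiving side could do with the exported events before posting them to Elasticsearch. It assumes the events arrive as a JSON array of objects and that each carries fields such as eventId, componentId, and componentName; those field names, the index name nifi-provenance, and the helper to_bulk_ndjson are assumptions for illustration, so verify them against the output of your own reporting task. The sketch only builds the body for the Elasticsearch _bulk endpoint and does not send it anywhere.

```python
import json


def to_bulk_ndjson(events, index="nifi-provenance"):
    """Turn a list of provenance event dicts into an Elasticsearch _bulk body.

    Hypothetical helper: each event becomes an action line followed by
    the event document itself, one JSON object per line.
    """
    lines = []
    for event in events:
        action = {"index": {"_index": index}}
        # Reuse the event id (assumed field name) as the document id so
        # that re-sending the same event overwrites instead of duplicating.
        event_id = event.get("eventId")
        if event_id is not None:
            action["index"]["_id"] = str(event_id)
        lines.append(json.dumps(action, sort_keys=True))
        lines.append(json.dumps(event, sort_keys=True))
    # The bulk format requires a trailing newline after the last line.
    return ("\n".join(lines) + "\n") if lines else ""


# Made-up sample events; the field names are assumed, not confirmed.
sample_events = [
    {"eventId": "e-1", "eventType": "RECEIVE",
     "componentId": "abc", "componentName": "GetFile"},
    {"eventId": "e-2", "eventType": "DROP",
     "componentId": "def", "componentName": "PutFile"},
]

bulk_body = to_bulk_ndjson(sample_events)
print(bulk_body)
```

Because each stored document carries the component's id and name as they were when the event was recorded, the external copy remains searchable after the processor itself has been deleted from the flow.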

On Mon, May 4, 2020 at 5:47 PM Eric Secules <es...@gmail.com> wrote:

> Hello everyone,
>
> If I am upgrading a process group to the latest version, do you know
> whether provenance is preserved for processors that may get deleted in the
> upgrade?
> I have noticed that if I delete my process group and redownload it from
> the registry, I am no longer able to see the provenance data from flowfiles
> that went through the first process group.
>
> What is the best way to view and archive provenance data for older
> versions of flows? For background, I am running NiFi in a Docker container.
> I think I might have to archive the currently running container and bring
> the new version up in a new container.
>
> Thanks,
> Eric
>