Posted to users@nifi.apache.org by Leon Yu <Le...@macrohealth.com> on 2020/12/08 23:32:27 UTC

NiFi Missing Provenance Data

Hello,

This is the first time I’ve emailed here so please bear with me.

We are using nifi-1.12.1-RC2, and in one of our tests NiFi’s data provenance is missing events even though the DB shows that the flow was processed.
In simplified terms, NiFi ingests a text file, creates a flow file from it, and writes some data to the DB.
The test ran two files through NiFi about 20 minutes apart.  The DB shows that both files were processed successfully, and nothing other than NiFi writes to the DB.  NiFi’s data provenance showed events only for the 1st file; there was no trace of the 2nd file even though its flow succeeded.

Thank you,


  Leon Yu
  SDET, Macrohealth



Re: NiFi Missing Provenance Data

Posted by Andrew Grande <ap...@gmail.com>.
Nice trick

On Thu, Dec 17, 2020, 1:55 PM Eric Secules <es...@gmail.com> wrote:

> Hi Mark,
>
> Thanks for the help! We set the rollover period to 10 minutes so it is
> easy for us to hit this. In addition to increasing the rollover time period
> we're going to include a lone GenerateFlowFile processor which runs every
> hour just to ensure that provenance events are in the log.
>
> Thanks,
> Eric

Re: NiFi Missing Provenance Data

Posted by Eric Secules <es...@gmail.com>.
Hi Mark,

Thanks for the help! We set the rollover period to 10 minutes so it is easy
for us to hit this. In addition to increasing the rollover time period
we're going to include a lone GenerateFlowFile processor which runs every
hour just to ensure that provenance events are in the log.

Thanks,
Eric

On Thu, Dec 17, 2020 at 1:03 PM Mark Payne <ma...@hotmail.com> wrote:

> Eric,
>
> Looks like you’re running into NIFI-7856 [1]. Should be addressed in the
> next release. In the interim, there’s a workaround that you can employ by
> updating the settings in your nifi.properties file.
>
> The issue occurs when the Provenance Event file rolls over after the
> configured amount of time but no Provenance Events have been written to it.
> So, if you know that you’ll go 5 days, for instance, without any events,
> just set the “nifi.provenance.repository.rollover.time” property to
> something longer than that, such as 10 days.
>
> Thanks
> -Mark
>
> [1] https://issues.apache.org/jira/browse/NIFI-7856

Re: NiFi Missing Provenance Data

Posted by Mark Payne <ma...@hotmail.com>.
Eric,

Looks like you’re running into NIFI-7856 [1]. It should be addressed in the next release. In the interim, there’s a workaround that you can employ by updating the settings in your nifi.properties file.

The issue occurs when the Provenance Event file rolls over after the configured amount of time but no Provenance Events have been written to it. So, if you know that you’ll go 5 days, for instance, without any events, just set the “nifi.provenance.repository.rollover.time” property to something longer than that, such as 10 days.
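
As a rough way to check whether a given setup is exposed to this, here is a minimal Python sketch. It is not part of NiFi: the helper names (parse_period, read_property, idle_rollover_possible) are hypothetical, and the list of time units is an assumed subset of what NiFi accepts. It reads the rollover property from the text of a nifi.properties file and compares it with the longest gap you expect between provenance events.

```python
import re

# Assumed subset of the time units NiFi accepts; not an exhaustive list.
UNIT_SECONDS = {
    "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
    "day": 86400, "days": 86400,
}

ROLLOVER_KEY = "nifi.provenance.repository.rollover.time"


def parse_period(text):
    """Convert a value such as '10 mins' to a number of seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if not match or match.group(2).lower() not in UNIT_SECONDS:
        raise ValueError(f"unrecognised time period: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2).lower()]


def read_property(properties_text, key):
    """Return the value of key from properties-style text, or None if absent."""
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip comments and anything that is not a key=value pair.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            return value.strip()
    return None


def idle_rollover_possible(properties_text, longest_idle_gap):
    """True if the rollover time is not longer than the longest expected gap."""
    value = read_property(properties_text, ROLLOVER_KEY)
    if value is None:
        raise KeyError(ROLLOVER_KEY)
    return parse_period(value) <= parse_period(longest_idle_gap)
```

With a 10-minute rollover and files arriving about 20 minutes apart, as described in this thread, idle_rollover_possible(text, "20 mins") returns True, which suggests raising the rollover time above the longest idle gap.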

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-7856


On Dec 17, 2020, at 3:50 PM, Eric Secules <es...@gmail.com> wrote:

Leon and I did some more digging and found this in both of our logs when we were experiencing the lack of provenance. I think the missing ".prov" file could be why those events are not visible.

We also grepped for the filename and component ID we were using in the provenance query and both searches came up with nothing. When provenance was working again I tried grepping for filenames and container IDs and I was successful.

Here are the logs we got when grepping for "grep -i provenance . -r | grep -v "Provenance Query" | grep -v "NiFi Web Server""

./nifi-app_2020-12-17_19.0.log:2020-12-17 19:06:11,959 INFO [Timer-Driven Process Thread-7] o.a.n.p.store.WriteAheadStorePartition Successfully rolled over Event Writer for Provenance Event Store Partition[directory=./provenance_repository] due to MAX_TIME_REACHED
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:16:12,497 INFO [Timer-Driven Process Thread-5] o.a.n.p.store.WriteAheadStorePartition Successfully rolled over Event Writer for Provenance Event Store Partition[directory=./provenance_repository] due to MAX_TIME_REACHED
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:26:12,919 INFO [Timer-Driven Process Thread-8] o.a.n.p.store.WriteAheadStorePartition Successfully rolled over Event Writer for Provenance Event Store Partition[directory=./provenance_repository] due to MAX_TIME_REACHED
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:26:12,923 ERROR [Compress Provenance Logs-1-thread-3] o.a.n.p.s.EventFileCompressor Failed to compress ./provenance_repository/395.prov on rollover
./nifi-app_2020-12-17_19.0.log:java.io.FileNotFoundException: ./provenance_repository/395.prov (No such file or directory)
./nifi-app_2020-12-17_19.0.log: at org.apache.nifi.provenance.serialization.EventFileCompressor.compress(EventFileCompressor.java:164)
./nifi-app_2020-12-17_19.0.log: at org.apache.nifi.provenance.serialization.EventFileCompressor.run(EventFileCompressor.java:115)
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:26:12,923 WARN [Compress Provenance Logs-1-thread-3] o.a.n.p.s.EventFileCompressor Failed to delete ./provenance_repository/395.prov; this file should be cleaned up manually
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:36:13,781 INFO [Timer-Driven Process Thread-2] o.a.n.p.store.WriteAheadStorePartition Successfully rolled over Event Writer for Provenance Event Store Partition[directory=./provenance_repository] due to MAX_TIME_REACHED
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:46:13,876 INFO [Timer-Driven Process Thread-3] o.a.n.p.store.WriteAheadStorePartition Successfully rolled over Event Writer for Provenance Event Store Partition[directory=./provenance_repository] due to MAX_TIME_REACHED
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:46:13,876 ERROR [Compress Provenance Logs-1-thread-3] o.a.n.p.s.EventFileCompressor Failed to compress ./provenance_repository/1331.prov on rollover
./nifi-app_2020-12-17_19.0.log:java.io.FileNotFoundException: ./provenance_repository/1331.prov (No such file or directory)
./nifi-app_2020-12-17_19.0.log: at org.apache.nifi.provenance.serialization.EventFileCompressor.compress(EventFileCompressor.java:164)
./nifi-app_2020-12-17_19.0.log: at org.apache.nifi.provenance.serialization.EventFileCompressor.run(EventFileCompressor.java:115)
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:46:13,876 WARN [Compress Provenance Logs-1-thread-3] o.a.n.p.s.EventFileCompressor Failed to delete ./provenance_repository/1331.prov; this file should be cleaned up manually
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:56:14,656 INFO [Timer-Driven Process Thread-5] o.a.n.p.store.WriteAheadStorePartition Successfully rolled over Event Writer for Provenance Event Store Partition[directory=./provenance_repository] due to MAX_TIME_REACHED


Thanks,
Eric


Re: NiFi Missing Provenance Data

Posted by Eric Secules <es...@gmail.com>.
Leon and I did some more digging and found this in both of our logs when we
were experiencing the lack of provenance. I think the missing ".prov" file
could be why those events are not visible.

We also grepped for the filename and component ID we were using in the
provenance query and both searches came up with nothing. When provenance
was working again I tried grepping for filenames and container IDs and I
was successful.

Here are the log lines we got by running:

grep -i provenance . -r | grep -v "Provenance Query" | grep -v "NiFi Web Server"

./nifi-app_2020-12-17_19.0.log:2020-12-17 19:06:11,959 INFO [Timer-Driven Process Thread-7] o.a.n.p.store.WriteAheadStorePartition Successfully rolled over Event Writer for Provenance Event Store Partition[directory=./provenance_repository] due to MAX_TIME_REACHED
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:16:12,497 INFO [Timer-Driven Process Thread-5] o.a.n.p.store.WriteAheadStorePartition Successfully rolled over Event Writer for Provenance Event Store Partition[directory=./provenance_repository] due to MAX_TIME_REACHED
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:26:12,919 INFO [Timer-Driven Process Thread-8] o.a.n.p.store.WriteAheadStorePartition Successfully rolled over Event Writer for Provenance Event Store Partition[directory=./provenance_repository] due to MAX_TIME_REACHED
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:26:12,923 ERROR [Compress Provenance Logs-1-thread-3] o.a.n.p.s.EventFileCompressor Failed to compress ./provenance_repository/395.prov on rollover
./nifi-app_2020-12-17_19.0.log:java.io.FileNotFoundException: ./provenance_repository/395.prov (No such file or directory)
./nifi-app_2020-12-17_19.0.log: at org.apache.nifi.provenance.serialization.EventFileCompressor.compress(EventFileCompressor.java:164)
./nifi-app_2020-12-17_19.0.log: at org.apache.nifi.provenance.serialization.EventFileCompressor.run(EventFileCompressor.java:115)
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:26:12,923 WARN [Compress Provenance Logs-1-thread-3] o.a.n.p.s.EventFileCompressor Failed to delete ./provenance_repository/395.prov; this file should be cleaned up manually
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:36:13,781 INFO [Timer-Driven Process Thread-2] o.a.n.p.store.WriteAheadStorePartition Successfully rolled over Event Writer for Provenance Event Store Partition[directory=./provenance_repository] due to MAX_TIME_REACHED
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:46:13,876 INFO [Timer-Driven Process Thread-3] o.a.n.p.store.WriteAheadStorePartition Successfully rolled over Event Writer for Provenance Event Store Partition[directory=./provenance_repository] due to MAX_TIME_REACHED
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:46:13,876 ERROR [Compress Provenance Logs-1-thread-3] o.a.n.p.s.EventFileCompressor Failed to compress ./provenance_repository/1331.prov on rollover
./nifi-app_2020-12-17_19.0.log:java.io.FileNotFoundException: ./provenance_repository/1331.prov (No such file or directory)
./nifi-app_2020-12-17_19.0.log: at org.apache.nifi.provenance.serialization.EventFileCompressor.compress(EventFileCompressor.java:164)
./nifi-app_2020-12-17_19.0.log: at org.apache.nifi.provenance.serialization.EventFileCompressor.run(EventFileCompressor.java:115)
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:46:13,876 WARN [Compress Provenance Logs-1-thread-3] o.a.n.p.s.EventFileCompressor Failed to delete ./provenance_repository/1331.prov; this file should be cleaned up manually
./nifi-app_2020-12-17_19.0.log:2020-12-17 19:56:14,656 INFO [Timer-Driven Process Thread-5] o.a.n.p.store.WriteAheadStorePartition Successfully rolled over Event Writer for Provenance Event Store Partition[directory=./provenance_repository] due to MAX_TIME_REACHED

Thanks,
Eric
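
For anyone who wants to check their own logs for the same symptom, here is a minimal Python sketch, not taken from NiFi. The function name find_failed_rollovers is hypothetical, and the pattern is based only on the ERROR lines quoted in this thread, so other NiFi versions may word the message differently. It returns the timestamp and ".prov" file name for each failed compression on rollover.

```python
import re

# Pattern modelled on the ERROR lines quoted in this thread (an assumption,
# not a documented NiFi log format).
FAILED_COMPRESS = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ERROR .*"
    r"EventFileCompressor Failed to compress (\S+\.prov) on rollover"
)


def find_failed_rollovers(log_lines):
    """Return (timestamp, prov_file) pairs, one per failed compression."""
    hits = []
    for line in log_lines:
        match = FAILED_COMPRESS.search(line)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits
```

Passing the lines of a nifi-app log file (for example, open(path).read().splitlines()) gives a list that can be compared against the times when provenance events went missing.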

On Thu, Dec 17, 2020 at 11:38 AM Eric Secules <es...@gmail.com> wrote:

> I am also trying out a simple example. I connected a GenerateFlowFile
> processor to a LogAttribute processor and even though I could see flowfiles
> moving through I didn't see any events for either of the component IDs nor
> did I see any events in the overall view of all provenance events.
>
> I did some grepping around the "provenance_repository" directory and I was
> able to find the component IDs and the filenames I was setting in some of
> the files there. But nothing showed up in the UI.
>
> I am not sure of the conditions needed to reproduce this, but I do know it
> happens often enough to be a problem.
>
> Thanks,
> Eric

Re: NiFi Missing Provenance Data

Posted by Eric Secules <es...@gmail.com>.
I am also trying out a simple example. I connected a GenerateFlowFile
processor to a LogAttribute processor and even though I could see flowfiles
moving through I didn't see any events for either of the component IDs nor
did I see any events in the overall view of all provenance events.

I did some grepping around the "provenance_repository" directory and I was
able to find the component IDs and the filenames I was setting in some of
the files there. But nothing showed up in the UI.

I am not sure of the conditions needed to reproduce this, but I do know it
happens often enough to be a problem.

Thanks,
Eric

On Wed, Dec 9, 2020 at 10:41 AM Eric Secules <es...@gmail.com> wrote:

> Hi Leon,
>
> I want to try to rule out something. Are there any logs we can search for
> that would help us find when provenance events are cleared out? I want to
> rule out the CREATE event getting created and getting immediately deleted
> due to some race condition.
>
> Some more detail is that the initial processor in our flow is a ListSFTP.
>
> Leon, could you confirm whether there were any provenance events for that
> file?
>
> Thanks,
> Eric

Re: NiFi Missing Provenance Data

Posted by Eric Secules <es...@gmail.com>.
Hi Leon,

I want to try to rule out something. Are there any logs we can search for
that would help us find when provenance events are cleared out? I want to
rule out the CREATE event getting created and getting immediately deleted
due to some race condition.

Some more detail is that the initial processor in our flow is a ListSFTP.

Leon, could you confirm whether there were any provenance events for that
file?

Thanks,
Eric



On Tue, Dec 8, 2020 at 3:32 PM Leon Yu <Le...@macrohealth.com> wrote:

> Hello,
>
>
>
> This is the first time I’ve emailed here so please bear with me.
>
>
>
> We are using nifi-1.12.1-RC2 and for one of our tests, we are seeing
> missing data provenance in NiFi despite the DB showing the flow being
> processed.
>
> A simplified explanation of how the system works is NiFi ingests a text
> file, creates a flow file from text file, and some data is written to the
> DB.
>
> The tests I was running ran two files through NiFi about 20 minutes
> apart.  The DB has shown both files having successfully been processed.  No
> other process writes to the DB other than NiFi.  NiFi’s data provenance
> only showed data for the 1st file, there was no trace of the 2nd file
> despite success flow.
>
>
>
> Thank you,
>
> *  Leon Yu*
>
>   SDET, Macrohealth