Posted to users@nifi.apache.org by Jairo Henao <ja...@gmail.com> on 2020/05/04 17:03:31 UTC

Re: Not Seeing Provenance data

Hi,

I went from an instance without security to one where I configured HTTPS.
After enabling security with policies and users, I couldn't check the
provenance with the admin user.

When I added the following policy, the provenance was again visible to me
(I don't know if this is documented anywhere):

<policy identifier="e097fd1d-0171-1000-fc74-4ca6dc8b3aed"
        resource="/provenance-data/process-groups/<<MAIN_GROUP_ID>>" action="R">
    <user identifier="<<USER_IDENTIFIER>>"/>
</policy>
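
For anyone who wants to script this, here is a minimal sketch in Python that builds the same policy element. It assumes the file-based authorizer, where such policies are stored as XML; the function name build_provenance_policy and the placeholder IDs are illustrative, and I am assuming that any unique value is acceptable as the policy identifier.

```python
import uuid
import xml.etree.ElementTree as ET


def build_provenance_policy(group_id, user_id, policy_id=None):
    """Build a read ("R") policy on provenance data for one process group."""
    policy = ET.Element("policy", {
        # Assumption: any unique identifier is acceptable here.
        "identifier": policy_id or str(uuid.uuid4()),
        "resource": f"/provenance-data/process-groups/{group_id}",
        "action": "R",
    })
    # Grant the policy to a single user, as in the snippet above.
    ET.SubElement(policy, "user", {"identifier": user_id})
    return policy


if __name__ == "__main__":
    # Placeholder IDs; substitute the real process group and user identifiers.
    element = build_provenance_policy("MAIN_GROUP_ID", "USER_IDENTIFIER")
    print(ET.tostring(element, encoding="unicode"))
```

The printed element still has to be placed in the right file by hand, and NiFi restarted, so treat this only as a way to avoid typos in the resource path.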



On Sat, Apr 11, 2020 at 12:58 PM Wyllys Ingersoll <
wyllys.ingersoll@keepertech.com> wrote:

> Yes, each node has its persistent stores for each of those directories.
>
> On Sat, Apr 11, 2020 at 10:20 AM Patrick Timmins <pt...@cox.net> wrote:
>
>> Is the underlying storage for the four repositories (provenance,
>> database, flowfile, and content) consistent within a node?
>>
>> Are all three nodes in the cluster using the same type of underlying
>> storage/device for the various NiFi repositories?
>>
>>
>> On 4/11/2020 8:45 AM, Wyllys Ingersoll wrote:
>>
>> Nope, already checked that.
>>
>> On Fri, Apr 10, 2020 at 8:23 PM Patrick Timmins <pt...@cox.net> wrote:
>>
>>> No issues here.  Sounds like a timezone / system clock / clock drift
>>> issue (in a cluster).
>>>
>>> On 4/10/2020 11:59 AM, Joe Witt wrote:
>>>
>>> The provenance repo is in large-scale use by many, many users, so
>>> fundamentally it does work.  There are conditions that apparently need
>>> improving.  In the past couple of days these items have been flagged by
>>> folks on this list, and JIRAs and PRs have been raised and merged, all
>>> good. If you can help by creating a build of the latest and confirming
>>> that it fixes your case, then please do so.
>>>
>>> Thanks
>>>
>>> On Fri, Apr 10, 2020 at 12:48 PM Darren Govoni <da...@ontrenet.com>
>>> wrote:
>>>
>>>> It would seem the feature is either broken completely or only works in
>>>> specific conditions.
>>>>
>>>> Can the NiFi team put a fix on their roadmap for this?
>>>> It's a rather central feature of NiFi.
>>>>
>>>> ------------------------------
>>>> *From:* Wyllys Ingersoll <wy...@keepertech.com>
>>>> *Sent:* Friday, April 10, 2020 11:17:42 AM
>>>> *To:* users@nifi.apache.org <us...@nifi.apache.org>
>>>> *Subject:* Re: Not Seeing Provenance data
>>>>
>>>> I have a similar problem with viewing provenance.  I have a 3-node
>>>> cluster in a Kubernetes environment. The provenance_repository directory
>>>> for each node is on a persistent data store, so it is not deleted or lost
>>>> between container restarts (which are not very common).  My
>>>> nifi.provenance.repository.max.storage.time is 24 hours.
>>>>
>>>> Whenever I try to view any provenance, nothing is ever shown.  If I
>>>> manually inspect the provenance_repository directory, I can see that a
>>>> Lucene index and a TOC are being created.
>>>>
>>>> I see log messages like these:
>>>>
>>>> Submitting query +processorId:882133fe-b684-148b-ad88-7850437ca591 with
>>>> identifier 64a703fe-0171-1000-0000-000065abd91a against index directories
>>>> [./provenance_repository/lucene-8-index-1560864819888]
>>>> Returning the following list of index locations because they were
>>>> finished being written to before 1586531601311: []
>>>> Found no events in the Provenance Repository. In order to perform
>>>> maintenace of the indices, will assume that the first event time is now
>>>> (1586531601311)
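
The numbers in these log lines look like epoch timestamps in milliseconds; that is an assumption, since the log does not say so. Under that assumption, a short Python sketch converts them to UTC so they can be compared with each node's clock and with the retention window. The helper name millis_to_utc is illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(millis):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


if __name__ == "__main__":
    # Value taken from the log lines above.
    print(millis_to_utc(1586531601311).isoformat())
    # Suffix of the index directory name, assumed to be a timestamp as well.
    print(millis_to_utc(1560864819888).isoformat())
```

Comparing the converted values with the system time on each node is one way to check the clock drift theory raised elsewhere in this thread.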
>>>>
>>>>
>>>> Any suggestions?
>>>>
>>>> -Wyllys Ingersoll
>>>>
>>>>
>>>>
>>>> On Thu, Apr 9, 2020 at 11:25 AM Dobbernack, Harald (Key-Work) <
>>>> harald.dobbernack@key-work.de> wrote:
>>>>
>>>> Hey Mark,
>>>>
>>>>
>>>>
>>>> great news and thank you very much!
>>>>
>>>>
>>>>
>>>> Happy Holidays!
>>>>
>>>> Harald
>>>>
>>>>
>>>>
>>>> *From:* Mark Payne <ma...@hotmail.com>
>>>> *Sent:* Thursday, April 9, 2020 17:18
>>>> *To:* users@nifi.apache.org
>>>> *Subject:* Re: Not Seeing Provenance data
>>>>
>>>>
>>>>
>>>> Thanks Harald,
>>>>
>>>>
>>>>
>>>> I have created a Jira [1] for this. There’s currently a PR up for it as
>>>> well.
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> -Mark
>>>>
>>>>
>>>>
>>>> [1] https://issues.apache.org/jira/browse/NIFI-7346
>>>>
>>>>
>>>>
>>>> On Apr 9, 2020, at 11:14 AM, Dobbernack, Harald (Key-Work) <
>>>> harald.dobbernack@key-work.de> wrote:
>>>>
>>>>
>>>>
>>>> Hi Mark,
>>>>
>>>>
>>>>
>>>> I can confirm after testing that if no provenance event has been
>>>> generated for longer than the configured
>>>> nifi.provenance.repository.max.storage.time, then, as expected, the last
>>>> recorded provenance events no longer exist. From then on, however, new
>>>> provenance events are not searchable either: the provenance search
>>>> remains completely empty regardless of how many flows are active.  As
>>>> described, the *.prov file is then also missing from the provenance
>>>> repository. After a restart of NiFi, a new *.prov file is generated and
>>>> provenance works again, but it only shows events generated since the
>>>> last NiFi start.
>>>>
>>>>
>>>>
>>>> So yes, I’d say your idea
>>>>
>>>>     ‘If so, then I think that would explain why it deleted the data.
>>>>      It’s trying to age off old data but unfortunately it doesn’t
>>>>      perform a check to first determine whether or not the “old file”
>>>>      that it’s about to delete is also the “active file”.’
>>>>
>>>> fits my test very nicely.
>>>>
>>>>
>>>>
>>>> As a workaround we’re going to set a larger value for
>>>> nifi.provenance.repository.max.storage.time until this can be resolved.
>>>>
>>>>
>>>>
>>>> Thanks again for looking into this.
>>>>
>>>> Harald
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* Dobbernack, Harald (Key-Work)
>>>> *Sent:* Thursday, April 9, 2020 15:22
>>>> *To:* users@nifi.apache.org
>>>> *Subject:* Re: Not Seeing Provenance data
>>>>
>>>>
>>>>
>>>> Hi Mark,
>>>>
>>>>
>>>>
>>>> thank you for looking into this.
>>>>
>>>>
>>>>
>>>> The nifi.provenance.repository.max.storage.time setting might explain
>>>> why I haven’t been experiencing the effect so often since changing from the
>>>> default to 120 hours a few months ago 😉
>>>>
>>>>
>>>>
>>>> But I believe that the last time provenance stopped working there was
>>>> an ‘active’ flow in a Wait processor, expiring every hour and going on to
>>>> ‘send a message’ before being rerouted to the same Wait processor. I
>>>> would have expected this to generate provenance entries.  As I am not
>>>> actually 100% sure that the Wait processor was in use when provenance was
>>>> last lost, I will check on a test system whether I can reproduce the
>>>> provenance breakage when no flows are active for longer than
>>>> nifi.provenance.repository.max.storage.time, and I will get back to
>>>> you.
>>>>
>>>>
>>>>
>>>> Thank you!
>>>>
>>>> Harald
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* Mark Payne <ma...@hotmail.com>
>>>> *Sent:* Thursday, April 9, 2020 14:41
>>>> *To:* users@nifi.apache.org
>>>> *Subject:* Re: Not Seeing Provenance data
>>>>
>>>>
>>>>
>>>> Hey Darren, Harald,
>>>>
>>>>
>>>>
>>>> Thanks for the note. I have seen this once before but couldn’t figure
>>>> out what caused it. Restarting addressed the issue.
>>>>
>>>>
>>>>
>>>> I think I may understand the problem, now, though, after looking at it
>>>> again.
>>>>
>>>>
>>>>
>>>> In nifi.properties, there is a property named
>>>> “nifi.provenance.repository.max.storage.time” that defaults to “24 hours”.
>>>>
>>>> Is it possible that you went 24 hours (or whatever value is set for
>>>> that property) without generating any Provenance events?
>>>>
>>>>
>>>>
>>>> If so, then I think that would explain why it deleted the data. It’s
>>>> trying to age off old data but unfortunately it doesn’t perform a check to
>>>> first determine whether or not the “old file” that it’s about to delete is
>>>> also the “active file”.
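
To make the suspected failure mode concrete, here is a minimal Python sketch. It is not NiFi's code: the names select_expired and active_file are hypothetical, and event files are modeled simply as a mapping from file name to last-written time in seconds. It shows an age-off pass with and without a guard that skips the file currently being written.

```python
def select_expired(files, now, max_age, active_file=None):
    """Return the names of files whose last write is older than max_age.

    files: dict mapping file name -> last-written time (seconds).
    active_file: name of the file still being written to, if known.
    """
    expired = []
    for name, last_written in files.items():
        if now - last_written <= max_age:
            continue  # still within the retention window
        if name == active_file:
            continue  # guard: never age off the file being written
        expired.append(name)
    return expired


if __name__ == "__main__":
    day = 24 * 60 * 60
    # One event file, idle for 25 hours, that is still the active file.
    files = {"0.prov": 0}
    print(select_expired(files, now=25 * 3600, max_age=day))
    print(select_expired(files, now=25 * 3600, max_age=day,
                         active_file="0.prov"))
```

Without the guard, the idle but still active file is selected for deletion, which would match the symptom of the *.prov file disappearing after a quiet period.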
>>>>
>>>>
>>>>
>>>> Can you confirm whether or not you would expect to see 24 hours pass
>>>> without any provenance data?
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> -Mark
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Apr 9, 2020, at 4:32 AM, Dobbernack, Harald (Key-Work) <
>>>> harald.dobbernack@key-work.de> wrote:
>>>>
>>>>
>>>>
>>>> What I noticed is that as long as provenance is working there will be
>>>> *.prov files in the directory. When provenance isn’t working, these files
>>>> are nowhere to be seen. Maybe some cleanup process deletes those files
>>>> prematurely, or the process building them doesn’t work any more?
>>>>
>>>>
>>>>
>>>> *From:* Dobbernack, Harald (Key-Work) <ha...@key-work.de>
>>>> *Sent:* Thursday, April 9, 2020 10:27
>>>> *To:* users@nifi.apache.org
>>>> *Subject:* Re: Not Seeing Provenance data
>>>>
>>>>
>>>>
>>>> This is something I experience too from time to time. My quick and
>>>> dirty workaround is to stop NiFi, delete everything in the provenance
>>>> directory, and restart. Then provenance is usable again (of course only
>>>> with data since the delete). I’m hoping very much that there is a better
>>>> way: that someone can show us better settings, or that a potential bug
>>>> can be discovered.
>>>>
>>>>
>>>>
>>>> *From:* Darren Govoni <da...@ontrenet.com>
>>>> *Sent:* Wednesday, April 8, 2020 20:31
>>>> *To:* users@nifi.apache.org
>>>> *Subject:* Not Seeing Provenance data
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>>   When I go to "View data provenance" in NiFi, I never see any logs for
>>>> my flow. Am I missing some configuration setting somewhere?
>>>>
>>>>
>>>>
>>>> thanks,
>>>>
>>>> Darren
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>

-- 
Regards

Jairo Henao

*Chat Skype: jairo.henao.05*