Posted to users@nifi.apache.org by "Olson, Eric" <Er...@adm.com> on 2021/04/12 20:56:14 UTC

ListFile with DistributedMapCache

I have a number of directories that I'm monitoring with ListFile in Tracking Entities mode, backed by a DistributedMapCacheClientService. That works well, except when NiFi is restarted for whatever reason: then every ListFile lists its entire directory again, which causes a lot of unnecessary reprocessing. So I'm switching the DistributedMapCacheServer (DMCServer) to one backed by a persistence directory, so that when NiFi restarts, all the ListFile processors pick up where they left off, as if nothing had happened. But it's not working as I expect, and I'm wondering whether I misunderstand what it's doing.

1. I can use mburgess's dcache.groovy script<https://community.cloudera.com/t5/Community-Articles/Working-with-a-NiFi-DistributedMapCache/ta-p/248370> to list and remove cache entries in the DMC. If I stop a particular ListFile, delete its cache entry, and then restart it, it doesn't output any FlowFiles. I would expect it to list the directory contents again: its cache entry is gone, so it no longer has any history. Stopping and restarting the client service and the server has no effect. But if I restart NiFi as a whole, it does list the directory again.

2. Similarly, if I have a ListFile backed by a DMC with no persistence directory, switch it to a cache that does have a persistence directory, and then start the ListFile, it outputs no FlowFiles. I've moved it from a cache that has history to a different cache with no history, yet it behaves as if it still has history until I restart NiFi.

So maybe I'm not understanding how ListFile interacts with a DistributedMapCache?
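One guess that would explain both observations: ListFile might keep its own in-memory copy of the tracked entities and read the DistributedMapCache only when that copy is empty (for example, right after NiFi starts), writing back only when it finds new files. That is an assumption I haven't confirmed in the NiFi source. The toy Python model below is not NiFi code; every class and method name in it is hypothetical. It only shows that such a design would produce exactly the symptoms above.

```python
from typing import Dict, Optional, Set


class ToyMapCache:
    """Hypothetical stand-in for a DistributedMapCache server: a shared key/value store."""

    def __init__(self) -> None:
        self.entries: Dict[str, Set[str]] = {}

    def get(self, key: str) -> Optional[Set[str]]:
        value = self.entries.get(key)
        return set(value) if value is not None else None

    def put(self, key: str, value: Set[str]) -> None:
        self.entries[key] = set(value)

    def remove(self, key: str) -> None:
        self.entries.pop(key, None)


class ToyListFile:
    """Hypothetical lister: consults the cache only when its in-memory copy is empty."""

    def __init__(self, key: str, cache: ToyMapCache) -> None:
        self.key = key
        self.cache = cache
        self.already_listed: Optional[Set[str]] = None  # in-memory state, lost on restart

    def on_trigger(self, directory: Set[str]) -> Set[str]:
        if self.already_listed is None:
            # Assumption: the shared cache is read only when no in-memory state exists.
            self.already_listed = self.cache.get(self.key) or set()
        new_files = directory - self.already_listed
        if new_files:
            # Assumption: state is written back only when something new was listed.
            self.already_listed |= new_files
            self.cache.put(self.key, self.already_listed)
        return new_files

    def restart_nifi(self) -> None:
        # Simulates a full NiFi restart: in-memory state is dropped.
        self.already_listed = None


def demo() -> list:
    cache = ToyMapCache()
    lister = ToyListFile("listfile-1", cache)
    files = {"a.csv", "b.csv"}
    results = [lister.on_trigger(files)]        # first run lists both files
    cache.remove("listfile-1")                  # like deleting the entry with dcache.groovy
    results.append(lister.on_trigger(files))    # in-memory copy still suppresses output
    lister.restart_nifi()
    results.append(lister.on_trigger(files))    # cache entry gone, so everything is relisted
    lister.cache = ToyMapCache()                # point at a different, empty cache
    results.append(lister.on_trigger(files))    # still nothing: state comes from memory
    return results
```

If the real processor works this way, deleting the cache entry or swapping caches would only take effect once the processor's in-memory state is rebuilt, which matches what I see after a full NiFi restart.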


