Posted to users@nifi.apache.org by Elli Schwarz <el...@yahoo.com> on 2021/08/06 15:56:13 UTC

Re: NiFi Get's Stuck Waiting On Non Existent Archive Cleanup

We are experiencing this issue as well. We just upgraded from NiFi 1.11.4 to 1.13.2, and are running into this issue where many of our high-usage NiFi instances are simply hanging. For example, we have a 7-node cluster that has flowfiles stuck in queues and not moving. We noticed that on 3 of those nodes the flowfile content storage was over 50%, and those are the nodes that have flowfiles stuck in the queue. The other nodes have nothing on them. No new data is flowing into the cluster at all, and nothing is moving on any of the nodes. We see this problem on non-cluster machines as well; the cluster just makes it more obvious that this archive max usage percentage might be the cause.
We have a lot of MergeContent processors. We realize that there were a lot of I/O improvements in the newer versions of NiFi - Joe, we suspect these efficiencies might be exacerbating the problem:
NiFi 1.13.1 - [full list]

   - [NIFI-7646] - Improve performance of MergeContent / others that read content of many small FlowFiles

   - [NIFI-8222] - When processing a lot of small FlowFiles, Provenance Repo spends most of its time in lock contention. That can be improved.
NiFi 1.14.0 - [full list]

   - [NIFI-8633] - Content Repository can be improved to make fewer disk accesses on read.

   - Mark Payne's notes: "For those interested in the actual performance numbers here, I ran a pretty simple flow that generated a lot of tiny JSON messages, and then used ConvertRecord to convert from JSON to Avro. Ran a profiler against it and found that about 50% of the time for ConvertRecord was spent in FileSystemRepository.read(). This is called twice - once when we read the data for inferring the schema, and a second time when we parse the data.

Of the time spent in FileSystemRepository.read(), about 50% of that time was spent in Files.exists(). So this should improve performance of that flow by something like 25%."
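Mark's 25% figure follows from multiplying the two fractions he measured; a trivial back-of-the-envelope sketch (numbers taken from the quote above, not from any new measurement):

```python
# Rough arithmetic behind the quoted estimate.
read_share = 0.50    # share of ConvertRecord time spent in FileSystemRepository.read()
exists_share = 0.50  # share of read() time spent in Files.exists()
expected_speedup = read_share * exists_share
print(f"Expected improvement: {expected_speedup:.0%}")  # Expected improvement: 25%
```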

We didn't know about the ...archive.backpressure.percentage property - we don't see it in the Admin Guide (https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html). We will set this property much higher than the default of 2% above the max usage percentage and see how it goes. Now that we think about it, we believe we experienced this problem occasionally before the upgrade, but it has become very frequent since the upgrade.
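For reference, the change we're planning would look something like this in nifi.properties (the 80% value is illustrative, per Mark's suggestion below; since the backpressure property is undocumented, treat the exact key as an assumption based on his description):

```properties
# Archive cleanup starts destroying archived content above this usage (default)
nifi.content.repository.archive.max.usage.percentage=50%
# Writes are back-pressured above this usage (defaults to max + 2%; raised here)
nifi.content.repository.archive.backpressure.percentage=80%
```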

-Elli
    On Monday, May 3, 2021, 01:09:47 PM EDT, Shawn Weeks <sw...@weeksconsulting.us> wrote:  
 
Sorry, I wasn't saying that 'nifi.content.repository.archive.max.usage.percentage' was new - I just hadn't managed to get a NiFi instance stuck this way, and even the documentation says that if the archive is empty and the content repo needs more room, it will disable the archive. I'm having trouble finding where 'nifi.content.repository.archive.backpressure.percentage' is documented.

Thanks

-----Original Message-----
From: Mark Payne <ma...@hotmail.com> 
Sent: Monday, May 3, 2021 12:00 PM
To: users@nifi.apache.org
Subject: Re: NiFi Get's Stuck Waiting On Non Existent Archive Cleanup

Shawn,

There are a couple of properties at play. The “nifi.content.repository.archive.max.usage.percentage" property behaves as you have described. But there’s also a second property: nifi.content.repository.archive.backpressure.percentage
This controls at what point the Content Repository will actually apply back-pressure in order to avoid filling the disk. This property defaults to 2% more than the max.usage.percentage. So by default it uses 50% and 52%.
You can adjust the backpressure percentage to something much higher, like 80%. So then if you reach 50% it will start clearing things out, and if you reach 80% it will start applying the brakes. This is here as a safeguard because we’ve had data flows that can produce data much faster than it can be archived/deleted. This is common for data flows that produce huge numbers of files in the content repository. So that backpressure is there to ensure that the archive has a chance to run.

This has always been here, though, ever since the initial open sourcing. It is not something new. It may be the case that in later versions we have become more efficient at creating the data, such that it’s now exceeding the rate at which cleanup can happen - not sure. But adjusting the “nifi.content.repository.archive.max.usage.percentage” property should get you into a better state.

Thanks
-Mark


> From: Shawn Weeks <sw...@weeksconsulting.us>
> Date: Mon, May 3, 2021 at 9:33 AM
> Subject: RE: NiFi Get's Stuck Waiting On Non Existent Archive Cleanup
> To: users@nifi.apache.org <us...@nifi.apache.org>
> 
> 
> Note I have a 2-node cluster, which is why it’s sitting at around 900 GB. Per node, the content repo is currently sitting at 535 GB, and I’m not sure where the rest of the space is. I have 472 GB free on each node in the content_repository partition as shown in the Cluster panel.
> 
>  
> 
> Thanks
> 
> Shawn Weeks
> 
>  
> 
> From: Shawn Weeks 
> Sent: Monday, May 3, 2021 11:30 AM
> To: users@nifi.apache.org
> Subject: NiFi Get's Stuck Waiting On Non Existent Archive Cleanup
> 
>  
> 
> I’m not sure if this is specific to clustering or not, but using the default configuration with 50% content archiving, it is possible to cause NiFi to quit processing any data by simply filling up a queue with 50% of your content_repository storage. In my example my content_repository is 1 TB, and once a queue gets to 500 GB or so, the next processor won’t process any more data. Once this occurs, even stopping GenerateFlowFile won’t fix the problem, and my CompressContent never does anything. It’s my understanding that “nifi.content.repository.archive.max.usage.percentage” only sets the max amount of space that archives will use and should never prevent new content from being written; in 1.13.2 it appears to be functioning as a reserve instead. I haven’t seen this in older versions of NiFi like 1.9.2, and I’m not sure when the behavior changed, but even the documentation seems to indicate that this should not be happening. For example: ‘If the archive is empty and content repository disk usage is above this percentage, then archiving is temporarily disabled.’
> 
>  
> 
> <image001.png>
> 
>  
> 
> <image002.png>
> 
>  
> 
> Thanks
> 
> Shawn Weeks
>