Posted to users@nifi.apache.org by Mike Thomsen <mi...@gmail.com> on 2018/12/12 18:06:46 UTC

NiFi suddenly releases a lot of disk space on shutdown

I configured the content repository to append 10 files per flowfile because
I'm dealing with a lot of decompressing and recompressing of small files.
The content repo goes up and down appropriately as the content claims are
removed, but I noticed that after a lot of heavy processing sometimes
20-25GB (on a 100GB hdd) would stay used (as shown by df -h) until I stop
NiFi. Then it suddenly gets purged.

The weird part is that when I run du -h on the whole NiFi folder where
things are stored, I don't see anywhere near that much disk space allocated.

Anyone have any ideas on what might be going on?

Thanks,

Mike

Re: NiFi suddenly releases a lot of disk space on shutdown

Posted by Mike Thomsen <mi...@gmail.com>.
So here's the explanation in case someone hits this on Google...

The processor wraps the JavaMail API and uses MStor to read MBox files. I
wrote the FlowFile contents to a temp file because the JavaMail API
couldn't use an InputStream. I forgot to close the "Folder" object and only
closed the "Store" object from the JavaMail API. Apparently, that was not
enough to release the file handle that the API was holding under the covers,
so the delete() method on the temp File object marked the file as deleted,
but the OS could not actually free the space until the JVM exited and the
handle was released.

So you need to explicitly close any open Folders, then the Store, before
deleting temporary storage in a situation like this with JavaMail.
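
For anyone who wants the ordering spelled out, here is a minimal sketch of
"close everything, then delete". It is not the processor's actual code. The
class TempFileCleanup and the method closeThenDelete are hypothetical names,
and the resources are passed as plain AutoCloseable values so the example
runs without JavaMail on the classpath.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of NiFi or JavaMail.
public class TempFileCleanup {

    /**
     * Closes each resource in the order given, then deletes the temp file.
     * A failure to close one resource does not skip the remaining ones.
     */
    public static void closeThenDelete(Path tempFile, AutoCloseable... resources)
            throws IOException {
        Exception first = null;
        for (AutoCloseable resource : resources) {
            try {
                if (resource != null) {
                    resource.close();
                }
            } catch (Exception e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        // Only unlink the file once every handle on it should be gone.
        Files.deleteIfExists(tempFile);
        if (first != null) {
            throw new IOException("Failed to close a resource before delete", first);
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("mbox-", ".tmp");
        List<String> order = new ArrayList<>();
        // Stand-ins for the JavaMail Folder and Store objects.
        AutoCloseable folder = () -> order.add("folder");
        AutoCloseable store = () -> order.add("store");
        closeThenDelete(tmp, folder, store);
        System.out.println(order + " exists=" + Files.exists(tmp));
        // prints "[folder, store] exists=false"
    }
}
```

With JavaMail itself, the call would presumably look something like
closeThenDelete(tmp, () -> folder.close(false), store::close), with the
Folder first and the Store second.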

Re: NiFi suddenly releases a lot of disk space on shutdown

Posted by Joe Witt <jo...@gmail.com>.
Good catch, Mike. That is definitely a thing. Historically, the JVM has
also had issues with memory-mapped I/O, where it could not let go of files
until restart.

Re: NiFi suddenly releases a lot of disk space on shutdown

Posted by Mike Thomsen <mi...@gmail.com>.
Mike,

I ran lsof +L1 and saw a ton of files listed that were marked (deleted),
but the OS was still hanging onto them. They were all temporary files
created by the processor I wrote because the API it wraps has no
InputStream support. I saw some posts on Google hinting that it might
be an old JVM bug that causes File#delete to not work when a stream is
still open. So it looks like it might be a bug deeper in the API that
I'm using.
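
A small sketch of what those "(deleted)" entries mean, assuming Linux or
another POSIX file system (Windows behaves differently). The class name
DeletedButOpen is made up for illustration. It deletes a file while a stream
on it is still open and then reads the data back through that stream, which
shows that the blocks stay allocated until the last handle is closed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; assumes POSIX unlink semantics.
public class DeletedButOpen {

    /** Returns how many bytes were still readable after the file was deleted. */
    public static int readAfterDelete(byte[] payload) throws IOException {
        Path tmp = Files.createTempFile("leak-demo-", ".tmp");
        Files.write(tmp, payload);
        try (InputStream in = Files.newInputStream(tmp)) {
            // Removes the directory entry, so du can no longer find the file.
            Files.delete(tmp);
            // The open handle still pins the data, so df keeps counting it.
            byte[] buffer = new byte[8192];
            int total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            return total;
        }
        // Leaving the try block closes the stream; only now can the OS free the space.
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterDelete(new byte[4096]));
    }
}
```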

Re: NiFi suddenly releases a lot of disk space on shutdown

Posted by Mike Thomsen <mi...@gmail.com>.
Thanks, but what I cannot figure out is why du -h is reporting that the
content repository is only a few GB and then suddenly tens of GB are
released when NiFi shuts down. It's like the OS cannot accurately report
who is claiming that much space.
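
A rough way to see the same mismatch from inside Java is sketched below,
under a few assumptions. DuVersusDf is a hypothetical class. visibleBytes only
approximates du (it sums apparent file sizes, not allocated blocks), and
usedBytes covers the entire file system that holds the directory, so the
comparison only means something when that directory is the main consumer of
the volume.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper for comparing a du-like and a df-like number.
public class DuVersusDf {

    /** du-like: total size of regular files still reachable by name under root. */
    public static long visibleBytes(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    /** df-like: bytes in use on the file system that holds root. */
    public static long usedBytes(Path root) throws IOException {
        FileStore store = Files.getFileStore(root);
        return store.getTotalSpace() - store.getUnallocatedSpace();
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("visible by name: " + visibleBytes(root));
        System.out.println("used on volume:  " + usedBytes(root));
    }
}
```

Files that were deleted while a process still holds them open count toward
the second number but not the first, which is one way the two can drift
apart by tens of GB.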

Re: NiFi suddenly releases a lot of disk space on shutdown

Posted by Michael Moser <mo...@gmail.com>.
Hi Mike,

I spent some time looking into this about a year ago.  Here are some JIRA
tickets that describe what I think you are seeing.

https://issues.apache.org/jira/browse/NIFI-3376
https://issues.apache.org/jira/browse/NIFI-3736
https://issues.apache.org/jira/browse/NIFI-4287

Regards

