
Posted to dev@nifi.apache.org by Mark Payne <ma...@hotmail.com> on 2015/08/03 20:59:30 UTC

RE: [jira] [Commented] (NIFI-800) FlowFile Repository can become corrupt if OutOfMemoryError is encountered

Joe,

Nice catch! That was there for debugging purposes, so that I could set a breakpoint and see how many bytes had been read off of the stream. I can take that out.

Thanks
-Mark
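
For readers unfamiliar with the idea, a byte-counting stream wrapper of the kind described above can be sketched as follows. This is an illustration only, not NiFi's ByteCountingInputStream: the class name CountingInputStreamSketch and its getBytesRead() method are hypothetical, and bytes passed over by skip() are deliberately not counted.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: counts bytes as they are read from the wrapped stream.
class CountingInputStreamSketch extends FilterInputStream {
    private long bytesRead = 0L;

    CountingInputStreamSketch(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            bytesRead++; // -1 means end of stream, so only count real bytes
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            bytesRead += n;
        }
        return n;
    }

    // Value to inspect from a breakpoint or a log statement.
    long getBytesRead() {
        return bytesRead;
    }
}

class CountingInputStreamDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = {0, 0, 0, 42, 7};
        CountingInputStreamSketch counting =
                new CountingInputStreamSketch(new ByteArrayInputStream(data));
        DataInputStream dataIn = new DataInputStream(counting);
        int value = dataIn.readInt(); // consumes exactly four bytes
        System.out.println("value=" + value + ", bytes read so far: " + counting.getBytesRead());
    }
}
```

Wrapping the underlying stream before handing it to a DataInputStream leaves the read path unchanged while exposing a running count, which is why such a wrapper can sit in the code without any caller using it directly.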

----------------------------------------
> Date: Mon, 3 Aug 2015 18:56:05 +0000
> From: jira@apache.org
> To: commits@nifi.apache.org
> Subject: [jira] [Commented] (NIFI-800) FlowFile Repository can become corrupt if OutOfMemoryError is encountered
>
>
> [ https://issues.apache.org/jira/browse/NIFI-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652268#comment-14652268 ]
>
> Joe Skora commented on NIFI-800:
> --------------------------------
>
> Mark,
>
> This version includes a ByteCountingInputStream in createDataInputStream() in MinimalLockingWriteAheadLog, but I can't find where it is used. What is the significance of adding the ByteCountingInputStream?
>
> Joe
>
>> FlowFile Repository can become corrupt if OutOfMemoryError is encountered
>> -------------------------------------------------------------------------
>>
>> Key: NIFI-800
>> URL: https://issues.apache.org/jira/browse/NIFI-800
>> Project: Apache NiFi
>> Issue Type: Bug
>> Components: Core Framework
>> Reporter: Mark Payne
>> Assignee: Mark Payne
>> Priority: Critical
>> Fix For: 0.3.0
>>
>> Attachments: 0001-NIFI-800-Ensured-that-all-Throwable-that-gets-thrown.patch
>>
>>
>> If NiFi runs out of memory and the JVM starts throwing OutOfMemoryError, it is possible that the FlowFile Repository can become corrupt. This results in NiFi not being able to restart without deleting the FlowFile repository.
>> While the application can't be expected to run perfectly in the face of OutOfMemoryErrors, it should be able to continue properly after a restart of the application.
>> The issue appears to be that the MinimalLockingWriteAheadLog class catches Exception when it calls Partition.update and IOException when it calls Partition.rollover; if an Exception is caught, the Partition is blacklisted so that it cannot be updated again until the repo is checkpointed. It should catch Throwable, as any unexpected termination of the method call leaves the Partition in a 'bad state' because it potentially has a partial record written to it.
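
The difference between catching Exception and catching Throwable described in the issue can be sketched in a few lines of Java. This is a simplified illustration under stated assumptions, not the actual MinimalLockingWriteAheadLog code or the attached patch: SketchPartition, SketchWriteAheadLog, the simulateOom flag, and the checkpoint() behavior shown here are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a write-ahead-log partition.
class SketchPartition {
    private final List<String> records = new ArrayList<>();
    private boolean blacklisted = false;

    void update(String record, boolean simulateOom) {
        if (simulateOom) {
            // Stands in for the JVM failing partway through writing a record.
            throw new OutOfMemoryError("simulated");
        }
        records.add(record);
    }

    void blacklist() { blacklisted = true; }
    void clearBlacklist() { blacklisted = false; }
    boolean isBlacklisted() { return blacklisted; }
    int recordCount() { return records.size(); }
}

// Hypothetical stand-in for the write-ahead log that owns the partition.
class SketchWriteAheadLog {
    private final SketchPartition partition = new SketchPartition();

    SketchPartition partition() { return partition; }

    void update(String record, boolean simulateOom) {
        if (partition.isBlacklisted()) {
            throw new IllegalStateException("partition is blacklisted until the next checkpoint");
        }
        try {
            partition.update(record, simulateOom);
        } catch (final Throwable t) {
            // "catch (Exception e)" would miss OutOfMemoryError, which is an Error,
            // and the possibly half-written partition would stay in use.
            partition.blacklist();
            throw t;
        }
    }

    // Assumption: a checkpoint makes the partition usable again.
    void checkpoint() { partition.clearBlacklist(); }
}

class ThrowableBlacklistDemo {
    public static void main(String[] args) {
        SketchWriteAheadLog wal = new SketchWriteAheadLog();
        wal.update("record-1", false);
        try {
            wal.update("record-2", true);
        } catch (OutOfMemoryError e) {
            System.out.println("blacklisted after Error: " + wal.partition().isBlacklisted());
        }
    }
}
```

Because OutOfMemoryError extends Error rather than Exception, only the Throwable catch reaches the blacklist call in this sketch; the error is then rethrown so that callers still see the failure.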
>
>
>