Posted to issues@nifi.apache.org by "Tarun Lalu Hasija (Jira)" <ji...@apache.org> on 2020/12/08 00:57:00 UTC

[jira] [Commented] (NIFI-3273) MinimalLockingWriteAheadLog doesn't properly handle corrupted journals

    [ https://issues.apache.org/jira/browse/NIFI-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245597#comment-17245597 ] 

Tarun Lalu Hasija commented on NIFI-3273:
-----------------------------------------

[~markap14] We are seeing this issue on one of our production NiFi nodes. Below is the flowfile repository configuration:

nifi.flowfile.repository.implementation org.apache.nifi.controller.repository.WriteAheadFlowFileRepository

nifi.flowfile.repository.wal.implementation org.apache.nifi.wali.SequentialAccessWriteAheadLog
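The two settings above can also be checked programmatically when comparing nodes. The sketch below is a hypothetical helper; the class FlowFileRepoConfig and its methods are illustrative and not part of NiFi. It loads the entries with java.util.Properties, which accepts '=', ':' or whitespace between a key and its value. That means the space-separated form quoted here parses the same way as key=value.

It also reads nifi.flowfile.repository.always.sync, the workaround property named in the ticket description further down. Treating a missing value as false is an assumption of this sketch.

{code:java}
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class FlowFileRepoConfig {
    static final String REPO_IMPL = "nifi.flowfile.repository.implementation";
    static final String WAL_IMPL = "nifi.flowfile.repository.wal.implementation";
    static final String ALWAYS_SYNC = "nifi.flowfile.repository.always.sync";
    static final String SEQUENTIAL_WAL = "org.apache.nifi.wali.SequentialAccessWriteAheadLog";

    /** Loads properties; '=', ':' or whitespace may separate a key from its value. */
    static Properties load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** True when the configured WAL is SequentialAccessWriteAheadLog, as on the node in this comment. */
    static boolean usesSequentialWal(Properties props) {
        return SEQUENTIAL_WAL.equals(props.getProperty(WAL_IMPL, "").trim());
    }

    /** Assumption of this sketch: a missing always.sync entry is treated as false. */
    static boolean alwaysSync(Properties props) {
        return Boolean.parseBoolean(props.getProperty(ALWAYS_SYNC, "false").trim());
    }

    public static void main(String[] args) {
        // The two lines quoted above, in their space-separated form.
        String config = REPO_IMPL + " org.apache.nifi.controller.repository.WriteAheadFlowFileRepository\n"
                + WAL_IMPL + " " + SEQUENTIAL_WAL + "\n";
        Properties props = load(new StringReader(config));
        System.out.println("Repository: " + props.getProperty(REPO_IMPL));
        System.out.println("Sequential WAL: " + usesSequentialWal(props));
        System.out.println("always.sync: " + alwaysSync(props));
    }
}
{code}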

 

 
{code:java}
2020-12-07 13:12:06,826 INFO [main] o.a.n.wali.SequentialAccessWriteAheadLog Recovering records from Write-Ahead Log at /var/lib/nifi/flowfile_repository
2020-12-07 13:12:08,145 INFO [main] org.apache.nifi.wali.HashMapSnapshot org.apache.nifi.wali.HashMapSnapshot@6508161b restored 73574 Records and 11 Swap Files from Snapshot, ending with Transaction ID 23742735983
2020-12-07 13:12:08,147 INFO [main] o.a.n.wali.SequentialAccessWriteAheadLog Successfully recovered 73574 records and 11 swap files from Snapshot at /var/lib/nifi/flowfile_repository/checkpoint with Max Transaction ID of 23742735983 in 1319 milliseconds. Now recovering records from 1 journal files
2020-12-07 13:12:08,159 INFO [main] o.a.nifi.wali.LengthDelimitedJournal Recovering records from journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal
2020-12-07 13:12:09,005 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 6.83% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 62730 updates
2020-12-07 13:12:09,971 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 13.68% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 135556 updates
2020-12-07 13:12:10,864 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 20.60% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 207179 updates
2020-12-07 13:12:11,933 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 27.45% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 276497 updates
2020-12-07 13:12:12,171 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 34.35% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 284820 updates
2020-12-07 13:12:12,445 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 41.45% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 292318 updates
2020-12-07 13:12:14,111 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 55.18% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 365112 updates
2020-12-07 13:12:14,512 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 62.44% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 390673 updates
2020-12-07 13:12:14,960 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 69.30% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 422410 updates
2020-12-07 13:12:15,585 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 76.20% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 461458 updates
2020-12-07 13:12:16,000 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 83.11% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 483167 updates
2020-12-07 13:12:16,854 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 90.05% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 542462 updates
2020-12-07 13:12:17,613 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 96.92% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 593333 updates
2020-12-07 13:12:18,038 ERROR [main] o.a.nifi.controller.StandardFlowService Failed to load flow from cluster due to: org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.io.IOException: Expected to read a Sentinel Byte of '1' but got a value of '64' instead
org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.io.IOException: Expected to read a Sentinel Byte of '1' but got a value of '64' instead
{code}
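The ERROR line reports that, while the journal was being recovered, a reader expected a sentinel byte of 1 and found 64. The failure comes after recovery had reported 96.92% progress. That is consistent with, though not proof of, damage near the end of the journal.

The log does not show where exactly that check lives in NiFi. The sketch below only models the general idea with a deliberately simplified, hypothetical layout: a sentinel byte of 1, a 4-byte length, then the payload. It is not the actual on-disk format of LengthDelimitedJournal. The class SentinelFraming is illustrative.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SentinelFraming {
    // Hypothetical marker written in front of every record in this simplified layout.
    static final int SENTINEL = 1;

    /** Writes each record as [sentinel byte][4-byte length][payload]. */
    static byte[] write(List<byte[]> records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (byte[] record : records) {
                out.writeByte(SENTINEL);
                out.writeInt(record.length);
                out.write(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads records back, failing fast when a sentinel byte is not where it should be. */
    static List<byte[]> read(byte[] journal) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(journal));
        List<byte[]> records = new ArrayList<>();
        int sentinel;
        while ((sentinel = in.read()) != -1) { // -1: clean end of the journal
            if (sentinel != SENTINEL) {
                throw new IOException("Expected to read a Sentinel Byte of '" + SENTINEL
                        + "' but got a value of '" + sentinel + "' instead");
            }
            int length = in.readInt();       // EOFException if the length was cut off
            if (length < 0) {
                throw new IOException("Invalid record length " + length);
            }
            byte[] payload = new byte[length];
            in.readFully(payload);           // EOFException if the payload was cut off
            records.add(payload);
        }
        return records;
    }
}
{code}

In this model, any write that is torn or overwritten in front of a record leaves some other byte where the sentinel should be. The reader then stops with an IOException instead of misinterpreting the rest of the file.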
 

 

On running the NiFi toolkit flowfile repo tool, we get the message below. It seems the partition-* directories are missing from the flowfile repository.

 

 
{code:java}
java -cp nifi-toolkit-flowfile-repo-1.9.2.jar:/usr/hdf/current/nifi/lib/:/usr/hdf/current/nifi/ext/:nifi-utils-1.9.2.jar org.apache.nifi.toolkit.repos.flowfile.RepairCorruptedFileEndings /flowfile_repository_backup/flowfile_repository/journals/ /flowfile_repository_backup/repaired_flowfile_repository/
Found no partitions within input Repository Directory /flowfile_repository_backup/flowfile_repository/journals

{code}
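The toolkit message suggests that RepairCorruptedFileEndings scans its input directory for partition-* subdirectories. The sketch below is a hypothetical re-creation of such a scan. The class PartitionScan is illustrative, and the prefix match is an assumption based on the message and the observation above.

It shows why a journals/ directory holding .journal files yields no partitions. Note also that the recovery log above shows SequentialAccessWriteAheadLog working from a checkpoint file and a journals/ directory, not from partition directories.

{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class PartitionScan {
    /** Returns direct subdirectories of repoDir whose names start with "partition-" (assumed naming). */
    static List<File> findPartitions(File repoDir) {
        List<File> partitions = new ArrayList<>();
        File[] children = repoDir.listFiles();
        if (children == null) { // not a directory, or it could not be read
            return partitions;
        }
        for (File child : children) {
            if (child.isDirectory() && child.getName().startsWith("partition-")) {
                partitions.add(child);
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        List<File> partitions = findPartitions(dir);
        if (partitions.isEmpty()) {
            System.out.println("No partition-* directories in " + dir);
        } else {
            partitions.forEach(p -> System.out.println("Partition: " + p));
        }
    }
}
{code}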
 

 

> MinimalLockingWriteAheadLog doesn't properly handle corrupted journals 
> -----------------------------------------------------------------------
>
>                 Key: NIFI-3273
>                 URL: https://issues.apache.org/jira/browse/NIFI-3273
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Joe Percivall
>            Assignee: Joe Witt
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> When NiFi is running, if the system dies abruptly (sudden power loss) without flushing writes, then anything that was being written to disk can become corrupted. A ticket for the provenance repository has already been created here[1]. The content repo handles this automatically since the content claim won't be valid if it hasn't been written out yet. The database repo is just a cache and is rebuilt anyway. The logs are handled by logback. The flow.xml.gz can be rolled back to one of the last archives (manually).
> This ticket is for the MinimalLockingWriteAheadLog which backs the FlowFile repo and local state. Originally brought up here[2] for MiNiFi, it will also affect NiFi.
> One possible solution is to restore transactions up until the corrupted id and then ignore the rest. This could cause state to become out of sync with the processed flowfiles (if FF repo is restored but local state cannot be fully restored) but given the rarity of the event I think it is an appropriate risk to accept.
> The workaround for the FF repo is to set "nifi.flowfile.repository.always.sync", but currently there is no way to set "always sync" for the local state provider.
> [1] https://issues.apache.org/jira/browse/NIFI-2890
> [2] https://community.hortonworks.com/questions/75280/why-does-my-minifi-flow-fail-to-run-when-turning-o.html
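The mitigation proposed in the description above, restoring transactions up to the corrupted one and ignoring the rest, can be sketched as follows. This is a minimal illustration over a hypothetical framing: a sentinel byte of 1, a 4-byte length, then the payload. The names TruncatingRecovery, frame and recover are illustrative. This is not the fix that shipped for this ticket or NiFi's actual journal format.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TruncatingRecovery {
    static final int SENTINEL = 1; // hypothetical per-transaction marker

    /** Transactions that were replayed, and whether a damaged tail was dropped. */
    static final class Result {
        final List<byte[]> transactions;
        final boolean tailDiscarded;

        Result(List<byte[]> transactions, boolean tailDiscarded) {
            this.transactions = transactions;
            this.tailDiscarded = tailDiscarded;
        }
    }

    /** Builds a journal in the hypothetical [sentinel][length][payload] layout. */
    static byte[] frame(byte[]... transactions) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (byte[] tx : transactions) {
                out.writeByte(SENTINEL);
                out.writeInt(tx.length);
                out.write(tx);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Replays transactions until the first damaged one, then ignores everything after it. */
    static Result recover(byte[] journal) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(journal));
        List<byte[]> kept = new ArrayList<>();
        try {
            int sentinel;
            while ((sentinel = in.read()) != -1) {
                if (sentinel != SENTINEL) {
                    return new Result(kept, true); // bad marker: stop here
                }
                int length = in.readInt();
                if (length < 0 || length > in.available()) {
                    return new Result(kept, true); // garbled or cut-off payload
                }
                byte[] payload = new byte[length];
                in.readFully(payload);
                kept.add(payload);
            }
            return new Result(kept, false); // reached a clean end of journal
        } catch (EOFException e) {
            return new Result(kept, true); // length field itself was cut off
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory journal
        }
    }
}
{code}

As the description points out, dropping the tail trades completeness for availability. Updates after the corrupt point are lost, and local state may disagree with the FlowFile repository. Setting nifi.flowfile.repository.always.sync forces repository updates to disk, which narrows the window in which such a torn tail can appear, at the cost of throughput.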



--
This message was sent by Atlassian Jira
(v8.3.4#803005)