Posted to commits@nifi.apache.org by "Bryan Bende (JIRA)" <ji...@apache.org> on 2015/11/06 20:44:11 UTC

[jira] [Commented] (NIFI-994) Processor to tail files

    [ https://issues.apache.org/jira/browse/NIFI-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994316#comment-14994316 ] 

Bryan Bende commented on NIFI-994:
----------------------------------

I've been testing this processor for the past two days and overall it is awesome! 

I have one scenario, which I have reproduced a couple of times, in which the processor appears to re-read lines that it has already read from the most recently rolled file. I added some logging to recoverRolledFiles() to see what was going on, and here is what prints out when the problem occurs:

{code}
2015-11-06 14:08:56,882 INFO [Timer-Driven Process Thread-10] o.a.nifi.processors.standard.TailFile TailFile[id=6b24b195-9fc6-4783-957f-13f891236de0] RECOVERED ROLLED FILES WITH STATE TIMESTAMP OF 1446836931000
2015-11-06 14:08:56,882 INFO [Timer-Driven Process Thread-10] o.a.nifi.processors.standard.TailFile TailFile[id=6b24b195-9fc6-4783-957f-13f891236de0] RECOVERED ROLLED FILE solr.log.1 WITH LAST MODIFIED TIME OF 1446836931000
2015-11-06 14:08:56,882 INFO [Timer-Driven Process Thread-10] o.a.nifi.processors.standard.TailFile TailFile[id=6b24b195-9fc6-4783-957f-13f891236de0] RECOVERED - firstFile LENGTH IS 262621 AND state.getPosition() IS 260201
2015-11-06 14:08:56,883 INFO [Timer-Driven Process Thread-10] o.a.nifi.processors.standard.TailFile TailFile[id=6b24b195-9fc6-4783-957f-13f891236de0] RECOVERED - EXPECTED RECOVERY CHECKSUM IS 3912972977 AND CHECKSUM RESULT IS 1100203812
{code}

TailFile was stopped when solr.log rolled, and I started it again shortly afterward. It picks up solr.log.1 correctly and determines that new data was written to it since the last run, because the file length is greater than state.getPosition(). It then calculates the checksum, which does not match the expected checksum. I can't figure out why the checksums differ, but because they do, the processor leaves that file in the list to be processed in full.
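
For reference, the comparison described above can be sketched roughly as follows. This is not the TailFile code: it assumes the stored checksum is a CRC32 over the first state.getPosition() bytes of the file, and the class and method names (RolledFileCheck, checksumOfPrefix, resumeOffset) are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Hypothetical helper, not part of NiFi. Assumes a CRC32 checksum.
public class RolledFileCheck {

    /** CRC32 of the first `length` bytes of `file` (fewer if the file is shorter). */
    public static long checksumOfPrefix(Path file, long length) throws IOException {
        try (InputStream in = Files.newInputStream(file);
             CheckedInputStream checked = new CheckedInputStream(in, new CRC32())) {
            byte[] buffer = new byte[8192];
            long remaining = length;
            while (remaining > 0) {
                int read = checked.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (read < 0) {
                    break; // end of file reached before `length` bytes
                }
                remaining -= read;
            }
            return checked.getChecksum().getValue();
        }
    }

    /**
     * Offset to resume reading the rolled file from: `position` if the first
     * `position` bytes still match the expected checksum, otherwise 0 (read in full).
     */
    public static long resumeOffset(Path rolledFile, long position, long expectedChecksum)
            throws IOException {
        if (Files.size(rolledFile) < position) {
            return 0L; // shorter than what was already read, so not the same content
        }
        return checksumOfPrefix(rolledFile, position) == expectedChecksum ? position : 0L;
    }

    public static void main(String[] args) throws IOException {
        Path rolled = Files.createTempFile("solr", ".log.1");
        try {
            Files.write(rolled, "line1\nline2\n".getBytes(StandardCharsets.UTF_8));
            long position = 6; // pretend only "line1\n" was read before the roll
            long expected = checksumOfPrefix(rolled, position);

            // More data is appended after the state was saved.
            Files.write(rolled, "line3\n".getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.APPEND);

            // Matching checksum: resume at the saved position.
            System.out.println(resumeOffset(rolled, position, expected));
            // Mismatched checksum: the whole file is read again.
            System.out.println(resumeOffset(rolled, position, expected + 1));
        } finally {
            Files.deleteIfExists(rolled);
        }
    }
}
```

In this sketch a mismatch means the whole file is read again, which is the behavior I am seeing. Under these assumptions, appending to the file after the state was saved should not change the checksum of the first state.getPosition() bytes.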

> Processor to tail files
> -----------------------
>
>                 Key: NIFI-994
>                 URL: https://issues.apache.org/jira/browse/NIFI-994
>             Project: Apache NiFi
>          Issue Type: New Feature
>    Affects Versions: 0.4.0
>            Reporter: Joseph Percivall
>            Assignee: Mark Payne
>             Fix For: 0.4.0
>
>         Attachments: 0001-NIFI-994-Initial-import-of-TailFile.patch, 0002-NIFI-994-Ensure-that-processor-is-not-valid-due-to-t.patch
>
>
> It's a very common data ingest situation to want to input text into the system by "tailing" a file, most commonly log files. Currently we don't have an easy way to do this. 
> A simple processor to tail a file would benefit many users. There would need to be an option to not just tail a file but pick up where the processor left off if it is interrupted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)