You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@ambari.apache.org by Oliver Szabo <os...@hortonworks.com> on 2017/07/02 15:06:56 UTC

Re: Review Request 60568: Log Feeder input config attribute "tail" should be clearer

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60568/#review179479
-----------------------------------------------------------


Ship it!




Ship It!

- Oliver Szabo


On June 30, 2017, 3:13 p.m., Miklos Gergely wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60568/
> -----------------------------------------------------------
> 
> (Updated June 30, 2017, 3:13 p.m.)
> 
> 
> Review request for Ambari, Oliver Szabo and Robert Nettleton.
> 
> 
> Bugs: AMBARI-21387
>     https://issues.apache.org/jira/browse/AMBARI-21387
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> tail = true should mean that only the latest log file is loaded
> tail = false should mean that the previous rolled over log files are loaded too, but not followed
> 
> AbstractInputFile must be prepared for previous rolled over files to check in even when the next part is actively handled, as the checking in is done when the data was moved to solr, so all file related fields were moved to maps, so the checkin can identify which file's status is checked in.
> 
> Log files are sorted alphabetically after identified, thus ensureing that the first one is the active one. In practice they were sorted already, but the File.listFiles function doesn't ensures it according to it's documentation, so it's better to do so explicitly.
> 
> The thread that is responsible for deleting unused checkpoint files should do one more check before deleting a checkpoint file because the file it belongs to is no more there, or it's file key doesn't matches the one in the checkpoint file: first it should look up if there is any other file in the same folder which have the same file key, in which case it is assumed that the file was renamed since. As the code that's looking for a checkpoint file to determine the next line to load does that by the name of the checkpoint file, which is generated from the file key it will successfuly identify it irrelevant of the name of the file.
> 
> I know, it's a bit complicated :) But I couldn't put it into words less complicated.
> 
> 
> Also fixed Log Search Config ZK bug not to handle not node data change related events.
> 
> 
> Diffs
> -----
> 
>   ambari-logsearch/ambari-logsearch-config-zookeeper/src/main/java/org/apache/ambari/logsearch/config/zookeeper/LogSearchConfigZK.java 26375e1 
>   ambari-logsearch/ambari-logsearch-logfeeder/src/main/java/org/apache/ambari/logfeeder/input/AbstractInputFile.java ab50eb7 
>   ambari-logsearch/ambari-logsearch-logfeeder/src/main/java/org/apache/ambari/logfeeder/input/Input.java c36f96b 
>   ambari-logsearch/ambari-logsearch-logfeeder/src/main/java/org/apache/ambari/logfeeder/input/InputFile.java fc40ca4 
>   ambari-logsearch/ambari-logsearch-logfeeder/src/main/java/org/apache/ambari/logfeeder/input/InputManager.java 19894ae 
>   ambari-logsearch/ambari-logsearch-logfeeder/src/main/java/org/apache/ambari/logfeeder/input/InputS3File.java 2b19503 
>   ambari-logsearch/ambari-logsearch-logfeeder/src/test/java/org/apache/ambari/logfeeder/input/InputManagerTest.java e9bbe7e 
> 
> 
> Diff: https://reviews.apache.org/r/60568/diff/1/
> 
> 
> Testing
> -------
> 
> Tested on local vagrant cluster, by loading previous files again and again, and by renaming rolled over files. In both cases no duplicate data were loaded, checkpoints were handled correctly.
> 
> 
> Thanks,
> 
> Miklos Gergely
> 
>