Posted to dev@flume.apache.org by "Attila Simon (JIRA)" <ji...@apache.org> on 2016/05/31 19:16:12 UTC

[jira] [Commented] (FLUME-2918) TaildirSource is underperforming with huge parent directories

    [ https://issues.apache.org/jira/browse/FLUME-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308404#comment-15308404 ] 

Attila Simon commented on FLUME-2918:
-------------------------------------

After checking the control flow, it turned out that the function ReliableTaildirEventReader.getMatchFiles, which is responsible for checking whether files have been added to or removed from the parent directory of the file pattern, is called every time the PollableSourceRunner$PollingRunner instructs the TaildirSource to harvest new data, even when nothing has changed in that directory. This check lists all of the files and filters them with a pattern match and an isDirectory check inside a single if statement, with the directory check evaluated first. Profiling showed that isDirectory is a much more expensive call than the pattern match on the filename, so swapping the order of the two expressions would speed up the evaluation (Java short-circuits boolean expressions, so isDirectory would only run for filenames that match the pattern) and hence the directory listing. In addition, caching the last modification time of the parent directory together with the list of matched files for each file pattern would prevent unnecessary rechecks.
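
A minimal sketch of the two ideas above, for illustration only. CachedDirMatcher, getMatchingFiles and getListingCount are hypothetical names, not the actual Flume code and not the patch for this issue. It assumes the parent directory's modification time changes whenever a file is added or removed.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Flume. */
final class CachedDirMatcher {
    private final File parentDir;
    private final Pattern fileNamePattern;

    // Cached state from the previous listing.
    private long lastSeenModified = -1L;
    private List<File> lastMatched = Collections.emptyList();

    // Counts real directory listings, only to make the caching observable.
    private int listingCount = 0;

    CachedDirMatcher(File parentDir, String fileNameRegex) {
        this.parentDir = parentDir;
        this.fileNamePattern = Pattern.compile(fileNameRegex);
    }

    List<File> getMatchingFiles() {
        long modified = parentDir.lastModified();
        if (modified == lastSeenModified) {
            // Directory looks unchanged: reuse the cached result, no listing.
            return lastMatched;
        }

        listingCount++;
        List<File> result = new ArrayList<>();
        File[] children = parentDir.listFiles();
        if (children != null) {
            for (File child : children) {
                // Cheap name match first; && short-circuits, so the costly
                // isDirectory() call only runs for names that match.
                if (fileNamePattern.matcher(child.getName()).matches()
                        && !child.isDirectory()) {
                    result.add(child);
                }
            }
        }

        lastSeenModified = modified;
        lastMatched = Collections.unmodifiableList(result);
        return lastMatched;
    }

    int getListingCount() {
        return listingCount;
    }
}
```

With one such object per file pattern, repeated polls against an unchanged directory cost a single lastModified() call instead of a full listing. One caveat of this sketch: on file systems with coarse timestamp granularity, a file created within the same timestamp tick as the previous listing could be missed until the directory changes again.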

> TaildirSource is underperforming with huge parent directories
> -------------------------------------------------------------
>
>                 Key: FLUME-2918
>                 URL: https://issues.apache.org/jira/browse/FLUME-2918
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>            Reporter: Attila Simon
>              Labels: performance
>             Fix For: v1.7.0
>
>
> The Taildir source causes high CPU utilization when a large number of files are sitting in the target directory. The file pattern matches only a single file, but the parent directory contains about 50,000 other files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)