Posted to dev@flume.apache.org by "Muhammad Ehsan ul Haque (JIRA)" <ji...@apache.org> on 2014/02/08 01:58:19 UTC

[jira] [Updated] (FLUME-2309) Spooling directory should not always consume the oldest file first.

     [ https://issues.apache.org/jira/browse/FLUME-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Muhammad Ehsan ul Haque updated FLUME-2309:
-------------------------------------------

    Attachment: FLUME-2309-0.patch

This patch provides:
* A consume order feature in the spooling directory source, which lets users state explicitly the order in which files are consumed from the spooling directory: oldest first, youngest first, or random.
* A fix for the old implementation of selecting the next file from the spooling directory. Previously, each file to be consumed was selected by sorting the directory listing, which can become extremely time consuming when there are many files (of the order of 10K or more). The new implementation instead does a linear scan when the consume order is oldest or youngest.
* Updates to the Flume user guide to match.
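
For readers who want a picture of the linear scan mentioned above, here is a minimal sketch of the idea. It is not the code from the attached patch: the class name ConsumeOrderSketch, the ConsumeOrder enum and the selectNext method are hypothetical names chosen for illustration, and the only library call relied on for timestamps is java.io.File.lastModified().

```java
import java.io.File;
import java.util.List;
import java.util.Optional;
import java.util.Random;

public class ConsumeOrderSketch {

    /** Hypothetical enum; it mirrors the three orders named in the patch notes. */
    public enum ConsumeOrder { OLDEST, YOUNGEST, RANDOM }

    /**
     * Picks the next file to consume in a single pass over the candidates,
     * without sorting them. Returns Optional.empty() if there are none.
     */
    public static Optional<File> selectNext(List<File> candidates,
                                            ConsumeOrder order,
                                            Random rng) {
        if (candidates.isEmpty()) {
            return Optional.empty();
        }
        if (order == ConsumeOrder.RANDOM) {
            // No timestamps are needed at all for a random pick.
            return Optional.of(candidates.get(rng.nextInt(candidates.size())));
        }
        File selected = null;
        long selectedTime = 0L;
        for (File candidate : candidates) {
            // Read the timestamp once per file; each call may touch the file system.
            long time = candidate.lastModified();
            boolean better = (order == ConsumeOrder.OLDEST)
                    ? time < selectedTime
                    : time > selectedTime;
            if (selected == null || better) {
                selected = candidate;
                selectedTime = time;
            }
        }
        return Optional.of(selected);
    }
}
```

Selecting one file this way costs one pass over the listing (O(n)), whereas sorting the whole listing for every selection costs O(n log n). On ties the sketch keeps the first file it saw; how the actual patch breaks ties is not stated here.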

> Spooling directory should not always consume the oldest file first.
> -------------------------------------------------------------------
>
>                 Key: FLUME-2309
>                 URL: https://issues.apache.org/jira/browse/FLUME-2309
>             Project: Flume
>          Issue Type: New Feature
>            Reporter: Muhammad Ehsan ul Haque
>            Priority: Minor
>         Attachments: FLUME-2309-0.patch
>
>
> The ReliableSpoolingFileEventReader reads the oldest file in the spooling directory first. This is done by listing the directory contents and then sorting the file list by timestamp. This may be very slow if there are a lot of files (of the order of 100K or more) in the directory.
> However, this is not always needed; there are simple cases in which the order in which files are consumed is not important.
> There should be an option to consume the files in arbitrary order, so that they can be consumed quickly without the sorting delay.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)