Posted to dev@flume.apache.org by "Jarek Jarcec Cecho (JIRA)" <ji...@apache.org> on 2016/01/13 23:37:39 UTC

[jira] [Commented] (FLUME-2703) HDFS sink: Ability to exclude time counter in fileName via sink configuration

    [ https://issues.apache.org/jira/browse/FLUME-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15097170#comment-15097170 ] 

Jarek Jarcec Cecho commented on FLUME-2703:
-------------------------------------------

I'm a bit concerned about introducing this functionality to Flume. Flume has been designed as an event-based system, not necessarily a file-based one, and trying to preserve the original filename might make it seem like we're transferring whole files, which is not the case. Even with SpoolDirectorySource, which reads whole files, events can be reordered or duplicated along the way, so the resulting file on HDFS might not end up with the same checksum as the original. This can also lead to a lot of issues when two independent Flume agents try to write to the same output file on HDFS.
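
To illustrate the collision concern, here is a hypothetical Java sketch. It assumes a simplified naming rule: by default the sink names a file <prefix>.<counter>, with the counter seeded from the writer's clock (roughly what the HDFS sink does today, ignoring in-use and file suffixes), and with the proposed hdfs.appendTimeCounter = false the counter is simply dropped. FileNameSketch, targetPath, and the paths and timestamps are made up for illustration and are not Flume APIs.

```java
import java.util.Objects;

/** Hypothetical, simplified model of HDFS sink file naming; not Flume's actual code. */
final class FileNameSketch {

    // Assumed rule: <dir>/<prefix>.<counter> by default, or <dir>/<prefix>
    // when the (proposed) time counter is turned off.
    static String targetPath(String dir, String prefix, long counter, boolean appendTimeCounter) {
        Objects.requireNonNull(dir);
        Objects.requireNonNull(prefix);
        String name = appendTimeCounter ? prefix + "." + counter : prefix;
        return dir + "/" + name;
    }

    public static void main(String[] args) {
        String dir = "/flume/ingest";   // hypothetical HDFS directory
        String basename = "access.log"; // same source file name spooled on two hosts
        long agentA = 1452724659000L;   // counter seeded from agent A's clock
        long agentB = 1452724659137L;   // counter seeded from agent B's clock

        // Default naming: the counters differ, so the two target paths differ.
        System.out.println(targetPath(dir, basename, agentA, true));
        System.out.println(targetPath(dir, basename, agentB, true));

        // Counter disabled: both agents resolve to the same path and would
        // contend for a single HDFS file.
        String a = targetPath(dir, basename, agentA, false);
        String b = targetPath(dir, basename, agentB, false);
        System.out.println(a + " collides: " + a.equals(b));
    }
}
```

With the counter, two agents spooling a file with the same name still produce distinct HDFS paths; without it, nothing in the name distinguishes them.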

> HDFS sink: Ability to exclude time counter in fileName via sink configuration 
> ------------------------------------------------------------------------------
>
>                 Key: FLUME-2703
>                 URL: https://issues.apache.org/jira/browse/FLUME-2703
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.7.0
>            Reporter: Hari
>            Priority: Minor
>         Attachments: FLUME-2703-0.patch
>
>
> HDFS sinks always append a time counter to file names, and this is not configurable.
> In some use cases, it is desirable to retain the original filename.
> For example, while ingesting a blob using the Spool directory source, it's desirable to retain the original filename (basename) in HDFS.
> This patch lets an HDFS sink be configured to override this behavior, while keeping the backward-compatible file naming convention (time counter appended) as the default. To drop the counter, set:
> hdfs.appendTimeCounter = false
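
As a hedged illustration of the backward-compatible default, the Java sketch below parses an excerpt of an agent configuration with java.util.Properties (channels and other required settings omitted) and reads the proposed flag, treating a missing key as true so existing configurations keep the time counter. It assumes the spooling-directory source's basenameHeader option, which stores the file name in a basename header, and an hdfs.filePrefix of %{basename}; hdfs.appendTimeCounter itself comes from FLUME-2703-0.patch and may not exist in released Flume versions. The class name, the agent name "agent" and the sink name "k1" are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch: reading the proposed flag with a backward-compatible default. */
final class AppendTimeCounterSketch {

    static final String FLAG = "hdfs.appendTimeCounter"; // key proposed in FLUME-2703-0.patch

    // Returns the sink-scoped value of the flag, defaulting to true so that
    // configurations written before the patch keep the time counter.
    static boolean appendTimeCounter(Properties agentConf, String sinkName) {
        String key = "agent.sinks." + sinkName + "." + FLAG;
        return Boolean.parseBoolean(agentConf.getProperty(key, "true").trim());
    }

    // Parses "key = value" lines; StringReader never actually fails, so the
    // checked exception is rethrown unchecked to keep callers simple.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Spooling-directory source that records each file's basename in a header,
        // and an HDFS sink that uses that header as the file prefix.
        String base = String.join("\n",
            "agent.sources.spool.type = spooldir",
            "agent.sources.spool.basenameHeader = true",
            "agent.sinks.k1.type = hdfs",
            "agent.sinks.k1.hdfs.path = /flume/ingest",
            "agent.sinks.k1.hdfs.filePrefix = %{basename}");

        Properties legacy = parse(base);
        Properties override = parse(base + "\nagent.sinks.k1.hdfs.appendTimeCounter = false");

        System.out.println(appendTimeCounter(legacy, "k1"));   // true: flag absent, old naming kept
        System.out.println(appendTimeCounter(override, "k1")); // false: counter dropped
    }
}
```

Defaulting to true keeps the change opt-in. Note that Boolean.parseBoolean maps any value other than "true" (case-insensitive) to false, so a typo in the value would silently drop the counter.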



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)