Posted to dev@flume.apache.org by "Mike Percy (JIRA)" <ji...@apache.org> on 2012/11/13 22:06:13 UTC

[jira] [Commented] (FLUME-1702) HDFSEventSink should write to a hidden file as opposed to a .tmp file

    [ https://issues.apache.org/jira/browse/FLUME-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496540#comment-13496540 ] 

Mike Percy commented on FLUME-1702:
-----------------------------------

Whoops, I didn't notice you had filed this JIRA, Brock. Adding the description from the duplicate ticket:

We should add the capability to the HDFS sink to specify a prefix for the .tmp files. I believe this needs to be configurable and disabled by default.
However, we should document that we recommend "_" or "." as a prefix for the temp files.
This is because Hadoop's default FileInputFormat skips files whose names begin with "_" or "." (hidden files).
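To illustrate why a prefix helps where a ".tmp" suffix alone does not, here is a minimal standalone sketch of the hidden-file naming rule described above. It re-implements the rule in plain Java rather than calling Hadoop, and the class name, method names and file names are hypothetical, chosen only for the example:

```java
import java.util.List;
import java.util.stream.Collectors;

public class HiddenFileRule {
    // Standalone re-implementation (not Hadoop code) of the rule described above:
    // names beginning with "_" or "." are treated as hidden and skipped as input.
    static boolean isHidden(String fileName) {
        return fileName.startsWith("_") || fileName.startsWith(".");
    }

    // Returns the names a default input listing would keep after applying the rule.
    static List<String> visibleInputs(List<String> listing) {
        return listing.stream()
                .filter(name -> !isHidden(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical directory listing taken while the sink is mid-write.
        List<String> listing = List.of(
                "FlumeData.1352841000000",        // closed, finished file
                "FlumeData.1352841060000.tmp",    // in flight, ".tmp" suffix only: still visible
                "_FlumeData.1352841120000.tmp");  // in flight, "_" prefix: skipped
        // Prints [FlumeData.1352841000000, FlumeData.1352841060000.tmp]
        System.out.println(visibleInputs(listing));
    }
}
```

Only the prefixed in-flight file drops out of the listing; the suffix-only ".tmp" file is still picked up as job input.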
                
> HDFSEventSink should write to a hidden file as opposed to a .tmp file
> ---------------------------------------------------------------------
>
>                 Key: FLUME-1702
>                 URL: https://issues.apache.org/jira/browse/FLUME-1702
>             Project: Flume
>          Issue Type: Improvement
>            Reporter: Brock Noland
>
> Currently we write to a .tmp file. The problem is that if MR jobs are run on the directory we are writing to, an MR job will commonly list the directory and pick up a .tmp file, and in the meantime the .tmp file is renamed, causing the job to fail when it runs.
> With Java MR you can use a PathFilter to avoid this; however, a custom solution is required for Pig, Hive, etc.
> Perhaps we should write to a hidden file so that MR never tries to process data in flight.
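The write-to-a-hidden-file idea can be sketched as a two-step protocol: write under a hidden in-use name, then rename to the final visible name once the file is closed. The sketch below is an assumption-laden illustration, not Flume's implementation. It uses the local filesystem via java.nio.file as a stand-in for HDFS, and the "_" prefix, ".tmp" suffix, class name and method names are all hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class HiddenInUseWriter {
    // Hypothetical naming for in-flight files; the "_" prefix hides them from
    // any listing that skips names starting with "_" or ".".
    static final String IN_USE_PREFIX = "_";
    static final String IN_USE_SUFFIX = ".tmp";

    // Lists non-hidden file names, sorted, the way a hidden-file-aware input listing sees them.
    static List<String> visibleFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.map(p -> p.getFileName().toString())
                    .filter(n -> !n.startsWith("_") && !n.startsWith("."))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Step 1: write the data under a hidden in-use name and return that path.
    static Path beginWrite(Path dir, String finalName, String data) throws IOException {
        Path inUse = dir.resolve(IN_USE_PREFIX + finalName + IN_USE_SUFFIX);
        Files.write(inUse, data.getBytes(StandardCharsets.UTF_8));
        return inUse;
    }

    // Step 2: once the file is closed, rename it to its final, visible name.
    static Path publish(Path inUse, String finalName) throws IOException {
        return Files.move(inUse, inUse.resolveSibling(finalName));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("flume-sketch");
        Path inUse = beginWrite(dir, "events.1", "event-a\n");
        System.out.println("while writing: " + visibleFiles(dir)); // prints "while writing: []"
        Path done = publish(inUse, "events.1");
        System.out.println("after rename: " + visibleFiles(dir));  // prints "after rename: [events.1]"
        // Clean up the temporary directory used for the demonstration.
        Files.delete(done);
        Files.delete(dir);
    }
}
```

A job that lists the directory mid-write never sees the in-use file, so the rename cannot pull a file out from under it.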

--
This message is automatically generated by JIRA.