Posted to dev@flume.apache.org by Brock Noland <br...@cloudera.com> on 2012/07/31 20:50:15 UTC

Re: Flume 1.2.0 HDFS Sink Output File Question

Hi,

I agree, it does not appear to work that way today. It looks like there is
already a JIRA for this: https://issues.apache.org/jira/browse/FLUME-1350

If you have any ideas or patches, please update that JIRA!
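
To make the expected behavior concrete, here is a toy Python sketch of the
time-bucketing case from the thread. It is not Flume code: `bucket_name` and
`BucketedWriter` are hypothetical names, UTC is assumed when expanding the
%m_%d_%H pattern, and the model only shows the behavior the original poster
expects, where crossing into a new hour closes the previous file and drops
its .tmp suffix.

```python
from datetime import datetime, timezone


def bucket_name(ts_millis, pattern="%m_%d_%H"):
    """Expand a time pattern for an event timestamp (UTC is an assumption)."""
    dt = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
    return dt.strftime(pattern)


class BucketedWriter:
    """Toy model: one open '.tmp' file per bucket; closing drops the suffix."""

    def __init__(self):
        self.open_files = {}    # bucket -> events still in a .tmp file
        self.closed_files = {}  # final file name -> events

    def write(self, ts_millis, event):
        bucket = bucket_name(ts_millis)
        # Expected behavior from the thread: moving to a new bucket closes
        # (and renames) any file that belongs to an older bucket.
        for old in [b for b in self.open_files if b != bucket]:
            self.close(old)
        self.open_files.setdefault(bucket, []).append(event)

    def close(self, bucket):
        # Closing a bucket moves its events to a name without '.tmp'.
        self.closed_files[bucket + ".events"] = self.open_files.pop(bucket)

    def listing(self):
        still_open = [b + ".events.tmp" for b in self.open_files]
        return sorted(list(self.closed_files) + still_open)


writer = BucketedWriter()
writer.write(1343742385766, "first event")             # one hour bucket
writer.write(1343742385766 + 3_600_000, "next event")  # one hour later
print(writer.listing())
```

With this model only the file for the current hour keeps its .tmp suffix.
The behavior reported in the thread is that the older file keeps it too.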

Brock

On Tue, Jul 31, 2012 at 1:37 PM, Yongcheng Li <Yo...@sas.com> wrote:

> Does anyone have a comment on using time (such as day/hour) as part of the
> file name? When it crosses the boundary of the defined time period, Flume
> creates a new file. What is the expected way of handling the old file (it
> does not meet any of the rollover conditions yet)? I would expect Flume to
> flush the data out to disk, close that file and remove the .tmp suffix. Am I
> right? It does not behave in this manner right now.
>
> Regards,
>
> Yongcheng
>
> *From:* Gumnaam Sur [mailto:gumnaam.sur@gmail.com]
> *Sent:* Tuesday, July 31, 2012 2:04 PM
> *To:* user@flume.apache.org
> *Subject:* Re: Flume 1.2.0 HDFS Sink Output File Question
>
> Is there a documented way of shutting down Flume?
>
> I just do kill -s TERM <pid>, and I do see Flume shutting down normally.
>
> But not all HDFS sink files are closed at times, even with a proper
> shutdown.
>
> For example, I was testing a setup with 5 HDFS sinks, and only the file
> from the last sink defined in the conf file was renamed to remove '.tmp';
> the other four still had the '.tmp' extension.
>
> On Tue, Jul 31, 2012 at 1:52 PM, Denny Ye <de...@gmail.com> wrote:
>
> Hi Yongcheng,
>
> Flume does not recheck the destination from the previous agent lifecycle,
> so the last temporary file is not reused by the current process. Two
> questions may explain this case: 1. Was that temporary file closed
> normally? If not, Flume should close it in an appropriate way, such as
> through the 'recoverLease' interface. 2. Can that file name be reused under
> the latest path pattern?
>
> Whichever case applies, we would want unified behavior for the path
> pattern. I agree with what you described. It may need some other people to
> discuss it.
>
> -Regards
>
> Denny Ye
>
> 2012/7/31 Yongcheng Li <Yo...@sas.com>
>
> Hi,
>
> I am using the Flume 1.2.0 HDFS sink. When Flume crashes (is killed), a
> file with a .tmp suffix is generated. I believe it contains the data that
> had been flushed to disk when the crash happened. But why does it have a
> .tmp suffix? Shouldn’t Flume just write it into a regular file
> (without .tmp)?
>
> I am using month/day/hour as part of my HDFS file name (%m_%d_%H). After
> the hour passes, there is still a file like
> 07_31_09.events.1343742385766.tmp with a size of zero. Shouldn’t Flume just
> close that file and remove the .tmp suffix? When I kill Flume, I can see
> data written into this file, but it still has a .tmp suffix.
>
> Thanks!
>
> Yongcheng
>



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/