Posted to user@flume.apache.org by Shara Shi <sh...@dhgate.com> on 2012/11/06 03:47:11 UTC

Re: HDFS sink leaves .tmp files

I ran into a similar problem: a .tmp file appears on HDFS when HDFS reboots.
I think the file is not closed properly, but I don't know how to
handle this problem.

Shara

-----Original Message-----
From: Kathleen Ting [mailto:kathleen@apache.org]
Sent: September 14, 2012 0:09
To: user@flume.apache.org
Subject: Re: HDFS sink leaves .tmp files

Chris, glad to hear and glad to be of help. Thanks for letting us know that
it worked.

Regards, Kathleen

On Thu, Sep 13, 2012 at 7:38 AM, Chris Neal <cw...@gmail.com> wrote:
> Just to follow up, the .tmp file problem did go away using 
> 1.3.0-SNAPSHOT on the HDFS sink agent.
>
> Thanks again Kathleen :)
>
>
> On Mon, Sep 10, 2012 at 8:38 PM, Chris Neal <cw...@gmail.com> wrote:
>>
>> Thanks Kathleen!
>> I'll download that build tomorrow morning and give it a whirl.
>>
>> Chris
>>
>>
>> On Mon, Sep 10, 2012 at 5:09 PM, Kathleen Ting <ka...@apache.org>
>> wrote:
>>>
>>> [Moving to cdh-user@cloudera.org |
>>> https://groups.google.com/a/cloudera.org/group/cdh-user/topics since 
>>> this is getting to be CDH specific]
>>> bcc: user@flume.apache.org
>>>
>>> Chris,
>>>
>>> When the file has not been closed by the client, the file size may 
>>> be shown as 0. The NameNode will not update the metadata about the 
>>> file until the block is completed or the file handle is closed. Even 
>>> if it updates at a block boundary, the size won't be accurate until 
>>> the file is closed.
>>>
>>> The metadata takes some time to populate even though the files may
>>> contain data. The CDH4.1 version of Flume includes FLUME-1238, which
>>> auto-rolls files and shortens the window during which these files
>>> appear to be 0 size.
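>>>
>>> As an illustration (the agent and sink names below are placeholders,
>>> not taken from your config), roll behavior on the HDFS sink is
>>> driven by settings like these:
>>>
>>> # Roll every 5 minutes or at 64 MB, whichever comes first;
>>> # disable rolling by event count.
>>> agent.sinks.hdfsSink.type = hdfs
>>> agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
>>> agent.sinks.hdfsSink.hdfs.rollInterval = 300
>>> agent.sinks.hdfsSink.hdfs.rollSize = 67108864
>>> agent.sinks.hdfsSink.hdfs.rollCount = 0
>>>
>>> Shorter roll intervals shrink the window in which an in-progress
>>> .tmp file shows a 0 size.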
>>>
>>> Since the CDH3u5 version of Flume is compatible with CDH3* Hadoop 
>>> and the CDH4 Flume is compatible with CDH4* Hadoop, you can download 
>>> the nightly build of flume-ng-1.2.0-cdh4.1.0 from 
>>> http://nightly.cloudera.com/cdh4/cdh/4/
>>>
>>> Regards, Kathleen
>>>
>>> On Mon, Sep 10, 2012 at 1:08 PM, Bhaskar V. Karambelkar 
>>> <bh...@gmail.com> wrote:
>>> > Don't know about RPM, but there's a 1.2.x tarball of the 1.2 build 
>>> > @ http://archive.cloudera.com/cdh/3/flume-ng-1.2.0-cdh3u5.tar.gz
>>> >
>>> >
>>> > On Mon, Sep 10, 2012 at 3:01 PM, Chris Neal <cw...@gmail.com> wrote:
>>> >>
>>> >> Just checked, and from Cloudera, 1.1.0+121-1.cdh4.0.1.p0.1.el6 is 
>>> >> still the latest from their yum repo.
>>> >>
>>> >>
>>> >> On Mon, Sep 10, 2012 at 1:59 PM, Chris Neal <cw...@gmail.com> wrote:
>>> >>>
>>> >>> I'm using a combination :)
>>> >>>
>>> >>> The application tier is 1.3.0-SNAPSHOT. The HDFS tier is CentOS,
>>> >>> and I grabbed the latest (at the time) from the CDH repo.  Its
>>> >>> version is:  1.1.0+121-1.cdh4.0.1.p0.1.el6
>>> >>>
>>> >>> If the issue is on the HDFS sink side, then it could definitely
>>> >>> be in my version!
>>> >>> I'll check if Cloudera has a more recent version to update to.
>>> >>>
>>> >>> Thanks!
>>> >>> Chris
>>> >>>
>>> >>>
>>> >>> On Mon, Sep 10, 2012 at 12:37 PM, Kathleen Ting 
>>> >>> <ka...@apache.org>
>>> >>> wrote:
>>> >>>>
>>> >>>> Chris, Eran, this appears to be FLUME-1238, which was fixed in 
>>> >>>> Flume-1.2.0. Can you let me know if you are using Flume-1.2.0?
>>> >>>>
>>> >>>> Thanks, Kathleen
>>> >>>>
>>> >>>> On Mon, Sep 10, 2012 at 8:21 AM, Chris Neal <cw...@gmail.com>
>>> >>>> wrote:
>>> >>>> > Glad to know it's not just me :)
>>> >>>> >
>>> >>>> >
>>> >>>> > On Mon, Sep 10, 2012 at 10:16 AM, Eran Kutner 
>>> >>>> > <er...@gigya.com>
>>> >>>> > wrote:
>>> >>>> >>
>>> >>>> >> I have the same problem. I roll every minute, so I have
>>> >>>> >> tons of those .tmp files.
>>> >>>> >>
>>> >>>> >> -eran
>>> >>>> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> On Mon, Sep 10, 2012 at 6:02 PM, Chris Neal 
>>> >>>> >> <cw...@gmail.com>
>>> >>>> >> wrote:
>>> >>>> >>>
>>> >>>> >>> I'm still seeing this consistently every 24-hour period.
>>> >>>> >>> Does this sound like a configuration issue, an issue with 
>>> >>>> >>> the Exec source, or an issue with the HDFS sink?
>>> >>>> >>>
>>> >>>> >>> Thanks!
>>> >>>> >>>
>>> >>>> >>>
>>> >>>> >>> On Wed, Aug 29, 2012 at 9:18 AM, Chris Neal 
>>> >>>> >>> <cw...@gmail.com>
>>> >>>> >>> wrote:
>>> >>>> >>>>
>>> >>>> >>>> Hi all,
>>> >>>> >>>>
>>> >>>> >>>> I have an Exec Source running a tail -F on a
>>> >>>> >>>> log4j-generated log file that gets rolled once a day.  It
>>> >>>> >>>> seems that when log4j rolls the file to the new date, the
>>> >>>> >>>> HDFS sink ends up with a .tmp file.  I haven't figured out
>>> >>>> >>>> if there is any data loss yet, but I was curious: is this
>>> >>>> >>>> expected behavior?
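>>> >>>> >>>>
>>> >>>> >>>> The setup is roughly this (agent, source, and file names
>>> >>>> >>>> below are illustrative, not my exact config):
>>> >>>> >>>>
>>> >>>> >>>> # Exec source tailing the log4j-managed application log
>>> >>>> >>>> agent.sources.appLog.type = exec
>>> >>>> >>>> agent.sources.appLog.command = tail -F /var/log/app/app.log
>>> >>>> >>>> agent.sources.appLog.channels = memChannel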
>>> >>>> >>>>
>>> >>>> >>>> Thanks for your time.
>>> >>>> >>>> Chris
>>> >>>> >>>
>>> >>>> >>>
>>> >>>> >>
>>> >>>> >
>>> >>>
>>> >>>
>>> >>
>>> >
>>
>>
>


Re: HDFS sink leaves .tmp files

Posted by Kathleen Ting <ka...@apache.org>.
Hi Shara,

The .tmp file contains the actual data before it is renamed to the
final file. If the .tmp file is still open, then Flume is still
holding it open in order to write to it. If Flume somehow dies and is
unable to clean up its .tmp files, then once the client lease expires,
the NameNode will consider them closed (but it will not rename them
from .tmp; we recommend writing a script to do that).
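Something along these lines could serve as a starting point (a rough
sketch using the Hadoop FileSystem API; the sink directory and the
one-day age cutoff are placeholders, not a tested tool):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TmpFileCleaner {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path sinkDir = new Path("/flume/events"); // placeholder sink directory

        // Only rename .tmp files old enough that no live Flume agent
        // should still be writing to them (placeholder: one day).
        long cutoff = System.currentTimeMillis() - 24L * 60 * 60 * 1000;

        for (FileStatus status : fs.listStatus(sinkDir)) {
            Path p = status.getPath();
            if (p.getName().endsWith(".tmp")
                    && status.getModificationTime() < cutoff) {
                // Strip the .tmp suffix to recover the intended final name.
                String finalName =
                        p.getName().substring(0, p.getName().length() - 4);
                fs.rename(p, new Path(p.getParent(), finalName));
            }
        }
        fs.close();
    }
}

You could run something like this from cron against each sink
directory once the lease on the orphaned files has expired.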

Regards, Kathleen

On Mon, Nov 5, 2012 at 6:47 PM, Shara Shi <sh...@dhgate.com> wrote:
> I ran into a similar problem: a .tmp file appears on HDFS when HDFS reboots.
> I think the file is not closed properly, but I don't know how to
> handle this problem.
>
> Shara