Posted to user@flume.apache.org by Arun Gujjar <ar...@yahoo.com> on 2014/11/12 00:35:19 UTC

How to convert *.bz2.tmp to *.bz2 file after restarting the instance

Hi,


Whenever we restart the Flume agent, it creates a new HDFS file and starts writing data to it. The file it was writing before the restart is left as *.bz2.tmp, and our Hive queries cannot read the data in that file. I have two questions:
1. Could you please suggest how we can convert this .bz2.tmp file to a .bz2 file? Today we lose the data that is in the .bz2.tmp file.
2. Is there a way to configure Flume to resume writing to the existing .bz2.tmp file instead of creating a new file?

Can someone please answer this?
Regards,
Arun

   

Re: How to convert *.bz2.tmp to *.bz2 file after restarting the instance

Posted by Mike Percy <mp...@apache.org>.
Depending on your configuration setup, every batch is likely writing a
stream of bzip2 and these are effectively concatenated together into a
single file. So Hive should (hopefully) be reading all of them except the
last (partial) batch, which is OK to throw away because Flume will retry it
when it comes back up. If Hive doesn't support that, maybe you should try
writing in a format other than compressed text -- possibly compressed Avro
or compressed SequenceFile (both of these formats support compression
internally and are handled well by most tools).
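
To illustrate the point about concatenated bzip2 streams, here is a minimal Python sketch using only the standard `bz2` module. It is not Hive's or Hadoop's reader, and the function name `read_complete_streams` is made up for this example. It only shows the general property: complete streams that are appended back to back can be decoded one after another, and a truncated final stream can be detected and dropped.

```python
import bz2


def read_complete_streams(data: bytes) -> bytes:
    """Decode every complete bzip2 stream in `data`, dropping a partial tail.

    Hypothetical helper for illustration only.
    """
    decoded = []
    while data:
        decompressor = bz2.BZ2Decompressor()
        try:
            chunk = decompressor.decompress(data)
        except OSError:
            # The remaining bytes are not a valid bzip2 stream; stop here.
            break
        if not decompressor.eof:
            # The stream ended before its end-of-stream marker, which is
            # what an interrupted final batch would look like. Discard it.
            break
        decoded.append(chunk)
        # Bytes after the end of this stream belong to the next stream.
        data = decompressor.unused_data
    return b"".join(decoded)
```

For example, feeding it two complete streams followed by half of a third returns only the contents of the first two. Whether a given Hive or Hadoop version reads such files the same way is something to verify against your own cluster.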

Regarding the .tmp file, this should be manually renamed to a non-tmp file
when a server crash or ungraceful shutdown happens (or set up a cron job to
look for old ones). Flume doesn't currently try to remember the .tmp files
it previously wrote to and try to rename or continue them.
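
The cron job idea could be sketched roughly as follows. This is a sketch under stated assumptions, not a tested production script: it assumes the `hdfs` command is on the PATH, that `hdfs dfs -ls -R` prints the usual eight-column listing (permissions, replication, owner, group, size, date, time, path), that the in-use suffix is `.tmp`, and that any `.tmp` file older than the chosen threshold is no longer being written. The function names and the one-hour default are hypothetical.

```python
import subprocess
from datetime import datetime, timedelta

TMP_SUFFIX = ".tmp"  # assumed in-use suffix


def parse_ls_line(line):
    """Return (mtime, path) for a file line of `hdfs dfs -ls -R`, else None."""
    parts = line.split(None, 7)
    if len(parts) < 8 or not parts[0].startswith("-"):
        return None  # "Found N items" header or a directory entry
    mtime = datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
    return mtime, parts[7]


def plan_renames(ls_output, now, min_age):
    """List (source, destination) pairs for .tmp files at least min_age old."""
    plan = []
    for line in ls_output.splitlines():
        parsed = parse_ls_line(line)
        if parsed is None:
            continue
        mtime, path = parsed
        if path.endswith(TMP_SUFFIX) and now - mtime >= min_age:
            plan.append((path, path[: -len(TMP_SUFFIX)]))
    return plan


def rename_stale_tmp_files(root, min_age=timedelta(hours=1)):
    """Rename stale .tmp files under `root` by shelling out to the hdfs CLI."""
    listing = subprocess.run(
        ["hdfs", "dfs", "-ls", "-R", root],
        check=True, capture_output=True, text=True,
    )
    for src, dst in plan_renames(listing.stdout, datetime.now(), min_age):
        subprocess.run(["hdfs", "dfs", "-mv", src, dst], check=True)
```

The age threshold is there so the job does not rename the file the running agent still has open; it should be comfortably longer than the sink's roll interval. Keeping the planning step separate from the `hdfs` calls makes it possible to print and review the plan before renaming anything.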

Mike

On Tue, Nov 11, 2014 at 3:35 PM, Arun Gujjar <ar...@yahoo.com>
wrote:

> Hi,
>
>
> Whenever we restart the Flume agent, it creates a new HDFS file and starts
> writing data to it. The file it was writing before the restart is left as
> *.bz2.tmp, and our Hive queries cannot read the data in that file.
> I have two questions:
> 1. Could you please suggest how we can convert this .bz2.tmp file to a .bz2
> file? Today we lose the data that is in the .bz2.tmp file.
> 2. Is there a way to configure Flume to resume writing to the existing
> .bz2.tmp file instead of creating a new file?
>
> Can someone please answer this?
>
> Regards
> Arun
>
>
>