Posted to user@crunch.apache.org by wu lihu <ro...@gmail.com> on 2016/09/21 07:48:19 UTC

How to deal with gz-compressed log files

Hi everyone,
  I have a question about processing log files that are gz-compressed.
Is there any example of how to do that?

Re: How to deal with gz-compressed log files

Posted by wu lihu <ro...@gmail.com>.
Oh... I forgot that Crunch is only an abstraction over a MapReduce pipeline.
But has anyone tried using it with S3 as the job output? It's strange: the
job seems to freeze after writing the _SUCCESS marker to S3. The last lines
that appear in my job log file are below:

2016-09-22 10:05:37,194 INFO
org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob
(Thread-5): Job status available at:
http://ip-172-31-103-28.cn-north-1.compute.internal:20888/proxy/application_1472715051930_0002/

2016-09-22 10:12:13,692 INFO
com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream
(Thread-5): close closed:false
s3://mgtv-ott-data-archive/vodstat-output/ov/year=2016/month=09/day=21/_SUCCESS
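
For reference, a minimal sketch of pointing a Crunch pipeline's output at an
S3 path (the bucket and prefix below are placeholders, not the real job's
configuration):

import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.To;

public class S3OutputExample {
  public static void main(String[] args) throws Exception {
    Pipeline pipeline = new MRPipeline(S3OutputExample.class);

    // Placeholder input path on HDFS.
    PCollection<String> lines = pipeline.readTextFile("hdfs:///logs/2016/09/21");

    // The output target is just a path; on EMR the s3:// scheme is served by
    // the EMR filesystem (EMRFS), which is the MultipartUploadOutputStream
    // that shows up in the log excerpt above.
    lines.write(To.textFile("s3://my-bucket/vodstat-output/ov/year=2016/month=09/day=21"));

    pipeline.done();
  }
}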

2016-09-22 1:09 GMT+08:00 Josh Wills <jo...@gmail.com>:
> I don't follow. Hadoop handles compression transparently for most of the
> commonly used input formats and compression schemes; you shouldn't have to
> do anything.
>
> On Wed, Sep 21, 2016 at 12:53 AM wu lihu <ro...@gmail.com> wrote:
>>
>> Hi everyone,
>>   I have a question about processing log files that are gz-compressed.
>> Is there any example of how to do that?

Re: How to deal with gz-compressed log files

Posted by Josh Wills <jo...@gmail.com>.
I don't follow. Hadoop handles compression transparently for most of the
commonly used input formats and compression schemes; you shouldn't have to
do anything.
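
A minimal sketch of what that looks like with Crunch (the paths below are
placeholders, and the actual parsing step is elided):

import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;

public class GzipLogsExample {
  public static void main(String[] args) throws Exception {
    // Placeholder locations; point these at your own input and output paths.
    String input = "hdfs:///logs/2016/09/21";    // directory containing *.gz files
    String output = "hdfs:///logs-out/2016/09/21";

    Pipeline pipeline = new MRPipeline(GzipLogsExample.class);

    // readTextFile uses Hadoop's TextInputFormat, which picks a codec from the
    // file extension (.gz -> GzipCodec) and decompresses each file on the fly,
    // so no extra decompression code is needed in the pipeline itself.
    PCollection<String> lines = pipeline.readTextFile(input);

    // ... parse / filter / aggregate the lines here ...

    pipeline.writeTextFile(lines, output);
    pipeline.done();
  }
}

One thing to keep in mind: gzip is not a splittable format, so each .gz file
is read by a single map task.
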
On Wed, Sep 21, 2016 at 12:53 AM wu lihu <ro...@gmail.com> wrote:

> Hi everyone,
>   I have a question about processing log files that are gz-compressed.
> Is there any example of how to do that?
>