Posted to user@hadoop.apache.org by Amit Sela <am...@infolinks.com> on 2013/09/17 22:37:04 UTC

Bzip2 vs Gzip

Hi all,
I'm using Hadoop 1.0.4 and gzip to store the logs that Hadoop processes
(logs are gzipped into block-sized files).
I read that bzip2 is splittable. Is that the case in Hadoop 1.0.4? Does that mean
that any input file bigger than the block size will be split between map tasks?
What are the tradeoffs between the two?

Thanks.

Re: Bzip2 vs Gzip

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Yes, bzip2 is splittable.
Tradeoffs: I have not done much experimentation with codecs.

Thanks,
Rahul


On Wed, Sep 18, 2013 at 2:07 AM, Amit Sela <am...@infolinks.com> wrote:

> Hi all,
> I'm using hadoop 1.0.4 and using gzip to keep the logs processed by hadoop
> (logs are gzipped into block size files).
> I read that bzip2 is splittable. Is it so in hadoop 1.0.4 ? Does that mean
> that any input file bigger then block size will be split between maps ?
> What are the tradeoffs between the two ?
>
> Thanks.
>
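
On the tradeoff question, a quick local measurement may help. The sketch below is not Hadoop code: it uses only the Python standard library to compare compressed size and compression time for gzip and bzip2 on a synthetic log. The log format and the names `compare_codecs` and `make_sample_log` are made up for illustration. As a general rule bzip2 tends to compress text more tightly than gzip but more slowly, and the real numbers depend heavily on the data. The sketch says nothing about splittability; whether the input formats in a particular Hadoop release actually split bzip2 files is worth checking against that release.

```python
import bz2
import gzip
import time


def compare_codecs(data: bytes) -> dict:
    """Return compressed size and wall-clock compression time for gzip and bzip2."""
    results = {}
    for name, compress, decompress in (
        ("gzip", gzip.compress, gzip.decompress),
        ("bzip2", bz2.compress, bz2.decompress),
    ):
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Round-trip check so the reported sizes describe valid output.
        assert decompress(packed) == data
        results[name] = {"bytes": len(packed), "seconds": elapsed}
    return results


def make_sample_log(lines: int = 20000) -> bytes:
    """Build a synthetic log; the line format is hypothetical."""
    rows = (
        f"2013-09-17 22:{i % 60:02d}:{(i * 7) % 60:02d} INFO request id={i} status=200\n"
        for i in range(lines)
    )
    return "".join(rows).encode("utf-8")


if __name__ == "__main__":
    sample = make_sample_log()
    print("raw", len(sample))
    for codec, stats in compare_codecs(sample).items():
        print(codec, stats["bytes"], f"{stats['seconds']:.3f}s")
```

Running this against a sample of your own logs instead of the synthetic data should give a more realistic picture of the size versus CPU cost for each codec.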
