Posted to hdfs-user@hadoop.apache.org by Zac Shepherd <zs...@about.com> on 2013/08/21 20:00:52 UTC
bz2 decompress in place
Hello,
I'm using an ancient version of Hadoop (0.20.2+228) and trying to run an
m/r job over an 18 GB bz2-compressed file. Since splitting support for
bzip2 wasn't added until 0.21.0, a single mapper gets allocated and will
take far too long to complete. Is there a way I can decompress the
file in place, or am I going to have to copy it down, decompress it
locally, and then copy it back up to the cluster?
Thanks for any help,
Zac Shepherd
Re: bz2 decompress in place
Posted by Zac Shepherd <zs...@about.com>.
Just because I always appreciate it when someone posts the answer to
their own question:
We have some Java that does

    BZip2Codec bz2 = new BZip2Codec();
    CompressionOutputStream cout = bz2.createOutputStream(out);

for compression. We just wrote another version that does

    BZip2Codec bz2 = new BZip2Codec();
    CompressionInputStream cin = bz2.createInputStream(in);

for decompression. Not rocket science, but the decompression side of
BZip2Codec is poorly documented, so I thought I'd send this along.
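For anyone who would rather not write Java for a one-off job, the same streaming idea works from the shell: pipe the compressed bytes out of HDFS through bzip2 and write the plain bytes straight back, so the full file never needs to be staged locally. A minimal sketch, assuming the hadoop CLI and bzip2 are on your PATH (the HDFS paths are hypothetical examples); the executable lines below exercise the same pipe pattern against local files so it's easy to try:

```shell
# On the cluster (hypothetical paths; requires the hadoop CLI):
#   hadoop fs -cat /data/big.bz2 | bzip2 -dc | hadoop fs -put - /data/big.txt
# ("-put -" reads from stdin, so decompressed data streams straight into HDFS.)

# The same pipe pattern against local files, as a quick sanity check:
echo "hello world" | bzip2 -c > demo.bz2   # compress some sample data
bzip2 -dc demo.bz2 > demo.txt              # stream-decompress to stdout, redirect to a file
cat demo.txt                               # prints "hello world"
rm -f demo.bz2 demo.txt                    # clean up
```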
On 08/21/2013 02:00 PM, Zac Shepherd wrote:
> Hello,
>
> I'm using an ancient version of Hadoop (0.20.2+228) and trying to run a
> m/r job over a bz2 compressed file (18G). Since splitting support
> wasn't added until 0.21.0, a single mapper is getting allocated and will
> take far too long to complete. Is there a way that I can decompress the
> file in place, or am I going to have to copy it down, decompress it
> locally, and then copy it back up to the cluster?
>
> Thanks for any help,
> Zac Shepherd