Posted to hdfs-user@hadoop.apache.org by Zac Shepherd <zs...@about.com> on 2013/08/21 20:00:52 UTC

bz2 decompress in place

Hello,

I'm using an ancient version of Hadoop (0.20.2+228) and trying to run an 
m/r job over an 18G bz2-compressed file.  Since splitting support for 
bzip2 wasn't added until 0.21.0, a single mapper gets allocated and will 
take far too long to complete.  Is there a way that I can decompress the 
file in place, or am I going to have to copy it down, decompress it 
locally, and then copy it back up to the cluster?

Thanks for any help,
Zac Shepherd

Re: bz2 decompress in place

Posted by Zac Shepherd <zs...@about.com>.
Just because I always appreciate it when someone posts the answer to 
their own question:

We have some java that does
	BZip2Codec bz2 = new BZip2Codec();
	CompressionOutputStream cout = bz2.createOutputStream(out); // out: an HDFS output stream
for compression.

We just wrote another version that does
	BZip2Codec bz2 = new BZip2Codec();
	CompressionInputStream cin = bz2.createInputStream(in); // in: an HDFS input stream opened on the .bz2 file
for decompression.

Not rocket science, but the decompress side of BZip2Codec is poorly 
documented, so I thought I'd send it along.
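For anyone landing here from a search: the fragments above wire together into a small streaming tool. Below is a hypothetical end-to-end sketch (not the poster's actual program) that reads a .bz2 file from HDFS, decompresses it through BZip2Codec, and writes the uncompressed copy back to HDFS, so nothing is ever copied to a local disk. The class name and argument handling are my own assumptions; the Hadoop calls (FileSystem.open/create, BZip2Codec.createInputStream, IOUtils.copyBytes) are the standard API.

```java
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.BZip2Codec;

// Hypothetical sketch: stream-decompress an HDFS .bz2 file back onto HDFS.
public class Bz2Expand {
    public static void main(String[] args) throws Exception {
        Path src = new Path(args[0]); // e.g. /data/big.bz2
        Path dst = new Path(args[1]); // e.g. /data/big

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Wrap the compressed HDFS stream in the codec's decompressing stream.
        BZip2Codec codec = new BZip2Codec();
        InputStream in = codec.createInputStream(fs.open(src));
        OutputStream out = fs.create(dst);

        // Copy decompressed bytes to the destination; the final "true"
        // closes both streams when the copy finishes.
        IOUtils.copyBytes(in, out, conf, true);
    }
}
```

Run it with `hadoop jar` so the cluster configuration is on the classpath; once the uncompressed copy exists, the m/r job can be pointed at it and will get one mapper per block as usual.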

On 08/21/2013 02:00 PM, Zac Shepherd wrote:
> Hello,
>
> I'm using an ancient version of Hadoop (0.20.2+228) and trying to run a
> m/r job over a bz2 compressed file (18G).  Since splitting support
> wasn't added until 0.21.0, a single mapper is getting allocated and will
> take far too long to complete.  Is there a way that I can decompress the
> file in place, or am I going to have to copy it down, decompress it
> locally, and then copy it back up to the cluster?
>
> Thanks for any help,
> Zac Shepherd

