Posted to common-user@hadoop.apache.org by Keith Wiley <kw...@keithwiley.com> on 2011/08/05 01:28:39 UTC

Upload, then decompress archive on HDFS?

Instead of "hd fs -put" hundreds of files of X megs, I want to do it once on a gzipped (or zipped) archive, one file, much smaller total megs.  Then I want to decompress the archive on HDFS?  I can't figure out what "hd fs" type command would do such a thing.

Thanks.

________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"What I primarily learned in grad school is how much I *don't* know.
Consequently, I left grad school with a higher ignorance to knowledge ratio than
when I entered."
                                           --  Keith Wiley
________________________________________________________________________________


Re: Upload, then decompress archive on HDFS?

Posted by Harsh J <ha...@cloudera.com>.
I suppose we could do with a simple identity-mapper/identity-reducer
example/tool that could easily be reused for purposes such as these.
Could you file a JIRA for this?

The -text option is like -cat, but adds codec and some file-format
detection.  Hopefully it will work for your case.
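
For a single gzipped file, -text can even be combined with a piped put to
decompress in place on HDFS, streaming through the client (a sketch, with
hypothetical paths):

    # -text picks a codec from the file extension and writes the
    # decompressed bytes to stdout; put with "-" reads from stdin.
    hadoop fs -text /data/input.gz | hadoop fs -put - /data/input.txt

And for a tar archive of many files, one non-MapReduce possibility is a
client-side loop that streams each member into HDFS.  This assumes GNU
tar, simple member names without spaces, and a copy of the archive still
on the local/gateway disk; it also re-reads the archive once per member,
so it is only a sketch:

    for f in $(tar tzf files.tar.gz); do
        # Extract one member to stdout and pipe it straight into HDFS.
        tar xzOf files.tar.gz "$f" | hadoop fs -put - "/data/$f"
    done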

On Fri, Aug 5, 2011 at 8:44 PM, Keith Wiley <kw...@keithwiley.com> wrote:
> I can envision an M/R job for manipulating HDFS, such as (de)compressing files and resaving them back to HDFS.  I just didn't think it should be necessary to *write a program* for something so seemingly minimal.  This (tarring/compressing/etc.) seems like such an obvious method for moving data back and forth that I would expect the tools to support it.
>
> I'll read up on "-text".  Maybe that really is what I want, although I'm doubtful since my data has nothing to do with text.  Anyway, I'll see what I can find on it.
>
> Thanks.
>
> On Aug 4, 2011, at 9:04 PM, Harsh J wrote:
>
>> Keith,
>>
>> The 'hadoop fs -text' tool does decompress a file given to it when
>> needed and able, but you could also run a distributed MapReduce job
>> that converts the data from compressed to decompressed; that would
>> be much faster.
>>
>> On Fri, Aug 5, 2011 at 4:58 AM, Keith Wiley <kw...@keithwiley.com> wrote:
>>> Instead of running "hadoop fs -put" on hundreds of files of X megs each, I want to do it once on a gzipped (or zipped) archive: one file, much smaller in total.  Then I want to decompress the archive on HDFS.  I can't figure out what "hadoop fs"-type command would do such a thing.
>>>
>>> Thanks.
>
>
> ________________________________________________________________________________
> Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com
>
> "It's a fine line between meticulous and obsessive-compulsive and a slippery
> rope between obsessive-compulsive and debilitatingly slow."
>                                           --  Keith Wiley
> ________________________________________________________________________________
>
>



-- 
Harsh J

Re: Upload, then decompress archive on HDFS?

Posted by Keith Wiley <kw...@keithwiley.com>.
I can envision an M/R job for manipulating HDFS, such as (de)compressing files and resaving them back to HDFS.  I just didn't think it should be necessary to *write a program* for something so seemingly minimal.  This (tarring/compressing/etc.) seems like such an obvious method for moving data back and forth that I would expect the tools to support it.

I'll read up on "-text".  Maybe that really is what I want, although I'm doubtful since my data has nothing to do with text.  Anyway, I'll see what I can find on it.

Thanks.

On Aug 4, 2011, at 9:04 PM, Harsh J wrote:

> Keith,
> 
> The 'hadoop fs -text' tool does decompress a file given to it when
> needed and able, but you could also run a distributed MapReduce job
> that converts the data from compressed to decompressed; that would
> be much faster.
> 
> On Fri, Aug 5, 2011 at 4:58 AM, Keith Wiley <kw...@keithwiley.com> wrote:
>> Instead of running "hadoop fs -put" on hundreds of files of X megs each, I want to do it once on a gzipped (or zipped) archive: one file, much smaller in total.  Then I want to decompress the archive on HDFS.  I can't figure out what "hadoop fs"-type command would do such a thing.
>> 
>> Thanks.


________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
                                           --  Keith Wiley
________________________________________________________________________________


Re: Upload, then decompress archive on HDFS?

Posted by Harsh J <ha...@cloudera.com>.
Keith,

The 'hadoop fs -text' tool does decompress a file given to it when
needed and able, but you could also run a distributed MapReduce job
that converts the data from compressed to decompressed; that would be
much faster.
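
A sketch of that idea using Hadoop Streaming with an identity mapper,
for line-oriented text data only (the streaming jar path below varies
by version, and the HDFS paths are hypothetical):

    # Map-only identity job: the input format transparently decompresses
    # the .gz inputs, and the output is rewritten uncompressed, with one
    # map task per input file running in parallel across the cluster.
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -D mapred.reduce.tasks=0 \
        -input /data/compressed \
        -output /data/uncompressed \
        -mapper /bin/cat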

On Fri, Aug 5, 2011 at 4:58 AM, Keith Wiley <kw...@keithwiley.com> wrote:
> Instead of running "hadoop fs -put" on hundreds of files of X megs each, I want to do it once on a gzipped (or zipped) archive: one file, much smaller in total.  Then I want to decompress the archive on HDFS.  I can't figure out what "hadoop fs"-type command would do such a thing.
>
> Thanks.
>
> ________________________________________________________________________________
> Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com
>
> "What I primarily learned in grad school is how much I *don't* know.
> Consequently, I left grad school with a higher ignorance to knowledge ratio than
> when I entered."
>                                           --  Keith Wiley
> ________________________________________________________________________________
>
>



-- 
Harsh J