
Posted to common-user@hadoop.apache.org by Manhee Jo <jo...@nttdocomo.com> on 2011/07/07 03:52:17 UTC

Re: tar or hadoop archive

Do you know how to set the number of map/reduce tasks to something other than 1 during
Hadoop archiving?
I've tried -Dmapred.map.tasks=2 (we are actually using 0.19.2 :( ), but in
vain.

thanks,
manhee

----- Original Message ----- 
From: "Joey Echeverria" <jo...@cloudera.com>
To: <co...@hadoop.apache.org>
Sent: Tuesday, June 28, 2011 8:46 AM
Subject: Re: tar or hadoop archive


> Yes, you can see a picture describing HAR files in this old blog post:
>
> http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>
> -Joey
>
> On Mon, Jun 27, 2011 at 4:36 PM, Rita <rm...@gmail.com> wrote:
>> So it keeps an index of the files?
>>
>>
>>
>> On Mon, Jun 27, 2011 at 10:10 AM, Joey Echeverria <jo...@cloudera.com> wrote:
>>
>>> The advantage of Hadoop archive files is that they let you access the
>>> files stored in them directly. For example, if you archived three files
>>> (a.txt, b.txt, c.txt) in an archive called foo.har, you could cat one
>>> of the three files using the hadoop command line:
>>>
>>> hadoop fs -cat har:///user/joey/out/foo.har/a.txt
>>>
>>> You can also copy files out of the archive or use files in the archive
>>> as input to MapReduce jobs.
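Direct access to a.txt works because a HAR file stores an index alongside the packed file data, so a reader can jump straight to one member instead of scanning everything before it. The following is a simplified, local illustration of that idea in POSIX shell, not the real HAR on-disk layout: the names part-0 and _index are borrowed loosely from HAR, and the one-line-per-file index format (name, offset, length) is an assumption made purely for illustration.

```shell
#!/bin/sh
# Simplified sketch of an indexed archive (NOT the actual HAR format):
# small files are concatenated into one "part" file, and a separate index
# records each member's byte offset and length. The index format here is
# a hypothetical "name offset length" line per file.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Three small input files, mirroring a.txt, b.txt, c.txt from the example.
printf 'alpha\n' > a.txt
printf 'bravo bravo\n' > b.txt
printf 'charlie\n' > c.txt

# Pack the files into part-0 and write one index line per member.
: > part-0
: > _index
offset=0
for f in a.txt b.txt c.txt; do
  len=$(wc -c < "$f" | tr -d ' ')   # tr strips padding some wc versions add
  cat "$f" >> part-0
  echo "$f $offset $len" >> _index
  offset=$((offset + len))
done

# Read one member directly: look up its offset/length in the index, then
# copy only that byte range out of part-0.
read_member() {
  # Intentionally unquoted so the index line splits into $1 $2 $3.
  set -- $(grep "^$1 " _index)
  dd if=part-0 bs=1 skip="$2" count="$3" 2>/dev/null
}

read_member b.txt
```

Running it prints `bravo bravo`: the index says b.txt starts at byte 6 and is 12 bytes long, so only that range of part-0 is copied out.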
>>>
>>> -Joey
>>>
>>> On Mon, Jun 27, 2011 at 3:06 AM, Rita <rm...@gmail.com> wrote:
>>> > We use Hadoop/HDFS to archive data. I archive a lot of files by creating
>>> > one large tar file and then placing it in HDFS. Is it better to use a
>>> > Hadoop archive for this, or is it essentially the same thing?
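For comparison with the tar approach: a tar file has no central index, so extracting one member means reading the archive from the start and stepping through member headers until the requested one turns up. The following is a minimal local sketch using standard tar options (file names are hypothetical). For a tar stored in HDFS, the same extraction would typically be done by piping `hadoop fs -cat` of the tar file (path hypothetical) into `tar -xOf -`, which streams the archive through the client.

```shell
#!/bin/sh
# Build a tar of three small files, then extract a single member to stdout.
# tar has no index: it walks member headers in order until it finds b.txt.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

printf 'alpha\n' > a.txt
printf 'bravo bravo\n' > b.txt
printf 'charlie\n' > c.txt

tar -cf archive.tar a.txt b.txt c.txt

# -x extract, -O write to stdout instead of disk, -f read from archive.tar
tar -xOf archive.tar b.txt

# Reading the archive from stdin works the same way; this is the form a
# tar streamed out of HDFS would use (the hadoop side is not run here).
tar -xOf - b.txt < archive.tar
```

Both commands print `bravo bravo`. The result is the same as the indexed lookup, but tar has to scan forward through the archive to locate the member, which is the practical difference from a Hadoop archive for large tar files.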
>>> >
>>> > --
>>> > --- Get your facts first, then you can distort them as you please.--
>>> >
>>>
>>>
>>>
>>> --
>>> Joseph Echeverria
>>> Cloudera, Inc.
>>> 443.305.9434
>>>
>>
>>
>>
>> --
>> --- Get your facts first, then you can distort them as you please.--
>>
>
>
>
> -- 
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>