Posted to common-user@hadoop.apache.org by Ranjith <ra...@gmail.com> on 2012/05/22 03:18:30 UTC

CopyFromLocal

I have always wondered about this and am not sure what actually happens. When I fire a MapReduce job to copy data over in a distributed fashion, I expect to see mappers executing the copy. What happens with a copy command from 'hadoop fs'?

Thanks,
Ranjith

Re: CopyFromLocal

Posted by Ranjith <ra...@gmail.com>.
Harsh,

Thanks for the response bud. Appreciate it!

Thanks,
Ranjith

On May 21, 2012, at 11:09 PM, Harsh J <ha...@cloudera.com> wrote:

> Ranjith,
> 
> MapReduce and HDFS are two different things. MapReduce uses HDFS (and
> can use any other FS as well) to do some efficient work, but HDFS does
> not use MapReduce.
> 
> A simple HDFS transfer is done directly over the network. Yes, it's just a
> block-by-block copy/write to/from the relevant DataNodes, done over
> network sockets at each end.

Re: CopyFromLocal

Posted by Harsh J <ha...@cloudera.com>.
Ranjith,

MapReduce and HDFS are two different things. MapReduce uses HDFS (and
can use any other FS as well) to do some efficient work, but HDFS does
not use MapReduce.

A simple HDFS transfer is done directly over the network. Yes, it's just a
block-by-block copy/write to/from the relevant DataNodes, done over
network sockets at each end.
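
To make the block-by-block idea concrete, here is a rough sketch in plain
Python. It is a toy simulation, not HDFS client code: FakeDataNode,
copy_from_local, the tiny BLOCK_SIZE and the round-robin placement are all
made up for illustration, and replication is left out entirely. In a real
cluster the NameNode decides where each block goes and the bytes travel
over sockets rather than into a dict.

```python
import io

# Tiny block size so the example is easy to follow (an assumption;
# real HDFS blocks are far larger).
BLOCK_SIZE = 4


class FakeDataNode:
    """Hypothetical stand-in for a DataNode: stores blocks in memory."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write_block(self, block_id, data):
        self.blocks[block_id] = data


def copy_from_local(src, datanodes, block_size=BLOCK_SIZE):
    """Read src sequentially in one thread and hand each block to a node.

    Placement is simple round-robin (an assumption for this sketch).
    Returns a list of (block_id, node_name) pairs.
    """
    block_map = []
    block_id = 0
    while True:
        data = src.read(block_size)
        if not data:
            break
        node = datanodes[block_id % len(datanodes)]
        node.write_block(block_id, data)
        block_map.append((block_id, node.name))
        block_id += 1
    return block_map


nodes = [FakeDataNode("dn1"), FakeDataNode("dn2")]
block_map = copy_from_local(io.BytesIO(b"hello hdfs!"), nodes)
print(block_map)
```

The point of the sketch is only that a single loop in a single client
process reads the file and pushes one block after another; no mappers are
involved anywhere.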

On Tue, May 22, 2012 at 8:58 AM, Ranjith <ra...@gmail.com> wrote:
> Thanks, Harsh. So when it connects directly to the DataNodes, it does not fire off any mappers. How does it get the data over, then? Is it just a block-by-block copy?
>
> Thanks,
> Ranjith



-- 
Harsh J

Re: CopyFromLocal

Posted by Ranjith <ra...@gmail.com>.
Thanks, Harsh. So when it connects directly to the DataNodes, it does not fire off any mappers. How does it get the data over, then? Is it just a block-by-block copy?

Thanks,
Ranjith

On May 21, 2012, at 9:22 PM, Harsh J <ha...@cloudera.com> wrote:

> Ranjith,
> 
> Are you speaking of DistCp?
> http://hadoop.apache.org/common/docs/current/distcp.html
> 
> An 'fs -copyFromLocal' otherwise just runs as a single program that
> connects to your DFS nodes and writes data from a single client
> thread, and is not distributed on its own.

Re: CopyFromLocal

Posted by Harsh J <ha...@cloudera.com>.
Ranjith,

Are you speaking of DistCp?
http://hadoop.apache.org/common/docs/current/distcp.html

An 'fs -copyFromLocal' otherwise just runs as a single program that
connects to your DFS nodes and writes data from a single client
thread, and is not distributed on its own.
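
As a rough illustration of the difference, the sketch below contrasts the
two shapes in plain Python. It is only a simulation under stated
assumptions: copy_one, single_client_copy and distributed_copy are
hypothetical names, the "store" is just a dict, and local threads stand in
for map tasks that would really run on separate machines. DistCp's actual
partitioning logic may differ from the simple slicing used here.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_one(path, store):
    """Pretend to copy a single file by recording it in the store."""
    store[path] = f"copied:{path}"
    return path


def single_client_copy(paths, store):
    """Shape of 'fs -copyFromLocal': one program, one thread, files in order."""
    return [copy_one(p, store) for p in paths]


def distributed_copy(paths, store, num_workers=2):
    """Shape of DistCp: split the file list, each worker copies its share.

    Threads stand in for map tasks here (an assumption for this sketch).
    """
    chunks = [paths[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(
            lambda chunk: [copy_one(p, store) for p in chunk], chunks
        )
    return [list(r) for r in results]


paths = ["a", "b", "c", "d", "e"]
store_single = {}
store_dist = {}
single_result = single_client_copy(paths, store_single)
dist_result = distributed_copy(paths, store_dist)
print(single_result)
print(dist_result)
```

Both variants end up copying the same files; the only difference is
whether one client does all the work or the file list is divided among
several workers.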

On Tue, May 22, 2012 at 6:48 AM, Ranjith <ra...@gmail.com> wrote:
>
> I have always wondered about this and am not sure what actually happens. When I fire a MapReduce job to copy data over in a distributed fashion, I expect to see mappers executing the copy. What happens with a copy command from 'hadoop fs'?
>
> Thanks,
> Ranjith



-- 
Harsh J