
Posted to common-user@hadoop.apache.org by "Xie, Tao" <xi...@gmail.com> on 2009/04/23 11:14:37 UTC

The mechanism of choosing target datanodes

Suppose a cluster has many datanodes and I want to copy a large file into DFS.
If the replication factor is set to 1, will the namenode put the file
data on one datanode or on several? I wonder whether the file will be split
into blocks, with different blocks placed on different datanodes.

-- 
View this message in context: http://www.nabble.com/The-mechanism-of-choosing-target-datanodes-tp23193235p23193235.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

Re: The mechanism of choosing target datanodes

Posted by jason hadoop <ja...@gmail.com>.

I haven't checked the code for any special cases of replication = 1.
The sequence for writing a block is:

   1. Get a list of datanodes from the namenode for the block replicas; the
   requesting host is the first datanode returned if the requesting host is
   itself a datanode.
   2. Send the block, along with the list of datanodes to receive it, to the
   first datanode in the list.
   3. That datanode sends the block to the next datanode in the list.
   4. Step 3 repeats until the block is fully replicated.
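
A rough sketch of that sequence in Python, in case it helps. This is a toy
model, not the Hadoop implementation: the NameNode and DataNode classes,
choose_targets, and write_block are all made-up names, and the target
selection here just takes nodes in list order after the local one, whereas
the real namenode applies its own placement policy.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy datanode: stores blocks and forwards them down the pipeline."""
    name: str
    blocks: dict = field(default_factory=dict)

    def receive(self, block_id, data, downstream):
        # Store the block locally, then pass it to the next node (step 3).
        self.blocks[block_id] = data
        if downstream:
            # Step 4: each node repeats the forward until the list is empty.
            downstream[0].receive(block_id, data, downstream[1:])


class NameNode:
    """Toy namenode: only knows the list of datanodes."""

    def __init__(self, datanodes):
        self.datanodes = datanodes

    def choose_targets(self, replication, client_host=None):
        # Step 1: the requesting host comes first if it is a datanode.
        # ASSUMPTION: the remaining targets are taken in list order; the
        # real placement policy is more involved.
        local = [d for d in self.datanodes if d.name == client_host]
        others = [d for d in self.datanodes if d.name != client_host]
        return (local + others)[:replication]


def write_block(namenode, block_id, data, replication, client_host=None):
    targets = namenode.choose_targets(replication, client_host)
    if targets:
        # Step 2: send the block plus the rest of the list to the first node.
        targets[0].receive(block_id, data, targets[1:])
    return [t.name for t in targets]


if __name__ == "__main__":
    nodes = [DataNode(f"dn{i}") for i in range(1, 5)]
    nn = NameNode(nodes)
    print(write_block(nn, "blk_1", b"payload", 3, client_host="dn3"))
    print(write_block(nn, "blk_2", b"payload", 1))
```

With replication = 1 the list has a single entry, so in this model the block
is stored once and nothing is forwarded.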



On Thu, Apr 23, 2009 at 2:08 PM, Jerome Banks <jb...@quantcast.com> wrote:

> FYI, the pipe v2 results were created with
> com.quantcast.armor.jobs.pipev3.util.CountVG, inputting the results from
> com.quantcast.armor.jobs.pipev3.util.MyHarvestV2 (the mainline pipev2
> harvest).
> The pipe v3 results were a one-day run of BloomDaily for 04/12/2009.
> The CSV files were generated with TopNFlow.
>
>
> On 4/23/09 1:56 PM, "Amr Awadallah" <aa...@cloudera.com> wrote:
>
> yes, it will be split across many nodes, and if possible each block will
> get a different datanode.
>
> see following link for more details:
>
>
> http://hadoop.apache.org/core/docs/current/hdfs_design.html#Data+Organization
>
> -- amr
>
> Alex Loddengaard wrote:
> > I believe the blocks will be distributed across data nodes and not local to
> > only one data node.  If this wasn't the case, then running a MR job on the
> > file would only be local to one task tracker.
> >
> > Alex
> >
> > On Thu, Apr 23, 2009 at 2:14 AM, Xie, Tao <xi...@gmail.com> wrote:
> >
> >
> >> If a cluster has many datanodes and I want to copy a large file into DFS.
> >> If the replication number is set to 1, does the namenode will put the file
> >> data on one datanode or several nodes? I wonder if the file will be split
> >> into blocks then different unique blocks are on different datanodes.
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/The-mechanism-of-choosing-target-datanodes-tp23193235p23193235.html
> >> Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >>
> >>
> >>
> >
> >
>
>


-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422

Re: The mechanism of choosing target datanodes

Posted by Jerome Banks <jb...@quantcast.com>.

FYI, the pipe v2 results were created with com.quantcast.armor.jobs.pipev3.util.CountVG, inputting the results from com.quantcast.armor.jobs.pipev3.util.MyHarvestV2 (the mainline pipev2 harvest).
The pipe v3 results were a one-day run of BloomDaily for 04/12/2009.
The CSV files were generated with TopNFlow.

On 4/23/09 1:56 PM, "Amr Awadallah" <aa...@cloudera.com> wrote:

yes, it will be split across many nodes, and if possible each block will
get a different datanode.

see following link for more details:

http://hadoop.apache.org/core/docs/current/hdfs_design.html#Data+Organization

-- amr

Alex Loddengaard wrote:
> I believe the blocks will be distributed across data nodes and not local to
> only one data node.  If this wasn't the case, then running a MR job on the
> file would only be local to one task tracker.
>
> Alex
>
> On Thu, Apr 23, 2009 at 2:14 AM, Xie, Tao <xi...@gmail.com> wrote:
>
>
>> If a cluster has many datanodes and I want to copy a large file into DFS.
>> If the replication number is set to 1, does the namenode will put the file
>> data on one datanode or several nodes? I wonder if the file will be split
>> into blocks then different unique blocks are on different datanodes.
>>
>> --
>> View this message in context:
>> http://www.nabble.com/The-mechanism-of-choosing-target-datanodes-tp23193235p23193235.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>
>>
>>
>
>

Re: The mechanism of choosing target datanodes

Posted by Amr Awadallah <aa...@cloudera.com>.

Yes, the file will be split into blocks across many nodes, and if possible
each block will get a different datanode.
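
To make the arithmetic concrete, here is a small Python sketch. It is only an
illustration under stated assumptions: the 64 MB block size is assumed (it is
configurable), block_count and place_blocks are hypothetical helpers, and the
round-robin assignment is a stand-in for the namenode's per-block choice, not
how HDFS actually picks targets. It also assumes the client is not itself a
datanode, since in that case the local node is preferred for the first
replica.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # ASSUMPTION: 64 MB blocks; this is configurable


def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed for a file of file_size bytes (ceiling division)."""
    return (file_size + block_size - 1) // block_size


def place_blocks(file_size, datanodes, block_size=BLOCK_SIZE):
    """Map each block index to one datanode, assuming replication = 1.

    Round-robin is a stand-in so that consecutive blocks land on different
    nodes; the real choice is made by the namenode for each block.
    """
    n = block_count(file_size, block_size)
    return {i: datanodes[i % len(datanodes)] for i in range(n)}


if __name__ == "__main__":
    one_gib = 1024 * 1024 * 1024
    print(block_count(one_gib))
    print(place_blocks(200 * 1024 * 1024, ["dn1", "dn2", "dn3"]))
```

The point is that the unit of placement is the block, not the file, so even
with replication = 1 a large file ends up spread over several datanodes.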

See the following link for more details:

http://hadoop.apache.org/core/docs/current/hdfs_design.html#Data+Organization

-- amr

Alex Loddengaard wrote:
> I believe the blocks will be distributed across data nodes and not local to
> only one data node.  If this wasn't the case, then running a MR job on the
> file would only be local to one task tracker.
>
> Alex
>
> On Thu, Apr 23, 2009 at 2:14 AM, Xie, Tao <xi...@gmail.com> wrote:
>
>   
>> If a cluster has many datanodes and I want to copy a large file into DFS.
>> If the replication number is set to 1, does the namenode will put the file
>> data on one datanode or several nodes? I wonder if the file will be split
>> into blocks then different unique blocks are on different datanodes.
>>
>> --
>> View this message in context:
>> http://www.nabble.com/The-mechanism-of-choosing-target-datanodes-tp23193235p23193235.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>
>>
>>     
>
>

Re: The mechanism of choosing target datanodes

Posted by Alex Loddengaard <al...@cloudera.com>.

I believe the blocks will be distributed across datanodes rather than stored
on only one. If that weren't the case, an MR job run on the file could only
have data-local tasks on a single task tracker.

Alex

On Thu, Apr 23, 2009 at 2:14 AM, Xie, Tao <xi...@gmail.com> wrote:

>
> If a cluster has many datanodes and I want to copy a large file into DFS.
> If the replication number is set to 1, does the namenode will put the file
> data on one datanode or several nodes? I wonder if the file will be split
> into blocks then different unique blocks are on different datanodes.
>
> --
> View this message in context:
> http://www.nabble.com/The-mechanism-of-choosing-target-datanodes-tp23193235p23193235.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>