Posted to user@hadoop.apache.org by Mohit Anchlia <mo...@gmail.com> on 2012/10/31 01:10:28 UTC

Replication

With respect to replication: if I run a Pig job from one of the nodes
within the Hadoop cluster, do I always end up writing one replica to that
client node and the remaining two replicas to other nodes?

Re: Replication

Posted by Harsh J <ha...@cloudera.com>.

Hi,

Yes. If you are purely a regular client (a non-DN box) writing to HDFS,
then the DNs are selected at random (but within the cross-rack write
policy, if it applies to your environment).

On Wed, Oct 31, 2012 at 6:43 AM, Mohit Anchlia <mo...@gmail.com> wrote:
> Thanks and if it is not the datanode then I am guessing namenode decides the
> nodes in replication pipeline?
>
>
> On Tue, Oct 30, 2012 at 5:36 PM, ranjith raghunath
> <ra...@gmail.com> wrote:
>>
>> If your client node is a datanode with your cluster then the first copy
>> does get written to that data node.
>>
>> Experts please feel free to correct me here.
>>
>> On Oct 30, 2012 7:11 PM, "Mohit Anchlia" <mo...@gmail.com> wrote:
>>>
>>> With respect to replication if I run pig job from one of the nodes within
>>> the Hadoop cluster then do I always end up with writing 1 replica copy to
>>> that client node always and remaining 2 replica copies to other nodes?
>>>
>
>



-- 
Harsh J

Re: Replication

Posted by ranjith raghunath <ra...@gmail.com>.

The namenode decides the replica locations in either case. When the client
is running on a datanode, the first replica is placed on that same node.
Hope this makes sense.
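
To make the two cases concrete, here is a rough Python sketch of how the
default placement is usually described: first replica on the writer's node
if the writer is a datanode (otherwise a random node), second replica on a
different rack, third replica on the same rack as the second. This is only
a simulation under those assumptions, not the actual namenode code, and the
function name, rack names, and node names are all made up for illustration.

```python
import random


def choose_targets(topology, writer=None, replication=3, rng=random):
    """Pick datanodes for one block (illustrative simulation only).

    topology maps a rack name to the list of datanode names on that rack.
    writer is the host the client runs on; it may or may not be a datanode.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    all_nodes = list(rack_of)

    # First replica: the writer's own node if it is a datanode,
    # otherwise a randomly selected datanode.
    first = writer if writer in rack_of else rng.choice(all_nodes)
    chosen = [first]

    def pick(candidates):
        # Choose randomly among candidates that are not already used.
        pool = [n for n in candidates if n not in chosen]
        return rng.choice(pool) if pool else None

    while len(chosen) < replication:
        if len(chosen) == 1:
            # Second replica: prefer a node on a different rack than the first.
            target = pick([n for n in all_nodes if rack_of[n] != rack_of[first]])
        elif len(chosen) == 2:
            # Third replica: prefer the second replica's rack, on another node.
            target = pick(topology[rack_of[chosen[1]]])
        else:
            target = None
        # Fall back to any unused node when the preference cannot be met.
        target = target or pick(all_nodes)
        if target is None:
            break  # fewer datanodes than requested replicas
        chosen.append(target)
    return chosen


if __name__ == "__main__":
    # Hypothetical two-rack cluster.
    topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
    print(choose_targets(topology, writer="dn1"))       # first entry is dn1
    print(choose_targets(topology, writer="edge-box"))  # first entry is random
```

On a real cluster, the fsck tool with the -files -blocks -locations options
shows where the replicas of a file actually ended up.
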
On Oct 30, 2012 8:13 PM, "Mohit Anchlia" <mo...@gmail.com> wrote:

> Thanks and if it is not the datanode then I am guessing namenode decides
> the nodes in replication pipeline?
>
> On Tue, Oct 30, 2012 at 5:36 PM, ranjith raghunath <
> ranjith.raghunath1@gmail.com> wrote:
>
>> If your client node is a datanode with your cluster then the first copy
>> does get written to that data node.
>>
>> Experts please feel free to correct me here.
>>  On Oct 30, 2012 7:11 PM, "Mohit Anchlia" <mo...@gmail.com> wrote:
>>
>>> With respect to replication if I run pig job from one of the nodes
>>> within the Hadoop cluster then do I always end up with writing 1 replica
>>> copy to that client node always and remaining 2 replica copies to other
>>> nodes?
>>>
>>>
>>
>

Re: Replication

Posted by Mohit Anchlia <mo...@gmail.com>.

Thanks. And if the client is not a datanode, then I am guessing the
namenode decides the nodes in the replication pipeline?

On Tue, Oct 30, 2012 at 5:36 PM, ranjith raghunath <
ranjith.raghunath1@gmail.com> wrote:

> If your client node is a datanode with your cluster then the first copy
> does get written to that data node.
>
> Experts please feel free to correct me here.
>  On Oct 30, 2012 7:11 PM, "Mohit Anchlia" <mo...@gmail.com> wrote:
>
>> With respect to replication if I run pig job from one of the nodes within
>> the Hadoop cluster then do I always end up with writing 1 replica copy to
>> that client node always and remaining 2 replica copies to other nodes?
>>
>>
>

Re: Replication

Posted by ranjith raghunath <ra...@gmail.com>.

If your client node is a datanode within your cluster, then the first copy
does get written to that datanode.

Experts please feel free to correct me here.
On Oct 30, 2012 7:11 PM, "Mohit Anchlia" <mo...@gmail.com> wrote:

> With respect to replication if I run pig job from one of the nodes within
> the Hadoop cluster then do I always end up with writing 1 replica copy to
> that client node always and remaining 2 replica copies to other nodes?
>
>
