Posted to user@hadoop.apache.org by Fengyun RAO <ra...@gmail.com> on 2014/04/03 13:10:39 UTC

recommended block replication for small cluster

I know the default replication factor is 3, which keeps data available even
when 2 nodes crash at the same time.

However, for a small cluster, e.g. 10 to 20 nodes, the probability that 2
nodes crash at the same time is very small.

Can we simply set the replication factor to 2, or are there any other
drawbacks?

Any information is appreciated!
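
To put rough numbers on the question above, here is a minimal sketch, not a statement about how HDFS actually places blocks. It assumes each block's replicas sit on distinct, uniformly random datanodes and ignores rack awareness and re-replication; the function names and the cluster sizes are made up for illustration. Under that model, the chance that a given block loses every replica when `failed` nodes are down at once is C(failed, r) / C(nodes, r).

```python
from math import comb  # requires Python 3.8+


def block_loss_probability(nodes: int, replication: int, failed: int) -> float:
    """Chance that one block has all of its replicas on the failed nodes.

    Assumption (illustrative only): replicas are placed on distinct nodes
    chosen uniformly at random, with no rack awareness.
    """
    if failed < replication:
        # Fewer failed nodes than replicas: at least one copy survives.
        return 0.0
    return comb(failed, replication) / comb(nodes, replication)


def any_block_lost_probability(nodes: int, replication: int,
                               failed: int, blocks: int) -> float:
    """Chance that at least one of `blocks` independently placed blocks is lost."""
    p = block_loss_probability(nodes, replication, failed)
    return 1.0 - (1.0 - p) ** blocks


if __name__ == "__main__":
    # Hypothetical 10-node cluster holding 1000 blocks, 2 nodes down at once.
    for r in (2, 3):
        risk = any_block_lost_probability(nodes=10, replication=r,
                                          failed=2, blocks=1000)
        print(f"replication={r}: P(some block lost) = {risk:.4f}")
```

The sketch only covers what happens once two nodes are already down together; how likely that event is on a given cluster is a separate question that it does not try to answer.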

Re: recommended block replication for small cluster

Posted by Fengyun RAO <ra...@gmail.com>.
Thanks, Peyman!

I know it's configurable. What I don't know is whether it's typical to
reduce it in a small cluster,

or whether there is a recommended setting, such as 2 for a 10-node cluster,
3 for a 100-node cluster, and 4 for a 1000-node cluster.
Or should it just be set to 3, no matter how big the cluster is?



2014-04-03 21:13 GMT+08:00 Peyman Mohajerian <mo...@gmail.com>:

> Replication is also about data locality when running map-reduce jobs in a
> larger cluster. You can reduce the replication factor; that's why it's a
> configurable parameter.
>
>
> On Thu, Apr 3, 2014 at 7:10 AM, Fengyun RAO <ra...@gmail.com> wrote:
>
>> I know the default replication factor is 3, which keeps data available
>> even when 2 nodes crash at the same time.
>>
>> However, for a small cluster, e.g. 10 to 20 nodes, the probability that 2
>> nodes crash at the same time is very small.
>>
>> Can we simply set the replication factor to 2, or are there any other
>> drawbacks?
>>
>> Any information is appreciated!
>>
>
>

Re: recommended block replication for small cluster

Posted by Peyman Mohajerian <mo...@gmail.com>.
Replication is also about data locality when running map-reduce jobs in a
larger cluster. You can reduce the replication factor; that's why it's a
configurable parameter.
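
As a concrete illustration of "configurable": the cluster-wide default for newly written files is the `dfs.replication` property in hdfs-site.xml, and the replication of files that already exist can be changed with `hdfs dfs -setrep`. The sketch below only builds that command line; the helper names and the path `/data/example` are hypothetical, and it defaults to a dry run so nothing is executed unless asked.

```python
import subprocess
from typing import List


def build_setrep_command(replication: int, path: str, wait: bool = True) -> List[str]:
    """Build the `hdfs dfs -setrep` argument list for an existing HDFS path."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    cmd = ["hdfs", "dfs", "-setrep"]
    if wait:
        # -w asks the command to wait until the target replication is reached.
        cmd.append("-w")
    cmd += [str(replication), path]
    return cmd


def set_replication(replication: int, path: str, dry_run: bool = True) -> List[str]:
    """Return the command; run it only when dry_run is False (needs the hdfs CLI)."""
    cmd = build_setrep_command(replication, path)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical path; prints the command instead of running it.
    print(" ".join(set_replication(2, "/data/example")))
```

Lowering the replication of existing files this way does not change the default applied to files written afterwards; that still comes from `dfs.replication`.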


On Thu, Apr 3, 2014 at 7:10 AM, Fengyun RAO <ra...@gmail.com> wrote:

> I know the default replication factor is 3, which keeps data available
> even when 2 nodes crash at the same time.
>
> However, for a small cluster, e.g. 10 to 20 nodes, the probability that 2
> nodes crash at the same time is very small.
>
> Can we simply set the replication factor to 2, or are there any other
> drawbacks?
>
> Any information is appreciated!
>
