Posted to dev@ambari.apache.org by Matt <mi...@gmail.com> on 2015/12/30 12:04:55 UTC

Review Request 41795: Make the hdfs replication as 1 when it is single node cluster

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41795/
-----------------------------------------------------------

Review request for Ambari, Alexander Denissov, Alejandro Fernandez, bhuvnesh chaudhary, Dmitro Lisnichenko, jun aoki, Lav Jain, Newton Alex, Oleksandr Diachenko, Sumit Mohanty, and Srimanth Gunturi.


Bugs: AMBARI-14459
    https://issues.apache.org/jira/browse/AMBARI-14459


Repository: ambari


Description
-------

By default, dfs.replication is set to 3. In a single-node cluster, advise and validate dfs.replication = 1.
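
The proposed rule can be sketched roughly as follows. This is only an illustration, not the diff under review: the function names `recommend_dfs_replication` and `validate_dfs_replication` and the warning text are hypothetical, and the actual change to `stack_advisor.py` is not reproduced in this thread.

```python
# Hypothetical sketch of the proposed rule; not the actual stack_advisor.py code.
DEFAULT_REPLICATION = 3


def recommend_dfs_replication(datanode_count):
    """Return the suggested dfs.replication for a cluster of this size."""
    # Proposed behaviour: drop to 1 only on a single-node cluster.
    return 1 if datanode_count == 1 else DEFAULT_REPLICATION


def validate_dfs_replication(configured, datanode_count):
    """Return a warning string when the value differs from the suggestion, else None."""
    recommended = recommend_dfs_replication(datanode_count)
    if configured != recommended:
        return "dfs.replication is %d; %d is suggested for %d DataNode(s)" % (
            configured, recommended, datanode_count)
    return None


if __name__ == "__main__":
    print(recommend_dfs_replication(1))
    print(validate_dfs_replication(3, 1))
```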


Diffs
-----

  ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py 7e2871b 
  ambari-server/src/test/python/stacks/2.0.6/common/test_stack_advisor.py 6699e94 

Diff: https://reviews.apache.org/r/41795/diff/


Testing
-------

Manually Tested
Unit test updated.


Thanks,

Matt


Re: Review Request 41795: Make the hdfs replication as 1 when it is single node cluster

Posted by Matt <mi...@gmail.com>.

> On Dec. 30, 2015, 9:39 a.m., Sumit Mohanty wrote:
> > I wonder if someone with HDFS expertise can chime in here.
> > 
> > In general, I have not run into any problems with the replication factor being 3 while the number of DNs is lower than that (yes, some tests, such as decommissioning, require explicitly changing the value).
> > As you add more DNs, HDFS starts adding replicas.
> > 
> > The opposite seems risky to me: start with a replication factor of 1 and then add DNs. If you forget to change the value, replicas will not be created, and data that is already stored will not be replicated. So, in essence, if one is deploying a cluster that needs to live long, "3" is a better value. If the cluster never needs more than 1-2 DNs, then it's likely a test cluster and thus not a high-priority scenario.
> > 
> > So I would rather leave the replication factor at 3 and have test deployments change the defaults based on what they are testing.
> > 
> > -1 for the change (the code change is good, but I am not convinced that 1 is a good default)

I agree with you, Sumit! I did not think this through when I started working on this.

I'd rather leave the replication factor at 3 by default, regardless of whether there are fewer than three DNs.
As a user, I might start with 1 DN when I create a cluster managed by Ambari and add 3 DNs the next day, totally forgetting that Ambari set dfs.replication on my cluster to 1 on day 1 and never changed it.

As a user, I would not want the default set to 1 or 2 (based on the number of DNs); I would want it left at 3.

I can discard the changes and close the JIRA with no action taken, if you agree.


- Matt


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41795/#review112324
-----------------------------------------------------------




Re: Review Request 41795: Make the hdfs replication as 1 when it is single node cluster

Posted by Sumit Mohanty <sm...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41795/#review112324
-----------------------------------------------------------


I wonder if someone with HDFS expertise can chime in here.

In general, I have not run into any problems with the replication factor being 3 while the number of DNs is lower than that (yes, some tests, such as decommissioning, require explicitly changing the value).
As you add more DNs, HDFS starts adding replicas.

The opposite seems risky to me: start with a replication factor of 1 and then add DNs. If you forget to change the value, replicas will not be created, and data that is already stored will not be replicated. So, in essence, if one is deploying a cluster that needs to live long, "3" is a better value. If the cluster never needs more than 1-2 DNs, then it's likely a test cluster and thus not a high-priority scenario.

So I would rather leave the replication factor at 3 and have test deployments change the defaults based on what they are testing.

-1 for the change (the code change is good, but I am not convinced that 1 is a good default)
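
The argument above can be illustrated with a toy model. It is only a sketch under simplifying assumptions: the `ToyCluster` class is hypothetical, it does not talk to HDFS, and it assumes each file keeps the replication target that was the default when the file was written.

```python
# Toy model of the argument above; it does not talk to HDFS.
class ToyCluster(object):
    def __init__(self, datanodes, default_replication):
        self.datanodes = datanodes
        self.default_replication = default_replication
        self.files = {}  # file name -> target replication fixed at creation

    def create(self, name):
        # A file records the default in effect when it is written.
        self.files[name] = self.default_replication

    def add_datanodes(self, count):
        self.datanodes += count

    def replicas(self, name):
        # Each replica needs its own DataNode, so the node count caps the result.
        return min(self.files[name], self.datanodes)


if __name__ == "__main__":
    for default in (3, 1):
        cluster = ToyCluster(datanodes=1, default_replication=default)
        cluster.create("data.txt")
        before = cluster.replicas("data.txt")
        cluster.add_datanodes(2)
        # Prints: default, replicas with 1 DN, replicas after growing to 3 DNs.
        print(default, before, cluster.replicas("data.txt"))
```

In this model a file written with a target of 3 catches up once DataNodes are added, while a file written with a target of 1 stays at one replica. On a real cluster, raising the target of files that already exist would need an explicit step such as `hdfs dfs -setrep`.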

- Sumit Mohanty




Re: Review Request 41795: Make the hdfs replication as 1 when it is single node cluster

Posted by Matt <mi...@gmail.com>.

> On Dec. 30, 2015, 9:29 a.m., bhuvnesh chaudhary wrote:
> > ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py, line 245
> > <https://reviews.apache.org/r/41795/diff/1/?file=1178449#file1178449line245>
> >
> >     We can extend this condition to handle the case of 2 DataNodes, because dfs.replication = 3 is not appropriate there either.
> >     
> >     So if 1 DN, dfs.replication=1; if 2 DNs, dfs.replication=2.
> >     Rest as is.

After giving it some thought, I prefer to have dfs.replication set to 3, regardless of the number of DataNodes. Sumit has a good point.


- Matt


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41795/#review112321
-----------------------------------------------------------




Re: Review Request 41795: Make the hdfs replication as 1 when it is single node cluster

Posted by bhuvnesh chaudhary <bc...@pivotal.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41795/#review112321
-----------------------------------------------------------



ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py (line 245)
<https://reviews.apache.org/r/41795/#comment172766>

    We can extend this condition to handle the case of 2 DataNodes, because dfs.replication = 3 is not appropriate there either.
    
    So if 1 DN, dfs.replication=1; if 2 DNs, dfs.replication=2.
    Rest as is.
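
The extension suggested above amounts to capping the replication factor at the number of DataNodes. A minimal sketch, assuming a hypothetical helper name `capped_replication` (the actual condition at line 245 is not shown in this thread):

```python
# Hypothetical helper illustrating the suggestion; not taken from the diff.
def capped_replication(datanode_count, default=3):
    """Use the DataNode count when it is below the default, else the default."""
    if datanode_count < 1:
        raise ValueError("a cluster needs at least one DataNode")
    return min(datanode_count, default)


if __name__ == "__main__":
    for count in (1, 2, 3, 10):
        print(count, capped_replication(count))
```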


- bhuvnesh chaudhary




Re: Review Request 41795: Make the hdfs replication as 1 when it is single node cluster

Posted by Alejandro Fernandez <af...@hortonworks.com>.

> On Dec. 30, 2015, 7:54 p.m., Alejandro Fernandez wrote:
> > ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py, line 245
> > <https://reviews.apache.org/r/41795/diff/1/?file=1178449#file1178449line245>
> >
> >     Agree with bhuvnesh, this should be based on # of DNs. If less than 3, it should be the # of DNs.
> 
> Matt wrote:
>     After giving it some thought, I prefer to have dfs.replication set to 3, regardless of the number of DataNodes. Sumit has a good point.

HDFS may only check dfs.namenode.replication.min when coming out of safemode (so that value must always be <= # of DNs). So it is perhaps OK to set dfs.replication and dfs.replication.max to 3.
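
A rough model of that safemode check, for illustration only. It assumes the behaviour described for the HDFS defaults, where the NameNode leaves safemode once a configured fraction of blocks (`dfs.namenode.safemode.threshold-pct`) has at least `dfs.namenode.replication.min` replicas; the function name `can_leave_safemode` is hypothetical and this is not NameNode code.

```python
# Simplified model of the safemode exit check; not actual NameNode code.
def can_leave_safemode(replica_counts, min_replication=1, threshold_pct=0.999):
    """replica_counts holds the number of reported replicas for each block."""
    if not replica_counts:
        return True  # nothing to wait for in this model
    safe = sum(1 for count in replica_counts if count >= min_replication)
    return float(safe) / len(replica_counts) >= threshold_pct


if __name__ == "__main__":
    # One DataNode, dfs.replication = 3: every block has a single replica.
    print(can_leave_safemode([1, 1, 1, 1], min_replication=1))
    print(can_leave_safemode([1, 1, 1, 1], min_replication=2))
```

Note that dfs.replication does not appear in the check at all, which is why a value of 3 on a single DataNode would not keep this model in safemode, whereas a minimum above the DataNode count would.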


- Alejandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41795/#review112345
-----------------------------------------------------------




Re: Review Request 41795: Make the hdfs replication as 1 when it is single node cluster

Posted by Matt <mi...@gmail.com>.

> On Dec. 30, 2015, 11:54 a.m., Alejandro Fernandez wrote:
> > ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py, line 245
> > <https://reviews.apache.org/r/41795/diff/1/?file=1178449#file1178449line245>
> >
> >     Agree with bhuvnesh, this should be based on # of DNs. If less than 3, it should be the # of DNs.

After giving it some thought, I prefer to have dfs.replication set to 3, regardless of the number of DataNodes. Sumit has a good point.


- Matt


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41795/#review112345
-----------------------------------------------------------




Re: Review Request 41795: Make the hdfs replication as 1 when it is single node cluster

Posted by Alejandro Fernandez <af...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41795/#review112345
-----------------------------------------------------------



ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py (line 245)
<https://reviews.apache.org/r/41795/#comment172826>

    Agree with bhuvnesh, this should be based on # of DNs. If less than 3, it should be the # of DNs.


- Alejandro Fernandez

