Posted to dev@ambari.apache.org by Matt <mi...@gmail.com> on 2015/12/30 12:04:55 UTC
Review Request 41795: Make the hdfs replication as 1 when it is
single node cluster
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41795/
-----------------------------------------------------------
Review request for Ambari, Alexander Denissov, Alejandro Fernandez, bhuvnesh chaudhary, Dmitro Lisnichenko, jun aoki, Lav Jain, Newton Alex, Oleksandr Diachenko, Sumit Mohanty, and Srimanth Gunturi.
Bugs: AMBARI-14459
https://issues.apache.org/jira/browse/AMBARI-14459
Repository: ambari
Description
-------
By default, dfs.replication is set to 3. In a single-node cluster, advise and validate dfs.replication = 1.
Diffs
-----
ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py 7e2871b
ambari-server/src/test/python/stacks/2.0.6/common/test_stack_advisor.py 6699e94
Diff: https://reviews.apache.org/r/41795/diff/
Testing
-------
Manually Tested
Unit test updated.
Thanks,
Matt
Re: Review Request 41795: Make the hdfs replication as 1 when it is
single node cluster
Posted by Matt <mi...@gmail.com>.
> On Dec. 30, 2015, 9:39 a.m., Sumit Mohanty wrote:
> > I wonder if someone with HDFS expertise can chime in here.
> >
> > In general, I have not run into any problems with the replication factor being 3 while the number of DNs is lower than that (yes, some tests, such as decommissioning, require explicitly changing the value).
> > As you add more DNs, HDFS starts adding replicas.
> >
> > The opposite seems risky to me: start with a replication factor of 1 and then add DNs. If you forget to change the value, replicas will not be created, and data that is already stored will not be replicated. So in essence, if one is deploying a cluster that needs to live long, "3" is a better value. If the cluster never needs more than 1-2 DNs, then it's likely a test cluster and thus not a high-priority scenario.
> >
> > So I would rather leave the replication factor at 3 and have test deployments change defaults based on what they are testing.
> >
> > -1 for the change (the code change is good, but I am not convinced that 1 is a good default)
I agree with you, Sumit! I did not think this through when I started working on this.
I'd rather leave the default replication factor at 3, regardless of whether there are fewer than three DNs.
As a user, I might start with 1 DN when I create a cluster managed by Ambari and add 3 DNs the next day, and totally forget that Ambari set dfs.replication on my cluster to 1 on day 1 and never changed it.
As a user, I would not want the default set to 1 or 2 (based on the number of DNs); I would leave the default at 3.
I can discard the changes and close the JIRA, with no action to be taken, if you agree to it.
- Matt
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41795/#review112324
-----------------------------------------------------------
Re: Review Request 41795: Make the hdfs replication as 1 when it is
single node cluster
Posted by Sumit Mohanty <sm...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41795/#review112324
-----------------------------------------------------------
I wonder if someone with HDFS expertise can chime in here.
In general, I have not run into any problems with the replication factor being 3 while the number of DNs is lower than that (yes, some tests, such as decommissioning, require explicitly changing the value).
As you add more DNs, HDFS starts adding replicas.
The opposite seems risky to me: start with a replication factor of 1 and then add DNs. If you forget to change the value, replicas will not be created, and data that is already stored will not be replicated. So in essence, if one is deploying a cluster that needs to live long, "3" is a better value. If the cluster never needs more than 1-2 DNs, then it's likely a test cluster and thus not a high-priority scenario.
So I would rather leave the replication factor at 3 and have test deployments change defaults based on what they are testing.
-1 for the change (the code change is good, but I am not convinced that 1 is a good default)
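The risk described in this comment can be illustrated with a toy model. This is a simplified sketch, not HDFS code: it assumes only that the target replication factor is recorded per file when the file is written, so changing the default later, or adding DataNodes, does not alter the target of files that already exist. The class and method names are hypothetical.

```python
class ToyCluster:
    """Toy model: each file remembers the replication factor it was written with."""

    def __init__(self, datanodes, default_replication):
        self.datanodes = datanodes
        self.default_replication = default_replication
        self.files = {}  # file name -> target replication factor

    def write(self, name):
        # The target is fixed at write time from the current default.
        self.files[name] = self.default_replication

    def live_replicas(self, name):
        # A block cannot have more replicas than there are DataNodes.
        return min(self.files[name], self.datanodes)


# Start both clusters with one DataNode, write a file, then grow to three.
with_three = ToyCluster(datanodes=1, default_replication=3)
with_one = ToyCluster(datanodes=1, default_replication=1)
for cluster in (with_three, with_one):
    cluster.write("data.txt")
    cluster.datanodes = 3

print(with_three.live_replicas("data.txt"))  # target was 3, so replicas can catch up
print(with_one.live_replicas("data.txt"))    # target was 1, so the file stays at one replica
```

In a real cluster, files written with a replication factor of 1 would need that factor raised explicitly (for example with `hdfs dfs -setrep`) after the default is changed.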
- Sumit Mohanty
Re: Review Request 41795: Make the hdfs replication as 1 when it is
single node cluster
Posted by Matt <mi...@gmail.com>.
> On Dec. 30, 2015, 9:29 a.m., bhuvnesh chaudhary wrote:
> > ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py, line 245
> > <https://reviews.apache.org/r/41795/diff/1/?file=1178449#file1178449line245>
> >
> > We can extend this condition to handle the case of 2 datanodes, because dfs.replication = 3 is not appropriate there either.
> >
> > So if 1 DN, dfs.replication=1; if 2 DNs, dfs.replication=2.
> > Rest as is.
After giving it some thought, I prefer to keep dfs.replication set to 3, regardless of the number of datanodes. Sumit has a good point.
- Matt
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41795/#review112321
-----------------------------------------------------------
Re: Review Request 41795: Make the hdfs replication as 1 when it is
single node cluster
Posted by bhuvnesh chaudhary <bc...@pivotal.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41795/#review112321
-----------------------------------------------------------
ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py (line 245)
<https://reviews.apache.org/r/41795/#comment172766>
We can extend this condition to handle the case of 2 datanodes, because dfs.replication = 3 is not appropriate there either.
So if 1 DN, dfs.replication=1; if 2 DNs, dfs.replication=2.
Rest as is.
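As a rough sketch of the rule proposed in this comment (not the code in the actual diff, which is not shown in this thread), the recommendation could be capped by the DataNode count. The function name `recommended_replication` and its parameters are hypothetical and are not part of Ambari's stack advisor API.

```python
def recommended_replication(datanode_count, default_replication=3):
    """Return a suggested dfs.replication value (hypothetical helper).

    Caps the default at the number of DataNodes, never going below 1.
    """
    if datanode_count < 1:
        # No DataNodes known yet: fall back to the smallest valid value.
        return 1
    return min(datanode_count, default_replication)


# 1 DN -> 1, 2 DNs -> 2, 3 or more DNs -> unchanged default.
for dn in (1, 2, 3, 10):
    print(dn, recommended_replication(dn))
```

Note that this captures only the proposal; the thread ultimately leans toward keeping the default at 3 regardless of the DataNode count.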
- bhuvnesh chaudhary
Re: Review Request 41795: Make the hdfs replication as 1 when it is
single node cluster
Posted by Alejandro Fernandez <af...@hortonworks.com>.
> On Dec. 30, 2015, 7:54 p.m., Alejandro Fernandez wrote:
> > ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py, line 245
> > <https://reviews.apache.org/r/41795/diff/1/?file=1178449#file1178449line245>
> >
> > Agree with bhuvnesh; this should be based on the # of DNs. If there are fewer than 3, it should equal the # of DNs.
>
> Matt wrote:
> After giving it some thought, I prefer to keep dfs.replication set to 3, regardless of the number of datanodes. Sumit has a good point.
HDFS may only check dfs.namenode.replication.min when coming out of safemode (so that value should always be <= # DNs). So it is perhaps OK to set dfs.replication and dfs.replication.max to 3.
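A minimal sketch of the constraint suggested here, assuming (as this comment does, without confirmation elsewhere in the thread) that the value that must not exceed the DataNode count is dfs.namenode.replication.min rather than dfs.replication. The function name and the plain-dictionary input are hypothetical and do not reflect Ambari's validation API.

```python
def check_replication_settings(hdfs_site, datanode_count):
    """Return a list of warning strings for a dict of hdfs-site properties."""
    warnings = []
    # Fallbacks used when a property is absent: 1 for the minimum, 3 for dfs.replication.
    min_replication = int(hdfs_site.get("dfs.namenode.replication.min", 1))
    replication = int(hdfs_site.get("dfs.replication", 3))

    if min_replication > datanode_count:
        # Assumption from the comment: this is the value checked when leaving safemode.
        warnings.append(
            "dfs.namenode.replication.min (%d) exceeds the number of DataNodes (%d)"
            % (min_replication, datanode_count)
        )
    if replication < min_replication:
        warnings.append(
            "dfs.replication (%d) is below dfs.namenode.replication.min (%d)"
            % (replication, min_replication)
        )
    return warnings


# A single-node cluster with dfs.replication left at 3 raises no warning here.
print(check_replication_settings({"dfs.replication": "3"}, datanode_count=1))
```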
- Alejandro
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41795/#review112345
-----------------------------------------------------------
Re: Review Request 41795: Make the hdfs replication as 1 when it is
single node cluster
Posted by Matt <mi...@gmail.com>.
> On Dec. 30, 2015, 11:54 a.m., Alejandro Fernandez wrote:
> > ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py, line 245
> > <https://reviews.apache.org/r/41795/diff/1/?file=1178449#file1178449line245>
> >
> > Agree with bhuvnesh; this should be based on the # of DNs. If there are fewer than 3, it should equal the # of DNs.
After giving it some thought, I prefer to keep dfs.replication set to 3, regardless of the number of datanodes. Sumit has a good point.
- Matt
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41795/#review112345
-----------------------------------------------------------
Re: Review Request 41795: Make the hdfs replication as 1 when it is
single node cluster
Posted by Alejandro Fernandez <af...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41795/#review112345
-----------------------------------------------------------
ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py (line 245)
<https://reviews.apache.org/r/41795/#comment172826>
Agree with bhuvnesh; this should be based on the # of DNs. If there are fewer than 3, it should equal the # of DNs.
- Alejandro Fernandez