Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2021/08/24 04:04:56 UTC
[GitHub] [hadoop] Neilxzn commented on pull request #3320: HDFS-16182.numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage.
Neilxzn commented on pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#issuecomment-904304994
Agreed. I think we should fix it.
In my cluster, we use BlockPlacementPolicyDefault to choose datanodes, and we have far fewer SSD datanodes than DISK datanodes. Blocks that should be placed on SSD datanodes may fall back to DISK datanodes when the SSD datanodes are too busy or have insufficient space. Consider the following scenario.
1. Create empty file /foo_file
2. Set its storagepolicy to All_SSD
3. Put data to /foo_file
4. /foo_file gets a pipeline of 3 DISK datanodes because the SSD datanodes are too busy at the beginning.
5. While data is being transferred through the pipeline, one of the 3 DISK datanodes shuts down.
6. The client needs to get one new datanode for the existing pipeline via DataStreamer$addDatanode2ExistingPipeline.
7. If SSD datanodes are available at that moment, the namenode chooses 3 SSD datanodes and returns them to the client. However, the client needs only one new datanode; because the namenode returns 3 new SSD datanodes instead, the client throws an exception in DataStreamer$findNewDatanode.
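The failure in step 7 can be sketched roughly as follows. This is a simplified, hypothetical model of the length check in DataStreamer$findNewDatanode, not the actual Hadoop source; the datanode names and helper signature are made up for illustration. The client expects the updated pipeline to be exactly the surviving nodes plus one new node, so when the namenode hands back 3 fresh SSD datanodes the merged pipeline is too long and the check throws:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FindNewDatanodeSketch {

    // Hypothetical simplification of DataStreamer$findNewDatanode:
    // "nodes" is the pipeline returned after asking the namenode for an
    // additional datanode; it must be exactly one longer than "original".
    static String findNewDatanode(String[] original, String[] nodes) throws IOException {
        if (nodes.length != original.length + 1) {
            throw new IOException("Failed to add a new datanode: got "
                + nodes.length + " nodes, expected " + (original.length + 1));
        }
        // The new node is the one not already in the original pipeline.
        for (String d : nodes) {
            if (!Arrays.asList(original).contains(d)) {
                return d;
            }
        }
        throw new IOException("Failed to find the new datanode");
    }

    // Merge the surviving pipeline with whatever the namenode returned,
    // mimicking addDatanode2ExistingPipeline building the updated pipeline.
    static String[] merge(String[] original, String[] fromNamenode) {
        List<String> merged = new ArrayList<>(Arrays.asList(original));
        for (String d : fromNamenode) {
            if (!merged.contains(d)) {
                merged.add(d);
            }
        }
        return merged.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        // Two DISK datanodes survive after one member of the pipeline fails.
        String[] survivors = {"disk-dn1", "disk-dn2"};

        // Expected behavior: the namenode returns exactly one new node.
        String[] oneNewNode = merge(survivors, new String[]{"ssd-dn1"});
        System.out.println("new node: " + findNewDatanode(survivors, oneNewNode));

        // Buggy behavior described above: the namenode returns 3 new SSD
        // nodes, so the merged pipeline has 5 entries instead of 3.
        String[] threeNewNodes =
            merge(survivors, new String[]{"ssd-dn1", "ssd-dn2", "ssd-dn3"});
        try {
            findNewDatanode(survivors, threeNewNodes);
        } catch (IOException e) {
            System.out.println("client fails: " + e.getMessage());
        }
    }
}
```

Under this model, the namenode's chooseTarget is asked for one additional replica but selects a full set of targets, which is why the fix belongs on the namenode side rather than in the client's length check.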
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org