Posted to hdfs-dev@hadoop.apache.org by "Daryn Sharp (JIRA)" <ji...@apache.org> on 2017/10/12 14:23:00 UTC
[jira] [Created] (HDFS-12645) FSDatasetImpl lock will stall BP service actors and may cause missing blocks
Daryn Sharp created HDFS-12645:
----------------------------------
Summary: FSDatasetImpl lock will stall BP service actors and may cause missing blocks
Key: HDFS-12645
URL: https://issues.apache.org/jira/browse/HDFS-12645
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 2.8.0
Reporter: Daryn Sharp
The DN is extremely susceptible to a slow volume due to bad locking practices. DN operations require the FS dataset lock. IO should not be permitted while the dataset lock is held, as it leads to severe performance degradation and possibly (temporarily) missing blocks.
A slow disk will cause pipelines to experience significant latency and timeouts, which increases lock/IO contention during cleanup, leading to more timeouts, etc. Meanwhile, the actor service thread is interleaving multiple lock acquisitions and releases with xceivers. If many commands are issued, the node may be incorrectly declared dead.
HDFS-12639 documents that both actors synchronize on the offer service lock while processing commands. A backlogged active actor will block the standby actor and cause it to go dead too.
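To illustrate the point about IO under the dataset lock, here is a minimal sketch. It is not the actual FsDatasetImpl code: the class DatasetLockSketch, the DiskOp interface, and all method names are hypothetical stand-ins. It contrasts running a (potentially slow) disk operation while the lock is held with updating only the in-memory map under the lock and doing the IO after release.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a dataset's in-memory block map.
// This is NOT the real FsDatasetImpl; all names are assumptions.
class DatasetLockSketch {
    // Hypothetical disk operation, e.g. deleting a block file on a slow volume.
    interface DiskOp {
        void run(String path);
    }

    private final ReentrantLock datasetLock = new ReentrantLock();
    private final Map<Long, String> blockToPath = new HashMap<>();

    void addBlock(long blockId, String path) {
        datasetLock.lock();
        try {
            blockToPath.put(blockId, path);
        } finally {
            datasetLock.unlock();
        }
    }

    boolean contains(long blockId) {
        datasetLock.lock();
        try {
            return blockToPath.containsKey(blockId);
        } finally {
            datasetLock.unlock();
        }
    }

    boolean lockHeldByCurrentThread() {
        return datasetLock.isHeldByCurrentThread();
    }

    // Problem pattern: the disk operation runs while the lock is held, so a
    // slow volume stalls every other thread that needs the lock.
    void invalidateHoldingLock(long blockId, DiskOp io) {
        datasetLock.lock();
        try {
            String path = blockToPath.remove(blockId);
            if (path != null) {
                io.run(path);
            }
        } finally {
            datasetLock.unlock();
        }
    }

    // Alternative: only the in-memory update happens under the lock; the
    // disk operation runs after the lock has been released.
    void invalidateOutsideLock(long blockId, DiskOp io) {
        String path;
        datasetLock.lock();
        try {
            path = blockToPath.remove(blockId);
        } finally {
            datasetLock.unlock();
        }
        if (path != null) {
            io.run(path);
        }
    }

    public static void main(String[] args) {
        DatasetLockSketch ds = new DatasetLockSketch();
        ds.addBlock(1L, "/vol1/blk_1");
        ds.addBlock(2L, "/vol1/blk_2");
        // Each DiskOp reports whether the dataset lock is held while it runs.
        ds.invalidateHoldingLock(1L, p ->
            System.out.println(p + " lock held during IO: " + ds.lockHeldByCurrentThread()));
        ds.invalidateOutsideLock(2L, p ->
            System.out.println(p + " lock held during IO: " + ds.lockHeldByCurrentThread()));
    }
}
```

In the second variant a slow disk delays only the calling thread; other threads that need the lock (for example, a service actor processing commands) are not forced to wait on the disk. Whether this split is safe for a given operation depends on consistency requirements that the sketch does not model.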