Posted to hdfs-dev@hadoop.apache.org by "Daryn Sharp (JIRA)" <ji...@apache.org> on 2017/10/12 14:23:00 UTC

[jira] [Created] (HDFS-12645) FSDatasetImpl lock will stall BP service actors and may cause missing blocks

Daryn Sharp created HDFS-12645:
----------------------------------

             Summary: FSDatasetImpl lock will stall BP service actors and may cause missing blocks
                 Key: HDFS-12645
                 URL: https://issues.apache.org/jira/browse/HDFS-12645
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.8.0
            Reporter: Daryn Sharp


The DN is extremely susceptible to a slow volume due to bad locking practices.  DN operations require the fs dataset lock.  IO inside the dataset lock should not be permitted, as it leads to severe performance degradation and possibly (temporarily) missing blocks.
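
To illustrate the point about IO inside the dataset lock, here is a minimal sketch, not the actual FsDatasetImpl code. The class, method, and field names are hypothetical, and a plain Runnable stands in for a disk operation that may be slow on a bad volume. The first method runs the IO while holding the lock, so every other caller waits on the slow disk. The second runs the IO first and takes the lock only for the in-memory update.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a dataset guarded by a single lock.
public class DatasetLockSketch {
    private final ReentrantLock datasetLock = new ReentrantLock();
    private final Map<Long, String> replicaMap = new HashMap<>();

    // Anti-pattern: the (possibly slow) disk IO runs while the lock is held,
    // so all other dataset operations stall behind the slow volume.
    public void finalizeWithIoUnderLock(long blockId, Runnable diskIo) {
        datasetLock.lock();
        try {
            diskIo.run();
            replicaMap.put(blockId, "FINALIZED");
        } finally {
            datasetLock.unlock();
        }
    }

    // Alternative: do the IO first, then hold the lock only for the
    // in-memory state change.
    public void finalizeWithIoOutsideLock(long blockId, Runnable diskIo) {
        diskIo.run();
        datasetLock.lock();
        try {
            replicaMap.put(blockId, "FINALIZED");
        } finally {
            datasetLock.unlock();
        }
    }

    public String getState(long blockId) {
        datasetLock.lock();
        try {
            return replicaMap.get(blockId);
        } finally {
            datasetLock.unlock();
        }
    }

    // Lets a caller observe whether it is currently inside the lock.
    public boolean lockHeldByCaller() {
        return datasetLock.isHeldByCurrentThread();
    }
}
```

Real code would also have to handle the IO failing or the replica changing state between the IO and the locked update; the sketch ignores both.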

A slow disk will cause pipelines to experience significant latency and timeouts, increasing lock/IO contention while cleaning up, leading to more timeouts, etc.  Meanwhile, the actor service thread is interleaving multiple lock acquires/releases with xceivers.  If many commands are issued, the actor's heartbeats may be delayed long enough that the node is incorrectly declared dead.

HDFS-12639 documents that both actors synchronize on the offer service lock while processing commands.  A backlogged active actor will block the standby actor and cause it to go dead too.
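
The shared-lock effect described above can be reproduced in a small sketch. This is an illustration under stated assumptions, not the BPOfferService code: all names are hypothetical, and a ReentrantLock replaces the real monitor so the waiting actor can use a timed tryLock and report that it was blocked. One thread plays the backlogged active actor holding the lock; the caller plays the standby actor and fails to get in while the backlog lasts.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of two actors sharing one command-processing lock.
public class SharedActorLockSketch {
    private final ReentrantLock offerServiceLock = new ReentrantLock();

    // Runs commands while holding the shared lock.
    public void processCommands(Runnable commands) {
        offerServiceLock.lock();
        try {
            commands.run();
        } finally {
            offerServiceLock.unlock();
        }
    }

    // Same, but gives up after timeoutMs; returns false if the lock was busy.
    public boolean tryProcessCommands(Runnable commands, long timeoutMs)
            throws InterruptedException {
        if (!offerServiceLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false;
        }
        try {
            commands.run();
            return true;
        } finally {
            offerServiceLock.unlock();
        }
    }

    // Returns true if the standby actor could not run while the active
    // actor was still working through its backlog.
    public static boolean standbyBlockedWhileActiveBacklogged() {
        SharedActorLockSketch svc = new SharedActorLockSketch();
        CountDownLatch activeHoldsLock = new CountDownLatch(1);
        CountDownLatch releaseActive = new CountDownLatch(1);
        Thread active = new Thread(() -> svc.processCommands(() -> {
            activeHoldsLock.countDown();
            try {
                releaseActive.await(); // simulated backlog on a slow volume
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        try {
            active.start();
            activeHoldsLock.await();
            boolean standbyRan = svc.tryProcessCommands(() -> { }, 50);
            releaseActive.countDown();
            active.join();
            return !standbyRan;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In the sketch the standby actor simply gives up after 50 ms. With an untimed lock it would wait for the whole backlog, which is how the standby actor can also go dead.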



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
