Posted to issues@ozone.apache.org by "Wei-Chiu Chuang (Jira)" <ji...@apache.org> on 2021/03/12 01:30:00 UTC

[jira] [Created] (HDDS-4970) Significant overhead when DataNode is over-subscribed

Wei-Chiu Chuang created HDDS-4970:
-------------------------------------

             Summary: Significant overhead when DataNode is over-subscribed
                 Key: HDDS-4970
                 URL: https://issues.apache.org/jira/browse/HDDS-4970
             Project: Apache Ozone
          Issue Type: Bug
          Components: Ozone Datanode
    Affects Versions: 1.0.0
            Reporter: Wei-Chiu Chuang
         Attachments: Screen Shot 2021-03-11 at 11.58.23 PM.png

Ran a microbenchmark in which concurrent clients read chunks from a DataNode.

As the number of clients grows, a significant share of time goes to accessing a concurrent hash map, and the overhead grows exponentially with respect to the number of clients.
{code:java|title=ChunkUtils#processFileExclusively}
  @VisibleForTesting
  static <T> T processFileExclusively(Path path, Supplier<T> op) {
    // Busy-wait: every iteration mutates the shared LOCKS set, so each
    // spinning client hammers the concurrent hash map until it wins.
    for (;;) {
      if (LOCKS.add(path)) {
        break;
      }
    }

    try {
      return op.get();
    } finally {
      LOCKS.remove(path);
    }
  }
{code}
In my test, having 64 concurrent clients reading chunks from a 1-disk DataNode caused the DN to spend nearly half of the time adding into the LOCKS object (a concurrent hash map).
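For reference, the contention can be reproduced outside the DataNode with a minimal standalone sketch (this is not the actual benchmark; the class name, iteration count, and path are illustrative assumptions). All 64 threads funnel through the add/remove spin loop on a single shared concurrent set, as in the code above:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class SpinLockRepro {
  // Same structure as the DataNode code: a concurrent-hash-map-backed set.
  private static final Set<Path> LOCKS = ConcurrentHashMap.newKeySet();

  static <T> T processFileExclusively(Path path, Supplier<T> op) {
    for (;;) {
      // Spins (burning CPU and mutating the map) while another thread
      // holds the same path.
      if (LOCKS.add(path)) {
        break;
      }
    }
    try {
      return op.get();
    } finally {
      LOCKS.remove(path);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    int clients = 64;
    // Hypothetical path; all threads contend on the same chunk.
    Path chunk = Paths.get("/data/chunk_0");
    Thread[] threads = new Thread[clients];
    long start = System.nanoTime();
    for (int i = 0; i < clients; i++) {
      threads[i] = new Thread(() -> {
        for (int j = 0; j < 10_000; j++) {
          processFileExclusively(chunk, () -> null);
        }
      });
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();
    }
    System.out.printf("%d clients finished in %d ms%n",
        clients, (System.nanoTime() - start) / 1_000_000);
  }
}
```

Profiling this under a sampling profiler should show the same hot spot in the set's add path as the attached screenshot.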

 

!Screen Shot 2021-03-11 at 11.58.23 PM.png|width=640!

 

Given that it is not uncommon to find HDFS DataNodes with tens of thousands of incoming client connections, I expect an Ozone DataNode to see similar traffic at scale.

We should fix this code to eliminate the busy-wait.
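One possible direction, sketched below, is lock striping: hash each path to one of a fixed pool of monitors, so waiting threads block on a lock instead of spinning and mutating a shared map on every iteration. This is only a sketch under stated assumptions, not the project's actual fix; the class name, stripe count, and structure are all hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Supplier;

public class StripedFileLocks {
  // Fixed pool of monitor objects; each path hashes to one stripe.
  // No per-call map mutation, and contending threads park instead of
  // burning CPU in a spin loop.
  private static final int STRIPES = 127;
  private static final Object[] LOCKS = new Object[STRIPES];
  static {
    for (int i = 0; i < STRIPES; i++) {
      LOCKS[i] = new Object();
    }
  }

  static <T> T processFileExclusively(Path path, Supplier<T> op) {
    // Math.floorMod guards against negative hash codes.
    Object lock = LOCKS[Math.floorMod(path.hashCode(), STRIPES)];
    synchronized (lock) {
      return op.get();
    }
  }

  public static void main(String[] args) {
    String result =
        processFileExclusively(Paths.get("/tmp/chunk1"), () -> "ok");
    System.out.println(result);
  }
}
```

The trade-off is that two distinct paths can hash to the same stripe and serialize unnecessarily, but with a stripe count much larger than the disk count that false contention should be rare, and it avoids the pathological map traffic described above.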



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org