Posted to hdfs-dev@hadoop.apache.org by "Elek, Marton (JIRA)" <ji...@apache.org> on 2018/07/26 08:16:00 UTC

[jira] [Created] (HDDS-296) OMMetadataManagerLock is held by getPendingDeletionKeys for a full table scan

Elek, Marton created HDDS-296:
---------------------------------

             Summary: OMMetadataManagerLock is held by getPendingDeletionKeys for a full table scan
                 Key: HDDS-296
                 URL: https://issues.apache.org/jira/browse/HDDS-296
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
            Reporter: Elek, Marton
             Fix For: 0.2.1


We identified the problem during freon tests on real clusters. I first saw it on a Kubernetes-based pseudo cluster (50 datanodes, 1 freon instance). After a while the rate of key allocation slowed down (see the attached image).

I could also reproduce the problem with a local cluster (I used the hadoop-dist/target/compose/ozoneperf setup). After the first 1 million keys, key creation almost stopped.

With the help of [~nandakumar131] we identified that the problem is the lock in the Ozone Manager. (We profiled the OM with VisualVM and found that the code was blocked on the lock for an extremely long time; we also checked the RocksDB/RPC metrics from Prometheus, and everything else worked well.)

[~nandakumar131] suggested using an instrumented lock in the OMMetadataManager. With a custom build we identified that the problem is that the deletion service holds the OMMetadataManager lock for a full range scan. For 1 million keys the scan took about 10 seconds (on my local developer machine with an SSD).
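
For context, the instrumentation idea is simply to time how long the lock is held and dump a stack trace when a threshold is exceeded. A minimal sketch, assuming a simplified wrapper class (the real org.apache.hadoop.util.InstrumentedReadLock is more elaborate and rate-limits its warnings):

{code}
import java.util.concurrent.locks.Lock;

/** Simplified stand-in for the instrumented-lock idea; not the actual
 *  Hadoop implementation. */
public class TimedLock {
  private final Lock delegate;
  private final long warnThresholdMs;
  // Per-thread acquire time, since several readers may hold a read lock at once.
  private final ThreadLocal<Long> acquiredAtMs = new ThreadLocal<>();

  public TimedLock(Lock delegate, long warnThresholdMs) {
    this.delegate = delegate;
    this.warnThresholdMs = warnThresholdMs;
  }

  public void lock() {
    delegate.lock();
    acquiredAtMs.set(System.currentTimeMillis());
  }

  public void unlock() {
    long heldMs = System.currentTimeMillis() - acquiredAtMs.get();
    delegate.unlock();
    if (heldMs > warnThresholdMs) {
      // The real implementation logs through the Hadoop logger and
      // suppresses repeated warnings within a logging gap.
      System.err.println("Lock held time above threshold: " + heldMs + " ms");
      Thread.dumpStack();
    }
  }
}
{code}

The custom build produced warnings like the following: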

{code}
ozoneManager_1  | 2018-07-25 12:45:03 WARN  OMMetadataManager:143 - Lock held time above threshold: lock identifier: OMMetadataManagerLock lockHeldTimeMs=2648 ms. Suppressed 0 lock warnings. The stack trace is: java.lang.Thread.getStackTrace(Thread.java:1559)
ozoneManager_1  | org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
ozoneManager_1  | org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
ozoneManager_1  | org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
ozoneManager_1  | org.apache.hadoop.util.InstrumentedReadLock.unlock(InstrumentedReadLock.java:78)
ozoneManager_1  | org.apache.hadoop.ozone.om.KeyManagerImpl.getPendingDeletionKeys(KeyManagerImpl.java:506)
ozoneManager_1  | org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:98)
ozoneManager_1  | org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:85)
ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
ozoneManager_1  | java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
ozoneManager_1  | java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
ozoneManager_1  | java.util.concurrent.FutureTask.run(FutureTask.java:266)
ozoneManager_1  | java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
ozoneManager_1  | java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
ozoneManager_1  | java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
ozoneManager_1  | java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
ozoneManager_1  | java.lang.Thread.run(Thread.java:748)
{code}
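
The shape of the problem, as a hypothetical simplification (the names below are illustrative, not the actual KeyManagerImpl code): the read lock is acquired once and held across a scan of the entire key table, so every key allocation waiting on the write lock is blocked for the whole scan.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical simplification of the problematic pattern in
 *  getPendingDeletionKeys: one lock acquisition for a full table scan. */
public class PendingDeletionScan {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Sorted stand-in for the RocksDB-backed key table:
  // key name -> pending-deletion flag.
  private final TreeMap<String, Boolean> keyTable = new TreeMap<>();

  public List<String> getPendingDeletionKeys() {
    List<String> pending = new ArrayList<>();
    lock.readLock().lock();
    try {
      // Full range scan under the lock; with ~1 million keys this took
      // about 10 seconds, during which no writer could take the write lock.
      for (Map.Entry<String, Boolean> e : keyTable.entrySet()) {
        if (e.getValue()) {
          pending.add(e.getKey());
        }
      }
    } finally {
      lock.readLock().unlock();
    }
    return pending;
  }
}
{code}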

I also checked it with the DeletionService disabled, and key creation worked well.

The deletion service should be improved so that it can do its work without holding the OMMetadataManager lock for long periods; see the sketch below.
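
A possible direction, as a sketch under assumed names (BATCH_SIZE and the cursor-based resume are illustrative, not a committed design): scan the table in bounded batches and release the lock between batches, so key allocation can interleave with the deletion scan.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch of a batched scan: each lock acquisition covers at most
 *  BATCH_SIZE keys, so writers can make progress between batches. */
public class BatchedPendingDeletionScan {
  private static final int BATCH_SIZE = 1000;
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Sorted stand-in for the key table so the scan can resume from a cursor.
  private final TreeMap<String, Boolean> keyTable = new TreeMap<>();

  public List<String> getPendingDeletionKeys() {
    List<String> pending = new ArrayList<>();
    String cursor = "";
    while (true) {
      int scanned = 0;
      lock.readLock().lock();
      try {
        // Resume strictly after the last key seen in the previous batch.
        for (Map.Entry<String, Boolean> e
            : keyTable.tailMap(cursor, false).entrySet()) {
          cursor = e.getKey();
          if (e.getValue()) {
            pending.add(e.getKey());
          }
          if (++scanned >= BATCH_SIZE) {
            break;
          }
        }
      } finally {
        lock.readLock().unlock(); // released between batches
      }
      if (scanned < BATCH_SIZE) {
        return pending; // reached the end of the table
      }
    }
  }
}
{code}

The keys collected this way can be slightly stale (a key deleted or re-created between batches), which the actual deletion path would have to tolerate; that trade-off is the price of not holding the lock for the whole scan.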


