Posted to yarn-issues@hadoop.apache.org by "Peter Bacsko (Jira)" <ji...@apache.org> on 2019/09/13 04:09:00 UTC
[jira] [Created] (YARN-9833) Race condition when DirectoryCollection.checkDirs() runs during container launch
Peter Bacsko created YARN-9833:
----------------------------------
Summary: Race condition when DirectoryCollection.checkDirs() runs during container launch
Key: YARN-9833
URL: https://issues.apache.org/jira/browse/YARN-9833
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.2.0
Reporter: Peter Bacsko
Assignee: Peter Bacsko
During endurance testing, we found a race condition that causes an empty {{localDirs}} to be passed to container-executor.
The problem is that {{DirectoryCollection.checkDirs()}} clears three collections:
{code:java}
this.writeLock.lock();
try {
  localDirs.clear();
  errorDirs.clear();
  fullDirs.clear();
  ...
{code}
This happens in a critical section guarded by a write lock. When we start a container, we retrieve the local dirs by calling {{dirsHandler.getLocalDirs()}}, which in turn invokes {{DirectoryCollection.getGoodDirs()}}. The implementation of this method is:
{code:java}
List<String> getGoodDirs() {
  this.readLock.lock();
  try {
    return Collections.unmodifiableList(localDirs);
  } finally {
    this.readLock.unlock();
  }
}
{code}
So this method also runs in a critical section, guarded by the read lock. But {{Collections.unmodifiableList()}} only returns a _view_ of the collection, not a copy. After we get the view and the read lock is released, {{MonitoringTimerTask.run()}} might be scheduled to run and immediately clear {{localDirs}}, so the view held by the container launch becomes empty as well.
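The view semantics described above can be reproduced without any threads. The following is a minimal sketch, not code from Hadoop: the class name {{UnmodifiableViewDemo}} and the directory paths are hypothetical, and the {{clear()}} call simply stands in for what {{checkDirs()}} does to {{localDirs}}.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of Hadoop.
class UnmodifiableViewDemo {

    // Returns the size observed through the unmodifiable view
    // after the backing list has been cleared.
    static int sizeAfterClear() {
        List<String> localDirs = new ArrayList<>();
        localDirs.add("/data/1/yarn/nm"); // hypothetical path
        localDirs.add("/data/2/yarn/nm"); // hypothetical path

        // A read-only view backed by localDirs, not a copy.
        List<String> view = Collections.unmodifiableList(localDirs);

        // Stand-in for the clear() calls at the start of checkDirs().
        localDirs.clear();

        // The view reflects the change to the backing list.
        return view.size();
    }

    public static void main(String[] args) {
        System.out.println("size seen through view after clear: " + sizeAfterClear());
    }
}
```

The caller cannot modify the view, but it still observes every change made to the backing list, which is why holding the read lock only for the duration of {{getGoodDirs()}} does not protect it.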
This caused unexpected behaviour in container-executor, which exited with error code 35 (COULD_NOT_CREATE_WORK_DIRECTORIES).
Therefore we can't just return a view. We must return a copy, made while the read lock is held, with {{ImmutableList.copyOf()}}.
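A rough sketch of the copy-under-lock idea follows. It is an assumption-laden stand-in, not the actual patch: {{DirsSnapshotSketch}} and {{clearDirs()}} are hypothetical names, and to stay JDK-only it wraps a fresh {{ArrayList}} in {{Collections.unmodifiableList()}} instead of using Guava's {{ImmutableList.copyOf()}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for DirectoryCollection.
class DirsSnapshotSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> localDirs = new ArrayList<>();

    DirsSnapshotSketch(List<String> dirs) {
        localDirs.addAll(dirs);
    }

    // Copies the list while the read lock is held, so later
    // modifications of localDirs cannot affect the returned list.
    List<String> getGoodDirs() {
        lock.readLock().lock();
        try {
            return Collections.unmodifiableList(new ArrayList<>(localDirs));
        } finally {
            lock.readLock().unlock();
        }
    }

    // Stand-in for the clearing step of checkDirs(), under the write lock.
    void clearDirs() {
        lock.writeLock().lock();
        try {
            localDirs.clear();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Takes a snapshot, clears the backing list, returns the snapshot size.
    static int snapshotSizeAfterClear() {
        DirsSnapshotSketch dc = new DirsSnapshotSketch(
            Arrays.asList("/data/1/yarn/nm", "/data/2/yarn/nm")); // hypothetical paths
        List<String> snapshot = dc.getGoodDirs();
        dc.clearDirs();
        return snapshot.size();
    }

    public static void main(String[] args) {
        System.out.println("snapshot size after clear: " + snapshotSizeAfterClear());
    }
}
```

The trade-off is one small allocation per call in exchange for a result that stays stable after the lock is released.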
Credits to [~snemeth] for analyzing and determining the root cause.