Posted to mapreduce-issues@hadoop.apache.org by "Ryu Kobayashi (JIRA)" <ji...@apache.org> on 2015/07/17 08:25:04 UTC

[jira] [Created] (MAPREDUCE-6436) JobHistory cache issue

Ryu Kobayashi created MAPREDUCE-6436:
----------------------------------------

             Summary: JobHistory cache issue
                 Key: MAPREDUCE-6436
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Ryu Kobayashi



Problem: 
HistoryFileManager.addIfAbsent produces a large amount of log output when the
number of cached entries younger than mapreduce.jobhistory.max-age-ms far
exceeds mapreduce.jobhistory.joblist.cache.size.

Example:
If the cache contains 50000 entries in total, 10000 of them newer than
mapreduce.jobhistory.max-age-ms, and mapreduce.jobhistory.joblist.cache.size is
20000, the HistoryFileManager.addIfAbsent method logs 50000 - 20000 = 30000
lines of the message "Waiting to remove <key> from JobListCache because it is
not in done yet".

A stack trace will be attached.

Impact:
In addition to consuming a large amount of disk space, this issue blocks
JobHistory.getJob for a long time and slows job execution down significantly,
because getJob is called from RPCs such as
HistoryClientService.HSClientProtocolHandler.getJobReport.
The blocking occurs because HistoryFileManager.UserLogDir.scanIfNeeded
eventually calls HistoryFileManager.addIfAbsent inside a synchronized block.
When multiple threads call scanIfNeeded simultaneously, one of them acquires
the lock and the others are blocked until it completes the long-running
HistoryFileManager.addIfAbsent call.
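
The blocking pattern described above can be illustrated with a small
standalone model. This is only a sketch, not Hadoop code: BlockingScanDemo,
its scanIfNeeded stand-in and secondCallerState are hypothetical names, and a
latch stands in for the long-running addIfAbsent loop.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical model of the locking pattern; not Hadoop code. */
class BlockingScanDemo {
    private final Object lock = new Object();

    /** Stand-in for scanIfNeeded: the whole scan runs under one lock. */
    void scanIfNeeded(Runnable longRunningScan) {
        synchronized (lock) {
            longRunningScan.run();
        }
    }

    /** Returns the state observed for a second caller while the first is still scanning. */
    static Thread.State secondCallerState() {
        BlockingScanDemo demo = new BlockingScanDemo();
        CountDownLatch firstInside = new CountDownLatch(1);
        CountDownLatch finishFirst = new CountDownLatch(1);
        Thread first = new Thread(() -> demo.scanIfNeeded(() -> {
            firstInside.countDown();
            try {
                finishFirst.await(); // simulates the slow scan
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        Thread second = new Thread(() -> demo.scanIfNeeded(() -> { }));
        try {
            first.start();
            firstInside.await();          // first caller now holds the lock
            second.start();
            Thread.State state = second.getState();
            long deadline = System.nanoTime() + 5_000_000_000L;
            // Poll until the second caller is parked on the monitor (or give up).
            while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(1);
                state = second.getState();
            }
            finishFirst.countDown();      // let the first scan finish
            first.join();
            second.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In this model the second caller is expected to be observed in the BLOCKED
state for as long as the first scan runs, which corresponds to an RPC handler
thread waiting on another thread's scan.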

Solution: 
* Reduce the amount of logging so that HistoryFileManager.addIfAbsent doesn't take too long.
* Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
  scanning if another thread is already scanning. This changes the semantics of
  some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
  because a caller that skips the scan may see outdated state.
* Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls are
  not blocked by a loop over tens of thousands of entries.
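
The second item (skip scanning when a scan is already in progress) could look
roughly like the following. This is a minimal sketch under assumptions, not a
proposed patch: SkippingScanner and completedScans are hypothetical names, and
an AtomicBoolean flag is one possible mechanism among several.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a directory scanner; not the Hadoop class. */
class SkippingScanner {
    private final AtomicBoolean scanning = new AtomicBoolean(false);
    private final AtomicInteger completedScans = new AtomicInteger();

    /**
     * Runs scanBody unless a scan is already in progress.
     * Returns true if this call scanned, false if it skipped.
     */
    boolean scanIfNeeded(Runnable scanBody) {
        if (!scanning.compareAndSet(false, true)) {
            // Someone else is scanning: return immediately instead of waiting.
            // The caller continues with possibly outdated state.
            return false;
        }
        try {
            scanBody.run();
            completedScans.incrementAndGet();
            return true;
        } finally {
            scanning.set(false); // always release the flag, even on failure
        }
    }

    int completedScans() {
        return completedScans.get();
    }
}
```

The boolean return value makes the changed semantics explicit: a caller that
receives false knows it is reading state that the in-progress scan has not yet
refreshed.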
 
This patch implements the first item.
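
One way to reduce the logging is to count the entries that cannot be removed
yet and emit a single summary line per call. The following is a simplified
model of that idea and not necessarily what the patch does: the class name, the
TreeMap cache, the boolean "move pending" flag and the summary wording are all
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Simplified model of a size-bounded cache; not the Hadoop implementation. */
class SummarizingEviction {
    /**
     * Evicts entries in key order while the cache is larger than maxSize.
     * A value of true marks an entry that cannot be evicted yet
     * (hypothetical "move pending" flag). Instead of one log line per such
     * entry, at most one summary line is produced per call.
     */
    static List<String> evict(TreeMap<String, Boolean> cache, int maxSize) {
        List<String> logLines = new ArrayList<>();
        int waiting = 0;
        String firstWaiting = null;
        Iterator<Map.Entry<String, Boolean>> it = cache.entrySet().iterator();
        while (cache.size() > maxSize && it.hasNext()) {
            Map.Entry<String, Boolean> entry = it.next();
            if (entry.getValue()) {
                if (waiting == 0) {
                    firstWaiting = entry.getKey(); // keep one key as a sample
                }
                waiting++;
            } else {
                it.remove(); // evictable entry: drop it from the cache
            }
        }
        if (waiting > 0) {
            logLines.add("Waiting to remove " + waiting
                + " entries from JobListCache because they are not in done yet"
                + " (first: " + firstWaiting + ")");
        }
        return logLines;
    }
}
```

With this shape the cost of logging no longer grows with the number of
non-evictable entries, while the summary still reports how many entries are
waiting and names one of them for debugging.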




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)