Posted to hdfs-user@hadoop.apache.org by Anfernee Xu <an...@gmail.com> on 2014/09/11 00:28:11 UTC

The running job is blocked for a while if the queue is short of resources

Hi experts,

I ran into a strange issue that I cannot explain. Could you tell me whether this
is a bug or whether I have misconfigured something? Here is my situation.

I'm running the Hadoop 2.2.0 release and all my jobs are uberized, so each
node can only run a single job at a time. I use the CapacityScheduler with two
queues (default and small) and give only 5% of the capacity (10 nodes) to the
small queue. What I found is that the throughput of the small queue is very
poor when it is under heavy load (inflow rate > processing speed). Checking
the job logs, I found that each job spends an extra 1-2 minutes in the job
commit phase; see the AM log excerpt below, after a rough sketch of my queue
configuration.
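
For reference, my capacity-scheduler.xml looks roughly like this (a sketch;
the exact values on my cluster may differ slightly):

<configuration>
  <!-- Two queues under root: default and small. -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,small</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>95</value>
  </property>
  <!-- The small queue only gets 5% of the cluster (about 10 nodes). -->
  <property>
    <name>yarn.scheduler.capacity.root.small.capacity</name>
    <value>5</value>
  </property>
</configuration>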

2014-09-10 14:01:13,665 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from
attempt_1410336300553_9902_m_000000_0
2014-09-10 14:01:13,665 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of
TaskAttemptattempt_1410336300553_9902_m_000000_0 is : 1.0
2014-09-10 14:01:13,670 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from
attempt_1410336300553_9902_m_000000_0
2014-09-10 14:01:13,670 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.Task: Task 'attempt_1410336300553_9902_m_000000_0'
done.
2014-09-10 14:01:13,671 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1410336300553_9902_m_000000_0 TaskAttempt Transitioned from RUNNING
to SUCCESS_CONTAINER_CLEANUP
2014-09-10 14:01:13,671 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.LocalContainerLauncher: Processing the event
EventType: CONTAINER_REMOTE_CLEANUP for container
container_1410336300553_9902_01_000001 taskAttempt
attempt_1410336300553_9902_m_000000_0
2014-09-10 14:01:13,675 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1410336300553_9902_m_000000_0 TaskAttempt Transitioned from
SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
2014-09-10 14:01:13,685 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with
attempt attempt_1410336300553_9902_m_000000_0
2014-09-10 14:01:13,687 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
task_1410336300553_9902_m_000000 Task Transitioned from RUNNING to SUCCEEDED
2014-09-10 14:01:13,693 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2014-09-10 14:01:13,694 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.TIEMRAppMetrics: task is completed on
2014-09-10 14:01:13,697 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
job_1410336300553_9902Job Transitioned from RUNNING to COMMITTING
2014-09-10 14:01:13,697 INFO [CommitterEvent Processor #1]
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing
the event EventType: JOB_COMMIT
2014-09-10 14:02:30,121 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Calling handler for
JobFinishedEvent
2014-09-10 14:02:30,122 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
job_1410336300553_9902Job Transitioned from COMMITTING to SUCCEEDED

As you can see, the job commit started at 14:01:13 and did not finish until
14:02:30, which is a long time. I also captured a thread dump of the job's
AppMaster; the interesting part is here:

"CommitterEvent Processor #1" id=91 idx=0x16c tid=29593 prio=5 alive,
waiting, native_blocked
    -- Waiting for notification on:
org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor0x906b46d0[fat
lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native
Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at
org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor.waitForValidCommitWindow(CommitterEventHandler.java:313)
    ^-- Lock released while waiting:
org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor0x906b46d0[fat
lock]
    at
org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:252)
    at
org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:216)
    at
java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at
java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java/lang/Thread.run(Thread.java:662)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

I checked the code: the committer thread is blocked waiting for the next
heartbeat to the RM. I also looked at

org.apache.hadoop.mapreduce.v2.app.local.LocalContainerAllocator.heartbeat()

and it seems to send another resource allocation request to the RM.
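
To double-check my understanding of the mechanism, here is a small
self-contained model of what I believe waitForValidCommitWindow() is doing;
the class and field names and the 10-second window are my own assumptions for
illustration, not the actual Hadoop code:

// Toy model: the committer refuses to run JOB_COMMIT until the AM has seen an
// RM heartbeat within the last "commit window" milliseconds, so a delayed
// heartbeat directly delays the commit.
public class CommitWindowModel {
    private final Object lock = new Object();
    private long lastHeartbeatMillis = 0L;
    private final long commitWindowMillis = 10000L; // assumed window, for illustration

    // Called by the allocator thread whenever it gets a response from the RM.
    public void onRmHeartbeat() {
        synchronized (lock) {
            lastHeartbeatMillis = System.currentTimeMillis();
            lock.notifyAll();
        }
    }

    // Called by the committer thread right before the actual job commit.
    public void waitForValidCommitWindow() throws InterruptedException {
        synchronized (lock) {
            while (System.currentTimeMillis() - lastHeartbeatMillis > commitWindowMillis) {
                lock.wait(1000L); // re-check periodically in case no notify arrives
            }
        }
    }
}

In the thread dump above, the committer thread is parked in exactly this kind
of wait.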

So my understanding (correct me if I'm wrong) is that when the queue is short
of resources (no resources available in the queue), the heartbeat is blocked
for a while, and the ongoing jobs are effectively suspended. To test this
guess, I reduced the inflow rate so that one node is always kept idle (inflow
rate < processing speed), and the overall throughput improved significantly.

My questions are:

1. Is this a bug, or is my configuration incorrect?
2. Why does the heartbeat send another resource allocation request to the RM?
   I would expect the scheduler to favor the ongoing jobs when a queue is under
   load, so that running jobs finish as soon as possible while new jobs wait.

Thanks
-- 
--Anfernee