Posted to common-user@hadoop.apache.org by Johannes Zillmann <jz...@googlemail.com> on 2010/06/01 13:31:30 UTC

Re: hanging task

Hi Raghava,

It was a specific business-logic problem, not a commonly applicable one!

Johannes

On May 28, 2010, at 6:18 PM, Raghava Mutharaju wrote:

> Hello Johannes,
> 
>        Is this about the common tip of using StringBuilder/StringBuffer
> instead of '+' when concatenating Strings? I would like to know more about
> the problem and how you resolved it.
> 
> 
> Regards,
> Raghava.
> 
> On Fri, May 28, 2010 at 6:43 AM, Johannes Zillmann <jzillmann@googlemail.com
>> wrote:
> 
>> Thanks for the valuable input!
>> It was some string concatenation I did in each map call, and it got super
>> long. Resolving this made the problem go away!
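
For illustration, a minimal sketch of the kind of per-call concatenation that can blow up like this; all class, field, and method names here are made up, not from the actual job:

// This just illustrates the shape of the problem, not the real mapper.
public class ConcatSketch {

  // Problematic: a field that keeps growing across map() calls. Each '+='
  // copies the whole accumulated string, so later calls get slower and
  // slower and can eventually outlive the 600 s task timeout.
  private String accumulated = "";

  // Safer: build only what the current record needs and reset the builder.
  private final StringBuilder line = new StringBuilder();

  public String process(String record) {
    // accumulated += record;          // the pattern to avoid
    line.setLength(0);
    line.append(record).append('\n');
    return line.toString();
  }
}
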
>> 
>> cheers
>> Johannes
>> 
>> On May 7, 2010, at 6:35 PM, Michael Segel wrote:
>> 
>>> 
>>> There's a couple of things...
>>> 
>>> 1) Inside the map() method, if you're taking longer than the timeout, you
>> will fail.
>>> 2) You could also fail if you run into GC problems that cause timeout
>> failures.
>>> 
>>> What sometimes happens in long-running jobs is that you tend to see GC take
>> longer as the job runs.
>>> 
>>> Do you log a start and end time in each iteration of map()? This will
>> tell you if the individual time spent in a map() method call is too long.
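
A minimal sketch of that kind of per-call timing with the 0.20 mapred API; the class name and the one-minute warning threshold are just placeholders:

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class TimedMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private static final Log LOG = LogFactory.getLog(TimedMapper.class);

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    long start = System.currentTimeMillis();

    // ... actual per-record work goes here ...

    long elapsedMs = System.currentTimeMillis() - start;
    if (elapsedMs > 60 * 1000L) {
      // Anything creeping towards the 600 s task timeout shows up in the task logs.
      LOG.warn("map() call took " + elapsedMs + " ms");
    }
  }
}
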
>>> 
>>> Also look at GC. Trying to go through all the logs can really drive one
>> to drink. :-)
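
One low-effort way to make GC behaviour visible in the task logs is to pass GC logging flags to the child JVMs via mapred.child.java.opts; a sketch, assuming that option is not already customized (the -Xmx200m value mirrors the 0.20 default and is only a placeholder):

import org.apache.hadoop.mapred.JobConf;

public class ChildGcLogging {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Forward GC logging flags to the child task JVMs so long pauses show up
    // in the per-task logs.
    conf.set("mapred.child.java.opts",
        "-Xmx200m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps");
  }
}
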
>>> 
>>> HTH
>>> 
>>> -Mike
>>> 
>>> 
>>>> Subject: Re: hanging task
>>>> From: jzillmann@googlemail.com
>>>> Date: Fri, 7 May 2010 17:59:01 +0200
>>>> To: common-user@hadoop.apache.org
>>>> 
>>>> Sorry, forgot to say that one map task runs for more than an hour.
>>>> So I do not think it's the progress thing. There are successful map
>>>> tasks, and those which are failing do so long after the 10 min timeout.
>>>> The question is why there is no stack trace of the map task visible
>>>> in the thread dump.
>>>> Is it swallowed, or is Hadoop stuck right after the mapper is done?
>>>> 
>>>> Johannes
>>>> 
>>>> 
>>>> On May 7, 2010, at 5:16 PM, Raghava Mutharaju wrote:
>>>> 
>>>>> Hello Johannes,
>>>>> 
>>>>> I had a similar problem and I used the first approach suggested by
>>>>> Joseph, i.e. to report the status back. I used the progress() method as
>>>>> well as the setStatus() method. The progress() javadoc link is given below.
>>>>> 
>>>>> 
>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/Progressable.html#progress%28%29
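
For reference, a minimal sketch of that with the 0.20 mapred API; the class name, key/value types, and the 1000-record interval are illustrative:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ProgressReportingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private long records = 0;

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // ... long-running per-record work goes here ...

    records++;
    if (records % 1000 == 0) {
      // Either call resets the task's "no progress" timer on the tasktracker.
      reporter.setStatus("processed " + records + " records");
      reporter.progress();
    }
  }
}
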
>>>>> 
>>>>> This should solve your problem. If you find out why it is taking more
>>>>> time than you expected, please mention it here.
>>>>> 
>>>>> Regards,
>>>>> Raghava.
>>>>> 
>>>>> On Fri, May 7, 2010 at 10:31 AM, Joseph Stein <cr...@gmail.com>
>> wrote:
>>>>> 
>>>>>> You need to either report status or increment a counter from within
>>>>>> your task. In your Java code there is a little trick that lets the
>>>>>> cluster know a task is not dead but just working hard.  During execution
>>>>>> of a task there is no built-in reporting that the job is running as
>>>>>> expected if it is not writing output.  This means that if your tasks
>>>>>> spend a lot of time doing work, it is possible the cluster will see that
>>>>>> task as failed (based on the mapred.task.tracker.expiry.interval setting).
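
A related knob worth knowing: the "failed to report status for 600 seconds. Killing!" messages in this thread correspond, in 0.20, to the mapred.task.timeout property (in milliseconds, default 600000). Raising it buys time but does not replace reporting progress; a sketch with a placeholder value:

import org.apache.hadoop.mapred.JobConf;

public class TaskTimeoutConfig {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Allow up to 30 minutes without a progress report before the task is killed.
    conf.setLong("mapred.task.timeout", 30 * 60 * 1000L);
  }
}
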
>>>>>> 
>>>>>> Have no fear, there is a way to tell the cluster that your task is doing
>>>>>> just fine.  You have two ways to do this: you can either report the status
>>>>>> or increment a counter.  Both of these will let the tasktracker know the
>>>>>> task is ok, and this in turn gets seen by the jobtracker.  Both options
>>>>>> are explained in the JavaDoc:
>>>>>> 
>>>>>> 
>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reporter.html
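
A minimal counter-based sketch with the 0.20 mapred API; the enum and class names are made up for illustration:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CounterReportingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // Hypothetical counter, used here only to signal liveness.
  enum Counters { RECORDS_PROCESSED }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // ... expensive per-record work goes here ...

    // Counter updates are piggy-backed on the task's status reports, so this
    // also keeps the tasktracker from marking the task as unresponsive.
    reporter.incrCounter(Counters.RECORDS_PROCESSED, 1);
  }
}
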
>>>>>> 
>>>>>> Here are some general pointers you may find useful:
>>>>>> 
>>>>>> 
>> http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/
>>>>>> 
>>>>>> On Fri, May 7, 2010 at 10:26 AM, Johannes Zillmann
>>>>>> <jz...@googlemail.com> wrote:
>>>>>>> Hi hadoop folks,
>>>>>>> 
>>>>>>> I'm encountering the following problem on a 4-node cluster running
>>>>>>> hadoop-0.20.2.
>>>>>>> 
>>>>>>> It's a map-only job reading about 9 GB of data from outside of Hadoop:
>>>>>>> 31 map tasks in total, with 12 map tasks running at a time.
>>>>>>> The first wave of mappers finishes successfully.
>>>>>>> Later on, the first tasks start failing, and they do that shortly before
>>>>>>> finishing:
>>>>>>> Task attempt_201005061210_0002_m_000001_0 failed to report status for 602 seconds. Killing!
>>>>>>> Task attempt_201005061210_0002_m_000001_1 failed to report status for 600 seconds. Killing!
>>>>>>> Task attempt_201005061210_0002_m_000001_2 failed to report status for 602 seconds. Killing!
>>>>>>> 
>>>>>>> The unusual thing is that I do not find any signs of the job code in the
>>>>>>> thread dump that the tasktracker takes automatically:
>>>>>>> ----------------------------------------------------------------
>>>>>>> 2010-05-05 00:59:03,515 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>> attempt_201004301437_0050_m_000001_0: Task
>>>>>> attempt_201004301437_0050_m_000001_0 failed to report status for 601
>>>>>> seconds. Killing!
>>>>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>> Process Thread Dump: lost task
>>>>>>> 34 active threads
>>>>>>> Thread 29555 (process reaper):
>>>>>>> State: RUNNABLE
>>>>>>> Blocked count: 0
>>>>>>> Waited count: 0
>>>>>>> Stack:
>>>>>>> java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>>>>>> java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
>>>>>>> java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
>>>>>>> Thread 29554 (JVM Runner jvm_201004301437_0050_m_1465855495
>> spawned.):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 1
>>>>>>> Waited count: 2
>>>>>>> Waiting on java.lang.UNIXProcess@34b56f69
>>>>>>> Stack:
>>>>>>> java.lang.Object.wait(Native Method)
>>>>>>> java.lang.Object.wait(Object.java:485)
>>>>>>> java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
>>>>>>> org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
>>>>>>> org.apache.hadoop.util.Shell.run(Shell.java:134)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
>>>>>>> Thread 29550 (Thread-15193):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 13
>>>>>>> Waited count: 14
>>>>>>> Waiting on java.lang.Object@72c09161
>>>>>>> Stack:
>>>>>>> java.lang.Object.wait(Native Method)
>>>>>>> java.lang.Object.wait(Object.java:485)
>>>>>>> org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
>>>>>>> Thread 29501 (process reaper):
>>>>>>> State: RUNNABLE
>>>>>>> Blocked count: 0
>>>>>>> Waited count: 0
>>>>>>> Stack:
>>>>>>> java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>>>>>> java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
>>>>>>> java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
>>>>>>> Thread 29500 (JVM Runner jvm_201004301437_0050_m_861348535 spawned.):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 1
>>>>>>> Waited count: 2
>>>>>>> Waiting on java.lang.UNIXProcess@56175cd4
>>>>>>> Stack:
>>>>>>> java.lang.Object.wait(Native Method)
>>>>>>> java.lang.Object.wait(Object.java:485)
>>>>>>> java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
>>>>>>> org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
>>>>>>> org.apache.hadoop.util.Shell.run(Shell.java:134)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
>>>>>>> Thread 29496 (Thread-15162):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 13
>>>>>>> Waited count: 14
>>>>>>> Waiting on java.lang.Object@3b916de2
>>>>>>> Stack:
>>>>>>> java.lang.Object.wait(Native Method)
>>>>>>> java.lang.Object.wait(Object.java:485)
>>>>>>> org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
>>>>>>> Thread 29225 (process reaper):
>>>>>>> State: RUNNABLE
>>>>>>> Blocked count: 0
>>>>>>> Waited count: 0
>>>>>>> Stack:
>>>>>>> java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>>>>>> java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
>>>>>>> java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
>>>>>>> Thread 29224 (JVM Runner jvm_201004301437_0050_m_-205889314
>> spawned.):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 1
>>>>>>> Waited count: 2
>>>>>>> Waiting on java.lang.UNIXProcess@39874d3b
>>>>>>> Stack:
>>>>>>> java.lang.Object.wait(Native Method)
>>>>>>> java.lang.Object.wait(Object.java:485)
>>>>>>> java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
>>>>>>> org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
>>>>>>> org.apache.hadoop.util.Shell.run(Shell.java:134)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
>>>>>>> Thread 29217 (process reaper):
>>>>>>> State: RUNNABLE
>>>>>>> Blocked count: 0
>>>>>>> Waited count: 0
>>>>>>> Stack:
>>>>>>> java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>>>>>> java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
>>>>>>> java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
>>>>>>> Thread 29215 (JVM Runner jvm_201004301437_0050_m_-1797469636
>> spawned.):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 1
>>>>>>> Waited count: 2
>>>>>>> Waiting on java.lang.UNIXProcess@2c39220f
>>>>>>> Stack:
>>>>>>> java.lang.Object.wait(Native Method)
>>>>>>> java.lang.Object.wait(Object.java:485)
>>>>>>> java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
>>>>>>> org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
>>>>>>> org.apache.hadoop.util.Shell.run(Shell.java:134)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
>>>>>>> Thread 29208 (Thread-15014):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 18
>>>>>>> Waited count: 14
>>>>>>> Waiting on java.lang.Object@7ddbbae2
>>>>>>> Stack:
>>>>>>> java.lang.Object.wait(Native Method)
>>>>>>> java.lang.Object.wait(Object.java:485)
>>>>>>> org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
>>>>>>> Thread 29203 (Thread-15011):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 13
>>>>>>> Waited count: 14
>>>>>>> Waiting on java.lang.Object@2dac3f6f
>>>>>>> Stack:
>>>>>>> java.lang.Object.wait(Native Method)
>>>>>>> java.lang.Object.wait(Object.java:485)
>>>>>>> org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
>>>>>>> Thread 26714 (IPC Client (47) connection to /10.0.11.64:9001 from
>>>>>> hadoop):
>>>>>>> State: TIMED_WAITING
>>>>>>> Blocked count: 10075
>>>>>>> Waited count: 10075
>>>>>>> Stack:
>>>>>>> java.lang.Object.wait(Native Method)
>>>>>>> org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:403)
>>>>>>> org.apache.hadoop.ipc.Client$Connection.run(Client.java:445)
>>>>>>> Thread 33 (Directory/File cleanup thread):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 2
>>>>>>> Waited count: 615
>>>>>>> Waiting on
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@4d6c68b3
>>>>>>> Stack:
>>>>>>> sun.misc.Unsafe.park(Native Method)
>>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.mapred.CleanupQueue$PathCleanupThread.run(CleanupQueue.java:89)
>>>>>>> Thread 9 (taskCleanup):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 30
>>>>>>> Waited count: 92
>>>>>>> Waiting on
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3298407f
>>>>>>> Stack:
>>>>>>> sun.misc.Unsafe.park(Native Method)
>>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>>> org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:317)
>>>>>>> java.lang.Thread.run(Thread.java:619)
>>>>>>> Thread 32 (TaskLauncher for task):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 127
>>>>>>> Waited count: 123
>>>>>>> Waiting on java.util.LinkedList@c33377
>>>>>>> Stack:
>>>>>>> java.lang.Object.wait(Native Method)
>>>>>>> java.lang.Object.wait(Object.java:485)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1601)
>>>>>>> Thread 31 (TaskLauncher for task):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 3237
>>>>>>> Waited count: 3147
>>>>>>> Waiting on java.util.LinkedList@67001629
>>>>>>> Stack:
>>>>>>> java.lang.Object.wait(Native Method)
>>>>>>> java.lang.Object.wait(Object.java:485)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1601)
>>>>>>> Thread 30 (Map-events fetcher for all reduce tasks on
>>>>>> tracker_node1t:localhost/127.0.0.1:48722):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 712
>>>>>>> Waited count: 804
>>>>>>> Waiting on java.util.TreeMap@1fec8cf1
>>>>>>> Stack:
>>>>>>> java.lang.Object.wait(Native Method)
>>>>>>> java.lang.Object.wait(Object.java:485)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:606)
>>>>>>> Thread 28 (IPC Server handler 7 on 48722):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 10
>>>>>>> Waited count: 3199
>>>>>>> Waiting on
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
>>>>>>> Stack:
>>>>>>> sun.misc.Unsafe.park(Native Method)
>>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>>>> Thread 27 (IPC Server handler 6 on 48722):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 265
>>>>>>> Waited count: 3207
>>>>>>> Waiting on
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
>>>>>>> Stack:
>>>>>>> sun.misc.Unsafe.park(Native Method)
>>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>>>> Thread 26 (IPC Server handler 5 on 48722):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 296
>>>>>>> Waited count: 3206
>>>>>>> Waiting on
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
>>>>>>> Stack:
>>>>>>> sun.misc.Unsafe.park(Native Method)
>>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>>>> Thread 25 (IPC Server handler 4 on 48722):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 199
>>>>>>> Waited count: 3202
>>>>>>> Waiting on
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
>>>>>>> Stack:
>>>>>>> sun.misc.Unsafe.park(Native Method)
>>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>>>> Thread 24 (IPC Server handler 3 on 48722):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 216
>>>>>>> Waited count: 3201
>>>>>>> Waiting on
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
>>>>>>> Stack:
>>>>>>> sun.misc.Unsafe.park(Native Method)
>>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>>>> Thread 23 (IPC Server handler 2 on 48722):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 119
>>>>>>> Waited count: 3199
>>>>>>> Waiting on
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
>>>>>>> Stack:
>>>>>>> sun.misc.Unsafe.park(Native Method)
>>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>>>> Thread 22 (IPC Server handler 1 on 48722):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 193
>>>>>>> Waited count: 3207
>>>>>>> Waiting on
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
>>>>>>> Stack:
>>>>>>> sun.misc.Unsafe.park(Native Method)
>>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>>>> Thread 21 (IPC Server handler 0 on 48722):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 285
>>>>>>> Waited count: 3203
>>>>>>> Waiting on
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
>>>>>>> Stack:
>>>>>>> sun.misc.Unsafe.park(Native Method)
>>>>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>>>>> 
>>>>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>>>>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>>>>>> Thread 18 (IPC Server listener on 48722):
>>>>>>> State: RUNNABLE
>>>>>>> Blocked count: 0
>>>>>>> Waited count: 0
>>>>>>> Stack:
>>>>>>> sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>>>>> sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
>>>>>>> sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>>>>>> sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>>>>>> sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>>>>>> sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
>>>>>>> org.apache.hadoop.ipc.Server$Listener.run(Server.java:318)
>>>>>>> Thread 20 (IPC Server Responder):
>>>>>>> State: RUNNABLE
>>>>>>> Blocked count: 0
>>>>>>> Waited count: 0
>>>>>>> Stack:
>>>>>>> sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>>>>> sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
>>>>>>> sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>>>>>> sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>>>>>> sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>>>>>> org.apache.hadoop.ipc.Server$Responder.run(Server.java:478)
>>>>>>> Thread 17 (Timer-0):
>>>>>>> State: TIMED_WAITING
>>>>>>> Blocked count: 1
>>>>>>> Waited count: 12764
>>>>>>> Stack:
>>>>>>> java.lang.Object.wait(Native Method)
>>>>>>> java.util.TimerThread.mainLoop(Timer.java:509)
>>>>>>> java.util.TimerThread.run(Timer.java:462)
>>>>>>> Thread 16 (295726648@qtp0-0 - Acceptor0
>>>>>> SelectChannelConnector@0.0.0.0:50060):
>>>>>>> State: RUNNABLE
>>>>>>> Blocked count: 1310
>>>>>>> Waited count: 1
>>>>>>> Stack:
>>>>>>> sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>>>>> sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
>>>>>>> sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>>>>>> sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>>>>>> sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>>>>>> 
>>>>>> 
>> org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:429)
>>>>>>> 
>> org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185)
>>>>>>> 
>>>>>> 
>> org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
>>>>>>> 
>>>>>> 
>> org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
>>>>>>> 
>>>>>> 
>> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
>>>>>>> Thread 4 (Signal Dispatcher):
>>>>>>> State: RUNNABLE
>>>>>>> Blocked count: 0
>>>>>>> Waited count: 0
>>>>>>> Stack:
>>>>>>> Thread 3 (Finalizer):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 1337
>>>>>>> Waited count: 1336
>>>>>>> Waiting on java.lang.ref.ReferenceQueue$Lock@5e77533b
>>>>>>> Stack:
>>>>>>> java.lang.Object.wait(Native Method)
>>>>>>> java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
>>>>>>> java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
>>>>>>> java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>>>>>>> Thread 2 (Reference Handler):
>>>>>>> State: WAITING
>>>>>>> Blocked count: 1544
>>>>>>> Waited count: 1336
>>>>>>> Waiting on java.lang.ref.Reference$Lock@46efbdf1
>>>>>>> Stack:
>>>>>>> java.lang.Object.wait(Native Method)
>>>>>>> java.lang.Object.wait(Object.java:485)
>>>>>>> java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>>>>>>> Thread 1 (main):
>>>>>>> State: RUNNABLE
>>>>>>> Blocked count: 152594
>>>>>>> Waited count: 279661
>>>>>>> Stack:
>>>>>>> sun.management.ThreadImpl.getThreadInfo0(Native Method)
>>>>>>> sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:147)
>>>>>>> sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:123)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.util.ReflectionUtils.logThreadInfo(ReflectionUtils.java:203)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.mapred.TaskTracker.markUnresponsiveTasks(TaskTracker.java:1323)
>>>>>>> 
>>>>>> 
>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1106)
>>>>>>> org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720)
>>>>>>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
>>>>>>> 
>>>>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker:
>> About
>>>>>> to purge task: attempt_201004301437_0050_m_000001_0
>>>>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>> addFreeSlot : current free slots : 1
>>>>>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskRunner:
>>>>>> attempt_201004301437_0050_m_000001_0 done; removing files.
>>>>>>> 2010-05-05 00:59:03,521 INFO org.apache.hadoop.mapred.IndexCache: Map
>> ID
>>>>>> attempt_201004301437_0050_m_000001_0 not found in cache
>>>>>>> 2010-05-05 00:59:05,012 INFO org.apache.hadoop.mapred.JvmManager: JVM
>> :
>>>>>> jvm_201004301437_0050_m_-205889314 exited. Number of tasks it ran: 0
>>>>>>> 2010-05-05 00:59:06,521 INFO org.apache.hadoop.mapred.TaskRunner:
>>>>>> attempt_201004301437_0050_m_000001_0 done; removing files.
>>>>>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>> LaunchTaskAction (registerTask): attempt_201004301437_0050_m_000001_0
>> task's
>>>>>> state:FAILED_UNCLEAN
>>>>>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker:
>> Trying
>>>>>> to launch : attempt_201004301437_0050_m_000001_0
>>>>>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker: In
>>>>>> TaskLauncher, current free slots : 1 and trying to launch
>>>>>> attempt_201004301437_0050_m_000001_0
>>>>>>> 2010-05-05 00:59:08,606 INFO org.apache.hadoop.mapred.TaskTracker:
>> JVM
>>>>>> with ID: jvm_201004301437_0050_m_-1095145752 given task:
>>>>>> attempt_201004301437_0050_m_000001_0
>>>>>>> 2010-05-05 00:59:09,222 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>> attempt_201004301437_0050_m_000001_0 0.0%
>>>>>>> 2010-05-05 00:59:09,560 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>> attempt_201004301437_0050_m_000001_0 0.0% cleanup
>>>>>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskTracker:
>> Task
>>>>>> attempt_201004301437_0050_m_000001_0 is done.
>>>>>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>> reported output size for attempt_201004301437_0050_m_000001_0  was 0
>>>>>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskRunner:
>>>>>> attempt_201004301437_0050_m_000001_0 done; removing files.
>>>>>>> ----------------------------------------------------------------
>>>>>>> 
>>>>>>> The syslog of the attempt just looks like:
>>>>>>> ----------------------------------------------------------------
>>>>>>> 2010-05-04 23:48:03,365 INFO
>> org.apache.hadoop.metrics.jvm.JvmMetrics:
>>>>>> Initializing JVM Metrics with processName=MAP, sessionId=
>>>>>>> 2010-05-04 23:48:03,674 WARN datameer.dap.sdk.util.ManifestMetaData:
>>>>>> Failed to get version from the manifest file of jar
>>>>>> 
>> 'file:/data/drive0/mapred/tmp/taskTracker/archive/nurago64.local/das/jobjars/61a2133b7887b9be48d42e49002c85e0/stripped-dap-0.24.dev-job.jar/stripped-dap-0.24.dev-job.jar'
>>>>>> : null
>>>>>>> 2010-05-04 23:48:04,691 INFO org.apache.hadoop.mapred.MapTask:
>>>>>> numReduceTasks: 0
>>>>>>> 2010-05-04 23:48:04,872 INFO org.apache.hadoop.util.NativeCodeLoader:
>>>>>> Loaded the native-hadoop library
>>>>>>> 2010-05-04 23:48:04,873 INFO
>>>>>> org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
>>>>>> initialized native-zlib library
>>>>>>> 2010-05-04 23:48:04,874 INFO org.apache.hadoop.io.compress.CodecPool:
>> Got
>>>>>> brand-new compressor
>>>>>>> 2010-05-05 00:59:08,753 INFO
>> org.apache.hadoop.metrics.jvm.JvmMetrics:
>>>>>> Initializing JVM Metrics with processName=CLEANUP, sessionId=
>>>>>>> 2010-05-05 00:59:09,223 INFO org.apache.hadoop.mapred.TaskRunner:
>>>>>> Runnning cleanup for the task
>>>>>>> 2010-05-05 00:59:09,442 INFO org.apache.hadoop.mapred.TaskRunner:
>>>>>> Task:attempt_201004301437_0050_m_000001_0 is done. And is in the
>> process of
>>>>>> commiting
>>>>>>> 2010-05-05 00:59:09,562 INFO org.apache.hadoop.mapred.TaskRunner:
>> Task
>>>>>> 'attempt_201004301437_0050_m_000001_0' done.
>>>>>>> ----------------------------------------------------------------
>>>>>>> 
>>>>>>> Any ideas ?
>>>>>>> 
>>>>>>> best regards
>>>>>>> Johannes
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> /*
>>>>>> Joe Stein
>>>>>> http://www.linkedin.com/in/charmalloc
>>>>>> */
>>>>>> 
>>>> 
>>> 
>> 
>>