Posted to common-user@hadoop.apache.org by Mathias De Maré <ma...@gmail.com> on 2009/08/05 09:02:59 UTC
Some tasks fail to report status between the end of the map and the
beginning of the merge
Hi,
I'm having some problems (Hadoop 0.20.0) where map tasks fail to report
status for 10 minutes and eventually get killed. All of the tasks output
around the same amount of data; some take only a few seconds before starting
the 'merge' on the segments, but others seem to fail by simply stopping
work for about 10 minutes.
Several of these failed tasks do eventually succeed, on their 2nd or 3rd
attempt. It's nice to see them succeed in the end, but each of those
tasks crawls a few thousand websites, and it seems like a terrible waste to
let them retry a few times, re-downloading all of those websites after
10 minutes of doing nothing.
Even more annoyingly, eventually one of the tasks fails completely, which
then kills the entire job.
I could probably increase the number of task attempts and simply hope the
tasks will eventually succeed, but that doesn't solve the huge slowdowns and
the recrawling required.
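The usual workaround for long stretches of work between status updates is to report progress periodically from inside the map loop. A minimal sketch of the pattern, with a local Reporter interface standing in for org.apache.hadoop.mapred.Reporter (whose progress() call is what resets the task's timeout clock):

```java
// Sketch of periodic progress reporting for a long-running map loop, as in
// the crawl described above. The nested Reporter interface is a stand-in for
// org.apache.hadoop.mapred.Reporter; in a real mapper you would call
// reporter.progress() (or increment a counter) instead.
public class ProgressSketch {

    interface Reporter { void progress(); }

    static final int REPORT_EVERY = 100; // records between keepalive pings

    // Processes 'records' items, pinging the framework every REPORT_EVERY
    // records; returns the number of progress calls made.
    static int crawlAll(int records, Reporter reporter) {
        int pings = 0;
        for (int i = 0; i < records; i++) {
            // ... fetch one website and emit its records here ...
            if (i % REPORT_EVERY == 0) {
                reporter.progress(); // tell the TaskTracker we are alive
                pings++;
            }
        }
        return pings;
    }

    public static void main(String[] args) {
        int pings = crawlAll(1000, new Reporter() {
            public void progress() { /* would notify the TaskTracker */ }
        });
        System.out.println(pings); // 10: one ping per 100 records
    }
}
```

In a real mapper the same call inside the per-record loop keeps the TaskTracker from declaring the task unresponsive, though it does not help once the map loop has finished and the sort/merge phase has taken over, which is the window being discussed in this thread.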
Here's an example of a successful task attempt (this is attempt 3 of a
specific task -- note that it takes around 8 minutes between spill 133 and
spill 134):
2009-08-04 18:38:48,059 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 129
2009-08-04 18:39:00,626 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 18:39:00,626 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 2492163; bufend = 628913; bufvoid = 2988446
2009-08-04 18:39:00,626 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 9727; kvend = 7760; length = 9830
2009-08-04 18:39:01,467 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 130
2009-08-04 18:39:08,136 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 18:39:08,136 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 628913; bufend = 1880222; bufvoid = 2988448
2009-08-04 18:39:08,136 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 7760; kvend = 5793; length = 9830
2009-08-04 18:39:08,463 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 131
2009-08-04 18:39:12,456 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 18:39:12,459 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 1880222; bufend = 136018; bufvoid = 2988448
2009-08-04 18:39:12,459 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 5793; kvend = 3826; length = 9830
2009-08-04 18:39:12,697 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 132
2009-08-04 18:39:23,138 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 18:39:23,138 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 136018; bufend = 1347353; bufvoid = 2988448
2009-08-04 18:39:23,138 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 3826; kvend = 1859; length = 9830
2009-08-04 18:39:25,747 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 133
2009-08-04 18:47:49,823 INFO org.apache.hadoop.mapred.MapTask:
Starting flush of map output
2009-08-04 18:47:50,132 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 134
2009-08-04 18:47:50,525 INFO org.apache.hadoop.mapred.Merger: Merging
135 sorted segments
2009-08-04 18:47:50,528 INFO org.apache.hadoop.mapred.Merger: Merging
9 intermediate segments out of a total of 135
2009-08-04 18:47:52,224 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 127
2009-08-04 18:47:53,837 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 118
2009-08-04 18:47:55,417 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 109
2009-08-04 18:47:56,990 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 100
2009-08-04 18:47:58,492 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 91
2009-08-04 18:48:00,191 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 82
2009-08-04 18:48:02,315 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 73
2009-08-04 18:48:04,184 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 64
2009-08-04 18:48:06,162 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 55
2009-08-04 18:48:08,149 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 46
2009-08-04 18:48:09,888 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 37
2009-08-04 18:48:11,744 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 28
2009-08-04 18:48:13,544 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 19
2009-08-04 18:48:21,177 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 10 segments left of total size: 153426836
bytes
2009-08-04 18:48:46,226 INFO org.apache.hadoop.mapred.TaskRunner:
Task:attempt_200908041626_0001_m_000133_2 is done. And is in the
process of commiting
2009-08-04 18:48:46,373 INFO org.apache.hadoop.mapred.TaskRunner: Task
'attempt_200908041626_0001_m_000133_2' done.
And here's the same task, but an earlier attempt (it only goes up to spill
132, but the spills do vary slightly in size, right?):
2009-08-04 17:15:49,112 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 321905; bufend = 1464691; bufvoid = 2988448
2009-08-04 17:15:49,112 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 7764; kvend = 5797; length = 9830
2009-08-04 17:15:49,910 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 126
2009-08-04 17:16:00,785 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 17:16:00,785 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 1464691; bufend = 2580856; bufvoid = 2988448
2009-08-04 17:16:00,785 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 5797; kvend = 3830; length = 9830
2009-08-04 17:16:05,728 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 127
2009-08-04 17:16:20,883 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 17:16:20,883 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 2580856; bufend = 1000978; bufvoid = 2988448
2009-08-04 17:16:20,883 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 3830; kvend = 1863; length = 9830
2009-08-04 17:16:21,709 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 128
2009-08-04 17:16:24,461 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 17:16:24,461 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 1000978; bufend = 2246166; bufvoid = 2988448
2009-08-04 17:16:24,461 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 1863; kvend = 9727; length = 9830
2009-08-04 17:16:24,934 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 129
2009-08-04 17:16:40,853 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 17:16:40,853 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 2246166; bufend = 374750; bufvoid = 2988435
2009-08-04 17:16:40,853 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 9727; kvend = 7760; length = 9830
2009-08-04 17:16:43,740 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 130
2009-08-04 17:16:49,141 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 17:16:49,141 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 374750; bufend = 1653302; bufvoid = 2988448
2009-08-04 17:16:49,141 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 7760; kvend = 5793; length = 9830
2009-08-04 17:16:49,868 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 131
2009-08-04 17:17:04,902 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 17:17:04,902 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 1653302; bufend = 2882491; bufvoid = 2988448
2009-08-04 17:17:04,902 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 5793; kvend = 3826; length = 9830
2009-08-04 17:17:05,355 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 132
There's no more output after that last spill, since the task eventually got
killed after 10 minutes of inactivity.
And in case it can help, here's a thread dump for one of those failed tasks:
2009-08-04 18:53:23,460 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200908041626_0001_m_000135_3: Task
attempt_200908041626_0001_m_000135_3 failed to report status for 602
seconds. Killing!
2009-08-04 18:53:23,470 INFO org.apache.hadoop.mapred.TaskTracker: Process
Thread Dump: lost task
31 active threads
Thread 1673 (process reaper):
State: RUNNABLE
Blocked count: 0
Waited count: 0
Stack:
java.lang.UNIXProcess.waitForProcessExit(Native Method)
java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
Thread 1672 (JVM Runner jvm_200908041626_0001_m_1938204689 spawned.):
State: WAITING
Blocked count: 1
Waited count: 2
Waiting on java.lang.UNIXProcess@155d434
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
org.apache.hadoop.util.Shell.run(Shell.java:134)
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
Thread 1671 (Thread-1040):
State: WAITING
Blocked count: 0
Waited count: 1
Waiting on java.lang.Object@1ba4552
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
Thread 1646 (process reaper):
State: RUNNABLE
Blocked count: 0
Waited count: 0
Stack:
java.lang.UNIXProcess.waitForProcessExit(Native Method)
java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
Thread 1645 (JVM Runner jvm_200908041626_0001_m_2100332869 spawned.):
State: WAITING
Blocked count: 1
Waited count: 2
Waiting on java.lang.UNIXProcess@5e34ce
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
org.apache.hadoop.util.Shell.run(Shell.java:134)
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
Thread 1644 (Thread-1023):
State: WAITING
Blocked count: 0
Waited count: 1
Waiting on java.lang.Object@1bee6f
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
Thread 143 (process reaper):
State: RUNNABLE
Blocked count: 0
Waited count: 0
Stack:
java.lang.UNIXProcess.waitForProcessExit(Native Method)
java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
Thread 141 (JVM Runner jvm_200908041626_0001_r_-836397802 spawned.):
State: WAITING
Blocked count: 1
Waited count: 2
Waiting on java.lang.UNIXProcess@11fceed
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
org.apache.hadoop.util.Shell.run(Shell.java:134)
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
Thread 140 (Thread-89):
State: WAITING
Blocked count: 0
Waited count: 1
Waiting on java.lang.Object@1c63791
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
Thread 33 (IPC Client (47) connection to localhost/127.0.0.1:9001 from
root):
State: TIMED_WAITING
Blocked count: 5563
Waited count: 5552
Stack:
java.lang.Object.wait(Native Method)
org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:403)
org.apache.hadoop.ipc.Client$Connection.run(Client.java:445)
Thread 32 (Directory/File cleanup thread):
State: WAITING
Blocked count: 0
Waited count: 165
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@fdffb1
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.mapred.CleanupQueue$PathCleanupThread.run(CleanupQueue.java:89)
Thread 8 (taskCleanup):
State: WAITING
Blocked count: 0
Waited count: 2
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@19c205b
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:352)
java.lang.Thread.run(Thread.java:619)
Thread 31 (TaskLauncher for task):
State: WAITING
Blocked count: 4
Waited count: 5
Waiting on java.util.LinkedList@1d9d565
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1662)
Thread 30 (TaskLauncher for task):
State: WAITING
Blocked count: 783
Waited count: 781
Waiting on java.util.LinkedList@1fbebee
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1662)
Thread 29 (Map-events fetcher for all reduce tasks on
tracker_ip-10-244-145-237.ec2.internal:localhost/127.0.0.1:40135):
State: TIMED_WAITING
Blocked count: 10965
Waited count: 10957
Stack:
java.lang.Object.wait(Native Method)
org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:671)
Thread 27 (IPC Server handler 7 on 40135):
State: WAITING
Blocked count: 5
Waited count: 2565
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da690
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
Thread 26 (IPC Server handler 6 on 40135):
State: WAITING
Blocked count: 11
Waited count: 2565
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da690
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
Thread 25 (IPC Server handler 5 on 40135):
State: WAITING
Blocked count: 11
Waited count: 2566
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da690
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
Thread 24 (IPC Server handler 4 on 40135):
State: WAITING
Blocked count: 7
Waited count: 2565
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da690
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
Thread 23 (IPC Server handler 3 on 40135):
State: WAITING
Blocked count: 10
Waited count: 2565
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da690
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
Thread 22 (IPC Server handler 2 on 40135):
State: WAITING
Blocked count: 11
Waited count: 2566
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da690
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
Thread 21 (IPC Server handler 1 on 40135):
State: WAITING
Blocked count: 15
Waited count: 2567
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da690
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
Thread 20 (IPC Server handler 0 on 40135):
State: WAITING
Blocked count: 12
Waited count: 2565
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da690
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
Thread 17 (IPC Server listener on 40135):
State: RUNNABLE
Blocked count: 0
Waited count: 0
Stack:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
org.apache.hadoop.ipc.Server$Listener.run(Server.java:318)
Thread 19 (IPC Server Responder):
State: RUNNABLE
Blocked count: 0
Waited count: 0
Stack:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
org.apache.hadoop.ipc.Server$Responder.run(Server.java:478)
Thread 16 (Timer-0):
State: TIMED_WAITING
Blocked count: 1
Waited count: 298
Stack:
java.lang.Object.wait(Native Method)
java.util.TimerThread.mainLoop(Timer.java:509)
java.util.TimerThread.run(Timer.java:462)
Thread 15 (11502424@qtp0-0 - Acceptor0 SelectChannelConnector@0.0.0.0:50060
):
State: RUNNABLE
Blocked count: 50
Waited count: 1
Stack:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:429)
org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185)
org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
Thread 4 (Signal Dispatcher):
State: RUNNABLE
Blocked count: 0
Waited count: 0
Stack:
Thread 3 (Finalizer):
State: WAITING
Blocked count: 424
Waited count: 425
Waiting on java.lang.ref.ReferenceQueue$Lock@7b6d1c
Stack:
java.lang.Object.wait(Native Method)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
Thread 2 (Reference Handler):
State: WAITING
Blocked count: 445
Waited count: 446
Waiting on java.lang.ref.Reference$Lock@16ee240
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
Thread 1 (main):
State: RUNNABLE
Blocked count: 2973
Waited count: 5789
Stack:
sun.management.ThreadImpl.getThreadInfo0(Native Method)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:147)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:123)
org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
org.apache.hadoop.util.ReflectionUtils.logThreadInfo(ReflectionUtils.java:203)
org.apache.hadoop.mapred.TaskTracker.markUnresponsiveTasks(TaskTracker.java:1383)
org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1129)
org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1779)
org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2881)
2009-08-04 18:53:23,471 INFO org.apache.hadoop.mapred.TaskTracker: About to
purge task: attempt_200908041626_0001_m_000135_3
2009-08-04 18:53:23,471 INFO org.apache.hadoop.mapred.TaskTracker:
addFreeSlot : current free slots : 3
2009-08-04 18:53:23,471 INFO org.apache.hadoop.mapred.TaskRunner:
attempt_200908041626_0001_m_000135_3 done; removing files.
2009-08-04 18:53:23,550 INFO org.apache.hadoop.mapred.IndexCache: Map ID
attempt_200908041626_0001_m_000135_3 not found in cache
Thanks in advance for any response!
I'm afraid I'm somewhat clueless about this.
Mathias
Re: Some tasks fail to report status between the end of the map and
the beginning of the merge
Posted by Mathias De Maré <ma...@gmail.com>.
2009/8/12 Mathias De Maré <ma...@gmail.com>
> Thank you, that's very useful.
> In addition, I changed the way the tasks work, so they store their data in
> HBase now (since it's more suited for handling small files).
> I'm not 100% sure yet if the problems have been resolved (still doing
> extensive testing), but I think I might have gotten rid of them (and I'll
> add the 'skipping records' option in case I do get a failure).
>
Hi,
I can get everything to 'run' successfully now, but there are still some
tasks that crash.
I was thinking perhaps my Writable class is the issue, so I'll just post it
here. Does anyone notice anything that could cause a hang? In particular,
the readFields and write methods could be the cause (but I just don't
see it).
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;

/**
 * Contains information on a URL: which other URLs link to it and whether it
 * has been crawled previously.
 * @author mathias
 */
public class URLInfo implements Writable, WritableComparable {

    String url;
    Text[] linkedfrom;
    int urlStatus;
    int seconds;

    public URLInfo() {
        url = "";
        linkedfrom = new Text[0];
        urlStatus = Constants.URL_NEW;
        seconds = 1;
    }

    /**
     * @param url
     * @param linkedfrom Must only contain domain names, nothing appended
     * @param urlStatus
     */
    public URLInfo(String url, Text[] linkedfrom, int urlStatus, int seconds) {
        this.url = url;
        this.linkedfrom = linkedfrom;
        this.urlStatus = urlStatus;
        this.seconds = seconds;
    }

    public void write(DataOutput out) throws IOException {
        new Text(url).write(out);
        new ArrayWritable(Text.class, linkedfrom).write(out);
        new IntWritable(urlStatus).write(out);
        new IntWritable(seconds).write(out);
    }

    public void readFields(DataInput in) throws IOException {
        url = Text.readString(in);
        ArrayWritable aw = new ArrayWritable(Text.class);
        aw.readFields(in);
        Writable[] linkedfromWritable = aw.get();
        linkedfrom = new Text[linkedfromWritable.length];
        for (int i = 0; i < linkedfromWritable.length; i++) {
            linkedfrom[i] = (Text) linkedfromWritable[i];
        }
        IntWritable iw = new IntWritable();
        iw.readFields(in);
        urlStatus = iw.get();
        IntWritable iw2 = new IntWritable();
        iw2.readFields(in);
        seconds = iw2.get();
    }

    public int compareTo(Object o) {
        return url.compareToIgnoreCase(((URLInfo) o).url);
    }

    public void setURLStatus(int urlStatus) {
        this.urlStatus = urlStatus;
    }

    public int getURLStatus() {
        return urlStatus;
    }

    public void setLinkedFrom(Text[] linkedfrom) {
        this.linkedfrom = linkedfrom;
    }

    public Text[] getLinkedFrom() {
        return linkedfrom;
    }

    public String getURL() {
        return new String(url);
    }

    public int getSeconds() {
        return seconds;
    }

    public void setSeconds(int seconds) {
        this.seconds = seconds;
    }

    @Override
    public String toString() {
        return new String(url);
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof URLInfo)) {
            return false;
        }
        URLInfo urlObject = (URLInfo) obj;
        return this.getURL().equals(urlObject.getURL());
    }
}
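One observation on the class above, offered as a possible contributor rather than a confirmed cause: compareTo() compares URLs case-insensitively while equals() compares them case-sensitively, and hashCode() is not overridden, so sorting, equality, and hash partitioning can disagree for keys differing only in case. MapReduce key types generally need all three to agree. A self-contained sketch of a consistent trio (Key is a hypothetical stand-in, not the real URLInfo):

```java
// A key class whose ordering, equality, and hashing all use the same basis.
public class KeyConsistencySketch {

    static final class Key implements Comparable<Key> {
        final String url;

        Key(String url) { this.url = url; }

        @Override
        public int compareTo(Key o) {
            return url.compareTo(o.url);   // same basis for ordering...
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && url.equals(((Key) o).url); // ...equality...
        }

        @Override
        public int hashCode() {
            return url.hashCode();         // ...and hashing.
        }
    }

    public static void main(String[] args) {
        Key a = new Key("http://example.org");
        Key b = new Key("http://example.org");
        // All three views agree for equal keys:
        System.out.println(a.compareTo(b) == 0
                && a.equals(b)
                && a.hashCode() == b.hashCode()); // prints true
    }
}
```

If the case-insensitive ordering is intentional, the safer route is to normalize the URL (e.g. lower-case it) once when the object is created, so that compareTo, equals, and hashCode all see the same value.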
Re: Some tasks fail to report status between the end of the map and
the beginning of the merge
Posted by Mathias De Maré <ma...@gmail.com>.
Thank you, that's very useful.
In addition, I changed the way the tasks work, so they store their data in
HBase now (since it's more suited for handling small files).
I'm not 100% sure yet if the problems have been resolved (still doing
extensive testing), but I think I might have gotten rid of them (and I'll
add the 'skipping records' option in case I do get a failure).
Mathias
On Mon, Aug 10, 2009 at 5:46 PM, Koji Noguchi <kn...@yahoo-inc.com> wrote:
> > but I didn't find a config option
> > that allows ignoring tasks that fail.
> >
> If 0.18,
>
> http://hadoop.apache.org/common/docs/r0.18.3/api/org/apache/hadoop/mapred/JobConf.html#setMaxMapTaskFailuresPercent(int)
> (mapred.max.map.failures.percent)
>
> http://hadoop.apache.org/common/docs/r0.18.3/api/org/apache/hadoop/mapred/JobConf.html#setMaxReduceTaskFailuresPercent(int)
> (mapred.max.reduce.failures.percent)
>
>
> If 0.19 or later, you can also try skipping records.
>
>
> Koji
>
Re: Some tasks fail to report status between the end of the map and
the beginning of the merge
Posted by Koji Noguchi <kn...@yahoo-inc.com>.
> but I didn't find a config option
> that allows ignoring tasks that fail.
>
If 0.18,
http://hadoop.apache.org/common/docs/r0.18.3/api/org/apache/hadoop/mapred/JobConf.html#setMaxMapTaskFailuresPercent(int)
(mapred.max.map.failures.percent)
http://hadoop.apache.org/common/docs/r0.18.3/api/org/apache/hadoop/mapred/JobConf.html#setMaxReduceTaskFailuresPercent(int)
(mapred.max.reduce.failures.percent)
If 0.19 or later, you can also try skipping records.
Koji
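The two properties Koji points at can be supplied directly in the job's XML configuration; a sketch using the property names quoted above (the value 5 is an illustrative threshold, not a recommendation):

```xml
<!-- tolerate up to 5% failed map / reduce tasks before failing the whole job -->
<property>
  <name>mapred.max.map.failures.percent</name>
  <value>5</value>
</property>
<property>
  <name>mapred.max.reduce.failures.percent</name>
  <value>5</value>
</property>
```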
On 8/9/09 2:18 AM, "Mathias De Maré" <ma...@gmail.com> wrote:
> I changed the maximum split size to 30000, and now most tasks actually
> succeed.
> However, I still have the failure problem with some tasks (with a job I was
> running yesterday, I got a failure after 1900 tasks).
> The problem is that these very few failures can bring down the entire job,
> as they sometimes seem to just keep failing.
> I looked through the mapred-default.xml, but I didn't find a config option
> that allows ignoring tasks that fail. Is there a way to do this (it seems
> like the only alternative I have, since I can't make the failures stop)?
>
> Mathias
>
> 2009/8/5 Mathias De Maré <ma...@gmail.com>
>
>>
>> On Wed, Aug 5, 2009 at 9:38 AM, Jothi Padmanabhan
>> <jo...@yahoo-inc.com> wrote:
>>> Hi,
>>>
>>> Could you please try setting this parameter
>>> mapred.merge.recordsBeforeProgress to a lower number?
>>> See https://issues.apache.org/jira/browse/HADOOP-4714
>>>
>>> Cheers
>>> Jothi
>>
>>
>> Hm, that bug looks like it's applicable during the merge, but my case is a
>> block right before the merge (but seemingly right after all of the map tasks
>> finish).
>> I tried putting mapred.merge.recordsBeforeProgress to 100, and it didn't
>> make a difference.
>>
>> On Wed, Aug 5, 2009 at 10:32 AM, Amogh Vasekar <am...@yahoo-inc.com> wrote:
>>
>>> 10 mins reminds me of parameter mapred.task.timeout. This is
>>> configurable. Or alternatively you might just do a sysout to let tracker
>>> know of its existence ( not an ideal solution though )
>>>
>>> Thanks,
>>> Amogh
>>
>>
>> Well, the map tasks take around 30 minutes to run. Letting the task idle
>> for a large number of minutes after that is a lot of useless time, imho. I
>> tried with 20 minutes now, but I still get timeouts.
>>
>> I don't know if it's useful, but here are the settings of the map tasks at
>> the moment:
>>
>> <configuration>
>> <property>
>> <name>mapred.job.tracker</name>
>> <value>localhost:9001</value>
>> </property>
>> <property>
>> <name>io.sort.mb</name>
>> <value>3</value>
>> <description>The total amount of buffer memory to use while sorting
>> files, in megabytes. By default, gives each merge stream 1MB, which
>> should minimize seeks.</description>
>> </property>
>> <property>
>> <name>mapred.tasktracker.map.tasks.maximum</name>
>> <value>4</value>
>> <description>The maximum number of map tasks that will be run
>> simultaneously by a task tracker.
>> </description>
>> </property>
>>
>> <property>
>> <name>mapred.tasktracker.reduce.tasks.maximum</name>
>> <value>4</value>
>> <description>The maximum number of reduce tasks that will be run
>> simultaneously by a task tracker.
>> </description>
>> </property>
>>
>> <property>
>> <name>mapred.max.split.size</name>
>> <value>1000000</value>
>> </property>
>>
>> <property>
>> <name>mapred.child.java.opts</name>
>> <value>-Xmx400m</value>
>> </property>
>>
>> <property>
>> <name>mapred.merge.recordsBeforeProgress</name>
>> <value>100</value>
>> </property>
>>
>> <property>
>> <name>mapred.task.timeout</name>
>> <value>1200000</value>
>> </property>
>>
>> </configuration>
>>
>> Ideally, I would want to get rid of the delay that causes the timeouts, yet
>> also increase the split size somewhat (though I think a larger split size
>> would increase the delay even more?).
>> The map tasks take around 8000-11000 records as input, and can produce up
>> to 1 000 000 records as output (in case this is relevant).
>>
>> Mathias
>>
>>
Re: Some tasks fail to report status between the end of the map and
the beginning of the merge
Posted by Mathias De Maré <ma...@gmail.com>.
I changed the maximum split size to 30000, and now most tasks actually
succeed.
However, I still have the failure problem with some tasks (with a job I was
running yesterday, I got a failure after 1900 tasks).
The problem is that these very few failures can bring down the entire job,
as they sometimes seem to just keep failing.
I looked through mapred-default.xml, but I didn't find a config option that
allows ignoring tasks that fail. Is there a way to do this? It seems like
the only alternative I have, since I can't make the failures stop.
Mathias
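
For completeness, one knob that can keep a handful of stubborn map failures
from killing the whole job is mapred.max.map.failures.percent (a sketch
based on my reading of Hadoop 0.20's defaults, not something confirmed in
this thread). It lets the job succeed as long as no more than the given
percentage of map tasks fail after exhausting all of their attempts:

```xml
<property>
<name>mapred.max.map.failures.percent</name>
<!-- Tolerate up to 1% of map tasks failing outright without failing
     the whole job; the 0.20 default is 0 (any task failure kills the job). -->
<value>1</value>
</property>
```

If I read the 0.20 API correctly, the programmatic equivalent is
JobConf.setMaxMapTaskFailuresPercent(int).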
2009/8/5 Mathias De Maré <ma...@gmail.com>
>
> On Wed, Aug 5, 2009 at 9:38 AM, Jothi Padmanabhan <jo...@yahoo-inc.com>wrote:
>> Hi,
>>
>> Could you please try setting this parameter
>> mapred.merge.recordsBeforeProgress to a lower number?
>> See https://issues.apache.org/jira/browse/HADOOP-4714
>>
>> Cheers
>> Jothi
>
>
> Hm, that bug looks like it applies during the merge itself, but in my case
> the task blocks right before the merge (seemingly just after the map
> finishes).
> I tried setting mapred.merge.recordsBeforeProgress to 100, and it didn't
> make a difference.
>
> On Wed, Aug 5, 2009 at 10:32 AM, Amogh Vasekar <am...@yahoo-inc.com>wrote:
>
>> 10 mins reminds me of parameter mapred.task.timeout . This is
>> configurable. Or alternatively you might just do a sysout to let tracker
>> know of its existence ( not an ideal solution though )
>>
>> Thanks,
>> Amogh
>
>
> Well, the map tasks take around 30 minutes to run. Letting a task idle for
> another 10 minutes on top of that is a lot of wasted time, imho. I tried a
> 20-minute timeout now, but I still get timeouts.
>
> I don't know if it's useful, but here are the settings of the map tasks at
> the moment:
>
> <configuration>
> <property>
> <name>mapred.job.tracker</name>
> <value>localhost:9001</value>
> </property>
> <property>
> <name>io.sort.mb</name>
> <value>3</value>
> <description>The total amount of buffer memory to use while sorting
> files, in megabytes. By default, gives each merge stream 1MB, which
> should minimize seeks.</description>
> </property>
> <property>
> <name>mapred.tasktracker.map.tasks.maximum</name>
> <value>4</value>
> <description>The maximum number of map tasks that will be run
> simultaneously by a task tracker.
> </description>
> </property>
>
> <property>
> <name>mapred.tasktracker.reduce.tasks.maximum</name>
> <value>4</value>
> <description>The maximum number of reduce tasks that will be run
> simultaneously by a task tracker.
> </description>
> </property>
>
> <property>
> <name>mapred.max.split.size</name>
> <value>1000000</value>
> </property>
>
> <property>
> <name>mapred.child.java.opts</name>
> <value>-Xmx400m</value>
> </property>
>
> <property>
> <name>mapred.merge.recordsBeforeProgress</name>
> <value>100</value>
> </property>
>
> <property>
> <name>mapred.task.timeout</name>
> <value>1200000</value>
> </property>
>
> </configuration>
>
> Ideally, I would want to get rid of the delay that causes the timeouts, yet
> also increase the split size somewhat (though I think a larger split size
> would increase the delay even more?).
> The map tasks take around 8000-11000 records as input, and can produce up
> to 1,000,000 records as output (in case this is relevant).
>
> Mathias
>
>
Re: Some tasks fail to report status between the end of the map and
the beginning of the merge
Posted by Mathias De Maré <ma...@gmail.com>.
> On Wed, Aug 5, 2009 at 9:38 AM, Jothi Padmanabhan <jo...@yahoo-inc.com>wrote:
> Hi,
>
> Could you please try setting this parameter
> mapred.merge.recordsBeforeProgress to a lower number?
> See https://issues.apache.org/jira/browse/HADOOP-4714
>
> Cheers
> Jothi
Hm, that bug looks like it applies during the merge itself, but in my case
the task blocks right before the merge (seemingly just after the map
finishes).
I tried setting mapred.merge.recordsBeforeProgress to 100, and it didn't
make a difference.
On Wed, Aug 5, 2009 at 10:32 AM, Amogh Vasekar <am...@yahoo-inc.com> wrote:
> 10 mins reminds me of parameter mapred.task.timeout . This is configurable.
> Or alternatively you might just do a sysout to let tracker know of its
> existence ( not an ideal solution though )
>
> Thanks,
> Amogh
Well, the map tasks take around 30 minutes to run. Letting a task idle for
another 10 minutes on top of that is a lot of wasted time, imho. I tried a
20-minute timeout now, but I still get timeouts.
I don't know if it's useful, but here are the settings of the map tasks at
the moment:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>io.sort.mb</name>
<value>3</value>
<description>The total amount of buffer memory to use while sorting
files, in megabytes. By default, gives each merge stream 1MB, which
should minimize seeks.</description>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
<description>The maximum number of map tasks that will be run
simultaneously by a task tracker.
</description>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>4</value>
<description>The maximum number of reduce tasks that will be run
simultaneously by a task tracker.
</description>
</property>
<property>
<name>mapred.max.split.size</name>
<value>1000000</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx400m</value>
</property>
<property>
<name>mapred.merge.recordsBeforeProgress</name>
<value>100</value>
</property>
<property>
<name>mapred.task.timeout</name>
<value>1200000</value>
</property>
</configuration>
Ideally, I would want to get rid of the delay that causes the timeouts, yet
also increase the split size somewhat (though I think a larger split size
would increase the delay even more?).
The map tasks take around 8000-11000 records as input, and can produce up to
1,000,000 records as output (in case this is relevant).
Mathias
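
As a sanity check on those numbers: with io.sort.mb=3 and what I believe are
0.20's defaults (io.sort.record.percent = 0.05, and 16 accounting bytes per
record in MapTask's kvoffsets/kvindices arrays -- my reading of the source,
so treat both as assumptions), the sort buffer only has room for about 9830
record slots, which matches the "length = 9830" lines in the spill logs:

```java
public class SortBufferMath {
    public static void main(String[] args) {
        long bufferBytes = 3L << 20;              // io.sort.mb = 3 (in bytes)
        double recordPercent = 0.05;              // io.sort.record.percent default
        long acctBytes = (long) (bufferBytes * recordPercent);
        long recordSlots = acctBytes / 16;        // 16 accounting bytes per record
        long dataBytes = bufferBytes - acctBytes; // space left for serialized key/values
        System.out.println("record slots: " + recordSlots); // 9830, cf. "length = 9830"
        System.out.println("data bytes:   " + dataBytes);   // 2988442, cf. bufvoid ~2988448
    }
}
```

With up to ~1,000,000 output records per task and only a few thousand
records fitting per spill, a hundred-plus spill files (and the 135-way merge
in the logs) are expected; a larger io.sort.mb should cut both the spill
count and the merge work.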
RE: Some tasks fail to report status between the end of the map and
the beginning of the merge
Posted by Amogh Vasekar <am...@yahoo-inc.com>.
10 mins reminds me of parameter mapred.task.timeout . This is configurable. Or alternatively you might just do a sysout to let tracker know of its existence ( not an ideal solution though )
Thanks,
Amogh
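
That manual-reporting idea can be made a bit more structured than a bare
sysout: a daemon thread that pings progress during any long silent phase. A
sketch only -- the Progressable interface below is a local stand-in for
org.apache.hadoop.util.Progressable (which the task's Reporter implements),
and whether the pings cover the post-close flush depends on when the child
JVM's reporting thread stops, so this is untested against 0.20:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Local stand-in for org.apache.hadoop.util.Progressable; real code would
// pass the task's Reporter here instead.
interface Progressable {
    void progress();
}

public class KeepAlive implements AutoCloseable {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "keep-alive");
            t.setDaemon(true); // never keep the JVM alive on its own
            return t;
        });

    public KeepAlive(Progressable reporter, long periodMillis) {
        // Ping often enough to stay well under mapred.task.timeout.
        timer.scheduleAtFixedRate(reporter::progress, periodMillis,
                                  periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

In a mapper you would construct this around the long-running section
(passing the Reporter, with a period of, say, 60 seconds) and close it when
the work is done.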
-----Original Message-----
From: Mathias De Maré [mailto:mathias.demare@gmail.com]
Sent: Wednesday, August 05, 2009 12:33 PM
To: common-user@hadoop.apache.org
Subject: Some tasks fail to report status between the end of the map and the beginning of the merge
Hi,
I'm having some problems (Hadoop 0.20.0) where map tasks fail to report
status for 10 minutes and eventually get killed. All of the tasks output
around the same amount of data; some take only a few seconds before starting
the 'merge' on the segments, but some seem to fail by simply stopping work
for about 10 minutes.
Several of these failed tasks eventually do succeed, on their 2nd or 3rd
attempt. It's nice to see them succeed eventually, but each of those tasks
crawls a few thousand websites, and it seems like a terrible waste to let
them retry several times, re-downloading all of those websites after 10
minutes of doing nothing.
Worse, eventually one of the tasks fails completely, which then kills the
entire job.
I could probably increase the number of task attempts and simply hope the
tasks will succeed eventually, but that doesn't solve the huge slowdowns and
the recrawling required.
Here's an example of a successful task attempt (this is attempt 3 of a
specific task -- note that around 8 minutes pass between spill 133 and
spill 134):
2009-08-04 18:38:48,059 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 129
2009-08-04 18:39:00,626 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 18:39:00,626 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 2492163; bufend = 628913; bufvoid = 2988446
2009-08-04 18:39:00,626 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 9727; kvend = 7760; length = 9830
2009-08-04 18:39:01,467 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 130
2009-08-04 18:39:08,136 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 18:39:08,136 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 628913; bufend = 1880222; bufvoid = 2988448
2009-08-04 18:39:08,136 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 7760; kvend = 5793; length = 9830
2009-08-04 18:39:08,463 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 131
2009-08-04 18:39:12,456 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 18:39:12,459 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 1880222; bufend = 136018; bufvoid = 2988448
2009-08-04 18:39:12,459 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 5793; kvend = 3826; length = 9830
2009-08-04 18:39:12,697 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 132
2009-08-04 18:39:23,138 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 18:39:23,138 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 136018; bufend = 1347353; bufvoid = 2988448
2009-08-04 18:39:23,138 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 3826; kvend = 1859; length = 9830
2009-08-04 18:39:25,747 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 133
2009-08-04 18:47:49,823 INFO org.apache.hadoop.mapred.MapTask:
Starting flush of map output
2009-08-04 18:47:50,132 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 134
2009-08-04 18:47:50,525 INFO org.apache.hadoop.mapred.Merger: Merging
135 sorted segments
2009-08-04 18:47:50,528 INFO org.apache.hadoop.mapred.Merger: Merging
9 intermediate segments out of a total of 135
2009-08-04 18:47:52,224 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 127
2009-08-04 18:47:53,837 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 118
2009-08-04 18:47:55,417 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 109
2009-08-04 18:47:56,990 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 100
2009-08-04 18:47:58,492 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 91
2009-08-04 18:48:00,191 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 82
2009-08-04 18:48:02,315 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 73
2009-08-04 18:48:04,184 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 64
2009-08-04 18:48:06,162 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 55
2009-08-04 18:48:08,149 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 46
2009-08-04 18:48:09,888 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 37
2009-08-04 18:48:11,744 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 28
2009-08-04 18:48:13,544 INFO org.apache.hadoop.mapred.Merger: Merging
10 intermediate segments out of a total of 19
2009-08-04 18:48:21,177 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 10 segments left of total size: 153426836
bytes
2009-08-04 18:48:46,226 INFO org.apache.hadoop.mapred.TaskRunner:
Task:attempt_200908041626_0001_m_000133_2 is done. And is in the
process of commiting
2009-08-04 18:48:46,373 INFO org.apache.hadoop.mapred.TaskRunner: Task
'attempt_200908041626_0001_m_000133_2' done.
And here is the same task in an earlier attempt (it only goes up to spill
132, though spill sizes do vary slightly, right?):
2009-08-04 17:15:49,112 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 321905; bufend = 1464691; bufvoid = 2988448
2009-08-04 17:15:49,112 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 7764; kvend = 5797; length = 9830
2009-08-04 17:15:49,910 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 126
2009-08-04 17:16:00,785 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 17:16:00,785 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 1464691; bufend = 2580856; bufvoid = 2988448
2009-08-04 17:16:00,785 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 5797; kvend = 3830; length = 9830
2009-08-04 17:16:05,728 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 127
2009-08-04 17:16:20,883 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 17:16:20,883 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 2580856; bufend = 1000978; bufvoid = 2988448
2009-08-04 17:16:20,883 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 3830; kvend = 1863; length = 9830
2009-08-04 17:16:21,709 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 128
2009-08-04 17:16:24,461 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 17:16:24,461 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 1000978; bufend = 2246166; bufvoid = 2988448
2009-08-04 17:16:24,461 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 1863; kvend = 9727; length = 9830
2009-08-04 17:16:24,934 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 129
2009-08-04 17:16:40,853 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 17:16:40,853 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 2246166; bufend = 374750; bufvoid = 2988435
2009-08-04 17:16:40,853 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 9727; kvend = 7760; length = 9830
2009-08-04 17:16:43,740 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 130
2009-08-04 17:16:49,141 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 17:16:49,141 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 374750; bufend = 1653302; bufvoid = 2988448
2009-08-04 17:16:49,141 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 7760; kvend = 5793; length = 9830
2009-08-04 17:16:49,868 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 131
2009-08-04 17:17:04,902 INFO org.apache.hadoop.mapred.MapTask:
Spilling map output: record full = true
2009-08-04 17:17:04,902 INFO org.apache.hadoop.mapred.MapTask:
bufstart = 1653302; bufend = 2882491; bufvoid = 2988448
2009-08-04 17:17:04,902 INFO org.apache.hadoop.mapred.MapTask: kvstart
= 5793; kvend = 3826; length = 9830
2009-08-04 17:17:05,355 INFO org.apache.hadoop.mapred.MapTask:
Finished spill 132
There's no more output after that last spill, since the task eventually got
killed after 10 minutes of nothing.
And in case it helps, here's the TaskTracker's thread dump from when one of
those failed tasks was killed:
2009-08-04 18:53:23,460 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200908041626_0001_m_000135_3: Task
attempt_200908041626_0001_m_000135_3 failed to report status for 602
seconds. Killing!
2009-08-04 18:53:23,470 INFO org.apache.hadoop.mapred.TaskTracker: Process
Thread Dump: lost task
31 active threads
Thread 1673 (process reaper):
State: RUNNABLE
Blocked count: 0
Waited count: 0
Stack:
java.lang.UNIXProcess.waitForProcessExit(Native Method)
java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
Thread 1672 (JVM Runner jvm_200908041626_0001_m_1938204689 spawned.):
State: WAITING
Blocked count: 1
Waited count: 2
Waiting on java.lang.UNIXProcess@155d434
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
org.apache.hadoop.util.Shell.run(Shell.java:134)
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
Thread 1671 (Thread-1040):
State: WAITING
Blocked count: 0
Waited count: 1
Waiting on java.lang.Object@1ba4552
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
Thread 1646 (process reaper):
State: RUNNABLE
Blocked count: 0
Waited count: 0
Stack:
java.lang.UNIXProcess.waitForProcessExit(Native Method)
java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
Thread 1645 (JVM Runner jvm_200908041626_0001_m_2100332869 spawned.):
State: WAITING
Blocked count: 1
Waited count: 2
Waiting on java.lang.UNIXProcess@5e34ce
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
org.apache.hadoop.util.Shell.run(Shell.java:134)
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
Thread 1644 (Thread-1023):
State: WAITING
Blocked count: 0
Waited count: 1
Waiting on java.lang.Object@1bee6f
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
Thread 143 (process reaper):
State: RUNNABLE
Blocked count: 0
Waited count: 0
Stack:
java.lang.UNIXProcess.waitForProcessExit(Native Method)
java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
Thread 141 (JVM Runner jvm_200908041626_0001_r_-836397802 spawned.):
State: WAITING
Blocked count: 1
Waited count: 2
Waiting on java.lang.UNIXProcess@11fceed
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
org.apache.hadoop.util.Shell.run(Shell.java:134)
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
Thread 140 (Thread-89):
State: WAITING
Blocked count: 0
Waited count: 1
Waiting on java.lang.Object@1c63791
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
Thread 33 (IPC Client (47) connection to localhost/127.0.0.1:9001 from
root):
State: TIMED_WAITING
Blocked count: 5563
Waited count: 5552
Stack:
java.lang.Object.wait(Native Method)
org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:403)
org.apache.hadoop.ipc.Client$Connection.run(Client.java:445)
Thread 32 (Directory/File cleanup thread):
State: WAITING
Blocked count: 0
Waited count: 165
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@fdffb1
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.mapred.CleanupQueue$PathCleanupThread.run(CleanupQueue.java:89)
Thread 8 (taskCleanup):
State: WAITING
Blocked count: 0
Waited count: 2
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@19c205b
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:352)
java.lang.Thread.run(Thread.java:619)
Thread 31 (TaskLauncher for task):
State: WAITING
Blocked count: 4
Waited count: 5
Waiting on java.util.LinkedList@1d9d565
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1662)
Thread 30 (TaskLauncher for task):
State: WAITING
Blocked count: 783
Waited count: 781
Waiting on java.util.LinkedList@1fbebee
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1662)
Thread 29 (Map-events fetcher for all reduce tasks on
tracker_ip-10-244-145-237.ec2.internal:localhost/127.0.0.1:40135):
State: TIMED_WAITING
Blocked count: 10965
Waited count: 10957
Stack:
java.lang.Object.wait(Native Method)
org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:671)
Thread 27 (IPC Server handler 7 on 40135):
State: WAITING
Blocked count: 5
Waited count: 2565
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da690
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
Thread 26 (IPC Server handler 6 on 40135):
State: WAITING
Blocked count: 11
Waited count: 2565
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da690
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
Thread 25 (IPC Server handler 5 on 40135):
State: WAITING
Blocked count: 11
Waited count: 2566
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da690
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
Thread 24 (IPC Server handler 4 on 40135):
State: WAITING
Blocked count: 7
Waited count: 2565
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da690
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
Thread 23 (IPC Server handler 3 on 40135):
State: WAITING
Blocked count: 10
Waited count: 2565
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da690
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
Thread 22 (IPC Server handler 2 on 40135):
State: WAITING
Blocked count: 11
Waited count: 2566
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da690
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
Thread 21 (IPC Server handler 1 on 40135):
State: WAITING
Blocked count: 15
Waited count: 2567
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da690
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
Thread 20 (IPC Server handler 0 on 40135):
State: WAITING
Blocked count: 12
Waited count: 2565
Waiting on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da690
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
Thread 17 (IPC Server listener on 40135):
State: RUNNABLE
Blocked count: 0
Waited count: 0
Stack:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
org.apache.hadoop.ipc.Server$Listener.run(Server.java:318)
Thread 19 (IPC Server Responder):
State: RUNNABLE
Blocked count: 0
Waited count: 0
Stack:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
org.apache.hadoop.ipc.Server$Responder.run(Server.java:478)
Thread 16 (Timer-0):
State: TIMED_WAITING
Blocked count: 1
Waited count: 298
Stack:
java.lang.Object.wait(Native Method)
java.util.TimerThread.mainLoop(Timer.java:509)
java.util.TimerThread.run(Timer.java:462)
Thread 15 (11502424@qtp0-0 - Acceptor0 SelectChannelConnector@0.0.0.0:50060
):
State: RUNNABLE
Blocked count: 50
Waited count: 1
Stack:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:429)
org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185)
org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
Thread 4 (Signal Dispatcher):
State: RUNNABLE
Blocked count: 0
Waited count: 0
Stack:
Thread 3 (Finalizer):
State: WAITING
Blocked count: 424
Waited count: 425
Waiting on java.lang.ref.ReferenceQueue$Lock@7b6d1c
Stack:
java.lang.Object.wait(Native Method)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
Thread 2 (Reference Handler):
State: WAITING
Blocked count: 445
Waited count: 446
Waiting on java.lang.ref.Reference$Lock@16ee240
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
Thread 1 (main):
State: RUNNABLE
Blocked count: 2973
Waited count: 5789
Stack:
sun.management.ThreadImpl.getThreadInfo0(Native Method)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:147)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:123)
org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
org.apache.hadoop.util.ReflectionUtils.logThreadInfo(ReflectionUtils.java:203)
org.apache.hadoop.mapred.TaskTracker.markUnresponsiveTasks(TaskTracker.java:1383)
org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1129)
org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1779)
org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2881)
2009-08-04 18:53:23,471 INFO org.apache.hadoop.mapred.TaskTracker: About to
purge task: attempt_200908041626_0001_m_000135_3
2009-08-04 18:53:23,471 INFO org.apache.hadoop.mapred.TaskTracker:
addFreeSlot : current free slots : 3
2009-08-04 18:53:23,471 INFO org.apache.hadoop.mapred.TaskRunner:
attempt_200908041626_0001_m_000135_3 done; removing files.
2009-08-04 18:53:23,550 INFO org.apache.hadoop.mapred.IndexCache: Map ID
attempt_200908041626_0001_m_000135_3 not found in cache
Thanks in advance for any response!
I'm afraid I'm somewhat clueless about this.
Mathias