Posted to user@nutch.apache.org by AJ Chen <aj...@web2express.org> on 2010/08/19 19:30:49 UTC

indexing errors

I'm indexing 5M pages on a small/cheap cluster, and there are some fatal errors
I'm trying to understand.

1. A "no space left on device" error occurs on a slave node even though there is
still 30% free space (>20GB) in the HDFS partition. Is it possible that the disk
requirement surges during Nutch indexing? (A sketch for checking the local
shuffle directories follows the stack trace below.)

2010-08-19 02:34:23,546 INFO  mapred.ReduceTask -
attempt_201008141418_0034_r_000004_2 Scheduled 1 outputs (1 slow hosts and 0
dup hosts)
2010-08-19 02:34:24,191 ERROR mapred.ReduceTask - Task:
attempt_201008141418_0034_r_000004_2 - FSError:
org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
        at
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:192)
        at
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:104)
        at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleToDisk(ReduceTask.java:1620)
        at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1416)
        at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
        at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
Caused by: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:190)
        ... 8 more
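
For what it's worth, the stack trace points at RawLocalFileSystem and
MapOutputCopier.shuffleToDisk(), so the write that failed went to the reducer's
local shuffle directories (mapred.local.dir) rather than to the HDFS partition,
which would explain HDFS showing 30% free while the task still runs out of
space. A rough sketch for checking the usable space on those directories,
assuming the 0.20-era property name mapred.local.dir and that mapred-site.xml
is on the classpath (the fallback path is made up for illustration):

import java.io.File;
import org.apache.hadoop.conf.Configuration;

// Rough sketch: print the usable space on each local directory used by the
// shuffle (mapred.local.dir), which is where shuffleToDisk() writes.
// The property name and the fallback path are assumptions for illustration.
public class LocalDirSpace {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.addResource("mapred-site.xml"); // cluster settings, if on the classpath
    String[] dirs = conf.getStrings("mapred.local.dir", "/tmp/hadoop/mapred/local");
    for (String d : dirs) {
      File f = new File(d);
      System.out.printf("%s: %.1f GB usable%n", d, f.getUsableSpace() / 1e9);
    }
  }
}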

2. Task failure errors. Any idea what might cause them? (A reducer sketch
follows the thread dump below.)

Task attempt_201008141418_0034_r_000005_0 failed to report status for 600
seconds. Killing!
Task attempt_201008141418_0034_r_000001_0 failed to report status for 600
seconds. Killing!
Task attempt_201008141418_0034_r_000002_0 failed to report status for 601
seconds. Killing!

2010-08-19 00:49:28,309 INFO  mapred.TaskTracker - Process Thread Dump: lost
task
28 active threads
Thread 12334 (IPC Client (47) connection to
vmo-crawl08-dev/10.1.1.60:9001 from jboss):
  State: TIMED_WAITING
  Blocked count: 2498
  Waited count: 2498
  Stack:
    java.lang.Object.wait(Native Method)
    org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:403)
    org.apache.hadoop.ipc.Client$Connection.run(Client.java:445)
Thread 11256 (process reaper):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.lang.UNIXProcess.waitForProcessExit(Native Method)
    java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
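
On the timeouts: "failed to report status for 600 seconds" is the TaskTracker
killing attempts that sent no progress update within mapred.task.timeout
(600000 ms by default). The usual remedy in the old mapred API that Nutch used
at the time is to call reporter.progress() during slow per-record work so the
attempt keeps heartbeating; the class and the per-record work below are
illustrative, not Nutch's actual indexing reducer:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Illustrative reducer that reports progress while doing slow work, so the
// TaskTracker does not kill it after mapred.task.timeout expires.
public class SlowWorkReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    while (values.hasNext()) {
      Text value = values.next();
      // ... slow per-record work (e.g. building index documents) ...
      output.collect(key, value);
      reporter.progress(); // heartbeat: tells the TaskTracker the attempt is alive
    }
  }
}

The timeout itself can also be raised in the job configuration, e.g.
conf.setLong("mapred.task.timeout", 1800000), but attempts that go silent for
ten minutes often point at swapping or very slow disk on the node rather than
a limit that is simply too low.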


thanks,
aj

Re: indexing errors

Posted by AJ Chen <aj...@web2express.org>.
I found that more disk space is required during indexing. So, for a slave node
with limited space, building a smaller index, e.g. 2M pages instead of 10M
pages, avoids the disk space error.

A related question: after crawling/indexing for some time, each slave node
accumulates lots of files (under hdfs/data/current and hdfs/mapreduce).
What's the correct way to recover the occupied disk space? I assume some of
these files are needed for communicating with the master node.
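
For the directory question: hdfs/data/current looks like the DataNode's block
storage and hdfs/mapreduce like the task-local working area, so space under the
data dir is normally reclaimed only by deleting data through HDFS itself, for
example removing old Nutch segments once they have been indexed and merged,
while the mapreduce dir is usually cleaned up by the framework between jobs. A
rough sketch for seeing how much space each segment takes before removing it
with fs.delete(path, true); the crawl/segments path is an assumption about the
crawl layout:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Rough sketch: report the HDFS footprint of each crawl segment so that
// segments no longer needed can be removed explicitly. The default path
// "crawl/segments" is an assumption; pass the real segments dir as an argument.
public class SegmentSizes {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path segments = new Path(args.length > 0 ? args[0] : "crawl/segments");
    for (FileStatus status : fs.listStatus(segments)) {
      ContentSummary summary = fs.getContentSummary(status.getPath());
      System.out.printf("%s\t%.1f GB%n", status.getPath(), summary.getLength() / 1e9);
    }
  }
}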

thanks,
-aj



-- 
AJ Chen, PhD
Chair, Semantic Web SIG, sdforum.org
http://web2express.org
twitter @web2express
Palo Alto, CA, USA