Posted to dev@hama.apache.org by Yuesheng Hu <yu...@gmail.com> on 2012/10/16 08:59:20 UTC

Filesystem closed exception

Hi, Thomas

     When I test K-Means with the cache enabled, a "Filesystem closed"
exception is raised once the input size reaches about 6 GB. Our cluster has
10 nodes (1 master, 9 slaves), 5 tasks per node, and 1000 MB of RAM per
task, which I think is powerful enough to handle this input size.
     But the job failed. The log is:
12/10/11 10:05:17 INFO bsp.FileInputFormat: Total input paths to process : 45
12/10/11 10:05:18 INFO bsp.BSPJobClient: Running job: job_201210111001_0003
12/10/11 10:05:21 INFO bsp.BSPJobClient: Current supersteps number: 0
12/10/11 12:01:47 INFO bsp.BSPJobClient: Current supersteps number: 1
12/10/11 13:48:33 INFO bsp.BSPJobClient: Current supersteps number: 2
12/10/11 15:26:48 INFO bsp.BSPJobClient: Current supersteps number: 3
12/10/11 17:05:12 INFO bsp.BSPJobClient: Current supersteps number: 4
12/10/11 18:45:12 INFO bsp.BSPJobClient: Current supersteps number: 5
attempt_201210111001_0003_000004_0: 12/10/11 10:06:00 INFO bsp.BSPPeerImpl: Moving to local cache files: INITIALLY IT WAS: null
attempt_201210111001_0003_000004_0: 12/10/11 10:06:00 INFO sync.ZKSyncClient: Initializing ZK Sync Client
attempt_201210111001_0003_000004_0: 12/10/11 10:06:00 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At datanode09/192.168.1.219:61001
attempt_201210111001_0003_000004_0: 12/10/11 10:06:00 ERROR sync.ZooKeeperSyncClientImpl: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /bsp/job_201210111001_0003/peers
attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO ipc.Server: Starting SocketReader
attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO ipc.Server: IPC Server Responder: starting
attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO message.HadoopMessageManagerImpl: BSPPeer address:datanode09 port:61001
attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO ipc.Server: IPC Server listener on 61001: starting
attempt_201210111001_0003_000004_0: 12/10/11 10:06:01 INFO ipc.Server: IPC Server handler 0 on 61001: starting
attempt_201210111001_0003_000004_0: 12/10/11 18:45:47 INFO ml.KMeansBSP: Finished! Writing the assignments...
attempt_201210111001_0003_000004_0: 12/10/11 18:46:29 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
attempt_201210111001_0003_000004_0: java.io.IOException: Filesystem closed
attempt_201210111001_0003_000004_0: at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
attempt_201210111001_0003_000004_0: at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
attempt_201210111001_0003_000004_0: at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2213)
attempt_201210111001_0003_000004_0: at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2152)
attempt_201210111001_0003_000004_0: at java.io.DataInputStream.readInt(DataInputStream.java:370)
attempt_201210111001_0003_000004_0: at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1953)
attempt_201210111001_0003_000004_0: at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1983)
attempt_201210111001_0003_000004_0: at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2120)
attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.SequenceFileRecordReader.next(SequenceFileRecordReader.java:85)
attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.TrackedRecordReader.moveToNext(TrackedRecordReader.java:63)
attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.TrackedRecordReader.next(TrackedRecordReader.java:49)
attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.BSPPeerImpl.readNext(BSPPeerImpl.java:630)
attempt_201210111001_0003_000004_0: at org.apache.hama.ml.KMeansBSP.recalculateAssignmentsAndWrite(KMeansBSP.java:269)
attempt_201210111001_0003_000004_0: at org.apache.hama.ml.KMeansBSP.bsp(KMeansBSP.java:142)
attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:166)
attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.BSPTask.run(BSPTask.java:143)
attempt_201210111001_0003_000004_0: at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1271)
12/10/11 18:45:54 INFO bsp.BSPJobClient: Job failed.

What happened?
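
For reference, the sizing figures quoted in the message can be worked
through as a rough sketch. The function name cluster_budget is hypothetical,
"about 6GB" is taken as 6 * 1024 MB, and the input is assumed to be split
evenly across tasks; none of this comes from Hama itself.

```python
# Figures quoted in the message; the input size is only approximate there.
SLAVE_NODES = 9
TASKS_PER_NODE = 5
RAM_PER_TASK_MB = 1000
INPUT_SIZE_MB = 6 * 1024  # assumption: "about 6GB" read as 6 * 1024 MB


def cluster_budget(slaves, tasks_per_node, ram_per_task_mb, input_mb):
    """Return (task slots, total task RAM in MB, input share per task in MB)."""
    slots = slaves * tasks_per_node
    total_ram_mb = slots * ram_per_task_mb
    # Assumption: the input is divided evenly across all task slots.
    share_mb = input_mb / slots
    return slots, total_ram_mb, share_mb


slots, total_ram_mb, share_mb = cluster_budget(
    SLAVE_NODES, TASKS_PER_NODE, RAM_PER_TASK_MB, INPUT_SIZE_MB
)
print(f"{slots} task slots, {total_ram_mb} MB of task RAM, "
      f"{share_mb:.1f} MB of input per task")
```

The 45 task slots coincide with the "Total input paths to process : 45" line
in the log. Note that this estimate covers task memory only. It does not
account for the in-memory size of cached records or for the heap of the HDFS
daemons.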

Re: Filesystem closed exception

Posted by Thomas Jungblut <th...@gmail.com>.
Your DataNode is overloaded. Try to profile it, and check the heap size of
your NameNode and your DataNodes.
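
As a possible first step before profiling, the superstep timestamps already
present in the posted log show how long the job spent at each superstep
count. The sketch below is only an illustration: superstep_durations is a
hypothetical helper, and the timestamps record when the job client logged the
counter, so the results are approximations.

```python
from datetime import datetime, timedelta

# "Current supersteps number" lines copied from the log in the original message.
LOG_LINES = [
    "12/10/11 10:05:21 INFO bsp.BSPJobClient: Current supersteps number: 0",
    "12/10/11 12:01:47 INFO bsp.BSPJobClient: Current supersteps number: 1",
    "12/10/11 13:48:33 INFO bsp.BSPJobClient: Current supersteps number: 2",
    "12/10/11 15:26:48 INFO bsp.BSPJobClient: Current supersteps number: 3",
    "12/10/11 17:05:12 INFO bsp.BSPJobClient: Current supersteps number: 4",
    "12/10/11 18:45:12 INFO bsp.BSPJobClient: Current supersteps number: 5",
]


def superstep_durations(lines):
    """Return the elapsed time between consecutive superstep log lines."""
    # The first 17 characters hold the timestamp in yy/MM/dd HH:mm:ss form.
    stamps = [datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S") for line in lines]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


durations = superstep_durations(LOG_LINES)
for count, elapsed in enumerate(durations):
    print(f"counter at {count} for {elapsed}")

total = sum(durations, timedelta())
print(f"total: {total}")
```

Each interval comes out at well over an hour and a half, and the job ran for
more than eight and a half hours before the failure was logged.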
