Posted to mapreduce-user@hadoop.apache.org by Evert Lammerts <Ev...@sara.nl> on 2011/03/09 12:27:36 UTC

Could not obtain block

We see a lot of IOExceptions coming from HDFS during a job that does nothing but untar 100 files (1 per Mapper; sizes vary between 5GB and 80GB) from HDFS to HDFS. DataNodes are also showing Exceptions that I think are related. (See stacktraces below.)

This job should not be able to overload the system I think... I realize that much data needs to go over the lines, but HDFS should still be responsive. Any ideas / help is much appreciated!

Some details:
* Hadoop 0.20.2 (CDH3b4)
* 5 node cluster plus 1 node for JT/NN (Sun Thumpers)
* 4 cores/node, 4GB RAM/core
* CentOS 5.5

Job output:

java.io.IOException: java.io.IOException: Could not obtain block: blk_-3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
	at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:449)
	at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:1)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:390)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
	at org.apache.hadoop.mapred.Child.main(Child.java:234)
Caused by: java.io.IOException: Could not obtain block: blk_-3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1977)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1784)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1932)
	at java.io.DataInputStream.read(DataInputStream.java:83)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:74)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:335)
	at ilps.DownloadICWSM$CopyThread.run(DownloadICWSM.java:149)


Example DataNode Exceptions (note that these come from the node at 192.168.28.211):

2011-03-08 19:40:40,297 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9222067946733189014_3798233 java.io.EOFException: while trying to read 3067064 bytes
2011-03-08 19:40:41,018 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.211:50050, dest: /192.168.28.211:49748, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201103071120_0030_m_000032_0, offset: 3072, srvID: DS-568746059-145.100.2.180-50050-1291128670510, blockid: blk_3596618013242149887_4060598, duration: 2632000
2011-03-08 19:40:41,049 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221028436071074510_2325937 java.io.EOFException: while trying to read 2206400 bytes
2011-03-08 19:40:41,348 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221549395563181322_4024529 java.io.EOFException: while trying to read 3037288 bytes
2011-03-08 19:40:41,357 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221885906633018147_3895876 java.io.EOFException: while trying to read 1981952 bytes
2011-03-08 19:40:41,434 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9221885906633018147_3895876 unfinalized and removed. 
2011-03-08 19:40:41,434 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9221885906633018147_3895876 received exception java.io.EOFException: while trying to read 1981952 bytes
2011-03-08 19:40:41,434 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.28.211:50050, storageID=DS-568746059-145.100.2.180-50050-1291128670510, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 1981952 bytes
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
2011-03-08 19:40:41,465 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9221549395563181322_4024529 unfinalized and removed. 
2011-03-08 19:40:41,466 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9221549395563181322_4024529 received exception java.io.EOFException: while trying to read 3037288 bytes
2011-03-08 19:40:41,466 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.28.211:50050, storageID=DS-568746059-145.100.2.180-50050-1291128670510, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 3037288 bytes
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)

Cheers,

Evert Lammerts
Consultant eScience & Cloud Services
SARA Computing & Network Services 
Operations, Support & Development

Phone: +31 20 888 4101
Email: evert.lammerts@sara.nl
http://www.sara.nl



Re: Could not obtain block

Posted by elton sky <el...@gmail.com>.
>Caused by: java.io.IOException: Could not obtain block: blk_-3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
>       at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1977)



A question for you:
Does the exception always complain about "blk_-3695352030358969086_130839"?
If so, you can try to find this block in your DataNodes' local dirs, just to
make sure it does exist...
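
For example, something like this on each DataNode (assuming dfs.data.dir points
at /data/dfs/dn; adjust the path to whatever your hdfs-site.xml says):

  find /data/dfs/dn -name 'blk_-3695352030358969086*'

If the block file (and its .meta companion) turns up on at least one DataNode,
the data itself is there and the problem is more likely load or connectivity
than a missing block.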


On Thu, Mar 10, 2011 at 3:09 AM, Evert Lammerts <Ev...@sara.nl> wrote:

> I didn't mention it but the complete filesystem is reported healthy by
> fsck. I'm guessing that the java.io.EOFException indicates a problem caused
> by the load of the job.
>
> Any ideas?
>
> ________________________________________
> From: Marcos Ortiz [mlortiz@uci.cu]
> Sent: Wednesday, March 09, 2011 4:31 PM
> To: mapreduce-user@hadoop.apache.org
> Cc: Evert Lammerts; 'hdfs-user@hadoop.apache.org'; cdh-user@cloudera.org
> Subject: Re: Could not obtain block
>
> On 3/9/2011 6:27 AM, Evert Lammerts wrote:
> > We see a lot of IOExceptions coming from HDFS during a job that does
> nothing but untar 100 files (1 per Mapper, sizes vary between 5GB and 80GB)
> that are in HDFS, to HDFS. DataNodes are also showing Exceptions that I
> think are related. (See stacktraces below.)
> >
> > This job should not be able to overload the system I think... I realize
> that much data needs to go over the lines, but HDFS should still be
> responsive. Any ideas / help is much appreciated!
> >
> > Some details:
> > * Hadoop 0.20.2 (CDH3b4)
> > * 5 node cluster plus 1 node for JT/NN (Sun Thumpers)
> > * 4 cores/node, 4GB RAM/core
> > * CentOS 5.5
> >
> > Job output:
> >
> > java.io.IOException: java.io.IOException: Could not obtain block:
> blk_-3695352030358969086_130839
> file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
> >
> Which is the ouput of:
>   bin/hadoop dfsadmin -report
>
> Which is the output of:
>   bin/hadoop fsck /user/emeij/icwsm-data-test/
> >       at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:449)
> >       at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:1)
> >       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:390)
> >       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
> >       at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
> >       at java.security.AccessController.doPrivileged(Native Method)
> >       at javax.security.auth.Subject.doAs(Subject.java:396)
> >       at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >       at org.apache.hadoop.mapred.Child.main(Child.java:234)
> > Caused by: java.io.IOException: Could not obtain block:
> blk_-3695352030358969086_130839
> file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
> >
> Which is the ouput of:
>  bin/hadoop fsck /user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
> --files -blocks -racks
> >       at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1977)
> >       at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1784)
> >       at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1932)
> >       at java.io.DataInputStream.read(DataInputStream.java:83)
> >       at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
> >       at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:74)
> >       at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:335)
> >       at ilps.DownloadICWSM$CopyThread.run(DownloadICWSM.java:149)
> >
> >
> > Example DataNode Exceptions (not that these come from the node at
> 192.168.28.211):
> >
> > 2011-03-08 19:40:40,297 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock
> for block blk_-9222067946733189014_3798233 java.io.EOFException: while
> trying to read 3067064 bytes
> > 2011-03-08 19:40:41,018 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 192.168.28.211:50050, dest: /192.168.28.211:49748, bytes: 0, op:
> HDFS_READ, cliID: DFSClient_attempt_201103071120_0030_m_000032_0, offset: 30
> > 72, srvID: DS-568746059-145.100.2.180-50050-1291128670510, blockid:
> blk_3596618013242149887_4060598, duration: 2632000
> > 2011-03-08 19:40:41,049 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock
> for block blk_-9221028436071074510_2325937 java.io.EOFException: while
> trying to read 2206400 bytes
> > 2011-03-08 19:40:41,348 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock
> for block blk_-9221549395563181322_4024529 java.io.EOFException: while
> trying to read 3037288 bytes
> > 2011-03-08 19:40:41,357 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock
> for block blk_-9221885906633018147_3895876 java.io.EOFException: while
> trying to read 1981952 bytes
> > 2011-03-08 19:40:41,434 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Block
> blk_-9221885906633018147_3895876 unfinalized and removed.
> > 2011-03-08 19:40:41,434 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
> blk_-9221885906633018147_3895876 received exception java.io.EOFException:
> while trying to read 1981952 bytes
> > 2011-03-08 19:40:41,434 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 192.168.28.211:50050,
> storageID=DS-568746059-145.100.2.180-50050-1291128670510, infoPort=50075,
> ipcPort=50020):DataXceiver
> > java.io.EOFException: while trying to read 1981952 bytes
> >          at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270)
> >          at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
> >          at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
> >          at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
> >          at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
> >          at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
> > 2011-03-08 19:40:41,465 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Block
> blk_-9221549395563181322_4024529 unfinalized and removed.
> > 2011-03-08 19:40:41,466 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
> blk_-9221549395563181322_4024529 received exception java.io.EOFException:
> while trying to read 3037288 bytes
> > 2011-03-08 19:40:41,466 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 192.168.28.211:50050,
> storageID=DS-568746059-145.100.2.180-50050-1291128670510, infoPort=50075,
> ipcPort=50020):DataXceiver
> > java.io.EOFException: while trying to read 3037288 bytes
> >          at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270)
> >          at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
> >          at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
> >          at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
> >          at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
> >          at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
> >
> > Cheers,
> >
> > Evert Lammerts
> > Consultant eScience & Cloud Services
> > SARA Computing & Network Services
> > Operations, Support & Development
> >
> > Phone: +31 20 888 4101
> > Email: evert.lammerts@sara.nl
> > http://www.sara.nl
> >
> >
> >
>
> Then on the DataNode where you have the particular block
> (blk_-3695352030358969086_130839 )
> you can visit the web interface
> http://192.168.28.211:50075/blockScannerReport to see what's happening
> on the node
>
> Regards
>
> --
> Marcos Luís Ortíz Valmaseda
>  Software Engineer
>  Universidad de las Ciencias Informáticas
>  Linux User # 418229
>
> http://uncubanitolinuxero.blogspot.com
> http://www.linkedin.com/in/marcosluis2186
>
>

Re: Could not obtain block

Posted by Marcos Ortiz <ml...@uci.cu>.
On 3/9/2011 11:09 AM, Evert Lammerts wrote:
> I didn't mention it but the complete filesystem is reported healthy by fsck. I'm guessing that the java.io.EOFException indicates a problem caused by the load of the job.
>
> Any ideas?
>
>    
It's very tricky to debug a MapReduce job execution, but I'll try.

 > java.io.EOFException: while trying to read 1981952 bytes
 >          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270)
 >          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
 >          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
 >          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
 >          at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
 >          at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
 > 2011-03-08 19:40:41,465 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9221549395563181322_4024529 unfinalized and removed.

1- Did you check this?
2- What are the file permissions on /user/emeij/icwsm-data-test/ ?
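
You can check the permissions with something like:

  bin/hadoop fs -ls /user/emeij/icwsm-data-test/

run as the same user that submits the job, so any permission problem shows up
the same way the mappers would see it.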

If the fsck command reports that all is fine, then I really don't know more.

Regards

-- 
Marcos Luís Ortíz Valmaseda
  Software Engineer
  Universidad de las Ciencias Informáticas
  Linux User # 418229

http://uncubanitolinuxero.blogspot.com
http://www.linkedin.com/in/marcosluis2186


RE: Could not obtain block

Posted by Evert Lammerts <Ev...@sara.nl>.
I didn't mention it but the complete filesystem is reported healthy by fsck. I'm guessing that the java.io.EOFException indicates a problem caused by the load of the job.

Any ideas?

________________________________________
From: Marcos Ortiz [mlortiz@uci.cu]
Sent: Wednesday, March 09, 2011 4:31 PM
To: mapreduce-user@hadoop.apache.org
Cc: Evert Lammerts; 'hdfs-user@hadoop.apache.org'; cdh-user@cloudera.org
Subject: Re: Could not obtain block

On 3/9/2011 6:27 AM, Evert Lammerts wrote:
> We see a lot of IOExceptions coming from HDFS during a job that does nothing but untar 100 files (1 per Mapper, sizes vary between 5GB and 80GB) that are in HDFS, to HDFS. DataNodes are also showing Exceptions that I think are related. (See stacktraces below.)
>
> This job should not be able to overload the system I think... I realize that much data needs to go over the lines, but HDFS should still be responsive. Any ideas / help is much appreciated!
>
> Some details:
> * Hadoop 0.20.2 (CDH3b4)
> * 5 node cluster plus 1 node for JT/NN (Sun Thumpers)
> * 4 cores/node, 4GB RAM/core
> * CentOS 5.5
>
> Job output:
>
> java.io.IOException: java.io.IOException: Could not obtain block: blk_-3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
>
Which is the ouput of:
   bin/hadoop dfsadmin -report

Which is the output of:
   bin/hadoop fsck /user/emeij/icwsm-data-test/
>       at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:449)
>       at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:1)
>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:390)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>       at org.apache.hadoop.mapred.Child.main(Child.java:234)
> Caused by: java.io.IOException: Could not obtain block: blk_-3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
>
Which is the ouput of:
  bin/hadoop fsck /user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
--files -blocks -racks
>       at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1977)
>       at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1784)
>       at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1932)
>       at java.io.DataInputStream.read(DataInputStream.java:83)
>       at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
>       at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:74)
>       at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:335)
>       at ilps.DownloadICWSM$CopyThread.run(DownloadICWSM.java:149)
>
>
> Example DataNode Exceptions (not that these come from the node at 192.168.28.211):
>
> 2011-03-08 19:40:40,297 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9222067946733189014_3798233 java.io.EOFException: while trying to read 3067064 bytes
> 2011-03-08 19:40:41,018 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.211:50050, dest: /192.168.28.211:49748, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201103071120_0030_m_000032_0, offset: 30
> 72, srvID: DS-568746059-145.100.2.180-50050-1291128670510, blockid: blk_3596618013242149887_4060598, duration: 2632000
> 2011-03-08 19:40:41,049 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221028436071074510_2325937 java.io.EOFException: while trying to read 2206400 bytes
> 2011-03-08 19:40:41,348 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221549395563181322_4024529 java.io.EOFException: while trying to read 3037288 bytes
> 2011-03-08 19:40:41,357 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221885906633018147_3895876 java.io.EOFException: while trying to read 1981952 bytes
> 2011-03-08 19:40:41,434 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9221885906633018147_3895876 unfinalized and removed.
> 2011-03-08 19:40:41,434 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9221885906633018147_3895876 received exception java.io.EOFException: while trying to read 1981952 bytes
> 2011-03-08 19:40:41,434 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.28.211:50050, storageID=DS-568746059-145.100.2.180-50050-1291128670510, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.EOFException: while trying to read 1981952 bytes
>          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270)
>          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
>          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
>          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
>          at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
>          at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
> 2011-03-08 19:40:41,465 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9221549395563181322_4024529 unfinalized and removed.
> 2011-03-08 19:40:41,466 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9221549395563181322_4024529 received exception java.io.EOFException: while trying to read 3037288 bytes
> 2011-03-08 19:40:41,466 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.28.211:50050, storageID=DS-568746059-145.100.2.180-50050-1291128670510, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.EOFException: while trying to read 3037288 bytes
>          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270)
>          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
>          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
>          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
>          at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
>          at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
>
> Cheers,
>
> Evert Lammerts
> Consultant eScience & Cloud Services
> SARA Computing & Network Services
> Operations, Support & Development
>
> Phone: +31 20 888 4101
> Email: evert.lammerts@sara.nl
> http://www.sara.nl
>
>
>

Then on the DataNode where you have the particular block
(blk_-3695352030358969086_130839 )
you can visit the web interface
http://192.168.28.211:50075/blockScannerReport to see what's happening
on the node

Regards

--
Marcos Luís Ortíz Valmaseda
  Software Engineer
  Universidad de las Ciencias Informáticas
  Linux User # 418229

http://uncubanitolinuxero.blogspot.com
http://www.linkedin.com/in/marcosluis2186


Re: Could not obtain block

Posted by Marcos Ortiz <ml...@uci.cu>.
On 3/9/2011 6:27 AM, Evert Lammerts wrote:
> We see a lot of IOExceptions coming from HDFS during a job that does nothing but untar 100 files (1 per Mapper, sizes vary between 5GB and 80GB) that are in HDFS, to HDFS. DataNodes are also showing Exceptions that I think are related. (See stacktraces below.)
>
> This job should not be able to overload the system I think... I realize that much data needs to go over the lines, but HDFS should still be responsive. Any ideas / help is much appreciated!
>
> Some details:
> * Hadoop 0.20.2 (CDH3b4)
> * 5 node cluster plus 1 node for JT/NN (Sun Thumpers)
> * 4 cores/node, 4GB RAM/core
> * CentOS 5.5
>
> Job output:
>
> java.io.IOException: java.io.IOException: Could not obtain block: blk_-3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
>    
What is the output of:
   bin/hadoop dfsadmin -report

What is the output of:
   bin/hadoop fsck /user/emeij/icwsm-data-test/
> 	at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:449)
> 	at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:1)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:390)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:234)
> Caused by: java.io.IOException: Could not obtain block: blk_-3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
>    
What is the output of:
  bin/hadoop fsck /user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz -files -blocks -racks
> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1977)
> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1784)
> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1932)
> 	at java.io.DataInputStream.read(DataInputStream.java:83)
> 	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
> 	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:74)
> 	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:335)
> 	at ilps.DownloadICWSM$CopyThread.run(DownloadICWSM.java:149)
>
>
> Example DataNode Exceptions (not that these come from the node at 192.168.28.211):
>
> 2011-03-08 19:40:40,297 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9222067946733189014_3798233 java.io.EOFException: while trying to read 3067064 bytes
> 2011-03-08 19:40:41,018 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.211:50050, dest: /192.168.28.211:49748, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201103071120_0030_m_000032_0, offset: 30
> 72, srvID: DS-568746059-145.100.2.180-50050-1291128670510, blockid: blk_3596618013242149887_4060598, duration: 2632000
> 2011-03-08 19:40:41,049 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221028436071074510_2325937 java.io.EOFException: while trying to read 2206400 bytes
> 2011-03-08 19:40:41,348 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221549395563181322_4024529 java.io.EOFException: while trying to read 3037288 bytes
> 2011-03-08 19:40:41,357 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221885906633018147_3895876 java.io.EOFException: while trying to read 1981952 bytes
> 2011-03-08 19:40:41,434 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9221885906633018147_3895876 unfinalized and removed.
> 2011-03-08 19:40:41,434 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9221885906633018147_3895876 received exception java.io.EOFException: while trying to read 1981952 bytes
> 2011-03-08 19:40:41,434 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.28.211:50050, storageID=DS-568746059-145.100.2.180-50050-1291128670510, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.EOFException: while trying to read 1981952 bytes
>          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270)
>          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
>          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
>          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
>          at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
>          at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
> 2011-03-08 19:40:41,465 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9221549395563181322_4024529 unfinalized and removed.
> 2011-03-08 19:40:41,466 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9221549395563181322_4024529 received exception java.io.EOFException: while trying to read 3037288 bytes
> 2011-03-08 19:40:41,466 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.28.211:50050, storageID=DS-568746059-145.100.2.180-50050-1291128670510, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.EOFException: while trying to read 3037288 bytes
>          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270)
>          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
>          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
>          at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
>          at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
>          at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
>
> Cheers,
>
> Evert Lammerts
> Consultant eScience & Cloud Services
> SARA Computing & Network Services
> Operations, Support & Development
>
> Phone: +31 20 888 4101
> Email: evert.lammerts@sara.nl
> http://www.sara.nl
>
>
>    

Then on the DataNode where you have the particular block 
(blk_-3695352030358969086_130839 )
you can visit the web interface 
http://192.168.28.211:50075/blockScannerReport to see what's happening 
on the node

Regards

-- 
Marcos Luís Ortíz Valmaseda
  Software Engineer
  Universidad de las Ciencias Informáticas
  Linux User # 418229

http://uncubanitolinuxero.blogspot.com
http://www.linkedin.com/in/marcosluis2186


RE: Could not obtain block

Posted by Evert Lammerts <Ev...@sara.nl>.
My bad! I have done a test run with Kerberos on the cluster (which worked relatively well...). I was under the impression that just configuring the cluster to NOT use security would revert the process. But it turns out the HDFS daemons were still started using the SecureStarters. This seemed to mess up the network AND cause the high system load. After removing hadoop-0.20-sbin and restarting the cluster everything seems back to normal!
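
For anyone who runs into the same thing: the cleanup boils down to removing the
package and restarting the Hadoop daemons on every node, roughly like this
(assuming the CDH3 package and init script names, adjust to your install):

  yum remove hadoop-0.20-sbin
  /etc/init.d/hadoop-0.20-datanode restart
  /etc/init.d/hadoop-0.20-tasktracker restart

plus the NameNode/JobTracker services on the master node.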



> -----Original Message-----
> From: Evert Lammerts [mailto:Evert.Lammerts@sara.nl]
> Sent: vrijdag 18 maart 2011 14:16
> To: 'Todd Lipcon'; common-user@hadoop.apache.org
> Cc: CDH Users
> Subject: RE: Could not obtain block
>
> > Can you check the DN logs for "exceeds the limit of concurrent
> > xcievers"? You may need to bump the dfs.datanode.max.xcievers
> > parameter in hdfs-site.xml, and also possibly the nfiles ulimit.
>
> Thanks Todd, and sorry for the late reply - I missed this message.
>
> I didn't see any xciever messages in the DN logs, but I figured it
> might be a good idea to up the nofiles uplimit. The result is a jsvc
> that is eating memory:
>
> $top
>
> Mem:  16320412k total, 16199036k used,   121376k free,    25412k
> buffers
> Swap: 33554424k total,   291492k used, 33262932k free, 10966732k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 24835 mapred    18   0 2644m 157m 8316 S 34.1  1.0   7031:27 java
> 14794 hdfs      18   0 2430m 1.5g  10m S  3.3  9.8   3:39.56 jsvc
>
>
> I'll revert it and see what effect dfs.datanode.max.xcievers will have.
>
> Cheers,
> Evert
>
> >
> > -Todd
> >
> >
> > On Wed, Mar 9, 2011 at 3:27 AM, Evert Lammerts
> <Ev...@sara.nl>
> > wrote:
> > > We see a lot of IOExceptions coming from HDFS during a job that
> does
> > nothing but untar 100 files (1 per Mapper, sizes vary between 5GB and
> > 80GB) that are in HDFS, to HDFS. DataNodes are also showing
> Exceptions
> > that I think are related. (See stacktraces below.)
> > >
> > > This job should not be able to overload the system I think... I
> > realize that much data needs to go over the lines, but HDFS should
> > still be responsive. Any ideas / help is much appreciated!
> > >
> > > Some details:
> > > * Hadoop 0.20.2 (CDH3b4)
> > > * 5 node cluster plus 1 node for JT/NN (Sun Thumpers)
> > > * 4 cores/node, 4GB RAM/core
> > > * CentOS 5.5
> > >
> > > Job output:
> > >
> > > java.io.IOException: java.io.IOException: Could not obtain block:
> > blk_-3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-
> 26-
> > SOCIAL_MEDIA.tar.gz
> > >        at
> ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:449)
> > >        at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:1)
> > >        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> > >        at
> > org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:390)
> > >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
> > >        at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
> > >        at java.security.AccessController.doPrivileged(Native
> Method)
> > >        at javax.security.auth.Subject.doAs(Subject.java:396)
> > >        at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformati
> > on.java:1115)
> > >        at org.apache.hadoop.mapred.Child.main(Child.java:234)
> > > Caused by: java.io.IOException: Could not obtain block: blk_-
> > 3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-
> > SOCIAL_MEDIA.tar.gz
> > >        at
> >
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClien
> > t.java:1977)
> > >        at
> >
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.j
> > ava:1784)
> > >        at
> >
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:193
> > 2)
> > >        at java.io.DataInputStream.read(DataInputStream.java:83)
> > >        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
> > >        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:74)
> > >        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:335)
> > >        at ilps.DownloadICWSM$CopyThread.run(DownloadICWSM.java:149)
> > >
> > >
> > > Example DataNode Exceptions (not that these come from the node at
> > 192.168.28.211):
> > >
> > > 2011-03-08 19:40:40,297 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
> > receiveBlock for block blk_-9222067946733189014_3798233
> > java.io.EOFException: while trying to read 3067064 bytes
> > > 2011-03-08 19:40:41,018 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> > /192.168.28.211:50050, dest: /192.168.28.211:49748, bytes: 0, op:
> > HDFS_READ, cliID: DFSClient_attempt_201103071120_0030_m_000032_0,
> > offset: 30
> > > 72, srvID: DS-568746059-145.100.2.180-50050-1291128670510, blockid:
> > blk_3596618013242149887_4060598, duration: 2632000
> > > 2011-03-08 19:40:41,049 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
> > receiveBlock for block blk_-9221028436071074510_2325937
> > java.io.EOFException: while trying to read 2206400 bytes
> > > 2011-03-08 19:40:41,348 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
> > receiveBlock for block blk_-9221549395563181322_4024529
> > java.io.EOFException: while trying to read 3037288 bytes
> > > 2011-03-08 19:40:41,357 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
> > receiveBlock for block blk_-9221885906633018147_3895876
> > java.io.EOFException: while trying to read 1981952 bytes
> > > 2011-03-08 19:40:41,434 WARN
> > org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-
> > 9221885906633018147_3895876 unfinalized and removed.
> > > 2011-03-08 19:40:41,434 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-
> > 9221885906633018147_3895876 received exception java.io.EOFException:
> > while trying to read 1981952 bytes
> > > 2011-03-08 19:40:41,434 ERROR
> > org.apache.hadoop.hdfs.server.datanode.DataNode:
> > DatanodeRegistration(192.168.28.211:50050, storageID=DS-568746059-
> > 145.100.2.180-50050-1291128670510, infoPort=50075,
> > ipcPort=50020):DataXceiver
> > > java.io.EOFException: while trying to read 1981952 bytes
> > >        at
> >
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockRec
> > eiver.java:270)
> > >        at
> >
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(Blo
> > ckReceiver.java:357)
> > >        at
> >
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(Bloc
> > kReceiver.java:378)
> > >        at
> >
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(Block
> > Receiver.java:534)
> > >        at
> >
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiv
> > er.java:417)
> > >        at
> >
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java
> > :122)
> > > 2011-03-08 19:40:41,465 WARN
> > org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-
> > 9221549395563181322_4024529 unfinalized and removed.
> > > 2011-03-08 19:40:41,466 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-
> > 9221549395563181322_4024529 received exception java.io.EOFException:
> > while trying to read 3037288 bytes
> > > 2011-03-08 19:40:41,466 ERROR
> > org.apache.hadoop.hdfs.server.datanode.DataNode:
> > DatanodeRegistration(192.168.28.211:50050, storageID=DS-568746059-
> > 145.100.2.180-50050-1291128670510, infoPort=50075,
> > ipcPort=50020):DataXceiver
> > > java.io.EOFException: while trying to read 3037288 bytes
> > >        at
> >
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockRec
> > eiver.java:270)
> > >        at
> >
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(Blo
> > ckReceiver.java:357)
> > >        at
> >
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(Bloc
> > kReceiver.java:378)
> > >        at
> >
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(Block
> > Receiver.java:534)
> > >        at
> >
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiv
> > er.java:417)
> > >        at
> >
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java
> > :122)
> > >
> > > Cheers,
> > >
> > > Evert Lammerts
> > > Consultant eScience & Cloud Services
> > > SARA Computing & Network Services
> > > Operations, Support & Development
> > >
> > > Phone: +31 20 888 4101
> > > Email: evert.lammerts@sara.nl
> > > http://www.sara.nl
> > >
> > >
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera

RE: Could not obtain block

Posted by Evert Lammerts <Ev...@sara.nl>.
> Can you check the DN logs for "exceeds the limit of concurrent
> xcievers"? You may need to bump the dfs.datanode.max.xcievers
> parameter in hdfs-site.xml, and also possibly the nfiles ulimit.

Thanks Todd, and sorry for the late reply - I missed this message.

I didn't see any xciever messages in the DN logs, but I figured it might be a good idea to up the nofile ulimit. The result is a jsvc that is eating memory:

$top

Mem:  16320412k total, 16199036k used,   121376k free,    25412k buffers
Swap: 33554424k total,   291492k used, 33262932k free, 10966732k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24835 mapred    18   0 2644m 157m 8316 S 34.1  1.0   7031:27 java
14794 hdfs      18   0 2430m 1.5g  10m S  3.3  9.8   3:39.56 jsvc


I'll revert it and see what effect dfs.datanode.max.xcievers will have.

Cheers,
Evert

>
> -Todd
>
>
> On Wed, Mar 9, 2011 at 3:27 AM, Evert Lammerts <Ev...@sara.nl>
> wrote:
> > We see a lot of IOExceptions coming from HDFS during a job that does
> nothing but untar 100 files (1 per Mapper, sizes vary between 5GB and
> 80GB) that are in HDFS, to HDFS. DataNodes are also showing Exceptions
> that I think are related. (See stacktraces below.)
> >
> > This job should not be able to overload the system I think... I
> realize that much data needs to go over the lines, but HDFS should
> still be responsive. Any ideas / help is much appreciated!
> >
> > Some details:
> > * Hadoop 0.20.2 (CDH3b4)
> > * 5 node cluster plus 1 node for JT/NN (Sun Thumpers)
> > * 4 cores/node, 4GB RAM/core
> > * CentOS 5.5
> >
> > Job output:
> >
> > java.io.IOException: java.io.IOException: Could not obtain block:
> blk_-3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-
> SOCIAL_MEDIA.tar.gz
> >        at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:449)
> >        at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:1)
> >        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >        at
> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:390)
> >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
> >        at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
> >        at java.security.AccessController.doPrivileged(Native Method)
> >        at javax.security.auth.Subject.doAs(Subject.java:396)
> >        at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformati
> on.java:1115)
> >        at org.apache.hadoop.mapred.Child.main(Child.java:234)
> > Caused by: java.io.IOException: Could not obtain block: blk_-
> 3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-
> SOCIAL_MEDIA.tar.gz
> >        at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClien
> t.java:1977)
> >        at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.j
> ava:1784)
> >        at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:193
> 2)
> >        at java.io.DataInputStream.read(DataInputStream.java:83)
> >        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
> >        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:74)
> >        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:335)
> >        at ilps.DownloadICWSM$CopyThread.run(DownloadICWSM.java:149)
> >
> >
> > Example DataNode Exceptions (not that these come from the node at
> 192.168.28.211):
> >
> > 2011-03-08 19:40:40,297 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
> receiveBlock for block blk_-9222067946733189014_3798233
> java.io.EOFException: while trying to read 3067064 bytes
> > 2011-03-08 19:40:41,018 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /192.168.28.211:50050, dest: /192.168.28.211:49748, bytes: 0, op:
> HDFS_READ, cliID: DFSClient_attempt_201103071120_0030_m_000032_0,
> offset: 30
> > 72, srvID: DS-568746059-145.100.2.180-50050-1291128670510, blockid:
> blk_3596618013242149887_4060598, duration: 2632000
> > 2011-03-08 19:40:41,049 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
> receiveBlock for block blk_-9221028436071074510_2325937
> java.io.EOFException: while trying to read 2206400 bytes
> > 2011-03-08 19:40:41,348 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
> receiveBlock for block blk_-9221549395563181322_4024529
> java.io.EOFException: while trying to read 3037288 bytes
> > 2011-03-08 19:40:41,357 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
> receiveBlock for block blk_-9221885906633018147_3895876
> java.io.EOFException: while trying to read 1981952 bytes
> > 2011-03-08 19:40:41,434 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-
> 9221885906633018147_3895876 unfinalized and removed.
> > 2011-03-08 19:40:41,434 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-
> 9221885906633018147_3895876 received exception java.io.EOFException:
> while trying to read 1981952 bytes
> > 2011-03-08 19:40:41,434 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(192.168.28.211:50050, storageID=DS-568746059-
> 145.100.2.180-50050-1291128670510, infoPort=50075,
> ipcPort=50020):DataXceiver
> > java.io.EOFException: while trying to read 1981952 bytes
> >        at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockRec
> eiver.java:270)
> >        at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(Blo
> ckReceiver.java:357)
> >        at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(Bloc
> kReceiver.java:378)
> >        at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(Block
> Receiver.java:534)
> >        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiv
> er.java:417)
> >        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java
> :122)
> > 2011-03-08 19:40:41,465 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-
> 9221549395563181322_4024529 unfinalized and removed.
> > 2011-03-08 19:40:41,466 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-
> 9221549395563181322_4024529 received exception java.io.EOFException:
> while trying to read 3037288 bytes
> > 2011-03-08 19:40:41,466 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(192.168.28.211:50050, storageID=DS-568746059-
> 145.100.2.180-50050-1291128670510, infoPort=50075,
> ipcPort=50020):DataXceiver
> > java.io.EOFException: while trying to read 3037288 bytes
> >        at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockRec
> eiver.java:270)
> >        at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(Blo
> ckReceiver.java:357)
> >        at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(Bloc
> kReceiver.java:378)
> >        at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(Block
> Receiver.java:534)
> >        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiv
> er.java:417)
> >        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java
> :122)
> >
> > Cheers,
> >
> > Evert Lammerts
> > Consultant eScience & Cloud Services
> > SARA Computing & Network Services
> > Operations, Support & Development
> >
> > Phone: +31 20 888 4101
> > Email: evert.lammerts@sara.nl
> > http://www.sara.nl
> >
> >
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera

Re: Could not obtain block

Posted by Todd Lipcon <to...@cloudera.com>.
[moving to common-user, since this spans both MR and HDFS - probably
easier than cross-posting]

Can you check the DN logs for "exceeds the limit of concurrent
xcievers"? You may need to bump the dfs.datanode.max.xcievers
parameter in hdfs-site.xml, and also possibly the nofile ulimit.
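
Something along these lines in hdfs-site.xml on every DataNode (4096 is just a
commonly used starting point, not a magic number):

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

and for the open-files limit, an entry in /etc/security/limits.conf for
whichever user runs the DataNode, e.g.:

  hdfs  -  nofile  32768

followed by a DataNode restart so both take effect.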

-Todd


On Wed, Mar 9, 2011 at 3:27 AM, Evert Lammerts <Ev...@sara.nl> wrote:
> We see a lot of IOExceptions coming from HDFS during a job that does nothing but untar 100 files (1 per Mapper, sizes vary between 5GB and 80GB) that are in HDFS, to HDFS. DataNodes are also showing Exceptions that I think are related. (See stacktraces below.)
>
> This job should not be able to overload the system I think... I realize that much data needs to go over the lines, but HDFS should still be responsive. Any ideas / help is much appreciated!
>
> Some details:
> * Hadoop 0.20.2 (CDH3b4)
> * 5 node cluster plus 1 node for JT/NN (Sun Thumpers)
> * 4 cores/node, 4GB RAM/core
> * CentOS 5.5
>
> Job output:
>
> java.io.IOException: java.io.IOException: Could not obtain block: blk_-3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
>        at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:449)
>        at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:1)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:390)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
>        at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>        at org.apache.hadoop.mapred.Child.main(Child.java:234)
> Caused by: java.io.IOException: Could not obtain block: blk_-3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1977)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1784)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1932)
>        at java.io.DataInputStream.read(DataInputStream.java:83)
>        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
>        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:74)
>        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:335)
>        at ilps.DownloadICWSM$CopyThread.run(DownloadICWSM.java:149)
>
>
> Example DataNode Exceptions (not that these come from the node at 192.168.28.211):
>
> 2011-03-08 19:40:40,297 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9222067946733189014_3798233 java.io.EOFException: while trying to read 3067064 bytes
> 2011-03-08 19:40:41,018 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.211:50050, dest: /192.168.28.211:49748, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201103071120_0030_m_000032_0, offset: 30
> 72, srvID: DS-568746059-145.100.2.180-50050-1291128670510, blockid: blk_3596618013242149887_4060598, duration: 2632000
> 2011-03-08 19:40:41,049 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221028436071074510_2325937 java.io.EOFException: while trying to read 2206400 bytes
> 2011-03-08 19:40:41,348 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221549395563181322_4024529 java.io.EOFException: while trying to read 3037288 bytes
> 2011-03-08 19:40:41,357 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9221885906633018147_3895876 java.io.EOFException: while trying to read 1981952 bytes
> 2011-03-08 19:40:41,434 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9221885906633018147_3895876 unfinalized and removed.
> 2011-03-08 19:40:41,434 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9221885906633018147_3895876 received exception java.io.EOFException: while trying to read 1981952 bytes
> 2011-03-08 19:40:41,434 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.28.211:50050, storageID=DS-568746059-145.100.2.180-50050-1291128670510, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.EOFException: while trying to read 1981952 bytes
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270)
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
> 2011-03-08 19:40:41,465 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9221549395563181322_4024529 unfinalized and removed.
> 2011-03-08 19:40:41,466 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9221549395563181322_4024529 received exception java.io.EOFException: while trying to read 3037288 bytes
> 2011-03-08 19:40:41,466 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.28.211:50050, storageID=DS-568746059-145.100.2.180-50050-1291128670510, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.EOFException: while trying to read 3037288 bytes
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270)
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
>        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
>
> Cheers,
>
> Evert Lammerts
> Consultant eScience & Cloud Services
> SARA Computing & Network Services
> Operations, Support & Development
>
> Phone: +31 20 888 4101
> Email: evert.lammerts@sara.nl
> http://www.sara.nl
>
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera