Posted to common-user@hadoop.apache.org by Ivan Tretyakov <it...@griddynamics.com> on 2013/01/10 13:04:14 UTC
could only be replicated to 0 nodes instead of minReplication
Hello!
On our cluster, jobs fail with the following exception:
2013-01-10 10:34:05,648 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/persona/usersAggregate_20130110_15/_temporary/_attempt_201212271414_0458_m_000001_1/s/375ee510bbf44815b151df556e06b5ca could only be replicated to 0 nodes instead of minReplication (=1). There are 6 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2170)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
at org.apache.hadoop.ipc.Client.call(Client.java:1160)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy10.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
I've found that it could be caused by a lack of free disk space, but as far as I can see everything looks fine there (see the attached dfs report output).
I can also see the following exception in the TaskTracker log, https://issues.apache.org/jira/browse/MAPREDUCE-5, but I'm not sure if it is related.
Could it be related to another issue on our cluster? -
http://mail-archives.apache.org/mod_mbox/hadoop-user/201301.mbox/%3CCAEAKFL90ReOWEvY_vuSMqU2GwMOAh0fndA9b-uodXZ6BYvz2Kg%40mail.gmail.com%3E
Thanks in advance!
--
Best Regards
Ivan Tretyakov
Re: could only be replicated to 0 nodes instead of minReplication
Posted by Ivan Tretyakov <it...@griddynamics.com>.
Thanks for the replies.
Finally, after trying many ways to resolve the problem, e.g.:
- the number of open files for the mapreduce user
- the xcievers and handler thread count options on the datanodes
- and others
the problem turned out to be the one from the Apache wiki page below - there was a lack of disk space.
But there were about 100-200GB free per disk, the space reserved for non-dfs usage is only 10GB, and the problem still appeared.
The problem was solved after we freed space up to ~500GB per disk.
There are no:
- selinux restrictions
- quotas
- inode shortages
- small open-file limits
on our filesystems.
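The puzzle above (plenty of free space on paper, yet 0 targets chosen) is easier to reason about with a simplified model of the NameNode's per-datanode space check. This is only a sketch based on my reading of BlockPlacementPolicyDefault; the MIN_BLOCKS_FOR_WRITE constant and the scheduled-block accounting are assumptions, not an exact reimplementation:

```python
# Simplified sketch of the NameNode's per-datanode space check.
# Assumptions (from my reading of BlockPlacementPolicyDefault, not exact):
# each block already scheduled to a node is charged a full block of space
# before the node is considered, and a small multiple of the block size
# must remain afterwards.

MIN_BLOCKS_FOR_WRITE = 5  # assumed HDFS constant


def is_good_target(remaining_bytes, blocks_scheduled, block_size):
    """Return True if a datanode has enough room for one more block.

    remaining_bytes: what the node reports after subtracting
    dfs.datanode.du.reserved; blocks_scheduled: blocks already
    assigned to this node but not yet written.
    """
    scheduled = blocks_scheduled * block_size
    required = MIN_BLOCKS_FOR_WRITE * block_size
    return (remaining_bytes - scheduled) >= required


GB = 1024 ** 3
block = 128 * 1024 * 1024  # 128 MB blocks

# A node with ~150 GB free passes when idle...
print(is_good_target(150 * GB, 0, block))     # True
# ...but a burst of concurrent writers can tip it into rejection.
print(is_good_target(150 * GB, 1200, block))  # False
```

Under this model, nodes at ~90% usage can all be rejected at the same moment during a heavy write burst even though `df` still shows over 100 GB free on each.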
But we could see the problem with the following disk space layout:
node01
/dev/sdb1 1.9T 317M 1.9T 1% /data1
/dev/sdc1 1.9T 317M 1.9T 1% /data2
/dev/sdd1 1.9T 317M 1.9T 1% /data3
node02
/dev/sdb1 1.9T 1.7T 146G 93% /data1
/dev/sdc1 1.9T 1.6T 225G 88% /data2
/dev/sdd1 1.9T 1.7T 219G 89% /data3
node03
/dev/sdb1 1.9T 1.8T 116G 94% /data1
/dev/sdc1 1.9T 1.8T 98G 95% /data2
/dev/sdd1 1.9T 1.7T 210G 89% /data3
node04
/dev/sdb1 1.9T 1.7T 140G 93% /data1
/dev/sdc1 1.9T 1.7T 178G 91% /data2
/dev/sdd1 1.9T 1.7T 209G 89% /data3
node05
/dev/sdb1 1.9T 1.7T 178G 91% /data1
/dev/sdc1 1.9T 1.7T 212G 89% /data2
/dev/sdd1 1.9T 1.7T 215G 89% /data3
node06
/dev/sdb1 1.9T 1.7T 170G 91% /data1
/dev/sdc1 1.9T 1.7T 198G 90% /data2
/dev/sdd1 1.9T 1.7T 212G 89% /data3
node07
/dev/sdb1 1.9T 1.7T 197G 90% /data1
/dev/sdc1 1.9T 1.6T 263G 86% /data2
/dev/sdd1 1.9T 1.6T 236G 88% /data3
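The imbalance above (node01 almost empty, every other volume at ~90%) is easy to flag mechanically. A small sketch that parses `df`-style lines and reports volumes below a free-space threshold; the 500 GB threshold is just the figure from our experience above, not an HDFS rule:

```python
# Flag volumes whose free space is below a threshold, given `df -h`-style
# lines such as "/dev/sdb1 1.9T 1.7T 146G 93% /data1".


def parse_size(s):
    """Convert a df-style size like '146G' or '1.9T' to bytes."""
    units = {'K': 1024, 'M': 1024 ** 2, 'G': 1024 ** 3, 'T': 1024 ** 4}
    if s[-1] in units:
        return int(float(s[:-1]) * units[s[-1]])
    return int(s)


def low_volumes(df_lines, threshold_bytes):
    """Return (mount, free_bytes) pairs for volumes below the threshold."""
    out = []
    for line in df_lines:
        dev, size, used, avail, pct, mount = line.split()
        free = parse_size(avail)
        if free < threshold_bytes:
            out.append((mount, free))
    return out


sample = [
    "/dev/sdb1 1.9T 1.7T 146G 93% /data1",
    "/dev/sdc1 1.9T 317M 1.9T 1% /data2",
]
print(low_volumes(sample, 500 * 1024 ** 3))  # → [('/data1', 156766306304)]
```

Run across all datanodes, this would have flagged every volume except node01's long before jobs started failing.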
Do you think Hadoop is behaving correctly here?
On Thu, Jan 10, 2013 at 9:17 PM, Robert Molina <rm...@hortonworks.com> wrote:
> Hi Ivan,
> Here are a couple of more suggestions provided by the wiki:
>
> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
>
> Regards,
> Robert
>
>
> On Thu, Jan 10, 2013 at 5:33 AM, Ivan Tretyakov <
> itretyakov@griddynamics.com> wrote:
>
>> I also found following exception in datanode, I suppose it might give
>> some clue:
>>
>> 2013-01-10 11:37:55,397 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode:
>> node02.303net.pvt:50010:DataXceiver error processing READ_BLOCK operation
>> src: /192.168.1.112:35991 dest: /192.168.1.112:50010
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
>> channel to be ready for write. ch :
>> java.nio.channels.SocketChannel[connected local=/192.168.1.112:50010remote=/
>> 192.168.1.112:35991]
>> at
>> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:247)
>> at
>> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:166)
>> at
>> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:214)
>> at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:492)
>> at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:655)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:280)
>> at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:88)
>> at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:63)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
>> at java.lang.Thread.run(Thread.java:662)
>>
>>
>> On Thu, Jan 10, 2013 at 4:04 PM, Ivan Tretyakov <
>> itretyakov@griddynamics.com> wrote:
>>
>>> Hello!
>>>
>>> On our cluster jobs fails with the following exception:
>>>
>>> 2013-01-10 10:34:05,648 WARN org.apache.hadoop.hdfs.DFSClient:
>>> DataStreamer Exception
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>> /user/persona/usersAggregate_20130110_15/_temporary/_attempt_201212271414_0458_m_000001_1/s/375ee510bbf44815b151df556e06b5ca
>>> could only be replicated to 0 nodes instead of minReplication (=1). There
>>> are 6 datanode(s) running and no node(s) are excluded in this operation.
>>> at
>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2170)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
>>> at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
>>> at
>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:396)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1160)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>> at $Proxy10.addBlock(Unknown Source)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>> at
>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>>> at
>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>>> at $Proxy10.addBlock(Unknown Source)
>>> at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
>>> at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
>>> at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
>>> at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
>>>
>>> I've found that it could be cause by lack of free disk space, but as I
>>> could see there is everything well (see attached dfs report output).
>>> Also, I could see following exception in TaskTracker log
>>> https://issues.apache.org/jira/browse/MAPREDUCE-5 but I'm not sure if
>>> it is related.
>>>
>>> Could it be related with another issue on our cluster? -
>>> http://mail-archives.apache.org/mod_mbox/hadoop-user/201301.mbox/%3CCAEAKFL90ReOWEvY_vuSMqU2GwMOAh0fndA9b-uodXZ6BYvz2Kg%40mail.gmail.com%3E
>>>
>>> Thanks in advance!
>>>
>>> --
>>> Best Regards
>>> Ivan Tretyakov
>>>
>>
>>
>>
>> --
>> Best Regards
>> Ivan Tretyakov
>>
>> Deployment Engineer
>> Grid Dynamics
>> +7 812 640 38 76
>> Skype: ivan.tretyakov
>> www.griddynamics.com
>> itretyakov@griddynamics.com
>>
>
>
--
Best Regards
Ivan Tretyakov
Deployment Engineer
Grid Dynamics
+7 812 640 38 76
Skype: ivan.tretyakov
www.griddynamics.com
itretyakov@griddynamics.com
Re: could only be replicated to 0 nodes instead of minReplication
Posted by Robert Molina <rm...@hortonworks.com>.
Hi Ivan,
Here are a couple of more suggestions provided by the wiki:
http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
Regards,
Robert
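[Editor's note] The wiki page above starts with the most common cause: datanodes that are running but have no usable space for a new block. As a hedged illustration (not part of the original thread), here is a minimal Python sketch that scans `hdfs dfsadmin -report`-style text and flags nearly full datanodes. The field names ("Name:", "DFS Remaining:") follow typical Hadoop 2.x report output and are an assumption, as is the helper name.

```python
import re

def low_space_nodes(report_text, min_free_bytes=10 * 1024**3):
    """Return datanode names whose 'DFS Remaining' is below min_free_bytes.

    Assumes dfsadmin -report style output where each datanode section
    contains a 'Name: host:port' line followed by a 'DFS Remaining: N' line.
    """
    nodes, current = [], None
    for line in report_text.splitlines():
        m = re.match(r"Name:\s*(\S+)", line.strip())
        if m:
            current = m.group(1)
            continue
        m = re.match(r"DFS Remaining:\s*(\d+)", line.strip())
        if m and current is not None:
            if int(m.group(1)) < min_free_bytes:
                nodes.append(current)
            current = None
    return nodes

# Hypothetical report excerpt: first node has ~1 GiB left, second ~100 GiB.
sample = """Name: 192.168.1.112:50010
DFS Remaining: 1073741824
Name: 192.168.1.113:50010
DFS Remaining: 107374182400
"""
print(low_space_nodes(sample))  # only the first node is under 10 GiB free
```

A node can also be rejected as a target for reasons a space report won't show (too many active transfer threads, a failed volume), so this is only a first-pass check.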
On Thu, Jan 10, 2013 at 5:33 AM, Ivan Tretyakov <itretyakov@griddynamics.com
> wrote:
> I also found the following exception in a datanode log; I suppose it might
> give a clue:
>
> 2013-01-10 11:37:55,397 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> node02.303net.pvt:50010:DataXceiver error processing READ_BLOCK operation
> src: /192.168.1.112:35991 dest: /192.168.1.112:50010
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
> channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.112:50010
> remote=/192.168.1.112:35991]
> at
> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:247)
> at
> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:166)
> at
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:214)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:492)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:655)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:280)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:88)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:63)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
> at java.lang.Thread.run(Thread.java:662)
>
>
> On Thu, Jan 10, 2013 at 4:04 PM, Ivan Tretyakov <
> itretyakov@griddynamics.com> wrote:
>
>> Hello!
>>
>> On our cluster, jobs fail with the following exception:
>>
>> 2013-01-10 10:34:05,648 WARN org.apache.hadoop.hdfs.DFSClient:
>> DataStreamer Exception
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>> /user/persona/usersAggregate_20130110_15/_temporary/_attempt_201212271414_0458_m_000001_1/s/375ee510bbf44815b151df556e06b5ca
>> could only be replicated to 0 nodes instead of minReplication (=1). There
>> are 6 datanode(s) running and no node(s) are excluded in this operation.
>> at
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2170)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
>> at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
>> at
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:396)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at org.apache.hadoop.ipc.Client.call(Client.java:1160)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>> at $Proxy10.addBlock(Unknown Source)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>> at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>> at $Proxy10.addBlock(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
>>
>> I've found that it could be caused by a lack of free disk space, but as
>> far as I can see everything looks fine (see attached dfs report output).
>> I can also see the following exception in the TaskTracker log
>> https://issues.apache.org/jira/browse/MAPREDUCE-5 but I'm not sure if it
>> is related.
>>
>> Could this be related to another issue on our cluster? -
>> http://mail-archives.apache.org/mod_mbox/hadoop-user/201301.mbox/%3CCAEAKFL90ReOWEvY_vuSMqU2GwMOAh0fndA9b-uodXZ6BYvz2Kg%40mail.gmail.com%3E
>>
>> Thanks in advance!
>>
>> --
>> Best Regards
>> Ivan Tretyakov
>>
>
>
>
> --
> Best Regards
> Ivan Tretyakov
>
> Deployment Engineer
> Grid Dynamics
> +7 812 640 38 76
> Skype: ivan.tretyakov
> www.griddynamics.com
> itretyakov@griddynamics.com
>
Re: could only be replicated to 0 nodes instead of minReplication
Posted by Ivan Tretyakov <it...@griddynamics.com>.
I also found the following exception in a datanode log; I suppose it might
give a clue:
2013-01-10 11:37:55,397 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
node02.303net.pvt:50010:DataXceiver error processing READ_BLOCK operation
src: /192.168.1.112:35991 dest: /192.168.1.112:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for
channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.1.112:50010
remote=/192.168.1.112:35991]
at
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:247)
at
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:166)
at
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:214)
at
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:492)
at
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:655)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:280)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:88)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:63)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
at java.lang.Thread.run(Thread.java:662)
On Thu, Jan 10, 2013 at 4:04 PM, Ivan Tretyakov <itretyakov@griddynamics.com
> wrote:
> Hello!
>
> On our cluster, jobs fail with the following exception:
>
> 2013-01-10 10:34:05,648 WARN org.apache.hadoop.hdfs.DFSClient:
> DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /user/persona/usersAggregate_20130110_15/_temporary/_attempt_201212271414_0458_m_000001_1/s/375ee510bbf44815b151df556e06b5ca
> could only be replicated to 0 nodes instead of minReplication (=1). There
> are 6 datanode(s) running and no node(s) are excluded in this operation.
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2170)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1160)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at $Proxy10.addBlock(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at $Proxy10.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
>
> I've found that it could be caused by a lack of free disk space, but as
> far as I can see everything looks fine (see attached dfs report output).
> I can also see the following exception in the TaskTracker log
> https://issues.apache.org/jira/browse/MAPREDUCE-5 but I'm not sure if it
> is related.
>
> Could this be related to another issue on our cluster? -
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201301.mbox/%3CCAEAKFL90ReOWEvY_vuSMqU2GwMOAh0fndA9b-uodXZ6BYvz2Kg%40mail.gmail.com%3E
>
> Thanks in advance!
>
> --
> Best Regards
> Ivan Tretyakov
>
--
Best Regards
Ivan Tretyakov
Deployment Engineer
Grid Dynamics
+7 812 640 38 76
Skype: ivan.tretyakov
www.griddynamics.com
itretyakov@griddynamics.com
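[Editor's note] For readers hitting the same error: the namenode raises it when its block-placement policy finds fewer viable target datanodes than minReplication, even though datanodes are alive, which is why the message can report "0 nodes" alongside "6 datanode(s) running". The sketch below is a hedged, greatly simplified model of that filtering, not Hadoop's actual BlockManager.chooseTarget; the node attributes and the choose_targets helper are invented for illustration.

```python
def choose_targets(datanodes, block_size, min_replication=1):
    """Toy model of target selection: every live node can still be filtered
    out (full disk, explicitly excluded, or transfer-thread limit hit),
    leaving an empty target list and the familiar IOError."""
    viable = [
        d["name"] for d in datanodes
        if d["remaining"] >= block_size
        and not d["excluded"]
        and d["xceivers"] < d["max_xceivers"]
    ]
    if len(viable) < min_replication:
        raise IOError(
            f"could only be replicated to {len(viable)} nodes "
            f"instead of minReplication (={min_replication})"
        )
    return viable[:min_replication]

# Six running datanodes, none excluded, but all out of usable space:
nodes = [
    {"name": f"node{i:02d}", "remaining": 0, "excluded": False,
     "xceivers": 10, "max_xceivers": 4096}
    for i in range(1, 7)
]
try:
    choose_targets(nodes, block_size=128 * 1024 * 1024)
except IOError as exc:
    print(exc)
```

The practical takeaway matches the wiki link earlier in the thread: "running" is not the same as "usable as a write target", so check remaining space per volume, exclude lists, and transfer-thread limits on each datanode.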
Re: could only be replicated to 0 nodes instead of minReplication
Posted by Ivan Tretyakov <it...@griddynamics.com>.
I also found following exception in datanode, I suppose it might give some
clue:
2013-01-10 11:37:55,397 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
node02.303net.pvt:50010:DataXceiver error processing READ_BLOCK operation
src: /192.168.1.112:35991 dest: /192.168.1.112:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for
channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.1.112:50010remote=/
192.168.1.112:35991]
at
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:247)
at
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:166)
at
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:214)
at
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:492)
at
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:655)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:280)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:88)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:63)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
at java.lang.Thread.run(Thread.java:662)
On Thu, Jan 10, 2013 at 4:04 PM, Ivan Tretyakov <itretyakov@griddynamics.com
> wrote:
> Hello!
>
> On our cluster jobs fails with the following exception:
>
> 2013-01-10 10:34:05,648 WARN org.apache.hadoop.hdfs.DFSClient:
> DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /user/persona/usersAggregate_20130110_15/_temporary/_attempt_201212271414_0458_m_000001_1/s/375ee510bbf44815b151df556e06b5ca
> could only be replicated to 0 nodes instead of minReplication (=1). There
> are 6 datanode(s) running and no node(s) are excluded in this operation.
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2170)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1160)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at $Proxy10.addBlock(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at $Proxy10.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
>
> I've found that it could be caused by a lack of free disk space, but as far
> as I can see everything is fine there (see the attached dfs report output).
> I also see the exception from https://issues.apache.org/jira/browse/MAPREDUCE-5
> in the TaskTracker log, but I'm not sure whether it is related.
>
> Could it be related to another issue on our cluster? -
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201301.mbox/%3CCAEAKFL90ReOWEvY_vuSMqU2GwMOAh0fndA9b-uodXZ6BYvz2Kg%40mail.gmail.com%3E
>
> Thanks in advance!
>
> --
> Best Regards
> Ivan Tretyakov
>
--
Best Regards
Ivan Tretyakov
Deployment Engineer
Grid Dynamics
+7 812 640 38 76
Skype: ivan.tretyakov
www.griddynamics.com
itretyakov@griddynamics.com
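[Editorial note: the per-node numbers in a dfs report can be checked mechanically rather than by eye. A minimal sketch follows; the sample report text and the 50 GB threshold are hypothetical, and the field names are assumed to follow the `hdfs dfsadmin -report` output format of that Hadoop generation:]

```python
import re

# Hypothetical excerpt of `hdfs dfsadmin -report` output; in practice,
# feed the real report text in instead of this sample.
sample_report = """\
Name: 192.168.1.112:50010
Configured Capacity: 1000000000000 (931.32 GB)
DFS Used: 900000000000 (838.19 GB)
Non DFS Used: 80000000000 (74.51 GB)
DFS Remaining: 20000000000 (18.63 GB)

Name: 192.168.1.113:50010
Configured Capacity: 1000000000000 (931.32 GB)
DFS Used: 100000000000 (93.13 GB)
Non DFS Used: 10000000000 (9.31 GB)
DFS Remaining: 890000000000 (828.88 GB)
"""

def low_space_nodes(report, min_free_bytes=50 * 1024**3):
    """Return datanodes whose 'DFS Remaining' is below min_free_bytes."""
    nodes = []
    name = None
    for line in report.splitlines():
        if line.startswith("Name:"):
            name = line.split()[1]
        m = re.match(r"DFS Remaining: (\d+)", line)
        if m and name and int(m.group(1)) < min_free_bytes:
            nodes.append(name)
    return nodes

print(low_space_nodes(sample_report))  # → ['192.168.1.112:50010']
```

Note that "could only be replicated to 0 nodes" can appear even when the aggregate cluster capacity looks healthy, so a per-node view like this is more useful than the cluster totals.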
Re: could only be replicated to 0 nodes instead of minReplication
Posted by Ivan Tretyakov <it...@griddynamics.com>.
I also found the following exception in the datanode log; I suppose it might
give a clue:
2013-01-10 11:37:55,397 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
node02.303net.pvt:50010:DataXceiver error processing READ_BLOCK operation
src: /192.168.1.112:35991 dest: /192.168.1.112:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for
channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.1.112:50010 remote=/192.168.1.112:35991]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:247)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:166)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:214)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:492)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:655)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:280)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:88)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:63)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
at java.lang.Thread.run(Thread.java:662)
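[Editorial note: the 480000 ms in the SocketTimeoutException matches the 8-minute default of dfs.datanode.socket.write.timeout, and slow READ_BLOCK operations can also stem from exhausted DataXceiver threads. A sketch of the hdfs-site.xml properties involved; the values shown are illustrative assumptions, not recommendations:]

```xml
<!-- hdfs-site.xml: illustrative values only -->
<property>
  <!-- How long a datanode waits for the peer to be ready during a block
       transfer; 480000 ms (8 min) is the default seen in the log above. -->
  <name>dfs.datanode.socket.write.timeout</name>
  <value>480000</value>
</property>
<property>
  <!-- Upper bound on concurrent DataXceiver threads per datanode
       (known as dfs.datanode.max.xcievers in older releases). -->
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>
</property>
```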
On Thu, Jan 10, 2013 at 4:04 PM, Ivan Tretyakov <itretyakov@griddynamics.com
> wrote:
> Hello!
>
> On our cluster, jobs fail with the following exception:
>
> 2013-01-10 10:34:05,648 WARN org.apache.hadoop.hdfs.DFSClient:
> DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /user/persona/usersAggregate_20130110_15/_temporary/_attempt_201212271414_0458_m_000001_1/s/375ee510bbf44815b151df556e06b5ca
> could only be replicated to 0 nodes instead of minReplication (=1). There
> are 6 datanode(s) running and no node(s) are excluded in this operation.
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2170)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1160)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at $Proxy10.addBlock(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at $Proxy10.addBlock(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
>
> I've found that it could be caused by a lack of free disk space, but as far
> as I can see everything is fine there (see the attached dfs report output).
> I also see the exception from https://issues.apache.org/jira/browse/MAPREDUCE-5
> in the TaskTracker log, but I'm not sure whether it is related.
>
> Could it be related to another issue on our cluster? -
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201301.mbox/%3CCAEAKFL90ReOWEvY_vuSMqU2GwMOAh0fndA9b-uodXZ6BYvz2Kg%40mail.gmail.com%3E
>
> Thanks in advance!
>
> --
> Best Regards
> Ivan Tretyakov
>
--
Best Regards
Ivan Tretyakov
Deployment Engineer
Grid Dynamics
+7 812 640 38 76
Skype: ivan.tretyakov
www.griddynamics.com
itretyakov@griddynamics.com