Posted to common-user@hadoop.apache.org by Ivan Tretyakov <it...@griddynamics.com> on 2013/01/10 13:04:14 UTC

could only be replicated to 0 nodes instead of minReplication

Hello!

On our cluster, jobs fail with the following exception:

2013-01-10 10:34:05,648 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/persona/usersAggregate_20130110_15/_temporary/_attempt_201212271414_0458_m_000001_1/s/375ee510bbf44815b151df556e06b5ca could only be replicated to 0 nodes instead of minReplication (=1).  There are 6 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2170)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

        at org.apache.hadoop.ipc.Client.call(Client.java:1160)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
        at $Proxy10.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at $Proxy10.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)

I've found that this can be caused by a lack of free disk space, but as far as I can see everything looks fine there (see the attached dfs report output).
I can also see the following exception in the TaskTracker log, https://issues.apache.org/jira/browse/MAPREDUCE-5 , but I'm not sure if it is related.
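
A quick way to sanity-check the per-datanode headroom, rather than the cluster-wide totals, is to summarize the report per node. This is only a minimal sketch, assuming the Hadoop 2.x "hdfs dfsadmin -report" output with "Name:" and "DFS Remaining:" lines for each datanode (labels may differ between releases):

#!/usr/bin/env python
# Hypothetical helper: summarize per-datanode free space from
# "hdfs dfsadmin -report". Assumes the report prints "Name:" and
# "DFS Remaining:" lines per datanode; adjust for your release.
import re
import subprocess

report = subprocess.check_output(["hdfs", "dfsadmin", "-report"]).decode()

node = None
for line in report.splitlines():
    line = line.strip()
    if line.startswith("Name:"):
        node = line.split(None, 1)[1]
    elif line.startswith("DFS Remaining:") and node:
        # e.g. "DFS Remaining: 123456789012 (114.98 GB)"
        remaining = int(re.search(r"DFS Remaining:\s*(\d+)", line).group(1))
        print("%-40s %8.1f GB remaining" % (node, remaining / 1024.0 ** 3))

A node whose remaining space is down to a few block sizes can be refused as a write target even when the cluster totals look healthy.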

Could it be related to another issue on our cluster? -
http://mail-archives.apache.org/mod_mbox/hadoop-user/201301.mbox/%3CCAEAKFL90ReOWEvY_vuSMqU2GwMOAh0fndA9b-uodXZ6BYvz2Kg%40mail.gmail.com%3E

Thanks in advance!

-- 
Best Regards
Ivan Tretyakov

Re: could only be replicated to 0 nodes instead of minReplication

Posted by Ivan Tretyakov <it...@griddynamics.com>.
Thanks for the replies.

Finally, after trying many things to resolve the problem, e.g.:

- raising the open-files limit for the mapreduce user
- tuning the xcievers and handler-thread count options on the datanodes
- and others

the problem turned out to be what the Apache wiki page below describes: a lack of disk space.
But there were about 100-200 GB free per disk while the space reserved for non-DFS usage is only 10 GB, and the problem still appeared.
It was solved only after we freed up to ~500 GB per disk (a simplified model of the NameNode's space check is sketched after the disk layout below).

We have ruled out:

- SELinux
- quotas
- inode exhaustion
- a small open-files limit

on our filesystems (a quick local check for the last two is sketched below).
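
A minimal sketch of how the last two points can be checked locally on a datanode host, using only the Python standard library ("/data1" is just a placeholder mount point):

#!/usr/bin/env python
# Quick local checks: free inodes on a data volume and the open-files
# limit for the current user. "/data1" is a placeholder path.
import os
import resource

st = os.statvfs("/data1")
print("free inodes on /data1: %d of %d" % (st.f_favail, st.f_files))

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open files limit (soft/hard): %s / %s" % (soft, hard))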

But we were still seeing the problem with the following disk space layout:

node01
/dev/sdb1 1.9T 317M 1.9T 1% /data1
/dev/sdc1 1.9T 317M 1.9T 1% /data2
/dev/sdd1 1.9T 317M 1.9T 1% /data3
node02
/dev/sdb1 1.9T 1.7T 146G 93% /data1
/dev/sdc1 1.9T 1.6T 225G 88% /data2
/dev/sdd1 1.9T 1.7T 219G 89% /data3
node03
/dev/sdb1 1.9T 1.8T 116G 94% /data1
/dev/sdc1 1.9T 1.8T 98G 95% /data2
/dev/sdd1 1.9T 1.7T 210G 89% /data3
node04
/dev/sdb1 1.9T 1.7T 140G 93% /data1
/dev/sdc1 1.9T 1.7T 178G 91% /data2
/dev/sdd1 1.9T 1.7T 209G 89% /data3
node05
/dev/sdb1 1.9T 1.7T 178G 91% /data1
/dev/sdc1 1.9T 1.7T 212G 89% /data2
/dev/sdd1 1.9T 1.7T 215G 89% /data3
node06
/dev/sdb1 1.9T 1.7T 170G 91% /data1
/dev/sdc1 1.9T 1.7T 198G 90% /data2
/dev/sdd1 1.9T 1.7T 212G 89% /data3
node07
/dev/sdb1 1.9T 1.7T 197G 90% /data1
/dev/sdc1 1.9T 1.6T 263G 86% /data2
/dev/sdd1 1.9T 1.6T 236G 88% /data3

Do you think Hadoop is behaving correctly here?
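
For context: the NameNode can reject a datanode as a write target even with 100-200 GB free on its volumes, because when choosing targets it also subtracts space already promised to blocks it has scheduled on that node, on top of the non-DFS reserved space. A simplified, illustrative model of that check (the real logic lives in the NameNode's default block placement policy; the numbers below are made up):

#!/usr/bin/env python
# Illustrative only: simplified model of why a datanode with "plenty" of
# free space can still be rejected as a block target under heavy write load.
GB = 1024 ** 3

block_size = 128 * 1024 * 1024      # dfs.blocksize
reserved = 10 * GB                  # dfs.datanode.du.reserved
disk_free = 150 * GB                # what df reports free on the volume
scheduled_blocks = 1200             # blocks already promised to in-flight writes

# The datanode advertises remaining space net of the reserved amount, and the
# NameNode further subtracts space promised to scheduled (in-flight) blocks.
remaining = disk_free - reserved - scheduled_blocks * block_size

if remaining < block_size:
    print("node rejected: %.1f GB effective headroom" % (remaining / float(GB)))
else:
    print("node accepted: %.1f GB effective headroom" % (remaining / float(GB)))

Under that model, many concurrent writers can make every node look full long before df does.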


On Thu, Jan 10, 2013 at 9:17 PM, Robert Molina <rm...@hortonworks.com> wrote:

> Hi Ivan,
> Here are a couple of more suggestions provided by the wiki:
>
> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
>
> Regards,
> Robert
>
>
> On Thu, Jan 10, 2013 at 5:33 AM, Ivan Tretyakov <
> itretyakov@griddynamics.com> wrote:
>
>> I also found following exception in datanode, I suppose it might give
>> some clue:
>>
>> 2013-01-10 11:37:55,397 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode:
>> node02.303net.pvt:50010:DataXceiver error processing READ_BLOCK operation
>>  src: /192.168.1.112:35991 dest: /192.168.1.112:50010
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
>> channel to be ready for write. ch :
>> java.nio.channels.SocketChannel[connected local=/192.168.1.112:50010remote=/
>> 192.168.1.112:35991]
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:247)
>>         at
>> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:166)
>>         at
>> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:214)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:492)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:655)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:280)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:88)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:63)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
>>         at java.lang.Thread.run(Thread.java:662)
>>
>>
>> On Thu, Jan 10, 2013 at 4:04 PM, Ivan Tretyakov <
>> itretyakov@griddynamics.com> wrote:
>>
>>> Hello!
>>>
>>> On our cluster jobs fails with the following exception:
>>>
>>> 2013-01-10 10:34:05,648 WARN org.apache.hadoop.hdfs.DFSClient:
>>> DataStreamer Exception
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>> /user/persona/usersAggregate_20130110_15/_temporary/_attempt_201212271414_0458_m_000001_1/s/375ee510bbf44815b151df556e06b5ca
>>> could only be replicated to 0 nodes instead of minReplication (=1).  There
>>> are 6 datanode(s) running and no node(s) are excluded in this operation.
>>>         at
>>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2170)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
>>>         at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>>         at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1160)
>>>         at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>         at $Proxy10.addBlock(Unknown Source)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>         at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>         at
>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>>>         at
>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>>>         at $Proxy10.addBlock(Unknown Source)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
>>>
>>> I've found that it could be cause by lack of free disk space, but as I
>>> could see there is everything well (see attached dfs report output).
>>> Also, I could see following exception in TaskTracker log
>>> https://issues.apache.org/jira/browse/MAPREDUCE-5 but I'm not sure if
>>> it is related.
>>>
>>> Could it be related with another issue on our cluster? -
>>> http://mail-archives.apache.org/mod_mbox/hadoop-user/201301.mbox/%3CCAEAKFL90ReOWEvY_vuSMqU2GwMOAh0fndA9b-uodXZ6BYvz2Kg%40mail.gmail.com%3E
>>>
>>> Thanks in advance!
>>>
>>> --
>>> Best Regards
>>> Ivan Tretyakov
>>>
>>
>>
>>
>> --
>> Best Regards
>> Ivan Tretyakov
>>
>> Deployment Engineer
>> Grid Dynamics
>> +7 812 640 38 76
>>  Skype: ivan.tretyakov
>> www.griddynamics.com
>> itretyakov@griddynamics.com
>>
>
>


-- 
Best Regards
Ivan Tretyakov

Deployment Engineer
Grid Dynamics
+7 812 640 38 76
Skype: ivan.tretyakov
www.griddynamics.com
itretyakov@griddynamics.com

Re: could only be replicated to 0 nodes instead of minReplication

Posted by Robert Molina <rm...@hortonworks.com>.
Hi Ivan,
Here are a couple more suggestions from the wiki:

http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo

Regards,
Robert

On Thu, Jan 10, 2013 at 5:33 AM, Ivan Tretyakov <itretyakov@griddynamics.com
> wrote:

> I also found following exception in datanode, I suppose it might give some
> clue:
>
> 2013-01-10 11:37:55,397 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> node02.303net.pvt:50010:DataXceiver error processing READ_BLOCK operation
>  src: /192.168.1.112:35991 dest: /192.168.1.112:50010
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
> channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.112:50010remote=/
> 192.168.1.112:35991]
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:247)
>         at
> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:166)
>         at
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:214)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:492)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:655)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:280)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:88)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:63)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
>         at java.lang.Thread.run(Thread.java:662)
>
>
> On Thu, Jan 10, 2013 at 4:04 PM, Ivan Tretyakov <
> itretyakov@griddynamics.com> wrote:
>
>> Hello!
>>
>> On our cluster jobs fails with the following exception:
>>
>> 2013-01-10 10:34:05,648 WARN org.apache.hadoop.hdfs.DFSClient:
>> DataStreamer Exception
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>> /user/persona/usersAggregate_20130110_15/_temporary/_attempt_201212271414_0458_m_000001_1/s/375ee510bbf44815b151df556e06b5ca
>> could only be replicated to 0 nodes instead of minReplication (=1).  There
>> are 6 datanode(s) running and no node(s) are excluded in this operation.
>>         at
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2170)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
>>         at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
>>         at
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
>>         at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>         at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>>         at org.apache.hadoop.ipc.Client.call(Client.java:1160)
>>         at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>         at $Proxy10.addBlock(Unknown Source)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>>         at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>>         at $Proxy10.addBlock(Unknown Source)
>>         at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
>>
>> I've found that it could be cause by lack of free disk space, but as I
>> could see there is everything well (see attached dfs report output).
>> Also, I could see following exception in TaskTracker log
>> https://issues.apache.org/jira/browse/MAPREDUCE-5 but I'm not sure if it
>> is related.
>>
>> Could it be related with another issue on our cluster? -
>> http://mail-archives.apache.org/mod_mbox/hadoop-user/201301.mbox/%3CCAEAKFL90ReOWEvY_vuSMqU2GwMOAh0fndA9b-uodXZ6BYvz2Kg%40mail.gmail.com%3E
>>
>> Thanks in advance!
>>
>> --
>> Best Regards
>> Ivan Tretyakov
>>
>
>
>
> --
> Best Regards
> Ivan Tretyakov
>
> Deployment Engineer
> Grid Dynamics
> +7 812 640 38 76
> Skype: ivan.tretyakov
> www.griddynamics.com
> itretyakov@griddynamics.com
>

Re: could only be replicated to 0 nodes instead of minReplication

Posted by Ivan Tretyakov <it...@griddynamics.com>.
I also found the following exception in the datanode log; I suppose it might give some
clue:

2013-01-10 11:37:55,397 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
node02.303net.pvt:50010:DataXceiver error processing READ_BLOCK operation
 src: /192.168.1.112:35991 dest: /192.168.1.112:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for
channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.1.112:50010 remote=/192.168.1.112:35991]
        at
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:247)
        at
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:166)
        at
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:214)
        at
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:492)
        at
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:655)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:280)
        at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:88)
        at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:63)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
        at java.lang.Thread.run(Thread.java:662)
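
For reference, the 480000 ms in that trace matches the default DataNode socket
write timeout (dfs.datanode.socket.write.timeout), i.e. the DataNode waited the
full 8 minutes for the local client at 192.168.1.112:35991 to read data before
giving up. A minimal hdfs-site.xml sketch of how that timeout could be raised is
below; the 960000 value is only illustrative, not something we actually run:

    <!-- hdfs-site.xml: illustrative override of the DataNode write timeout -->
    <property>
      <!-- How long a DataNode waits for the peer channel to become writable
           while streaming a block; 480000 ms (8 min) is the default that
           appears in the SocketTimeoutException above. -->
      <name>dfs.datanode.socket.write.timeout</name>
      <value>960000</value>
    </property>

A longer timeout only hides whatever kept the channel stalled, though, so it is
probably worth checking DataNode disk I/O, network saturation and GC pauses
around that timestamp first.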


On Thu, Jan 10, 2013 at 4:04 PM, Ivan Tretyakov <itretyakov@griddynamics.com
> wrote:

> Hello!
>
> On our cluster jobs fails with the following exception:
>
> 2013-01-10 10:34:05,648 WARN org.apache.hadoop.hdfs.DFSClient:
> DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /user/persona/usersAggregate_20130110_15/_temporary/_attempt_201212271414_0458_m_000001_1/s/375ee510bbf44815b151df556e06b5ca
> could only be replicated to 0 nodes instead of minReplication (=1).  There
> are 6 datanode(s) running and no node(s) are excluded in this operation.
>         at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2170)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
>         at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:1160)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>         at $Proxy10.addBlock(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>         at $Proxy10.addBlock(Unknown Source)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
>
> I've found that it could be cause by lack of free disk space, but as I
> could see there is everything well (see attached dfs report output).
> Also, I could see following exception in TaskTracker log
> https://issues.apache.org/jira/browse/MAPREDUCE-5 but I'm not sure if it
> is related.
>
> Could it be related with another issue on our cluster? -
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201301.mbox/%3CCAEAKFL90ReOWEvY_vuSMqU2GwMOAh0fndA9b-uodXZ6BYvz2Kg%40mail.gmail.com%3E
>
> Thanks in advance!
>
> --
> Best Regards
> Ivan Tretyakov
>



-- 
Best Regards
Ivan Tretyakov

Deployment Engineer
Grid Dynamics
+7 812 640 38 76
Skype: ivan.tretyakov
www.griddynamics.com
itretyakov@griddynamics.com
