Posted to user@hadoop.apache.org by Nishant Verma <ni...@gmail.com> on 2017/07/03 07:27:18 UTC

java.io.IOException on Namenode logs

Hello

I have Kafka Connect writing records to my HDFS cluster, which has 3
datanodes. Last night I observed data loss in the records committed to
HDFS. There was no issue on the Kafka Connect side. However, the Namenode
shows the error logs below:

java.io.IOException: File /topics/+tmp/testTopic/year=2017/month=07/day=03/hour=03/8237cfb7-2b3d-4d5c-ab04-924c0f647cd6_tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1571)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:725)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 3 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy


Before each occurrence of this error, we see the following line:
2017-07-02 23:33:43,255 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.1.2.3:4982 Call#274492 Retry#0

10.1.2.3 is one of the Kafka Connect nodes.


I checked the following (rough commands for these checks are sketched after
this list):

- There is no disk issue on the datanodes; each datanode has 110 GB of free
space.
- The dfsadmin report shows 3 live datanodes.
- dfs.datanode.du.reserved is at its default value, i.e. 0.
- dfs.replication is set to 3.
- dfs.datanode.handler.count is at its default value, i.e. 10.
- dfs.datanode.data.dir.perm is at its default value, i.e. 700. But a single
user is used everywhere, so it should not be a permission issue. Also, writes
worked correctly for the first 22 hours; the failures only started after that.
- I could not find any error around this timestamp in the datanode logs.
- The disk holding the dfs.data.dir path has 64% of its space available.
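
For reference, the checks above were done with commands along these lines (a
minimal sketch; hostnames and paths are placeholders, and it assumes a Hadoop
2.x client on the PATH):

    # live datanodes and per-node capacity/remaining
    hdfs dfsadmin -report

    # effective values of the settings listed above
    hdfs getconf -confKey dfs.replication
    hdfs getconf -confKey dfs.datanode.du.reserved
    hdfs getconf -confKey dfs.datanode.handler.count

    # free space on the local disk backing dfs.data.dir (run on each datanode)
    df -h /path/to/dfs/data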

What could be the cause of this error, and how can I fix it? Why does it say
the file could only be replicated to 0 nodes when it also says there are 3
datanodes available?

Thanks
Nishant

RE: java.io.IOException on Namenode logs

Posted by Brahma Reddy Battula <br...@huawei.com>.
Hi Nishant Verma

It would be great if you could mention which version of Hadoop you are using.

Apart from your findings (which I appreciate) and what daemeon mentioned, you can also check the following:


1)      Non-DFS used is high (you can check in the Namenode UI, the dfsadmin report, or JMX)

2)      Scheduled blocks are high (you can check via JMX; a sketch follows)
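
For example, both counters are exposed through the Namenode's JMX servlet.
A minimal sketch (the hostname is a placeholder, and the default 2.x Namenode
HTTP port 50070 is an assumption):

    # non-DFS used, from the NameNodeInfo bean
    curl 'http://namenode-host:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo'

    # scheduled replication blocks, from the FSNamesystemState bean
    curl 'http://namenode-host:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState'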

If at all possible, enable the DEBUG logs, which can give useful info.
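
The logger named in the WARN message can be raised to DEBUG at runtime,
without a restart. A sketch, assuming the default Namenode HTTP address:

    # raise the placement-policy logger to DEBUG on the running Namenode
    hadoop daemonlog -setlevel namenode-host:50070 \
        org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy DEBUG

    # or set it permanently in log4j.properties and restart:
    # log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy=DEBUG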


--Brahma Reddy Battula

From: daemeon reiydelle [mailto:daemeonr@gmail.com]
Sent: 04 July 2017 01:04
To: Nishant Verma
Cc: user
Subject: Re: java.io.IOException on Namenode logs

A possibility is that the node showing errors was not able to get a TCP connection, or hit heavy network congestion, or (possibly) heavy garbage collection timeouts. I would suspect the network.
...
There is no sin except stupidity - Oscar Wilde
...
Daemeon (Dæmœn) Reiydelle
USA 1.415.501.0198

On Jul 3, 2017 12:27 AM, "Nishant Verma" <ni...@gmail.com> wrote:

> [quoted message clipped; it repeats the original post above in full]


Re: java.io.IOException on Namenode logs

Posted by daemeon reiydelle <da...@gmail.com>.
A possibility is that the node showing errors was not able to get a TCP
connection, or hit heavy network congestion, or (possibly) heavy garbage
collection timeouts. I would suspect the network.
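
One cheap way to check the GC angle: Hadoop daemons log long pauses through
the JvmPauseMonitor, so grepping the datanode logs around the failure window
is a quick first test. A sketch (log paths and hostnames are placeholders,
and 50010 is the default datanode data-transfer port):

    # long GC or host pauses show up as JvmPauseMonitor warnings
    grep "Detected pause in JVM or host machine" /var/log/hadoop/*datanode*.log

    # basic reachability test from the client to a datanode's data port
    nc -zv datanode-host 50010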

...
There is no sin except stupidity - Oscar Wilde
...
Daemeon (Dæmœn) Reiydelle
USA 1.415.501.0198

On Jul 3, 2017 12:27 AM, "Nishant Verma" <ni...@gmail.com> wrote:

> [quoted message clipped; it repeats the original post above in full]