Posted to user@hadoop.apache.org by omprakash <om...@cdac.in> on 2017/06/21 06:50:28 UTC

Lots of warning messages and exception in namenode logs

Hi,

 

I am receiving lots of warning messages in the namenode logs on the ACTIVE NN in
my HA Hadoop setup. Below are the logs:

 

"2017-06-21 12:11:26,523 WARN
org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough
replicas: expected size is 1 but only 0 storage types can be selected
(replication=2, selected=[], unavailable=[DISK], removed=[DISK],
policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[],
replicationFallbacks=[ARCHIVE]})

2017-06-21 12:11:26,523 WARN
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed
to place enough replicas, still in need of 1 to reach 2
(unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7,
storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
newBlock=true) All required storage types are unavailable:
unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7,
storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}

2017-06-21 12:11:26,523 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocate blk_1073894332_153508, replicas=192.168.9.174:50010 for
/36962._COPYING_

2017-06-21 12:11:26,810 INFO org.apache.hadoop.hdfs.StateChange: DIR*
completeFile: /36962._COPYING_ is closed by
DFSClient_NONMAPREDUCE_146762699_1

2017-06-21 12:11:30,626 WARN
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed
to place enough replicas, still in need of 1 to reach 2
(unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7,
storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
newBlock=true) For more information, please enable DEBUG log level on
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and
org.apache.hadoop.net.NetworkTopology

2017-06-21 12:11:30,626 WARN
org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough
replicas: expected size is 1 but only 0 storage types can be selected
(replication=2, selected=[], unavailable=[DISK], removed=[DISK],
policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[],
replicationFallbacks=[ARCHIVE]})"

 

I am also encountering exceptions related to the LeaseManager on the active namenode:

 

2017-06-21 12:13:16,706 INFO
org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder:
DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1] has expired
hard limit

2017-06-21 12:13:16,706 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1],
src=/user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79

2017-06-21 12:13:16,706 WARN org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.internalReleaseLease: Failed to release lease for file
/user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79. Committed
blocks are waiting to be minimally replicated. Try again later.

2017-06-21 12:13:16,706 ERROR
org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path
/user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79 in the lease
[Lease.  Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates:
1]

org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR*
NameSystem.internalReleaseLease: Failed to release lease for file
/user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79. Committed
blocks are waiting to be minimally replicated. Try again later.

        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3200)

        at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:383)

        at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:329)

        at java.lang.Thread.run(Thread.java:745)

 

I have checked the two datanodes. Both are running and have enough space for
new data. 

 

PS: I have 2 namenodes and 2 datanodes in a Hadoop HA setup. HA is set up
using the Quorum Journal Manager and ZooKeeper.

 

Any idea what is causing these errors?

 

Regards

Omprakash Paliwal

HPC-Medical and Bioinformatics Applications Group

Centre for Development of Advanced Computing (C-DAC)

Pune University campus,

PUNE-411007

Maharashtra, India

email:omprakashp@cdac.in

Contact : +91-20-25704231

 




Re: Lots of warning messages and exception in namenode logs

Posted by Philippe Kernévez <pk...@octo.com>.
Hi all,

>After setting *dfs.replication=2 *, I did a clean start of hdfs.
This should not change anything: the dfs.replication value is only applied to
new files; existing files keep the replication factor they were created with.
https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
If you want to change the replication factor of existing files, you have to
use the setrep command:
https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html#setrep
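
For example, something like this (a minimal sketch; the -w flag, where your
version supports it, waits until the target replication is actually reached):

hdfs dfs -setrep -w 2 /path/to/existing/file
hdfs dfs -setrep -w 2 /

Running setrep on a directory (here /) recursively changes the replication
factor of every file under it.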

I think the real change was adding 2 more datanodes, which spread the write
workload across more machines.

Regards,
Philippe




-- 
Philippe Kernévez



Technical Director (Switzerland),
pkernevez@octo.com
+41 79 888 33 32

Retrouvez OCTO sur OCTO Talk : http://blog.octo.com
OCTO Technology http://www.octo.ch

Re: Lots of warning messages and exception in namenode logs

Posted by Ravi Prakash <ra...@gmail.com>.
Hi Omprakash!

If both datanodes die at the same time, then yes, data will be lost. In
that case, you should increase dfs.replication to 3 (so that there will be
3 copies). This obviously adversely affects the total amount of data you
can store on HDFS.

However if only 1 datanode dies, the namenode notices that, and orders the
remaining replica to be replicated. The rate at which it orders
re-replication is determined by
dfs.namenode.replication.work.multiplier.per.iteration
and the number of nodes in your cluster. The more nodes you have in your
cluster (some companies run 1000s of nodes in 1 cluster), the faster the
lost replicas will be replicated. Let's say there were 2 million blocks on
each datanode, and you configured only 2 blocks to be re-replicated per
datanode heartbeat (usually 3 seconds). If there were 2 other datanodes, it
would take 2000000 / 2 * 3 seconds to re-replicate the data. Of course you can't
crank up the number of blocks re-replicated too high, because there's only
so much data that datanodes can transfer amongst themselves. You should
calculate how many blocks you have, how much bandwidth is available between
any two datanodes, how quickly you want replication (if your disks are only
re-replicating, jobs may not make progress), and set that configuration
accordingly. Depending on your datanode capacity it may take 1-2 days to
re-replicate all the data.
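
If you do decide to tune it, the property goes in hdfs-site.xml on the
namenode. A minimal sketch (the value 10 is purely illustrative; derive your
own from the kind of calculation above):

<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <value>10</value>
</property>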

Also, I'd encourage you to read through more of the documentation
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
and become familiar with the system. There can be a *huge* difference
between a well-tuned Hadoop cluster and a poorly configured one.

HTH
Ravi


On Thu, Jun 29, 2017 at 4:50 AM, omprakash <om...@cdac.in> wrote:

> Hi Sidharth,
>
>
>
> Thanks a lot for the clarification. Could you suggest parameters that can
> improve re-replication in case of a failure?
>
>
>
> Regards
>
> Om
>
>
>
> *From:* Sidharth Kumar [mailto:sidharthkumar2707@gmail.com]
> *Sent:* 29 June 2017 16:06
> *To:* omprakash <om...@cdac.in>
> *Cc:* Arpit Agarwal <aa...@hortonworks.com>;
> common-user@hadoop.apache.org <us...@hadoop.apache.org>; Ravi Prakash <
> ravihadoop@gmail.com>
>
> *Subject:* RE: Lots of warning messages and exception in namenode logs
>
>
>
> Hi,
>
>
>
> No, as there will be no surviving copy of that file. You can increase the
> replication factor to 3 so that 3 copies are created; even if 2 datanodes
> go down you will still have one copy available, which the namenode will
> replicate back up to 3 in due course.
>
>
> Warm Regards
>
> Sidharth Kumar | Mob: +91 8197 555 599 / 7892 192 367 | LinkedIn:
> www.linkedin.com/in/sidharthkumar2792
>
>
>
>
>
>
>
>
> On 29-Jun-2017 3:45 PM, "omprakash" <om...@cdac.in> wrote:
>
> Hi Ravi,
>
>
>
> I have 5 nodes in the Hadoop cluster and all have the same configuration.
> After setting *dfs.replication=2*, I did a clean start of HDFS.
>
>
>
> As per your suggestion, I added 2 more datanodes and cleaned all the data
> and metadata. The performance of the cluster has improved dramatically. I
> can see from the logs that files are now distributed across the four
> datanodes (2 replicas of each file).
>
>
>
> But here my problem arises. I want enough redundancy that if any two of the
> datanodes go down, I can still get the files from the other two. In the
> above case, suppose block-xyz is stored on datanode1 and datanode2, and one
> day both of those datanodes go down; will I still be able to access
> block-xyz? This is what I am worried about.
>
>
>
>
>
> Regards
>
> Om
>
>
>
>
>
> *From:* Ravi Prakash [mailto:ravihadoop@gmail.com]
> *Sent:* 27 June 2017 22:36
> *To:* omprakash <om...@cdac.in>
> *Cc:* Arpit Agarwal <aa...@hortonworks.com>; user <
> user@hadoop.apache.org>
> *Subject:* Re: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Omprakash!
>
> This is *not* OK. Please go through the logs of the inactive datanode and
> figure out why it is inactive. If you set dfs.replication to 2, at least as
> many datanodes (and ideally a LOT more) should be active and participating
> in the cluster.
>
> Do you have the hdfs-site.xml you posted to the mailing list on all the
> nodes (including the Namenode)? Was the file containing block
> *blk_1074074104_337394* created when you had the cluster misconfigured to
> dfs.replication=3 ? You can determine which file the block belongs to using
> this command:
>
> hdfs fsck -blockId blk_1074074104
>
> Once you have the file, you can set its replication using
> hdfs dfs -setrep 2 <Filename>
>
> I'm guessing that you probably have a lot of files with this replication,
> in which case you should set it on / (This would overwrite the replication
> on all the files)
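>
> For example, something like this (a minimal sketch; -w waits for the
> replication change to complete):
>
> hdfs dfs -setrep -w 2 /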
>
>
>
> If the data on this cluster is important, I would be very worried about the
> condition it is in.
>
> HTH
>
> Ravi
>
>
>
> On Mon, Jun 26, 2017 at 11:22 PM, omprakash <om...@cdac.in> wrote:
>
> Hi all,
>
>
>
> I started HDFS in DEBUG mode. After examining the logs I found the lines
> below, which say that the required replication factor is 3 (as against the
> configured *dfs.replication=2*).
>
>
>
> *DEBUG BlockStateChange: BLOCK* NameSystem.UnderReplicationBlock.add:
> blk_1074074104_337394 has only 1 replicas and need 3 replicas so is added
> to neededReplications at priority level 0*
>
>
>
> *P.S : I have 1 datanode active out of 2. *
>
>
>
> I can also see from the Namenode UI that the number of under-replicated
> blocks is growing.
>
>
>
> Any idea? Or is this OK?
>
>
>
> regards
>
>
>
>
>
> *From:* omprakash [mailto:omprakashp@cdac.in]
> *Sent:* 23 June 2017 11:02
> *To:* 'Ravi Prakash' <ra...@gmail.com>; 'Arpit Agarwal' <
> aagarwal@hortonworks.com>
> *Cc:* 'user' <us...@hadoop.apache.org>
> *Subject:* RE: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Arpit,
>
>
>
> I will enable the settings as suggested and will post the results.
>
>
>
> I am just curious about setting the *Namenode RPC service port*. As I have
> checked the *hdfs-site.xml* properties, *dfs.namenode.rpc-address* is
> already set, and the service RPC port defaults to that address. Does
> specifying a separate port have an advantage over the default?
>
>
>
> Regarding the JvmPauseMonitor error, there are 5-6 instances of it in the namenode logs. Here is one of them.
>
>
>
> How do I identify the right heap size in such cases, given that I have 4 GB
> of RAM on the namenode VM?
>
>
>
> *@Ravi* Since the files are very small, I have only configured a VM with
> 20 GB of space. The additional disk is a simple SATA disk, not an SSD.
>
>
>
> As I can see from the Namenode UI, more than 50% of the blocks are
> under-replicated. I now have 400K blocks, of which 200K are under-replicated.
>
> I will post the results again after changing the value of
> *dfs.namenode.replication.work.multiplier.per.iteration*.
>
>
>
>
>
> Thanks
>
> Om Prakash
>
>
>
> *From:* Ravi Prakash [mailto:ravihadoop@gmail.com <ra...@gmail.com>]
> *Sent:* 22 June 2017 23:04
> *To:* Arpit Agarwal <aa...@hortonworks.com>
> *Cc:* omprakash <om...@cdac.in>; user <us...@hadoop.apache.org>
>
>
> *Subject:* Re: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Omprakash!
>
> How big are your disks? Just 20Gb? Just out of curiosity, are these SSDs?
>
> In addition to Arpit's reply, I'm also concerned with the number of
> under-replicated blocks you have: Under replicated blocks: 141863
>
> When there are fewer replicas for a block than there are supposed to be
> (in your case e.g. when there's 1 replica when there ought to be 2), the
> namenode will order the datanodes to create more replicas. The rate at
> which it does this is controlled by
> dfs.namenode.replication.work.multiplier.per.iteration . Given you have
> only 2 datanodes, you'll only be re-replicating 4 blocks every 3 seconds.
> So, it will take quite a while to re-replicate all the blocks.
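>
> Back-of-envelope with your numbers: 141863 under-replicated blocks at 4
> blocks per 3-second heartbeat interval is about 141863 / 4 * 3 ≈ 106,000
> seconds, i.e. a little over a day.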
>
> Also, please know that you want files to be much bigger than 1 KB. Ideally
> you'd have a couple of blocks (block size = 128 MB) per file. When your
> records are this small, you should append them to existing files instead of
> creating a new file for each one.
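>
> For example, instead of writing one new HDFS file per 1 KB record, a sketch
> using the stock shell append command (file names are illustrative):
>
> hdfs dfs -appendToFile new-record.txt /user/hadoop/records.txt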
>
> Please do let us know how things turn out.
>
> Cheers,
>
> Ravi
>
>
>
> On Wed, Jun 21, 2017 at 11:23 PM, Arpit Agarwal <aa...@hortonworks.com>
> wrote:
>
> Hi Omprakash,
>
>
>
> Your description suggests DataNodes cannot send timely reports to the
> NameNode. You can check it by looking for ‘stale’ DataNodes in the NN web
> UI when this situation is occurring. A few ideas:
>
>
>
>    - Try increasing the NameNode RPC handler count a bit (set
>    dfs.namenode.handler.count to 20 in hdfs-site.xml).
>    - Enable the NameNode service RPC port. This requires downtime and
>    reformatting the ZKFC znode.
>    - Search for JvmPauseMonitor messages in your service logs. If you see
>    any, try increasing JVM heap for that service.
>    - Enable debug logging as suggested here:
>
>
>
> *2017-06-21 12:11:30,626 WARN
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed
> to place enough replicas, still in need of 1 to reach 2
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
> newBlock=true) For more information, please enable DEBUG log level on
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and
> org.apache.hadoop.net.NetworkTopology*
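>
> Concretely, the handler-count and debug-logging suggestions might look like
> this (a minimal sketch; file locations depend on your installation):
>
> In hdfs-site.xml:
>
> <property>
>   <name>dfs.namenode.handler.count</name>
>   <value>20</value>
> </property>
>
> In log4j.properties:
>
> log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy=DEBUG
> log4j.logger.org.apache.hadoop.net.NetworkTopology=DEBUG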
>
>
>
>
>
> *From: *omprakash <om...@cdac.in>
> *Date: *Wednesday, June 21, 2017 at 9:23 PM
> *To: *'Ravi Prakash' <ra...@gmail.com>
> *Cc: *'user' <us...@hadoop.apache.org>
> *Subject: *RE: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Ravi,
>
>
>
> Pasting below my core-site and hdfs-site configurations. I have kept a bare
> minimal configuration for my cluster. The cluster started fine and I was
> able to put a couple of hundred thousand files on HDFS, but when I checked
> the logs there were errors/exceptions. After a restart of the datanodes
> they work well for a few thousand files, but then the same problem appears
> again. No idea what is wrong.
>
>
>
> *PS: I am pumping 1 file per second to HDFS, each approx. 1 KB in size*
>
>
>
> I thought it might be due to a space quota on the datanodes, but here is
> the output of *hdfs dfsadmin -report*. It looks fine to me:
>
>
>
> $ hdfs dfsadmin -report
>
>
>
> Configured Capacity: 42005069824 (39.12 GB)
>
> Present Capacity: 38085839568 (35.47 GB)
>
> DFS Remaining: 34949058560 (32.55 GB)
>
> DFS Used: 3136781008 (2.92 GB)
>
> DFS Used%: 8.24%
>
> Under replicated blocks: 141863
>
> Blocks with corrupt replicas: 0
>
> Missing blocks: 0
>
> Missing blocks (with replication factor 1): 0
>
> Pending deletion blocks: 0
>
>
>
> -------------------------------------------------
>
> Live datanodes (2):
>
>
>
> Name: 192.168.9.174:50010 (node5)
>
> Hostname: node5
>
> Decommission Status : Normal
>
> Configured Capacity: 21002534912 (19.56 GB)
>
> DFS Used: 1764211024 (1.64 GB)
>
> Non DFS Used: 811509424 (773.92 MB)
>
> DFS Remaining: 17067913216 (15.90 GB)
>
> DFS Used%: 8.40%
>
> DFS Remaining%: 81.27%
>
> Configured Cache Capacity: 0 (0 B)
>
> Cache Used: 0 (0 B)
>
> Cache Remaining: 0 (0 B)
>
> Cache Used%: 100.00%
>
> Cache Remaining%: 0.00%
>
> Xceivers: 2
>
> Last contact: Wed Jun 21 14:38:17 IST 2017
>
>
>
>
>
> Name: 192.168.9.225:50010 (node4)
>
> Hostname: node5
>
> Decommission Status : Normal
>
> Configured Capacity: 21002534912 (19.56 GB)
>
> DFS Used: 1372569984 (1.28 GB)
>
> Non DFS Used: 658353792 (627.86 MB)
>
> DFS Remaining: 17881145344 (16.65 GB)
>
> DFS Used%: 6.54%
>
> DFS Remaining%: 85.14%
>
> Configured Cache Capacity: 0 (0 B)
>
> Cache Used: 0 (0 B)
>
> Cache Remaining: 0 (0 B)
>
> Cache Used%: 100.00%
>
> Cache Remaining%: 0.00%
>
> Xceivers: 1
>
> Last contact: Wed Jun 21 14:38:19 IST 2017
>
>
>
> *core-site.xml*
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>
> <property>
>
>   <name>fs.defaultFS</name>
>
>   <value>hdfs://hdfsCluster</value>
>
> </property>
>
> <property>
>
>   <name>dfs.journalnode.edits.dir</name>
>
>   <value>/mnt/hadoopData/hadoop/journal/node/local/data</value>
>
> </property>
>
> </configuration>
>
>
>
> *hdfs-site.xml*
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>
> *<property>*
>
> *<name>dfs.replication</name>*
>
> *<value>2</value>*
>
> *</property>*
>
> <property>
>
>   <name>dfs.name.dir</name>
>
>     <value>file:///mnt/hadoopData/hadoop/hdfs/namenode</value>
>
> </property>
>
> <property>
>
>   <name>dfs.data.dir</name>
>
>     <value>file:///mnt/hadoopData/hadoop/hdfs/datanode</value>
>
> </property>
>
> <property>
>
> <name>dfs.nameservices</name>
>
> <value>hdfsCluster</value>
>
> </property>
>
> <property>
>
>   <name>dfs.ha.namenodes.hdfsCluster</name>
>
>   <value>nn1,nn2</value>
>
> </property>
>
>
>
> <property>
>
>   <name>dfs.namenode.rpc-address.hdfsCluster.nn1</name>
>
>   <value>node1:8020</value>
>
> </property>
>
> <property>
>
>   <name>dfs.namenode.rpc-address.hdfsCluster.nn2</name>
>
>   <value>node22:8020</value>
>
> </property>
>
>
>
> <property>
>
>   <name>dfs.namenode.http-address.hdfsCluster.nn1</name>
>
>   <value>node1:50070</value>
>
> </property>
>
> <property>
>
>   <name>dfs.namenode.http-address.hdfsCluster.nn2</name>
>
>   <value>node2:50070</value>
>
> </property>
>
>
>
> <property>
>
>   <name>dfs.namenode.shared.edits.dir</name>
>
>   <value>qjournal://node1:8485;node2:8485;node3:8485;node4:8485;node5:8485/hdfsCluster</value>
>
> </property>
>
> <property>
>
>   <name>dfs.client.failover.proxy.provider.hdfsCluster</name>
>
>   <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>
> </property>
>
> <property>
>
>    <name>ha.zookeeper.quorum</name>
>
>    <value>node1:2181,node2:2181,node3:2181,node4:2181,node5:2181</value>
>
> </property>
>
> <property>
>
> <name>dfs.ha.fencing.methods</name>
>
> <value>sshfence</value>
>
> </property>
>
> <property>
>
> <name>dfs.ha.fencing.ssh.private-key-files</name>
>
> <value>/home/hadoop/.ssh/id_rsa</value>
>
> </property>
>
> <property>
>
>    <name>dfs.ha.automatic-failover.enabled</name>
>
>    <value>true</value>
>
> </property>
>
> </configuration>
>
>
>
>
>
> *From:* Ravi Prakash [mailto:ravihadoop@gmail.com]
> *Sent:* 22 June 2017 02:38
> *To:* omprakash <om...@cdac.in>
> *Cc:* user <us...@hadoop.apache.org>
> *Subject:* Re: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Omprakash!
>
> What is your default replication set to? What kind of disks do your
> datanodes have? Were you able to start a cluster with a simple
> configuration before you started tuning it?
>
> HDFS tries to create the default number of replicas for a block on
> different datanodes. The Namenode tries to give a list of datanodes that
> the client can write replicas of the block to. If the Namenode is not able
> to construct a list with an adequate number of datanodes, it logs the
> message you are seeing. This may mean that datanodes are unhealthy (failed
> disks), full (disks have no more space), being decommissioned (HDFS will
> not write replicas on decommissioning datanodes), or misconfigured (I'd
> suggest turning on storage policies only after a simple configuration works).
>
> When a client that was trying to write a file was killed (e.g. if you
> killed your MR job), after some time (hard limit expiring) the Namenode
> will try to recover the file. In your case the namenode is also not able to
> find enough datanodes for recovering the files.
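>
> (Aside: if a file stays stuck under a dead client's lease, newer releases
> offer a manual nudge, assuming your version ships the 'hdfs debug'
> subcommand:
>
> hdfs debug recoverLease -path /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79 -retries 3)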
>
>
>
> HTH
>
> Ravi
>
>
>
>
>

Re: Lots of warning messages and exception in namenode logs

Posted by Atul Rajan <at...@gmail.com>.
unsubscribe

On 29 June 2017 at 17:20, omprakash <om...@cdac.in> wrote:

> Hi Sidharth,
>
>
>
> Thanks a lot for the clarification. Could you suggest parameters that can
> improve re-replication in case of a failure?
>
>
>
> Regards
>
> Om
>
>
>
> *From:* Sidharth Kumar [mailto:sidharthkumar2707@gmail.com]
> *Sent:* 29 June 2017 16:06
> *To:* omprakash <om...@cdac.in>
> *Cc:* Arpit Agarwal <aa...@hortonworks.com>;
> common-user@hadoop.apache.org <us...@hadoop.apache.org>; Ravi Prakash <
> ravihadoop@gmail.com>
> *Subject:* RE: Lots of warning messages and exception in namenode logs
>
>
>
> Hi,
>
>
>
> No, as no copy of that file will exist. You can increase the replication
> factor to 3 so that three copies are created; even if 2 datanodes go down,
> you will still have one copy available, which the namenode will replicate
> back to 3 in due course.
>
>
> Warm Regards
>
> Sidharth Kumar | Mob: +91 8197 555 599/7892 192 367 |  LinkedIn:
> www.linkedin.com/in/sidharthkumar2792
>
>
>
>
>
>
>
>
> On 29-Jun-2017 3:45 PM, "omprakash" <om...@cdac.in> wrote:
>
> Hi Ravi,
>
>
>
> I have 5 nodes in the Hadoop cluster and all have the same configuration.
> After setting *dfs.replication=2*, I did a clean start of hdfs.
>
>
>
> As per your suggestion, I added 2 more datanodes and cleaned all the data
> and metadata. The performance of the cluster has improved dramatically. I
> can see through the logs that the files are randomly replicated across four
> datanodes (2 replicas of each file).
>
>
>
> But here my problem arises. I want redundant datanodes such that if any two
> of the datanodes go down, I can still get files from the other two. In the
> above case, suppose block-xyz is stored on datanode1 and datanode2, and
> some day these two datanodes go down; will I still be able to access
> block-xyz? This is what I am worried about.
>
>
>
>
>
> Regards
>
> Om
>
>
>
>
>
> *From:* Ravi Prakash [mailto:ravihadoop@gmail.com]
> *Sent:* 27 June 2017 22:36
> *To:* omprakash <om...@cdac.in>
> *Cc:* Arpit Agarwal <aa...@hortonworks.com>; user <
> user@hadoop.apache.org>
> *Subject:* Re: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Omprakash!
>
> This is *not* ok. Please go through the datanode logs of the inactive
> datanode and figure out why it's inactive. If you set dfs.replication to 2,
> at least as many datanodes (and ideally a LOT more datanodes) should be
> active and participating in the cluster.
>
> Do you have the hdfs-site.xml you posted to the mailing list on all the
> nodes (including the Namenode)? Was the file containing block
> *blk_1074074104_337394* created when you had the cluster misconfigured to
> dfs.replication=3 ? You can determine which file the block belongs to using
> this command:
>
> hdfs fsck -blockId blk_1074074104
>
> Once you have the file, you can set its replication using
> hdfs dfs -setrep 2 <Filename>
>
> I'm guessing that you probably have a lot of files with this replication,
> in which case you should set it on / (This would overwrite the replication
> on all the files)
>
>
>
> If the data on this cluster is important I would be very worried about the
> condition it's in.
>
> HTH
>
> Ravi
>
>
>
> On Mon, Jun 26, 2017 at 11:22 PM, omprakash <om...@cdac.in> wrote:
>
> Hi all,
>
>
>
> I started HDFS in DEBUG mode. After examining the logs I found the entry
> below, which reads that the required replication factor is 3 (as against
> the specified *dfs.replication=2*).
>
>
>
> *DEBUG BlockStateChange: BLOCK* NameSystem.UnderReplicationBlock.add:
> blk_1074074104_337394 has only 1 replicas and need 3 replicas so is added
> to neededReplications at priority level 0*
>
>
>
> *P.S : I have 1 datanode active out of 2. *
>
>
>
> I can also see from Namenode UI that the no. of under replicated blocks
> are growing.
>
>
>
> Any idea? Or is this OK?
>
>
>
> regards
>
>
>
>
>
> *From:* omprakash [mailto:omprakashp@cdac.in]
> *Sent:* 23 June 2017 11:02
> *To:* 'Ravi Prakash' <ra...@gmail.com>; 'Arpit Agarwal' <
> aagarwal@hortonworks.com>
> *Cc:* 'user' <us...@hadoop.apache.org>
> *Subject:* RE: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Arpit,
>
>
>
> I will enable the settings as suggested and will post the results.
>
>
>
> I am just curious about setting the *Namenode RPC service port*. As I have
> checked the *hdfs-site.xml* properties, *dfs.namenode.rpc-address* is
> already set, which also serves as the default for the RPC service port.
> Does specifying a separate port have any advantage over the default?
>
>
>
> Regarding JvmPauseMonitor Error, there are 5-6 instances of this error in namenode logs. Here is one of them.
>
>
>
> How do I identify the right heap size in such cases, given that I have
> 4 GB of RAM on the namenode VM?
>
>
>
> *@Ravi* Since the files are very small, I have only configured a VM with
> 20 GB of space. The additional disk is a simple SATA disk, not an SSD.
>
>
>
> As I can see from the Namenode UI, more than 50% of the blocks are under
> replicated. I now have 400K blocks, of which 200K are under-replicated.
>
> I will post the results again after changing the value of
> *dfs.namenode.replication.work.multiplier.per.iteration*
>
>
>
>
>
> Thanks
>
> Om Prakash
>
>
>
> *From:* Ravi Prakash [mailto:ravihadoop@gmail.com <ra...@gmail.com>]
> *Sent:* 22 June 2017 23:04
> *To:* Arpit Agarwal <aa...@hortonworks.com>
> *Cc:* omprakash <om...@cdac.in>; user <us...@hadoop.apache.org>
>
>
> *Subject:* Re: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Omprakash!
>
> How big are your disks? Just 20Gb? Just out of curiosity, are these SSDs?
>
> In addition to Arpit's reply, I'm also concerned with the number of
> under-replicated blocks you have: Under replicated blocks: 141863
>
> When there are fewer replicas for a block than there are supposed to be
> (in your case e.g. when there's 1 replica when there ought to be 2), the
> namenode will order the datanodes to create more replicas. The rate at
> which it does this is controlled by
> dfs.namenode.replication.work.multiplier.per.iteration . Given you have
> only 2 datanodes, you'll only be re-replicating 4 blocks every 3 seconds.
> So, it will take quite a while to re-replicate all the blocks.
>
> Also, please know that you want files to be much bigger than 1kb. Ideally
> you'd have a couple of blocks (blocks=128Mb) for each file. You should
> append to files when they are this small.
>
> Please do let us know how things turn out.
>
> Cheers,
>
> Ravi
>
>
>
> On Wed, Jun 21, 2017 at 11:23 PM, Arpit Agarwal <aa...@hortonworks.com>
> wrote:
>
> Hi Omprakash,
>
>
>
> Your description suggests DataNodes cannot send timely reports to the
> NameNode. You can check it by looking for ‘stale’ DataNodes in the NN web
> UI when this situation is occurring. A few ideas:
>
>
>
>    - Try increasing the NameNode RPC handler count a bit (set
>    dfs.namenode.handler.count to 20 in hdfs-site.xml).
>    - Enable the NameNode service RPC port. This requires downtime and
>    reformatting the ZKFC znode.
>    - Search for JvmPauseMonitor messages in your service logs. If you see
>    any, try increasing JVM heap for that service.
>    - Enable debug logging as suggested here:
>
>
>
> *2017-06-21 12:11:30,626 WARN
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed
> to place enough replicas, still in need of 1 to reach 2
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
> newBlock=true) For more information, please enable DEBUG log level on
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and
> org.apache.hadoop.net.NetworkTopology*
>
>
>
>
>
> *From: *omprakash <om...@cdac.in>
> *Date: *Wednesday, June 21, 2017 at 9:23 PM
> *To: *'Ravi Prakash' <ra...@gmail.com>
> *Cc: *'user' <us...@hadoop.apache.org>
> *Subject: *RE: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Ravi,
>
>
>
> Pasting below my core-site and hdfs-site configurations. I have kept bare
> minimal configurations for my cluster. The cluster started fine and I was
> able to put a couple of hundred thousand files on hdfs, but when I checked
> the logs there were errors/exceptions. After a restart of the datanodes
> they work well for a few thousand files, but then the same problem recurs.
> No idea what is wrong.
>
>
>
> *PS: I am pumping 1 file per second to hdfs, each approx. 1 KB in size*
>
>
>
> I thought it may be due to space quota on datanodes but here is the output
> of *hdfs dfsadmin -report*. Looks fine to me
>
>
>
> $ hdfs dfsadmin -report
>
>
>
> Configured Capacity: 42005069824 (39.12 GB)
>
> Present Capacity: 38085839568 (35.47 GB)
>
> DFS Remaining: 34949058560 (32.55 GB)
>
> DFS Used: 3136781008 (2.92 GB)
>
> DFS Used%: 8.24%
>
> Under replicated blocks: 141863
>
> Blocks with corrupt replicas: 0
>
> Missing blocks: 0
>
> Missing blocks (with replication factor 1): 0
>
> Pending deletion blocks: 0
>
>
>
> -------------------------------------------------
>
> Live datanodes (2):
>
>
>
> Name: 192.168.9.174:50010 (node5)
>
> Hostname: node5
>
> Decommission Status : Normal
>
> Configured Capacity: 21002534912 (19.56 GB)
>
> DFS Used: 1764211024 (1.64 GB)
>
> Non DFS Used: 811509424 (773.92 MB)
>
> DFS Remaining: 17067913216 (15.90 GB)
>
> DFS Used%: 8.40%
>
> DFS Remaining%: 81.27%
>
> Configured Cache Capacity: 0 (0 B)
>
> Cache Used: 0 (0 B)
>
> Cache Remaining: 0 (0 B)
>
> Cache Used%: 100.00%
>
> Cache Remaining%: 0.00%
>
> Xceivers: 2
>
> Last contact: Wed Jun 21 14:38:17 IST 2017
>
>
>
>
>
> Name: 192.168.9.225:50010 (node4)
>
> Hostname: node5
>
> Decommission Status : Normal
>
> Configured Capacity: 21002534912 (19.56 GB)
>
> DFS Used: 1372569984 (1.28 GB)
>
> Non DFS Used: 658353792 (627.86 MB)
>
> DFS Remaining: 17881145344 (16.65 GB)
>
> DFS Used%: 6.54%
>
> DFS Remaining%: 85.14%
>
> Configured Cache Capacity: 0 (0 B)
>
> Cache Used: 0 (0 B)
>
> Cache Remaining: 0 (0 B)
>
> Cache Used%: 100.00%
>
> Cache Remaining%: 0.00%
>
> Xceivers: 1
>
> Last contact: Wed Jun 21 14:38:19 IST 2017
>
>
>
> *core-site.xml*
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>
> <property>
>
>   <name>fs.defaultFS</name>
>
>   <value>hdfs://hdfsCluster</value>
>
> </property>
>
> <property>
>
>   <name>dfs.journalnode.edits.dir</name>
>
>   <value>/mnt/hadoopData/hadoop/journal/node/local/data</value>
>
> </property>
>
> </configuration>
>
>
>
> *hdfs-site.xml*
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>
> *<property>*
>
> *<name>dfs.replication</name>*
>
> *<value>2</value>*
>
> *</property>*
>
> <property>
>
>   <name>dfs.name.dir</name>
>
>     <value>file:///mnt/hadoopData/hadoop/hdfs/namenode</value>
>
> </property>
>
> <property>
>
>   <name>dfs.data.dir</name>
>
>     <value>file:///mnt/hadoopData/hadoop/hdfs/datanode</value>
>
> </property>
>
> <property>
>
> <name>dfs.nameservices</name>
>
> <value>hdfsCluster</value>
>
> </property>
>
> <property>
>
>   <name>dfs.ha.namenodes.hdfsCluster</name>
>
>   <value>nn1,nn2</value>
>
> </property>
>
>
>
> <property>
>
>   <name>dfs.namenode.rpc-address.hdfsCluster.nn1</name>
>
>   <value>node1:8020</value>
>
> </property>
>
> <property>
>
>   <name>dfs.namenode.rpc-address.hdfsCluster.nn2</name>
>
>   <value>node22:8020</value>
>
> </property>
>
>
>
> <property>
>
>   <name>dfs.namenode.http-address.hdfsCluster.nn1</name>
>
>   <value>node1:50070</value>
>
> </property>
>
> <property>
>
>   <name>dfs.namenode.http-address.hdfsCluster.nn2</name>
>
>   <value>node2:50070</value>
>
> </property>
>
>
>
> <property>
>
>   <name>dfs.namenode.shared.edits.dir</name>
>
>   <value>qjournal://node1:8485;node2:8485;node3:8485;node4:8485;node5:8485/hdfsCluster</value>
>
> </property>
>
> <property>
>
>   <name>dfs.client.failover.proxy.provider.hdfsCluster</name>
>
>   <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>
> </property>
>
> <property>
>
>    <name>ha.zookeeper.quorum</name>
>
>    <value>node1:2181,node2:2181,node3:2181,node4:2181,node5:2181</value>
>
> </property>
>
> <property>
>
> <name>dfs.ha.fencing.methods</name>
>
> <value>sshfence</value>
>
> </property>
>
> <property>
>
> <name>dfs.ha.fencing.ssh.private-key-files</name>
>
> <value>/home/hadoop/.ssh/id_rsa</value>
>
> </property>
>
> <property>
>
>    <name>dfs.ha.automatic-failover.enabled</name>
>
>    <value>true</value>
>
> </property>
>
> </configuration>
>
>
>
>
>
> *From:* Ravi Prakash [mailto:ravihadoop@gmail.com]
> *Sent:* 22 June 2017 02:38
> *To:* omprakash <om...@cdac.in>
> *Cc:* user <us...@hadoop.apache.org>
> *Subject:* Re: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Omprakash!
>
> What is your default replication set to? What kind of disks do your
> datanodes have? Were you able to start a cluster with a simple
> configuration before you started tuning it?
>
> HDFS tries to create the default number of replicas for a block on
> different datanodes. The Namenode tries to give a list of datanodes that
> the client can write replicas of the block to. If the Namenode is not able
> to construct a list with adequate number of datanodes, you will see the
> message you are seeing. This may mean that datanodes are unhealthy (failed
> disks), or full (disks have no more space), being decommissioned (HDFS will
> not write replicas on decommissioning datanodes) or misconfigured (I'd
> suggest turning on storage classes only after a simple configuration works).
>
> When a client that was trying to write a file was killed (e.g. if you
> killed your MR job), after some time (hard limit expiring) the Namenode
> will try to recover the file. In your case the namenode is also not able to
> find enough datanodes for recovering the files.
>
>
>
> HTH
>
> Ravi
>
>
>
>
>
> On Tue, Jun 20, 2017 at 11:50 PM, omprakash <om...@cdac.in> wrote:
>
> Hi,
>
>
>
> I am receiving lots of  *warning messages in namenodes* logs on ACTIVE NN
> in my *HA Hadoop setup*. Below are the logs
>
>
>
> *“2017-06-21 12:11:26,523 WARN
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough
> replicas: expected size is 1 but only 0 storage types can be selected
> (replication=2, selected=[], unavailable=[DISK], removed=[DISK],
> policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[],
> replicationFallbacks=[ARCHIVE]})*
>
> *2017-06-21 12:11:26,523 WARN
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed
> to place enough replicas, still in need of 1 to reach 2
> (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
> newBlock=true) All required storage types are unavailable:
> unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}*
>
> *2017-06-21 12:11:26,523 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> allocate blk_1073894332_153508, replicas=192.168.9.174:50010 for /36962._COPYING_*
>
> *2017-06-21 12:11:26,810 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> completeFile: /36962._COPYING_ is closed by
> DFSClient_NONMAPREDUCE_146762699_1*
>
> *2017-06-21 12:11:30,626 WARN
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed
> to place enough replicas, still in need of 1 to reach 2
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
> newBlock=true) For more information, please enable DEBUG log level on
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and
> org.apache.hadoop.net.NetworkTopology*
>
> *2017-06-21 12:11:30,626 WARN
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough
> replicas: expected size is 1 but only 0 storage types can be selected
> (replication=2, selected=[], unavailable=[DISK], removed=[DISK],
> policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[],
> replicationFallbacks=[ARCHIVE]})”*
>
>
>
> I am also encountering exceptions in active namenode related to
> LeaseManager
>
>
>
> *2017-06-21 12:13:16,706 INFO
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder:
> DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1] has expired
> hard limit*
>
> *2017-06-21 12:13:16,706 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
> Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1],
> src=/user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79*
>
> *2017-06-21 12:13:16,706 WARN org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.internalReleaseLease: Failed to release lease for file
> /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79.
> Committed blocks are waiting to be minimally replicated. Try again later.*
>
> *2017-06-21 12:13:16,706 ERROR
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the
> path /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79
> in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_409197282_362092,
> pending creates: 1]*
>
> *org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR*
> NameSystem.internalReleaseLease: Failed to release lease for file
> /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79.
> Committed blocks are waiting to be minimally replicated. Try again later.*
>
> *        at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3200)*
>
> *        at
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:383)*
>
> *        at
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:329)*
>
> *        at java.lang.Thread.run(Thread.java:745)*
>
>
>
> I have checked the two datanodes. Both are running and have enough space
> for new data.
>
>
>
> *PS: I have 2 Namenodes and 2 datanodes in a Hadoop HA setup. The HA is
> set up using the Quorum Journal Manager and ZooKeeper.*
>
>
>
> Any idea why these errors?
>
>
>
> *Regards*
>
> *Omprakash Paliwal*
>
> HPC-Medical and Bioinformatics Applications Group
>
> Centre for Development of Advanced Computing (C-DAC)
>
> Pune University campus,
>
> PUNE-411007
>
> Maharashtra, India
>
> email:omprakashp@cdac.in
>
> Contact : +91-20-25704231



-- 
*Best Regards*
*Atul Rajan*

RE: Lots of warning messages and exception in namenode logs

Posted by omprakash <om...@cdac.in>.
Hi Sidharth,

 

Thanks a lot for the clarification. Could you suggest parameters that can improve re-replication in case of a failure?

 

Regards

Om

 

From: Sidharth Kumar [mailto:sidharthkumar2707@gmail.com] 
Sent: 29 June 2017 16:06
To: omprakash <om...@cdac.in>
Cc: Arpit Agarwal <aa...@hortonworks.com>; common-user@hadoop.apache.org <us...@hadoop.apache.org>; Ravi Prakash <ra...@gmail.com>
Subject: RE: Lots of warning messages and exception in namenode logs

 

Hi,

 

No, as no copy of that file will exist. You can increase the replication factor to 3 so that three copies are created; even if 2 datanodes go down, you will still have one copy available, which the namenode will replicate back to 3 in due course.
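
As a rough sketch with the stock HDFS shell (the target value and path here are illustrative), existing files can be bumped to the higher factor with:

hdfs dfs -setrep -w 3 /user/hadoop

The -w flag waits until the new factor is actually reached; run against a directory, setrep applies to every file beneath it.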


Warm Regards

Sidharth Kumar | Mob: +91 8197 555 599/7892 192 367 | LinkedIn: www.linkedin.com/in/sidharthkumar2792




    

 

On 29-Jun-2017 3:45 PM, "omprakash" <omprakashp@cdac.in> wrote:

Hi Ravi,

 

I have 5 nodes in the Hadoop cluster and all have the same configuration. After setting dfs.replication=2, I did a clean start of hdfs.

 

As per your suggestion, I added 2 more datanodes and cleaned all the data and metadata. The performance of the cluster has improved dramatically. I can see through the logs that the files are randomly replicated across four datanodes (2 replicas of each file).

 

But here my problem arises. I want redundant datanodes such that if any two of the datanodes go down, I can still get files from the other two. In the above case, suppose block-xyz is stored on datanode1 and datanode2, and some day these two datanodes go down; will I still be able to access block-xyz? This is what I am worried about.

 

 

Regards

Om

 

 

From: Ravi Prakash [mailto:ravihadoop@gmail.com]
Sent: 27 June 2017 22:36
To: omprakash <omprakashp@cdac.in>
Cc: Arpit Agarwal <aagarwal@hortonworks.com>; user <user@hadoop.apache.org>
Subject: Re: Lots of warning messages and exception in namenode logs

 

Hi Omprakash!

This is *not* ok. Please go through the datanode logs of the inactive datanode and figure out why it's inactive. If you set dfs.replication to 2, at least as many datanodes (and ideally a LOT more datanodes) should be active and participating in the cluster.

Do you have the hdfs-site.xml you posted to the mailing list on all the nodes (including the Namenode)? Was the file containing block blk_1074074104_337394 created when you had the cluster misconfigured to dfs.replication=3 ? You can determine which file the block belongs to using this command:

hdfs fsck -blockId blk_1074074104

Once you have the file, you can set its replication using 
hdfs dfs -setrep 2 <Filename>

I'm guessing that you probably have a lot of files with this replication, in which case you should set it on / (This would overwrite the replication on all the files)
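For the whole namespace, a minimal sketch (adding -w to wait for completion is optional and can take a long time with this many blocks):

hdfs dfs -setrep 2 /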

 

If the data on this cluster is important I would be very worried about the condition it's in.

HTH

Ravi

 

On Mon, Jun 26, 2017 at 11:22 PM, omprakash <omprakashp@cdac.in> wrote:

Hi all,

 

I started HDFS in DEBUG mode. After examining the logs I found the entry below, which reads that the required replication factor is 3 (as against the specified dfs.replication=2).

 

DEBUG BlockStateChange: BLOCK* NameSystem.UnderReplicationBlock.add: blk_1074074104_337394 has only 1 replicas and need 3 replicas so is added to neededReplications at priority level 0

 

P.S : I have 1 datanode active out of 2. 

 

I can also see from Namenode UI that the no. of under replicated blocks are growing.

 

Any idea? Or is this OK?

 

regards

 

 

From: omprakash [mailto:omprakashp@cdac.in]
Sent: 23 June 2017 11:02
To: 'Ravi Prakash' <ravihadoop@gmail.com>; 'Arpit Agarwal' <aagarwal@hortonworks.com>
Cc: 'user' <user@hadoop.apache.org>
Subject: RE: Lots of warning messages and exception in namenode logs

 

Hi Arpit,

 

I will enable the settings as suggested and will post the results.

 

I am just curious about setting the Namenode RPC service port. As I have checked the hdfs-site.xml properties, dfs.namenode.rpc-address is already set, which also serves as the default for the RPC service port. Does specifying a separate port have any advantage over the default?

 

Regarding JvmPauseMonitor Error, there are 5-6 instances of this error in namenode logs. Here is one of them.

 

How do I identify the right heap size in such cases, given that I have 4 GB of RAM on the namenode VM?
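
For reference, the NameNode heap is usually raised in hadoop-env.sh; a sketch for a 4 GB VM (the 3 GB value is illustrative and must leave headroom for the OS and any co-located daemons):

export HADOOP_NAMENODE_OPTS="-Xmx3g ${HADOOP_NAMENODE_OPTS}"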

 

@Ravi Since the files are very small, I have only configured a VM with 20 GB of space. The additional disk is a simple SATA disk, not an SSD.

 

As I can see from the Namenode UI, more than 50% of the blocks are under-replicated. I now have 400K blocks, of which 200K are under-replicated.

I will post the results again after changing the value of dfs.namenode.replication.work.multiplier.per.iteration

 

 

Thanks 

Om Prakash

 

From: Ravi Prakash [mailto:ravihadoop@gmail.com]
Sent: 22 June 2017 23:04
To: Arpit Agarwal <aagarwal@hortonworks.com>
Cc: omprakash <omprakashp@cdac.in>; user <user@hadoop.apache.org>


Subject: Re: Lots of warning messages and exception in namenode logs

 

Hi Omprakash!

How big are your disks? Just 20Gb? Just out of curiosity, are these SSDs?

In addition to Arpit's reply, I'm also concerned with the number of under-replicated blocks you have: Under replicated blocks: 141863

When there are fewer replicas for a block than there are supposed to be (in your case e.g. when there's 1 replica when there ought to be 2), the namenode will order the datanodes to create more replicas. The rate at which it does this is controlled by
dfs.namenode.replication.work.multiplier.per.iteration. Given you have only 2 datanodes, you'll only be re-replicating 4 blocks every 3 seconds. So, it will take quite a while to re-replicate all the blocks.
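
A sketch of raising that rate in hdfs-site.xml on the NameNodes (the value 10 is illustrative; the default is 2, and blocks scheduled per iteration = live datanodes x this multiplier):

<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <value>10</value>
</property>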

Also, please know that you want files to be much bigger than 1kb. Ideally you'd have a couple of blocks (blocks=128Mb) for each file. You should append to files when they are this small.
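
For example, instead of creating a new 1 KB file every second, each record could be appended to one growing file (paths illustrative):

hdfs dfs -appendToFile /tmp/record.txt /user/hadoop/2106201707/aggregate.log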

Please do let us know how things turn out.

Cheers,

Ravi

 

On Wed, Jun 21, 2017 at 11:23 PM, Arpit Agarwal <aagarwal@hortonworks.com> wrote:

Hi Omprakash,

 

Your description suggests DataNodes cannot send timely reports to the NameNode. You can check it by looking for ‘stale’ DataNodes in the NN web UI when this situation is occurring. A few ideas:

 

- Try increasing the NameNode RPC handler count a bit (set dfs.namenode.handler.count to 20 in hdfs-site.xml).
- Enable the NameNode service RPC port. This requires downtime and reformatting the ZKFC znode.
- Search for JvmPauseMonitor messages in your service logs. If you see any, try increasing the JVM heap for that service.
- Enable debug logging as suggested here:

 

2017-06-21 12:11:30,626 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 2 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology
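
A sketch of the first two items in hdfs-site.xml (the service RPC port 8021 is illustrative; with HA the property is set per NameNode, and clients keep using port 8020):

<property>
  <name>dfs.namenode.handler.count</name>
  <value>20</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.hdfsCluster.nn1</name>
  <value>node1:8021</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.hdfsCluster.nn2</name>
  <value>node2:8021</value>
</property>

For the debug logging, one option (assuming a stock 2.x install) is to flip the level at runtime against the NameNode's HTTP port:

hadoop daemonlog -setlevel node1:50070 org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy DEBUG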

 

 

From: omprakash <omprakashp@cdac.in>
Date: Wednesday, June 21, 2017 at 9:23 PM
To: 'Ravi Prakash' <ravihadoop@gmail.com>
Cc: 'user' <user@hadoop.apache.org>
Subject: RE: Lots of warning messages and exception in namenode logs

 

Hi Ravi,

 

Pasting below my core-site and hdfs-site configurations. I have kept bare minimal configurations for my cluster. The cluster started fine and I was able to put a couple of hundred thousand files on hdfs, but when I checked the logs there were errors/exceptions. After a restart of the datanodes they work well for a few thousand files, but then the same problem recurs. No idea what is wrong.

 

PS: I am pumping 1 file per second to hdfs, each approx. 1 KB in size

 

I thought it may be due to a space quota on the datanodes, but here is the output of hdfs dfsadmin -report. Looks fine to me

 

$ hdfs dfsadmin -report

 

Configured Capacity: 42005069824 (39.12 GB)

Present Capacity: 38085839568 (35.47 GB)

DFS Remaining: 34949058560 (32.55 GB)

DFS Used: 3136781008 (2.92 GB)

DFS Used%: 8.24%

Under replicated blocks: 141863

Blocks with corrupt replicas: 0

Missing blocks: 0

Missing blocks (with replication factor 1): 0

Pending deletion blocks: 0

 

-------------------------------------------------

Live datanodes (2):

 

Name: 192.168.9.174:50010 (node5)

Hostname: node5

Decommission Status : Normal

Configured Capacity: 21002534912 (19.56 GB)

DFS Used: 1764211024 (1.64 GB)

Non DFS Used: 811509424 (773.92 MB)

DFS Remaining: 17067913216 (15.90 GB)

DFS Used%: 8.40%

DFS Remaining%: 81.27%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 2

Last contact: Wed Jun 21 14:38:17 IST 2017

 

 

Name: 192.168.9.225:50010 (node4)

Hostname: node5

Decommission Status : Normal

Configured Capacity: 21002534912 (19.56 GB)

DFS Used: 1372569984 (1.28 GB)

Non DFS Used: 658353792 (627.86 MB)

DFS Remaining: 17881145344 (16.65 GB)

DFS Used%: 6.54%

DFS Remaining%: 85.14%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Wed Jun 21 14:38:19 IST 2017

 

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

  <name>fs.defaultFS</name>

  <value>hdfs://hdfsCluster</value>

</property>

<property>

  <name>dfs.journalnode.edits.dir</name>

  <value>/mnt/hadoopData/hadoop/journal/node/local/data</value>

</property>

</configuration>

 

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

<property>

  <name>dfs.name.dir</name>

    <value>file:///mnt/hadoopData/hadoop/hdfs/namenode</value>

</property>

<property>

  <name>dfs.data.dir</name>

    <value>file:///mnt/hadoopData/hadoop/hdfs/datanode</value>

</property>

<property>

<name>dfs.nameservices</name>

<value>hdfsCluster</value>

</property>

<property>

  <name>dfs.ha.namenodes.hdfsCluster</name>

  <value>nn1,nn2</value>

</property>

 

<property>

  <name>dfs.namenode.rpc-address.hdfsCluster.nn1</name>

  <value>node1:8020</value>

</property>

<property>

  <name>dfs.namenode.rpc-address.hdfsCluster.nn2</name>

  <value>node22:8020</value>

</property>

 

<property>

  <name>dfs.namenode.http-address.hdfsCluster.nn1</name>

  <value>node1:50070</value>

</property>

<property>

  <name>dfs.namenode.http-address.hdfsCluster.nn2</name>

  <value>node2:50070</value>

</property>

 

<property>

  <name>dfs.namenode.shared.edits.dir</name>

  <value>qjournal://node1:8485;node2:8485;node3:8485;node4:8485;node5:8485/hdfsCluster</value>

</property>

<property>

  <name>dfs.client.failover.proxy.provider.hdfsCluster</name>

  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

</property>

<property>

   <name>ha.zookeeper.quorum</name>

   <value>node1:2181,node2:2181,node3:2181,node4:2181,node5:2181</value>

</property>

<property>

<name>dfs.ha.fencing.methods</name>

<value>sshfence</value>

</property>

<property>

<name>dfs.ha.fencing.ssh.private-key-files</name>

<value>/home/hadoop/.ssh/id_rsa</value>

</property>

<property>

   <name>dfs.ha.automatic-failover.enabled</name>

   <value>true</value>

</property>

</configuration>

 

 

From: Ravi Prakash [mailto:ravihadoop@gmail.com]
Sent: 22 June 2017 02:38
To: omprakash <omprakashp@cdac.in>
Cc: user <user@hadoop.apache.org>
Subject: Re: Lots of warning messages and exception in namenode logs

 

Hi Omprakash!

What is your default replication set to? What kind of disks do your datanodes have? Were you able to start a cluster with a simple configuration before you started tuning it?

HDFS tries to create the default number of replicas for a block on different datanodes. The Namenode tries to give a list of datanodes that the client can write replicas of the block to. If the Namenode is not able to construct a list with an adequate number of datanodes, you will see the message you are seeing. This may mean that datanodes are unhealthy (failed disks), or full (disks have no more space), being decommissioned (HDFS will not write replicas on decommissioning datanodes) or misconfigured (I'd suggest turning on storage classes only after a simple configuration works).

When a client that was trying to write a file was killed (e.g. if you killed your MR job), after some time (hard limit expiring) the Namenode will try to recover the file. In your case the namenode is also not able to find enough datanodes for recovering the files.
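
If a file stays stuck in this state after the datanodes are healthy again, lease recovery can also be kicked off by hand (path taken from the logs above; the debug subcommand assumes Hadoop 2.7 or later):

hdfs debug recoverLease -path /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79 -retries 3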

 

HTH

Ravi

 

 

On Tue, Jun 20, 2017 at 11:50 PM, omprakash <omprakashp@cdac.in> wrote:

Hi,

 

I am receiving lots of  warning messages in namenodes logs on ACTIVE NN in my HA Hadoop setup. Below are the logs

 

“2017-06-21 12:11:26,523 WARN org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough replicas: expected size is 1 but only 0 storage types can be selected (replication=2, selected=[], unavailable=[DISK], removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})

2017-06-21 12:11:26,523 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 2 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types are unavailable:  unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}

2017-06-21 12:11:26,523 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_1073894332_153508, replicas=192.168.9.174:50010 for /36962._COPYING_

2017-06-21 12:11:26,810 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /36962._COPYING_ is closed by DFSClient_NONMAPREDUCE_146762699_1

2017-06-21 12:11:30,626 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 2 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology

2017-06-21 12:11:30,626 WARN org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough replicas: expected size is 1 but only 0 storage types can be selected (replication=2, selected=[], unavailable=[DISK], removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})”

 

I am also encountering exceptions in active namenode related to LeaseManager

 

2017-06-21 12:13:16,706 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1] has expired hard limit

2017-06-21 12:13:16,706 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1], src=/user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79

2017-06-21 12:13:16,706 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: Failed to release lease for file /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79. Committed blocks are waiting to be minimally replicated. Try again later.

2017-06-21 12:13:16,706 ERROR org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79 in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1]

org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* NameSystem.internalReleaseLease: Failed to release lease for file /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79. Committed blocks are waiting to be minimally replicated. Try again later.

        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3200)

        at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:383)

        at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:329)

        at java.lang.Thread.run(Thread.java:745)

 

I have checked the two datanodes. Both are running and have enough space for new data. 

 

PS: I have 2 Namenodes and 2 datanodes in a Hadoop HA setup. The HA is set up using the Quorum Journal Manager and ZooKeeper.

 

Any idea why these errors?

 

Regards

Omprakash Paliwal

HPC-Medical and Bioinformatics Applications Group

Centre for Development of Advanced Computing (C-DAC)

Pune University campus,

PUNE-411007

Maharashtra, India

email: omprakashp@cdac.in

Contact : +91-20-25704231

 




RE: Lots of warning messages and exception in namenode logs

Posted by Sidharth Kumar <si...@gmail.com>.
Hi,

No, as no copy of that file will exist. You can increase the replication
factor to 3 so that three copies are created; even if 2 datanodes go down,
you will still have one copy available, which the namenode will replicate
back to 3 in due course.
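
A sketch of the matching default in hdfs-site.xml (this only affects files
written afterwards; files that already exist need hdfs dfs -setrep, as noted
earlier in the thread):

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>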


Warm Regards

Sidharth Kumar | Mob: +91 8197 555 599/7892 192 367 |  LinkedIn:
www.linkedin.com/in/sidharthkumar2792






On 29-Jun-2017 3:45 PM, "omprakash" <om...@cdac.in> wrote:

> Hi Ravi,
>
>
>
> I have 5 nodes in the Hadoop cluster and all have the same configuration.
> After setting *dfs.replication=2*, I did a clean start of hdfs.
>
>
>
> As per your suggestion, I added 2 more datanodes and cleaned all the data
> and metadata. The performance of the cluster has improved dramatically. I
> can see through the logs that the files are randomly replicated across four
> datanodes (2 replicas of each file).
>
>
>
> But here my problem arises. I want redundant datanodes such that if any two
> of the datanodes go down, I can still get files from the other two. In the
> above case, suppose block-xyz is stored on datanode1 and datanode2, and
> some day these two datanodes go down; will I still be able to access
> block-xyz? This is what I am worried about.
>
>
>
>
>
> Regards
>
> Om
>
>
>
>
>
> *From:* Ravi Prakash [mailto:ravihadoop@gmail.com]
> *Sent:* 27 June 2017 22:36
> *To:* omprakash <om...@cdac.in>
> *Cc:* Arpit Agarwal <aa...@hortonworks.com>; user <
> user@hadoop.apache.org>
> *Subject:* Re: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Omprakash!
>
> This is *not* ok. Please go through the datanode logs of the inactive
> datanode and figure out why it's inactive. If you set dfs.replication to 2,
> at least as many datanodes (and ideally a LOT more datanodes) should be
> active and participating in the cluster.
>
> Do you have the hdfs-site.xml you posted to the mailing list on all the
> nodes (including the Namenode)? Was the file containing block
> *blk_1074074104_337394* created when you had the cluster misconfigured to
> dfs.replication=3 ? You can determine which file the block belongs to using
> this command:
>
> hdfs fsck -blockId blk_1074074104
>
> Once you have the file, you can set its replication using
> hdfs dfs -setrep 2 <Filename>
>
> I'm guessing that you probably have a lot of files with this replication,
> in which case you should set it on / (This would overwrite the replication
> on all the files)
>
>
>
> If the data on this cluster is important I would be very worried about the
> condition it's in.
>
> HTH
>
> Ravi
>
>
>
> On Mon, Jun 26, 2017 at 11:22 PM, omprakash <om...@cdac.in> wrote:
>
> Hi all,
>
>
>
> I started HDFS in DEBUG mode. After examining the logs I found the entry
> below, which reads that the required replication factor is 3 (as against
> the specified *dfs.replication=2*).
>
>
>
> *DEBUG BlockStateChange: BLOCK* NameSystem.UnderReplicationBlock.add:
> blk_1074074104_337394 has only 1 replicas and need 3 replicas so is added
> to neededReplications at priority level 0*
>
>
>
> *P.S : I have 1 datanode active out of 2. *
>
>
>
> I can also see from Namenode UI that the no. of under replicated blocks
> are growing.
>
>
>
> Any idea? Or is this OK?
>
>
>
> regards
>
>
>
>
>
> *From:* omprakash [mailto:omprakashp@cdac.in]
> *Sent:* 23 June 2017 11:02
> *To:* 'Ravi Prakash' <ra...@gmail.com>; 'Arpit Agarwal' <
> aagarwal@hortonworks.com>
> *Cc:* 'user' <us...@hadoop.apache.org>
> *Subject:* RE: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Arpit,
>
>
>
> I will enable the settings as suggested and will post the results.
>
>
>
> I am just curious about setting the *Namenode RPC service port*. As I have
> checked the *hdfs-site.xml* properties, *dfs.namenode.rpc-address* is
> already set, which also serves as the default for the RPC service port.
> Does specifying a separate port have any advantage over the default?
>
>
>
> Regarding JvmPauseMonitor Error, there are 5-6 instances of this error in namenode logs. Here is one of them.
>
>
>
> How do I identify the right heap size in such cases, given that I have
> 4 GB of RAM on the namenode VM?
>
>
>
> *@Ravi* Since the files are very small, I have only configured a VM with
> 20 GB of space. The additional disk is a simple SATA disk, not an SSD.
>
>
>
> As I can see from the Namenode UI, more than 50% of the blocks are under
> replicated. I now have 400K blocks, of which 200K are under-replicated.
>
> I will post the results again after changing the value of
> *dfs.namenode.replication.work.multiplier.per.iteration*
>
>
>
>
>
> Thanks
>
> Om Prakash
>
>
>
> *From:* Ravi Prakash [mailto:ravihadoop@gmail.com <ra...@gmail.com>]
> *Sent:* 22 June 2017 23:04
> *To:* Arpit Agarwal <aa...@hortonworks.com>
> *Cc:* omprakash <om...@cdac.in>; user <us...@hadoop.apache.org>
>
>
> *Subject:* Re: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Omprakash!
>
> How big are your disks? Just 20Gb? Just out of curiosity, are these SSDs?
>
> In addition to Arpit's reply, I'm also concerned with the number of
> under-replicated blocks you have: Under replicated blocks: 141863
>
> When there are fewer replicas for a block than there are supposed to be
> (in your case e.g. when there's 1 replica when there ought to be 2), the
> namenode will order the datanodes to create more replicas. The rate at
> which it does this is controlled by
> dfs.namenode.replication.work.multiplier.per.iteration . Given you have
> only 2 datanodes, you'll only be re-replicating 4 blocks every 3 seconds.
> So, it will take quite a while to re-replicate all the blocks.
>
> Also, please know that you want files to be much bigger than 1kb. Ideally
> you'd have a couple of blocks (blocks=128Mb) for each file. You should
> append to files when they are this small.
>
> Please do let us know how things turn out.
>
> Cheers,
>
> Ravi
>
>
>
> On Wed, Jun 21, 2017 at 11:23 PM, Arpit Agarwal <aa...@hortonworks.com>
> wrote:
>
> Hi Omprakash,
>
>
>
> Your description suggests DataNodes cannot send timely reports to the
> NameNode. You can check it by looking for ‘stale’ DataNodes in the NN web
> UI when this situation is occurring. A few ideas:
>
>
>
>    - Try increasing the NameNode RPC handler count a bit (set
>    dfs.namenode.handler.count to 20 in hdfs-site.xml).
>    - Enable the NameNode service RPC port. This requires downtime and
>    reformatting the ZKFC znode.
>    - Search for JvmPauseMonitor messages in your service logs. If you see
>    any, try increasing JVM heap for that service.
>    - Enable debug logging as suggested here:
>
>
>
> *2017-06-21 12:11:30,626 WARN
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed
> to place enough replicas, still in need of 1 to reach 2
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
> newBlock=true) For more information, please enable DEBUG log level on
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and
> org.apache.hadoop.net.NetworkTopology*
>
>
>
>
>
> *From: *omprakash <om...@cdac.in>
> *Date: *Wednesday, June 21, 2017 at 9:23 PM
> *To: *'Ravi Prakash' <ra...@gmail.com>
> *Cc: *'user' <us...@hadoop.apache.org>
> *Subject: *RE: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Ravi,
>
>
>
> Pasting below my core-site and hdfs-site configurations. I have kept bare
> minimal configurations for my cluster. The cluster started fine and I was
> able to put a couple of hundred thousand files on hdfs, but when I checked
> the logs there were errors/exceptions. After a restart of the datanodes
> they work well for a few thousand files, but then the same problem recurs.
> No idea what is wrong.
>
>
>
> *PS: I am pumping 1 file per second to hdfs, each approx. 1 KB in size*
>
>
>
> I thought it may be due to space quota on datanodes but here is the output
> of *hdfs dfsadmin -report*. Looks fine to me
>
>
>
> $ hdfs dfsadmin -report
>
>
>
> Configured Capacity: 42005069824 (39.12 GB)
>
> Present Capacity: 38085839568 (35.47 GB)
>
> DFS Remaining: 34949058560 (32.55 GB)
>
> DFS Used: 3136781008 (2.92 GB)
>
> DFS Used%: 8.24%
>
> Under replicated blocks: 141863
>
> Blocks with corrupt replicas: 0
>
> Missing blocks: 0
>
> Missing blocks (with replication factor 1): 0
>
> Pending deletion blocks: 0
>
>
>
> -------------------------------------------------
>
> Live datanodes (2):
>
>
>
> Name: 192.168.9.174:50010 (node5)
>
> Hostname: node5
>
> Decommission Status : Normal
>
> Configured Capacity: 21002534912 (19.56 GB)
>
> DFS Used: 1764211024 (1.64 GB)
>
> Non DFS Used: 811509424 (773.92 MB)
>
> DFS Remaining: 17067913216 (15.90 GB)
>
> DFS Used%: 8.40%
>
> DFS Remaining%: 81.27%
>
> Configured Cache Capacity: 0 (0 B)
>
> Cache Used: 0 (0 B)
>
> Cache Remaining: 0 (0 B)
>
> Cache Used%: 100.00%
>
> Cache Remaining%: 0.00%
>
> Xceivers: 2
>
> Last contact: Wed Jun 21 14:38:17 IST 2017
>
>
>
>
>
> Name: 192.168.9.225:50010 (node4)
>
> Hostname: node5
>
> Decommission Status : Normal
>
> Configured Capacity: 21002534912 (19.56 GB)
>
> DFS Used: 1372569984 (1.28 GB)
>
> Non DFS Used: 658353792 (627.86 MB)
>
> DFS Remaining: 17881145344 (16.65 GB)
>
> DFS Used%: 6.54%
>
> DFS Remaining%: 85.14%
>
> Configured Cache Capacity: 0 (0 B)
>
> Cache Used: 0 (0 B)
>
> Cache Remaining: 0 (0 B)
>
> Cache Used%: 100.00%
>
> Cache Remaining%: 0.00%
>
> Xceivers: 1
>
> Last contact: Wed Jun 21 14:38:19 IST 2017
>
>
>
> *core-site.xml*
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>
> <property>
>
>   <name>fs.defaultFS</name>
>
>   <value>hdfs://hdfsCluster</value>
>
> </property>
>
> <property>
>
>   <name>dfs.journalnode.edits.dir</name>
>
>   <value>/mnt/hadoopData/hadoop/journal/node/local/data</value>
>
> </property>
>
> </configuration>
>
>
>
> *hdfs-site.xml*
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>
> *<property>*
>
> *<name>dfs.replication</name>*
>
> *<value>2</value>*
>
> *</property>*
>
> <property>
>
>   <name>dfs.name.dir</name>
>
>     <value>file:///mnt/hadoopData/hadoop/hdfs/namenode</value>
>
> </property>
>
> <property>
>
>   <name>dfs.data.dir</name>
>
>     <value>file:///mnt/hadoopData/hadoop/hdfs/datanode</value>
>
> </property>
>
> <property>
>
> <name>dfs.nameservices</name>
>
> <value>hdfsCluster</value>
>
> </property>
>
> <property>
>
>   <name>dfs.ha.namenodes.hdfsCluster</name>
>
>   <value>nn1,nn2</value>
>
> </property>
>
>
>
> <property>
>
>   <name>dfs.namenode.rpc-address.hdfsCluster.nn1</name>
>
>   <value>node1:8020</value>
>
> </property>
>
> <property>
>
>   <name>dfs.namenode.rpc-address.hdfsCluster.nn2</name>
>
>   <value>node22:8020</value>
>
> </property>
>
>
>
> <property>
>
>   <name>dfs.namenode.http-address.hdfsCluster.nn1</name>
>
>   <value>node1:50070</value>
>
> </property>
>
> <property>
>
>   <name>dfs.namenode.http-address.hdfsCluster.nn2</name>
>
>   <value>node2:50070</value>
>
> </property>
>
>
>
> <property>
>
>   <name>dfs.namenode.shared.edits.dir</name>
>
>   <value>qjournal://node1:8485;node2:8485;node3:8485;node4:8485;node5:8485/hdfsCluster</value>
>
> </property>
>
> <property>
>
>   <name>dfs.client.failover.proxy.provider.hdfsCluster</name>
>
>   <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>
> </property>
>
> <property>
>
>    <name>ha.zookeeper.quorum</name>
>
>    <value>node1:2181,node2:2181,node3:2181,node4:2181,node5:2181</value>
>
> </property>
>
> <property>
>
> <name>dfs.ha.fencing.methods</name>
>
> <value>sshfence</value>
>
> </property>
>
> <property>
>
> <name>dfs.ha.fencing.ssh.private-key-files</name>
>
> <value>/home/hadoop/.ssh/id_rsa</value>
>
> </property>
>
> <property>
>
>    <name>dfs.ha.automatic-failover.enabled</name>
>
>    <value>true</value>
>
> </property>
>
> </configuration>
>
>
>
>
>
> *From:* Ravi Prakash [mailto:ravihadoop@gmail.com]
> *Sent:* 22 June 2017 02:38
> *To:* omprakash <om...@cdac.in>
> *Cc:* user <us...@hadoop.apache.org>
> *Subject:* Re: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Omprakash!
>
> What is your default replication set to? What kind of disks do your
> datanodes have? Were you able to start a cluster with a simple
> configuration before you started tuning it?
>
> HDFS tries to create the default number of replicas for a block on
> different datanodes. The Namenode tries to give a list of datanodes that
> the client can write replicas of the block to. If the Namenode is not able
> to construct a list with adequate number of datanodes, you will see the
> message you are seeing. This may mean that datanodes are unhealthy (failed
> disks), or full (disks have no more space), being decomissioned ( HDFS will
> not write replicas on decomissioning datanodes) or misconfigured ( I'd
> suggest turning on storage classes only after a simple configuration works).
>
> When a client that was trying to write a file was killed (e.g. if you
> killed your MR job), after some time (hard limit expiring) the Namenode
> will try to recover the file. In your case the namenode is also not able to
> find enough datanodes for recovering the files.
>
>
>
> HTH
>
> Ravi
>
>
>
>
>
> On Tue, Jun 20, 2017 at 11:50 PM, omprakash <om...@cdac.in> wrote:
>
> Hi,
>
>
>
> I am receiving lots of  *warning messages in namenodes* logs on ACTIVE NN
> in my *HA Hadoop setup*. Below are the logs
>
>
>
> “2017-06-21 12:11:26,523 WARN
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough
> replicas: expected size is 1 but only 0 storage types can be selected
> (replication=2, selected=[], unavailable=[DISK], removed=[DISK],
> policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[],
> replicationFallbacks=[ARCHIVE]})
>
> 2017-06-21 12:11:26,523 WARN
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed
> to place enough replicas, still in need of 1 to reach 2
> (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
> newBlock=true) All required storage types are unavailable:
> unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
>
> 2017-06-21 12:11:26,523 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> allocate blk_1073894332_153508, replicas=192.168.9.174:50010 for /36962._COPYING_
>
> 2017-06-21 12:11:26,810 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> completeFile: /36962._COPYING_ is closed by
> DFSClient_NONMAPREDUCE_146762699_1
>
> 2017-06-21 12:11:30,626 WARN
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed
> to place enough replicas, still in need of 1 to reach 2
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
> newBlock=true) For more information, please enable DEBUG log level on
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and
> org.apache.hadoop.net.NetworkTopology
>
> 2017-06-21 12:11:30,626 WARN
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough
> replicas: expected size is 1 but only 0 storage types can be selected
> (replication=2, selected=[], unavailable=[DISK], removed=[DISK],
> policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[],
> replicationFallbacks=[ARCHIVE]})”
>
>
>
> I am also encountering exceptions in active namenode related to
> LeaseManager
>
>
>
> 2017-06-21 12:13:16,706 INFO
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder:
> DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1] has expired
> hard limit
>
> 2017-06-21 12:13:16,706 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
> Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1],
> src=/user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79
>
> 2017-06-21 12:13:16,706 WARN org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.internalReleaseLease: Failed to release lease for file
> /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79.
> Committed blocks are waiting to be minimally replicated. Try again later.
>
> 2017-06-21 12:13:16,706 ERROR
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the
> path /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79
> in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_409197282_362092,
> pending creates: 1]
>
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR*
> NameSystem.internalReleaseLease: Failed to release lease for file
> /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79.
> Committed blocks are waiting to be minimally replicated. Try again later.
>
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3200)
>
>         at
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:383)
>
>         at
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:329)
>
>         at java.lang.Thread.run(Thread.java:745)
>
>
>
> I have checked the two datanodes. Both are running and have enough space
> for new data.
>
>
>
> *PS: I have 2 Namenodes and 2 datanodes in the Hadoop HA setup. The HA is
> set up using the Quorum Journal Manager and ZooKeeper.*
>
>
>
> Any idea why these errors?
>
>
>
> *Regards*
>
> *Omprakash Paliwal*
>
> HPC-Medical and Bioinformatics Applications Group
>
> Centre for Development of Advanced Computing (C-DAC)
>
> Pune University campus,
>
> PUNE-411007
>
> Maharashtra, India
>
> email:omprakashp@cdac.in
>
> Contact : +91-20-25704231
>
>
>
>
> ------------------------------------------------------------
> -------------------------------------------------------------------
> [ C-DAC is on Social-Media too. Kindly follow us at:
> Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]
>
> This e-mail is for the sole use of the intended recipient(s) and may
> contain confidential and privileged information. If you are not the
> intended recipient, please contact the sender by reply e-mail and destroy
> all copies and the original message. Any unauthorized review, use,
> disclosure, dissemination, forwarding, printing or copying of this email
> is strictly prohibited and appropriate legal action will be taken.
> ------------------------------------------------------------
> -------------------------------------------------------------------
>
>
>
>

RE: Lots of warning messages and exception in namenode logs

Posted by omprakash <om...@cdac.in>.
Hi Ravi,

 

I have 5 nodes in the Hadoop cluster and all have the same configuration. After setting dfs.replication=2, I did a clean start of HDFS.

 

As per your suggestion, I added 2 more datanodes and cleaned all the data and metadata. The performance of the cluster has dramatically improved. I can see from the logs that the files are randomly replicated across the four datanodes (2 replicas of each file).

 

But here my problem arises. I want redundant datanodes such that if any two of the datanodes go down, I can still get the files from the other two. In the above case, suppose block-xyz of a file gets stored on datanode1 and datanode2, and some day these two datanodes go down; will I still be able to access block-xyz? This is what I am worried about.
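
(To check where the replicas of a given file actually landed, a command along these lines should show the datanodes holding each block; the path is just the one from the earlier logs, used as an example:)

$ hdfs fsck /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79 -files -blocks -locations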

 

 

Regards

Om

 

 

From: Ravi Prakash [mailto:ravihadoop@gmail.com] 
Sent: 27 June 2017 22:36
To: omprakash <om...@cdac.in>
Cc: Arpit Agarwal <aa...@hortonworks.com>; user <us...@hadoop.apache.org>
Subject: Re: Lots of warning messages and exception in namenode logs

 

Hi Omprakash!

This is *not* ok. Please go through the datanode logs of the inactive datanode and figure out why it's inactive. If you set dfs.replication to 2, at least as many datanodes (and ideally a LOT more datanodes) should be active and participating in the cluster. 
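
(One quick way to see which datanodes the NameNode currently considers live or dead; the -live/-dead filters exist in recent 2.x releases, and on older ones the plain -report output carries the same information:)

$ hdfs dfsadmin -report -live
$ hdfs dfsadmin -report -dead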

Do you have the hdfs-site.xml you posted to the mailing list on all the nodes (including the Namenode)? Was the file containing block blk_1074074104_337394 created when you had the cluster misconfigured to dfs.replication=3? You can determine which file the block belongs to using this command:

hdfs fsck -blockId blk_1074074104

Once you have the file, you can set its replication using 
hdfs dfs -setrep 2 <Filename>

I'm guessing that you probably have a lot of files with this replication, in which case you should set it on / (This would overwrite the replication on all the files)
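
(A minimal sketch of the whole-namespace form; adding -w would additionally wait for each file to reach the new replication, which can take a long time on a tree this size:)

$ hdfs dfs -setrep -R 2 /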

 

If the data on this cluster is important, I would be very worried about the condition it's in.

HTH

Ravi

 

On Mon, Jun 26, 2017 at 11:22 PM, omprakash <omprakashp@cdac.in> wrote:

Hi all,

 

I started HDFS in DEBUG mode. After examining the logs I found the lines below, which say that the required replication factor is 3 (as against the specified dfs.replication=2). 

 

DEBUG BlockStateChange: BLOCK* NameSystem.UnderReplicationBlock.add: blk_1074074104_337394 has only 1 replicas and need 3 replicas so is added to neededReplications at priority level 0

 

P.S.: I have 1 datanode active out of 2. 

 

I can also see from the Namenode UI that the number of under-replicated blocks is growing.

 

Any idea? Or is this OK?

 

regards

 

 

From: omprakash [mailto:omprakashp@cdac.in] 
Sent: 23 June 2017 11:02
To: 'Ravi Prakash' <ravihadoop@gmail.com>; 'Arpit Agarwal' <aagarwal@hortonworks.com>
Cc: 'user' <user@hadoop.apache.org>
Subject: RE: Lots of warning messages and exception in namenode logs

 

Hi Arpit,

 

I will enable the settings as suggested and will post the results.

 

I am just curious about setting the Namenode RPC service port. As I checked the hdfs-site.xml properties, dfs.namenode.rpc-address is already set, and it serves as the default for the RPC service port as well. Does specifying a separate port have an advantage over the default one?

 

Regarding the JvmPauseMonitor error, there are 5-6 instances of it in the namenode logs. Here is one of them.

 

How do I identify the right heap size in such cases, given that I have 4 GB of RAM on the namenode VM?

 

@Ravi Since the files are very small, I have only configured a VM with 20 GB of space. The additional disk is a simple SATA disk, not an SSD. 

 

As I can see from the Namenode UI, more than 50% of the blocks are under-replicated. I now have 400K blocks, out of which 200K are under-replicated. 

I will post the results again after changing the value of dfs.namenode.replication.work.multiplier.per.iteration.
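
(For reference, a sketch of the corresponding hdfs-site.xml entry on the NameNode; the value 10 is only an illustration, the default being 2, and larger values schedule more block transfers per iteration:)

<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <value>10</value>
</property>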

 

 

Thanks 

Om Prakash

 

From: Ravi Prakash [mailto:ravihadoop@gmail.com] 
Sent: 22 June 2017 23:04
To: Arpit Agarwal <aagarwal@hortonworks.com>
Cc: omprakash <omprakashp@cdac.in>; user <user@hadoop.apache.org>


Subject: Re: Lots of warning messages and exception in namenode logs

 

Hi Omprakash!

How big are your disks? Just 20Gb? Just out of curiosity, are these SSDs?

In addition to Arpit's reply, I'm also concerned with the number of under-replicated blocks you have: Under replicated blocks: 141863

When there are fewer replicas for a block than there are supposed to be (in your case e.g. when there's 1 replica when there ought to be 2), the namenode will order the datanodes to create more replicas. The rate at which it does this is controlled by 
dfs.namenode.replication.work.multiplier.per.iteration. Given you have only 2 datanodes, you'll only be re-replicating 4 blocks every 3 seconds. So, it will take quite a while to re-replicate all the blocks. 
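
(Rough arithmetic with the default multiplier of 2: 2 datanodes x 2 = 4 blocks scheduled per 3-second iteration, so 141863 under-replicated blocks / 4 per iteration is about 35,466 iterations, i.e. roughly 106,000 seconds, or around 30 hours, before the queue drains.)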

Also, please know that you want files to be much bigger than 1kb. Ideally you'd have a couple of blocks (blocks=128Mb) for each file. You should append to files when they are this small.
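
(For example, instead of one put per 1KB record, appending each record to a growing file keeps the file and block counts down; the paths here are only illustrative:)

$ hdfs dfs -appendToFile record-0001.txt /user/hadoop/2106201707/batch.log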

Please do let us know how things turn out.

Cheers,

Ravi

 

On Wed, Jun 21, 2017 at 11:23 PM, Arpit Agarwal <aagarwal@hortonworks.com> wrote:

Hi Omprakash,

 

Your description suggests DataNodes cannot send timely reports to the NameNode. You can check it by looking for ‘stale’ DataNodes in the NN web UI when this situation is occurring. A few ideas:

 

*	Try increasing the NameNode RPC handler count a bit (set dfs.namenode.handler.count to 20 in hdfs-site.xml).
*	Enable the NameNode service RPC port. This requires downtime and reformatting the ZKFC znode.
*	Search for JvmPauseMonitor messages in your service logs. If you see any, try increasing JVM heap for that service.
*	Enable debug logging as suggested here:

 

2017-06-21 12:11:30,626 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 2 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology
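
(A sketch of the first two items above for this cluster; the service RPC port 8021 is an arbitrary choice, and a matching servicerpc-address entry would be added for nn2:)

<property>
  <name>dfs.namenode.handler.count</name>
  <value>20</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.hdfsCluster.nn1</name>
  <value>node1:8021</value>
</property>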

 

 

From: omprakash <omprakashp@cdac.in>
Date: Wednesday, June 21, 2017 at 9:23 PM
To: 'Ravi Prakash' <ravihadoop@gmail.com>
Cc: 'user' <user@hadoop.apache.org>
Subject: RE: Lots of warning messages and exception in namenode logs

 

Hi Ravi,

 

Pasting below my core-site and hdfs-site configurations. I have kept bare minimal configurations for my cluster. The cluster started fine and I was able to put a couple of 100K files on hdfs, but when I checked the logs there were errors/Exceptions. After a restart of the datanodes they work well for a few thousand files, but then the same problem again. No idea what is wrong. 

 

PS: I am pumping 1 file per second to hdfs with approx. size 1KB

 

I thought it might be due to a space quota on the datanodes, but here is the output of hdfs dfsadmin -report. It looks fine to me.

 

$ hdfs dfsadmin -report

 

Configured Capacity: 42005069824 (39.12 GB)

Present Capacity: 38085839568 (35.47 GB)

DFS Remaining: 34949058560 (32.55 GB)

DFS Used: 3136781008 (2.92 GB)

DFS Used%: 8.24%

Under replicated blocks: 141863

Blocks with corrupt replicas: 0

Missing blocks: 0

Missing blocks (with replication factor 1): 0

Pending deletion blocks: 0

 

-------------------------------------------------

Live datanodes (2):

 

Name: 192.168.9.174:50010 (node5)

Hostname: node5

Decommission Status : Normal

Configured Capacity: 21002534912 (19.56 GB)

DFS Used: 1764211024 (1.64 GB)

Non DFS Used: 811509424 (773.92 MB)

DFS Remaining: 17067913216 (15.90 GB)

DFS Used%: 8.40%

DFS Remaining%: 81.27%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 2

Last contact: Wed Jun 21 14:38:17 IST 2017

 

 

Name: 192.168.9.225:50010 (node4)

Hostname: node5

Decommission Status : Normal

Configured Capacity: 21002534912 (19.56 GB)

DFS Used: 1372569984 (1.28 GB)

Non DFS Used: 658353792 (627.86 MB)

DFS Remaining: 17881145344 (16.65 GB)

DFS Used%: 6.54%

DFS Remaining%: 85.14%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Wed Jun 21 14:38:19 IST 2017

 

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

  <name>fs.defaultFS</name>

  <value>hdfs://hdfsCluster</value>

</property>

<property>

  <name>dfs.journalnode.edits.dir</name>

  <value>/mnt/hadoopData/hadoop/journal/node/local/data</value>

</property>

</configuration>

 

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

<property>

  <name>dfs.name.dir</name>

    <value>file:///mnt/hadoopData/hadoop/hdfs/namenode</value>

</property>

<property>

  <name>dfs.data.dir</name>

    <value>file:///mnt/hadoopData/hadoop/hdfs/datanode</value>

</property>

<property>

<name>dfs.nameservices</name>

<value>hdfsCluster</value>

</property>

<property>

  <name>dfs.ha.namenodes.hdfsCluster</name>

  <value>nn1,nn2</value>

</property>

 

<property>

  <name>dfs.namenode.rpc-address.hdfsCluster.nn1</name>

  <value>node1:8020</value>

</property>

<property>

  <name>dfs.namenode.rpc-address.hdfsCluster.nn2</name>

  <value>node22:8020</value>

</property>

 

<property>

  <name>dfs.namenode.http-address.hdfsCluster.nn1</name>

  <value>node1:50070</value>

</property>

<property>

  <name>dfs.namenode.http-address.hdfsCluster.nn2</name>

  <value>node2:50070</value>

</property>

 

<property>

  <name>dfs.namenode.shared.edits.dir</name>

  <value>qjournal://node1:8485;node2:8485;node3:8485;node4:8485;node5:8485/hdfsCluster</value>

</property>

<property>

  <name>dfs.client.failover.proxy.provider.hdfsCluster</name>

  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

</property>

<property>

   <name>ha.zookeeper.quorum</name>

   <value>node1:2181,node2:2181,node3:2181,node4:2181,node5:2181</value>

</property>

<property>

<name>dfs.ha.fencing.methods</name>

<value>sshfence</value>

</property>

<property>

<name>dfs.ha.fencing.ssh.private-key-files</name>

<value>/home/hadoop/.ssh/id_rsa</value>

</property>

<property>

   <name>dfs.ha.automatic-failover.enabled</name>

   <value>true</value>

</property>

</configuration>

 

 

From: Ravi Prakash [mailto:ravihadoop@gmail.com] 
Sent: 22 June 2017 02:38
To: omprakash <omprakashp@cdac.in>
Cc: user <user@hadoop.apache.org>
Subject: Re: Lots of warning messages and exception in namenode logs

 

Hi Omprakash!

What is your default replication set to? What kind of disks do your datanodes have? Were you able to start a cluster with a simple configuration before you started tuning it?

HDFS tries to create the default number of replicas for a block on different datanodes. The Namenode tries to give a list of datanodes that the client can write replicas of the block to. If the Namenode is not able to construct a list with adequate number of datanodes, you will see the message you are seeing. This may mean that datanodes are unhealthy (failed disks), or full (disks have no more space), being decomissioned ( HDFS will not write replicas on decomissioning datanodes) or misconfigured ( I'd suggest turning on storage classes only after a simple configuration works).

When a client that was trying to write a file was killed (e.g. if you killed your MR job), after some time (hard limit expiring) the Namenode will try to recover the file. In your case the namenode is also not able to find enough datanodes for recovering the files.

 

HTH

Ravi

 

 

On Tue, Jun 20, 2017 at 11:50 PM, omprakash <omprakashp@cdac.in> wrote:

Hi,

 

I am receiving lots of  warning messages in namenodes logs on ACTIVE NN in my HA Hadoop setup. Below are the logs

 

“2017-06-21 12:11:26,523 WARN org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough replicas: expected size is 1 but only 0 storage types can be selected (replication=2, selected=[], unavailable=[DISK], removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})

2017-06-21 12:11:26,523 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 2 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types are unavailable:  unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}

2017-06-21 12:11:26,523 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_1073894332_153508, replicas=192.168.9.174:50010 for /36962._COPYING_

2017-06-21 12:11:26,810 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /36962._COPYING_ is closed by DFSClient_NONMAPREDUCE_146762699_1

2017-06-21 12:11:30,626 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 2 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology

2017-06-21 12:11:30,626 WARN org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough replicas: expected size is 1 but only 0 storage types can be selected (replication=2, selected=[], unavailable=[DISK], removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})”

 

I am also encountering exceptions in active namenode related to LeaseManager

 

2017-06-21 12:13:16,706 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1] has expired hard limit

2017-06-21 12:13:16,706 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1], src=/user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79

2017-06-21 12:13:16,706 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: Failed to release lease for file /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79. Committed blocks are waiting to be minimally replicated. Try again later.

2017-06-21 12:13:16,706 ERROR org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79 in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1]

org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* NameSystem.internalReleaseLease: Failed to release lease for file /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79. Committed blocks are waiting to be minimally replicated. Try again later.

        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3200)

        at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:383)

        at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:329)

        at java.lang.Thread.run(Thread.java:745)

 

I have checked the two datanodes. Both are running and have enough space for new data. 

 

PS: I have 2 Namenodes and 2 datanodes in the Hadoop HA setup. The HA is set up using the Quorum Journal Manager and ZooKeeper.

 

Any idea why these errors?

 

Regards

Omprakash Paliwal

HPC-Medical and Bioinformatics Applications Group

Centre for Development of Advanced Computing (C-DAC)

Pune University campus,

PUNE-411007

Maharashtra, India

email: omprakashp@cdac.in

Contact : +91-20-25704231

 


------------------------------------------------------------------------------------------------------------------------------- 
[ C-DAC is on Social-Media too. Kindly follow us at: 
Facebook:  <https://www.facebook.com/CDACINDIA> https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ] 

This e-mail is for the sole use of the intended recipient(s) and may 
contain confidential and privileged information. If you are not the 
intended recipient, please contact the sender by reply e-mail and destroy 
all copies and the original message. Any unauthorized review, use, 
disclosure, dissemination, forwarding, printing or copying of this email 
is strictly prohibited and appropriate legal action will be taken. 
------------------------------------------------------------------------------------------------------------------------------- 

 




Re: Lots of warning messages and exception in namenode logs

Posted by Ravi Prakash <ra...@gmail.com>.
Hi Omprakash!

This is *not* ok. Please go through the datanode logs of the inactive
datanode and figure out why it's inactive. If you set dfs.replication to 2,
at least as many datanodes (and ideally a LOT more datanodes) should be
active and participating in the cluster.

Do you have the hdfs-site.xml you posted to the mailing list on all the
nodes (including the Namenode)? Was the file containing block
*blk_1074074104_337394* created when you had the cluster misconfigured to
dfs.replication=3? You can determine which file the block belongs to using
this command:

hdfs fsck -blockId blk_1074074104

Once you have the file, you can set its replication using
hdfs dfs -setrep 2 <Filename>

I'm guessing that you probably have a lot of files with this replication,
in which case you should set it on / (This would overwrite the replication
on all the files)

If the data on this cluster is important, I would be very worried about the
condition it's in.

HTH
Ravi

On Mon, Jun 26, 2017 at 11:22 PM, omprakash <om...@cdac.in> wrote:

> Hi all,
>
>
>
> I started the HDFS in DEBUG mode. After examining the logs I found below
> logs which read that the replication factor required is 3 (as against the
> specified *dfs.replication=2*).
>
>
>
> DEBUG BlockStateChange: BLOCK* NameSystem.UnderReplicationBlock.add:
> blk_1074074104_337394 has only 1 replicas and need 3 replicas so is added
> to neededReplications at priority level 0
>
>
>
> P.S.: I have 1 datanode active out of 2.
>
>
>
> I can also see from Namenode UI that the no. of under replicated blocks
> are growing.
>
>
>
> Any idea? Or this is OK.
>
>
>
> regards
>

RE: Lots of warning messages and exception in namenode logs

Posted by omprakash <om...@cdac.in>.
Hi all,

 

I started HDFS in DEBUG mode. After examining the logs I found the lines below, which say that the required replication factor is 3 (as against the specified dfs.replication=2). 

 

DEBUG BlockStateChange: BLOCK* NameSystem.UnderReplicationBlock.add: blk_1074074104_337394 has only 1 replicas and need 3 replicas so is added to neededReplications at priority level 0

 

P.S.: I have 1 datanode active out of 2. 

 

I can also see from the Namenode UI that the number of under-replicated blocks is growing.

 

Any idea? Or is this OK?

 

regards

 

 

From: omprakash [mailto:omprakashp@cdac.in] 
Sent: 23 June 2017 11:02
To: 'Ravi Prakash' <ra...@gmail.com>; 'Arpit Agarwal' <aa...@hortonworks.com>
Cc: 'user' <us...@hadoop.apache.org>
Subject: RE: Lots of warning messages and exception in namenode logs

 

Hi Arpit,

 

I will enable the settings as suggested and will post the results.

 

I am just curious about setting the Namenode RPC service port. As I checked the hdfs-site.xml properties, dfs.namenode.rpc-address is already set, and it serves as the default for the RPC service port as well. Does specifying a separate port have an advantage over the default one?

 

Regarding the JvmPauseMonitor error, there are 5-6 instances of it in the namenode logs. Here is one of them.

 

How do I identify the right heap size in such cases, given that I have 4 GB of RAM on the namenode VM?

 

@Ravi Since the files are very small, I have only configured a VM with 20 GB of space. The additional disk is a simple SATA disk, not an SSD. 

 

As I can see from the Namenode UI, more than 50% of the blocks are under-replicated. I now have 400K blocks, out of which 200K are under-replicated. 

I will post the results again after changing the value of dfs.namenode.replication.work.multiplier.per.iteration

 

 

Thanks 

Om Prakash

 

From: Ravi Prakash [mailto:ravihadoop@gmail.com] 
Sent: 22 June 2017 23:04
To: Arpit Agarwal <aagarwal@hortonworks.com>
Cc: omprakash <omprakashp@cdac.in>; user <user@hadoop.apache.org>
Subject: Re: Lots of warning messages and exception in namenode logs

 

Hi Omprakash!

How big are your disks? Just 20Gb? Just out of curiosity, are these SSDs?

In addition to Arpit's reply, I'm also concerned with the number of under-replicated blocks you have: Under replicated blocks: 141863

When there are fewer replicas for a block than there are supposed to be (in your case e.g. when there's 1 replica when there ought to be 2), the namenode will order the datanodes to create more replicas. The rate at which it does this is controlled by 
dfs.namenode.replication.work.multiplier.per.iteration. Given you have only 2 datanodes, you'll only be re-replicating 4 blocks every 3 seconds. So, it will take quite a while to re-replicate all the blocks. 

Also, please know that you want files to be much bigger than 1kb. Ideally you'd have a couple of blocks (blocks=128Mb) for each file. You should append to files when they are this small.

Please do let us know how things turn out.

Cheers,

Ravi

 

On Wed, Jun 21, 2017 at 11:23 PM, Arpit Agarwal <aagarwal@hortonworks.com> wrote:

Hi Omprakash,

 

Your description suggests DataNodes cannot send timely reports to the NameNode. You can check it by looking for ‘stale’ DataNodes in the NN web UI when this situation is occurring. A few ideas:

 

*	Try increasing the NameNode RPC handler count a bit (set dfs.namenode.handler.count to 20 in hdfs-site.xml).
*	Enable the NameNode service RPC port. This requires downtime and reformatting the ZKFC znode.
*	Search for JvmPauseMonitor messages in your service logs. If you see any, try increasing JVM heap for that service.
*	Enable debug logging as suggested here:

 

2017-06-21 12:11:30,626 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 2 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology

 

 

From: omprakash <omprakashp@cdac.in>
Date: Wednesday, June 21, 2017 at 9:23 PM
To: 'Ravi Prakash' <ravihadoop@gmail.com>
Cc: 'user' <user@hadoop.apache.org>
Subject: RE: Lots of warning messages and exception in namenode logs

 

Hi Ravi,

 

Pasting below my core-site and hdfs-site configurations. I have kept bare minimal configurations for my cluster. The cluster started fine and I was able to put a couple of 100K files on hdfs, but when I checked the logs there were errors/Exceptions. After a restart of the datanodes they work well for a few thousand files, but then the same problem again. No idea what is wrong. 

 

PS: I am pumping 1 file per second to hdfs with approx. size 1KB

 

I thought it might be due to a space quota on the datanodes, but here is the output of hdfs dfsadmin -report. It looks fine to me.

 

$ hdfs dfsadmin -report

 

Configured Capacity: 42005069824 (39.12 GB)

Present Capacity: 38085839568 (35.47 GB)

DFS Remaining: 34949058560 (32.55 GB)

DFS Used: 3136781008 (2.92 GB)

DFS Used%: 8.24%

Under replicated blocks: 141863

Blocks with corrupt replicas: 0

Missing blocks: 0

Missing blocks (with replication factor 1): 0

Pending deletion blocks: 0

 

-------------------------------------------------

Live datanodes (2):

 

Name: 192.168.9.174:50010 (node5)

Hostname: node5

Decommission Status : Normal

Configured Capacity: 21002534912 (19.56 GB)

DFS Used: 1764211024 (1.64 GB)

Non DFS Used: 811509424 (773.92 MB)

DFS Remaining: 17067913216 (15.90 GB)

DFS Used%: 8.40%

DFS Remaining%: 81.27%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 2

Last contact: Wed Jun 21 14:38:17 IST 2017

 

 

Name: 192.168.9.225:50010 (node4)

Hostname: node5

Decommission Status : Normal

Configured Capacity: 21002534912 (19.56 GB)

DFS Used: 1372569984 (1.28 GB)

Non DFS Used: 658353792 (627.86 MB)

DFS Remaining: 17881145344 (16.65 GB)

DFS Used%: 6.54%

DFS Remaining%: 85.14%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Wed Jun 21 14:38:19 IST 2017

 

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

  <name>fs.defaultFS</name>

  <value>hdfs://hdfsCluster</value>

</property>

<property>

  <name>dfs.journalnode.edits.dir</name>

  <value>/mnt/hadoopData/hadoop/journal/node/local/data</value>

</property>

</configuration>

 

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

<property>

  <name>dfs.name.dir</name>

    <value>file:///mnt/hadoopData/hadoop/hdfs/namenode</value>

</property>

<property>

  <name>dfs.data.dir</name>

    <value>file:///mnt/hadoopData/hadoop/hdfs/datanode</value>

</property>

<property>

<name>dfs.nameservices</name>

<value>hdfsCluster</value>

</property>

<property>

  <name>dfs.ha.namenodes.hdfsCluster</name>

  <value>nn1,nn2</value>

</property>

 

<property>

  <name>dfs.namenode.rpc-address.hdfsCluster.nn1</name>

  <value>node1:8020</value>

</property>

<property>

  <name>dfs.namenode.rpc-address.hdfsCluster.nn2</name>

  <value>node22:8020</value>

</property>

 

<property>

  <name>dfs.namenode.http-address.hdfsCluster.nn1</name>

  <value>node1:50070</value>

</property>

<property>

  <name>dfs.namenode.http-address.hdfsCluster.nn2</name>

  <value>node2:50070</value>

</property>

 

<property>

  <name>dfs.namenode.shared.edits.dir</name>

  <value>qjournal://node1:8485;node2:8485;node3:8485;node4:8485;node5:8485/hdfsCluster</value>

</property>

<property>

  <name>dfs.client.failover.proxy.provider.hdfsCluster</name>

  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

</property>

<property>

   <name>ha.zookeeper.quorum</name>

   <value>node1:2181,node2:2181,node3:2181,node4:2181,node5:2181</value>

</property>

<property>

<name>dfs.ha.fencing.methods</name>

<value>sshfence</value>

</property>

<property>

<name>dfs.ha.fencing.ssh.private-key-files</name>

<value>/home/hadoop/.ssh/id_rsa</value>

</property>

<property>

   <name>dfs.ha.automatic-failover.enabled</name>

   <value>true</value>

</property>

</configuration>

 

 

From: Ravi Prakash [mailto:ravihadoop@gmail.com]
Sent: 22 June 2017 02:38
To: omprakash <omprakashp@cdac.in>
Cc: user <user@hadoop.apache.org>
Subject: Re: Lots of warning messages and exception in namenode logs

 

Hi Omprakash!

What is your default replication set to? What kind of disks do your datanodes have? Were you able to start a cluster with a simple configuration before you started tuning it?

HDFS tries to create the default number of replicas for a block on different datanodes. The Namenode tries to give the client a list of datanodes it can write replicas of the block to. If the Namenode is not able to construct a list with an adequate number of datanodes, you will see the message you are seeing. This may mean that datanodes are unhealthy (failed disks), full (disks have no more space), being decommissioned (HDFS will not write replicas on decommissioning datanodes), or misconfigured (I'd suggest turning on storage policies only after a simple configuration works).
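
A quick way to check those possibilities (a sketch; point fsck at a narrower path if the output gets large) is:

$ hdfs fsck / -files -blocks -locations

This lists every file's blocks and their replica locations, and flags under-replicated blocks, so you can see which datanodes the Namenode is actually able to use.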

When a client that was trying to write a file is killed (e.g. if you killed your MR job), the Namenode will try to recover the file once the lease's hard limit expires. In your case the namenode is also not able to find enough datanodes for recovering the files.
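
If a file's lease stays stuck even after the datanode problem is fixed, you can also trigger lease recovery by hand — a sketch, using the path from your log (hdfs debug recoverLease exists on recent 2.x releases, and it will keep failing while the committed blocks are under-replicated):

$ hdfs debug recoverLease -path /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79 -retries 5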

 

HTH

Ravi





 

On Tue, Jun 20, 2017 at 11:50 PM, omprakash <omprakashp@cdac.in> wrote:

Hi,

 

I am receiving lots of  warning messages in namenodes logs on ACTIVE NN in my HA Hadoop setup. Below are the logs

 

“2017-06-21 12:11:26,523 WARN org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough replicas: expected size is 1 but only 0 storage types can be selected (replication=2, selected=[], unavailable=[DISK], removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})

2017-06-21 12:11:26,523 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 2 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types are unavailable:  unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}

2017-06-21 12:11:26,523 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_1073894332_153508, replicas=192.168.9.174:50010 for /36962._COPYING_

2017-06-21 12:11:26,810 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /36962._COPYING_ is closed by DFSClient_NONMAPREDUCE_146762699_1

2017-06-21 12:11:30,626 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 2 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology

2017-06-21 12:11:30,626 WARN org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough replicas: expected size is 1 but only 0 storage types can be selected (replication=2, selected=[], unavailable=[DISK], removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})”

 

I am also encountering exceptions in active namenode related to LeaseManager

 

2017-06-21 12:13:16,706 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1] has expired hard limit

2017-06-21 12:13:16,706 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1], src=/user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79

2017-06-21 12:13:16,706 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: Failed to release lease for file /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79. Committed blocks are waiting to be minimally replicated. Try again later.

2017-06-21 12:13:16,706 ERROR org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79 in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1]

org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* NameSystem.internalReleaseLease: Failed to release lease for file /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79. Committed blocks are waiting to be minimally replicated. Try again later.

        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3200)

        at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:383)

        at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:329)

        at java.lang.Thread.run(Thread.java:745)

 

I have checked the two datanodes. Both are running and have enough space for new data. 

 

PS: I have 2 namenodes and 2 datanodes in a Hadoop HA setup. HA is set up using the Quorum Journal Manager and ZooKeeper.

 

Any idea why these errors?

 

Regards

Omprakash Paliwal

HPC-Medical and Bioinformatics Applications Group

Centre for Development of Advanced Computing (C-DAC)

Pune University campus,

PUNE-411007

Maharashtra, India

email: omprakashp@cdac.in

Contact: +91-20-25704231

 




RE: Lots of warning messages and exception in namenode logs

Posted by omprakash <om...@cdac.in>.
Hi Arpit,

 

I will enable the settings as suggested and will post the results.

I am just curious about setting the NameNode service RPC port. As I checked the hdfs-site.xml properties, dfs.namenode.rpc-address is already set, and by default it also carries the service RPC traffic. Does specifying a separate port have an advantage over the default?

 

Regarding the JvmPauseMonitor error, there are 5-6 instances of it in the namenode logs. Here is one of them.

 

How do I identify the right heap size in such cases, given that I have 4 GB of RAM on the namenode VM?
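
For now I am thinking of trying something like this in hadoop-env.sh on the namenode — just my sketch, assuming roughly half of the 4 GB VM can go to the NameNode heap:

export HADOOP_NAMENODE_OPTS="-Xms2g -Xmx2g ${HADOOP_NAMENODE_OPTS}"

Please correct me if that is the wrong way to size it.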

 

@Ravi: Since the files are very small, I have only configured a VM with 20 GB of space. The additional disk is a simple SATA disk, not an SSD.

 

As I can see from the Namenode UI, more than 50% of the blocks are under-replicated: I now have 400K blocks, out of which 200K are under-replicated.

I will post the results again after changing the value of dfs.namenode.replication.work.multiplier.per.iteration

 

 

Thanks 

Om Prakash

 



Re: Lots of warning messages and exception in namenode logs

Posted by Ravi Prakash <ra...@gmail.com>.
Hi Omprakash!

How big are your disks? Just 20 GB? Just out of curiosity, are these SSDs?

In addition to Arpit's reply, I'm also concerned with the number of
under-replicated blocks you have: Under replicated blocks: 141863
When there are fewer replicas for a block than there are supposed to be (in
your case e.g. when there's 1 replica when there ought to be 2), the
namenode will order the datanodes to create more replicas. The rate at
which it does this is controlled by
dfs.namenode.replication.work.multiplier.per.iteration . Given you have
only 2 datanodes, you'll only be re-replicating 4 blocks every 3 seconds.
So, it will take quite a while to re-replicate all the blocks.
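
If you want the cluster to catch up faster, you could raise that multiplier in hdfs-site.xml — a sketch (the default is 2; 10 here is only an example, and higher values put more re-replication load on the NameNode and datanodes):

<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <value>10</value>
</property>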

Also, please know that you want files to be much bigger than 1 KB. Ideally
each file would span a couple of blocks (the default block size is 128 MB).
You should append to existing files when they are this small.

Please do let us know how things turn out.

Cheers,
Ravi


Re: Lots of warning messages and exception in namenode logs

Posted by Arpit Agarwal <aa...@hortonworks.com>.
Hi Omprakash,

Your description suggests DataNodes cannot send timely reports to the NameNode. You can check it by looking for ‘stale’ DataNodes in the NN web UI when this situation is occurring. A few ideas:


  *   Try increasing the NameNode RPC handler count a bit (set dfs.namenode.handler.count to 20 in hdfs-site.xml).
  *   Enable the NameNode service RPC port. This requires downtime and reformatting the ZKFC znode.
  *   Search for JvmPauseMonitor messages in your service logs. If you see any, try increasing JVM heap for that service.
  *   Enable debug logging as suggested here:

2017-06-21 12:11:30,626 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 2 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology



RE: Lots of warning messages and exception in namenode logs

Posted by omprakash <om...@cdac.in>.
Hi Ravi,

 

Pasting below my core-site and hdfs-site configurations. I have kept bare-minimal configurations for my cluster. The cluster started fine and I was able to put a couple of hundred thousand files on HDFS, but when I checked the logs there were errors/exceptions. After a restart the datanodes work well for a few thousand files, but then the same problem appears again. No idea what is wrong.

 

PS: I am pumping 1 file per second into HDFS, each approximately 1 KB in size.

 

I thought it may be due to a space quota on the datanodes, but here is the output of hdfs dfsadmin -report. It looks fine to me:

 

$ hdfs dfsadmin -report

Configured Capacity: 42005069824 (39.12 GB)
Present Capacity: 38085839568 (35.47 GB)
DFS Remaining: 34949058560 (32.55 GB)
DFS Used: 3136781008 (2.92 GB)
DFS Used%: 8.24%
Under replicated blocks: 141863
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.9.174:50010 (node5)
Hostname: node5
Decommission Status : Normal
Configured Capacity: 21002534912 (19.56 GB)
DFS Used: 1764211024 (1.64 GB)
Non DFS Used: 811509424 (773.92 MB)
DFS Remaining: 17067913216 (15.90 GB)
DFS Used%: 8.40%
DFS Remaining%: 81.27%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Wed Jun 21 14:38:17 IST 2017

Name: 192.168.9.225:50010 (node4)
Hostname: node5
Decommission Status : Normal
Configured Capacity: 21002534912 (19.56 GB)
DFS Used: 1372569984 (1.28 GB)
Non DFS Used: 658353792 (627.86 MB)
DFS Remaining: 17881145344 (16.65 GB)
DFS Used%: 6.54%
DFS Remaining%: 85.14%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Jun 21 14:38:19 IST 2017
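
(The 141863 under-replicated blocks above can be inspected further with fsck if needed, e.g.:)

$ hdfs fsck /user/hadoop -files -blocks -locations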

 

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdfsCluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/mnt/hadoopData/hadoop/journal/node/local/data</value>
  </property>
</configuration>

 

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///mnt/hadoopData/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///mnt/hadoopData/hadoop/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>hdfsCluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.hdfsCluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfsCluster.nn1</name>
    <value>node1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfsCluster.nn2</name>
    <value>node22:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.hdfsCluster.nn1</name>
    <value>node1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.hdfsCluster.nn2</name>
    <value>node2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node1:8485;node2:8485;node3:8485;node4:8485;node5:8485/hdfsCluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.hdfsCluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>node1:2181,node2:2181,node3:2181,node4:2181,node5:2181</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
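
(With automatic failover enabled as above, the current HA state of each NameNode ID defined in dfs.ha.namenodes.hdfsCluster can be checked with:)

$ hdfs haadmin -getServiceState nn1
$ hdfs haadmin -getServiceState nn2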

 

 

-------------------------------------------------------------------------------------------------------------------------------
[ C-DAC is on Social-Media too. Kindly follow us at:
Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]

This e-mail is for the sole use of the intended recipient(s) and may
contain confidential and privileged information. If you are not the
intended recipient, please contact the sender by reply e-mail and destroy
all copies and the original message. Any unauthorized review, use,
disclosure, dissemination, forwarding, printing or copying of this email
is strictly prohibited and appropriate legal action will be taken.
-------------------------------------------------------------------------------------------------------------------------------


Re: Lots of warning messages and exception in namenode logs

Posted by Ravi Prakash <ra...@gmail.com>.
Hi Omprakash!

What is your default replication set to? What kind of disks do your
datanodes have? Were you able to start a cluster with a simple
configuration before you started tuning it?

HDFS tries to create the default number of replicas for a block on
different datanodes. The Namenode tries to give the client a list of
datanodes that it can write replicas of the block to. If the Namenode is
not able to construct a list with an adequate number of datanodes, you will
see the message you are seeing. This may mean that datanodes are unhealthy
(failed disks), full (disks have no more space), being decommissioned (HDFS
will not write replicas on decommissioning datanodes), or misconfigured
(I'd suggest turning on storage policies only after a simple configuration works).
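
If you want to rule out storage policies, newer Hadoop releases ship an
hdfs storagepolicies tool; the path below is just an example:

$ hdfs storagepolicies -listPolicies
$ hdfs storagepolicies -getStoragePolicy -path /user/hadoop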

When a client that was writing a file is killed (e.g. if you kill your MR
job), the Namenode will try to recover the file after some time (once the
hard lease limit expires). In your case the Namenode is also not able to
find enough datanodes for recovering the files.
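
If a particular file stays stuck after its writer died, you can also
trigger lease recovery by hand (the hdfs debug subcommand is available in
Hadoop 2.7+; the path is taken from your log and -retries is optional):

$ hdfs debug recoverLease -path /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79 -retries 3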

HTH
Ravi




