Posted to common-user@hadoop.apache.org by Chackravarthy Esakkimuthu <ch...@gmail.com> on 2016/04/29 13:30:21 UTC

Guideline on setting Namenode RPC Handler count (client and service)

Hi,

Is there any recommendation or guideline on setting the number of RPC
handlers in the Namenode based on cluster size (number of datanodes)?

Cluster details :

No of datanodes - 1200
NN hardware - 74 GB heap allocated to the NN process, 40-core machine
Total blocks - 80M+
Total Files/Directories - 60M+
Total FSObjects - 150M+

We have isolated service and client RPC by enabling service-rpc.

Currently dfs.namenode.handler.count=400 and
dfs.namenode.service.handler.count=200
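
For reference, the relevant entries in our hdfs-site.xml look roughly like
the sketch below (the service RPC address value is only an illustrative
placeholder, not our real host):

  <!-- hdfs-site.xml (sketch; nn-host:8040 is a placeholder address) -->
  <property>
    <name>dfs.namenode.servicerpc-address</name>
    <value>nn-host:8040</value>
    <!-- separate endpoint so DN traffic (heartbeats, block reports) does
         not compete with client RPC for the same handler threads -->
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>400</value>  <!-- handlers for the client RPC server -->
  </property>
  <property>
    <name>dfs.namenode.service.handler.count</name>
    <value>200</value>  <!-- handlers for the service RPC server -->
  </property>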

Is 200 a good fit for this cluster, or is any change recommended? Please help out.

Thanks in advance!

(We tried increasing the service handler count to 600 and saw a delay in
NN startup time, after which it looked quite stable. Setting it to 200
reduces the startup delay, but rpcQueueTime and rpcAvgProcessingTime are
slightly higher compared to the 600 handler count.)

Thanks,
Chackra

Re: Guideline on setting Namenode RPC Handler count (client and service)

Posted by Chackravarthy Esakkimuthu <ch...@gmail.com>.
Thanks, Brahma, for the reply.

Will look into the issue you mentioned. (Yes, we are using 2.6.0 (HDP 2.2).)

RE: Guideline on setting Namenode RPC Handler count (client and service)

Posted by Brahma Reddy Battula <br...@huawei.com>.
I hope you are using the hadoop-2.6 release.

Since you are targeting the amount of time the block report takes to get processed, your proposed config options (changing ipc.ping.interval and the split threshold) should be fine; I mean the 2nd and 3rd options.

You can try them once and let us know.


I had seen a related issue recently; maybe you can have a look at HDFS-10301.



--Brahma Reddy Battula

Re: Guideline on setting Namenode RPC Handler count (client and service)

Posted by Chackravarthy Esakkimuthu <ch...@gmail.com>.
To add more details on why NN startup was delayed when the handler count
was set to 600:

We are seeing many duplicate full block reports (FBRs) from most of the
DNs for a long time (around 3 hours after NN startup), even though the NN
comes out of safe mode in 10 or 15 minutes. Because the NN is already out
of safe mode, the duplicate FBRs are not rejected.

This is because the DN times out (ipc.ping.interval=60s by default) on the
block report RPC call before the NN finishes processing it (which takes
around 70-80 seconds). Hence the DN does not realise the FBR was processed
and keeps trying to send it again, while the NN has already processed it
and only hits an error when sending the response.
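
For context, the timeout we are hitting is the stock default. A minimal
core-site.xml sketch of the relevant setting (the value shown is the
Hadoop default, included here only for illustration):

  <property>
    <name>ipc.ping.interval</name>
    <value>60000</value>
    <!-- 60s in milliseconds; per the behaviour described above, the DN's
         block report call is given up on at this interval while the NN is
         still processing it (~70-80s) -->
  </property>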

The reason why the NN takes more than 1 minute to process an FBR:

   - An FBR contains an array of storageBlockReports (the number of data
   directories configured is 10).
   - The namesystem write lock is acquired while processing each
   storageBlockReport, so a single handler thread cannot finish the whole
   FBR in one go once it first acquires the lock.
   - There is lock contention with the other 599 handler threads, which are
   also busy processing FBRs from all the DNs. Acquiring the lock therefore
   gets delayed before each subsequent storageBlockReport is processed:
      - t -> storageBlockReport[0]   --> handler thread starts FBR
      processing.
      - t + 5s -> storageBlockReport[1]
      - t + 12s -> storageBlockReport[2]
      - ...
      - ...
      - t + 70s -> storageBlockReport[9]  --> handler thread completes FBR
      processing.


We are looking for suggestions to resolve this delayed NN start. (Delayed
start means that even though the NN comes out of safe mode, service RPC
latency remains high because of the duplicate FBRs, and heartbeats are
skipped for more than 1 minute continuously.)

Possible config options are (a config sketch of these changes follows the
list):

   1. The current value of dfs.blockreport.initialDelay is 120s. This can
   be increased to 10-15 minutes to avoid a block report storm.
   2. Increase ipc.ping.interval from 60s to 90s or so.
   3. Decrease dfs.blockreport.split.threshold to 100k (from 1M) so that
   the DN sends a separate block report RPC per storage. The DN would then
   get each response from the NN quickly, but this would delay the
   heartbeat, since each RPC call might consume up to the 60s timeout; the
   heartbeat might be delayed by up to ~590s in the worst case (if all RPC
   calls succeed after consuming 59s each).
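
Concretely, the changes under discussion would look something like the
sketch below; the values are the ones proposed above (options 1 and 3 in
hdfs-site.xml, option 2 in core-site.xml), not tested recommendations:

  <!-- hdfs-site.xml -->
  <property>
    <name>dfs.blockreport.initialDelay</name>
    <value>900</value>
    <!-- seconds; spread the initial FBRs over ~15 minutes -->
  </property>
  <property>
    <name>dfs.blockreport.split.threshold</name>
    <value>100000</value>
    <!-- send one block report RPC per storage once a DN holds more than
         100k blocks -->
  </property>

  <!-- core-site.xml -->
  <property>
    <name>ipc.ping.interval</name>
    <value>90000</value>
    <!-- milliseconds; 90s instead of the 60s default -->
  </property>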

Or can we move the write lock to a higher level: take it once, process all
the storageBlockReports, and release it? From the logs, we have seen that
processing each storageBlockReport takes 20ms-100ms, so a single FBR would
hold the lock for about 1s. Also, since FBR calls are not that frequent (a
block report once every 6 hours in our cluster, or when a disk failure
happens), is it OK to reduce the lock granularity this way?

Please give suggestions on this, and correct me if I am wrong.

Thanks,
Chackra


On Mon, May 2, 2016 at 2:12 PM, Gokul <go...@gmail.com> wrote:

> *bump*
>
> --
> Thanks and Regards,
> Gokul
>