Posted to common-user@hadoop.apache.org by selva <se...@gmail.com> on 2013/04/27 20:03:21 UTC

High IO Usage in Datanodes due to Replication

Hi All,

I lost the Amazon instances of my Hadoop cluster, but I had all the data
on AWS EBS volumes, so I launched new instances and attached the volumes.

Now all of the datanode logs keep printing the lines below, and the
resulting IO rate is so high that I am not able to run any jobs.

Can anyone help me understand what the datanodes are doing? Thanks in advance.

2013-04-27 17:51:40,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.157.10.242:10013, storageID=DS-407656544-10.28.217.27-10013-1353165843727, infoPort=15075, ipcPort=10014) Starting thread to transfer block blk_2440813767266473910_11564425 to 10.168.18.178:10013
2013-04-27 17:51:40,230 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.157.10.242:10013, storageID=DS-407656544-10.28.217.27-10013-1353165843727, infoPort=15075, ipcPort=10014):Transmitted block blk_2440813767266473910_11564425 to /10.168.18.178:10013
2013-04-27 17:51:40,433 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_2442656050740605335_10906493 src: /10.171.11.11:60744 dest: /10.157.10.242:10013
2013-04-27 17:51:40,450 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_2442656050740605335_10906493 src: /10.171.11.11:60744 dest: /10.157.10.242:10013 of size 25431

Thanks
Selva

RE: High IO Usage in Datanodes due to Replication

Posted by "S, Manoj" <ma...@intel.com>.
Adding to Harsh's comments:

You can also tweak a few OS-level parameters to improve I/O performance (a rough sketch of each step follows the list):
1) Mount the filesystem with the "noatime" option.
2) Check whether changing the I/O scheduling algorithm improves the cluster's performance (see /sys/block/<device_name>/queue/scheduler).
3) If the cluster hangs under a flood of I/O requests, you can increase the queue length by raising the value in /sys/block/<device_name>/queue/nr_requests.
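For reference, a minimal sketch of all three steps as root. The device and
mount point names (/dev/xvdf, /data) are only examples; check your own with
lsblk or mount before copying anything:

    # 1) Remount the data filesystem with noatime (add it to /etc/fstab
    #    as well to make the change survive a reboot)
    mount -o remount,noatime /data

    # 2) Show the current I/O scheduler (the active one is in [brackets])
    #    and switch it, e.g. to deadline
    cat /sys/block/xvdf/queue/scheduler
    echo deadline > /sys/block/xvdf/queue/scheduler

    # 3) Show and raise the request queue length, e.g. to 512
    cat /sys/block/xvdf/queue/nr_requests
    echo 512 > /sys/block/xvdf/queue/nr_requests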

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Sunday, April 28, 2013 12:03 AM
To: <us...@hadoop.apache.org>
Subject: Re: High IO Usage in Datanodes due to Replication

They seem to be transferring blocks between one another. This is most likely due to under-replication, and the NN UI will show how much re-replication work is left to perform. The inter-DN transfer is controlled by the balancing bandwidth, though, so you can lower that to throttle it - but you will then wait longer to get back to a fully replicated state.




--
Harsh J


Re: High IO Usage in Datanodes due to Replication

Posted by Harsh J <ha...@cloudera.com>.
They seem to be transferring blocks between one another. This is most
likely due to under-replication, and the NN UI will show how much
re-replication work is left to perform. The inter-DN transfer is
controlled by the balancing bandwidth, though, so you can lower that to
throttle it - but you will then wait longer to get back to a fully
replicated state.
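In case it helps, a minimal sketch of both checks. The property name varies
across releases, so treat the details below as assumptions to verify against
your version's docs:

    # See how much re-replication work is still pending
    hadoop fsck / | grep -i 'under-replicated'

    # Throttle inter-datanode replication traffic, e.g. to 1 MB/s.
    # On Hadoop 1.x the property is dfs.balance.bandwidthPerSec
    # (dfs.datanode.balance.bandwidthPerSec on later releases); set it
    # in hdfs-site.xml and restart the datanodes, or push it out live
    # if your version has the dfsadmin command:
    hadoop dfsadmin -setBalancerBandwidth 1048576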




-- 
Harsh J
