Posted to user@hadoop.apache.org by Siddharth Tiwari <si...@live.com> on 2014/03/03 01:28:06 UTC
Huge disk IO on only one disk
Hi Team,
I have 10 disks backing my HDFS. hadoop.tmp.dir is configured on disk5, and I see much heavier I/O on that disk than on the others when I run jobs. Can you guide me to the standard practice for distributing this I/O across the other disks as well, and for setting the hadoop.tmp.dir parameter? Any help would be highly appreciated. Below is the I/O while a large job is running.
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 2.11 37.65 226.20 313512628 1883809216
sdb 1.47 96.44 152.48 803144582 1269829840
sdc 1.45 93.03 153.10 774765734 1274979080
sdd 1.46 95.06 152.73 791690022 1271944848
sde 1.47 92.70 153.24 772025750 1276195288
sdf 1.55 95.77 153.06 797567654 1274657320
sdg 10.10 364.26 1951.79 3033537062 16254346480
sdi 1.46 94.82 152.98 789646630 1274014936
sdh 1.44 94.09 152.57 783547390 1270598232
sdj 1.44 91.94 153.37 765678470 1277220208
sdk 1.52 97.01 153.02 807928678 1274300360
*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.”
"Maybe other people will try to limit me but I don't limit myself"
RE: Huge disk IO on only one disk
Posted by Siddharth Tiwari <si...@live.com>.
Thanks Brahma,
That answers my question.
*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.”
"Maybe other people will try to limit me but I don't limit myself"
From: brahmareddy.battula@huawei.com
To: user@hadoop.apache.org
Subject: RE: Huge disk IO on only one disk
Date: Mon, 3 Mar 2014 06:51:30 +0000
What should be the standard around setting up the hadoop.tmp.dir parameter.
>>>>>>>> As far as I know, hadoop.tmp.dir serves only as the default base for the following properties. If you configure these properties explicitly, you do not need to configure hadoop.tmp.dir at all:
MapReduce:
mapreduce.cluster.local.dir
${hadoop.tmp.dir}/mapred/local
The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.
mapreduce.jobtracker.system.dir
${hadoop.tmp.dir}/mapred/system
The directory where MapReduce stores control files.
mapreduce.jobtracker.staging.root.dir
${hadoop.tmp.dir}/mapred/staging
The root of the staging area for users' job files. In practice, this should be the directory where users' home directories are located (usually /user).
mapreduce.cluster.temp.dir
${hadoop.tmp.dir}/mapred/temp
A shared directory for temporary files.
Yarn :
yarn.nodemanager.local-dirs
${hadoop.tmp.dir}/nm-local-dir
List of directories to store localized files in. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this.
HDFS :
dfs.namenode.name.dir
file://${hadoop.tmp.dir}/dfs/name
Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
dfs.datanode.data.dir
file://${hadoop.tmp.dir}/dfs/data
Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
dfs.namenode.checkpoint.dir
file://${hadoop.tmp.dir}/dfs/namesecondary
Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.
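Since every property above defaults to a path under ${hadoop.tmp.dir}, pointing hadoop.tmp.dir at a single disk concentrates all of that I/O on that one device. A minimal sketch of how the base directory is set in core-site.xml (the path is hypothetical, not taken from the thread):

```xml
<!-- core-site.xml: hypothetical example; the path is illustrative only.
     Every property listed above that is left unset will resolve to a
     subdirectory of this single path, so all its I/O lands on one disk. -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/disk5/hadoop/tmp</value>
</property>
```

This is why configuring the derived properties explicitly, rather than relying on the hadoop.tmp.dir defaults, is what spreads the load.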
Thanks & Regards
Brahma Reddy Battula
From: Siddharth Tiwari [siddharth.tiwari@live.com]
Sent: Monday, March 03, 2014 11:20 AM
To: USers Hadoop
Subject: RE: Huge disk IO on only one disk
Hi Brahma,
No I haven't. I have put a comma-separated list of disks in dfs.datanode.data.dir, and disk5 for hadoop.tmp.dir. My question is: should we set up hadoop.tmp.dir at all, and if so, what are the standards around it?
*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.”
"Maybe other people will try to limit me but I don't limit myself"
From: brahmareddy.battula@huawei.com
To: user@hadoop.apache.org
Subject: RE: Huge disk IO on only one disk
Date: Mon, 3 Mar 2014 05:14:34 +0000
It seems you started the cluster with default values for the following two properties and configured only hadoop.tmp.dir.
dfs.datanode.data.dir ---> file://${hadoop.tmp.dir}/dfs/data (Default value)
>>>>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
yarn.nodemanager.local-dirs --> ${hadoop.tmp.dir}/nm-local-dir (Default value)
>>>>>>Stores localized files; these are essentially intermediate files.
Please configure the above two values as multiple directories.
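A minimal sketch of what that configuration could look like, assuming hypothetical mount points /data/disk1, /data/disk2, and so on (the actual paths in the original cluster are not given in the thread):

```xml
<!-- hdfs-site.xml: spread DataNode block storage across the data disks.
     Mount points below are hypothetical; extend the list to all ten disks,
     with each directory on a separate physical device. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data/disk1/dfs/data,file:///data/disk2/dfs/data,file:///data/disk3/dfs/data</value>
</property>

<!-- yarn-site.xml: spread NodeManager local (intermediate) files the same way -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/disk1/nm-local-dir,/data/disk2/nm-local-dir,/data/disk3/nm-local-dir</value>
</property>
```

With both properties set explicitly, only incidental temporary data still falls back to hadoop.tmp.dir, and the heavy job I/O is striped across all the listed devices.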
Thanks & Regards
Brahma Reddy Battula
From: Siddharth Tiwari [siddharth.tiwari@live.com]
Sent: Monday, March 03, 2014 5:58 AM
To: USers Hadoop
Subject: Huge disk IO on only one disk
Hi Team,
I have 10 disks backing my HDFS. hadoop.tmp.dir is configured on disk5, and I see much heavier I/O on that disk than on the others when I run jobs. Can you guide me to the standard practice for distributing this I/O across the other disks as well?
What should be the standard around setting up the hadoop.tmp.dir parameter?
Any help would be highly appreciated. Below is the I/O while a large job is running.
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 2.11 37.65 226.20 313512628 1883809216
sdb 1.47 96.44 152.48 803144582 1269829840
sdc 1.45 93.03 153.10 774765734 1274979080
sdd 1.46 95.06 152.73 791690022 1271944848
sde 1.47 92.70 153.24 772025750 1276195288
sdf 1.55 95.77 153.06 797567654 1274657320
sdg 10.10 364.26 1951.79 3033537062 16254346480
sdi 1.46 94.82 152.98 789646630 1274014936
sdh 1.44 94.09 152.57 783547390 1270598232
sdj 1.44 91.94 153.37 765678470 1277220208
sdk 1.52 97.01 153.02 807928678 1274300360
*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.”
"Maybe other people will try to limit me but I don't limit myself"
RE: Huge disk IO on only one disk
Posted by Brahma Reddy Battula <br...@huawei.com>.
What should be the standard around setting up the hadoop.tmp.dir parameter.
>>>>>>>> As far as I know, hadoop.tmp.dir serves only as the default base for the following properties. If you configure these properties explicitly, you do not need to configure hadoop.tmp.dir at all:
MapReduce:
mapreduce.cluster.local.dir ${hadoop.tmp.dir}/mapred/local The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.
mapreduce.jobtracker.system.dir ${hadoop.tmp.dir}/mapred/system The directory where MapReduce stores control files.
mapreduce.jobtracker.staging.root.dir ${hadoop.tmp.dir}/mapred/staging The root of the staging area for users' job files. In practice, this should be the directory where users' home directories are located (usually /user).
mapreduce.cluster.temp.dir ${hadoop.tmp.dir}/mapred/temp A shared directory for temporary files.
Yarn :
yarn.nodemanager.local-dirs ${hadoop.tmp.dir}/nm-local-dir List of directories to store localized files in. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this.
HDFS :
dfs.namenode.name.dir file://${hadoop.tmp.dir}/dfs/name Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
dfs.datanode.data.dir file://${hadoop.tmp.dir}/dfs/data Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
dfs.namenode.checkpoint.dir file://${hadoop.tmp.dir}/dfs/namesecondary Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.
Thanks & Regards
Brahma Reddy Battula
________________________________
From: Siddharth Tiwari [siddharth.tiwari@live.com]
Sent: Monday, March 03, 2014 11:20 AM
To: USers Hadoop
Subject: RE: Huge disk IO on only one disk
Hi Brahma,
No I haven't. I have put a comma-separated list of disks in dfs.datanode.data.dir, and disk5 for hadoop.tmp.dir. My question is: should we set up hadoop.tmp.dir at all, and if so, what are the standards around it?
*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.”
"Maybe other people will try to limit me but I don't limit myself"
________________________________
From: brahmareddy.battula@huawei.com
To: user@hadoop.apache.org
Subject: RE: Huge disk IO on only one disk
Date: Mon, 3 Mar 2014 05:14:34 +0000
It seems you started the cluster with default values for the following two properties and configured only hadoop.tmp.dir.
dfs.datanode.data.dir ---> file://${hadoop.tmp.dir}/dfs/data (Default value)
>>>>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
yarn.nodemanager.local-dirs --> ${hadoop.tmp.dir}/nm-local-dir (Default value)
>>>>>>Stores localized files; these are essentially intermediate files.
Please configure the above two values as multiple directories.
Thanks & Regards
Brahma Reddy Battula
________________________________
From: Siddharth Tiwari [siddharth.tiwari@live.com]
Sent: Monday, March 03, 2014 5:58 AM
To: USers Hadoop
Subject: Huge disk IO on only one disk
Hi Team,
I have 10 disks backing my HDFS. hadoop.tmp.dir is configured on disk5, and I see much heavier I/O on that disk than on the others when I run jobs. Can you guide me to the standard practice for distributing this I/O across the other disks as well?
What should be the standard around setting up the hadoop.tmp.dir parameter?
Any help would be highly appreciated. Below is the I/O while a large job is running.
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 2.11 37.65 226.20 313512628 1883809216
sdb 1.47 96.44 152.48 803144582 1269829840
sdc 1.45 93.03 153.10 774765734 1274979080
sdd 1.46 95.06 152.73 791690022 1271944848
sde 1.47 92.70 153.24 772025750 1276195288
sdf 1.55 95.77 153.06 797567654 1274657320
sdg 10.10 364.26 1951.79 3033537062 16254346480
sdi 1.46 94.82 152.98 789646630 1274014936
sdh 1.44 94.09 152.57 783547390 1270598232
sdj 1.44 91.94 153.37 765678470 1277220208
sdk 1.52 97.01 153.02 807928678 1274300360
*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.”
"Maybe other people will try to limit me but I don't limit myself"
RE: Huge disk IO on only one disk
Posted by Brahma Reddy Battula <br...@huawei.com>.
What should be the standard around setting up the hadoop.tmp.dir parameter.
>>>>>>>> As I know hadoop.tmp.dir will be used for follow properites, If you are configuring following properties,then you no need to configure this one..
MapReduce:
mapreduce.cluster.local.dir ${hadoop.tmp.dir}/mapred/local The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.
mapreduce.jobtracker.system.dir ${hadoop.tmp.dir}/mapred/system The directory where MapReduce stores control files.
mapreduce.jobtracker.staging.root.dir ${hadoop.tmp.dir}/mapred/staging The root of the staging area for users' job files In practice, this should be the directory where users' home directories are located (usually /user)
mapreduce.cluster.temp.dir ${hadoop.tmp.dir}/mapred/temp A shared directory for temporary files.
Yarn :
yarn.nodemanager.local-dirs ${hadoop.tmp.dir}/nm-local-dir List of directories to store localized files in. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this.
HDFS :
dfs.namenode.name.dir file://${hadoop.tmp.dir}/dfs/name Determines where on the local filesystem the DFS name node should store the name table(fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
dfs.datanode.data.dir file://${hadoop.tmp.dir}/dfs/data Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
dfs.namenode.checkpoint.dir file://${hadoop.tmp.dir}/dfs/namesecondary Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.
Thanks & Regards
Brahma Reddy Battula
________________________________
From: Siddharth Tiwari [siddharth.tiwari@live.com]
Sent: Monday, March 03, 2014 11:20 AM
To: USers Hadoop
Subject: RE: Huge disk IO on only one disk
Hi Brahma,
No I havnt, I have put comma separated list of disks here dfs.datanode.data.dir . Have put disk5 for hadoop.tmp.dir. My Q is, should we set up hadoop.tmp.dir or not ? if yes what should be standards around.
*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.”
"Maybe other people will try to limit me but I don't limit myself"
________________________________
From: brahmareddy.battula@huawei.com
To: user@hadoop.apache.org
Subject: RE: Huge disk IO on only one disk
Date: Mon, 3 Mar 2014 05:14:34 +0000
It seems you started the cluster with the default values for the following two properties and configured only hadoop.tmp.dir.
dfs.datanode.data.dir ---> file://${hadoop.tmp.dir}/dfs/data (default value)
>>>>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
yarn.nodemanager.local-dirs --> ${hadoop.tmp.dir}/nm-local-dir (default value)
>>>>>>Stores localized files; these are intermediate files.
Please configure the above two values as multiple directories.
Thanks & Regards
Brahma Reddy Battula
________________________________
From: Siddharth Tiwari [siddharth.tiwari@live.com]
Sent: Monday, March 03, 2014 5:58 AM
To: USers Hadoop
Subject: Huge disk IO on only one disk
Hi Team,
I have 10 disks over which I am running my HDFS. Of these, disk5 is where I have my hadoop.tmp.dir configured. I see that this disk has huge I/O when I run my jobs, compared to the other disks. Can you guide me to the standards to follow so that this I/O is distributed across the other disks as well.
What should be the standard around setting up the hadoop.tmp.dir parameter?
Any help would be highly appreciated. Below is the I/O while I am running a huge job.
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 2.11 37.65 226.20 313512628 1883809216
sdb 1.47 96.44 152.48 803144582 1269829840
sdc 1.45 93.03 153.10 774765734 1274979080
sdd 1.46 95.06 152.73 791690022 1271944848
sde 1.47 92.70 153.24 772025750 1276195288
sdf 1.55 95.77 153.06 797567654 1274657320
sdg 10.10 364.26 1951.79 3033537062 16254346480
sdi 1.46 94.82 152.98 789646630 1274014936
sdh 1.44 94.09 152.57 783547390 1270598232
sdj 1.44 91.94 153.37 765678470 1277220208
sdk 1.52 97.01 153.02 807928678 1274300360
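As a quick sanity check, output in this `iostat -d` style can be parsed to confirm which device dominates the write traffic. A small sketch, using sample rows copied from the numbers above:

```python
# Rank devices from an `iostat -d` style report by write throughput
# (the Blk_wrtn/s column) to confirm which disk is the hotspot.
IOSTAT = """\
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 2.11 37.65 226.20 313512628 1883809216
sdb 1.47 96.44 152.48 803144582 1269829840
sdg 10.10 364.26 1951.79 3033537062 16254346480
sdk 1.52 97.01 153.02 807928678 1274300360
"""

def hottest_device(report: str) -> str:
    """Return the device name with the highest Blk_wrtn/s value."""
    best_dev, best_wrtn = None, -1.0
    for line in report.strip().splitlines()[1:]:   # skip the header row
        fields = line.split()
        dev, wrtn_per_s = fields[0], float(fields[3])
        if wrtn_per_s > best_wrtn:
            best_dev, best_wrtn = dev, wrtn_per_s
    return best_dev

print(hottest_device(IOSTAT))  # sdg
```

Here sdg stands out with roughly 13x the write rate of its neighbours, which matches the full table above.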
*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.”
"Maybe other people will try to limit me but I don't limit myself"
RE: Huge disk IO on only one disk
Posted by Brahma Reddy Battula <br...@huawei.com>.
What should be the standard around setting up the hadoop.tmp.dir parameter.
>>>>>>>> As far as I know, hadoop.tmp.dir is used as the base for the following properties. If you configure the following properties explicitly, then you do not need to configure hadoop.tmp.dir at all.
MapReduce:
mapreduce.cluster.local.dir ${hadoop.tmp.dir}/mapred/local The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk I/O. Directories that do not exist are ignored.
mapreduce.jobtracker.system.dir ${hadoop.tmp.dir}/mapred/system The directory where MapReduce stores control files.
mapreduce.jobtracker.staging.root.dir ${hadoop.tmp.dir}/mapred/staging The root of the staging area for users' job files. In practice, this should be the directory where users' home directories are located (usually /user).
mapreduce.cluster.temp.dir ${hadoop.tmp.dir}/mapred/temp A shared directory for temporary files.
Yarn :
yarn.nodemanager.local-dirs ${hadoop.tmp.dir}/nm-local-dir List of directories to store localized files in. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this.
HDFS :
dfs.namenode.name.dir file://${hadoop.tmp.dir}/dfs/name Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
dfs.datanode.data.dir file://${hadoop.tmp.dir}/dfs/data Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
dfs.namenode.checkpoint.dir file://${hadoop.tmp.dir}/dfs/namesecondary Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.
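To make the override concrete, here is a minimal hdfs-site.xml sketch that spreads DataNode block storage across several disks instead of inheriting the single ${hadoop.tmp.dir} location. The /data/diskN mount points are assumptions for illustration; substitute your actual mount paths:

```xml
<!-- hdfs-site.xml (sketch): spread DataNode blocks over multiple disks.
     The /data/diskN mount points are placeholders, not real paths. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data/disk1/dfs/data,file:///data/disk2/dfs/data,file:///data/disk3/dfs/data</value>
</property>
```

With a list like this, the DataNode round-robins new block writes across all the named directories, so read/write load is spread over the devices rather than landing on one disk.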
Thanks & Regards
Brahma Reddy Battula
________________________________
From: Siddharth Tiwari [siddharth.tiwari@live.com]
Sent: Monday, March 03, 2014 11:20 AM
To: USers Hadoop
Subject: RE: Huge disk IO on only one disk
Hi Brahma,
No I haven't. I have put a comma-separated list of disks in dfs.datanode.data.dir, and disk5 for hadoop.tmp.dir. My question is: should we set up hadoop.tmp.dir at all, and if yes, what are the standards around it?
*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.”
"Maybe other people will try to limit me but I don't limit myself"
________________________________
From: brahmareddy.battula@huawei.com
To: user@hadoop.apache.org
Subject: RE: Huge disk IO on only one disk
Date: Mon, 3 Mar 2014 05:14:34 +0000
It seems you started the cluster with the default values for the following two properties and configured only hadoop.tmp.dir.
dfs.datanode.data.dir ---> file://${hadoop.tmp.dir}/dfs/data (default value)
>>>>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
yarn.nodemanager.local-dirs --> ${hadoop.tmp.dir}/nm-local-dir (default value)
>>>>>>Stores localized files; these are intermediate files.
Please configure the above two values as multiple directories.
Thanks & Regards
Brahma Reddy Battula
________________________________
From: Siddharth Tiwari [siddharth.tiwari@live.com]
Sent: Monday, March 03, 2014 5:58 AM
To: USers Hadoop
Subject: Huge disk IO on only one disk
Hi Team,
I have 10 disks over which I am running my HDFS. Of these, disk5 is where I have my hadoop.tmp.dir configured. I see that this disk has huge I/O when I run my jobs, compared to the other disks. Can you guide me to the standards to follow so that this I/O is distributed across the other disks as well.
What should be the standard around setting up the hadoop.tmp.dir parameter?
Any help would be highly appreciated. Below is the I/O while I am running a huge job.
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 2.11 37.65 226.20 313512628 1883809216
sdb 1.47 96.44 152.48 803144582 1269829840
sdc 1.45 93.03 153.10 774765734 1274979080
sdd 1.46 95.06 152.73 791690022 1271944848
sde 1.47 92.70 153.24 772025750 1276195288
sdf 1.55 95.77 153.06 797567654 1274657320
sdg 10.10 364.26 1951.79 3033537062 16254346480
sdi 1.46 94.82 152.98 789646630 1274014936
sdh 1.44 94.09 152.57 783547390 1270598232
sdj 1.44 91.94 153.37 765678470 1277220208
sdk 1.52 97.01 153.02 807928678 1274300360
*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.”
"Maybe other people will try to limit me but I don't limit myself"
RE: Huge disk IO on only one disk
Posted by Siddharth Tiwari <si...@live.com>.
Hi Brahma,
No I haven't. I have put a comma-separated list of disks in dfs.datanode.data.dir, and disk5 for hadoop.tmp.dir. My question is: should we set up hadoop.tmp.dir at all, and if yes, what are the standards around it?
*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.”
"Maybe other people will try to limit me but I don't limit myself"
From: brahmareddy.battula@huawei.com
To: user@hadoop.apache.org
Subject: RE: Huge disk IO on only one disk
Date: Mon, 3 Mar 2014 05:14:34 +0000
It seems you started the cluster with the default values for the following two properties and configured only hadoop.tmp.dir.
dfs.datanode.data.dir ---> file://${hadoop.tmp.dir}/dfs/data (default value)
>>>>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
yarn.nodemanager.local-dirs --> ${hadoop.tmp.dir}/nm-local-dir (default value)
>>>>>>Stores localized files; these are intermediate files.
Please configure the above two values as multiple directories.
Thanks & Regards
Brahma Reddy Battula
From: Siddharth Tiwari [siddharth.tiwari@live.com]
Sent: Monday, March 03, 2014 5:58 AM
To: USers Hadoop
Subject: Huge disk IO on only one disk
Hi Team,
I have 10 disks over which I am running my HDFS. Of these, disk5 is where I have my hadoop.tmp.dir configured. I see that this disk has huge I/O when I run my jobs, compared to the other disks. Can you guide me to the standards to follow so that this I/O is distributed across the other disks as well.
What should be the standard around setting up the hadoop.tmp.dir parameter?
Any help would be highly appreciated. Below is the I/O while I am running a huge job.
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 2.11 37.65 226.20 313512628 1883809216
sdb 1.47 96.44 152.48 803144582 1269829840
sdc 1.45 93.03 153.10 774765734 1274979080
sdd 1.46 95.06 152.73 791690022 1271944848
sde 1.47 92.70 153.24 772025750 1276195288
sdf 1.55 95.77 153.06 797567654 1274657320
sdg 10.10 364.26 1951.79 3033537062 16254346480
sdi 1.46 94.82 152.98 789646630 1274014936
sdh 1.44 94.09 152.57 783547390 1270598232
sdj 1.44 91.94 153.37 765678470 1277220208
sdk 1.52 97.01 153.02 807928678 1274300360
*------------------------*
Cheers !!!
Siddharth
Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.”
"Maybe other people will try to limit me but I don't limit myself"
RE: Huge disk IO on only one disk
Posted by Brahma Reddy Battula <br...@huawei.com>.
It seems you started the cluster with the default values for the following two properties and configured only hadoop.tmp.dir.
dfs.datanode.data.dir ---> file://${hadoop.tmp.dir}/dfs/data (default value)
>>>>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
yarn.nodemanager.local-dirs --> ${hadoop.tmp.dir}/nm-local-dir (default value)
>>>>>>Stores localized files; these are intermediate files.
Please configure the above two values as multiple directories.
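As a concrete illustration of configuring the second of these two values as multiple directories, a minimal yarn-site.xml sketch follows. The /data/diskN mount points are assumptions for illustration; substitute your actual mount paths:

```xml
<!-- yarn-site.xml (sketch): keep NodeManager localized and intermediate
     files off the single hadoop.tmp.dir disk. Mount points are placeholders. -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/disk1/nm-local-dir,/data/disk2/nm-local-dir,/data/disk3/nm-local-dir</value>
</property>
```

The NodeManager distributes container work directories and localized files across the listed directories, which spreads shuffle and spill I/O over the devices.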
Thanks & Regards
Brahma Reddy Battula
________________________________
From: Siddharth Tiwari [siddharth.tiwari@live.com]
Sent: Monday, March 03, 2014 5:58 AM
To: USers Hadoop
Subject: Huge disk IO on only one disk
Hi Team,
I have 10 disks over which I am running my HDFS. Of these, disk5 is where I have my hadoop.tmp.dir configured. I see that this disk has huge I/O when I run my jobs, compared to the other disks. Can you guide me to the standards to follow so that this I/O is distributed across the other disks as well.
What should be the standard around setting up the hadoop.tmp.dir parameter?
Any help would be highly appreciated. Below is the I/O while I am running a huge job.
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 2.11 37.65 226.20 313512628 1883809216
sdb 1.47 96.44 152.48 803144582 1269829840
sdc 1.45 93.03 153.10 774765734 1274979080
sdd 1.46 95.06 152.73 791690022 1271944848
sde 1.47 92.70 153.24 772025750 1276195288
sdf 1.55 95.77 153.06 797567654 1274657320
sdg 10.10 364.26 1951.79 3033537062 16254346480
sdi 1.46 94.82 152.98 789646630 1274014936
sdh 1.44 94.09 152.57 783547390 1270598232
sdj 1.44 91.94 153.37 765678470 1277220208
sdk 1.52 97.01 153.02 807928678 1274300360
*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God.”
"Maybe other people will try to limit me but I don't limit myself"