Posted to common-user@hadoop.apache.org by Abdul Navaz <na...@gmail.com> on 2014/10/03 01:06:07 UTC
Re: No space when running a hadoop job
Hello,
As you suggested, I have changed the hdfs-site.xml file on the datanodes and name
node as below and formatted the name node.
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/mnt</value>
  <description>Comma separated list of paths. Use the list of directories from
  $DFS_DATA_DIR. For example, /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
</property>
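A possible cause worth checking (an assumption, not stated in the thread): the "Warning: $HADOOP_HOME is deprecated" messages further down suggest a Hadoop 1.x install, where the DataNode storage property is named dfs.data.dir rather than dfs.datanode.data.dir. An unrecognized property name is silently ignored, so the DataNode would fall back to its default directory under hadoop.tmp.dir, which lives on /. A hypothetical fragment for that case:

```xml
<!-- Hypothetical hdfs-site.xml fragment, assuming Hadoop 1.x.
     On 1.x the property is dfs.data.dir; dfs.datanode.data.dir is
     the 2.x name, and an unknown key is silently ignored. -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/hadoop/hdfs/dn</value>
</property>
```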
hduser@dn1:~$ df -h
Filesystem                                       Size  Used Avail Use% Mounted on
/dev/xvda2                                       5.9G  5.3G  258M  96% /
udev                                              98M  4.0K   98M   1% /dev
tmpfs                                             48M  196K   48M   1% /run
none                                             5.0M     0  5.0M   0% /run/lock
none                                             120M     0  120M   0% /run/shm
172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  113G   70G  62% /groups/ch-geni-net/Hadoop-NET
172.17.253.254:/q/proj/ch-geni-net               198G  113G   70G  62% /proj/ch-geni-net
/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
hduser@dn1:~$
Even after doing so, the file is copied only to /dev/xvda2 instead of
/dev/xvda4.
Once /dev/xvda2 is full I am getting the below error message.
hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
Warning: $HADOOP_HOME is deprecated.
14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/hduser/getty/file12.txt could only be replicated to 0 nodes, instead
of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
Let me put it this way: I don't want to use /dev/xvda2, as it only has a
capacity of 5.9 GB; I want to use /dev/xvda4 only. How can I do this?
Thanks & Regards,
Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388
From: Abdul Navaz <na...@gmail.com>
Date: Monday, September 29, 2014 at 1:53 PM
To: <us...@hadoop.apache.org>
Subject: Re: No space when running a hadoop job
Dear All,
I am not doing load balancing here. I am just copying a file, and it is
throwing a "No space left on device" error.
hduser@dn1:~$ df -h
Filesystem                                       Size  Used Avail Use% Mounted on
/dev/xvda2                                       5.9G  5.1G  533M  91% /
udev                                              98M  4.0K   98M   1% /dev
tmpfs                                             48M  196K   48M   1% /run
none                                             5.0M     0  5.0M   0% /run/lock
none                                             120M     0  120M   0% /run/shm
172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  116G   67G  64% /groups/ch-geni-net/Hadoop-NET
172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64% /proj/ch-geni-net
/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
hduser@dn1:~$
hduser@dn1:~$
hduser@dn1:~$
hduser@dn1:~$ cp data2.txt data3.txt
cp: writing `data3.txt': No space left on device
cp: failed to extend `data3.txt': No space left on device
hduser@dn1:~$
I guess it is copying to the default location. Why am I getting this
error? How can I fix it?
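Note that this cp failure is a plain local-filesystem error, independent of HDFS: data3.txt is written under the home directory, which lives on the full / filesystem. A quick way to check which mount a path actually writes to (a small helper sketch, not from the thread):

```shell
# Print the mount point backing a given path; writes under that path
# consume space on that filesystem, regardless of any HDFS settings.
path=${1:-$HOME}
mount_point=$(df -P "$path" | awk 'NR==2 {print $6}')
echo "$path is backed by the filesystem mounted at $mount_point"
```

On the node above, running it against the home directory would report /, while /mnt would report /mnt, which is why the local cp fails even though /mnt has 7.4G free.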
Thanks & Regards,
Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388
From: Aitor Cedres <ac...@pivotal.io>
Reply-To: <us...@hadoop.apache.org>
Date: Monday, September 29, 2014 at 7:53 AM
To: <us...@hadoop.apache.org>
Subject: Re: No space when running a hadoop job
I think the way it works when HDFS has a list in dfs.datanode.data.dir
is basically round-robin between disks. And yes, it may not be perfectly
balanced because of different file sizes.
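That round-robin selection can be sketched as follows (a simplified illustration, not actual DataNode code; the directory names are made up):

```shell
# Simplified model of round-robin block placement across the
# directories listed in dfs.datanode.data.dir (illustration only).
dirs="dn0 dn1 dn2"
ndirs=3
i=0
placements=""
for block in blk_1 blk_2 blk_3 blk_4 blk_5 blk_6; do
    # pick the (i+1)-th directory in the list, then advance the cursor
    target=$(echo "$dirs" | cut -d' ' -f$((i + 1)))
    placements="$placements $block:$target"
    i=$(( (i + 1) % ndirs ))
done
echo "blocks land on$placements"
```

Each directory receives every third block, so with equally sized blocks the disks fill at roughly the same rate; differing file and block sizes skew this, which matches the caveat above.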
On 29 September 2014 13:15, Susheel Kumar Gadalay <sk...@gmail.com>
wrote:
> Thanks, Aitor.
>
> That is my observation too.
>
> I added a new disk location and manually moved some files.
>
> But if 2 locations are given for dfs.datanode.data.dir from the very
> beginning, will Hadoop balance the disk usage, even if not perfectly,
> since file sizes may differ?
>
> On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
>> > Hi Susheel,
>> >
>> > Adding a new directory to "dfs.datanode.data.dir" will not balance your
>> > disks straight away. Eventually, through HDFS activity (deleting/invalidating
>> > some blocks, writing new ones), the disks will become balanced. If you want
>> > to balance them right after adding the new disk and changing the
>> > "dfs.datanode.data.dir" value, you have to shut down the DN and manually
>> > move (mv) some files from the old directory to the new one.
>> >
>> > The balancer will try to balance the usage between HDFS nodes, but it won't
>> > care about "internal" node disk utilization. For your particular case, the
>> > balancer won't fix your issue.
>> >
>> > Hope it helps,
>> > Aitor
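The shutdown-and-move step described above can be sketched like this (paths and block names are illustrative; on a real DataNode you would first run `hadoop-daemon.sh stop datanode` and operate on the actual data directories):

```shell
# Sketch of manually rebalancing between two DataNode directories.
# Temp dirs stand in for the real dfs.datanode.data.dir locations,
# and the DN must be stopped before moving anything.
old=$(mktemp -d)/dn-old
new=$(mktemp -d)/dn-new
mkdir -p "$old/current" "$new/current"
touch "$old/current/blk_100" "$old/current/blk_100_1001.meta" \
      "$old/current/blk_200" "$old/current/blk_200_1002.meta"
# Move a block together with its .meta checksum file; the pair must
# stay in the same directory for the DataNode to find the block.
mv "$old/current/blk_100" "$old/current/blk_100_1001.meta" "$new/current/"
ls "$new/current"
```

After restarting, the DataNode rescans its configured directories and should report the moved blocks from their new location.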
>> >
>> > On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
>> > wrote:
>> >
>>> >> You mean if multiple directory locations are given, Hadoop will
>>> >> balance the distribution of files across these different directories.
>>> >>
>>> >> But normally we start with 1 directory location, and once it is
>>> >> reaching its maximum, we add a new directory.
>>> >>
>>> >> In this case how can we balance the distribution of files?
>>> >>
>>> >> One way is to list the files and move them.
>>> >>
>>> >> Will the start-balancer script work?
>>> >>
>>> >> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
>>>> >> > It can read/write in parallel to all drives. More HDDs means more I/O speed.
>>>> >> > On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>>>> <sk...@gmail.com>
>>>> >> > wrote:
>>>> >> >
>>>>> >> >> Correct me if I am wrong.
>>>>> >> >>
>>>>> >> >> Adding multiple directories will not balance the file distribution
>>>>> >> >> across these locations.
>>>>> >> >>
>>>>> >> >> Hadoop will exhaust the first directory and then start using the
>>>>> >> >> next, and so on.
>>>>> >> >>
>>>>> >> >> How can I tell Hadoop to evenly balance across these directories?
>>>>> >> >>
>>>>> >> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
>>>>>> >> >> > You can add a comma-separated list of paths to the
>>>>>> >> >> > "dfs.datanode.data.dir" property in your hdfs-site.xml.
>>>>>> >> >> >
>>>>>> >> >> > mn
>>>>>> >> >> >
>>>>>> >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
>>>>>> >> >> > wrote:
>>>>>> >> >> >
>>>>>>> >> >> >> Hi
>>>>>>> >> >> >>
>>>>>>> >> >> >> I am facing a space issue when saving files into HDFS and/or
>>>>>>> >> >> >> running a MapReduce job.
>>>>>>> >> >> >>
>>>>>>> >> >> >> root@nn:~# df -h
>>>>>>> >> >> >> Filesystem                                       Size  Used Avail Use% Mounted on
>>>>>>> >> >> >> /dev/xvda2                                       5.9G  5.9G     0 100% /
>>>>>>> >> >> >> udev                                              98M  4.0K   98M   1% /dev
>>>>>>> >> >> >> tmpfs                                             48M  192K   48M   1% /run
>>>>>>> >> >> >> none                                             5.0M     0  5.0M   0% /run/lock
>>>>>>> >> >> >> none                                             120M     0  120M   0% /run/shm
>>>>>>> >> >> >> overflow                                         1.0M  4.0K 1020K   1% /tmp
>>>>>>> >> >> >> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>>>>>>> >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59% /groups/ch-geni-net/Hadoop-NET
>>>>>>> >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59% /proj/ch-geni-net
>>>>>>> >> >> >> root@nn:~#
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >> I can see there is no space left on /dev/xvda2.
>>>>>>> >> >> >>
>>>>>>> >> >> >> How can I make Hadoop see the newly mounted /dev/xvda4? Or do I
>>>>>>> >> >> >> need to move the files manually from /dev/xvda2 to /dev/xvda4?
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >> Thanks & Regards,
>>>>>>> >> >> >>
>>>>>>> >> >> >> Abdul Navaz
>>>>>>> >> >> >> Research Assistant
>>>>>>> >> >> >> University of Houston Main Campus, Houston TX
>>>>>>> >> >> >> Ph: 281-685-0388
>>>>>>> >> >> >>
>>>>>> >> >> >
>>>>>> >> >> >
>>>>> >> >>
>>>> >> >
>>> >>
>> >
Re: No space when running a hadoop job
Posted by Abdul Navaz <na...@gmail.com>.
Thank you very much. This is exactly what I am trying to do.
This is the storage I have:
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2      5.9G  5.3G  238M  96% /
/dev/xvda4      7.9G  147M  7.4G   2% /mnt
I have configured dfs.datanode.data.dir in hdfs-site.xml:
<name>dfs.datanode.data.dir</name>
<value>/mnt</value>
I have formatted the name node and restarted, and it is still copying to /;
when / is full, it throws an error instead of copying to /mnt.
Error:
14/10/03 15:23:21 WARN hdfs.DFSClient: Could not get block locations. Source
file "/user/hduser/getty/data4" - Aborting...
put: java.io.IOException: File /user/hduser/getty/data4 could only be
replicated to 0 nodes, instead of 1
14/10/03 15:23:21 ERROR hdfs.DFSClient: Failed to close file
/user/hduser/getty/data4
Am I doing anything wrong here?
Thanks & Regards,
Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388
From: ViSolve Hadoop Support <ha...@visolve.com>
Reply-To: <us...@hadoop.apache.org>
Date: Friday, October 3, 2014 at 1:29 AM
To: <us...@hadoop.apache.org>
Subject: Re: No space when running a hadoop job
Hello,
If you want to use the drive /dev/xvda4 only, then add a data directory
location on /dev/xvda4 and remove the location on /dev/xvda2 under
"dfs.datanode.data.dir".
After the changes, restart the Hadoop services and check the available space
using the command below.
# hadoop fs -df -h
Regards,
ViSolve Hadoop Team
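One more detail that may matter (an assumption, not stated in the thread): pointing dfs.datanode.data.dir at a bare mount point such as /mnt can fail if the DataNode user does not own it, since the DataNode checks directory ownership and permissions at startup. The usual practice is a dedicated subdirectory owned by the Hadoop user. Sketched below with a temporary root standing in for /mnt:

```shell
# Prepare a dedicated DataNode directory under the new mount.
# A temp dir stands in for /mnt here; on the real node this would be
# e.g. mkdir -p /mnt/hadoop/hdfs/dn && chown hduser:hadoop /mnt/hadoop/hdfs/dn
root=$(mktemp -d)
datadir="$root/hadoop/hdfs/dn"
mkdir -p "$datadir"
chmod 755 "$datadir"    # DN expects owner-writable, non-world-writable perms
echo "data directory ready: $datadir"
```

The subdirectory path (not the bare mount) is what goes into the dfs.datanode.data.dir value.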
Re: No space when running a hadoop job
Posted by Abdul Navaz <na...@gmail.com>.
Thank You Very much. This is what I am trying to do.
This is what storage I have.
Filesystem Size Used Avail Use%
Mounted on
/dev/xvda2 5.9G 5.3G 238M 96% /
/dev/xvda4 7.9G 147M 7.4G 2% /mnt
I have configured in dfs.datanode.dir in hdfs-site.
<name>dfs.datanode.data.dir</name>
<value>/mnt</value>
I have formatted the name node and restarted and it is still copying to /
and if it is full it throws an error instead of copying to /mnt¹.
Error:
14/10/03 15:23:21 WARN hdfs.DFSClient: Could not get block locations. Source
file "/user/hduser/getty/data4" - Aborting...
put: java.io.IOException: File /user/hduser/getty/data4 could only be
replicated to 0 nodes, instead of 1
14/10/03 15:23:21 ERROR hdfs.DFSClient: Failed to close file
/user/hduser/getty/data4
Am I doing anything wrong here ?
Thanks & Regards,
Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388
From: ViSolve Hadoop Support <ha...@visolve.com>
Reply-To: <us...@hadoop.apache.org>
Date: Friday, October 3, 2014 at 1:29 AM
To: <us...@hadoop.apache.org>
Subject: Re: No space when running a hadoop job
Hello,
If you want to use drive /dev/xvda4 only, then add file location for
'/dev/xvda4' and remove the file location for '/dev/xvda2' under
"dfs.datanode.data.dir".
After the changes restart the hadoop services and check the available space
using the below command.
# hadoop fs -df -h
Regards,
ViSolve Hadoop Team
On 10/3/2014 4:36 AM, Abdul Navaz wrote:
>
>
> Hello,
>
>
>
>
> As you suggested I have changed the hdfs-site.xml file of datanodes and name
> node as below and formatted the name node.
>
>
>
>
>
>
> </property>
>
>
> <property>
>
>
> <name>dfs.datanode.data.dir</name>
>
>
> <value>/mnt</value>
>
>
> <description>Comma separated list of paths. Use the list of directories from
> $DFS_DATA_DIR.
>
>
> For example,
> /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
>
>
> </property>
>
>
>
>
>
>
>
>
>
>
> hduser@dn1:~$ df -h
>
>
> Filesystem Size Used Avail Use% Mounted
> on
>
>
> /dev/xvda2 5.9G 5.3G 258M 96% /
>
>
> udev 98M 4.0K 98M 1% /dev
>
>
> tmpfs 48M 196K 48M 1% /run
>
>
> none 5.0M 0 5.0M 0%
> /run/lock
>
>
> none 120M 0 120M 0%
> /run/shm
>
>
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G 113G 70G 62%
> /groups/ch-geni-net/Hadoop-NET
>
>
> 172.17.253.254:/q/proj/ch-geni-net 198G 113G 70G 62%
> /proj/ch-geni-net
>
>
> /dev/xvda4 7.9G 147M 7.4G 2% /mnt
>
>
> hduser@dn1:~$
>
>
>
>
>
>
>
>
> Even after doing so, the file is copied only to /dev/xvda2 instead of
> /dev/xvda4.
>
>
>
>
> Once /dev/xvda2 is full I am getting the below error message.
>
>
>
>
>
>
> hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
>
>
> Warning: $HADOOP_HOME is deprecated.
>
>
>
>
>
>
> 14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hduser/getty/file12.txt could only be replicated to 0 nodes, instead of
> 1
>
>
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNames
> ystem.java:1639)
>
>
>
>
>
>
>
>
>
>
>
>
> Let me say like this: I don¹t want to use /dev/xvda2 as it has capacity of
> 5.9GB , I want to use only /dev/xvda4. How can I do this ?
>
>
>
>
>
>
>
>
>
>
>
>
>
> Thanks & Regards,
>
>
>
>
> Abdul Navaz
>
> Research Assistant
>
> University of Houston Main Campus, Houston TX
>
> Ph: 281-685-0388
>
>
>
>
>
>
>
>
>
> From: Abdul Navaz <na...@gmail.com>
> Date: Monday, September 29, 2014 at 1:53 PM
> To: <us...@hadoop.apache.org>
> Subject: Re: No space when running a hadoop job
>
>
>
>
>
>
>
>
>
> Dear All,
>
>
>
>
> I am not doing load balancing here. I am just copying a file and it is
> throwing me an error no space left on the device.
>
>
>
>
>
>
>
>
>
> hduser@dn1:~$ df -h
>
>
> Filesystem Size Used Avail Use% Mounted
> on
>
>
> /dev/xvda2 5.9G 5.1G 533M 91% /
>
>
> udev 98M 4.0K 98M 1% /dev
>
>
> tmpfs 48M 196K 48M 1% /run
>
>
> none 5.0M 0 5.0M 0%
> /run/lock
>
>
> none 120M 0 120M 0%
> /run/shm
>
>
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G 116G 67G 64%
> /groups/ch-geni-net/Hadoop-NET
>
>
> 172.17.253.254:/q/proj/ch-geni-net 198G 116G 67G 64%
> /proj/ch-geni-net
>
>
> /dev/xvda4 7.9G 147M 7.4G 2% /mnt
>
>
> hduser@dn1:~$
>
>
> hduser@dn1:~$
>
>
> hduser@dn1:~$
>
>
> hduser@dn1:~$ cp data2.txt data3.txt
>
>
> cp: writing `data3.txt': No space left on device
>
>
> cp: failed to extend `data3.txt': No space left on device
>
>
> hduser@dn1:~$
>
>
>
>
>
>
> I guess by default it is copying to default location. Why I am getting this
> error ? How can I fix this ?
>
>
>
>
>
>
>
> Thanks & Regards,
>
>
>
>
> Abdul Navaz
>
> Research Assistant
>
> University of Houston Main Campus, Houston TX
>
> Ph: 281-685-0388
>
>
>
>
>
>
>
>
>
>
> From: Aitor Cedres <ac...@pivotal.io>
> Reply-To: <us...@hadoop.apache.org>
> Date: Monday, September 29, 2014 at 7:53 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: No space when running a hadoop job
>
>
>
>
>
>
>
> I think they way it works when HDFS has a list in dfs.datanode.data.dir, it's
> basically a round robin between disks. And yes, it may not be perfect balanced
> cause of different file sizes.
>
>
>
>
>
>
>
>
> On 29 September 2014 13:15, Susheel Kumar Gadalay <sk...@gmail.com> wrote:
>
>> Thank Aitor.
>>
>> That is what is my observation too.
>>
>> I added a new disk location and manually moved some files.
>>
>> But if 2 locations are given at the beginning itself for
>> dfs.datanode.data.dir, will hadoop balance the disks usage, if not
>> perfect because file sizes may differ.
>>
>>
>>
>> On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
>>> > Hi Susheel,
>>> >
>>> > Adding a new directory to ³dfs.datanode.data.dir² will not balance your
>>> > disks straightforward. Eventually, by HDFS activity
>>> (deleting/invalidating
>>> > some block, writing new ones), the disks will become balanced. If you >>>
want
>>> > to balance them right after adding the new disk and changing the
>>> > ³dfs.datanode.data.dir²
>>> > value, you have to shutdown the DN and manually move (mv) some files in
>>> the
>>> > old directory to the new one.
>>> >
>>> > The balancer will try to balance the usage between HDFS nodes, but it
>>> won't
>>> > care about "internal" node disks utilization. For your particular case,
>>> the
>>> > balancer won't fix your issue.
>>> >
>>> > Hope it helps,
>>> > Aitor
>>> >
>>> > On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
>>> > wrote:
>>> >
>>>> >> You mean if multiple directory locations are given, Hadoop will
>>>> >> balance the distribution of files across these different directories.
>>>> >>
>>>> >> But normally we start with 1 directory location and once it is
>>>> >> reaching the maximum, we add new directory.
>>>> >>
>>>> >> In this case how can we balance the distribution of files?
>>>> >>
>>>> >> One way is to list the files and move.
>>>> >>
>>>> >> Will start balance script will work?
>>>> >>
>>>> >> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
>>>>> >> > It can read/write in parallel to all drives. More hdd more io speed.
>>>>> >> > On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>>>>> <sk...@gmail.com>
>>>>> >> > wrote:
>>>>> >> >
>>>>>> >> >> Correct me if I am wrong.
>>>>>> >> >>
>>>>>> >> >> Adding multiple directories will not balance the files
>>>>>> distributions
>>>>>> >> >> across these locations.
>>>>>> >> >>
>>>>>> >> >> Hadoop will add exhaust the first directory and then start using
the
>>>>>> >> >> next, next ..
>>>>>> >> >>
>>>>>> >> >> How can I tell Hadoop to evenly balance across these directories.
>>>>>> >> >>
>>>>>> >> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
>>>>>>> >> >> > You can add a comma separated list of paths to the
>>>>>> >> >> ³dfs.datanode.data.dir²
>>>>>>> >> >> > property in your hdfs-site.xml
>>>>>>> >> >> >
>>>>>>> >> >> > mn
>>>>>>> >> >> >
>>>>>>> >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
>>>>>>> >> >> > wrote:
>>>>>>> >> >> >
>>>>>>>> >> >> >> Hi
>>>>>>>> >> >> >>
>>>>>>>> >> >> >> I am facing some space issue when I saving file into HDFS
and/or
>>>>>>>> >> >> >> running
>>>>>>>> >> >> >> map reduce job.
>>>>>>>> >> >> >>
>>>>>>>> >> >> >> root@nn:~# df -h
>>>>>>>> >> >> >> Filesystem Size Used
Avail
>>>> >> Use%
>>>>>>>> >> >> >> Mounted on
>>>>>>>> >> >> >> /dev/xvda2 5.9G 5.9G
0
>>>> >> 100%
>>>>>>>> >> >> >> /
>>>>>>>> >> >> >> udev 98M 4.0K
98M
>>>> >> 1%
>>>>>>>> >> >> >> /dev
>>>>>>>> >> >> >> tmpfs 48M 192K
48M
>>>> >> 1%
>>>>>>>> >> >> >> /run
>>>>>>>> >> >> >> none 5.0M 0
5.0M
>>>> >> 0%
>>>>>>>> >> >> >> /run/lock
>>>>>>>> >> >> >> none 120M 0
120M
>>>> >> 0%
>>>>>>>> >> >> >> /run/shm
>>>>>>>> >> >> >> overflow 1.0M 4.0K
1020K
>>>> >> 1%
>>>>>>>> >> >> >> /tmp
>>>>>>>> >> >> >> /dev/xvda4 7.9G 147M
7.4G
>>>> >> 2%
>>>>>>>> >> >> >> /mnt
>>>>>>>> >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G 108G
75G
>>>> >> 59%
>>>>>>>> >> >> >> /groups/ch-geni-net/Hadoop-NET
>>>>>>>> >> >> >> 172.17.253.254:/q/proj/ch-geni-net 198G 108G
75G
>>>> >> 59%
>>>>>>>> >> >> >> /proj/ch-geni-net
>>>>>>>> >> >> >> root@nn:~#
>>>>>>>> >> >> >>
>>>>>>>> >> >> >>
>>>>>>>> >> >> >> I can see there is no space left on /dev/xvda2.
>>>>>>>> >> >> >>
>>>>>>>> >> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ? Or do
I
>>>>>>>> >> >> >> need
>>>>>>>> >> >> >> to
>>>>>>>> >> >> >> move the file manually from /dev/xvda2 to xvda4 ?
>>>>>>>> >> >> >>
>>>>>>>> >> >> >>
>>>>>>>> >> >> >>
>>>>>>>> >> >> >> Thanks & Regards,
>>>>>>>> >> >> >>
>>>>>>>> >> >> >> Abdul Navaz
>>>>>>>> >> >> >> Research Assistant
>>>>>>>> >> >> >> University of Houston Main Campus, Houston TX
>>>>>>>> >> >> >> Ph: 281-685-0388
>>>>>>>> >> >> >>
>>>>>>> >> >> >
>>>>>>> >> >> >
>>>>>> >> >>
>>>>> >> >
>>>> >>
>>> >
>>
>>
>>
>
>
>
>
>
>
>
Re: No space when running a hadoop job
Posted by Abdul Navaz <na...@gmail.com>.
Thank You Very much. This is what I am trying to do.
This is what storage I have.
Filesystem Size Used Avail Use%
Mounted on
/dev/xvda2 5.9G 5.3G 238M 96% /
/dev/xvda4 7.9G 147M 7.4G 2% /mnt
I have configured in dfs.datanode.dir in hdfs-site.
<name>dfs.datanode.data.dir</name>
<value>/mnt</value>
I have formatted the name node and restarted and it is still copying to /
and if it is full it throws an error instead of copying to /mnt¹.
Error:
14/10/03 15:23:21 WARN hdfs.DFSClient: Could not get block locations. Source
file "/user/hduser/getty/data4" - Aborting...
put: java.io.IOException: File /user/hduser/getty/data4 could only be
replicated to 0 nodes, instead of 1
14/10/03 15:23:21 ERROR hdfs.DFSClient: Failed to close file
/user/hduser/getty/data4
Am I doing anything wrong here ?
Thanks & Regards,
Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388
From: ViSolve Hadoop Support <ha...@visolve.com>
Reply-To: <us...@hadoop.apache.org>
Date: Friday, October 3, 2014 at 1:29 AM
To: <us...@hadoop.apache.org>
Subject: Re: No space when running a hadoop job
Hello,
If you want to use drive /dev/xvda4 only, then add file location for
'/dev/xvda4' and remove the file location for '/dev/xvda2' under
"dfs.datanode.data.dir".
After the changes restart the hadoop services and check the available space
using the below command.
# hadoop fs -df -h
Regards,
ViSolve Hadoop Team
On 10/3/2014 4:36 AM, Abdul Navaz wrote:
>
>
> Hello,
>
>
>
>
> As you suggested I have changed the hdfs-site.xml file of datanodes and name
> node as below and formatted the name node.
>
>
>
>
>
>
> </property>
>
>
> <property>
>
>
> <name>dfs.datanode.data.dir</name>
>
>
> <value>/mnt</value>
>
>
> <description>Comma separated list of paths. Use the list of directories from
> $DFS_DATA_DIR.
>
>
> For example,
> /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
>
>
> </property>
>
>
>
>
>
>
>
>
>
>
> hduser@dn1:~$ df -h
>
>
> Filesystem Size Used Avail Use% Mounted
> on
>
>
> /dev/xvda2 5.9G 5.3G 258M 96% /
>
>
> udev 98M 4.0K 98M 1% /dev
>
>
> tmpfs 48M 196K 48M 1% /run
>
>
> none 5.0M 0 5.0M 0%
> /run/lock
>
>
> none 120M 0 120M 0%
> /run/shm
>
>
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G 113G 70G 62%
> /groups/ch-geni-net/Hadoop-NET
>
>
> 172.17.253.254:/q/proj/ch-geni-net 198G 113G 70G 62%
> /proj/ch-geni-net
>
>
> /dev/xvda4 7.9G 147M 7.4G 2% /mnt
>
>
> hduser@dn1:~$
>
>
>
>
>
>
>
>
> Even after doing so, the file is copied only to /dev/xvda2 instead of
> /dev/xvda4.
>
>
>
>
> Once /dev/xvda2 is full I am getting the below error message.
>
>
>
>
>
>
> hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
>
>
> Warning: $HADOOP_HOME is deprecated.
>
>
>
>
>
>
> 14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hduser/getty/file12.txt could only be replicated to 0 nodes, instead of
> 1
>
>
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNames
> ystem.java:1639)
>
>
>
>
>
>
>
>
>
>
>
>
> Let me say like this: I don¹t want to use /dev/xvda2 as it has capacity of
> 5.9GB , I want to use only /dev/xvda4. How can I do this ?
>
>
>
>
>
>
>
>
>
>
>
>
>
> Thanks & Regards,
>
>
>
>
> Abdul Navaz
>
> Research Assistant
>
> University of Houston Main Campus, Houston TX
>
> Ph: 281-685-0388
>
>
>
>
>
>
>
>
>
> From: Abdul Navaz <na...@gmail.com>
> Date: Monday, September 29, 2014 at 1:53 PM
> To: <us...@hadoop.apache.org>
> Subject: Re: No space when running a hadoop job
>
>
>
>
>
>
>
>
>
> Dear All,
>
>
>
>
> I am not doing load balancing here. I am just copying a file and it is
> throwing me an error no space left on the device.
>
>
>
>
>
>
>
>
>
> hduser@dn1:~$ df -h
>
>
> Filesystem Size Used Avail Use% Mounted
> on
>
>
> /dev/xvda2 5.9G 5.1G 533M 91% /
>
>
> udev 98M 4.0K 98M 1% /dev
>
>
> tmpfs 48M 196K 48M 1% /run
>
>
> none 5.0M 0 5.0M 0%
> /run/lock
>
>
> none 120M 0 120M 0%
> /run/shm
>
>
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G 116G 67G 64%
> /groups/ch-geni-net/Hadoop-NET
>
>
> 172.17.253.254:/q/proj/ch-geni-net 198G 116G 67G 64%
> /proj/ch-geni-net
>
>
> /dev/xvda4 7.9G 147M 7.4G 2% /mnt
>
>
> hduser@dn1:~$
>
>
> hduser@dn1:~$
>
>
> hduser@dn1:~$
>
>
> hduser@dn1:~$ cp data2.txt data3.txt
>
>
> cp: writing `data3.txt': No space left on device
>
>
> cp: failed to extend `data3.txt': No space left on device
>
>
> hduser@dn1:~$
>
>
>
>
>
>
> I guess it is copying to the default location. Why am I getting this
> error? How can I fix it?
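The `cp` failure above is on the local filesystem, not HDFS: the home directory lives on the full / mount. A minimal sketch (assuming only POSIX `df` and `awk`; the paths checked are illustrative) of how to see which mounted filesystem backs a path:

```shell
# Print the mount point that backs a given path, per `df -P`.
mountpoint_of() {
  df -P "$1" | awk 'NR==2 {print $6}'
}

mountpoint_of /        # always prints /
mountpoint_of /tmp     # on the datanode above, /mnt would be checked here
```

On the node above, `mountpoint_of ~` returning / would explain why the local `cp` fails while /mnt still has 7.4G free.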
>
> Thanks & Regards,
>
>
>
>
> Abdul Navaz
>
> Research Assistant
>
> University of Houston Main Campus, Houston TX
>
> Ph: 281-685-0388
>
> From: Aitor Cedres <ac...@pivotal.io>
> Reply-To: <us...@hadoop.apache.org>
> Date: Monday, September 29, 2014 at 7:53 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: No space when running a hadoop job
>
> I think the way it works is that when HDFS has a list in
> dfs.datanode.data.dir, it basically round-robins between the disks. And yes,
> it may not be perfectly balanced because of different file sizes.
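The round-robin placement described above can be sketched as a toy loop (illustrative only; the directory names are made up, and the real DataNode logic lives in its internal volume-choosing code):

```shell
# Toy model: with several entries in dfs.datanode.data.dir, successive
# blocks rotate through the directories (block i -> directory i mod N).
dirs=(/data1 /data2 /data3)   # hypothetical data directories
for i in 0 1 2 3 4; do
  # blk_3 wraps around to /data1 again
  echo "blk_$i -> ${dirs[i % ${#dirs[@]}]}"
done
```

This is why equal numbers of blocks per directory still need not mean equal bytes per directory when block and file sizes differ.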
>
> On 29 September 2014 13:15, Susheel Kumar Gadalay <sk...@gmail.com> wrote:
>
>> Thanks Aitor.
>>
>> That is my observation too.
>>
>> I added a new disk location and manually moved some files.
>>
>> But if 2 locations are given for dfs.datanode.data.dir at the beginning
>> itself, will Hadoop balance the disk usage, even if not perfectly because
>> file sizes may differ?
>>
>>
>>
>> On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
>>> > Hi Susheel,
>>> >
>>> > Adding a new directory to "dfs.datanode.data.dir" will not balance your
>>> > disks straight away. Eventually, through HDFS activity
>>> > (deleting/invalidating some blocks, writing new ones), the disks will
>>> > become balanced. If you want to balance them right after adding the new
>>> > disk and changing the "dfs.datanode.data.dir" value, you have to shut
>>> > down the DN and manually move (mv) some files in the old directory to
>>> > the new one.
>>> >
>>> > The balancer will try to balance the usage between HDFS nodes, but it
>>> won't
>>> > care about "internal" node disks utilization. For your particular case,
>>> the
>>> > balancer won't fix your issue.
>>> >
>>> > Hope it helps,
>>> > Aitor
>>> >
>>> > On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
>>> > wrote:
>>> >
>>>> >> You mean if multiple directory locations are given, Hadoop will
>>>> >> balance the distribution of files across these different directories.
>>>> >>
>>>> >> But normally we start with 1 directory location and, once it is
>>>> >> reaching its maximum, we add a new directory.
>>>> >>
>>>> >> In this case how can we balance the distribution of files?
>>>> >>
>>>> >> One way is to list the files and move.
>>>> >>
>>>> >> Will the start-balancer script work?
>>>> >>
>>>> >> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
>>>>> >> > It can read/write in parallel to all drives. More hdd more io speed.
>>>>> >> > On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>>>>> <sk...@gmail.com>
>>>>> >> > wrote:
>>>>> >> >
>>>>>> >> >> Correct me if I am wrong.
>>>>>> >> >>
>>>>>> >> >> Adding multiple directories will not balance the files
>>>>>> distributions
>>>>>> >> >> across these locations.
>>>>>> >> >>
>>>>>> >> >> Hadoop will exhaust the first directory and then start using
>>>>>> >> >> the next, and so on.
>>>>>> >> >>
>>>>>> >> >> How can I tell Hadoop to evenly balance across these directories?
>>>>>> >> >>
>>>>>> >> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
>>>>>>> >> >> > You can add a comma-separated list of paths to the
>>>>>> >> >> "dfs.datanode.data.dir"
>>>>>>> >> >> > property in your hdfs-site.xml
>>>>>>> >> >> >
>>>>>>> >> >> > mn
>>>>>>> >> >> >
>>>>>>> >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
>>>>>>> >> >> > wrote:
>>>>>>> >> >> >
>>>>>>>> >> >> >> Hi
>>>>>>>> >> >> >>
>>>>>>>> >> >> >> I am facing a space issue when saving a file into HDFS
>>>>>>>> >> >> >> and/or running a map reduce job.
>>>>>>>> >> >> >>
>>>>>>>> >> >> >> root@nn:~# df -h
>>>>>>>> >> >> >> Filesystem                                       Size  Used Avail Use% Mounted on
>>>>>>>> >> >> >> /dev/xvda2                                       5.9G  5.9G     0 100% /
>>>>>>>> >> >> >> udev                                              98M  4.0K   98M   1% /dev
>>>>>>>> >> >> >> tmpfs                                             48M  192K   48M   1% /run
>>>>>>>> >> >> >> none                                             5.0M     0  5.0M   0% /run/lock
>>>>>>>> >> >> >> none                                             120M     0  120M   0% /run/shm
>>>>>>>> >> >> >> overflow                                         1.0M  4.0K 1020K   1% /tmp
>>>>>>>> >> >> >> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>>>>>>>> >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59% /groups/ch-geni-net/Hadoop-NET
>>>>>>>> >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59% /proj/ch-geni-net
>>>>>>>> >> >> >> root@nn:~#
>>>>>>>> >> >> >>
>>>>>>>> >> >> >>
>>>>>>>> >> >> >> I can see there is no space left on /dev/xvda2.
>>>>>>>> >> >> >>
>>>>>>>> >> >> >> How can I make Hadoop see the newly mounted /dev/xvda4? Or do
>>>>>>>> >> >> >> I need to move the files manually from /dev/xvda2 to xvda4?
>>>>>>>> >> >> >>
>>>>>>>> >> >> >>
>>>>>>>> >> >> >>
>>>>>>>> >> >> >> Thanks & Regards,
>>>>>>>> >> >> >>
>>>>>>>> >> >> >> Abdul Navaz
>>>>>>>> >> >> >> Research Assistant
>>>>>>>> >> >> >> University of Houston Main Campus, Houston TX
>>>>>>>> >> >> >> Ph: 281-685-0388
>>>>>>>> >> >> >>
>>>>>>> >> >> >
>>>>>>> >> >> >
>>>>>> >> >>
>>>>> >> >
>>>> >>
>>> >
>>
>>
>>
>
>
>
>
>
>
>
Re: No space when running a hadoop job
Posted by Abdul Navaz <na...@gmail.com>.
Thank you very much. This is what I am trying to do.
This is what storage I have.
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2 5.9G 5.3G 238M 96% /
/dev/xvda4 7.9G 147M 7.4G 2% /mnt
I have configured dfs.datanode.data.dir in hdfs-site.xml:
<name>dfs.datanode.data.dir</name>
<value>/mnt</value>
I have formatted the name node and restarted, and it is still copying to /;
once / is full it throws an error instead of copying to '/mnt'.
Error:
14/10/03 15:23:21 WARN hdfs.DFSClient: Could not get block locations. Source
file "/user/hduser/getty/data4" - Aborting...
put: java.io.IOException: File /user/hduser/getty/data4 could only be
replicated to 0 nodes, instead of 1
14/10/03 15:23:21 ERROR hdfs.DFSClient: Failed to close file
/user/hduser/getty/data4
Am I doing anything wrong here?
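Note: the `$HADOOP_HOME is deprecated` warning earlier in the thread suggests a Hadoop 1.x install, where the DataNode property is `dfs.data.dir`; `dfs.datanode.data.dir` is its Hadoop 2.x successor, so a 1.x DataNode may silently ignore the setting above and fall back to `${hadoop.tmp.dir}/dfs/data` on /. A hedged hdfs-site.xml sketch that sets both names (the /mnt/hdfs/dn path is an assumed layout, not from the thread):

```xml
<!-- Sketch only: /mnt/hdfs/dn is an assumed, dedicated subdirectory so
     the DataNode does not write directly into /mnt itself. -->
<property>
  <name>dfs.data.dir</name>            <!-- Hadoop 1.x property name -->
  <value>/mnt/hdfs/dn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>   <!-- Hadoop 2.x property name -->
  <value>/mnt/hdfs/dn</value>
</property>
```

The directory would need to exist and be writable by the hduser account before restarting the DataNode.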
Thanks & Regards,
Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388
From: ViSolve Hadoop Support <ha...@visolve.com>
Reply-To: <us...@hadoop.apache.org>
Date: Friday, October 3, 2014 at 1:29 AM
To: <us...@hadoop.apache.org>
Subject: Re: No space when running a hadoop job
Hello,
If you want to use only the drive /dev/xvda4, then add a data directory
located on /dev/xvda4 and remove the one located on /dev/xvda2 under
"dfs.datanode.data.dir".
After the changes, restart the Hadoop services and check the available space
using the command below.
# hadoop fs -df -h
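Before restarting, it may also help to confirm which device backs each configured data directory. A sketch (the looped paths are illustrative; substitute your actual dfs.datanode.data.dir entries, e.g. /mnt):

```shell
# Print the backing filesystem and mount point for each data directory,
# so you can confirm it really sits on /dev/xvda4 and not on /.
for d in / /tmp; do   # illustrative paths; use your data directories here
  df -P "$d" | awk -v dir="$d" 'NR==2 {print dir " is on " $1 " mounted at " $6}'
done
```

If the data directory reports /dev/xvda4 mounted at /mnt, the configuration points where intended.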
Regards,
ViSolve Hadoop Team
On 10/3/2014 4:36 AM, Abdul Navaz wrote:
Re: No space when running a hadoop job
Posted by ViSolve Hadoop Support <ha...@visolve.com>.
Hello,
If you want to use only the drive /dev/xvda4, then add a data directory
located on /dev/xvda4 and remove the one located on /dev/xvda2 under
"dfs.datanode.data.dir".
After the changes, restart the Hadoop services and check the available
space using the command below.
# hadoop fs -df -h
Regards,
ViSolve Hadoop Team
On 10/3/2014 4:36 AM, Abdul Navaz wrote:
Re: No space when running a hadoop job
Posted by ViSolve Hadoop Support <ha...@visolve.com>.
Hello,
If you want to use drive /dev/xvda4 only, then add file location for
'/dev/xvda4' and remove the file location for '/dev/xvda2' under
"dfs.datanode.data.dir".
After the changes restart the hadoop services and check the available
space using the below command.
# hadoop fs -df -h
Regards,
ViSolve Hadoop Team
On 10/3/2014 4:36 AM, Abdul Navaz wrote:
> Hello,
>
> As you suggested I have changed the hdfs-site.xml file of datanodes
> and name node as below and formatted the name node.
>
> </property>
>
> <property>
>
> <name>dfs.datanode.data.dir</name>
>
> <value>/mnt</value>
>
> <description>Comma separated list of paths. Use the list of
> directories from $DFS_DATA_DIR.
>
> For example,
> /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
>
> </property>
>
>
>
> hduser@dn1:~$ df -h
>
> Filesystem Size Used Avail Use% Mounted on
>
> /dev/xvda2 5.9G 5.3G 258M 96% /
>
> udev 98M 4.0K 98M 1% /dev
>
> tmpfs 48M 196K 48M 1% /run
>
> none 5.0M 0 5.0M 0% /run/lock
>
> none 120M 0 120M 0% /run/shm
>
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G 113G 70G 62%
> /groups/ch-geni-net/Hadoop-NET
>
> 172.17.253.254:/q/proj/ch-geni-net 198G 113G 70G 62%
> /proj/ch-geni-net
>
> /dev/xvda4 7.9G 147M 7.4G 2% /mnt
>
> hduser@dn1:~$
>
>
>
> Even after doing so, the file is copied only to /dev/xvda2 instead of
> /dev/xvda4.
>
> Once /dev/xvda2 is full I am getting the below error message.
>
> hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
>
> Warning: $HADOOP_HOME is deprecated.
>
>
> 14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hduser/getty/file12.txt could only be replicated to 0 nodes,
> instead of 1
>
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
>
>
>
>
> Let me say like this: I don't want to use /dev/xvda2 as it has
> capacity of 5.9GB , I want to use only /dev/xvda4. How can I do this ?
>
>
>
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
>
> From: Abdul Navaz <navaz.enc@gmail.com <ma...@gmail.com>>
> Date: Monday, September 29, 2014 at 1:53 PM
> To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Subject: Re: No space when running a hadoop job
>
> Dear All,
>
> I am not doing load balancing here. I am just copying a file and it is
> throwing me an error no space left on the device.
>
>
> hduser@dn1:~$ df -h
>
> Filesystem Size Used Avail Use%
> Mounted on
>
> /dev/xvda2 5.9G 5.1G 533M 91% /
>
> udev 98M 4.0K 98M 1% /dev
>
> tmpfs 48M 196K 48M 1% /run
>
> none 5.0M 0 5.0M 0% /run/lock
>
> none 120M 0 120M 0% /run/shm
>
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G 116G 67G 64%
> /groups/ch-geni-net/Hadoop-NET
>
> 172.17.253.254:/q/proj/ch-geni-net 198G 116G 67G 64%
> /proj/ch-geni-net
>
> /dev/xvda4 7.9G 147M 7.4G 2% /mnt
>
> hduser@dn1:~$
>
> hduser@dn1:~$
>
> hduser@dn1:~$
>
> hduser@dn1:~$ cp data2.txt data3.txt
>
> cp: writing `data3.txt': No space left on device
>
> cp: failed to extend `data3.txt': No space left on device
>
> hduser@dn1:~$
>
>
> I guess by default it is copying to default location. Why I am getting
> this error ? How can I fix this ?
>
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
>
> From: Aitor Cedres <acedres@pivotal.io <ma...@pivotal.io>>
> Reply-To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Date: Monday, September 29, 2014 at 7:53 AM
> To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Subject: Re: No space when running a hadoop job
>
>
> I think they way it works when HDFS has a list
> in dfs.datanode.data.dir, it's basically a round robin between disks.
> And yes, it may not be perfect balanced cause of different file sizes.
>
>
> On 29 September 2014 13:15, Susheel Kumar Gadalay <skgadalay@gmail.com
> <ma...@gmail.com>> wrote:
>
> Thank Aitor.
>
> That is what is my observation too.
>
> I added a new disk location and manually moved some files.
>
> But if 2 locations are given at the beginning itself for
> dfs.datanode.data.dir, will hadoop balance the disks usage, if not
> perfect because file sizes may differ.
>
> On 9/29/14, Aitor Cedres <acedres@pivotal.io
> <ma...@pivotal.io>> wrote:
> > Hi Susheel,
> >
> > Adding a new directory to "dfs.datanode.data.dir" will not
> balance your
> > disks straightforward. Eventually, by HDFS activity
> (deleting/invalidating
> > some block, writing new ones), the disks will become balanced.
> If you want
> > to balance them right after adding the new disk and changing the
> > "dfs.datanode.data.dir"
> > value, you have to shutdown the DN and manually move (mv) some
> files in the
> > old directory to the new one.
> >
> > The balancer will try to balance the usage between HDFS nodes,
> but it won't
> > care about "internal" node disks utilization. For your
> particular case, the
> > balancer won't fix your issue.
> >
> > Hope it helps,
> > Aitor
> >
> > On 29 September 2014 05:53, Susheel Kumar Gadalay
> <skgadalay@gmail.com <ma...@gmail.com>>
> > wrote:
> >
> >> You mean if multiple directory locations are given, Hadoop will
> >> balance the distribution of files across these different
> directories.
> >>
> >> But normally we start with 1 directory location and once it is
> >> reaching the maximum, we add new directory.
> >>
> >> In this case how can we balance the distribution of files?
> >>
> >> One way is to list the files and move.
> >>
> >> Will start balance script will work?
> >>
> >> On 9/27/14, Alexander Pivovarov <apivovarov@gmail.com
> <ma...@gmail.com>> wrote:
> >> > It can read/write in parallel to all drives. More hdd more io
> speed.
> >> > On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
> <skgadalay@gmail.com <ma...@gmail.com>>
> >> > wrote:
> >> >
> >> >> Correct me if I am wrong.
> >> >>
> >> >> Adding multiple directories will not balance the files
> distributions
> >> >> across these locations.
> >> >>
> >> >> Hadoop will add exhaust the first directory and then start
> using the
> >> >> next, next ..
> >> >>
> >> >> How can I tell Hadoop to evenly balance across these
> directories.
> >> >>
> >> >> On 9/26/14, Matt Narrell <matt.narrell@gmail.com
> <ma...@gmail.com>> wrote:
> >> >> > You can add a comma separated list of paths to the
> >> >> "dfs.datanode.data.dir"
> >> >> > property in your hdfs-site.xml
> >> >> >
> >> >> > mn
> >> >> >
> >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz
> <navaz.enc@gmail.com <ma...@gmail.com>>
> >> >> > wrote:
> >> >> >
> >> >> >> Hi
> >> >> >>
> >> >> >> I am facing some space issue when I saving file into HDFS
> and/or
> >> >> >> running
> >> >> >> map reduce job.
> >> >> >>
> >> >> >> root@nn:~# df -h
> >> >> >> Filesystem Size Used Avail
> >> Use%
> >> >> >> Mounted on
> >> >> >> /dev/xvda2 5.9G 5.9G 0
> >> 100%
> >> >> >> /
> >> >> >> udev 98M 4.0K 98M
> >> 1%
> >> >> >> /dev
> >> >> >> tmpfs 48M 192K 48M
> >> 1%
> >> >> >> /run
> >> >> >> none 5.0M 0 5.0M
> >> 0%
> >> >> >> /run/lock
> >> >> >> none 120M 0 120M
> >> 0%
> >> >> >> /run/shm
> >> >> >> overflow 1.0M 4.0K 1020K
> >> 1%
> >> >> >> /tmp
> >> >> >> /dev/xvda4 7.9G 147M 7.4G
> >> 2%
> >> >> >> /mnt
> >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G
> 108G 75G
> >> 59%
> >> >> >> /groups/ch-geni-net/Hadoop-NET
> >> >> >> 172.17.253.254:/q/proj/ch-geni-net 198G 108G 75G
> >> 59%
> >> >> >> /proj/ch-geni-net
> >> >> >> root@nn:~#
> >> >> >>
> >> >> >>
> >> >> >> I can see there is no space left on /dev/xvda2.
> >> >> >>
> >> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ?
> Or do I
> >> >> >> need
> >> >> >> to
> >> >> >> move the file manually from /dev/xvda2 to xvda4 ?
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Thanks & Regards,
> >> >> >>
> >> >> >> Abdul Navaz
> >> >> >> Research Assistant
> >> >> >> University of Houston Main Campus, Houston TX
> >> >> >> Ph: 281-685-0388
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >> >
> >>
> >
>
>
Re: No space when running a hadoop job
Posted by ViSolve Hadoop Support <ha...@visolve.com>.
Hello,
If you want to use drive /dev/xvda4 only, then add file location for
'/dev/xvda4' and remove the file location for '/dev/xvda2' under
"dfs.datanode.data.dir".
After the changes restart the hadoop services and check the available
space using the below command.
# hadoop fs -df -h
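For reference, a sketch of what the corresponding hdfs-site.xml entry could
look like. The hadoop/hdfs/dn subdirectory under /mnt is illustrative, not a
canonical path; and since the "$HADOOP_HOME is deprecated" warning in the
quoted output suggests a Hadoop 1.x cluster, the property there may need to
be named dfs.data.dir rather than dfs.datanode.data.dir:

```xml
<property>
  <!-- On Hadoop 1.x this property is named dfs.data.dir. -->
  <name>dfs.datanode.data.dir</name>
  <!-- A dedicated subdirectory on the disk mounted at /mnt (/dev/xvda4);
       the DataNode creates its block layout underneath this path. -->
  <value>/mnt/hadoop/hdfs/dn</value>
</property>
```

Make sure the directory exists and is owned by the user the DataNode runs
as before restarting the services.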
Regards,
ViSolve Hadoop Team
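As Aitor notes further down the thread, blocks already written to the old
location have to be moved by hand while the DataNode is stopped. A minimal
sketch of that move follows; the paths in the usage comment are purely
illustrative stand-ins for your actual old and new data directories:

```shell
#!/bin/sh
# Sketch: relocate DataNode block files from an old data directory to a
# new one. The DataNode must be stopped first (e.g. stop-dfs.sh).
move_blocks() {
  old=$1
  new=$2
  mkdir -p "$new"
  # Move each top-level entry (e.g. the "current" dir) so the DataNode's
  # on-disk layout is preserved under the new root.
  for entry in "$old"/*; do
    [ -e "$entry" ] && mv "$entry" "$new"/
  done
}

# Usage, with HDFS down (hypothetical paths):
#   move_blocks /app/hadoop/tmp/dfs/data /mnt/hadoop/hdfs/dn
#   chown -R hduser:hadoop /mnt/hadoop/hdfs/dn   # match the DataNode user
```

After the move, point dfs.datanode.data.dir at the new directory and start
HDFS again.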
On 10/3/2014 4:36 AM, Abdul Navaz wrote:
> Hello,
>
> As you suggested I have changed the hdfs-site.xml file of datanodes
> and name node as below and formatted the name node.
>
> </property>
>
> <property>
>
> <name>dfs.datanode.data.dir</name>
>
> <value>/mnt</value>
>
> <description>Comma separated list of paths. Use the list of
> directories from $DFS_DATA_DIR.
>
> For example,
> /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
>
> </property>
>
>
>
> hduser@dn1:~$ df -h
>
> Filesystem Size Used Avail Use% Mounted on
>
> /dev/xvda2 5.9G 5.3G 258M 96% /
>
> udev 98M 4.0K 98M 1% /dev
>
> tmpfs 48M 196K 48M 1% /run
>
> none 5.0M 0 5.0M 0% /run/lock
>
> none 120M 0 120M 0% /run/shm
>
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G 113G 70G 62%
> /groups/ch-geni-net/Hadoop-NET
>
> 172.17.253.254:/q/proj/ch-geni-net 198G 113G 70G 62%
> /proj/ch-geni-net
>
> /dev/xvda4 7.9G 147M 7.4G 2% /mnt
>
> hduser@dn1:~$
>
>
>
> Even after doing so, the file is copied only to /dev/xvda2 instead of
> /dev/xvda4.
>
> Once /dev/xvda2 is full I am getting the below error message.
>
> hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
>
> Warning: $HADOOP_HOME is deprecated.
>
>
> 14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hduser/getty/file12.txt could only be replicated to 0 nodes,
> instead of 1
>
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
>
>
>
>
> Let me say it like this: I don't want to use /dev/xvda2, as it has a
> capacity of only 5.9 GB; I want to use only /dev/xvda4. How can I do this?
>
>
>
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
>
> From: Abdul Navaz <navaz.enc@gmail.com <ma...@gmail.com>>
> Date: Monday, September 29, 2014 at 1:53 PM
> To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Subject: Re: No space when running a hadoop job
>
> Dear All,
>
> I am not doing load balancing here. I am just copying a file, and it is
> throwing a "No space left on device" error.
>
>
> hduser@dn1:~$ df -h
>
> Filesystem Size Used Avail Use%
> Mounted on
>
> /dev/xvda2 5.9G 5.1G 533M 91% /
>
> udev 98M 4.0K 98M 1% /dev
>
> tmpfs 48M 196K 48M 1% /run
>
> none 5.0M 0 5.0M 0% /run/lock
>
> none 120M 0 120M 0% /run/shm
>
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G 116G 67G 64%
> /groups/ch-geni-net/Hadoop-NET
>
> 172.17.253.254:/q/proj/ch-geni-net 198G 116G 67G 64%
> /proj/ch-geni-net
>
> /dev/xvda4 7.9G 147M 7.4G 2% /mnt
>
> hduser@dn1:~$
>
> hduser@dn1:~$
>
> hduser@dn1:~$
>
> hduser@dn1:~$ cp data2.txt data3.txt
>
> cp: writing `data3.txt': No space left on device
>
> cp: failed to extend `data3.txt': No space left on device
>
> hduser@dn1:~$
>
>
> I guess by default it is copying to the default location. Why am I
> getting this error? How can I fix this?
>
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
>
> From: Aitor Cedres <acedres@pivotal.io <ma...@pivotal.io>>
> Reply-To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Date: Monday, September 29, 2014 at 7:53 AM
> To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Subject: Re: No space when running a hadoop job
>
>
> I think the way it works when HDFS has a list
> in dfs.datanode.data.dir is basically round-robin between disks.
> And yes, it may not be perfectly balanced because of different file sizes.
>
>
> On 29 September 2014 13:15, Susheel Kumar Gadalay <skgadalay@gmail.com
> <ma...@gmail.com>> wrote:
>
> Thanks, Aitor.
>
> That is my observation too.
>
> I added a new disk location and manually moved some files.
>
> But if two locations are given for dfs.datanode.data.dir from the
> beginning, will Hadoop balance the disk usage, even if not perfectly,
> since file sizes may differ?
>
> On 9/29/14, Aitor Cedres <acedres@pivotal.io
> <ma...@pivotal.io>> wrote:
> > Hi Susheel,
> >
> > Adding a new directory to "dfs.datanode.data.dir" will not balance
> > your disks straight away. Eventually, through HDFS activity
> > (deleting/invalidating some blocks, writing new ones), the disks will
> > become balanced. If you want to balance them right after adding the
> > new disk and changing the "dfs.datanode.data.dir" value, you have to
> > shut down the DN and manually move (mv) some files from the old
> > directory to the new one.
> >
> > The balancer will try to balance the usage between HDFS nodes, but it
> > won't care about "internal" node disk utilization. For your particular
> > case, the balancer won't fix your issue.
> >
> > Hope it helps,
> > Aitor
> >
> > On 29 September 2014 05:53, Susheel Kumar Gadalay
> <skgadalay@gmail.com <ma...@gmail.com>>
> > wrote:
> >
> >> You mean if multiple directory locations are given, Hadoop will
> >> balance the distribution of files across these different
> >> directories.
> >>
> >> But normally we start with one directory location, and once it is
> >> reaching its maximum, we add a new directory.
> >>
> >> In this case, how can we balance the distribution of files?
> >>
> >> One way is to list the files and move them.
> >>
> >> Will the start-balancer script work?
> >>
> >> On 9/27/14, Alexander Pivovarov <apivovarov@gmail.com
> <ma...@gmail.com>> wrote:
> >> > It can read/write in parallel to all drives. More HDDs means more
> >> > I/O speed.
> >> > On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
> <skgadalay@gmail.com <ma...@gmail.com>>
> >> > wrote:
> >> >
> >> >> Correct me if I am wrong.
> >> >>
> >> >> Adding multiple directories will not balance the file
> >> >> distribution across these locations.
> >> >>
> >> >> Hadoop will exhaust the first directory and then start using
> >> >> the next, and the next...
> >> >>
> >> >> How can I tell Hadoop to evenly balance across these
> >> >> directories?
> >> >>
> >> >> On 9/26/14, Matt Narrell <matt.narrell@gmail.com
> <ma...@gmail.com>> wrote:
> >> >> > You can add a comma separated list of paths to the
> >> >> "dfs.datanode.data.dir"
> >> >> > property in your hdfs-site.xml
> >> >> >
> >> >> > mn
> >> >> >
> >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz
> <navaz.enc@gmail.com <ma...@gmail.com>>
> >> >> > wrote:
> >> >> >
> >> >> >> Hi
> >> >> >>
> >> >> >> I am facing some space issue when I saving file into HDFS
> and/or
> >> >> >> running
> >> >> >> map reduce job.
> >> >> >>
> >> >> >> root@nn:~# df -h
> >> >> >> Filesystem Size Used Avail
> >> Use%
> >> >> >> Mounted on
> >> >> >> /dev/xvda2 5.9G 5.9G 0
> >> 100%
> >> >> >> /
> >> >> >> udev 98M 4.0K 98M
> >> 1%
> >> >> >> /dev
> >> >> >> tmpfs 48M 192K 48M
> >> 1%
> >> >> >> /run
> >> >> >> none 5.0M 0 5.0M
> >> 0%
> >> >> >> /run/lock
> >> >> >> none 120M 0 120M
> >> 0%
> >> >> >> /run/shm
> >> >> >> overflow 1.0M 4.0K 1020K
> >> 1%
> >> >> >> /tmp
> >> >> >> /dev/xvda4 7.9G 147M 7.4G
> >> 2%
> >> >> >> /mnt
> >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G
> 108G 75G
> >> 59%
> >> >> >> /groups/ch-geni-net/Hadoop-NET
> >> >> >> 172.17.253.254:/q/proj/ch-geni-net 198G 108G 75G
> >> 59%
> >> >> >> /proj/ch-geni-net
> >> >> >> root@nn:~#
> >> >> >>
> >> >> >>
> >> >> >> I can see there is no space left on /dev/xvda2.
> >> >> >>
> >> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ?
> Or do I
> >> >> >> need
> >> >> >> to
> >> >> >> move the file manually from /dev/xvda2 to xvda4 ?
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Thanks & Regards,
> >> >> >>
> >> >> >> Abdul Navaz
> >> >> >> Research Assistant
> >> >> >> University of Houston Main Campus, Houston TX
> >> >> >> Ph: 281-685-0388
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >> >
> >>
> >
>
>