Posted to mapreduce-user@hadoop.apache.org by Abdul Navaz <na...@gmail.com> on 2014/10/03 01:06:07 UTC

Re: No space when running a hadoop job

Hello,

As you suggested, I have changed the hdfs-site.xml file on the datanodes and the
name node as below, and formatted the name node.

</property>

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/mnt</value>
  <description>Comma separated list of paths. Use the list of directories from
  $DFS_DATA_DIR. For example, /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
</property>
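A stray closing tag in a hand-edited hdfs-site.xml is easy to miss, so it can help to check that the file still parses and actually contains the intended data directory. The sketch below does that with the standard library; the embedded sample is a minimal well-formed version of the property above, and the helper name is ours, not a Hadoop API:

```python
# Sketch: verify an hdfs-site.xml fragment parses and list the DataNode dirs.
import xml.etree.ElementTree as ET

def datanode_dirs(xml_text):
    """Return the comma-separated paths configured in dfs.datanode.data.dir."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "dfs.datanode.data.dir":
            return [p.strip() for p in prop.findtext("value").split(",")]
    return []

# Minimal well-formed version of the snippet above.
sample = """
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/mnt</value>
  </property>
</configuration>
"""

print(datanode_dirs(sample))  # prints ['/mnt']
```

If the file were malformed (for example, a `</property>` with no matching open tag), `ET.fromstring` would raise a `ParseError` instead of returning silently.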



hduser@dn1:~$ df -h
Filesystem                                       Size  Used Avail Use% Mounted on
/dev/xvda2                                       5.9G  5.3G  258M  96% /
udev                                              98M  4.0K   98M   1% /dev
tmpfs                                             48M  196K   48M   1% /run
none                                             5.0M     0  5.0M   0% /run/lock
none                                             120M     0  120M   0% /run/shm
172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  113G   70G  62% /groups/ch-geni-net/Hadoop-NET
172.17.253.254:/q/proj/ch-geni-net               198G  113G   70G  62% /proj/ch-geni-net
/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
hduser@dn1:~$



Even after doing so, the file is copied only to /dev/xvda2 instead of
/dev/xvda4.

Once /dev/xvda2 is full, I get the error message below.

hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
Warning: $HADOOP_HOME is deprecated.

14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/hduser/getty/file12.txt could only be replicated to 0 nodes, instead of 1
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)




Let me put it this way: I don't want to use /dev/xvda2, as it has a capacity of
only 5.9 GB; I want to use only /dev/xvda4. How can I do this?




Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388


From:  Abdul Navaz <na...@gmail.com>
Date:  Monday, September 29, 2014 at 1:53 PM
To:  <us...@hadoop.apache.org>
Subject:  Re: No space when running a hadoop job

Dear All,

I am not doing load balancing here. I am just copying a file, and it is
throwing a "no space left on device" error.


hduser@dn1:~$ df -h
Filesystem                                       Size  Used Avail Use% Mounted on
/dev/xvda2                                       5.9G  5.1G  533M  91% /
udev                                              98M  4.0K   98M   1% /dev
tmpfs                                             48M  196K   48M   1% /run
none                                             5.0M     0  5.0M   0% /run/lock
none                                             120M     0  120M   0% /run/shm
172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  116G   67G  64% /groups/ch-geni-net/Hadoop-NET
172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64% /proj/ch-geni-net
/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
hduser@dn1:~$

hduser@dn1:~$ cp data2.txt data3.txt
cp: writing `data3.txt': No space left on device
cp: failed to extend `data3.txt': No space left on device
hduser@dn1:~$


I guess it is copying to the default location. Why am I getting this error?
How can I fix it?


Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388


From:  Aitor Cedres <ac...@pivotal.io>
Reply-To:  <us...@hadoop.apache.org>
Date:  Monday, September 29, 2014 at 7:53 AM
To:  <us...@hadoop.apache.org>
Subject:  Re: No space when running a hadoop job


I think the way it works is: when HDFS has a list in dfs.datanode.data.dir,
it basically round-robins between the disks. And yes, it may not be perfectly
balanced, because of differing file sizes.
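That round-robin behaviour can be sketched in a few lines. This is a toy model of the idea, not Hadoop's actual volume-choosing code; the directory names are taken from the example earlier in the thread, and the real DataNode additionally skips volumes that lack free space:

```python
from itertools import cycle

# Toy model of round-robin block placement across a DataNode's data dirs.
data_dirs = ["/grid/hadoop/hdfs/dn", "/grid1/hadoop/hdfs/dn"]
next_dir = cycle(data_dirs)

def place_block(block_id):
    """Pick the next directory in strict rotation, ignoring free space."""
    return next(next_dir)

placements = [place_block(b) for b in range(4)]
print(placements)
# Directories alternate, but the *bytes* per directory end up unequal
# whenever block sizes differ -- hence the imperfect balance.
```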


On 29 September 2014 13:15, Susheel Kumar Gadalay <sk...@gmail.com>
wrote:
> Thanks, Aitor.
> 
> That is my observation too.
> 
> I added a new disk location and manually moved some files.
> 
> But if two locations are given for dfs.datanode.data.dir from the
> beginning, will Hadoop balance the disk usage, even if not perfectly,
> since file sizes may differ?
> 
> On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
>> > Hi Susheel,
>> >
>> > Adding a new directory to "dfs.datanode.data.dir" will not balance your
>> > disks straight away. Eventually, through HDFS activity
>> > (deleting/invalidating some blocks, writing new ones), the disks will
>> > become balanced. If you want to balance them right after adding the new
>> > disk and changing the "dfs.datanode.data.dir" value, you have to shut
>> > down the DN and manually move (mv) some files from the old directory to
>> > the new one.
>> >
>> > The balancer will try to balance the usage between HDFS nodes, but it
>> > won't care about "internal" node disk utilization. For your particular
>> > case, the balancer won't fix your issue.
>> >
>> > Hope it helps,
>> > Aitor
>> >
>> > On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
>> > wrote:
>> >
>>> >> You mean that if multiple directory locations are given, Hadoop will
>>> >> balance the distribution of files across these different directories.
>>> >>
>>> >> But normally we start with one directory location, and once it is
>>> >> reaching its maximum we add a new directory.
>>> >>
>>> >> In this case, how can we balance the distribution of files?
>>> >>
>>> >> One way is to list the files and move them.
>>> >>
>>> >> Will running the start-balancer script work?
>>> >>
>>> >> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
>>>> >> > It can read/write in parallel to all drives. More HDDs, more I/O speed.
>>>> >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>>>> <sk...@gmail.com>
>>>> >> > wrote:
>>>> >> >
>>>>> >> >> Correct me if I am wrong.
>>>>> >> >>
>>>>> >> >> Adding multiple directories will not balance the file distribution
>>>>> >> >> across these locations.
>>>>> >> >>
>>>>> >> >> Hadoop will exhaust the first directory and then start using the
>>>>> >> >> next, and so on.
>>>>> >> >>
>>>>> >> >> How can I tell Hadoop to evenly balance across these directories?
>>>>> >> >>
>>>>> >> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
>>>>>> >> >> > You can add a comma separated list of paths to the
>>>>>> >> >> > "dfs.datanode.data.dir" property in your hdfs-site.xml
>>>>>> >> >> >
>>>>>> >> >> > mn
>>>>>> >> >> >
>>>>>> >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
>>>>>> >> >> > wrote:
>>>>>> >> >> >
>>>>>>> >> >> >> Hi
>>>>>>> >> >> >>
>>>>>>> >> >> >> I am facing a space issue when saving files into HDFS and/or
>>>>>>> >> >> >> running a map reduce job.
>>>>>>> >> >> >>
>>>>>>> >> >> >> root@nn:~# df -h
>>>>>>> >> >> >> Filesystem                                       Size  Used Avail Use% Mounted on
>>>>>>> >> >> >> /dev/xvda2                                       5.9G  5.9G     0 100% /
>>>>>>> >> >> >> udev                                              98M  4.0K   98M   1% /dev
>>>>>>> >> >> >> tmpfs                                             48M  192K   48M   1% /run
>>>>>>> >> >> >> none                                             5.0M     0  5.0M   0% /run/lock
>>>>>>> >> >> >> none                                             120M     0  120M   0% /run/shm
>>>>>>> >> >> >> overflow                                         1.0M  4.0K 1020K   1% /tmp
>>>>>>> >> >> >> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>>>>>>> >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59% /groups/ch-geni-net/Hadoop-NET
>>>>>>> >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59% /proj/ch-geni-net
>>>>>>> >> >> >> root@nn:~#
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >> I can see there is no space left on /dev/xvda2.
>>>>>>> >> >> >>
>>>>>>> >> >> >> How can I make Hadoop see the newly mounted /dev/xvda4? Or do I
>>>>>>> >> >> >> need to move the files manually from /dev/xvda2 to /dev/xvda4?
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >>
>>>>>>> >> >> >> Thanks & Regards,
>>>>>>> >> >> >>
>>>>>>> >> >> >> Abdul Navaz
>>>>>>> >> >> >> Research Assistant
>>>>>>> >> >> >> University of Houston Main Campus, Houston TX
>>>>>>> >> >> >> Ph: 281-685-0388
>>>>>>> >> >> >>
>>>>>> >> >> >
>>>>>> >> >> >
>>>>> >> >>
>>>> >> >
>>> >>
>> >
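Aitor's distinction between node-level and disk-level balancing can be illustrated with a toy model. All node names and usage numbers below are made up for illustration; the point is only that a node whose disks are internally skewed still looks "balanced" when only node totals are compared:

```python
# Toy model: the HDFS balancer compares *node* utilization, so it cannot
# see skew between the disks inside a single node.
nodes = {
    "dn1": {"disk1": 90, "disk2": 10},  # badly skewed across its own disks
    "dn2": {"disk1": 50, "disk2": 50},  # evenly spread
}

def node_used(disks):
    """Total usage a node reports, summed over its data directories."""
    return sum(disks.values())

totals = {name: node_used(disks) for name, disks in nodes.items()}
print(totals)  # prints {'dn1': 100, 'dn2': 100}

# With identical totals there is nothing for the balancer to move, even
# though dn1's disk1 is nearly full. (Threshold of 10 is illustrative.)
needs_rebalance = max(totals.values()) - min(totals.values()) > 10
print(needs_rebalance)  # prints False
```

This matches the advice in the thread: to fix intra-node skew you stop the DataNode and move block files between directories by hand; the balancer alone will not do it.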




Re: No space when running a hadoop job

Posted by Abdul Navaz <na...@gmail.com>.
Thank you very much. This is what I am trying to do.

This is what storage I have.

Filesystem                                       Size  Used Avail Use%
Mounted on

/dev/xvda2                                       5.9G  5.3G  238M  96% /

/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt


I have configured dfs.datanode.data.dir in hdfs-site.xml:

<name>dfs.datanode.data.dir</name>

<value>/mnt</value>




I have formatted the name node and restarted, but it is still copying to '/',
and when that is full it throws an error instead of copying to '/mnt'.

Error:
14/10/03 15:23:21 WARN hdfs.DFSClient: Could not get block locations. Source
file "/user/hduser/getty/data4" - Aborting...

put: java.io.IOException: File /user/hduser/getty/data4 could only be
replicated to 0 nodes, instead of 1

14/10/03 15:23:21 ERROR hdfs.DFSClient: Failed to close file
/user/hduser/getty/data4



Am I doing anything wrong here ?

Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388


From:  ViSolve Hadoop Support <ha...@visolve.com>
Reply-To:  <us...@hadoop.apache.org>
Date:  Friday, October 3, 2014 at 1:29 AM
To:  <us...@hadoop.apache.org>
Subject:  Re: No space when running a hadoop job

    
Hello,

If you want to use drive /dev/xvda4 only, then add the file location for
'/dev/xvda4' and remove the file location for '/dev/xvda2' under
"dfs.datanode.data.dir".

After the changes, restart the hadoop services and check the available space
using the below command.

    # hadoop fs -df -h

Regards,
ViSolve Hadoop Team
On 10/3/2014 4:36 AM, Abdul Navaz wrote:
> [snip - message quoted in full above]



Re: No space when running a hadoop job

Posted by Abdul Navaz <na...@gmail.com>.
Thank You Very much. This is what I am trying to do.

This is what storage I have.

Filesystem                                       Size  Used Avail Use%
Mounted on

/dev/xvda2                                       5.9G  5.3G  238M  96% /

/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt


I have configured in dfs.datanode.dir in hdfs-site.

<name>dfs.datanode.data.dir</name>

<value>/mnt</value>




I have formatted the name node and restarted and it is still copying to  Œ/
Œ  and if it is full it throws an error instead of copying to  Œ/mnt¹.

Error:
14/10/03 15:23:21 WARN hdfs.DFSClient: Could not get block locations. Source
file "/user/hduser/getty/data4" - Aborting...

put: java.io.IOException: File /user/hduser/getty/data4 could only be
replicated to 0 nodes, instead of 1

14/10/03 15:23:21 ERROR hdfs.DFSClient: Failed to close file
/user/hduser/getty/data4



Am I doing anything wrong here ?

Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388


From:  ViSolve Hadoop Support <ha...@visolve.com>
Reply-To:  <us...@hadoop.apache.org>
Date:  Friday, October 3, 2014 at 1:29 AM
To:  <us...@hadoop.apache.org>
Subject:  Re: No space when running a hadoop job

    
 Hello,
 
 If you want to use drive /dev/xvda4 only, then add file location for
'/dev/xvda4' and remove the file location for '/dev/xvda2' under
"dfs.datanode.data.dir".
 
 After the changes restart the hadoop services and check the available space
using the below command.
      # hadoop fs -df -h
 
 Regards,
 ViSolve Hadoop Team
 
  
On 10/3/2014 4:36 AM, Abdul Navaz wrote:
 
 
>  
>  
> Hello,
>  
> 
>  
>  
> As you suggested I have changed the hdfs-site.xml file of datanodes and name
> node as below and formatted the name node.
>  
> 
>  
>  
>  
> 
> </property>
>  
> 
> <property>
>  
> 
> <name>dfs.datanode.data.dir</name>
>  
> 
> <value>/mnt</value>
>  
> 
> <description>Comma separated list of paths. Use the list of directories from
> $DFS_DATA_DIR.
>  
> 
>                 For example,
> /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
>  
> 
> </property>
>  
>  
> 
>  
>  
> 
>  
>  
>  
> 
> hduser@dn1:~$ df -h
>  
> 
> Filesystem                                       Size  Used Avail Use% Mounted
> on
>  
> 
> /dev/xvda2                                       5.9G  5.3G  258M  96% /
>  
> 
> udev                                              98M  4.0K   98M   1% /dev
>  
> 
> tmpfs                                             48M  196K   48M   1% /run
>  
> 
> none                                             5.0M     0  5.0M   0%
> /run/lock
>  
> 
> none                                             120M     0  120M   0%
> /run/shm
>  
> 
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  113G   70G  62%
> /groups/ch-geni-net/Hadoop-NET
>  
> 
> 172.17.253.254:/q/proj/ch-geni-net               198G  113G   70G  62%
> /proj/ch-geni-net
>  
> 
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>  
> 
> hduser@dn1:~$ 
>  
>  
> 
>  
>  
> 
>  
>  
> Even after doing so, the file is copied only to /dev/xvda2 instead of
> /dev/xvda4.
>  
> 
>  
>  
> Once /dev/xvda2 is full I am getting the below error message.
>  
> 
>  
>  
>  
> 
> hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
>  
> 
> Warning: $HADOOP_HOME is deprecated.
>  
> 
> 
>  
>  
> 
> 14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hduser/getty/file12.txt could only be replicated to 0 nodes, instead of
> 1
>  
> 
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNames
> ystem.java:1639)
>  
>  
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> Let me say like this: I don¹t want to use /dev/xvda2 as it has capacity of
> 5.9GB , I want to use only /dev/xvda4. How can I do this ?
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> Thanks & Regards,
>  
> 
>  
>  
> Abdul Navaz
>  
> Research Assistant
>  
> University of Houston Main Campus, Houston TX
>  
> Ph: 281-685-0388
>  
> 
>  
>  
>  
>  
> 
>  
>   
> From:  Abdul Navaz <na...@gmail.com>
>  Date:  Monday, September 29, 2014 at 1:53 PM
>  To:  <us...@hadoop.apache.org>
>  Subject:  Re: No space when running a hadoop job
>  
>  
> 
>  
>  
>  
>  
>  
>  
> Dear All,
>  
> 
>  
>  
> I am not doing load balancing here. I am just copying a file and it is
> throwing me an error no space left on the device.
>  
> 
>  
>  
> 
>  
>  
>  
> 
> hduser@dn1:~$ df -h
>  
> 
> Filesystem                                       Size  Used Avail Use% Mounted
> on
>  
> 
> /dev/xvda2                                       5.9G  5.1G  533M  91% /
>  
> 
> udev                                              98M  4.0K   98M   1% /dev
>  
> 
> tmpfs                                             48M  196K   48M   1% /run
>  
> 
> none                                             5.0M     0  5.0M   0%
> /run/lock
>  
> 
> none                                             120M     0  120M   0%
> /run/shm
>  
> 
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  116G   67G  64%
> /groups/ch-geni-net/Hadoop-NET
>  
> 
> 172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64%
> /proj/ch-geni-net
>  
> 
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ cp data2.txt data3.txt
>  
> 
> cp: writing `data3.txt': No space left on device
>  
> 
> cp: failed to extend `data3.txt': No space left on device
>  
> 
> hduser@dn1:~$ 
>  
>  
> 
>  
>  
>  
> I guess by default it is copying to default location. Why I am getting this
> error ? How can I fix this ?
>  
> 
>  
>  
> 
>  
>  
> Thanks & Regards,
>  
> 
>  
>  
> Abdul Navaz
>  
> Research Assistant
>  
> University of Houston Main Campus, Houston TX
>  
> Ph: 281-685-0388
>  
> 
>  
>  
>  
>  
>  
> 
>  
>   
> From:  Aitor Cedres <ac...@pivotal.io>
>  Reply-To:  <us...@hadoop.apache.org>
>  Date:  Monday, September 29, 2014 at 7:53 AM
>  To:  <us...@hadoop.apache.org>
>  Subject:  Re: No space when running a hadoop job
>  
>  
> 
>  
>  
> 
>  
> I think they way it works when HDFS has a list in dfs.datanode.data.dir, it's
> basically a round robin between disks. And yes, it may not be perfect balanced
> cause of different file sizes.
>  
>  
>  
> 
>  
>  
>  
>  
> On 29 September 2014 13:15, Susheel Kumar Gadalay <sk...@gmail.com> wrote:
>  
>> Thank Aitor.
>>  
>>  That is what is my observation too.
>>  
>>  I added a new disk location and manually moved some files.
>>  
>>  But if 2 locations are given at the beginning itself for
>>  dfs.datanode.data.dir, will hadoop balance the disks usage, if not
>>  perfect because file sizes may differ.
>>  
>>  
>> 
>>  On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
>>>  > Hi Susheel,
>>>  >
>>>  > Adding a new directory to ³dfs.datanode.data.dir² will not balance your
>>>  > disks straightforward. Eventually, by HDFS activity
>>> (deleting/invalidating
>>>  > some block, writing new ones), the disks will become balanced. If you >>>
want
>>>  > to balance them right after adding the new disk and changing the
>>>  > ³dfs.datanode.data.dir²
>>>  > value, you have to shutdown the DN and manually move (mv) some files in
>>> the
>>>  > old directory to the new one.
>>>  >
>>>  > The balancer will try to balance the usage between HDFS nodes, but it
>>> won't
>>>  > care about "internal" node disks utilization. For your particular case,
>>> the
>>>  > balancer won't fix your issue.
>>>  >
>>>  > Hope it helps,
>>>  > Aitor
>>>  >
>>>  > On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
>>>  > wrote:
>>>  >
>>>>  >> You mean if multiple directory locations are given, Hadoop will
>>>>  >> balance the distribution of files across these different directories.
>>>>  >>
>>>>  >> But normally we start with 1 directory location and once it is
>>>>  >> reaching the maximum, we add new directory.
>>>>  >>
>>>>  >> In this case how can we balance the distribution of files?
>>>>  >>
>>>>  >> One way is to list the files and move.
>>>>  >>
>>>>  >> Will start balance script will work?
>>>>  >>
>>>>  >> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
>>>>>  >> > It can read/write in parallel to all drives. More hdd more io speed.
>>>>>  >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>>>>> <sk...@gmail.com>
>>>>>  >> > wrote:
>>>>>  >> >
>>>>>>  >> >> Correct me if I am wrong.
>>>>>>  >> >>
>>>>>>  >> >> Adding multiple directories will not balance the files
>>>>>> distributions
>>>>>>  >> >> across these locations.
>>>>>>  >> >>
>>>>>>  >> >> Hadoop will add exhaust the first directory and then start using
the
>>>>>>  >> >> next, next ..
>>>>>>  >> >>
>>>>>>  >> >> How can I tell Hadoop to evenly balance across these directories.
>>>>>>  >> >>
>>>>>>  >> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
>>>>>>>  >> >> > You can add a comma separated list of paths to the
>>>>>>  >> >> ³dfs.datanode.data.dir²
>>>>>>>  >> >> > property in your hdfs-site.xml
>>>>>>>  >> >> >
>>>>>>>  >> >> > mn
>>>>>>>  >> >> >
>>>>>>>  >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
>>>>>>>  >> >> > wrote:
>>>>>>>  >> >> >
>>>>>>>>  >> >> >> Hi
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> I am facing some space issue when I saving file into HDFS
and/or
>>>>>>>>  >> >> >> running
>>>>>>>>  >> >> >> map reduce job.
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> root@nn:~# df -h
>>>>>>>>  >> >> >> Filesystem                                       Size  Used
Avail
>>>>  >> Use%
>>>>>>>>  >> >> >> Mounted on
>>>>>>>>  >> >> >> /dev/xvda2                                       5.9G  5.9G
0
>>>>  >> 100%
>>>>>>>>  >> >> >> /
>>>>>>>>  >> >> >> udev                                              98M  4.0K
98M
>>>>  >>  1%
>>>>>>>>  >> >> >> /dev
>>>>>>>>  >> >> >> tmpfs                                             48M  192K
48M
>>>>  >>  1%
>>>>>>>>  >> >> >> /run
>>>>>>>>  >> >> >> none                                             5.0M     0
5.0M
>>>>  >>  0%
>>>>>>>>  >> >> >> /run/lock
>>>>>>>>  >> >> >> none                                             120M     0
120M
>>>>  >>  0%
>>>>>>>>  >> >> >> /run/shm
>>>>>>>>  >> >> >> overflow                                         1.0M  4.0K
1020K
>>>>  >>  1%
>>>>>>>>  >> >> >> /tmp
>>>>>>>>  >> >> >> /dev/xvda4                                       7.9G  147M
7.4G
>>>>  >>  2%
>>>>>>>>  >> >> >> /mnt
>>>>>>>>  >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G
75G
>>>>  >> 59%
>>>>>>>>  >> >> >> /groups/ch-geni-net/Hadoop-NET
>>>>>>>>  >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G
75G
>>>>  >> 59%
>>>>>>>>  >> >> >> /proj/ch-geni-net
>>>>>>>>  >> >> >> root@nn:~#
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> I can see there is no space left on /dev/xvda2.
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ? Or do
I
>>>>>>>>  >> >> >> need
>>>>>>>>  >> >> >> to
>>>>>>>>  >> >> >> move the file manually from /dev/xvda2 to xvda4 ?
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> Thanks & Regards,
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> Abdul Navaz
>>>>>>>>  >> >> >> Research Assistant
>>>>>>>>  >> >> >> University of Houston Main Campus, Houston TX
>>>>>>>>  >> >> >> Ph: 281-685-0388
>>>>>>>>  >> >> >>



Re: No space when running a hadoop job

Posted by Abdul Navaz <na...@gmail.com>.
Thank you very much. This is what I am trying to do.

This is the storage I have.

Filesystem                                       Size  Used Avail Use% Mounted on

/dev/xvda2                                       5.9G  5.3G  238M  96% /

/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt


I have configured dfs.datanode.data.dir in hdfs-site.xml:

<name>dfs.datanode.data.dir</name>

<value>/mnt</value>
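For reference, a complete property element in hdfs-site.xml looks like the sketch below. One thing worth checking, guessing from the "$HADOOP_HOME is deprecated" warning elsewhere in this thread: that warning suggests a Hadoop 1.x cluster, and on 1.x the datanode directory property is named dfs.data.dir; dfs.datanode.data.dir is the 2.x name and is silently ignored by 1.x, which would explain why blocks keep landing under "/". The subdirectory below the mount point is illustrative, not from the thread.

```xml
<!-- Sketch for hdfs-site.xml. /mnt/hdfs/data is a hypothetical path;
     pointing at a subdirectory of the mount (rather than /mnt itself)
     avoids writing into the bare mount point if the disk is not mounted. -->
<property>
  <!-- Hadoop 1.x property name; on Hadoop 2.x use dfs.datanode.data.dir. -->
  <name>dfs.data.dir</name>
  <value>/mnt/hdfs/data</value>
</property>
```

After changing the value, restarting the DataNodes is enough; reformatting the NameNode is not required just to change data directories.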




I have formatted the name node and restarted, but files are still written to '/',
and when it fills up an error is thrown instead of writing to '/mnt'.

Error:
14/10/03 15:23:21 WARN hdfs.DFSClient: Could not get block locations. Source
file "/user/hduser/getty/data4" - Aborting...

put: java.io.IOException: File /user/hduser/getty/data4 could only be
replicated to 0 nodes, instead of 1

14/10/03 15:23:21 ERROR hdfs.DFSClient: Failed to close file
/user/hduser/getty/data4



Am I doing anything wrong here?
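When a put fails with "could only be replicated to 0 nodes", one quick sanity check (a sketch; /mnt is the data-dir value assumed from this thread) is to confirm which device actually backs the configured directory:

```shell
# Print the device and mount point backing the DataNode data directory.
# If this prints the root device (/dev/xvda2) rather than /dev/xvda4,
# the DataNode is still writing under "/" despite the config change.
DATA_DIR=/mnt            # assumed value of the data-dir property
df -P "$DATA_DIR" | awk 'NR==2 {print $1, $6}'
```

On the dn1 host from the thread this should print `/dev/xvda4 /mnt`; anything else means the directory is not on the new disk.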

Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388


From:  ViSolve Hadoop Support <ha...@visolve.com>
Reply-To:  <us...@hadoop.apache.org>
Date:  Friday, October 3, 2014 at 1:29 AM
To:  <us...@hadoop.apache.org>
Subject:  Re: No space when running a hadoop job

    
 Hello,
 
 If you want to use drive /dev/xvda4 only, then add file location for
'/dev/xvda4' and remove the file location for '/dev/xvda2' under
"dfs.datanode.data.dir".
 
 After the changes restart the hadoop services and check the available space
using the below command.
      # hadoop fs -df -h
 
 Regards,
 ViSolve Hadoop Team
 
  
On 10/3/2014 4:36 AM, Abdul Navaz wrote:
 
 
>  
>  
> Hello,
>  
> 
>  
>  
> As you suggested I have changed the hdfs-site.xml file of datanodes and name
> node as below and formatted the name node.
>  
> 
>  
>  
>  
> 
> </property>
>  
> 
> <property>
>  
> 
> <name>dfs.datanode.data.dir</name>
>  
> 
> <value>/mnt</value>
>  
> 
> <description>Comma separated list of paths. Use the list of directories from
> $DFS_DATA_DIR.
>  
> 
>                 For example,
> /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
>  
> 
> </property>
>  
>  
> 
>  
>  
> 
>  
>  
>  
> 
> hduser@dn1:~$ df -h
>  
> 
> Filesystem                                       Size  Used Avail Use% Mounted on
>  
> 
> /dev/xvda2                                       5.9G  5.3G  258M  96% /
>  
> 
> udev                                              98M  4.0K   98M   1% /dev
>  
> 
> tmpfs                                             48M  196K   48M   1% /run
>  
> 
> none                                             5.0M     0  5.0M   0%
> /run/lock
>  
> 
> none                                             120M     0  120M   0%
> /run/shm
>  
> 
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  113G   70G  62%
> /groups/ch-geni-net/Hadoop-NET
>  
> 
> 172.17.253.254:/q/proj/ch-geni-net               198G  113G   70G  62%
> /proj/ch-geni-net
>  
> 
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>  
> 
> hduser@dn1:~$ 
>  
>  
> 
>  
>  
> 
>  
>  
> Even after doing so, the file is copied only to /dev/xvda2 instead of
> /dev/xvda4.
>  
> 
>  
>  
> Once /dev/xvda2 is full I am getting the below error message.
>  
> 
>  
>  
>  
> 
> hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
>  
> 
> Warning: $HADOOP_HOME is deprecated.
>  
> 
> 
>  
>  
> 
> 14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hduser/getty/file12.txt could only be replicated to 0 nodes, instead of
> 1
>  
> 
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNames
> ystem.java:1639)
>  
>  
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> Let me say it like this: I don't want to use /dev/xvda2, as it has a capacity of
> 5.9GB; I want to use only /dev/xvda4. How can I do this?
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> 
>  
>  
> Thanks & Regards,
>  
> 
>  
>  
> Abdul Navaz
>  
> Research Assistant
>  
> University of Houston Main Campus, Houston TX
>  
> Ph: 281-685-0388
>  
> 
>  
>  
>  
>  
> 
>  
>   
> From:  Abdul Navaz <na...@gmail.com>
>  Date:  Monday, September 29, 2014 at 1:53 PM
>  To:  <us...@hadoop.apache.org>
>  Subject:  Re: No space when running a hadoop job
>  
>  
> 
>  
>  
>  
>  
>  
>  
> Dear All,
>  
> 
>  
>  
> I am not doing load balancing here. I am just copying a file and it is
> throwing a 'no space left on device' error.
>  
> 
>  
>  
> 
>  
>  
>  
> 
> hduser@dn1:~$ df -h
>  
> 
> Filesystem                                       Size  Used Avail Use% Mounted on
>  
> 
> /dev/xvda2                                       5.9G  5.1G  533M  91% /
>  
> 
> udev                                              98M  4.0K   98M   1% /dev
>  
> 
> tmpfs                                             48M  196K   48M   1% /run
>  
> 
> none                                             5.0M     0  5.0M   0%
> /run/lock
>  
> 
> none                                             120M     0  120M   0%
> /run/shm
>  
> 
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  116G   67G  64%
> /groups/ch-geni-net/Hadoop-NET
>  
> 
> 172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64%
> /proj/ch-geni-net
>  
> 
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ 
>  
> 
> hduser@dn1:~$ cp data2.txt data3.txt
>  
> 
> cp: writing `data3.txt': No space left on device
>  
> 
> cp: failed to extend `data3.txt': No space left on device
>  
> 
> hduser@dn1:~$ 
>  
>  
> 
>  
>  
>  
> I guess by default it is copying to the default location. Why am I getting this
> error? How can I fix this?
>  
> 
>  
>  
> 
>  
>  
> Thanks & Regards,
>  
> 
>  
>  
> Abdul Navaz
>  
> Research Assistant
>  
> University of Houston Main Campus, Houston TX
>  
> Ph: 281-685-0388
>  
> 
>  
>  
>  
>  
>  
> 
>  
>   
> From:  Aitor Cedres <ac...@pivotal.io>
>  Reply-To:  <us...@hadoop.apache.org>
>  Date:  Monday, September 29, 2014 at 7:53 AM
>  To:  <us...@hadoop.apache.org>
>  Subject:  Re: No space when running a hadoop job
>  
>  
> 
>  
>  
> 
>  
> I think they way it works when HDFS has a list in dfs.datanode.data.dir, it's
> basically a round robin between disks. And yes, it may not be perfect balanced
> cause of different file sizes.
>  
>  
>  
> 
>  
>  
>  
>  
> On 29 September 2014 13:15, Susheel Kumar Gadalay <sk...@gmail.com> wrote:
>  
>> Thanks, Aitor.
>>  
>>  That is what is my observation too.
>>  
>>  I added a new disk location and manually moved some files.
>>  
>>  But if 2 locations are given at the beginning itself for
>>  dfs.datanode.data.dir, will hadoop balance the disks usage, if not
>>  perfect because file sizes may differ.
>>  
>>  
>> 
>>  On 9/29/14, Aitor Cedres <ac...@pivotal.io> wrote:
>>>  > Hi Susheel,
>>>  >
>>>  > Adding a new directory to "dfs.datanode.data.dir" will not balance your
>>>  > disks straightforward. Eventually, by HDFS activity
>>> (deleting/invalidating
>>>  > some block, writing new ones), the disks will become balanced. If you want
>>>  > to balance them right after adding the new disk and changing the
>>>  > "dfs.datanode.data.dir"
>>>  > value, you have to shutdown the DN and manually move (mv) some files in
>>> the
>>>  > old directory to the new one.
>>>  >
>>>  > The balancer will try to balance the usage between HDFS nodes, but it
>>> won't
>>>  > care about "internal" node disks utilization. For your particular case,
>>> the
>>>  > balancer won't fix your issue.
>>>  >
>>>  > Hope it helps,
>>>  > Aitor
>>>  >
>>>  > On 29 September 2014 05:53, Susheel Kumar Gadalay <sk...@gmail.com>
>>>  > wrote:
>>>  >
>>>>  >> You mean if multiple directory locations are given, Hadoop will
>>>>  >> balance the distribution of files across these different directories.
>>>>  >>
>>>>  >> But normally we start with 1 directory location and once it is
>>>>  >> reaching the maximum, we add new directory.
>>>>  >>
>>>>  >> In this case how can we balance the distribution of files?
>>>>  >>
>>>>  >> One way is to list the files and move.
>>>>  >>
>>>>  >> Will the start-balancer script work?
>>>>  >>
>>>>  >> On 9/27/14, Alexander Pivovarov <ap...@gmail.com> wrote:
>>>>>  >> > It can read/write in parallel to all drives. More hdd more io speed.
>>>>>  >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>>>>> <sk...@gmail.com>
>>>>>  >> > wrote:
>>>>>  >> >
>>>>>>  >> >> Correct me if I am wrong.
>>>>>>  >> >>
>>>>>>  >> >> Adding multiple directories will not balance the files
>>>>>> distributions
>>>>>>  >> >> across these locations.
>>>>>>  >> >>
>>>>>>  >> >> Hadoop will exhaust the first directory and then start using the
>>>>>>  >> >> next, and then the next.
>>>>>>  >> >>
>>>>>>  >> >> How can I tell Hadoop to evenly balance across these directories.
>>>>>>  >> >>
>>>>>>  >> >> On 9/26/14, Matt Narrell <ma...@gmail.com> wrote:
>>>>>>>  >> >> > You can add a comma separated list of paths to the
>>>>>>  >> >> "dfs.datanode.data.dir"
>>>>>>>  >> >> > property in your hdfs-site.xml
>>>>>>>  >> >> >
>>>>>>>  >> >> > mn
>>>>>>>  >> >> >
>>>>>>>  >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz <na...@gmail.com>
>>>>>>>  >> >> > wrote:
>>>>>>>  >> >> >
>>>>>>>>  >> >> >> Hi
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> I am facing some space issue when I saving file into HDFS
and/or
>>>>>>>>  >> >> >> running
>>>>>>>>  >> >> >> map reduce job.
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> root@nn:~# df -h
>>>>>>>>  >> >> >> Filesystem                                       Size  Used Avail Use% Mounted on
>>>>>>>>  >> >> >> /dev/xvda2                                       5.9G  5.9G     0 100% /
>>>>>>>>  >> >> >> udev                                              98M  4.0K   98M   1% /dev
>>>>>>>>  >> >> >> tmpfs                                             48M  192K   48M   1% /run
>>>>>>>>  >> >> >> none                                             5.0M     0  5.0M   0% /run/lock
>>>>>>>>  >> >> >> none                                             120M     0  120M   0% /run/shm
>>>>>>>>  >> >> >> overflow                                         1.0M  4.0K 1020K   1% /tmp
>>>>>>>>  >> >> >> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>>>>>>>>  >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59% /groups/ch-geni-net/Hadoop-NET
>>>>>>>>  >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59% /proj/ch-geni-net
>>>>>>>>  >> >> >> root@nn:~#
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> I can see there is no space left on /dev/xvda2.
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ? Or do
I
>>>>>>>>  >> >> >> need
>>>>>>>>  >> >> >> to
>>>>>>>>  >> >> >> move the file manually from /dev/xvda2 to xvda4 ?
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> Thanks & Regards,
>>>>>>>>  >> >> >>
>>>>>>>>  >> >> >> Abdul Navaz
>>>>>>>>  >> >> >> Research Assistant
>>>>>>>>  >> >> >> University of Houston Main Campus, Houston TX
>>>>>>>>  >> >> >> Ph: 281-685-0388
>>>>>>>>  >> >> >>
>>>>>>>  >> >> >
>>>>>>>  >> >> >
>>>>>>  >> >>
>>>>>  >> >
>>>>  >>
>>>  >
>>  
>>  
>>  
>  
>  
>  
>  
>  
>  
>   
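The manual rebalancing step Aitor suggests in the quoted thread above (stop the DataNode, then mv block files from the old data directory to the new one) can be sketched as follows. The paths are hypothetical examples, not from the thread.

```shell
# Sketch: with the DataNode stopped, move some blocks (and their .meta
# checksum files) from a full data directory to a newly added one.
# Paths below are hypothetical, not taken from the thread.
OLD=/data1/dfs/current   # existing, nearly-full dfs data dir
NEW=/mnt/dfs/current     # newly added dfs data dir
mkdir -p "$NEW"
# blk_* matches both the block file (blk_<id>) and its metadata file
# (blk_<id>_<genstamp>.meta), so each pair moves together.
for f in "$OLD"/blk_*; do
  [ -e "$f" ] && mv "$f" "$NEW"/
done
```

After the move, restart the DataNode; it rescans its configured data directories on startup and reports the relocated blocks to the NameNode.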
 
 






Re: No space when running a hadoop job

Posted by ViSolve Hadoop Support <ha...@visolve.com>.
Hello,

If you want to use drive /dev/xvda4 only, then add file location for 
'/dev/xvda4' and remove the file location for '/dev/xvda2' under 
"dfs.datanode.data.dir".

After the changes restart the hadoop services and check the available 
space using the below command.
      # hadoop fs -df -h

Regards,
ViSolve Hadoop Team

On 10/3/2014 4:36 AM, Abdul Navaz wrote:
> Hello,
>
> As you suggested I have changed the hdfs-site.xml file of datanodes 
> and name node as below and formatted the name node.
>
> </property>
>
> <property>
>
> <name>dfs.datanode.data.dir</name>
>
> <value>/mnt</value>
>
> <description>Comma separated list of paths. Use the list of 
> directories from $DFS_DATA_DIR.
>
>                 For example, 
> /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
>
> </property>
>
>
>
> hduser@dn1:~$ df -h
>
> Filesystem                             Size  Used Avail Use% Mounted on
>
> /dev/xvda2                             5.9G  5.3G  258M  96% /
>
> udev                             98M  4.0K   98M   1% /dev
>
> tmpfs                             48M  196K   48M   1% /run
>
> none                             5.0M     0  5.0M   0% /run/lock
>
> none                             120M     0  120M   0% /run/shm
>
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G  113G   70G  62% 
> /groups/ch-geni-net/Hadoop-NET
>
> 172.17.253.254:/q/proj/ch-geni-net               198G  113G   70G  62% 
> /proj/ch-geni-net
>
> /dev/xvda4                             7.9G  147M  7.4G   2% /mnt
>
> hduser@dn1:~$
>
>
>
> Even after doing so, the file is copied only to /dev/xvda2 instead of 
> /dev/xvda4.
>
> Once /dev/xvda2 is full I am getting the below error message.
>
> hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
>
> Warning: $HADOOP_HOME is deprecated.
>
>
> 14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception: 
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File 
> /user/hduser/getty/file12.txt could only be replicated to 0 nodes, 
> instead of 1
>
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
>
>
>
>
> Let me say like this: I don't want to use /dev/xvda2 as it has 
> capacity of 5.9GB , I want to use only /dev/xvda4. How can I do this ?
>
>
>
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
>
> From: Abdul Navaz <navaz.enc@gmail.com <ma...@gmail.com>>
> Date: Monday, September 29, 2014 at 1:53 PM
> To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Subject: Re: No space when running a hadoop job
>
> Dear All,
>
> I am not doing load balancing here. I am just copying a file and it is 
> throwing a 'no space left on device' error.
>
>
> hduser@dn1:~$ df -h
>
> Filesystem                                     Size  Used Avail Use% Mounted on
>
> /dev/xvda2               5.9G  5.1G  533M  91% /
>
> udev                                     98M  4.0K   98M   1% /dev
>
> tmpfs                                     48M  196K   48M   1% /run
>
> none                                     5.0M     0  5.0M   0% /run/lock
>
> none                                     120M     0  120M   0% /run/shm
>
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET 198G  116G   67G  64% 
> /groups/ch-geni-net/Hadoop-NET
>
> 172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64% 
> /proj/ch-geni-net
>
> /dev/xvda4               7.9G  147M  7.4G   2% /mnt
>
> hduser@dn1:~$
>
> hduser@dn1:~$
>
> hduser@dn1:~$
>
> hduser@dn1:~$ cp data2.txt data3.txt
>
> cp: writing `data3.txt': No space left on device
>
> cp: failed to extend `data3.txt': No space left on device
>
> hduser@dn1:~$
>
>
> I guess by default it is copying to the default location. Why am I getting
> this error? How can I fix this?
>
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
>
> From: Aitor Cedres <acedres@pivotal.io <ma...@pivotal.io>>
> Reply-To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Date: Monday, September 29, 2014 at 7:53 AM
> To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Subject: Re: No space when running a hadoop job
>
>
> I think they way it works when HDFS has a list 
> in dfs.datanode.data.dir, it's basically a round robin between disks. 
> And yes, it may not be perfect balanced cause of different file sizes.
>
>
> On 29 September 2014 13:15, Susheel Kumar Gadalay <skgadalay@gmail.com 
> <ma...@gmail.com>> wrote:
>
>     Thank Aitor.
>
>     That is what is my observation too.
>
>     I added a new disk location and manually moved some files.
>
>     But if 2 locations are given at the beginning itself for
>     dfs.datanode.data.dir, will hadoop balance the disks usage, if not
>     perfect because file sizes may differ.
>
>     On 9/29/14, Aitor Cedres <acedres@pivotal.io
>     <ma...@pivotal.io>> wrote:
>     > Hi Susheel,
>     >
>     > Adding a new directory to "dfs.datanode.data.dir" will not
>     balance your
>     > disks straightforward. Eventually, by HDFS activity
>     (deleting/invalidating
>     > some block, writing new ones), the disks will become balanced.
>     If you want
>     > to balance them right after adding the new disk and changing the
>     > "dfs.datanode.data.dir"
>     > value, you have to shutdown the DN and manually move (mv) some
>     files in the
>     > old directory to the new one.
>     >
>     > The balancer will try to balance the usage between HDFS nodes,
>     but it won't
>     > care about "internal" node disks utilization. For your
>     particular case, the
>     > balancer won't fix your issue.
>     >
>     > Hope it helps,
>     > Aitor
>     >
>     > On 29 September 2014 05:53, Susheel Kumar Gadalay
>     <skgadalay@gmail.com <ma...@gmail.com>>
>     > wrote:
>     >
>     >> You mean if multiple directory locations are given, Hadoop will
>     >> balance the distribution of files across these different
>     directories.
>     >>
>     >> But normally we start with 1 directory location and once it is
>     >> reaching the maximum, we add new directory.
>     >>
>     >> In this case how can we balance the distribution of files?
>     >>
>     >> One way is to list the files and move.
>     >>
>     >> Will start balance script will work?
>     >>
>     >> On 9/27/14, Alexander Pivovarov <apivovarov@gmail.com
>     <ma...@gmail.com>> wrote:
>     >> > It can read/write in parallel to all drives. More hdd more io
>     speed.
>     >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>     <skgadalay@gmail.com <ma...@gmail.com>>
>     >> > wrote:
>     >> >
>     >> >> Correct me if I am wrong.
>     >> >>
>     >> >> Adding multiple directories will not balance the files
>     distributions
>     >> >> across these locations.
>     >> >>
>     >> >> Hadoop will exhaust the first directory and then start using the
>     >> >> next, and then the next.
>     >> >>
>     >> >> How can I tell Hadoop to evenly balance across these
>     directories.
>     >> >>
>     >> >> On 9/26/14, Matt Narrell <matt.narrell@gmail.com
>     <ma...@gmail.com>> wrote:
>     >> >> > You can add a comma separated list of paths to the
>     >> >> "dfs.datanode.data.dir"
>     >> >> > property in your hdfs-site.xml
>     >> >> >
>     >> >> > mn
>     >> >> >
>     >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz
>     <navaz.enc@gmail.com <ma...@gmail.com>>
>     >> >> > wrote:
>     >> >> >
>     >> >> >> Hi
>     >> >> >>
>     >> >> >> I am facing some space issue when I saving file into HDFS
>     and/or
>     >> >> >> running
>     >> >> >> map reduce job.
>     >> >> >>
>     >> >> >> root@nn:~# df -h
>     >> >> >> Filesystem                                       Size  Used Avail Use% Mounted on
>     >> >> >> /dev/xvda2                                       5.9G  5.9G     0 100% /
>     >> >> >> udev                                              98M  4.0K   98M   1% /dev
>     >> >> >> tmpfs                                             48M  192K   48M   1% /run
>     >> >> >> none                                             5.0M     0  5.0M   0% /run/lock
>     >> >> >> none                                             120M     0  120M   0% /run/shm
>     >> >> >> overflow                                         1.0M  4.0K 1020K   1% /tmp
>     >> >> >> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>     >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59% /groups/ch-geni-net/Hadoop-NET
>     >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59% /proj/ch-geni-net
>     >> >> >> root@nn:~#
>     >> >> >>
>     >> >> >>
>     >> >> >> I can see there is no space left on /dev/xvda2.
>     >> >> >>
>     >> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ?
>     Or do I
>     >> >> >> need
>     >> >> >> to
>     >> >> >> move the file manually from /dev/xvda2 to xvda4 ?
>     >> >> >>
>     >> >> >>
>     >> >> >>
>     >> >> >> Thanks & Regards,
>     >> >> >>
>     >> >> >> Abdul Navaz
>     >> >> >> Research Assistant
>     >> >> >> University of Houston Main Campus, Houston TX
>     >> >> >> Ph: 281-685-0388
>     >> >> >>
>     >> >> >
>     >> >> >
>     >> >>
>     >> >
>     >>
>     >
>
>


Re: No space when running a hadoop job

Posted by ViSolve Hadoop Support <ha...@visolve.com>.
Hello,

If you want to use only drive /dev/xvda4, then add a data directory
located on /dev/xvda4 and remove the directory located on /dev/xvda2
under "dfs.datanode.data.dir".

After the changes, restart the Hadoop services and check the available
space using the command below.
      # hadoop fs -df -h
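Concretely, that change could look like the sketch below (assuming /dev/xvda4 is mounted at /mnt, as in the df output in this thread; the path /mnt/hadoop/hdfs/dn is hypothetical and must be created and owned by the datanode user first):

```xml
<!-- hdfs-site.xml: point the datanode at the new disk only.        -->
<!-- /mnt/hadoop/hdfs/dn is a hypothetical directory on /dev/xvda4; -->
<!-- create it and chown it to the user running the datanode.       -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/mnt/hadoop/hdfs/dn</value>
</property>
```

Note that on Hadoop 1.x (which prints the "$HADOOP_HOME is deprecated" warning seen earlier in this thread) the property is named dfs.data.dir; if the datanode is 1.x, a dfs.datanode.data.dir entry may be silently ignored, which would explain blocks still landing on /dev/xvda2.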

Regards,
ViSolve Hadoop Team

On 10/3/2014 4:36 AM, Abdul Navaz wrote:
> Hello,
>
> As you suggested I have changed the hdfs-site.xml file of datanodes 
> and name node as below and formatted the name node.
>
> </property>
>
> <property>
>
> <name>dfs.datanode.data.dir</name>
>
> <value>/mnt</value>
>
> <description>Comma separated list of paths. Use the list of 
> directories from $DFS_DATA_DIR.
>
>                 For example, 
> /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
>
> </property>
>
>
>
> hduser@dn1:~$ df -h
>
> Filesystem                                       Size  Used Avail Use% Mounted on
> /dev/xvda2                                       5.9G  5.3G  258M  96% /
> udev                                              98M  4.0K   98M   1% /dev
> tmpfs                                             48M  196K   48M   1% /run
> none                                             5.0M     0  5.0M   0% /run/lock
> none                                             120M     0  120M   0% /run/shm
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  113G   70G  62% /groups/ch-geni-net/Hadoop-NET
> 172.17.253.254:/q/proj/ch-geni-net               198G  113G   70G  62% /proj/ch-geni-net
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>
> hduser@dn1:~$
>
>
>
> Even after doing so, the file is copied only to /dev/xvda2 instead of 
> /dev/xvda4.
>
> Once /dev/xvda2 is full I am getting the below error message.
>
> hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
>
> Warning: $HADOOP_HOME is deprecated.
>
>
> 14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception: 
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File 
> /user/hduser/getty/file12.txt could only be replicated to 0 nodes, 
> instead of 1
>
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
>
>
>
>
> Let me put it like this: I don't want to use /dev/xvda2, as it only has
> a capacity of 5.9GB; I want to use only /dev/xvda4. How can I do this?
>
>
>
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
>
> From: Abdul Navaz <navaz.enc@gmail.com <ma...@gmail.com>>
> Date: Monday, September 29, 2014 at 1:53 PM
> To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Subject: Re: No space when running a hadoop job
>
> Dear All,
>
> I am not doing load balancing here. I am just copying a file and it
> is throwing an error: no space left on the device.
>
>
> hduser@dn1:~$ df -h
>
> Filesystem                                       Size  Used Avail Use% Mounted on
> /dev/xvda2                                       5.9G  5.1G  533M  91% /
> udev                                              98M  4.0K   98M   1% /dev
> tmpfs                                             48M  196K   48M   1% /run
> none                                             5.0M     0  5.0M   0% /run/lock
> none                                             120M     0  120M   0% /run/shm
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  116G   67G  64% /groups/ch-geni-net/Hadoop-NET
> 172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64% /proj/ch-geni-net
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>
> hduser@dn1:~$
>
> hduser@dn1:~$
>
> hduser@dn1:~$
>
> hduser@dn1:~$ cp data2.txt data3.txt
>
> cp: writing `data3.txt': No space left on device
>
> cp: failed to extend `data3.txt': No space left on device
>
> hduser@dn1:~$
>
>
> I guess by default it is copying to the default location. Why am I
> getting this error? How can I fix this?
>
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
>
> From: Aitor Cedres <acedres@pivotal.io <ma...@pivotal.io>>
> Reply-To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Date: Monday, September 29, 2014 at 7:53 AM
> To: <user@hadoop.apache.org <ma...@hadoop.apache.org>>
> Subject: Re: No space when running a hadoop job
>
>
> I think the way it works when HDFS has a list in
> dfs.datanode.data.dir is basically round robin between disks. And
> yes, it may not be perfectly balanced because of different file sizes.
>
>
> On 29 September 2014 13:15, Susheel Kumar Gadalay <skgadalay@gmail.com 
> <ma...@gmail.com>> wrote:
>
>     Thanks, Aitor.
>
>     That is what is my observation too.
>
>     I added a new disk location and manually moved some files.
>
>     But if 2 locations are given from the beginning for
>     dfs.datanode.data.dir, will Hadoop balance the disk usage, even
>     if not perfectly, since file sizes may differ?
>
>     On 9/29/14, Aitor Cedres <acedres@pivotal.io
>     <ma...@pivotal.io>> wrote:
>     > Hi Susheel,
>     >
>     > Adding a new directory to "dfs.datanode.data.dir" will not
>     > balance your disks straight away. Eventually, through HDFS
>     > activity (deleting/invalidating some blocks, writing new ones),
>     > the disks will become balanced. If you want to balance them
>     > right after adding the new disk and changing the
>     > "dfs.datanode.data.dir" value, you have to shut down the DN and
>     > manually move (mv) some files from the old directory to the new one.
>     >
>     > The balancer will try to balance the usage between HDFS nodes,
>     but it won't
>     > care about "internal" node disks utilization. For your
>     particular case, the
>     > balancer won't fix your issue.
>     >
>     > Hope it helps,
>     > Aitor
>     >
>     > On 29 September 2014 05:53, Susheel Kumar Gadalay
>     <skgadalay@gmail.com <ma...@gmail.com>>
>     > wrote:
>     >
>     >> You mean if multiple directory locations are given, Hadoop will
>     >> balance the distribution of files across these different
>     directories.
>     >>
>     >> But normally we start with 1 directory location and once it is
>     >> reaching its maximum, we add a new directory.
>     >>
>     >> In this case how can we balance the distribution of files?
>     >>
>     >> One way is to list the files and move them.
>     >>
>     >> Will the start-balancer script work?
>     >>
>     >> On 9/27/14, Alexander Pivovarov <apivovarov@gmail.com
>     <ma...@gmail.com>> wrote:
>     >> > It can read/write in parallel to all drives. More hdd more io
>     speed.
>     >> >  On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay"
>     <skgadalay@gmail.com <ma...@gmail.com>>
>     >> > wrote:
>     >> >
>     >> >> Correct me if I am wrong.
>     >> >>
>     >> >> Adding multiple directories will not balance the file
>     >> >> distribution across these locations.
>     >> >>
>     >> >> Hadoop will exhaust the first directory and then start
>     >> >> using the next, and so on.
>     >> >>
>     >> >> How can I tell Hadoop to evenly balance across these
>     >> >> directories?
>     >> >>
>     >> >> On 9/26/14, Matt Narrell <matt.narrell@gmail.com
>     <ma...@gmail.com>> wrote:
>     >> >> > You can add a comma separated list of paths to the
>     >> >> "dfs.datanode.data.dir"
>     >> >> > property in your hdfs-site.xml
>     >> >> >
>     >> >> > mn
>     >> >> >
>     >> >> > On Sep 26, 2014, at 8:37 AM, Abdul Navaz
>     <navaz.enc@gmail.com <ma...@gmail.com>>
>     >> >> > wrote:
>     >> >> >
>     >> >> >> Hi
>     >> >> >>
>     >> >> >> I am facing some space issue when I saving file into HDFS
>     and/or
>     >> >> >> running
>     >> >> >> map reduce job.
>     >> >> >>
>     >> >> >> root@nn:~# df -h
>     >> >> >> Filesystem                                       Size  Used Avail Use% Mounted on
>     >> >> >> /dev/xvda2                                       5.9G  5.9G     0 100% /
>     >> >> >> udev                                              98M  4.0K   98M   1% /dev
>     >> >> >> tmpfs                                             48M  192K   48M   1% /run
>     >> >> >> none                                             5.0M     0  5.0M   0% /run/lock
>     >> >> >> none                                             120M     0  120M   0% /run/shm
>     >> >> >> overflow                                         1.0M  4.0K 1020K   1% /tmp
>     >> >> >> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>     >> >> >> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59% /groups/ch-geni-net/Hadoop-NET
>     >> >> >> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59% /proj/ch-geni-net
>     >> >> >> root@nn:~#
>     >> >> >>
>     >> >> >>
>     >> >> >> I can see there is no space left on /dev/xvda2.
>     >> >> >>
>     >> >> >> How can I make hadoop to see newly mounted /dev/xvda4 ?
>     Or do I
>     >> >> >> need
>     >> >> >> to
>     >> >> >> move the file manually from /dev/xvda2 to xvda4 ?
>     >> >> >>
>     >> >> >>
>     >> >> >>
>     >> >> >> Thanks & Regards,
>     >> >> >>
>     >> >> >> Abdul Navaz
>     >> >> >> Research Assistant
>     >> >> >> University of Houston Main Campus, Houston TX
>     >> >> >> Ph: 281-685-0388
>     >> >> >>
>     >> >> >
>     >> >> >
>     >> >>
>     >> >
>     >>
>     >
>
>
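Before restarting the datanode against the new directory, it is worth a quick sanity check that the target mount really has free space; the sketch below assumes the /mnt mount from this thread and defaults to / so it runs anywhere:

```shell
#!/bin/sh
# Sanity check: free space on the mount that will back the HDFS data
# directory. /mnt is an assumption taken from the df output in this
# thread; pass a different mount point as the first argument.
MOUNT=${1:-/}
AVAIL_KB=$(df -P "$MOUNT" | awk 'NR==2 {print $4}')
echo "available on $MOUNT: ${AVAIL_KB} KB"

# On the cluster, after the datanode restart, confirm HDFS sees the
# capacity (commands from this thread):
#   hadoop dfsadmin -report   # per-datanode configured capacity
#   hadoop fs -df -h          # capacity from HDFS's point of view
```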

